Function reference

_wp_utf8_encode_fallback()

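💡 Yunce documentation notes

Overview

_wp_utf8_encode_fallback() is an internal WordPress function that converts an ISO-8859-1 encoded string to UTF-8. It preserves backward compatibility with utf8_encode(), the deprecated function from the PHP standard library.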

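Key points

  • Purpose: converts a byte string in ISO-8859-1 (latin1) to a UTF-8 encoded string.
  • Parameter: $iso_8859_1_text (required), the text to be treated as ISO-8859-1 bytes.
  • Return value: the text converted to UTF-8, as a string.
  • Introduced in: WordPress 6.9.0.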
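Because US-ASCII bytes pass through unchanged and every other byte becomes a two-byte sequence, the length of the result is predictable. The sketch below illustrates this with a hypothetical helper, expected_utf8_length(), which is not part of WordPress and is shown only to make the size relationship concrete.

```php
<?php
// Hypothetical helper for illustration; not part of WordPress.
// Predicts the byte length of the converted string: bytes 0x00 to 0x7F stay
// one byte long, bytes 0x80 to 0xFF each grow to two bytes.
function expected_utf8_length( string $iso_8859_1_text ): int {
    // Without the /u modifier, PCRE matches individual bytes.
    $high_bytes = preg_match_all( '/[\x80-\xFF]/', $iso_8859_1_text );

    return strlen( $iso_8859_1_text ) + $high_bytes;
}

// "café" in ISO-8859-1 is four bytes; the final byte 0xE9 doubles in UTF-8.
echo expected_utf8_length( "caf\xE9" ), "\n"; // prints 5
```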

Code example

function _wp_utf8_encode_fallback( $iso_8859_1_text ) {
    $iso_8859_1_text = (string) $iso_8859_1_text;
    $at              = 0;
    $was_at          = 0;
    $end             = strlen( $iso_8859_1_text );
    $utf8            = '';

    while ( $at < $end ) {
        // US-ASCII bytes (0x00–0x7F) are identical in ISO-8859-1 and UTF-8.
        $ascii_byte_count = strspn(
            $iso_8859_1_text,
            "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f",
            $at
        );

        if ( $ascii_byte_count > 0 ) {
            $at += $ascii_byte_count;
            continue;
        }

        // All other bytes transform into two-byte UTF-8 sequences.
        $code_point = ord( $iso_8859_1_text[ $at ] );
        $byte1      = chr( 0xC0 | ( $code_point >> 6 ) );
        $byte2      = chr( 0x80 | ( $code_point & 0x3F ) );

        $utf8 .= substr( $iso_8859_1_text, $was_at, $at - $was_at );
        $utf8 .= "{$byte1}{$byte2}";

        ++$at;
        $was_at = $at;
    }

    if ( 0 === $was_at ) {
        return $iso_8859_1_text;
    }

    $utf8 .= substr( $iso_8859_1_text, $was_at );
    return $utf8;
}
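The two-byte step can be checked by hand. The sketch below isolates that arithmetic in a hypothetical helper, encode_high_byte() (not a WordPress function), and applies it to 0xE9, the ISO-8859-1 byte for é.

```php
<?php
// Hypothetical helper for illustration only; it is not a WordPress function.
// It mirrors the two-byte arithmetic applied to bytes 0x80 to 0xFF.
function encode_high_byte( int $code_point ): string {
    // Leading byte has the form 110xxxxx and carries the top two bits.
    $byte1 = chr( 0xC0 | ( $code_point >> 6 ) );
    // Continuation byte has the form 10xxxxxx and carries the low six bits.
    $byte2 = chr( 0x80 | ( $code_point & 0x3F ) );

    return $byte1 . $byte2;
}

// 0xE9 >> 6   = 0x03, so the leading byte is 0xC0 | 0x03 = 0xC3.
// 0xE9 & 0x3F = 0x29, so the continuation byte is 0x80 | 0x29 = 0xA9.
echo bin2hex( encode_high_byte( 0xE9 ) ), "\n"; // prints "c3a9"
```

For any input byte in the range 0x80 to 0xFF the shift yields 2 or 3, so the leading byte is always 0xC2 or 0xC3.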

📄 Original content

Converts a string from ISO-8859-1 to UTF-8, maintaining backwards compatibility with utf8_encode(), the deprecated function from the PHP standard library.

Description

See also

Parameters

$iso_8859_1_text string required
Text treated as ISO-8859-1 (latin1) bytes.

Return

string Text converted into UTF-8.

Source

function _wp_utf8_encode_fallback( $iso_8859_1_text ) {
	$iso_8859_1_text = (string) $iso_8859_1_text;
	$at              = 0;
	$was_at          = 0;
	$end             = strlen( $iso_8859_1_text );
	$utf8            = '';

	while ( $at < $end ) {
		// US-ASCII bytes are identical in ISO-8859-1 and UTF-8. These are 0x00–0x7F.
		$ascii_byte_count = strspn(
			$iso_8859_1_text,
			"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" .
			"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" .
			" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f",
			$at
		);

		if ( $ascii_byte_count > 0 ) {
			$at += $ascii_byte_count;
			continue;
		}

		// All other bytes transform into two-byte UTF-8 sequences.
		$code_point = ord( $iso_8859_1_text[ $at ] );
		$byte1      = chr( 0xC0 | ( $code_point >> 6 ) );
		$byte2      = chr( 0x80 | ( $code_point & 0x3F ) );

		$utf8 .= substr( $iso_8859_1_text, $was_at, $at - $was_at );
		$utf8 .= "{$byte1}{$byte2}";

		++$at;
		$was_at = $at;
	}

	if ( 0 === $was_at ) {
		return $iso_8859_1_text;
	}

	$utf8 .= substr( $iso_8859_1_text, $was_at );
	return $utf8;
}
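The loop relies on strspn() to skip a whole run of ASCII bytes in one call rather than inspecting each byte. A minimal sketch of that idea follows. ascii_run_length() is a hypothetical name, and building the mask with range() instead of writing it out as a literal is an assumption made for brevity, not how the source above does it.

```php
<?php
// Hypothetical helper for illustration; not part of WordPress.
// Returns how many consecutive US-ASCII bytes (0x00 to 0x7F) start at $offset.
function ascii_run_length( string $bytes, int $offset ): int {
    static $ascii_mask = null;

    if ( null === $ascii_mask ) {
        // Build the 128-byte mask once instead of spelling out every escape.
        $ascii_mask = implode( '', array_map( 'chr', range( 0x00, 0x7F ) ) );
    }

    return strspn( $bytes, $ascii_mask, $offset );
}

$text = "abc\xE9def"; // "abcédef" as ISO-8859-1 bytes.

var_dump( ascii_run_length( $text, 0 ) ); // int(3): "abc" precedes the high byte.
var_dump( ascii_run_length( $text, 3 ) ); // int(0): offset 3 is the high byte itself.
var_dump( ascii_run_length( $text, 4 ) ); // int(3): "def" runs to the end.
```

A result of 0 is the signal that the byte at the current offset needs the two-byte encoding; any positive result lets the offset jump forward past the run.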

Changelog

Version Description
6.9.0 Introduced.