_wp_utf8_encode_fallback()
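Yunce documentation notes
Overview
_wp_utf8_encode_fallback() is an internal WordPress function that converts an ISO-8859-1 encoded string to UTF-8, maintaining backwards compatibility with the deprecated utf8_encode() function from the PHP standard library.
Key points
- Purpose: converts an ISO-8859-1 (latin1) byte string into a UTF-8 encoded string. Bytes 0x00–0x7F are copied unchanged; every other byte becomes a two-byte UTF-8 sequence.
- Parameter: $iso_8859_1_text (required), the text to be treated as ISO-8859-1 bytes.
- Return value: the text converted to UTF-8, as a string.
- Introduced in: WordPress 6.9.0.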
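To make the input and output contract concrete, here is a minimal sketch of the same byte mapping as a naive per-byte loop. The helper name latin1_to_utf8_reference() is hypothetical and not part of WordPress; it only illustrates the behaviour described in the points above.

```php
<?php
// Hypothetical helper, not part of WordPress: a naive per-byte reference
// for the ISO-8859-1 to UTF-8 mapping described above.
function latin1_to_utf8_reference( string $latin1 ): string {
	$utf8 = '';
	$end  = strlen( $latin1 );

	for ( $i = 0; $i < $end; $i++ ) {
		$code_point = ord( $latin1[ $i ] );

		if ( $code_point < 0x80 ) {
			// US-ASCII bytes are the same in both encodings.
			$utf8 .= $latin1[ $i ];
		} else {
			// Bytes 0x80-0xFF become a two-byte UTF-8 sequence.
			$utf8 .= chr( 0xC0 | ( $code_point >> 6 ) ) . chr( 0x80 | ( $code_point & 0x3F ) );
		}
	}

	return $utf8;
}

$input  = "caf\xE9"; // "café" as ISO-8859-1 bytes (4 bytes).
$output = latin1_to_utf8_reference( $input );

echo bin2hex( $output ), "\n"; // 636166c3a9
echo strlen( $output ), "\n";  // 5
```

Each non-ASCII input byte adds exactly one byte to the output, and a pure ASCII string comes back byte-for-byte identical.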
Code example
function _wp_utf8_encode_fallback( $iso_8859_1_text ) {
    $iso_8859_1_text = (string) $iso_8859_1_text;
    $at              = 0;
    $was_at          = 0;
    $end             = strlen( $iso_8859_1_text );
    $utf8            = '';

    while ( $at < $end ) {
        // US-ASCII bytes (0x00–0x7F) are identical in ISO-8859-1 and UTF-8, so skip over runs of them.
        $ascii_byte_count = strspn(
            $iso_8859_1_text,
            "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f",
            $at
        );

        if ( $ascii_byte_count > 0 ) {
            $at += $ascii_byte_count;
            continue;
        }

        // All other bytes transform into two-byte UTF-8 sequences.
        $code_point = ord( $iso_8859_1_text[ $at ] );
        $byte1      = chr( 0xC0 | ( $code_point >> 6 ) );
        $byte2      = chr( 0x80 | ( $code_point & 0x3F ) );

        // Flush the pending ASCII run, then append the two-byte sequence.
        $utf8 .= substr( $iso_8859_1_text, $was_at, $at - $was_at );
        $utf8 .= "{$byte1}{$byte2}";

        ++$at;
        $was_at = $at;
    }

    // Nothing was converted: the input was pure ASCII, so return it as-is.
    if ( 0 === $was_at ) {
        return $iso_8859_1_text;
    }

    $utf8 .= substr( $iso_8859_1_text, $was_at );
    return $utf8;
}
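The step commented "All other bytes transform into two-byte UTF-8 sequences" can be traced by hand for a single byte. The following is a small illustration of that arithmetic, using 0xE9 ("é" in ISO-8859-1) as an arbitrary sample value; the variable names are chosen for this sketch only.

```php
<?php
// Illustrative only: split one ISO-8859-1 byte into its two UTF-8 bytes.
$code_point = 0xE9; // "é" in ISO-8859-1 (U+00E9).

// Two-byte UTF-8 layout: 110xxxxx 10xxxxxx.
$byte1 = 0xC0 | ( $code_point >> 6 );   // Top two bits of the byte go into the lead byte.
$byte2 = 0x80 | ( $code_point & 0x3F ); // Low six bits go into the continuation byte.

printf( "%08b -> %08b %08b\n", $code_point, $byte1, $byte2 );
// 11101001 -> 11000011 10101001

printf( "%02X -> %02X %02X\n", $code_point, $byte1, $byte2 );
// E9 -> C3 A9

// Collect every lead byte that can occur for inputs 0x80-0xFF.
$lead_bytes = array();
for ( $b = 0x80; $b <= 0xFF; $b++ ) {
	$lead_bytes[ 0xC0 | ( $b >> 6 ) ] = true;
}

// Only 0xC2 and 0xC3 ever appear as lead bytes.
echo implode( ' ', array_map( 'dechex', array_keys( $lead_bytes ) ) ), "\n"; // c2 c3
```

Because the input is a single byte, shifting right by six leaves only the values 2 or 3, which is why the output never needs more than two bytes per character.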
Original content
Converts a string from ISO-8859-1 to UTF-8, maintaining backwards compatibility with the deprecated function from the PHP standard library.
Parameters
$iso_8859_1_text (string, required)
Text treated as ISO-8859-1 (latin1) bytes.
Source
function _wp_utf8_encode_fallback( $iso_8859_1_text ) {
    $iso_8859_1_text = (string) $iso_8859_1_text;
    $at              = 0;
    $was_at          = 0;
    $end             = strlen( $iso_8859_1_text );
    $utf8            = '';

    while ( $at < $end ) {
        // US-ASCII bytes are identical in ISO-8859-1 and UTF-8. These are 0x00–0x7F.
        $ascii_byte_count = strspn(
            $iso_8859_1_text,
            "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" .
            "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" .
            " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f",
            $at
        );

        if ( $ascii_byte_count > 0 ) {
            $at += $ascii_byte_count;
            continue;
        }

        // All other bytes transform into two-byte UTF-8 sequences.
        $code_point = ord( $iso_8859_1_text[ $at ] );
        $byte1      = chr( 0xC0 | ( $code_point >> 6 ) );
        $byte2      = chr( 0x80 | ( $code_point & 0x3F ) );

        $utf8 .= substr( $iso_8859_1_text, $was_at, $at - $was_at );
        $utf8 .= "{$byte1}{$byte2}";

        ++$at;
        $was_at = $at;
    }

    if ( 0 === $was_at ) {
        return $iso_8859_1_text;
    }

    $utf8 .= substr( $iso_8859_1_text, $was_at );
    return $utf8;
}
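The loop relies on strspn() with an offset to measure how many consecutive US-ASCII bytes start at a given position. The sketch below shows that call in isolation. For brevity it builds the 0x00–0x7F mask with range() and chr(), which is an assumption of this example; the function above writes the mask out as a string literal. The sample text is arbitrary.

```php
<?php
// Build the 0x00-0x7F mask programmatically (an assumption for brevity;
// the function above spells the characters out as a literal).
$ascii_mask = implode( '', array_map( 'chr', range( 0, 127 ) ) );

$text = "na\xEFve caf\xE9"; // "naïve café" as ISO-8859-1 bytes.

// From offset 0, two ASCII bytes ("na") precede the first high byte.
$run_at_0 = strspn( $text, $ascii_mask, 0 );

// At offset 2 sits 0xEF, so the ASCII run there has length zero.
$run_at_2 = strspn( $text, $ascii_mask, 2 );

// After the high byte, "ve caf" is a six-byte ASCII run.
$run_at_3 = strspn( $text, $ascii_mask, 3 );

echo "$run_at_0 $run_at_2 $run_at_3\n"; // 2 0 6
```

A zero-length run is the signal that the byte at the current offset is outside US-ASCII and needs the two-byte conversion; a non-zero run lets the loop jump ahead without copying anything yet.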
Changelog
| Version | Description |
|---|---|
| 6.9.0 | Introduced. |