_wp_utf8_codepoint_count()
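Yunce documentation annotations
Overview
_wp_utf8_codepoint_count() counts the code points in a given UTF-8 string. It serves as a fallback for mb_strlen() and handles invalid byte sequences.
Key points
- Returns the number of code points in a UTF-8 string; each invalid byte sequence counts as a single code point under the maximal subpart rule.
- Always returns zero when $byte_offset or $max_byte_length is negative.
- Parameters: $text (required), $byte_offset (optional, default 0), and $max_byte_length (optional, default PHP_INT_MAX).
- Introduced in WordPress 6.9.0.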
Code examples
4 === _wp_utf8_codepoint_count( 'text' );
13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );
Notes
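- This function is a fallback for mb_strlen( $text, 'UTF-8' ), intended for environments where the mbstring extension is unavailable.
- Related functions include _wp_scan_utf8(), which finds valid and invalid UTF-8 byte sequences in a string.
Original content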
Returns how many code points are found in the given UTF-8 string.
Description
Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling mb_strlen( $text, 'UTF-8' ).
When negative values are provided for the byte offsets or length, this will always report zero code points.
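The fallback relationship described above can be sketched as a small wrapper. This is only an illustration: `sketch_utf8_length()` is a hypothetical helper, not part of WordPress, and its last branch is an assumption added so the sketch also runs outside WordPress without mbstring. That last branch is correct only for well-formed UTF-8.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): pick the best available
 * way to count code points in a UTF-8 string.
 */
function sketch_utf8_length( string $text ): int {
	// Preferred path: the mbstring extension, when it is loaded.
	if ( function_exists( 'mb_strlen' ) ) {
		return mb_strlen( $text, 'UTF-8' );
	}

	// Inside WordPress 6.9.0 or later, the documented fallback is available.
	if ( function_exists( '_wp_utf8_codepoint_count' ) ) {
		return _wp_utf8_codepoint_count( $text );
	}

	// Last resort for this sketch only: count every byte that is not a
	// continuation byte (10xxxxxx). Correct for well-formed UTF-8 only.
	$count = 0;
	$end   = strlen( $text );
	for ( $i = 0; $i < $end; $i++ ) {
		if ( ( ord( $text[ $i ] ) & 0xC0 ) !== 0x80 ) {
			$count++;
		}
	}
	return $count;
}

var_dump( sketch_utf8_length( 'text' ) ); // int(4)
```

The branches may disagree on malformed input, so a wrapper like this is only safe when the text is known to be well-formed.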
Example:
4 === _wp_utf8_codepoint_count( 'text' );
// Groups are 'test', "\x90" as '�', 'wp', "\xE2\x80" as '�', "\xC0" as '�', and 'test'.
13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );
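To make the grouping in the example above concrete, here is a standalone sketch of counting under the maximal subpart rule. It is not the WordPress implementation (which delegates to _wp_scan_utf8()); `sketch_count_maximal_subparts()` is a hypothetical name, and the byte ranges follow the well-formed UTF-8 byte sequence table in the Unicode Standard.

```php
<?php
/**
 * Hypothetical standalone sketch, not the WordPress implementation.
 * Counts one code point per well-formed character and one per
 * maximal invalid subpart.
 */
function sketch_count_maximal_subparts( string $text ): int {
	$count = 0;
	$at    = 0;
	$end   = strlen( $text );

	while ( $at < $end ) {
		$lead = ord( $text[ $at ] );
		$at++;
		$count++; // Every span, valid or invalid, contributes one code point.

		// Pick how many continuation bytes are expected and the range
		// allowed for the first of them.
		if ( $lead < 0x80 ) {
			continue; // ASCII.
		} elseif ( $lead >= 0xC2 && $lead <= 0xDF ) {
			[ $need, $lo, $hi ] = [ 1, 0x80, 0xBF ];
		} elseif ( 0xE0 === $lead ) {
			[ $need, $lo, $hi ] = [ 2, 0xA0, 0xBF ]; // Rejects overlong forms.
		} elseif ( 0xED === $lead ) {
			[ $need, $lo, $hi ] = [ 2, 0x80, 0x9F ]; // Rejects surrogates.
		} elseif ( $lead >= 0xE1 && $lead <= 0xEF ) {
			[ $need, $lo, $hi ] = [ 2, 0x80, 0xBF ];
		} elseif ( 0xF0 === $lead ) {
			[ $need, $lo, $hi ] = [ 3, 0x90, 0xBF ]; // Rejects overlong forms.
		} elseif ( $lead >= 0xF1 && $lead <= 0xF3 ) {
			[ $need, $lo, $hi ] = [ 3, 0x80, 0xBF ];
		} elseif ( 0xF4 === $lead ) {
			[ $need, $lo, $hi ] = [ 3, 0x80, 0x8F ]; // Stays at or below U+10FFFF.
		} else {
			continue; // 0x80-0xC1 or 0xF5-0xFF: a lone invalid byte.
		}

		// Consume continuation bytes while they can still extend a
		// well-formed sequence; stop at the first byte that cannot.
		while ( $need > 0 && $at < $end ) {
			$byte = ord( $text[ $at ] );
			if ( $byte < $lo || $byte > $hi ) {
				break;
			}
			$at++;
			$need--;
			$lo = 0x80;
			$hi = 0xBF;
		}
	}

	return $count;
}

var_dump( sketch_count_maximal_subparts( "test\x90wp\xE2\x80\xC0test" ) ); // int(13)
```

In the sample string, "\xE2\x80" is the start of a three-byte sequence that "\xC0" cannot complete, so the two bytes form one invalid span, and "\xC0" then counts on its own.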
Parameters
$text string required
Count code points in this string.
$byte_offset ?int optional
Start counting after this many bytes in $text. Must not be negative.
Default: 0
$max_byte_length ?int optional
Stop counting after having scanned past this many bytes. Default is to scan until the end of the string. Must not be negative.
Default: PHP_INT_MAX
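The effect of the two byte-based parameters can be illustrated with a simplified stand-in. `sketch_count_in_window()` is a hypothetical function, not the WordPress one: it assumes well-formed UTF-8 and a window that starts and ends on character boundaries, and it does not attempt the invalid-span handling that the real function provides.

```php
<?php
/**
 * Hypothetical stand-in for illustration only. Assumes well-formed UTF-8
 * and a byte window aligned to character boundaries.
 */
function sketch_count_in_window( string $text, int $byte_offset = 0, int $max_byte_length = PHP_INT_MAX ): int {
	// Negative arguments (or an offset past the end) report zero code points.
	if ( $byte_offset < 0 || $max_byte_length < 0 || $byte_offset >= strlen( $text ) ) {
		return 0;
	}

	// Offsets and lengths are measured in bytes, not characters.
	$slice = substr( $text, $byte_offset, $max_byte_length );

	// In well-formed UTF-8, each code point has exactly one byte that is
	// not a continuation byte (10xxxxxx).
	$count = 0;
	$end   = strlen( $slice );
	for ( $i = 0; $i < $end; $i++ ) {
		if ( ( ord( $slice[ $i ] ) & 0xC0 ) !== 0x80 ) {
			$count++;
		}
	}
	return $count;
}

$text = "h\u{e9}llo w\u{f6}rld"; // 13 bytes, 11 code points.

var_dump( sketch_count_in_window( $text ) );        // int(11)
var_dump( sketch_count_in_window( $text, 7 ) );     // int(5): "wörld"
var_dump( sketch_count_in_window( $text, 0, 6 ) );  // int(5): "héllo"
var_dump( sketch_count_in_window( $text, -1 ) );    // int(0)
```

Because "é" and "ö" occupy two bytes each, the byte offset 7 lands on "w" even though it is only the seventh character.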
Source
function _wp_utf8_codepoint_count( string $text, ?int $byte_offset = 0, ?int $max_byte_length = PHP_INT_MAX ): int {
    if ( $byte_offset < 0 ) {
        return 0;
    }

    $count           = 0;
    $at              = $byte_offset;
    $end             = strlen( $text );
    $invalid_length  = 0;
    $max_byte_length = min( $end - $at, $max_byte_length );

    while ( $at < $end && ( $at - $byte_offset ) < $max_byte_length ) {
        $count += _wp_scan_utf8( $text, $at, $invalid_length, $max_byte_length - ( $at - $byte_offset ) );
        $count += $invalid_length > 0 ? 1 : 0;
        $at    += $invalid_length;
    }

    return $count;
}
Changelog
| Version | Description |
|---|---|
| 6.9.0 | Introduced. |