_wp_utf8_codepoint_span()
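Yunce Docs annotation
Overview
_wp_utf8_codepoint_span() is a WordPress core function. It computes how many bytes are occupied by the span of characters that starts at a given byte offset in a string and contains at most a given number of code points. It works on UTF-8 text and counts invalid byte sequences according to the maximal subpart rule.
Key points
- The function returns the number of bytes occupied by at most $max_code_points code points, starting at byte offset $byte_offset.
- The optional by-reference parameter $found_code_points receives the number of code points actually found, which may be smaller than $max_code_points.
- The function is a fallback for strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) ).
- Each invalid byte sequence counts as one code point, according to the maximal subpart rule.
- The related function _wp_scan_utf8() can be used to find spans of valid and invalid UTF-8 bytes in a string.
- Introduced in WordPress 6.9.0.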
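The behavior summarized above can be sketched in pure PHP. This is a minimal sketch, not the WordPress implementation. The functions demo_utf8_sequence_length() and demo_utf8_codepoint_span() are hypothetical names. The maximal-subpart check is written directly from the well-formed UTF-8 byte ranges in the Unicode Standard (Table 3-7), whereas the real function delegates the scanning to _wp_scan_utf8().

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): byte length of the UTF-8
 * sequence starting at $at. For ill-formed input it returns the length of
 * the maximal subpart, which is always at least one byte.
 */
function demo_utf8_sequence_length( string $text, int $at ): int {
	$end  = strlen( $text );
	$lead = ord( $text[ $at ] );

	if ( $lead < 0x80 ) {
		return 1; // ASCII.
	}

	// Expected sequence length and allowed range of the second byte.
	if ( $lead >= 0xC2 && $lead <= 0xDF ) {
		[ $need, $lo, $hi ] = [ 2, 0x80, 0xBF ];
	} elseif ( 0xE0 === $lead ) {
		[ $need, $lo, $hi ] = [ 3, 0xA0, 0xBF ]; // Rejects overlong forms.
	} elseif ( 0xED === $lead ) {
		[ $need, $lo, $hi ] = [ 3, 0x80, 0x9F ]; // Rejects surrogates.
	} elseif ( $lead >= 0xE1 && $lead <= 0xEF ) {
		[ $need, $lo, $hi ] = [ 3, 0x80, 0xBF ];
	} elseif ( 0xF0 === $lead ) {
		[ $need, $lo, $hi ] = [ 4, 0x90, 0xBF ]; // Rejects overlong forms.
	} elseif ( $lead >= 0xF1 && $lead <= 0xF3 ) {
		[ $need, $lo, $hi ] = [ 4, 0x80, 0xBF ];
	} elseif ( 0xF4 === $lead ) {
		[ $need, $lo, $hi ] = [ 4, 0x80, 0x8F ]; // Rejects values above U+10FFFF.
	} else {
		return 1; // 80..C1 and F5..FF never start a well-formed sequence.
	}

	for ( $i = 1; $i < $need; $i++ ) {
		if ( $at + $i >= $end ) {
			return $i; // Truncated: the bytes seen so far are the maximal subpart.
		}
		$byte = ord( $text[ $at + $i ] );
		$min  = 1 === $i ? $lo : 0x80;
		$max  = 1 === $i ? $hi : 0xBF;
		if ( $byte < $min || $byte > $max ) {
			return $i; // Ill-formed: the subpart ends before the offending byte.
		}
	}

	return $need; // Well-formed sequence.
}

/**
 * Hypothetical stand-in for _wp_utf8_codepoint_span(): every well-formed
 * sequence, and every maximal subpart of an ill-formed one, counts as one code point.
 */
function demo_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
	$was_at            = $byte_offset;
	$end               = strlen( $text );
	$found_code_points = 0;

	while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
		$byte_offset += demo_utf8_sequence_length( $text, $byte_offset );
		++$found_code_points;
	}

	return $byte_offset - $was_at;
}
```

For example, "\xE2\x82A" is an incomplete three-byte sequence followed by "A". The first two bytes form one maximal subpart, so asking for one code point returns 2 bytes, and asking for five returns 3 bytes with $found_code_points set to 2.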
Original content
Given a starting offset within a string and a maximum number of code points, return how many bytes are occupied by the span of characters.
Description
Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) ).
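The Description names the mbstring expression that this function replaces. A minimal sketch of that expression follows, wrapped in a hypothetical helper demo_span_via_mbstring(), which is not a WordPress function. It assumes that the mbstring extension is loaded and that $text is well-formed UTF-8. How mb_substr() treats invalid bytes is deliberately not exercised here.

```php
<?php
/**
 * Hypothetical helper: the mbstring expression that _wp_utf8_codepoint_span()
 * is a fallback for. Assumes ext-mbstring and well-formed UTF-8 input.
 */
function demo_span_via_mbstring( string $text, int $at, int $max_code_points, ?int &$found_code_points = 0 ): int {
	// Take everything from the byte offset, then keep the first $max_code_points characters.
	$span              = mb_substr( substr( $text, $at ), 0, $max_code_points, 'UTF-8' );
	$found_code_points = mb_strlen( $span, 'UTF-8' );

	// The result is the byte length of that span.
	return strlen( $span );
}
```

Because this expression depends on mbstring, a pure-PHP routine such as _wp_utf8_codepoint_span() presumably lets WordPress compute the same span when that extension is unavailable. The source itself only describes the function as a fallback.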
Parameters
$text (string, required)
Count bytes of span in this text.
$byte_offset (int, required)
Start counting at this byte offset.
$max_code_points (int, required)
Stop counting after this many code points have been seen, or at the end of the string.
$found_code_points (?int, optional, passed by reference)
Set to the number of code points found in the span, which may be smaller than the maximum count if the string is not long enough.
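The parameters support a simple calling pattern. A caller can advance $byte_offset by the returned byte count to walk a string in pieces of at most $max_code_points code points. The caller can also read $found_code_points to learn how many code points a piece actually holds.

The sketch below shows that pattern with hypothetical helpers. The stand-in demo_span_valid_utf8() assumes well-formed UTF-8, reads each sequence length from the lead byte alone, and does not apply the maximal subpart rule. It is therefore not equivalent to the WordPress function on invalid input.

```php
<?php
/**
 * Hypothetical stand-in that assumes well-formed UTF-8: the lead byte alone
 * determines each sequence length. Invalid bytes are NOT handled per the
 * maximal subpart rule, unlike _wp_utf8_codepoint_span().
 */
function demo_span_valid_utf8( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
	$was_at            = $byte_offset;
	$end               = strlen( $text );
	$found_code_points = 0;

	while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
		$lead         = ord( $text[ $byte_offset ] );
		$byte_offset += ( $lead < 0x80 ? 1 : ( $lead < 0xE0 ? 2 : ( $lead < 0xF0 ? 3 : 4 ) ) );
		++$found_code_points;
	}

	// Clamp in case a truncated final sequence pushed the offset past the end.
	return min( $byte_offset, $end ) - $was_at;
}

/**
 * Hypothetical caller: splits $text into chunks of at most $size code points
 * by repeatedly advancing a byte offset by the returned span length.
 */
function demo_chunk_by_code_points( string $text, int $size ): array {
	if ( $size < 1 ) {
		throw new InvalidArgumentException( 'Chunk size must be at least 1.' );
	}

	$chunks = [];
	$offset = 0;
	while ( $offset < strlen( $text ) ) {
		$bytes    = demo_span_valid_utf8( $text, $offset, $size );
		$chunks[] = substr( $text, $offset, $bytes );
		$offset  += $bytes;
	}

	return $chunks;
}
```

In this pattern, only the last piece can hold fewer than $size code points. For that piece, $found_code_points reports the smaller count, which is the situation the parameter description warns about.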
Source
```php
function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
	$was_at            = $byte_offset;
	$invalid_length    = 0;
	$end               = strlen( $text );
	$found_code_points = 0;

	while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
		$needed      = $max_code_points - $found_code_points;
		$chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );

		$found_code_points += $chunk_count;

		// Invalid spans only convey one code point count regardless of how long they are.
		if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
			++$found_code_points;
			$byte_offset += $invalid_length;
		}
	}

	return $byte_offset - $was_at;
}
```
Changelog
| Version | Description |
|---|---|
| 6.9.0 | Introduced. |