Function documentation

_wp_utf8_codepoint_span()

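💡 云策文档 annotations

Overview

_wp_utf8_codepoint_span() is a WordPress core function that returns the number of bytes occupied by a span of characters that starts at a given byte offset and contains at most a given number of code points. It operates on UTF-8 text and handles invalid byte sequences according to the maximal subpart rule.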

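Key points

  • Returns the number of bytes occupied by at most $max_code_points code points, starting at byte offset $byte_offset.
  • $found_code_points is an optional by-reference parameter that receives the number of code points actually found, which may be smaller than $max_code_points.
  • The function is a fallback method for strlen(mb_substr(substr($text, $at), 0, $max_code_points)), where $at is the starting byte offset ($byte_offset).
  • An invalid byte sequence counts as a single code point, according to the maximal subpart rule.
  • The related function _wp_scan_utf8() can be used to find spans of valid and invalid UTF-8 bytes in a string.
  • Introduced in WordPress 6.9.0.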
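
The mb_substr() expression in the list above can be tried directly. The following is a minimal sketch, assuming the mbstring extension is loaded and the input is valid UTF-8; the helper name example_mb_codepoint_span() is hypothetical and not part of WordPress.

```php
<?php
// Hypothetical helper (not a WordPress function): byte length of up to
// $max_code_points code points, starting at byte offset $at.
// Assumes mbstring is available and $text is valid UTF-8.
function example_mb_codepoint_span( string $text, int $at, int $max_code_points ): int {
    return strlen( mb_substr( substr( $text, $at ), 0, $max_code_points, 'UTF-8' ) );
}

// 'a' (1 byte), 'é' (2 bytes), '€' (3 bytes), 'z' (1 byte).
$text = "a\u{00E9}\u{20AC}z";

echo example_mb_codepoint_span( $text, 0, 2 ), "\n";  // 'a' + 'é'  => 3
echo example_mb_codepoint_span( $text, 1, 2 ), "\n";  // 'é' + '€'  => 5
echo example_mb_codepoint_span( $text, 3, 10 ), "\n"; // '€' + 'z', then the string ends => 4
```

The byte offset must fall on a code point boundary for this expression to be meaningful, since substr() cuts on bytes, not characters.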

Code example

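function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
    $was_at            = $byte_offset;
    $invalid_length    = 0;
    $end               = strlen( $text );
    $found_code_points = 0;

    while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
        // Count code points in the next run of valid UTF-8, up to the number still needed.
        $needed      = $max_code_points - $found_code_points;
        $chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );

        $found_code_points += $chunk_count;

        // An invalid span counts as one code point regardless of its length.
        if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
            ++$found_code_points;
            $byte_offset += $invalid_length;
        }
    }

    return $byte_offset - $was_at;
}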

📄 Original content

Given a starting offset within a string and a maximum number of code points, return how many bytes are occupied by the span of characters.

Description

Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) ).
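
To make the maximal subpart rule concrete, here is a self-contained sketch of one reading of it, based on the well-formed byte ranges in the Unicode Standard. It is illustrative only and is not the WordPress implementation; example_utf8_unit() and example_count_code_points() are hypothetical names.

```php
<?php
// Illustrative only, NOT the WordPress implementation.
// Returns the byte length of the well-formed UTF-8 sequence starting at $i,
// or 0 if there is none; in that case $invalid_length receives the length
// of the maximal subpart (the longest prefix that could still be valid).
function example_utf8_unit( string $s, int $i, int &$invalid_length ): int {
    $invalid_length = 0;
    $end            = strlen( $s );
    $b0             = ord( $s[ $i ] );

    if ( $b0 <= 0x7F ) {
        return 1; // ASCII.
    }

    // Continuation bytes needed, plus the allowed range of the second byte.
    if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
        $need = 1;
        $lo   = 0x80;
        $hi   = 0xBF;
    } elseif ( $b0 >= 0xE0 && $b0 <= 0xEF ) {
        $need = 2;
        $lo   = 0xE0 === $b0 ? 0xA0 : 0x80; // Rejects overlong forms.
        $hi   = 0xED === $b0 ? 0x9F : 0xBF; // Rejects surrogates.
    } elseif ( $b0 >= 0xF0 && $b0 <= 0xF4 ) {
        $need = 3;
        $lo   = 0xF0 === $b0 ? 0x90 : 0x80; // Rejects overlong forms.
        $hi   = 0xF4 === $b0 ? 0x8F : 0xBF; // Rejects values above U+10FFFF.
    } else {
        // 0x80..0xC1 and 0xF5..0xFF can never start a sequence.
        $invalid_length = 1;
        return 0;
    }

    for ( $k = 1; $k <= $need; $k++ ) {
        $b   = $i + $k < $end ? ord( $s[ $i + $k ] ) : -1;
        $min = 1 === $k ? $lo : 0x80;
        $max = 1 === $k ? $hi : 0xBF;
        if ( $b < $min || $b > $max ) {
            // The $k bytes matched so far form the maximal subpart.
            $invalid_length = $k;
            return 0;
        }
    }

    return $need + 1;
}

// Counts code points, treating each maximal subpart as a single code point.
function example_count_code_points( string $s ): int {
    $count = 0;
    $i     = 0;
    $end   = strlen( $s );
    while ( $i < $end ) {
        $invalid = 0;
        $length  = example_utf8_unit( $s, $i, $invalid );
        $i      += $length > 0 ? $length : $invalid;
        ++$count;
    }
    return $count;
}

echo example_count_code_points( "a\u{20AC}" ), "\n";    // 2: 'a' and '€'
echo example_count_code_points( "\xE2\x82a" ), "\n";    // 2: truncated '€' (2 bytes) counts once, then 'a'
echo example_count_code_points( "\xF0\x80\x80" ), "\n"; // 3: each byte is its own invalid span
```

In the second call the two bytes E2 82 are a valid prefix of a three-byte sequence, so together they count as one code point. In the third call 80 is not an allowed second byte after F0, so no byte can be grouped with another.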

Parameters

$text string required
Count bytes of span in this text.
$byte_offset int required
Start counting at this byte offset.
$max_code_points int required
Stop counting after this many code points have been seen, or at the end of the string.
$found_code_points ?int optional
Will be set to number of found code points in span, as this might be smaller than the maximum count if the string is not long enough.

Return

int Number of bytes spanned by the code points.
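
A typical use of the returned byte count is to slice a string with substr() without cutting through a multibyte character. The sketch below assumes a hypothetical wrapper, example_codepoint_span(), that calls the core function when WordPress 6.9+ is loaded and otherwise falls back to mbstring (assumed available, valid UTF-8 input only) so that the snippet also runs outside WordPress.

```php
<?php
// Hypothetical wrapper, not part of WordPress. Outside WordPress it relies on
// mbstring and is only meaningful for valid UTF-8 input.
function example_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found = 0 ): int {
    if ( function_exists( '_wp_utf8_codepoint_span' ) ) {
        return _wp_utf8_codepoint_span( $text, $byte_offset, $max_code_points, $found );
    }

    $slice = mb_substr( substr( $text, $byte_offset ), 0, $max_code_points, 'UTF-8' );
    $found = mb_strlen( $slice, 'UTF-8' );
    return strlen( $slice );
}

$text  = "na\u{00EF}ve caf\u{00E9}"; // "naïve café": 12 bytes, 10 code points.
$found = 0;

// First three code points: 'n', 'a', 'ï' occupy 4 bytes.
$bytes = example_codepoint_span( $text, 0, 3, $found );
echo substr( $text, 0, $bytes ), "\n";
echo $bytes, ' bytes, ', $found, " code points\n";

// Ask for more code points than remain after byte offset 7 ("café"):
// $found reports the 4 that were actually there.
$bytes = example_codepoint_span( $text, 7, 50, $found );
echo $bytes, ' bytes, ', $found, " code points\n";
```

The second call shows why $found_code_points exists: the span stops at the end of the string, so the count found is smaller than the maximum requested.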

Source

function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
	$was_at            = $byte_offset;
	$invalid_length    = 0;
	$end               = strlen( $text );
	$found_code_points = 0;

	while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
		$needed      = $max_code_points - $found_code_points;
		$chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );

		$found_code_points += $chunk_count;

		// Invalid spans only convey one code point count regardless of how long they are.
		if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
			++$found_code_points;
			$byte_offset += $invalid_length;
		}
	}

	return $byte_offset - $was_at;
}

Changelog

Version Description
6.9.0 Introduced.