_wp_utf8_codepoint_span()

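💡 云策 (Yunce) documentation notes

Overview

_wp_utf8_codepoint_span() is a WordPress core function that returns the number of bytes occupied by a span of characters. The span starts at a given byte offset in a string and contains at most a given number of code points. The function works on UTF-8 text and, following the maximal subpart rule, counts each invalid byte sequence as a single code point.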

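Key points

The function returns the number of bytes occupied by at most $max_code_points code points, starting at byte offset $byte_offset.
The optional by-reference parameter $found_code_points receives the number of code points actually found, which may be smaller than $max_code_points.
The function is a fallback for strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) ).
Invalid byte sequences count as a single code point according to the maximal subpart rule.
The related function _wp_scan_utf8() finds spans of valid and invalid UTF-8 bytes in a string.
Introduced in WordPress 6.9.0.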
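The first key point can be illustrated for the simple case of well-formed input. The sketch below is hypothetical: span_valid_utf8_sketch() is not a WordPress function. It assumes $text is valid UTF-8, so the lead byte alone tells how many bytes each code point occupies, and it does not apply the maximal subpart rule. For valid input it should agree with strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) ).

```php
<?php
// Hypothetical helper, not part of WordPress. ASSUMES $text is valid UTF-8:
// no maximal-subpart handling is done for invalid bytes.
function span_valid_utf8_sketch( string $text, int $byte_offset, int $max_code_points, int &$found_code_points = 0 ): int {
	$at                = $byte_offset;
	$end               = strlen( $text );
	$found_code_points = 0;

	while ( $at < $end && $found_code_points < $max_code_points ) {
		// In valid UTF-8 the lead byte alone gives the sequence length.
		$lead = ord( $text[ $at ] );
		if ( $lead < 0x80 ) {
			$at += 1; // 0xxxxxxx: ASCII.
		} elseif ( $lead < 0xE0 ) {
			$at += 2; // 110xxxxx: two-byte sequence.
		} elseif ( $lead < 0xF0 ) {
			$at += 3; // 1110xxxx: three-byte sequence.
		} else {
			$at += 4; // 11110xxx: four-byte sequence.
		}
		++$found_code_points;
	}

	// min() only guards against running past the end on truncated input.
	return min( $at, $end ) - $byte_offset;
}

$count = 0;
$bytes = span_valid_utf8_sketch( "h\u{00E9}llo", 0, 3, $count );
echo $bytes, ' bytes, ', $count, " code points\n"; // prints "4 bytes, 3 code points"
```

Because "é" (U+00E9) takes two bytes, the first three code points ("h", "é", "l") span four bytes.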

Code example

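function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
    $was_at            = $byte_offset;
    $invalid_length    = 0;
    $end               = strlen( $text );
    $found_code_points = 0;

    while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
        // Scan valid UTF-8 for at most the remaining number of code points.
        // _wp_scan_utf8() advances $byte_offset and reports the length of any invalid span.
        $needed      = $max_code_points - $found_code_points;
        $chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );

        $found_code_points += $chunk_count;

        // An invalid span counts as one code point, however many bytes it covers.
        if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
            ++$found_code_points;
            $byte_offset += $invalid_length;
        }
    }

    return $byte_offset - $was_at;
}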

📄 Original content

Given a starting offset within a string and a maximum number of code points, return how many bytes are occupied by the span of characters.

Description

Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling strlen( mb_substr( substr( $text, $at ), 0, $max_code_points ) ).
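The maximal subpart rule can be made concrete with a small sketch. At each position, take the longest prefix of a well-formed UTF-8 sequence (per the Unicode Standard's table of well-formed byte sequences). If that prefix is incomplete, it is one invalid span and counts as one code point. Both functions below are hypothetical illustrations, not WordPress code. The description only says invalid spans follow the maximal subpart rule, so this is a reconstruction of that rule rather than the behavior of _wp_scan_utf8() itself.

```php
<?php
// Hypothetical helper (not WordPress code): returns [length, is_valid] for the
// UTF-8 sequence at $at. For invalid input the length is the maximal subpart.
function sequence_at_sketch( string $s, int $at ): array {
	$b0 = ord( $s[ $at ] );
	if ( $b0 < 0x80 ) {
		return array( 1, true );
	}
	// Sequence length and allowed range of the second byte (Unicode Table 3-7).
	if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
		[ $need, $lo, $hi ] = array( 2, 0x80, 0xBF );
	} elseif ( 0xE0 === $b0 ) {
		[ $need, $lo, $hi ] = array( 3, 0xA0, 0xBF );
	} elseif ( 0xED === $b0 ) {
		[ $need, $lo, $hi ] = array( 3, 0x80, 0x9F );
	} elseif ( $b0 >= 0xE1 && $b0 <= 0xEF ) {
		[ $need, $lo, $hi ] = array( 3, 0x80, 0xBF );
	} elseif ( 0xF0 === $b0 ) {
		[ $need, $lo, $hi ] = array( 4, 0x90, 0xBF );
	} elseif ( $b0 >= 0xF1 && $b0 <= 0xF3 ) {
		[ $need, $lo, $hi ] = array( 4, 0x80, 0xBF );
	} elseif ( 0xF4 === $b0 ) {
		[ $need, $lo, $hi ] = array( 4, 0x80, 0x8F );
	} else {
		return array( 1, false ); // C0, C1, F5..FF or a stray continuation byte.
	}
	for ( $len = 1; $len < $need; $len++ ) {
		$i = $at + $len;
		$b = $i < strlen( $s ) ? ord( $s[ $i ] ) : -1; // -1 marks end of string.
		if ( $b < ( 1 === $len ? $lo : 0x80 ) || $b > ( 1 === $len ? $hi : 0xBF ) ) {
			return array( $len, false ); // Maximal subpart ends before byte $i.
		}
	}
	return array( $need, true );
}

// Count code points, treating each invalid maximal subpart as ONE code point.
function count_code_points_sketch( string $s ): int {
	$count = 0;
	for ( $at = 0, $end = strlen( $s ); $at < $end; $count++ ) {
		[ $len ] = sequence_at_sketch( $s, $at );
		$at     += $len;
	}
	return $count;
}

// E2 82 starts a 3-byte sequence (as in "€" = E2 82 AC), but "A" cuts it
// short: the 2-byte maximal subpart counts once, then "A" counts once.
echo count_code_points_sketch( "\xE2\x82A" ), "\n"; // prints 2
// C0 can never start a sequence, so C0 and AF are two separate subparts.
echo count_code_points_sketch( "\xC0\xAF" ), "\n"; // prints 2
```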

Parameters

$text (string, required): Count bytes of span in this text.
$byte_offset (int, required): Start counting at this byte offset.
$max_code_points (int, required): Stop counting after this many code points have been seen, or at the end of the string.
$found_code_points (?int, optional): Will be set to the number of code points found in the span, which might be smaller than the maximum count if the string is not long enough. Default 0.

Return

int Number of bytes spanned by the code points.

Source

function _wp_utf8_codepoint_span( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
	$was_at            = $byte_offset;
	$invalid_length    = 0;
	$end               = strlen( $text );
	$found_code_points = 0;

	while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
		$needed      = $max_code_points - $found_code_points;
		$chunk_count = _wp_scan_utf8( $text, $byte_offset, $invalid_length, null, $needed );

		$found_code_points += $chunk_count;

		// Invalid spans only convey one code point count regardless of how long they are.
		if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
			++$found_code_points;
			$byte_offset += $invalid_length;
		}
	}

	return $byte_offset - $was_at;
}
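To see how the loop above cooperates with _wp_scan_utf8(), the sketch below runs the same loop against a hypothetical stand-in scanner. scan_utf8_sketch() and sequence_at_sketch() are assumptions, not WordPress code. The scanner's contract is inferred from how the source calls _wp_scan_utf8(): advance $at by reference past valid code points, stop at the first invalid span and report its maximal-subpart length in $invalid_length, and return the number of valid code points seen. The fourth argument, which the source passes as null, is omitted here.

```php
<?php
// Hypothetical helper: [length, is_valid] of the UTF-8 sequence at $at
// (maximal subpart length when invalid), following Unicode Table 3-7.
function sequence_at_sketch( string $s, int $at ): array {
	$b0 = ord( $s[ $at ] );
	if ( $b0 < 0x80 ) {
		return array( 1, true );
	}
	if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
		[ $need, $lo, $hi ] = array( 2, 0x80, 0xBF );
	} elseif ( 0xE0 === $b0 ) {
		[ $need, $lo, $hi ] = array( 3, 0xA0, 0xBF );
	} elseif ( 0xED === $b0 ) {
		[ $need, $lo, $hi ] = array( 3, 0x80, 0x9F );
	} elseif ( $b0 >= 0xE1 && $b0 <= 0xEF ) {
		[ $need, $lo, $hi ] = array( 3, 0x80, 0xBF );
	} elseif ( 0xF0 === $b0 ) {
		[ $need, $lo, $hi ] = array( 4, 0x90, 0xBF );
	} elseif ( $b0 >= 0xF1 && $b0 <= 0xF3 ) {
		[ $need, $lo, $hi ] = array( 4, 0x80, 0xBF );
	} elseif ( 0xF4 === $b0 ) {
		[ $need, $lo, $hi ] = array( 4, 0x80, 0x8F );
	} else {
		return array( 1, false ); // Byte can never start a sequence.
	}
	for ( $len = 1; $len < $need; $len++ ) {
		$i = $at + $len;
		$b = $i < strlen( $s ) ? ord( $s[ $i ] ) : -1; // -1 marks end of string.
		if ( $b < ( 1 === $len ? $lo : 0x80 ) || $b > ( 1 === $len ? $hi : 0xBF ) ) {
			return array( $len, false );
		}
	}
	return array( $need, true );
}

// Hypothetical stand-in for _wp_scan_utf8() with the ASSUMED contract above.
function scan_utf8_sketch( string $bytes, int &$at, int &$invalid_length, ?int $max_code_points = null ): int {
	$end            = strlen( $bytes );
	$count          = 0;
	$invalid_length = 0;
	while ( $at < $end && ( null === $max_code_points || $count < $max_code_points ) ) {
		[ $len, $valid ] = sequence_at_sketch( $bytes, $at );
		if ( ! $valid ) {
			$invalid_length = $len; // Leave $at at the start of the invalid span.
			break;
		}
		$at += $len;
		++$count;
	}
	return $count;
}

// Same loop as the Source above, calling the sketch scanner instead.
function codepoint_span_sketch( string $text, int $byte_offset, int $max_code_points, ?int &$found_code_points = 0 ): int {
	$was_at            = $byte_offset;
	$invalid_length    = 0;
	$end               = strlen( $text );
	$found_code_points = 0;
	while ( $byte_offset < $end && $found_code_points < $max_code_points ) {
		$needed             = $max_code_points - $found_code_points;
		$found_code_points += scan_utf8_sketch( $text, $byte_offset, $invalid_length, $needed );
		if ( 0 !== $invalid_length && $found_code_points < $max_code_points ) {
			++$found_code_points; // Whole invalid span counts as one code point.
			$byte_offset += $invalid_length;
		}
	}
	return $byte_offset - $was_at;
}

// "a", then E2 82 (a truncated sequence), then "€" (E2 82 AC).
$found = 0;
$bytes = codepoint_span_sketch( "a\xE2\x82\xE2\x82\xAC", 0, 3, $found );
echo $bytes, ' bytes, ', $found, " code points\n"; // prints "6 bytes, 3 code points"
```

The invalid two-byte span contributes two bytes but only one code point. This is why the loop adds $invalid_length to the offset and increments the count by exactly one.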


Uses	Description
_wp_scan_utf8() (wp-includes/compat-utf8.php)	Finds spans of valid and invalid UTF-8 bytes in a given string.

Changelog

Version	Description
6.9.0	Introduced.

云策 (Yunce) WordPress Developer Community