Function documentation

_wp_utf8_codepoint_count()

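💡 Yunce documentation notes

Overview

_wp_utf8_codepoint_count() counts the code points in a given UTF-8 string. It serves as a fallback for mb_strlen() and also handles invalid byte sequences.

Key points

  • The function returns the number of code points in a UTF-8 string; each invalid byte sequence counts as one code point according to the maximal subpart rule.
  • When $byte_offset or $max_byte_length is negative, the function always returns zero.
  • Parameters are $text (required), $byte_offset (optional, default 0), and $max_byte_length (optional, default PHP_INT_MAX).
  • The function was introduced in WordPress 6.9.0.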

Code examples

4 === _wp_utf8_codepoint_count( 'text' );
13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );

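Notes

  • This function is a fallback for mb_strlen( $text, 'UTF-8' ), intended for environments where the mbstring extension is unavailable.
  • A related function is _wp_scan_utf8(), which finds valid and invalid UTF-8 byte sequences in a string.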
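
The fallback relationship described in these notes can be sketched as a small wrapper. This is an illustration only: example_count_codepoints() is a hypothetical name, not a WordPress function, and the final branch is an assumption added so the sketch also runs outside WordPress.

```php
<?php
/**
 * Hypothetical wrapper (not part of WordPress): prefers mb_strlen() and
 * falls back to _wp_utf8_codepoint_count() when mbstring is missing.
 */
function example_count_codepoints( string $text ): int {
	// Preferred path: the mbstring extension is loaded.
	if ( function_exists( 'mb_strlen' ) ) {
		return mb_strlen( $text, 'UTF-8' );
	}

	// Fallback path: WordPress 6.9.0 or later without mbstring.
	if ( function_exists( '_wp_utf8_codepoint_count' ) ) {
		return _wp_utf8_codepoint_count( $text );
	}

	// Assumption for running outside WordPress: count every byte that is
	// not a continuation byte (0x80-0xBF). Only accurate for well-formed UTF-8.
	return (int) preg_match_all( '/[^\x80-\xBF]/', $text );
}
```

For well-formed input such as 'text', every branch should agree on the count. The branches may differ on invalid byte sequences, so the sketch makes no claim about those.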

📄 Original content

Returns how many code points are found in the given UTF-8 string.

Description

Invalid spans of bytes count as a single code point according to the maximal subpart rule. This function is a fallback method for calling mb_strlen( $text, 'UTF-8' ).

When negative values are provided for the byte offsets or length, this will always report zero code points.
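
The source listing on this page checks only $byte_offset explicitly, so it may help to see why a negative length also yields zero. The sketch below mirrors the window arithmetic from that listing; sketch_scan_budget() is a hypothetical helper, not a WordPress function, and it reports how many bytes the counting loop would be allowed to scan.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns the number of bytes
 * the counting loop may scan. Zero means the loop body never runs, so the
 * reported code point count is zero.
 */
function sketch_scan_budget( string $text, int $byte_offset = 0, int $max_byte_length = PHP_INT_MAX ): int {
	// A negative offset is rejected up front.
	if ( $byte_offset < 0 ) {
		return 0;
	}

	// The length is clamped to the bytes remaining after the offset. A
	// negative length, or an offset past the end, leaves nothing to scan.
	$budget = min( strlen( $text ) - $byte_offset, $max_byte_length );

	return max( 0, $budget );
}
```

With 'text' as input, the budget is 4 bytes by default, 2 bytes for an offset of 1 and a length of 2, and 0 bytes for a negative offset, a negative length, or an offset beyond the end of the string.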

Example:

4  === _wp_utf8_codepoint_count( 'text' );

// Groups are 'test', "\x90" as '�', 'wp', "\xE2\x80" as '�', "\xC0" as '�', and 'test'.
13 === _wp_utf8_codepoint_count( "test\x90wp\xE2\x80\xC0test" );
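
The grouping in the example above can be reproduced with a standalone sketch of the maximal subpart rule. This is not WordPress's implementation, which delegates to _wp_scan_utf8(). The function name sketch_utf8_codepoint_count() is hypothetical, and the byte ranges are taken from the Unicode Standard's table of well-formed UTF-8 byte sequences.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): counts code points in $text,
 * treating each maximal invalid subpart as a single code point.
 */
function sketch_utf8_codepoint_count( string $text ): int {
	$count = 0;
	$at    = 0;
	$end   = strlen( $text );

	while ( $at < $end ) {
		$lead = ord( $text[ $at ] );
		++$at;
		++$count; // Each valid character or maximal invalid subpart counts once.

		// Work out how many continuation bytes should follow, and the
		// allowed range for the first of them.
		if ( $lead >= 0xC2 && $lead <= 0xDF ) {
			$need = 1;
			$low  = 0x80;
			$high = 0xBF;
		} elseif ( $lead >= 0xE0 && $lead <= 0xEF ) {
			$need = 2;
			$low  = 0xE0 === $lead ? 0xA0 : 0x80; // Rejects overlong forms.
			$high = 0xED === $lead ? 0x9F : 0xBF; // Rejects surrogates.
		} elseif ( $lead >= 0xF0 && $lead <= 0xF4 ) {
			$need = 3;
			$low  = 0xF0 === $lead ? 0x90 : 0x80; // Rejects overlong forms.
			$high = 0xF4 === $lead ? 0x8F : 0xBF; // Rejects values above U+10FFFF.
		} else {
			continue; // ASCII byte or invalid lead byte: one byte, one code point.
		}

		// Consume continuation bytes while they stay in range. Stopping early
		// leaves the offending byte to start the next group.
		for ( $i = 0; $i < $need && $at < $end; $i++ ) {
			$byte = ord( $text[ $at ] );
			if ( $byte < $low || $byte > $high ) {
				break;
			}
			++$at;
			$low  = 0x80;
			$high = 0xBF;
		}
	}

	return $count;
}
```

In this sketch, "\xE2\x80" forms one group because both bytes are a valid prefix of a three-byte sequence, while "\xC0" can never begin a valid sequence and so stands alone.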

Parameters

$text string required
Count code points in this string.
$byte_offset ?int optional
Start counting after this many bytes in $text. Must not be negative.
Default: 0
$max_byte_length ?int optional
Stop counting after having scanned past this many bytes.
Default is to scan until the end of the string. Must be positive.

Default: PHP_INT_MAX

Return

int How many code points were found.

Source

function _wp_utf8_codepoint_count( string $text, ?int $byte_offset = 0, ?int $max_byte_length = PHP_INT_MAX ): int {
	if ( $byte_offset < 0 ) {
		return 0;
	}

	$count           = 0;
	$at              = $byte_offset;
	$end             = strlen( $text );
	$invalid_length  = 0;
	$max_byte_length = min( $end - $at, $max_byte_length );

	while ( $at < $end && ( $at - $byte_offset ) < $max_byte_length ) {
		$count += _wp_scan_utf8( $text, $at, $invalid_length, $max_byte_length - ( $at - $byte_offset ) );
		$count += $invalid_length > 0 ? 1 : 0;
		$at    += $invalid_length;
	}

	return $count;
}

Changelog

Version Description
6.9.0 Introduced.