函数文档

valid_unicode()

💡 云策文档标注

概述

valid_unicode() 函数用于判断一个 Unicode 码点是否有效,基于 XML 规范定义的有效字符范围。

关键要点

  • 函数接受一个整数参数 $i,表示 Unicode 码点
  • 返回布尔值,指示码点是否有效
  • 有效范围包括:制表符 (U+0009)、换行符 (U+000A)、回车符 (U+000D),以及 Unicode 和 ISO/IEC 10646 中的合法字符,排除代理块、FFFE 和 FFFF
  • 参考 XML 规范定义,具体范围见文档中的十六进制区间

代码示例

function valid_unicode( $i ) {
    $i = (int) $i;

    return (
        0x9 === $i || // U+0009 HORIZONTAL TABULATION (HT)
        0xA === $i || // U+000A LINE FEED (LF)
        0xD === $i || // U+000D CARRIAGE RETURN (CR)
        ( 0x20 <= $i && $i <= 0xD7FF ) ||
        ( 0xE000 <= $i && $i <= 0xFFFD ) ||
        ( 0x10000 <= $i && $i <= 0x10FFFF )
    );
}

📄 原文内容

Determines if a Unicode codepoint is valid.

Description

The definition of a valid Unicode codepoint is taken from the XML definition:

Characters

… Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.
… Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

See also

Parameters

$iintrequired
Unicode codepoint.

Return

bool Whether or not the codepoint is a valid Unicode codepoint.

Source

function valid_unicode( $i ) {
	$i = (int) $i;

	return (
		0x9 === $i || // U+0009 HORIZONTAL TABULATION (HT)
		0xA === $i || // U+000A LINE FEED (LF)
		0xD === $i || // U+000D CARRIAGE RETURN (CR)
		/*
		 * The valid Unicode characters according to the XML specification:
		 *
		 * > any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
		 */
		( 0x20 <= $i && $i <= 0xD7FF ) ||
		( 0xE000 <= $i && $i <= 0xFFFD ) ||
		( 0x10000 <= $i && $i <= 0x10FFFF )
	);
}

Changelog

Version Description
2.7.0 Introduced.