valid_unicode()
云策文档标注
概述
valid_unicode() 函数用于判断一个 Unicode 码点是否有效,基于 XML 规范定义的有效字符范围。
关键要点
- 函数接受一个整数参数 $i,表示 Unicode 码点
- 返回布尔值,指示码点是否有效
- 有效范围包括:制表符 (U+0009)、换行符 (U+000A)、回车符 (U+000D),以及 Unicode 和 ISO/IEC 10646 中的合法字符,排除代理块、FFFE 和 FFFF
- 参考 XML 规范定义,具体范围见文档中的十六进制区间
代码示例
function valid_unicode( $i ) {
$i = (int) $i;
return (
0x9 === $i || // U+0009 HORIZONTAL TABULATION (HT)
0xA === $i || // U+000A LINE FEED (LF)
0xD === $i || // U+000D CARRIAGE RETURN (CR)
( 0x20 <= $i && $i <= 0xD7FF ) ||
( 0xE000 <= $i && $i <= 0xFFFD ) ||
( 0x10000 <= $i && $i <= 0x10FFFF )
);
}
原文内容
Determines if a Unicode codepoint is valid.
Description
The definition of a valid Unicode codepoint is taken from the XML definition:
Characters
… Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.
… Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
See also
Parameters
$iintrequired-
Unicode codepoint.
Source
function valid_unicode( $i ) {
$i = (int) $i;
return (
0x9 === $i || // U+0009 HORIZONTAL TABULATION (HT)
0xA === $i || // U+000A LINE FEED (LF)
0xD === $i || // U+000D CARRIAGE RETURN (CR)
/*
* The valid Unicode characters according to the XML specification:
*
* > any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.
*/
( 0x20 <= $i && $i <= 0xD7FF ) ||
( 0xE000 <= $i && $i <= 0xFFFD ) ||
( 0x10000 <= $i && $i <= 0x10FFFF )
);
}
Changelog
| Version | Description |
|---|---|
| 2.7.0 | Introduced. |