wp_kses_normalize_entities()
Yunce Docs Annotation
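Overview
wp_kses_normalize_entities() normalizes and repairs HTML entities so that they are correctly encoded. It supports both HTML and XML contexts, and it processes character references in a fixed order to avoid double-encoding problems.
Key Points
- The function first replaces every & in the content with &amp;, which disarms all entities. It then restores valid character references in order: numeric references first, named references second.
- The $context parameter can be 'html' (the default) or 'xml'. In the XML context, HTML named entities are converted to their code points.
- The return value is the content with normalized entities, and the output keeps the same meaning as the input.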
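The disarm-then-restore idea in the key points above can be sketched in a few lines of standalone PHP. This is a minimal sketch, not WordPress's implementation. `sketch_disarm_and_restore_named()` and the five-name allowlist `SKETCH_ALLOWED_NAMES` are hypothetical stand-ins for the real callback and for WordPress's much longer list of allowed entity names. Only the HTML-context handling of named references is shown.

```php
<?php
// Hypothetical, shortened stand-in for WordPress's list of allowed entity names.
const SKETCH_ALLOWED_NAMES = ['amp', 'lt', 'gt', 'quot', 'hellip'];

function sketch_disarm_and_restore_named(string $content): string
{
    // Step 1: disarm every entity by turning each & into &amp;.
    $content = str_replace('&', '&amp;', $content);

    // Step 2: restore named references that are on the allowlist.
    // Anything else stays disarmed (for example &amp;bogus;).
    return preg_replace_callback(
        '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
        static function (array $m): string {
            return in_array($m[1], SKETCH_ALLOWED_NAMES, true)
                ? "&{$m[1]};"
                : "&amp;{$m[1]};";
        },
        $content
    );
}

echo sketch_disarm_and_restore_named('AT&T &hellip; &bogus;'), "\n";
```

For the input `AT&T &hellip; &bogus;` this prints `AT&amp;T &hellip; &amp;bogus;`. The bare ampersand is escaped, the known name is restored, and the unknown name stays disarmed.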
Code Example
function wp_kses_normalize_entities( $content, $context = 'html' ) {
	// Disarm all entities by converting & to &amp;
	$content = str_replace( '&', '&amp;', $content );
	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
	if ( 'xml' === $context ) {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
	} else {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
	}
	return $content;
}
Notes
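The processing order is critical. Numeric character references (such as &amp;#09;) must be decoded before named character references (such as &amp;amp;). Otherwise, double encoding produces output whose meaning differs from the input. For example, the inputs &#x2E; and &amp;#x2E; must stay distinct, and each must be normalized correctly.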
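A small standalone comparison makes the ordering argument concrete. This sketch uses simplified, hypothetical helpers (`sketch_restore_hex()`, `sketch_restore_named()`, `sketch_normalize()`). The hex helper does no validity check, and the named helper only knows `amp`. It therefore illustrates the order dependence and does not reproduce wp_kses_normalize_entities3() or wp_kses_named_entities().

```php
<?php
// Hypothetical simplified helper: &amp;#xNN; (a disarmed hex reference) -> &#xNN;
function sketch_restore_hex(string $s): string
{
    return preg_replace('/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', '&#x$1;', $s);
}

// Hypothetical simplified helper: &amp;amp; (a disarmed &amp;) -> &amp;
function sketch_restore_named(string $s): string
{
    return preg_replace('/&amp;(amp);/', '&$1;', $s);
}

function sketch_normalize(string $input, bool $numericFirst): string
{
    $s = str_replace('&', '&amp;', $input); // disarm everything first
    return $numericFirst
        ? sketch_restore_named(sketch_restore_hex($s))
        : sketch_restore_hex(sketch_restore_named($s));
}

foreach (['&#x2E;', '&amp;#x2E;'] as $input) {
    printf(
        "%-12s named-first: %-12s numeric-first: %s\n",
        $input,
        sketch_normalize($input, false),
        sketch_normalize($input, true)
    );
}
```

With the numeric-first order, `&#x2E;` stays `&#x2E;` and `&amp;#x2E;` stays `&amp;#x2E;`. With the named-first order, both inputs collapse to `&#x2E;`, which is the semantic change the note warns about.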
Original Content
Converts and fixes HTML entities.
Description
This function normalizes HTML entities. It will convert AT&T to the correct AT&amp;T, &#00058; to &#058;, &#XYZZY; to &amp;#XYZZY; and so on.
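The `&#00058;` to `&#058;` example suggests how decimal references are normalized. The sketch below assumes, based only on that example, that leading zeros are trimmed and the number is left-padded to three digits. `sketch_restore_decimal()` is hypothetical, and its validity check (non-zero and at most U+10FFFF) is a simplification, not WordPress's own check.

```php
<?php
// Hypothetical sketch of the decimal-reference step. It operates on
// already-disarmed content, so it looks for &amp;#NNN; rather than &#NNN;.
function sketch_restore_decimal(string $s): string
{
    return preg_replace_callback(
        '/&amp;#(0*[0-9]{1,7});/',
        static function (array $m): string {
            $digits = ltrim($m[1], '0');
            if ($digits === '' || (int) $digits > 0x10FFFF) {
                // Zero or out-of-range references stay disarmed.
                return "&amp;#{$m[1]};";
            }
            // Trim leading zeros, then left-pad to three digits.
            return '&#' . str_pad($digits, 3, '0', STR_PAD_LEFT) . ';';
        },
        $s
    );
}

$disarmed = str_replace('&', '&amp;', '&#00058; &#XYZZY;');
echo sketch_restore_decimal($disarmed), "\n";
```

Here the disarmed input `&amp;#00058; &amp;#XYZZY;` becomes `&#058; &amp;#XYZZY;`. The valid decimal reference is restored in padded form. `&#XYZZY;` contains no digits, so it does not match the pattern and stays disarmed.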
When $context is set to 'xml', HTML entities are converted to their code points. For example, AT&T&hellip;&#XYZZY; is converted to AT&amp;T…&amp;#XYZZY;.
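The effect of `$context` on named references can be sketched the same way. In the hypothetical `sketch_named()` below, both allowlists (`SKETCH_XML_NAMES`, `SKETCH_HTML_NAMES`) are shortened stand-ins for WordPress's real lists. PHP's `html_entity_decode()` with `ENT_HTML5` turns a name into its character. Numeric references are not restored in this sketch, for brevity.

```php
<?php
// Hypothetical, shortened allowlists; WordPress's real lists are longer.
const SKETCH_XML_NAMES  = ['amp', 'lt', 'gt', 'quot', 'apos'];
const SKETCH_HTML_NAMES = ['amp', 'lt', 'gt', 'quot', 'apos', 'hellip', 'nbsp'];

function sketch_named(string $content, string $context = 'html'): string
{
    $content = str_replace('&', '&amp;', $content); // disarm first
    return preg_replace_callback(
        '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
        static function (array $m) use ($context): string {
            $name = $m[1];
            if (!in_array($name, SKETCH_HTML_NAMES, true)) {
                return "&amp;{$name};"; // unknown name: stays disarmed
            }
            if ('xml' === $context && !in_array($name, SKETCH_XML_NAMES, true)) {
                // Not predefined in XML: emit the literal character instead.
                return html_entity_decode("&{$name};", ENT_QUOTES | ENT_HTML5, 'UTF-8');
            }
            return "&{$name};";
        },
        $content
    );
}

echo sketch_named('AT&T&hellip;&#XYZZY;', 'xml'), "\n";
echo sketch_named('AT&T&hellip;&#XYZZY;', 'html'), "\n";
```

In `'xml'` mode, the example from the description comes out as `AT&amp;T…&amp;#XYZZY;`. In `'html'` mode, `&hellip;` is kept as a named reference, giving `AT&amp;T&hellip;&amp;#XYZZY;`.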
Parameters
$content string required
Content to normalize entities.
$context string optional
Context for normalization. Can be either 'html' or 'xml'. Default 'html'.
Source
function wp_kses_normalize_entities( $content, $context = 'html' ) {
	// Disarm all entities by converting & to &amp;
	$content = str_replace( '&', '&amp;', $content );

	/*
	 * Decode any character references that are now double-encoded.
	 *
	 * It's important that the following normalizations happen in the correct order.
	 *
	 * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
	 * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
	 *
	 * First, numeric (decimal and hexadecimal) character references must be handled so that
	 * `&amp;#09;` becomes `&#09;`. If the named character references were handled first, there
	 * would be no way to know whether the double-encoded character reference had been produced
	 * in this function or was the original input.
	 *
	 * Consider the two examples, first with named entity decoding followed by numeric
	 * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
	 * string processing from left to right:
	 *
	 * | Input        | &-encoded        | Named ref double-decoded | Numeric ref double-decoded |
	 * | ------------ | ---------------- | ------------------------ | -------------------------- |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`             | `&#x2E;`                   |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`             | `&#x2E;`                   |
	 *
	 * Notice in the example above that different inputs result in the same result. The second case
	 * was not normalized and produced HTML that is semantically different from the input.
	 *
	 * | Input        | &-encoded        | Numeric ref double-decoded | Named ref double-decoded |
	 * | ------------ | ---------------- | -------------------------- | ------------------------ |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                   | `&#x2E;`                 |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`           | `&amp;#x2E;`             |
	 *
	 * Here, each input is normalized to an appropriate output.
	 */
	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
	if ( 'xml' === $context ) {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
	} else {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
	}
	return $content;
}