wp_kses_normalize_entities()

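💡 云策 documentation notes

Overview

wp_kses_normalize_entities() normalizes and repairs HTML entities so that they are correctly encoded. It supports both HTML and XML contexts, and it processes character references in a specific order to avoid double-encoding problems.

Key points

The function first replaces every & in the content with &amp; to disarm all entities, then restores valid numeric character references and named character references, in that order.
The $context parameter can be 'html' (the default) or 'xml'; in the XML context, HTML named entities are converted to their code points.
The return value is the content string with normalized entities, semantically equivalent to the input.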
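
The disarm-then-restore steps listed above can be sketched in plain PHP without WordPress. This is a minimal illustration, not the WordPress implementation: the function name `sketch_normalize_entities()` and the five-name allowlist are hypothetical, and the sketch skips the code point validation and leading-zero handling that the real callbacks perform.

```php
<?php
/**
 * Simplified sketch of the disarm-then-restore idea.
 * Hypothetical helper; WordPress uses callbacks and a much larger entity list.
 */
function sketch_normalize_entities( string $content ): string {
	// Hypothetical allowlist of named entities that may be restored.
	$allowed_names = array( 'amp', 'lt', 'gt', 'quot', 'hellip' );

	// Step 1: disarm every ampersand.
	$content = str_replace( '&', '&amp;', $content );

	// Step 2: restore decimal numeric references such as &#58;.
	$content = preg_replace( '/&amp;#([0-9]{1,7});/', '&#$1;', $content );

	// Step 3: restore hexadecimal numeric references such as &#x2E;.
	$content = preg_replace( '/&amp;#[Xx]([0-9A-Fa-f]{1,6});/', '&#x$1;', $content );

	// Step 4: restore named references, but only those on the allowlist.
	return preg_replace_callback(
		'/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
		static function ( array $m ) use ( $allowed_names ): string {
			return in_array( $m[1], $allowed_names, true ) ? "&{$m[1]};" : "&amp;{$m[1]};";
		},
		$content
	);
}
```

With this sketch, `AT&T` becomes `AT&amp;T`, `&#58;` is left as a valid reference, and `&#XYZZY;` stays disarmed as `&amp;#XYZZY;` because `XYZZY` is not a hexadecimal number.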

Code example

function wp_kses_normalize_entities( $content, $context = 'html' ) {
    // Disarm all entities by converting & to &amp;
    $content = str_replace( '&', '&amp;', $content );

    // Restore numeric references first (decimal, then hexadecimal), then named references.
    $content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
    $content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
    if ( 'xml' === $context ) {
        $content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
    } else {
        $content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
    }

    return $content;
}

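Notes

Processing order is critical: numeric character references (such as `&#x2E;`) are restored first, then named character references (such as `&amp;`), to prevent semantic errors caused by double encoding. For example, the inputs `&#x2E;` and `&amp;#x2E;` must be told apart and normalized to different outputs.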
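
The effect of the ordering can be checked with a small standalone sketch. The helper names below are hypothetical and the named step handles only `amp`, which is enough to reproduce the two-input comparison; it is not the WordPress code.

```php
<?php
// Hypothetical helpers for illustration; they are not WordPress functions.

// Restores double-encoded numeric references: &amp;#x2E; becomes &#x2E;.
function sketch_restore_numeric( string $s ): string {
	$s = preg_replace( '/&amp;#([0-9]{1,7});/', '&#$1;', $s );
	return preg_replace( '/&amp;#[Xx]([0-9A-Fa-f]{1,6});/', '&#x$1;', $s );
}

// Restores the double-encoded named reference &amp;amp; to &amp; (only "amp" is handled).
function sketch_restore_named( string $s ): string {
	return str_replace( '&amp;amp;', '&amp;', $s );
}

// Order used by the function: numeric references first, named references second.
function sketch_numeric_first( string $input ): string {
	$disarmed = str_replace( '&', '&amp;', $input );
	return sketch_restore_named( sketch_restore_numeric( $disarmed ) );
}

// Reversed order, shown only to demonstrate the problem.
function sketch_named_first( string $input ): string {
	$disarmed = str_replace( '&', '&amp;', $input );
	return sketch_restore_numeric( sketch_restore_named( $disarmed ) );
}
```

With the numeric step first, `&#x2E;` and `&amp;#x2E;` each come back unchanged. With the named step first, both collapse to `&#x2E;`, so the literal text `&#x2E;` that the second input represented turns into a full stop.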

📄 Original content

Converts and fixes HTML entities.

Description

This function normalizes HTML entities. It will convert `AT&T` to the correct `AT&amp;T`, `&#00058;` to `&#058;`, `&#XYZZY;` to `&amp;#XYZZY;` and so on.

When `$context` is set to 'xml', HTML entities are converted to their code points. For example, `AT&T&hellip;&#XYZZY;` is converted to `AT&amp;T…&amp;#XYZZY;`.
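
The difference in the 'xml' context can be approximated with PHP's built-in `html_entity_decode()`. The helper below is a hypothetical sketch that takes an entity name without the `&` and `;`. It assumes that the five entities predefined by XML stay as named references, that other known HTML names are replaced by the literal UTF-8 character, and that unknown names stay disarmed; the actual WordPress callback may differ in detail.

```php
<?php
// Hypothetical helper, loosely modeled on the 'xml' context described above.
function sketch_xml_named_entity( string $name ): string {
	// The five entities predefined by XML can stay as named references.
	if ( in_array( $name, array( 'amp', 'lt', 'gt', 'apos', 'quot' ), true ) ) {
		return "&$name;";
	}

	// Other HTML named entities are replaced by the character they stand for.
	$reference = "&$name;";
	$decoded   = html_entity_decode( $reference, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

	// html_entity_decode() leaves unknown names untouched; keep those disarmed.
	return $decoded === $reference ? "&amp;$name;" : $decoded;
}
```

Here `hellip` yields the character U+2026, `amp` stays `&amp;`, and an unknown name such as `bogus` is returned as `&amp;bogus;`.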

Parameters

$content (string, required): Content to normalize entities.
$context (string, optional): Context for normalization. Can be either 'html' or 'xml'.
Default 'html'.

Return

string Content with normalized entities.

Source

function wp_kses_normalize_entities( $content, $context = 'html' ) {
	// Disarm all entities by converting & to &amp;
	$content = str_replace( '&', '&amp;', $content );

	/*
	 * Decode any character references that are now double-encoded.
	 *
	 * It's important that the following normalizations happen in the correct order.
	 *
	 * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
	 * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
	 *
	 * First, numeric (decimal and hexadecimal) character references must be handled so that
	 * `&amp;#09;` becomes `&#9;`. If the named character references were handled first, there
	 * would be no way to know whether the double-encoded character reference had been produced
	 * in this function or was the original input.
	 *
	 * Consider the two examples, first with named entity decoding followed by numeric
	 * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
	 * string processing from left to right:
	 *
	 * | Input        | &-encoded        | Named ref double-decoded  | Numeric ref double-decoded |
	 * | ------------ | ---------------- | ------------------------- | -------------------------- |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`              | `&#x2E;`                   |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`              | `&#x2E;`                   |
	 *
	 * Notice in the example above that different inputs result in the same result. The second case
	 * was not normalized and produced HTML that is semantically different from the input.
	 *
	 * | Input        | &-encoded        |  Numeric ref double-decoded | Named ref double-decoded |
	 * | ------------ | ---------------- | --------------------------- | ------------------------ |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                    | `&#x2E;`                 |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`            | `&amp;#x2E;`             |
	 *
	 * Here, each input is normalized to an appropriate output.
	 */
	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
	if ( 'xml' === $context ) {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
	} else {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
	}

	return $content;
}


Used by	Description
esc_url() (`wp-includes/formatting.php`)	Checks and cleans a URL.
_wp_specialchars() (`wp-includes/formatting.php`)	Converts a number of special characters into their HTML entities.
wp_kses() (`wp-includes/kses.php`)	Filters text content and strips out disallowed HTML.

Changelog

Version	Description
5.5.0	Added `$context` parameter.
1.0.0	Introduced.
