Function Documentation

esc_xml()

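💡 云策文档 Annotation

Overview

esc_xml(), introduced in WordPress 5.5.0, escapes text for output in XML blocks. It strips invalid UTF-8, separates CDATA sections from the surrounding text, and escapes only the non-CDATA text, so the result is safe to place in an XML document.

Key Points

  • esc_xml() takes one string parameter, $text, and returns the escaped string.
  • Internally it calls wp_check_invalid_utf8() to check for and strip invalid UTF-8.
  • A regular expression separates CDATA sections from non-CDATA text; the non-CDATA text is escaped with _wp_specialchars() using ENT_XML1, and CDATA sections are returned unchanged.
  • The 'esc_xml' filter lets developers modify the escaped string before it is returned.
  • It is used mainly by the WordPress sitemaps feature, for example in the WP_Sitemaps_Renderer and WP_Sitemaps_Stylesheet classes.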
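
To make the CDATA handling described above concrete, here is a minimal standalone sketch that runs without WordPress. It is not the core implementation: the function name demo_escape_xml_keep_cdata() is hypothetical, it splits the input with preg_split() instead of the named-group regex used in core, and it assumes that htmlspecialchars() with ENT_QUOTES | ENT_XML1 is a close enough stand-in for _wp_specialchars(). It performs no UTF-8 validation and applies no filter.

```php
<?php
/**
 * Illustrative stand-in for esc_xml(); NOT the WordPress implementation.
 * Assumption: htmlspecialchars() approximates _wp_specialchars() here.
 */
function demo_escape_xml_keep_cdata( string $text ): string {
	// Split on CDATA sections and keep each section as its own piece.
	// With one capture group, even indexes hold plain text and odd
	// indexes hold the captured CDATA sections.
	$parts = preg_split(
		'/(<!\[CDATA\[.*?\]\]>)/s',
		$text,
		-1,
		PREG_SPLIT_DELIM_CAPTURE
	);

	$out = '';
	foreach ( $parts as $index => $part ) {
		if ( 1 === $index % 2 ) {
			// CDATA section: pass through unchanged.
			$out .= $part;
		} else {
			// Plain text: escape XML special characters.
			$out .= htmlspecialchars( $part, ENT_QUOTES | ENT_XML1, 'UTF-8' );
		}
	}

	return $out;
}

// Text outside the CDATA section is escaped; the section itself is not.
echo demo_escape_xml_keep_cdata( 'a & b <![CDATA[<x>]]> c < d' ), "\n";
// Prints: a &amp; b <![CDATA[<x>]]> c &lt; d
```

In a WordPress context you would call esc_xml() itself; the sketch only shows which parts of the input are expected to change.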

Code Example

function esc_xml( $text ) {
    $safe_text = wp_check_invalid_utf8( $text );

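    $cdata_regex = '\<\!\[CDATA\[.*?\]\]\>';
    $regex       = <<<EOF
/
    (?=.*?{$cdata_regex})                 # lookahead that will match anything followed by a CDATA Section
    (?<non_cdata_followed_by_cdata>(.*?)) # the "anything" matched by the lookahead
    (?<cdata>({$cdata_regex}))            # the CDATA Section matched by the lookahead

|                                         # alternative

    (?<non_cdata>(.*))                    # non-CDATA Section
/sx
EOF;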

    $safe_text = (string) preg_replace_callback(
        $regex,
        static function ( $matches ) {
            if ( ! isset( $matches[0] ) ) {
                return '';
            }

            if ( isset( $matches['non_cdata'] ) ) {
                // escape HTML entities in the non-CDATA Section.
                return _wp_specialchars( $matches['non_cdata'], ENT_XML1 );
            }

            // Return the CDATA Section unchanged, escape HTML entities in the rest.
            return _wp_specialchars( $matches['non_cdata_followed_by_cdata'], ENT_XML1 ) . $matches['cdata'];
        },
        $safe_text
    );

    return apply_filters( 'esc_xml', $safe_text, $text );
}

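Notes

  • This function is available from WordPress 5.5.0 onward; it does not exist in earlier versions.
  • CDATA sections are left unchanged and only the non-CDATA text is escaped, which keeps the XML structure intact.
  • Developers can change the result through the 'esc_xml' filter, but a callback that undoes the escaping can reintroduce security problems, so use it with care.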
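
Because esc_xml() only exists from 5.5.0 onward, code that must also run on older installs can guard the call. The sketch below is one possible way to do that, not a recommended core pattern: the wrapper name myplugin_escape_xml() is hypothetical, and the fallback is a plain htmlspecialchars() call that, unlike esc_xml(), does not preserve CDATA sections.

```php
<?php
/**
 * Hypothetical wrapper: prefers esc_xml() when WordPress 5.5.0+ provides it.
 */
function myplugin_escape_xml( string $text ): string {
	if ( function_exists( 'esc_xml' ) ) {
		return esc_xml( $text );
	}

	// Fallback assumption: plain entity escaping. CDATA sections are
	// NOT preserved on this path; their markup is escaped like any text.
	return htmlspecialchars( $text, ENT_QUOTES | ENT_XML1, 'UTF-8' );
}
```

On a site where esc_xml() is defined the wrapper simply delegates to it, so the 'esc_xml' filter still applies there.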

📄 Original Content

Escaping for XML blocks.

Parameters

$text string required
Text to escape.

Return

string Escaped text.

Source

function esc_xml( $text ) {
	$safe_text = wp_check_invalid_utf8( $text );

	$cdata_regex = '\<\!\[CDATA\[.*?\]\]\>';
	$regex       = <<<EOF
/
	(?=.*?{$cdata_regex})                 # lookahead that will match anything followed by a CDATA Section
	(?<non_cdata_followed_by_cdata>(.*?)) # the "anything" matched by the lookahead
	(?<cdata>({$cdata_regex}))            # the CDATA Section matched by the lookahead

|	                                      # alternative

	(?<non_cdata>(.*))                    # non-CDATA Section
/sx
EOF;

	$safe_text = (string) preg_replace_callback(
		$regex,
		static function ( $matches ) {
			if ( ! isset( $matches[0] ) ) {
				return '';
			}

			if ( isset( $matches['non_cdata'] ) ) {
				// escape HTML entities in the non-CDATA Section.
				return _wp_specialchars( $matches['non_cdata'], ENT_XML1 );
			}

			// Return the CDATA Section unchanged, escape HTML entities in the rest.
			return _wp_specialchars( $matches['non_cdata_followed_by_cdata'], ENT_XML1 ) . $matches['cdata'];
		},
		$safe_text
	);

	/**
	 * Filters a string cleaned and escaped for output in XML.
	 *
	 * Text passed to esc_xml() is stripped of invalid or special characters
	 * before output. HTML named character references are converted to their
	 * equivalent code points.
	 *
	 * @since 5.5.0
	 *
	 * @param string $safe_text The text after it has been escaped.
	 * @param string $text      The text prior to being escaped.
	 */
	return apply_filters( 'esc_xml', $safe_text, $text );
}

Hooks

apply_filters( 'esc_xml', string $safe_text, string $text )

Filters a string cleaned and escaped for output in XML.
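
As an illustration of how this hook might be used, the sketch below registers a callback that removes control characters XML 1.0 does not allow. The callback name myplugin_strip_xml_control_chars() is hypothetical, and whether a site needs this extra cleanup is an assumption. The add_filter() call is guarded so the file can also be loaded outside WordPress.

```php
<?php
/**
 * Hypothetical 'esc_xml' filter callback.
 *
 * @param string $safe_text The text after it has been escaped.
 * @param string $text      The text prior to being escaped (unused here).
 */
function myplugin_strip_xml_control_chars( string $safe_text, string $text = '' ): string {
	// Below 0x20, XML 1.0 permits only tab (0x09), LF (0x0A) and CR (0x0D).
	return (string) preg_replace( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $safe_text );
}

// Register only when WordPress is loaded, so this file also runs standalone.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'esc_xml', 'myplugin_strip_xml_control_chars', 10, 2 );
}
```

The callback only removes characters from text that is already escaped, so it cannot reintroduce unescaped markup.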

Changelog

Version Description
5.5.0 Introduced.