esc_xml()
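Yunce Documentation Notes
Overview
esc_xml(), introduced in WordPress 5.5.0, escapes text for output in XML blocks. It checks the input for invalid UTF-8, separates CDATA sections from the rest of the text, and escapes XML special characters only outside the CDATA sections.
Key Points
- esc_xml() takes a single string parameter, $text, and returns the escaped string.
- Internally it calls wp_check_invalid_utf8() to check the input for invalid UTF-8.
- A regular expression separates CDATA sections from non-CDATA text; the non-CDATA text is escaped as XML entities with _wp_specialchars() and the ENT_XML1 flag.
- The 'esc_xml' filter lets developers modify the escaped string before it is returned.
- It is used mainly by the WordPress sitemaps feature, for example in the WP_Sitemaps_Renderer and WP_Sitemaps_Stylesheet classes.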
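As a minimal usage sketch of the points above, the following escapes a URL before placing it inside a sitemap-style `<loc>` element. The helper `demo_render_url_entry()` and the fallback definition of `esc_xml()` are assumptions added so the snippet runs outside WordPress. Inside WordPress the core function is used and the fallback is skipped. The fallback only approximates the non-CDATA path and does not check for invalid UTF-8.

```php
<?php
// Fallback for running outside WordPress only (an assumption for illustration).
// It approximates the non-CDATA path with htmlspecialchars() and is NOT the core implementation.
if ( ! function_exists( 'esc_xml' ) ) {
	function esc_xml( $text ) {
		return htmlspecialchars( (string) $text, ENT_XML1 | ENT_QUOTES, 'UTF-8', false );
	}
}

// Hypothetical helper: builds one <url> entry for a sitemap-like document.
function demo_render_url_entry( $loc ) {
	return '<url><loc>' . esc_xml( $loc ) . '</loc></url>';
}

// With the fallback above this prints:
// <url><loc>https://example.com/?a=1&amp;b=2</loc></url>
echo demo_render_url_entry( 'https://example.com/?a=1&b=2' ), "\n";
```

The point of the sketch is the call site: escaping happens at output time, on the value that goes between the tags, not on the surrounding markup.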
Code Example
function esc_xml( $text ) {
	$safe_text = wp_check_invalid_utf8( $text );

	$cdata_regex = '\<\!\[CDATA\[.*?\]\]\>';
	$regex       = <<<EOF
/
	(?=.*?{$cdata_regex})                 # lookahead that will match anything followed by a CDATA Section
	(?<non_cdata_followed_by_cdata>(.*?)) # the "anything" matched by the lookahead
	(?<cdata>({$cdata_regex}))            # the CDATA Section matched by the lookahead
|	                                      # alternative
	(?<non_cdata>(.*))                    # non-CDATA Section
/sx
EOF;

	$safe_text = (string) preg_replace_callback(
		$regex,
		static function ( $matches ) {
			if ( ! isset( $matches[0] ) ) {
				return '';
			}

			if ( isset( $matches['non_cdata'] ) ) {
				// escape HTML entities in the non-CDATA Section.
				return _wp_specialchars( $matches['non_cdata'], ENT_XML1 );
			}

			// Return the CDATA Section unchanged, escape HTML entities in the rest.
			return _wp_specialchars( $matches['non_cdata_followed_by_cdata'], ENT_XML1 ) . $matches['cdata'];
		},
		$safe_text
	);

	return apply_filters( 'esc_xml', $safe_text, $text );
}
Notes
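- The function is available from WordPress 5.5.0 onward and does not exist in earlier versions.
- CDATA sections are returned unchanged; only the text outside them is escaped, so the XML structure stays intact.
- The 'esc_xml' filter can change the escaped result, but use it with care: a filter that undoes the escaping reintroduces the risk the function is meant to remove.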
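The CDATA behavior described in these notes can be illustrated with a standalone sketch. It is an approximation, not the core implementation: it swaps the lookahead regex and `preg_replace_callback()` for a simpler `preg_split()` pass, and it assumes `htmlspecialchars()` as a stand-in for `_wp_specialchars()`. The function name `sketch_escape_xml()` is hypothetical.

```php
<?php
// Standalone approximation of CDATA-aware escaping (not the WordPress implementation).
// Assumption: htmlspecialchars() stands in for _wp_specialchars().
function sketch_escape_xml( $text ) {
	// With PREG_SPLIT_DELIM_CAPTURE the result alternates:
	// even indexes hold plain text, odd indexes hold complete CDATA sections.
	$parts = preg_split( '/(<!\[CDATA\[.*?\]\]>)/s', (string) $text, -1, PREG_SPLIT_DELIM_CAPTURE );
	if ( false === $parts ) {
		return '';
	}

	$out = '';
	foreach ( $parts as $i => $part ) {
		if ( 1 === $i % 2 ) {
			// CDATA section: keep byte-for-byte.
			$out .= $part;
		} else {
			// Plain text: escape XML special characters, leaving existing entities alone.
			$out .= htmlspecialchars( $part, ENT_XML1 | ENT_QUOTES, 'UTF-8', false );
		}
	}
	return $out;
}

// prints: A &amp; B <![CDATA[<b>raw & kept</b>]]> C &lt; D
echo sketch_escape_xml( 'A & B <![CDATA[<b>raw & kept</b>]]> C < D' ), "\n";
```

The markup inside the CDATA section passes through untouched, while the `&` and `<` outside it become entities.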
Original Content
Escaping for XML blocks.
Parameters
$text string required
Text to escape.
Source
function esc_xml( $text ) {
	$safe_text = wp_check_invalid_utf8( $text );

	$cdata_regex = '\<\!\[CDATA\[.*?\]\]\>';
	$regex       = <<<EOF
/
	(?=.*?{$cdata_regex})                 # lookahead that will match anything followed by a CDATA Section
	(?<non_cdata_followed_by_cdata>(.*?)) # the "anything" matched by the lookahead
	(?<cdata>({$cdata_regex}))            # the CDATA Section matched by the lookahead
|	                                      # alternative
	(?<non_cdata>(.*))                    # non-CDATA Section
/sx
EOF;

	$safe_text = (string) preg_replace_callback(
		$regex,
		static function ( $matches ) {
			if ( ! isset( $matches[0] ) ) {
				return '';
			}

			if ( isset( $matches['non_cdata'] ) ) {
				// escape HTML entities in the non-CDATA Section.
				return _wp_specialchars( $matches['non_cdata'], ENT_XML1 );
			}

			// Return the CDATA Section unchanged, escape HTML entities in the rest.
			return _wp_specialchars( $matches['non_cdata_followed_by_cdata'], ENT_XML1 ) . $matches['cdata'];
		},
		$safe_text
	);

	/**
	 * Filters a string cleaned and escaped for output in XML.
	 *
	 * Text passed to esc_xml() is stripped of invalid or special characters
	 * before output. HTML named character references are converted to their
	 * equivalent code points.
	 *
	 * @since 5.5.0
	 *
	 * @param string $safe_text The text after it has been escaped.
	 * @param string $text      The text prior to being escaped.
	 */
	return apply_filters( 'esc_xml', $safe_text, $text );
}
Hooks
- apply_filters( 'esc_xml', string $safe_text, string $text )
Filters a string cleaned and escaped for output in XML.
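A hedged sketch of hooking this filter follows. The callback `demo_strip_xml_control_chars()` is hypothetical, and the small `add_filter()` / `apply_filters()` stand-ins are assumptions that exist only so the snippet runs outside WordPress. Inside WordPress the core functions are used and the stand-ins are skipped. The callback removes control characters that XML 1.0 does not allow, working on the already-escaped text so the escaping itself is left in place.

```php
<?php
// Minimal stand-ins for running outside WordPress (assumptions for illustration only).
if ( ! function_exists( 'add_filter' ) ) {
	$GLOBALS['demo_filters'] = array();

	function add_filter( $hook, $callback, $priority = 10, $accepted_args = 1 ) {
		// Priority is ignored in this simplified stand-in.
		$GLOBALS['demo_filters'][ $hook ][] = array( $callback, $accepted_args );
		return true;
	}

	function apply_filters( $hook, $value, ...$args ) {
		$callbacks = isset( $GLOBALS['demo_filters'][ $hook ] ) ? $GLOBALS['demo_filters'][ $hook ] : array();
		foreach ( $callbacks as $entry ) {
			list( $callback, $accepted_args ) = $entry;
			$call_args = array_slice( array_merge( array( $value ), $args ), 0, $accepted_args );
			$value     = call_user_func_array( $callback, $call_args );
		}
		return $value;
	}
}

// Hypothetical callback: drop control characters that are not valid in XML 1.0.
// $safe_text is the escaped string; $text is the original input (unused here).
function demo_strip_xml_control_chars( $safe_text, $text ) {
	return preg_replace( '/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $safe_text );
}
add_filter( 'esc_xml', 'demo_strip_xml_control_chars', 10, 2 );

// Simulates the final line of esc_xml(): the escaped text, then the original text.
$filtered = apply_filters( 'esc_xml', "Title\x0B &amp; more", "Title\x0B & more" );
echo $filtered, "\n"; // prints: Title &amp; more
```

Registering the callback with `10, 2` requests both arguments, which matches the two parameters documented for the hook.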
Changelog
| Version | Description |
|---|---|
| 5.5.0 | Introduced. |