Function Reference

_canonical_charset()

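💡 云策 Documentation Notes

Overview

_canonical_charset() returns the canonical form of a charset name, suitable for passing to PHP functions such as htmlspecialchars() or for use in charset HTML attributes. It normalizes only two families of names, UTF-8 variants and ISO-8859-1 variants; every other name is returned unchanged.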

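Key points

  • The function takes one string parameter, $charset, a charset name such as "UTF-8", "Windows-1252", or "SJIS".
  • It returns the canonical form as a string: 'UTF-8' for any name that is_utf8_charset() accepts, and 'ISO-8859-1' for "iso-8859-1" or "iso8859-1" in any letter case.
  • Internally it calls is_utf8_charset() to detect UTF-8 names; ISO-8859-1 names are matched with case-insensitive strcasecmp() comparisons.
  • Introduced in WordPress 3.6.0. Its callers include _wp_die_process_input() and _wp_specialchars().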
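
The return values listed above can be checked with a small script. This is a minimal sketch rather than WordPress's own test code: so that it runs outside WordPress, it defines stand-ins for is_utf8_charset() and _canonical_charset() when they are not already loaded. The stand-in for is_utf8_charset() assumes the helper matches "UTF-8" and "UTF8" case-insensitively, which this page does not show; the input list is chosen purely for illustration.

```php
<?php
// Stand-ins so the sketch runs outside WordPress; inside WordPress the real functions are used.
if ( ! function_exists( 'is_utf8_charset' ) ) {
	// Assumption: the real helper matches "UTF-8" and "UTF8" case-insensitively.
	function is_utf8_charset( $charset ) {
		return is_string( $charset )
			&& ( 0 === strcasecmp( 'UTF-8', $charset ) || 0 === strcasecmp( 'UTF8', $charset ) );
	}
}

if ( ! function_exists( '_canonical_charset' ) ) {
	// Mirrors the function body reproduced in this document.
	function _canonical_charset( $charset ) {
		if ( is_utf8_charset( $charset ) ) {
			return 'UTF-8';
		}
		if ( 0 === strcasecmp( 'iso-8859-1', $charset ) || 0 === strcasecmp( 'iso8859-1', $charset ) ) {
			return 'ISO-8859-1';
		}
		return $charset;
	}
}

// Illustrative inputs: two UTF-8 spellings, two ISO-8859-1 spellings, two unhandled names.
$inputs  = array( 'utf8', 'UTF-8', 'ISO8859-1', 'iso-8859-1', 'latin1', 'Windows-1252' );
$results = array();

foreach ( $inputs as $name ) {
	$results[ $name ] = _canonical_charset( $name );
	echo $name . ' => ' . $results[ $name ] . PHP_EOL;
}
```

Under these assumptions, the two UTF-8 spellings map to 'UTF-8', the two ISO-8859-1 spellings map to 'ISO-8859-1', and "latin1" and "Windows-1252" come back exactly as passed.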

Code example

function _canonical_charset( $charset ) {
	if ( is_utf8_charset( $charset ) ) {
		return 'UTF-8';
	}

	/*
	 * Normalize the ISO-8859-1 family of languages.
	 *
	 * This is not required for htmlspecialchars(), as it properly recognizes all of
	 * the input character sets that here are transformed into "ISO-8859-1".
	 *
	 * @todo Should this entire check be removed since it's not required for the stated purpose?
	 * @todo Should WordPress transform other potential charset equivalents, such as "latin1"?
	 */
	if (
		( 0 === strcasecmp( 'iso-8859-1', $charset ) ) ||
		( 0 === strcasecmp( 'iso8859-1', $charset ) )
	) {
		return 'ISO-8859-1';
	}

	return $charset;
}

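Notes

  • The function exists to normalize charset names for PHP functions, but the ISO-8859-1 branch may be unnecessary: per the source comment, htmlspecialchars() already recognizes those input names, and two @todo comments ask whether the check should be removed and whether equivalents such as "latin1" should also be handled.
  • The function does not validate its input. Any name outside the two handled families, valid or not, is returned unchanged, so callers must make sure the name they pass is one the consuming function accepts.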
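
The notes above can be illustrated with a short usage sketch: the canonical name is passed to htmlspecialchars() and written into a charset attribute, and an unrecognized name is shown passing through untouched. As before, the stand-in definitions are only there so the script runs outside WordPress, the is_utf8_charset() stand-in rests on the assumption that "UTF-8" and "UTF8" are matched case-insensitively, and the sample strings are hypothetical.

```php
<?php
// Stand-ins so the sketch runs outside WordPress; inside WordPress the real functions are used.
if ( ! function_exists( 'is_utf8_charset' ) ) {
	// Assumption: the real helper matches "UTF-8" and "UTF8" case-insensitively.
	function is_utf8_charset( $charset ) {
		return is_string( $charset )
			&& ( 0 === strcasecmp( 'UTF-8', $charset ) || 0 === strcasecmp( 'UTF8', $charset ) );
	}
}

if ( ! function_exists( '_canonical_charset' ) ) {
	// Mirrors the function body reproduced in this document.
	function _canonical_charset( $charset ) {
		if ( is_utf8_charset( $charset ) ) {
			return 'UTF-8';
		}
		if ( 0 === strcasecmp( 'iso-8859-1', $charset ) || 0 === strcasecmp( 'iso8859-1', $charset ) ) {
			return 'ISO-8859-1';
		}
		return $charset;
	}
}

// Normalize a loosely spelled name before handing it to PHP.
$charset = _canonical_charset( 'utf8' );

// Use the canonical name as the encoding argument of htmlspecialchars().
$escaped = htmlspecialchars( '<b>"Tom" & Jerry</b>', ENT_QUOTES, $charset );

// Use the same name in a charset HTML attribute, escaped like any other attribute value.
$meta = '<meta charset="' . htmlspecialchars( $charset, ENT_QUOTES, 'UTF-8' ) . '">';

// No validation happens: an unrecognized name is returned as-is.
$unknown = _canonical_charset( 'not-a-charset' );

echo $escaped . PHP_EOL;
echo $meta . PHP_EOL;
echo $unknown . PHP_EOL;
```

Because 'not-a-charset' is returned unchanged, any check that the name is actually supported has to happen in the calling code before the value reaches htmlspecialchars() or the page markup.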

📄 Original content

Retrieves a canonical form of the provided charset appropriate for passing to PHP functions such as htmlspecialchars() and charset HTML attributes.

Parameters

$charset (string, required)
A charset name, e.g. “UTF-8”, “Windows-1252”, “SJIS”.

Return

string The canonical form of the charset.

Changelog

Version Description
3.6.0 Introduced.