wp_check_invalid_utf8()
Yunce documentation notes
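Overview
wp_check_invalid_utf8() checks a string for invalid UTF-8. It performs the check only when blog_charset is set to UTF-8; for any other charset it returns the input unchanged. The $strip parameter controls whether invalid byte sequences are replaced with the Unicode replacement character instead of the whole string being rejected.
Key points
- The function only does its work when blog_charset is UTF-8; otherwise it returns the input text as-is
- By default, it returns an empty string if the input contains any invalid UTF-8 sequence
- Passing true for $strip replaces invalid byte sequences with the Unicode replacement character (U+FFFD �)
- Consider using wp_scrub_utf8() instead, which does not depend on the value of blog_charset
- The return value is the checked text as a string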
Code example
// The `blog_charset` is `latin1`, so this returns the input unchanged.
$every_possible_input === wp_check_invalid_utf8( $every_possible_input );
// Valid strings come through unchanged.
'test' === wp_check_invalid_utf8( 'test' );
$invalid = "the byte \xC0 is never allowed in a UTF-8 string.";
// Invalid strings are rejected outright.
'' === wp_check_invalid_utf8( $invalid );
// “Stripping” invalid sequences produces the replacement character instead.
"the byte \u{FFFD} is never allowed in a UTF-8 string." === wp_check_invalid_utf8( $invalid, true );
'the byte � is never allowed in a UTF-8 string.' === wp_check_invalid_utf8( $invalid, true );
Notes
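- The function depends on the blog_charset setting; when it is not UTF-8, no check is performed at all
- Passing true for $strip keeps the valid parts of the text and replaces only the invalid sequences, which avoids discarding the whole string
- The related functions wp_is_valid_utf8() and wp_scrub_utf8() offer further UTF-8 handling options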
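The three outcomes described in these notes can be tried outside WordPress with a rough stand-in. The sketch below is not WordPress code: sketch_check_invalid_utf8() and its $site_is_utf8 parameter are hypothetical names, and it assumes the mbstring extension is available, using mb_check_encoding() and mb_convert_encoding() in place of wp_is_valid_utf8() and wp_scrub_utf8(). The exact number of replacement characters produced for longer invalid runs may differ from what WordPress emits.

```php
<?php
/**
 * Hypothetical stand-in for illustration only; not the WordPress implementation.
 * $site_is_utf8 plays the role of the blog_charset check.
 */
function sketch_check_invalid_utf8( string $text, bool $strip = false, bool $site_is_utf8 = true ): string {
    if ( '' === $text ) {
        return '';
    }

    // Non-UTF-8 site charset, or input that is already valid: return it unchanged.
    if ( ! $site_is_utf8 || mb_check_encoding( $text, 'UTF-8' ) ) {
        return $text;
    }

    // Default behaviour: reject the whole string.
    if ( ! $strip ) {
        return '';
    }

    // Assumption: mbstring stands in for wp_scrub_utf8(). Its default
    // substitute is "?", so switch to U+FFFD temporarily and restore it after.
    $previous = mb_substitute_character();
    mb_substitute_character( 0xFFFD );
    $scrubbed = mb_convert_encoding( $text, 'UTF-8', 'UTF-8' );
    mb_substitute_character( $previous );

    return $scrubbed;
}

$invalid = "the byte \xC0 is never allowed in a UTF-8 string.";

$rejected  = sketch_check_invalid_utf8( $invalid );               // invalid input, no $strip
$scrubbed  = sketch_check_invalid_utf8( $invalid, true );         // invalid input, $strip = true
$untouched = sketch_check_invalid_utf8( $invalid, false, false ); // site charset is not UTF-8
```

In a real plugin or theme the WordPress functions themselves would be called; the stand-in only makes the decision order visible: empty input, then the charset and validity check, then the $strip choice.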
Original content
Checks for invalid UTF8 in a string.
Description
Note! This function only performs its work if the blog_charset is set to UTF-8. For all other values it returns the input text unchanged.
Note! Unless requested, this returns an empty string if the input contains any sequences of invalid UTF-8. To replace invalid byte sequences, pass true as the optional $strip parameter.
Consider using wp_scrub_utf8() instead which does not depend on the value of blog_charset.
Example:
// The `blog_charset` is `latin1`, so this returns the input unchanged.
$every_possible_input === wp_check_invalid_utf8( $every_possible_input );
// Valid strings come through unchanged.
'test' === wp_check_invalid_utf8( 'test' );
$invalid = "the byte \xC0 is never allowed in a UTF-8 string.";
// Invalid strings are rejected outright.
'' === wp_check_invalid_utf8( $invalid );
// “Stripping” invalid sequences produces the replacement character instead.
"the byte \u{FFFD} is never allowed in a UTF-8 string." === wp_check_invalid_utf8( $invalid, true );
'the byte � is never allowed in a UTF-8 string.' === wp_check_invalid_utf8( $invalid, true );
Parameters
$text (string, required)
String which is expected to be encoded as UTF-8 unless blog_charset is another encoding.
$strip (bool, optional)
Whether to replace invalid sequences of bytes with the Unicode replacement character (U+FFFD �). Default false returns an empty string for invalid UTF-8 inputs.
Default: false
Source
function wp_check_invalid_utf8( $text, $strip = false ) {
    $text = (string) $text;

    if ( 0 === strlen( $text ) ) {
        return '';
    }

    // Store the site charset as a static to avoid multiple calls to get_option().
    static $is_utf8 = null;
    if ( ! isset( $is_utf8 ) ) {
        $is_utf8 = is_utf8_charset();
    }

    if ( ! $is_utf8 || wp_is_valid_utf8( $text ) ) {
        return $text;
    }

    return $strip
        ? wp_scrub_utf8( $text )
        : '';
}
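
The source stores the charset check in a static variable, so the first result is reused by later calls in the same request. The sketch below illustrates that caching pattern in isolation. Everything in it is hypothetical: sketch_is_utf8_site() and the two $GLOBALS entries stand in for is_utf8_charset() and the stored blog_charset option, and the charset comparison is a simplification.

```php
<?php
// Hypothetical stand-ins for the stored blog_charset option and a lookup counter.
$GLOBALS['sketch_charset'] = 'UTF-8';
$GLOBALS['sketch_lookups'] = 0;

function sketch_is_utf8_site(): bool {
    // Same pattern as the source: compute once, then reuse the static value.
    static $is_utf8 = null;

    if ( ! isset( $is_utf8 ) ) {
        $GLOBALS['sketch_lookups']++;
        // Simplified comparison; an assumption, not the WordPress check.
        $is_utf8 = in_array( strtolower( $GLOBALS['sketch_charset'] ), array( 'utf-8', 'utf8' ), true );
    }

    return $is_utf8;
}

$first = sketch_is_utf8_site();

// Changing the setting after the first call does not affect the cached result.
$GLOBALS['sketch_charset'] = 'latin1';
$second = sketch_is_utf8_site();
```

Under these assumptions, code that changes the charset setting partway through a request should not expect later calls to notice the change; wp_scrub_utf8() sidesteps the question because it does not consult blog_charset.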