函数文档

wp_kses_hair()

💡 云策文档标注

概述

wp_kses_hair() 函数用于将 HTML 元素的属性字符串解析为结构化数组,处理各种输入情况,确保符合 W3C HTML 规范。它自动添加引号、移除不良 URL 协议,并处理重复属性。

关键要点

  • 解析 HTML 属性字符串为数组,每个属性包含 name、value、whole 和 vless 字段
  • 自动为未加引号的属性值添加引号,以符合 W3C 规范
  • 移除属性值中的不良 URL 协议,基于 wp_kses_bad_protocol() 函数
  • 处理重复属性,仅保留首次定义的属性
  • 支持无值属性(如 selected)和带引号或不带引号的值
  • 依赖 wp_kses_uri_attributes() 获取 URL 属性列表,wp_kses_html_error() 处理解析错误

代码示例

function wp_kses_hair( $attr, $allowed_protocols ) {
	$attrarr  = array();
	$mode     = 0;
	$attrname = '';
	$uris     = wp_kses_uri_attributes();

	// Loop through the whole attribute list.

	while ( strlen( $attr ) !== 0 ) {
		$working = 0; // Was the last operation successful?

		switch ( $mode ) {
			case 0:
				if ( preg_match( '/^([_a-zA-Z][-_a-zA-Z0-9:.]*)/', $attr, $match ) ) {
					$attrname = $match[1];
					$working  = 1;
					$mode     = 1;
					$attr     = preg_replace( '/^[_a-zA-Z][-_a-zA-Z0-9:.]*/', '', $attr );
				}

				break;

			case 1:
				if ( preg_match( '/^s*=s*/', $attr ) ) { // Equals sign.
					$working = 1;
					$mode    = 2;
					$attr    = preg_replace( '/^s*=s*/', '', $attr );
					break;
				}

				if ( preg_match( '/^s+/', $attr ) ) { // Valueless.
					$working = 1;
					$mode    = 0;

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => '',
							'whole' => $attrname,
							'vless' => 'y',
						);
					}

					$attr = preg_replace( '/^s+/', '', $attr );
				}

				break;

			case 2:
				if ( preg_match( '%^"([^"]*)"(s+|/?$)%', $attr, $match ) ) {
					// "value"
					$thisval = $match[1];
					if ( in_array( strtolower( $attrname ), $uris, true ) ) {
						$thisval = wp_kses_bad_protocol( $thisval, $allowed_protocols );
					}

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => $thisval,
							'whole' => "$attrname="$thisval"",
							'vless' => 'n',
						);
					}

					$working = 1;
					$mode    = 0;
					$attr    = preg_replace( '/^"[^"]*"(s+|$)/', '', $attr );
					break;
				}

				if ( preg_match( "%^'([^']*)'(s+|/?$)%", $attr, $match ) ) {
					// 'value'
					$thisval = $match[1];
					if ( in_array( strtolower( $attrname ), $uris, true ) ) {
						$thisval = wp_kses_bad_protocol( $thisval, $allowed_protocols );
					}

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => $thisval,
							'whole' => "$attrname='$thisval'",
							'vless' => 'n',
						);
					}

					$working = 1;
					$mode    = 0;
					$attr    = preg_replace( "/^'[^']*'(s+|$)/", '', $attr );
					break;
				}

				if ( preg_match( '%^([^s"']+)(s+|/?$)%', $attr, $match ) ) {
					// value
					$thisval = $match[1];
					if ( in_array( strtolower( $attrname ), $uris, true ) ) {
						$thisval = wp_kses_bad_protocol( $thisval, $allowed_protocols );
					}

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => $thisval,
							'whole' => "$attrname="$thisval"",
							'vless' => 'n',
						);
					}

					// We add quotes to conform to W3C's HTML spec.
					$working = 1;
					$mode    = 0;
					$attr    = preg_replace( '%^[^s"']+(s+|$)%', '', $attr );
				}

				break;
		} // End switch.

		if ( 0 === $working ) { // Not well-formed, remove and try again.
			$attr = wp_kses_html_error( $attr );
			$mode = 0;
		}
	} // End while.

	if ( 1 === $mode && false === array_key_exists( $attrname, $attrarr ) ) {
		/*
		 * Special case, for when the attribute list ends with a valueless
		 * attribute like "selected".
		 */
		$attrarr[ $attrname ] = array(
			'name'  => $attrname,
			'value' => '',
			'whole' => $attrname,
			'vless' => 'y',
		);
	}

	return $attrarr;
}

注意事项

  • 函数接受两个参数:$attrstring(属性字符串)和 $allowed_protocols(允许的 URL 协议数组)
  • 返回解析后的属性信息数组,每个元素包含 name、value、whole 和 vless 键
  • 内部使用状态机(mode 0、1、2)循环解析属性字符串
  • 依赖 wp_kses_uri_attributes() 来识别 URL 属性,并调用 wp_kses_bad_protocol() 进行协议过滤
  • 解析错误时调用 wp_kses_html_error() 处理
  • 在 WordPress 1.0.0 版本引入,用于多个相关函数如 wp_kses_attr()

📄 原文内容

Builds an attribute list from string containing attributes.

Description

This function does a lot of work. It parses an attribute list into an array with attribute data, and tries to do the right thing even if it gets weird input. It will add quotes around attribute values that don’t have any quotes or apostrophes around them, to make it easier to produce HTML code that will conform to W3C’s HTML specification. It will also remove bad URL protocols from attribute values. It also reduces duplicate attributes by using the attribute defined first (foo='bar' foo='baz' will result in foo='bar').

Parameters

$attrstringrequired
Attribute list from HTML element to closing HTML element tag.
$allowed_protocolsstring[]required
Array of allowed URL protocols.

Return

array[] Array of attribute information after parsing.

Source

function wp_kses_hair( $attr, $allowed_protocols ) {
	$attrarr  = array();
	$mode     = 0;
	$attrname = '';
	$uris     = wp_kses_uri_attributes();

	// Loop through the whole attribute list.

	while ( strlen( $attr ) !== 0 ) {
		$working = 0; // Was the last operation successful?

		switch ( $mode ) {
			case 0:
				if ( preg_match( '/^([_a-zA-Z][-_a-zA-Z0-9:.]*)/', $attr, $match ) ) {
					$attrname = $match[1];
					$working  = 1;
					$mode     = 1;
					$attr     = preg_replace( '/^[_a-zA-Z][-_a-zA-Z0-9:.]*/', '', $attr );
				}

				break;

			case 1:
				if ( preg_match( '/^s*=s*/', $attr ) ) { // Equals sign.
					$working = 1;
					$mode    = 2;
					$attr    = preg_replace( '/^s*=s*/', '', $attr );
					break;
				}

				if ( preg_match( '/^s+/', $attr ) ) { // Valueless.
					$working = 1;
					$mode    = 0;

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => '',
							'whole' => $attrname,
							'vless' => 'y',
						);
					}

					$attr = preg_replace( '/^s+/', '', $attr );
				}

				break;

			case 2:
				if ( preg_match( '%^"([^"]*)"(s+|/?$)%', $attr, $match ) ) {
					// "value"
					$thisval = $match[1];
					if ( in_array( strtolower( $attrname ), $uris, true ) ) {
						$thisval = wp_kses_bad_protocol( $thisval, $allowed_protocols );
					}

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => $thisval,
							'whole' => "$attrname="$thisval"",
							'vless' => 'n',
						);
					}

					$working = 1;
					$mode    = 0;
					$attr    = preg_replace( '/^"[^"]*"(s+|$)/', '', $attr );
					break;
				}

				if ( preg_match( "%^'([^']*)'(s+|/?$)%", $attr, $match ) ) {
					// 'value'
					$thisval = $match[1];
					if ( in_array( strtolower( $attrname ), $uris, true ) ) {
						$thisval = wp_kses_bad_protocol( $thisval, $allowed_protocols );
					}

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => $thisval,
							'whole' => "$attrname='$thisval'",
							'vless' => 'n',
						);
					}

					$working = 1;
					$mode    = 0;
					$attr    = preg_replace( "/^'[^']*'(s+|$)/", '', $attr );
					break;
				}

				if ( preg_match( "%^([^s"']+)(s+|/?$)%", $attr, $match ) ) {
					// value
					$thisval = $match[1];
					if ( in_array( strtolower( $attrname ), $uris, true ) ) {
						$thisval = wp_kses_bad_protocol( $thisval, $allowed_protocols );
					}

					if ( false === array_key_exists( $attrname, $attrarr ) ) {
						$attrarr[ $attrname ] = array(
							'name'  => $attrname,
							'value' => $thisval,
							'whole' => "$attrname="$thisval"",
							'vless' => 'n',
						);
					}

					// We add quotes to conform to W3C's HTML spec.
					$working = 1;
					$mode    = 0;
					$attr    = preg_replace( "%^[^s"']+(s+|$)%", '', $attr );
				}

				break;
		} // End switch.

		if ( 0 === $working ) { // Not well-formed, remove and try again.
			$attr = wp_kses_html_error( $attr );
			$mode = 0;
		}
	} // End while.

	if ( 1 === $mode && false === array_key_exists( $attrname, $attrarr ) ) {
		/*
		 * Special case, for when the attribute list ends with a valueless
		 * attribute like "selected".
		 */
		$attrarr[ $attrname ] = array(
			'name'  => $attrname,
			'value' => '',
			'whole' => $attrname,
			'vless' => 'y',
		);
	}

	return $attrarr;
}

Changelog

Version Description
1.0.0 Introduced.