Function Reference

wp_extract_urls()

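💡 云策 Documentation Annotations

Overview

wp_extract_urls() is a WordPress core function that extracts URLs from arbitrary content. It is regex-based and returns an array of the unique URLs it finds.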

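Key points

  • Parameter: $content (string, required), the content to extract URLs from.
  • Return value: an array of strings (string[]) containing the URLs found in the passed string.
  • Internally, the function matches with preg_match_all(), then decodes HTML entities in each match, strips semicolons (;) for backward compatibility, and removes duplicates.
  • Related functions: do_enclose() and pingback() call this function to check content for links.
  • Version history: introduced in WordPress 3.7.0; 6.0.0 fixed support for HTML entities.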
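
The post-processing step mentioned in the key points can be tried in isolation. The following is a minimal sketch in plain PHP, not the core function itself: $raw_links is a hypothetical stand-in for what the regex might capture from HTML, and the callback mirrors the decode-then-strip logic shown in the source listing.

```php
<?php
// Hypothetical raw matches, as they might be captured from HTML markup.
$raw_links = array(
	'http://example.com/?a=1&amp;b=2',
	'http://example.com/?a=1&amp;b=2', // Duplicate on purpose.
	'http://example.org/page',
);

$clean_links = array_values(
	array_unique(
		array_map(
			static function ( $link ) {
				// Decode entities first, so "&amp;" becomes "&".
				$link = html_entity_decode( $link );
				// Then drop any remaining semicolons.
				return str_replace( ';', '', $link );
			},
			$raw_links
		)
	)
);

print_r( $clean_links );
```

Decoding before stripping semicolons matters: done the other way round, "&amp;" would lose its terminator and might no longer decode cleanly.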

Code example

$string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin elementum quis lacus in accumsan. Sed sed lacus odio. Sed ullamcorper, nibh et dignissim convallis, lacus tellus pellentesque ipsum, et interdum purus urna ultricies justo. Phasellus blandit eros nec lectus vestibulum consequat. Cras faucibus turpis sed ante commodo cursus. Duis vitae ligula vulputate, varius mi vel, facilisis est. Nulla id mollis ipsum. Nunc faucibus augue vel erat luctus sodales. Curabitur gravida vulputate nulla sed aliquam. Ut posuere mollis mauris, et placerat diam cursus vitae. Vivamus eros arcu, lobortis id sapien at, tempus tristique nunc. Praesent sollicitudin vulputate lorem, vitae vestibulum nisi pretium non. http://example.com is a cool site.';

$urls = wp_extract_urls( $string );

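Notes

  • The function may fail to match localhost URLs that lack a top-level domain (TLD), such as http://localhost:8889/?p=9, since the pattern expects a dot somewhere in the URL.
  • A user-contributed note reports this limitation and links to the related ticket.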
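
One possible way around the localhost limitation is a small supplementary matcher. This is only a sketch under stated assumptions: my_extract_localhost_urls() is a hypothetical helper, not part of WordPress, and its pattern deliberately covers only http(s)://localhost with an optional port and path.

```php
<?php
/**
 * Hypothetical helper (not a WordPress API): finds http(s)://localhost URLs,
 * which contain no dot in the host and may be skipped by wp_extract_urls().
 *
 * @param string $content Content to search.
 * @return string[] Unique localhost URLs found in $content.
 */
function my_extract_localhost_urls( $content ) {
	preg_match_all(
		// Scheme, literal "localhost", optional port, optional path/query
		// that stops at whitespace, quotes, or angle brackets.
		'#https?://localhost(?::\d+)?(?:/[^\s"\'<>]*)?#i',
		$content,
		$matches
	);

	return array_values( array_unique( $matches[0] ) );
}

$html  = '<a href="http://localhost:8889/?p=9">hi</a>';
$found = my_extract_localhost_urls( $html );

print_r( $found );
```

In a WordPress context, the result could be merged with the output of wp_extract_urls() via array_merge() and array_unique(); whether that is appropriate depends on the site.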

📄 Original content

Uses RegEx to extract URLs from arbitrary content.

Parameters

$content string required
Content to extract URLs from.

Return

string[] Array of URLs found in passed string.
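
Because the return value is a flat list of strings, it can be filtered with ordinary array functions. The sketch below assumes a hypothetical $urls array standing in for the function's output (so it runs outside WordPress) and keeps only the links whose host is example.com.

```php
<?php
// Stand-in for the array wp_extract_urls() would return (hypothetical values).
$urls = array(
	'http://example.com/about',
	'https://wordpress.org/news/',
	'http://example.com/contact',
);

// Keep only the URLs whose host is example.com.
$own_links = array_values(
	array_filter(
		$urls,
		static function ( $url ) {
			return 'example.com' === parse_url( $url, PHP_URL_HOST );
		}
	)
);

print_r( $own_links );
```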

Source

function wp_extract_urls( $content ) {
	preg_match_all(
		"#([\"']?)("
			. '(?:([\w-]+:)?//?)'
			. '[^\s()<>]+'
			. '[.]'
			. '(?:'
				. '([\w\d]+)|'
				. '(?:'
					. "[^`!()\[\]{}:'\".,<>«»“”‘’\s]|"
					. '(?:[:]\d+)?/?'
				. ')+'
			. ')'
		. ")\\1#",
		$content,
		$post_links
	);

	$post_links = array_unique(
		array_map(
			static function ( $link ) {
				// Decode to replace valid entities, like &amp;.
				$link = html_entity_decode( $link );
				// Maintain backward compatibility by removing extraneous semi-colons (`;`).
				return str_replace( ';', '', $link );
			},
			$post_links[2]
		)
	);

	return array_values( $post_links );
}

Changelog

Version Description
6.0.0 Fixes support for HTML entities (Trac 30580).
3.7.0 Introduced.

User Contributed Notes

  1. This doesn’t work for localhost URLs without TLDs:

    <a href="http://localhost.com:8889/?p=9">hi</a>  // Matches.
    <a href="http://localhost:8889/?p=9">hi</a>      // Doesn't match.

    (See this ticket.)

  2. Example

    This code:

    $string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin elementum quis lacus in accumsan. Sed sed lacus odio. Sed ullamcorper, nibh et dignissim convallis, lacus tellus pellentesque ipsum, et interdum purus urna ultricies justo. Phasellus blandit eros nec lectus vestibulum consequat. Cras faucibus turpis sed ante commodo cursus. Duis vitae ligula vulputate, varius mi vel, facilisis est. Nulla id mollis ipsum. Nunc faucibus augue vel erat luctus sodales. Curabitur gravida vulputate nulla sed aliquam. Ut posuere mollis mauris, et placerat diam cursus vitae. Vivamus eros arcu, lobortis id sapien at, tempus tristique nunc. Praesent sollicitudin vulputate lorem, vitae vestibulum nisi pretium non. http://example.com is a cool site.';
    
    $urls = wp_extract_urls( $string );

    Will return an array like this:

    array( 0 => 'http://example.com' )