Fetch og:image by file_get_contents and preg_match

【Question title】: Fetch og:image by file_get_contents and preg_match
【Posted】: 2013-07-20 22:59:26
【Question】:

I am using file_get_contents to fetch the og:image from any URL.

$fooURL = file_get_contents($URLVF['url']);

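I then filter on property=og:image to get the image from the page. The code below works for most websites:

preg_match('/property="og:image" content="(.*?)"/', $fooURL, $fooImage);

But sites such as www.howcast.com write the og:image tag differently, like this:

<meta content='http://attachments-mothership-production.s3.amazonaws.com/images/main-avatar.jpeg' property='og:image'>

So to get the image link from the tag above, I need preg_match to look like this:

preg_match("/content='(.*?)' property='og:image'/", $fooURL, $fooImage);

Of course, if I switch to that pattern, howcast is the only site that works and every other site returns nothing.

Any idea how to make the code fetch the image link reliably however the meta tag is written, or any alternative approach?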

【Comments】:

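Use XPath on a DOMDocument.
Use what str explained, but you can also group your patterns like this: (pattern1|pattern2)
There is also an exact answer here: stackoverflow.com/questions/12014196/…
@str I tried this code jsfiddle.net/P8PrV but the result is always NULL, and I don't know what I'm doing wrong!
Thanks @Akam, I'll check that answer.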
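
The pattern-grouping suggestion from the comments could look roughly like the sketch below. It is only an illustration: the function name findOgImageWithRegex is hypothetical, and the pattern assumes the property and content attributes sit in the same meta tag and that the URL itself contains no quote characters. A PCRE branch-reset group, (?|...), lets both alternatives store the URL in capture group 1.

```php
<?php
// Hypothetical helper (not from the original thread): returns the og:image URL
// found in $html, or null when neither attribute order matches.
function findOgImageWithRegex(string $html): ?string
{
    // Branch 1: property="og:image" ... content="..."
    // Branch 2: content="..." ... property="og:image"
    // (?|...) is a branch-reset group, so the URL is always capture group 1.
    // ["\'] accepts either quote style; the /i flag ignores letter case.
    $pattern = '/<meta\s[^>]*?(?|'
        . 'property=["\']og:image["\'][^>]*?content=["\']([^"\']*)["\']'
        . '|'
        . 'content=["\']([^"\']*)["\'][^>]*?property=["\']og:image["\']'
        . ')/i';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}
```

A regex like this still breaks on unquoted attributes or unusual markup, which is why the first comment recommends an HTML parser instead.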

Tags: php preg-match metadata facebook-opengraph file-get-contents

【Solution 1】:

An example using DOMDocument and XPath, as suggested by @str:

$html = <<<LOD
<html><head>
<meta content='http://attachments-mothership-production.s3.amazonaws.com/images/main-avatar.jpeg' property='og:image'>
</head><body></body></html>
LOD;

$doc = new DOMDocument();
@$doc->loadHTML($html);
// or @$doc->loadHTMLFile($URLVF['url']);
$xpath = new DOMXPath($doc);
$metaContentAttributeNodes = $xpath->query("/html/head/meta[@property='og:image']/@content");
foreach($metaContentAttributeNodes as $metaContentAttributeNode) {
    echo $metaContentAttributeNode->nodeValue . "<br/>";
}
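
As a possible variation on the answer above, the same lookup can be wrapped in a reusable function. This is a sketch under stated assumptions: the name findOgImage and the example.com URL are made up for illustration, the function returns only the first match, and it expects a non-empty HTML string. It uses libxml_use_internal_errors() rather than the @ operator to keep parse warnings quiet, and a //meta query so the tag is found even when it is not directly under /html/head.

```php
<?php
// Hypothetical helper (the name is not from the original answer): returns the
// first og:image URL declared in $html, or null if there is none.
function findOgImage(string $html): ?string
{
    // Collect libxml parse warnings internally instead of silencing them with "@".
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "//meta" matches the tag anywhere in the document, not only under /html/head.
    $nodes = $xpath->query("//meta[@property='og:image']/@content");

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Usage with a tag written content-first, as on howcast (URL is a placeholder).
$html = "<html><head><meta content='http://example.com/main-avatar.jpeg' property='og:image'></head><body></body></html>";
echo findOgImage($html), "\n";
```

Because the parser normalizes attribute order and quoting, the same query works whether the tag is written content-first or property-first.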

【Comments】: