Extract content from MediaWiki API call (XML, cURL): answers

【Question title】: Extract content from MediaWiki API call (XML, cURL)
【Posted】: 2010-09-13 08:04:22
【Question】:

The URL:

http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml

This outputs something like:

<api><parse><text xml:space="preserve">text...</text></parse></api>

How do I get only the content between <text xml:space="preserve"> and </text>?

I use curl to fetch the full contents of that URL, which gives me:

$html = curl_exec($curl_handle);

What is the next step?

【Question comments】:

Tags: php parsing curl mediawiki xml-parsing

【Solution 1】:

Parse it with PHP DOM. Do it like this:

// The response body is already in $html; a sample is hard-coded here
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';

// Parse the response. It is XML, so loadXML() is used rather than loadHTML()
$doc = new DOMDocument();
$doc->loadXML($html);
$nodes = $doc->getElementsByTagName('text');

// Print the text content of the first <text> element
echo $nodes->item(0)->nodeValue;

This outputs:

text...
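
For a real response rather than the hard-coded sample, the fetch and parse steps might be combined roughly as follows. This is only a sketch under stated assumptions: `fetch_parse_xml()` and `extract_parse_text()` are hypothetical helper names, the User-Agent string is a placeholder, and the URL uses https instead of the http form in the question. In an actual API response the HTML inside `<text>` is entity-escaped, and reading `nodeValue` returns it decoded.

```php
<?php
// Sketch only: both function names below are hypothetical helpers,
// not part of PHP or MediaWiki.

/** Fetch the raw XML body of an action=parse request for one page. */
function fetch_parse_xml(string $page): string
{
    $url = 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action' => 'parse',
        'prop'   => 'text',
        'page'   => $page,
        'format' => 'xml',
    ]);

    $ch = curl_init($url);
    // Return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Placeholder value; replace with something that identifies your client
    curl_setopt($ch, CURLOPT_USERAGENT, 'ExampleFetcher/0.1 (placeholder contact)');

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

/** Return the content of the first <text> element, or null if unavailable. */
function extract_parse_text(string $xml): ?string
{
    if ($xml === '') {
        return null; // loadXML() rejects an empty string
    }

    // Collect libxml parse errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null; // response was not well-formed XML
    }

    $nodes = $doc->getElementsByTagName('text');
    if ($nodes->length === 0) {
        return null; // e.g. the API returned an error document
    }

    // nodeValue yields the text content with XML entities decoded
    return $nodes->item(0)->nodeValue;
}

// Usage (performs a network request, so it is left commented out):
// $html = extract_parse_text(fetch_parse_xml('Lost_(TV_series)'));
```

Returning null instead of dereferencing `item(0)` directly avoids a fatal error when the response has no `<text>` element, which is one reasonable design choice among several.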

【Comments】: