Posted: 2012-01-23 06:01:48
Question:
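OK, I do a return simplexml_load_string($data, 'SimpleXMLElement', LIBXML_COMPACT | LIBXML_NOCDATA | LIBXML_NOBLANKS | LIBXML_NOEMPTYTAG); and parse the XML response.
The problem is that the content of [description] is really messy, and I need to pick out only the data I need.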
[description] =>
<a href="http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/"><img src="http://s4.mcstatic.com/thumb/8000947/21507982/4/directors_cut/0/1/the_dish_with_doc_willoughby.jpg?v=8" align="right" border="0" alt="THE Dish with Doc Willoughby" vspace="4" hspace="4" width="134" height="78" /></a>
<p>
Doc Willoughby, guru of "America's Test Kitchen," stopped by "CBS The Morning: Saturday" to share his ultimate dish with Rebecca Jarvis and Jeff Glor: Roast Beef Tenderloin with Dried Fruit and Nut Stuffing. <br>Ranked <strong>4.00</strong> / 5 | 2 views | <a href="http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/">0 comments</a><br/>
</p>
<p>
<a href="http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/"><strong>Click here to watch the video</strong></a> (04:58)<br/>
Submitted By: <a href="http://www.metacafe.com/channels/CBS/">CBS</a><br/>
Tags:
<a href="http://www.metacafe.com/topics/cbsepisode/">Cbsepisode</a> <a href="http://www.metacafe.com/topics/dish/">Dish</a> <a href="http://www.metacafe.com/topics/doc_willoughby/">Doc Willoughby</a> <a href="http://www.metacafe.com/topics/america%27s_test_kitchen/">America's Test Kitchen</a> <a href="http://www.metacafe.com/topics/roast_beef_tenderloin/">Roast Beef Tenderloin</a> <a href="http://www.metacafe.com/topics/dried_fruit/">Dried Fruit</a> <a href="http://www.metacafe.com/topics/nut_stuffing/">Nut Stuffing</a> <a href="http://www.metacafe.com/topics/cbs_this_morning/">CBS This Morning</a> <br/>
Categories: <a href='http://www.metacafe.com/videos/news_and_events/'>News & Events</a> </p>
As you can see, it's a real mess. I'd like to know how to get, for example, the text of the first <p> up to "Ranked ..." and the tags.
Edit:
OK, here is the PHP code I'm using:
$dom = new DOMDocument();
@$dom->loadHTML($result->description); // or you can use loadXML
$dom->normalizeDocument();
/*$dom->resolveExternals = false;
$dom->substituteEntities = false;*/
$xml = simplexml_import_dom($dom);
$data['viewData']['data']['description'] = $xml;
or:
$paragraph = $dom->getElementsByTagName('p'); // -> this doesn't work
//$xml = simplexml_import_dom($dom);
$data['viewData']['data']['description'] = $paragraph;
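A likely reason the getElementsByTagName('p') attempt "doesn't work" is that it returns a DOMNodeList rather than a string, so the node still has to be picked with item() and its text read with textContent. The sketch below assumes a shortened, hypothetical copy of $result->description and cuts the first paragraph at "Ranked"; it is one possible approach, not the only one.

```php
<?php
// Minimal sketch: read the first <p> of the description and keep only the
// text before "Ranked". $description is a hypothetical, shortened stand-in
// for $result->description from the feed item above.
$description = '<a href="http://www.metacafe.com/watch/x/"><img src="t.jpg" /></a>'
    . '<p>Doc Willoughby stopped by to share his ultimate dish. <br>Ranked <strong>4.00</strong> / 5 | 2 views</p>'
    . '<p>Tags: <a href="http://www.metacafe.com/topics/dish/">Dish</a></p>';

$dom = new DOMDocument();
// loadHTML() may warn about imperfect markup; collect warnings instead of using @.
libxml_use_internal_errors(true);
$dom->loadHTML($description);
libxml_clear_errors();

// getElementsByTagName() returns a DOMNodeList, not a string,
// so take a node with item() and read its textContent.
$paragraphs = $dom->getElementsByTagName('p');
$summary = '';
if ($paragraphs->length > 0) {
    $text = $paragraphs->item(0)->textContent;
    // Drop everything from "Ranked" onwards, then trim surrounding whitespace.
    $pos = strpos($text, 'Ranked');
    $summary = trim($pos === false ? $text : substr($text, 0, $pos));
}

echo $summary, PHP_EOL;
```

With the sample above this should print the sentence before "Ranked", which is the part asked about in the question.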
This is the output:
[description] => SimpleXMLElement Object
(
[body] => SimpleXMLElement Object
(
[a] => SimpleXMLElement Object
(
[@attributes] => Array
(
[href] => http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/
)
[img] => SimpleXMLElement Object
(
[@attributes] => Array
(
[src] => http://s4.mcstatic.com/thumb/8000947/21507982/4/directors_cut/0/1/the_dish_with_doc_willoughby.jpg?v=8
[align] => right
[border] => 0
[alt] => THE Dish with Doc Willoughby
[vspace] => 4
[hspace] => 4
[width] => 134
[height] => 78
)
)
)
[p] => Array
(
[0] =>
Doc Willoughby, guru of "America's Test Kitchen," stopped by "CBS The Morning: Saturday" to share his ultimate dish with Rebecca Jarvis and Jeff Glor: Roast Beef Tenderloin with Dried Fruit and Nut Stuffing. Ranked / 5 | 2 views |
[1] => SimpleXMLElement Object
(
[a] => Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[href] => http://www.metacafe.com/watch/cb-YpE1z5IhjWrmCM62DSTU8jQ9X4IZryVR/the_dish_with_doc_willoughby/
)
[strong] => Click here to watch the video
)
[1] => CBS
[2] => Cbsepisode
[3] => Dish
[4] => Doc Willoughby
[5] => America's Test Kitchen
[6] => Roast Beef Tenderloin
[7] => Dried Fruit
[8] => Nut Stuffing
[9] => CBS This Morning
[10] => News & Events
)
[br] => Array
(
[0] => SimpleXMLElement Object
(
)
[1] => SimpleXMLElement Object
(
)
[2] => SimpleXMLElement Object
(
)
)
)
)
Is there a way to make the output "prettier", I mean better ordered? I also tried getElementsByTagName('p'), but without success.
Comments:
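One way to get a "prettier", ordered result is to skip SimpleXML entirely and build a small associative array with DOMXPath. Note that SimpleXML drops mixed content such as the <strong>4.00</strong> rating (the print_r output above shows "Ranked / 5"), whereas XPath can query it directly. This is a sketch under stated assumptions: tag links are assumed to contain "/topics/" and category links "/videos/" in their href, as in the sample, and $description is a shortened, hypothetical copy of $result->description.

```php
<?php
// Minimal sketch: turn the messy description into a small, ordered array.
// Assumption: tag links contain "/topics/" and category links "/videos/",
// as in the sample feed item; $description is a hypothetical, shortened copy.
$description = '<p>Doc Willoughby shares his dish. <br>Ranked <strong>4.00</strong> / 5 | 2 views</p>'
    . '<p>Submitted By: <a href="http://www.metacafe.com/channels/CBS/">CBS</a><br/>'
    . 'Tags: <a href="http://www.metacafe.com/topics/dish/">Dish</a> '
    . '<a href="http://www.metacafe.com/topics/doc_willoughby/">Doc Willoughby</a><br/>'
    . "Categories: <a href='http://www.metacafe.com/videos/news_and_events/'>News &amp; Events</a></p>";

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($description);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Collect the trimmed text of every node matched by an XPath query.
function linkTexts(DOMXPath $xpath, string $query): array
{
    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = trim($node->textContent);
    }
    return $out;
}

$first = $xpath->query('//p[1]')->item(0);
$info = [
    // Text of the first paragraph up to (not including) "Ranked".
    'summary'    => trim(strstr($first->textContent, 'Ranked', true)),
    // The <strong> inside the first paragraph holds the rating.
    'rating'     => $xpath->query('//p[1]/strong')->item(0)->textContent,
    'tags'       => linkTexts($xpath, '//a[contains(@href, "/topics/")]'),
    'categories' => linkTexts($xpath, '//a[contains(@href, "/videos/")]'),
];

print_r($info);
```

Matching on the href pattern rather than on position keeps the extraction working if the feed reorders the "Submitted By", "Tags" and "Categories" lines, but it will break if the URL scheme changes.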
-
Honestly, nothing... I'm trying some
preg_match right now -
Why not load the description with an HTML parser?
-
Hmm, good idea, Dor... I'll try it now
-
@DorShemer Do you know where I can read more about HTML parsing? Right now I'm reading docs.php.net/manual/en/domdocument.loadhtml.php
-
@w0rldart Looks like a good starting point
Tags: php preg-match filtering