【问题标题】:Python - lxml / Get full content of xpathPython - lxml / 获取xpath的全部内容
【发布时间】:2013-08-21 04:37:44
【问题描述】:

仅以 Twitter 为例,此代码从 Twitter 页面中抓取第 5 条推文。该页面包含一个链接,除非我尝试使用 lxml 和 xpath 将其拉上来,它只显示切断链接末尾的文本。

脚本:

import urllib2
from lxml import etree

xpathselector = "/html/body/div/div[2]/div/div[5]/div[2]/div/ol/li[5]/div/div/p"
url =  "https://twitter.com/memphismayfire"
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpathselector)

print result[0].text

打印:

'Miles Away' Acoustic 在 iTunes 上可用!谁下载了单曲?!让我们上单曲榜!!链接:

来自 xPath 位置的 HTML:

<p class="js-tweet-text tweet-text">'Miles Away' Acoustic is available on iTunes! Who's downloaded the single?! Let's get it up the Singles Chart!! Link: <a title="http://smarturl.it/mmf-miles-away" target="_blank" class="twitter-timeline-link" data-expanded-url="http://smarturl.it/mmf-miles-away" dir="ltr" rel="nofollow" href="http://t.co/fU2hVqAiSq" f52ae163cfcf0237f="true"><span class="tco-ellipsis"></span><span class="invisible">http://</span><span class="js-display-url">smarturl.it/mmf-miles-away</span><span class="invisible"></span><span class="tco-ellipsis"><span class="invisible">&nbsp;</span></span></a><div ida2bb72480="_p_mzkte2cwofawsu3r.t.co" style="cursor: pointer; width: 16px; height: 16px;display: inline-block;">&nbsp;</div></p>

打印 xpath 的全部内容而不仅仅是文本的最佳方法是什么?感谢您的帮助!

【问题讨论】:

标签: python xpath lxml


【解决方案1】:

使用lxml.etree.tostring:

print etree.tostring(result[0])

【讨论】:

    猜你喜欢
    • 2011-05-06
    • 1970-01-01
    • 2012-09-29
    • 2020-03-09
    • 1970-01-01
    • 2015-09-03
    • 1970-01-01
    • 2022-01-23
    • 2020-07-12
    相关资源
    最近更新 更多