Why am I not getting any data back from website? (Answers)

【Question title】: Why am I not getting any data back from website?
【Posted】: 2019-07-20 11:21:57
【Question description】:

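I'm brand new to web scraping. I've been working on a project that requires me to get the word of the day from the Merriam-Webster Word of the Day page (https://www.merriam-webster.com/word-of-the-day). I've managed to grab the word itself; now I just need the definition, but when I try to get it, this is the result: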

Avuncular (the correct word of the day)

Definition:

[]

Here is my code:

from lxml import html
import requests

page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = html.fromstring(page.content)

word = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[1]/div[2]/div[1]/div/h1/text()')

WOTD = str(word)
WOTD = WOTD[2:]
WOTD = WOTD[:-2]

print(WOTD.capitalize())


print("Definition:")

wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[2]/div[1]/div/div[1]/p[1]/text()')

print(wordDef)

The [] should be the first definition, but for some reason the XPath matches nothing.

Any help would be greatly appreciated.

【Question comments】:

Tags: python html xpath web-scraping lxml

【Answer 1】:

Your XPath is slightly off. Here is the correct one:

wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[3]/div[1]/div/div[1]/p[1]/text()')

Note the div[3] after main/article instead of div[2]. When you run it now, you should get:

Avuncular
Definition:
[' suggestive of an uncle especially in kindliness or geniality']
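
When an absolute XPath returns [], it can help to find out which step stopped matching. The following is only a minimal sketch of that idea: the helper name first_broken_step and the SAMPLE markup are made up for illustration and are not the real Merriam-Webster page, and the naive split on "/" assumes no predicate in the path contains a slash.

```python
from lxml import html

# Hypothetical miniature page (NOT the real site markup): the second inner
# div is empty, so a path that goes through div[2] finds no <p>.
SAMPLE = """
<html><body>
  <div>
    <div><h1>avuncular</h1></div>
    <div></div>
    <div><p>suggestive of an uncle</p></div>
  </div>
</body></html>
"""


def first_broken_step(tree, xpath):
    """Return the shortest prefix of an absolute XPath that matches nothing.

    Returns None when every prefix, including the full path, matches.
    Assumes a simple path whose predicates contain no '/' characters.
    """
    steps = xpath.strip("/").split("/")
    for i in range(1, len(steps) + 1):
        prefix = "/" + "/".join(steps[:i])
        if not tree.xpath(prefix):
            return prefix
    return None


tree = html.fromstring(SAMPLE)

BAD = "/html/body/div[1]/div[2]/p[1]/text()"
GOOD = "/html/body/div[1]/div[3]/p[1]/text()"

print(first_broken_step(tree, BAD))   # the step where matching stops
print(first_broken_step(tree, GOOD))  # None: the whole path matches
print(tree.xpath(GOOD))
```

Running the same helper against the tree built from the live page would show whether the path breaks at the div index or somewhere earlier.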

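【Comments】:

Thanks! I copied the XPath using Chrome. I must have copied it wrong.

【Answer 2】:

If you want to avoid hard-coding indices in your XPath, you can use the following as an alternative to your current attempt: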

import requests
from lxml.html import fromstring

page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = fromstring(page.text)
word = tree.xpath("//*[@class='word-header']//h1")[0].text
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p/strong")[0].tail.strip()
print(f'{word}\n{wordDef}')

If wordDef fails to capture the full definition text, try replacing that line with:

wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p")[0].text_content()

Output:

avuncular
suggestive of an uncle especially in kindliness or geniality
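
Indexing with [0] raises IndexError whenever a selector matches nothing, for example after the page layout changes. Below is a rough sketch of a more forgiving variant of the same selectors. The first_match helper and the SAMPLE markup are assumptions for illustration only; the real page may be structured differently.

```python
from lxml.html import fromstring

# Hypothetical markup that mirrors the structure the selectors above expect;
# it is NOT copied from the real page.
SAMPLE = """
<html><body>
  <div class="word-header"><h1>avuncular</h1></div>
  <div>
    <h2>Definition</h2>
    <p><strong>1 :</strong> suggestive of an uncle</p>
  </div>
</body></html>
"""


def first_match(tree, xpath):
    """Return the first node matched by xpath, or None if nothing matches."""
    matches = tree.xpath(xpath)
    return matches[0] if matches else None


tree = fromstring(SAMPLE)

heading = first_match(tree, "//*[@class='word-header']//h1")
word = heading.text if heading is not None else None

# .tail is the text that follows </strong>; text_content() is the whole <p>.
strong = first_match(tree, "//h2[contains(.,'Definition')]/following-sibling::p/strong")
tail_only = strong.tail.strip() if strong is not None else None

para = first_match(tree, "//h2[contains(.,'Definition')]/following-sibling::p")
full_text = para.text_content().strip() if para is not None else None

# A selector with no match yields None instead of raising IndexError.
missing = first_match(tree, "//h2[contains(.,'Etymology')]")

print(word)
print(tail_only)
print(full_text)
print(missing)
```

The difference between tail_only and full_text is the reason for the text_content() fallback above: .tail only covers the text immediately after the strong element, while text_content() joins all text inside the paragraph.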

【Comments】: