Selector.xpath().get() returns all elements after xpath target

【Question Title】: Selector.xpath().get() returns all elements after xpath target
【Posted】: 2022-01-18 02:18:01
【Question Description】:

This is the HTML I am extracting from:

from scrapy import Selector
import requests
import pandas as pd

html = '''
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

</body>
</html>
'''

I then build a Scrapy selector from it:

sel = Selector( text = html )

Then I select a single element with XPath, but the result also includes everything that follows the target element:

in:
sel.xpath('/html/body/h1').get()
out:
'<h1>My First Heading</h1>\n\n<p>My first paragraph.</p>\n\n</body>\n</html>\n'

I expected it to return:

'<h1>My First Heading</h1>'

【Question Discussion】:

Tags: python web-scraping xpath scrapy selector

【Solution 1】:

The XPath you are using is correct. Your expression returns the expected result for me. Try the alternative below.

>>> from scrapy.selector import Selector
>>> sel = Selector(text=html)
>>> sel.xpath("//h1").get()
'<h1>My First Heading</h1>'

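【Discussion】:

It still returns everything after the target. I edited my post to show the packages I imported. Could that be affecting it?
The Selector object is imported from the scrapy.selector module. I have edited the answer to show how I imported it. Which version of Scrapy are you using?
I'm using Scrapy 2.5.1. I tried your import, but nothing changed. I may try running it in a Python environment other than Jupyter.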
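
Since the same expression gives the expected output in one environment and not in the other, comparing the installed package versions in both is a reasonable next step. The sketch below is only a diagnostic aid, not a confirmed fix. It assumes the packages worth comparing are scrapy, parsel, and lxml (Scrapy's selectors are built on parsel, which in turn uses lxml). The helper name `report_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("scrapy", "parsel", "lxml")):
    """Return a dict mapping each package name to its installed
    version string, or None if the package is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is absent from this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this in both environments (e.g. Jupyter and a plain shell)
    # and compare the printed versions.
    for name, ver in report_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

If the versions differ between the environment that works and the one that does not, that difference is the first thing to investigate.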