Scrapy can't request text with either CSS or XPath

【Question title】: Scrapy can't manage to request text with either CSS or XPath
【Posted】: 2019-05-29 02:49:00
【Question】:

I have been trying to extract some text, and although most of it works, there is one thing I can't get.

Go to this page: https://duproprio.com/fr/montreal/pierrefonds-roxboro/condo-a-vendre/hab-305-5221-rue-riviera-854000

I want to get the text from the nodes with class=listing-main-characteristics__number (the box below the picture that reads "2 chambres 1 salle de bain Aire habitable (s-sol exclu) 1,030 pi² (95,69 m²)"). The page has 3 elements with that class ("2", "1" and "1,030 pi² (95,69 m²)"). I have tried a bunch of options in XPath and CSS, but none of them worked, and some gave odd results.

For example, with:

response.xpath('//span[@class="listing-main-characteristics__number"]').getall()

I get:

['<span class="listing-main-characteristics__number">\n 2\n </span>', '<span class="listing-main-characteristics__number">\n 1\n </span>']

By contrast, other queries on the same page work fine, for example:

response.xpath('//div[@property="description"]/p/text()').getall()

If I get all the spans with this query:

response.css('span::text').getall()

I can find the text I mentioned at the beginning. But with this:

response.css('span[class=listing-main-characteristics__number]::text').getall()

I only get this:

['\n                        2\n                    ', '\n                        1\n                    ']

Can someone tell me what kind of selector I need? Many thanks!

【Comments】:

Tags: css xpath web-scraping scrapy

【Solution 1】:

This is the XPath you need to use.

//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]

You can use the XPath above as follows (append /text() if you want the associated text).

response.xpath("//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]").getall()
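Text extracted this way comes back padded with newlines and indentation, as the output in the question shows. A minimal sketch of tidying such a list follows; `clean_texts` is a hypothetical helper name, and the sample strings are the ones quoted in the question rather than a fresh scrape.

```python
def clean_texts(raw_texts):
    """Collapse runs of whitespace and drop strings that end up empty."""
    cleaned = (" ".join(text.split()) for text in raw_texts)
    return [text for text in cleaned if text]


# Sample values copied from the question's output
raw = [
    "\n                        2\n                    ",
    "\n                        1\n                    ",
]
print(clean_texts(raw))  # ['2', '1']
```

In a spider, the list returned by getall() can be passed straight to such a helper. An alternative that stays inside XPath is to call sel.xpath('normalize-space(.)').get() on each selected node, which also gathers text from nested elements.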

Below is sample Python code (using Selenium):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes Chrome and a matching driver are installed
url = "https://duproprio.com/fr/montreal/pierrefonds-roxboro/condo-a-vendre/hab-305-5221-rue-riviera-854000#description"
driver.get(url)
# Collect the matching elements, then read the text of each one
# (find_elements_by_xpath was removed in Selenium 4; find_elements replaces it)
outputs = driver.find_elements(By.XPATH, "//div[@data-label='#description']//div[@class='listing-main-characteristics__label']|//div[@data-label='#description']//div[@class='listing-main-characteristics__item-dimensions']/span[2]")
for output in outputs:
    # Replace newline characters with spaces and trim the text
    print(output.text.replace("\n", " ").strip())
driver.quit()

Output:

2 chambres

1 salle de bain

1,030 pi² (95,69 m²)
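One plausible reason the selectors in the question return only two of the three values is that the third element carries more than one class, so an exact comparison such as @class="..." or span[class=...] does not match it. This is an assumption, not something verified against the page. The sketch below uses a hypothetical markup fragment and the standard library to contrast exact attribute matching with matching on a single class token.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the extra class on the third span is an assumption,
# not copied from the real page.
HTML = """
<div>
  <span class="listing-main-characteristics__number"> 2 </span>
  <span class="listing-main-characteristics__number"> 1 </span>
  <span class="listing-main-characteristics__number listing-main-characteristics__number--dimensions"> 1,030 pi² (95,69 m²) </span>
</div>
""".strip()

TARGET = "listing-main-characteristics__number"
root = ET.fromstring(HTML)

# Exact comparison of the whole class attribute, like @class="..." in XPath
exact = [el.text.strip() for el in root.findall(f".//span[@class='{TARGET}']")]

# Comparison against individual class tokens, like a CSS class selector
by_token = [
    el.text.strip()
    for el in root.iter("span")
    if TARGET in el.get("class", "").split()
]

print(exact)     # two values
print(by_token)  # three values
```

If that assumption holds for the real page, the Scrapy equivalent of the token match would be response.css('span.listing-main-characteristics__number::text').getall(), since a CSS class selector matches any element whose class list contains that name.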


【Discussion】: