How to extract items inside a table using Scrapy: answers

【Question title】: How to extract items inside a table using scrapy
【Posted】: 2017-10-23 15:27:35
【Question】:

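I want to extract all of the functions listed in the table at the following link: python functions list

I tried using the Chrome developer console to work out the exact XPath to use in my spider.py file, as follows: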

$x('//*[@id="built-in-functions"]/table[1]/tbody//a/@href')

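But this returns a list of all the hrefs (which, I think, is exactly what the XPath expression refers to).

I believe I need to extract the text from here, but appending /text() to the XPath above returns nothing. Can someone help me extract the function names from the table?

【Comments on the question】: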

$x('//*[@id="built-in-functions"]//a').forEach(elt => { console.log(elt.href);} )

【Solution 1】:

I think this should do the trick:

response.css('.docutils .reference .pre::text').extract()

A non-exact XPath equivalent (which also works in this case) would be:

response.xpath('//table[contains(@class, "docutils")]//*[contains(@class, "reference")]//*[contains(@class, "pre")]/text()').extract()

【Solution 2】:

Try this:

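# Walk every cell of the functions table and print the name it contains
# (extract_first() returns None for cells without a span.pre element).
for td in response.css("#built-in-functions > table:nth-child(4) td"):
    print(td.css("span.pre::text").extract_first())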