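【Posted】:2021-07-07 04:56:07
【Question】:
I am trying to extract data from a table that lists active bids: https://purchasing.alabama.gov/active-statewide-contracts/. I am new to Scrapy and a bit confused about why I get no output. Also, how can I download the files in the table? So far I have the following code: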
import scrapy


class AlabamaSpider(scrapy.Spider):
    name = 'alabama'
    allowed_domains = ['purchasing.alabama.gov']
    start_urls = ['https://purchasing.alabama.gov/active-statewide-contracts/']

    def start_requests(self):
        urls = ['https://purchasing.alabama.gov/active-statewide-contracts/']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for row in response.xpath('//*[@class="table table-bordered table-responsive-sm dataTable no-footer"]//tbody//tr'):
            yield {
                'Description': row.xpath('td[@class="col-sm-5 sorting_asc"]//text()').extract_first(),
                'T-NBR': row.xpath('td[@class="col-sm-1 sorting"]/a/text()').extract_first(),
                'Begin Date': row.xpath('td[@class="col-sm-1 sorting"]//text()').extract_first(),
                'End Date': row.xpath('td[@class="col-sm-1 sorting"]//text()').extract_first(),
                'Buyer Name': row.xpath('td[@class="col-sm-3 sorting"]/a/text()').extract_first(),
                'Vendor Websites': row.xpath('td[@class="col-sm-1 sorting"]/a/text()').extract_first(),
            }
Any help with this would be greatly appreciated!
Thanks!
【Comments】:
-
Please check the solution and let me know if you run into any problems.
-
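Thank you very much, @Shivam, for the tips and the solution! One last thing: any pointers on how to download the files under the "T-NBR" column (they are all PDF files)? Thanks again!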
-
I think this answer can help you: stackoverflow.com/questions/57245315/…
Tags: python web-scraping xpath scrapy