I want to extract the CSV file from a webpage using Python (web scraping): answers

【Question title】: I want to extract the CSV file from the webpage using Python. WEBSCRAPING
【Posted】: 2020-07-21 14:34:52
【Question description】:

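I want to get the .csv file or the .xlsx file from this webpage. I considered web scraping with BeautifulSoup, but that seems inefficient. I would like to write a function that, when called on this webpage, finds the link to the CSV file and returns the CSV file to me.

That way I can run my analysis on the CSV file.

Could someone please help me with this?

Here is the link: https://data.london.gov.uk/dataset/recorded_crime_rates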

【Comments on the question】:

Can you search the page for the element with aria-label="Download crime rates.csv" and get its href value? Then prepend 'data.london.gov.uk' to that address. You will then get 'data.london.gov.uk//download/recorded_crime_rates/…'
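The approach suggested in this comment could be sketched with the standard library's html.parser, so BeautifulSoup is not required. This is only a sketch: the names CsvLinkFinder and find_csv_url and the sample HTML fragment are hypothetical, the aria-label value is taken from the comment, and the real page's markup may differ or may have changed since the question was posted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CsvLinkFinder(HTMLParser):
    """Collect href values of elements whose aria-label equals a target label."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Assumption: the download element carries both aria-label and href.
        if attr_map.get("aria-label") == self.label and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def find_csv_url(html, base_url, label="Download crime rates.csv"):
    """Return the absolute URL of the first matching link, or None if absent."""
    finder = CsvLinkFinder(label)
    finder.feed(html)
    if not finder.hrefs:
        return None
    # urljoin resolves a relative href against the page URL.
    return urljoin(base_url, finder.hrefs[0])


# Made-up HTML fragment for illustration only; the path is hypothetical.
sample_html = (
    '<a aria-label="Download crime rates.csv" '
    'href="/download/example/crime%20rates.csv">CSV</a>'
)
page_url = "https://data.london.gov.uk/dataset/recorded_crime_rates"
print(find_csv_url(sample_html, page_url))
```

Using urljoin instead of string concatenation avoids the doubled slash that appears when a host name is prepended by hand to a path that already starts with "/".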

Tags: python csv xlsx

【Solution 1】:

Use the urllib library to fetch the page's source code.

This seems to work:

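import urllib.request

url = 'https://data.london.gov.uk/dataset/recorded_crime_rates'
csvfile = r"C:\Tmp\CrimeRates.csv"

# open the main page and decode the HTML (assumes UTF-8)
with urllib.request.urlopen(url) as response:
    wc = response.read().decode("utf-8", errors="replace")

# locate the CSV download path in the page source
name = "crime%20rates.csv"
i = wc.find(name)
if i == -1:
    raise ValueError("CSV file name not found in page source")
i2 = wc.find("/download/recorded_crime_rates", max(i - 200, 0))
if i2 == -1 or i2 > i:
    raise ValueError("CSV download path not found in page source")
csvURL = "https://data.london.gov.uk" + wc[i2:i + len(name)]
print(csvURL)

# download the CSV as raw bytes (str() on bytes would yield "b'...'" text)
with urllib.request.urlopen(csvURL) as csvresp:
    csvdata = csvresp.read()
print(len(csvdata), "bytes")

# save the CSV to a file, normalizing line endings to \n
print("Saving To", csvfile)
with open(csvfile, "wb") as f:
    f.write(csvdata.replace(b"\r\n", b"\n"))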
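For the analysis step mentioned in the question, the downloaded bytes could also be parsed in memory rather than read back from the saved file. A minimal sketch, assuming the response body is UTF-8 CSV with a header row; the helper name parse_csv_bytes and the sample rows are hypothetical and do not reflect the real dataset's columns.

```python
import csv
import io


def parse_csv_bytes(data, encoding="utf-8-sig"):
    """Decode raw CSV bytes and return the rows as a list of dicts."""
    # utf-8-sig also strips a leading byte order mark if one is present.
    text = data.decode(encoding)
    # newline="" leaves \r\n handling to the csv module.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


# Made-up sample bytes for illustration; the real file's columns will differ.
sample = b"Borough,Year,Rate\r\nA,2019,1.5\r\nB,2019,2.5\r\n"
rows = parse_csv_bytes(sample)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

All values come back as strings, so numeric columns need an explicit conversion such as float(row["Rate"]) before any calculation.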

【Comments】:

Thank you! Legend
Please accept this answer so that the post is removed from the "unanswered" list. Thanks.