How to loop through a csv file of links to scrape a website using BeautifulSoup and requests and not get requests.exceptions.InvalidSchema? Answers

【Question title】: How to loop through a csv file of links to scrape a website using BeautifulSoup and requests and not get requests.exceptions.InvalidSchema?
【Posted】: 2020-05-02 07:56:00
【Question description】:

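I am very new to coding in general, and I appreciate any support from the community!

What I want to do: I have a csv file containing links to various products. I want to fetch the titles of those products and write them back to the same csv file or another one (it doesn't matter which). To do that, I import the csv file (this works), write each row into a list (this works too), and then take each value in the list to extract the product title.

My problem: Scraping a single link works, so I guess the problem lies in the combination of list, loop, and request. When I run the following code, I get the error requests.exceptions.InvalidSchema.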

from bs4 import BeautifulSoup
import requests
import csv

f = open('three_links.csv', 'r')
reader = csv.reader(f)
links = []

for row in reader:
    links.append(row)

for link in links:
    response = requests.get(link)
    soup = BeautifulSoup(response.text, 'html.parser')
    title = print(soup.find(class_='sidebar-product-name').text.replace("\n","").replace(" ",""))

Thanks a lot in advance!

【Question discussion】:

Can you show how your csv file is written?

Tags: python csv python-requests

【Solution 1】:

The problem is that each row read from the csv file is a list, not a string, so

response = requests.get(link)

is actually something like

response = requests.get(['https://www.example.com', 'something', 'something else'])
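To see this for yourself, here is a minimal sketch that feeds csv.reader some in-memory text instead of the real file. The two URLs are made-up stand-ins for the contents of three_links.csv, which the question does not show.

```python
import csv
import io

# Hypothetical file contents: one URL per line (the real three_links.csv is not shown).
sample = io.StringIO("https://www.example.com/a\nhttps://www.example.com/b\n")

rows = list(csv.reader(sample))

# Each row is a list of column values, even when the file has only one column.
first_row = rows[0]
print(type(first_row).__name__, first_row)

# The URL itself is the first element of that list.
print(type(first_row[0]).__name__, first_row[0])
```

Passing first_row to requests.get hands it a list, while first_row[0] hands it the URL string it expects.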

To fix it, pass only the URL string (assuming the links are in the first column of the csv):

response = requests.get(link[0])
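Put together, the whole script might look like the sketch below. It assumes the links sit in the first column and that every product page uses the sidebar-product-name class from the question. The helper names read_links and scrape_titles, the blank-row check, the timeout, and the None fallback are illustrative additions, not part of the original answer.

```python
import csv


def read_links(csv_file):
    """Return the first column of every non-empty row as a list of URL strings."""
    links = []
    for row in csv.reader(csv_file):
        # csv.reader yields [] for blank lines, so skip those and empty cells.
        if row and row[0].strip():
            links.append(row[0].strip())
    return links


def scrape_titles(path):
    """Fetch each linked page and collect its product title (None if not found)."""
    # Third-party imports are kept local so read_links can be used without them.
    import requests
    from bs4 import BeautifulSoup

    with open(path, newline="") as f:
        links = read_links(f)

    titles = []
    for link in links:
        response = requests.get(link, timeout=10)  # link is a str here, not a list
        soup = BeautifulSoup(response.text, "html.parser")
        tag = soup.find(class_="sidebar-product-name")
        if tag is None:
            # Assumption: some pages may lack the element; record None instead of crashing.
            titles.append(None)
        else:
            # Same cleanup as in the question: drop newlines and spaces.
            titles.append(tag.text.replace("\n", "").replace(" ", ""))
    return titles


if __name__ == "__main__":
    for title in scrape_titles("three_links.csv"):
        print(title)
```

Note that the question's line title = print(...) stores None, because print returns nothing; collecting the cleaned text in a list first makes it possible to write the titles to a csv file afterwards.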

【Discussion】: