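【Posted】:2022-01-06 20:55:48
【Question】:
1 - When I check the csv file, I can only find the data from the last link (Tugende). But when I print the data, I get everything I want. How can I get all the data into the csv file?
2 - For the 'source' variable, how can I extract only the article links from it and add them to the csv file?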
import requests
from bs4 import BeautifulSoup as bs
import csv

url = "https://digestafrica.com/companies/{}"
startups = ['OBM-Education', 'Crafty-Workshop', 'Planet42', 'Paylend', 'Tugende']
for startup in startups:
    # Download and parse one company page
    u = url.format(startup)
    html_text = requests.get(u).text
    soup = bs(html_text, 'lxml')
    list1 = soup.find_all('div', class_='d-flex flex-wrap content mt-24 border p-2 border-dark')
    source1 = soup.find_all('div', class_='col-md-2 mt-3 mt-lg-0')
    # Open funding.csv in write mode and write the header row
    file = open('funding.csv', 'w', newline='')
    writer = csv.writer(file)
    mama = (['Name', 'Type', 'date', 'amount', 'investors'])
    writer.writerow(mama)
    # One row per funding-round block on the page
    for L in list1:
        name1 = L.find('span', class_="line-height-1").text
        amount1 = L.find('div', class_='p-0').text.replace('Amount', '').strip()
        date1 = L.find('span', class_="pt-0").text
        funding_type1 = L.find('div', class_="col-md-2 mt-2 mt-lg-0").text.replace('Funding Round', '')
        investor1 = L.find('div', class_='col-md-3 mt-3 mt-lg-0').text.replace('investors', '')
        source = L.find('div', class_="col-md-2 mt-3 mt-lg-0")
        print(name1, funding_type1, date1, amount1, investor1)
        writer.writerow([name1, funding_type1, date1, amount1, investor1])
    file.close()
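Regarding point 1: assuming the `open('funding.csv', 'w', newline='')` call sits inside the `for startup in startups:` loop (the indentation was lost in the post, but the symptom points that way), every iteration reopens the file in `'w'` mode, which truncates it, so only the last company (Tugende) survives. `print` shows every row because it runs on each iteration regardless. A minimal sketch of one fix, assuming the selectors from the question still match the page: open the file once, write the header once, then append each startup's rows. The function names `scrape_rows` and `write_funding_csv` are hypothetical; `requests` and `bs4` are imported inside `scrape_rows` only so the CSV part can be tried without them.

```python
import csv

# Column names taken from the original script
HEADER = ['Name', 'Type', 'date', 'amount', 'investors']


def scrape_rows(startup):
    """Fetch one company page and yield one row per funding round (hypothetical helper)."""
    # Imported here so write_funding_csv can be exercised without network libraries
    import requests
    from bs4 import BeautifulSoup as bs

    html_text = requests.get(f"https://digestafrica.com/companies/{startup}").text
    soup = bs(html_text, 'lxml')
    # Selectors copied from the question; they are assumed to still match the page
    for card in soup.find_all('div', class_='d-flex flex-wrap content mt-24 border p-2 border-dark'):
        name = card.find('span', class_='line-height-1').text
        amount = card.find('div', class_='p-0').text.replace('Amount', '').strip()
        date = card.find('span', class_='pt-0').text
        funding_type = card.find('div', class_='col-md-2 mt-2 mt-lg-0').text.replace('Funding Round', '')
        investors = card.find('div', class_='col-md-3 mt-3 mt-lg-0').text.replace('investors', '')
        yield [name, funding_type, date, amount, investors]


def write_funding_csv(path, startups, fetch_rows=scrape_rows):
    """Open the CSV once, write the header once, then append every startup's rows."""
    with open(path, 'w', newline='') as f:  # file is truncated only once, here
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for startup in startups:
            writer.writerows(fetch_rows(startup))
```

Usage: `write_funding_csv('funding.csv', ['OBM-Education', 'Crafty-Workshop', 'Planet42', 'Paylend', 'Tugende'])`. Switching to append mode (`'a'`) would also stop the overwriting, but it would repeat the header for every startup and keep rows from earlier runs, so opening the file once in a `with` block is the simpler choice.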
【Discussion】:
Tags: python csv web-scraping scrape write
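Regarding point 2: `source` is a bs4 `Tag` (or `None` if `L.find(...)` finds no such div), so the links can be read from the `<a>` elements inside it. This sketch assumes the article links are ordinary `<a href=...>` anchors inside that div, which has not been checked against the live page; the helper name `article_links` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def article_links(source):
    """Return the href of every <a> inside the 'source' div, or [] if the div is missing."""
    if source is None:  # L.find(...) returns None when the div is absent
        return []
    # href=True skips <a> tags that have no href attribute
    return [a['href'] for a in source.find_all('a', href=True)]


# Offline check with made-up markup standing in for one 'source' div
sample = BeautifulSoup(
    '<div class="col-md-2 mt-3 mt-lg-0">'
    '<a href="https://example.com/article-1">Source</a><a>no link</a></div>',
    'html.parser',
)
links = article_links(sample.div)
print(links)  # ['https://example.com/article-1']
```

Inside the `for L in list1:` loop, one way to use it is `links = ' '.join(article_links(source))`, then add `'source'` to the header list and `links` to the row passed to `writer.writerow`.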