【Question Title】: How to write data scraped with BeautifulSoup4 to a CSV file
【Posted】: 2020-12-23 00:57:52
【Question】:
import requests
from bs4 import BeautifulSoup
import csv

# fetch questions from Stack Overflow
htp='https://stackoverflow.com/questions?tab=newest&page=2'

response=requests.get(htp).text
soup=BeautifulSoup(response,"html.parser")

# each question has the class question-summary
question=soup.select(".question-summary")

# open csv file
with open("new.csv","w") as file:
    write=csv.writer(file)
    write.writerow(["heading","summary","votes"])

    for ques in question:
        print(ques.select_one(".question-hyperlink").getText())
        print(ques.select_one(".excerpt").getText())
        print(ques.select_one(".vote-count-post").getText())

        # problem area: what should be passed to write.writerow(...) here?


    

【Comments】:

  • Andrej Kesely, can you help me?
  • Yes, I have posted an answer.

Tags: python-3.x web-scraping beautifulsoup export-to-csv


【Solution 1】:

To save the data to a CSV file, you can use the following example:

import csv
import requests
from bs4 import BeautifulSoup

# Fetch page 2 of the newest questions on Stack Overflow
htp='https://stackoverflow.com/questions?tab=newest&page=2'

response=requests.get(htp).text
soup=BeautifulSoup(response, "html.parser")

# Each question sits in an element with the class "question-summary"
question=soup.select(".question-summary")

# Open the CSV file; newline="" lets the csv module control line endings
with open("new.csv", "w", newline="", encoding="utf-8") as file:
    write=csv.writer(file)
    write.writerow(["heading","summary","votes"])

    for ques in question:
        heading = ques.select_one(".question-hyperlink").getText()
        summary = ques.select_one(".excerpt").getText()
        votes = ques.select_one(".vote-count-post").getText()

        print(heading)
        print(summary)
        print(votes)
        
        write.writerow([heading, summary, votes])

This creates new.csv (shown in the original answer as a LibreOffice screenshot).
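
Scraped excerpts often contain commas, quotes, or line breaks, so it can help to see how csv.writer copes with them. The following is a minimal sketch using only the standard library. The sample rows and the helper name write_rows are made up for illustration and are not part of the answer above. An in-memory buffer stands in for the file so the result can be read straight back.

```python
import csv
import io

# Hypothetical sample rows standing in for scraped (heading, summary, votes) values.
ROWS = [
    ("How to parse HTML?", 'Uses commas, "quotes"\nand a line break', "3"),
    ("Plain title", "Plain summary", "0"),
]


def write_rows(stream, rows):
    """Write a header plus the given rows to an already opened text stream."""
    writer = csv.writer(stream)
    writer.writerow(["heading", "summary", "votes"])
    writer.writerows(rows)


# io.StringIO(newline="") mimics open(path, "w", newline=""): the csv module
# then controls line endings itself, which avoids blank rows on Windows.
buffer = io.StringIO(newline="")
write_rows(buffer, ROWS)

# Reading the text back shows that the awkward field survives the round trip.
buffer.seek(0)
parsed = list(csv.reader(buffer))
print(parsed[1][1])
```

With a real file, the same helper could be called as write_rows(file, rows) inside a with open("new.csv", "w", newline="", encoding="utf-8") as file: block.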

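【Comments】:

  • Thanks... Which one should I use for scraping: BeautifulSoup, Selenium, or Scrapy?
  • @learner It depends. Selenium drives a real browser, so it can execute JavaScript (maybe that is what you want). BeautifulSoup cannot, but it is faster and uses fewer resources. I have no experience with Scrapy...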
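
To illustrate the point about JavaScript in the comment above, here is a small sketch on a made-up HTML snippet (not taken from Stack Overflow). It assumes bs4 is installed. The div is empty in the source and would only be filled in by the script inside a real browser, so BeautifulSoup sees it as empty.

```python
from bs4 import BeautifulSoup

# Made-up page: the <div> is empty in the HTML source and would only be
# filled in by the script when a real browser runs it.
HTML = """
<html><body>
  <div id="out"></div>
  <script>document.getElementById("out").textContent = "loaded by JS";</script>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# BeautifulSoup only parses the markup; the script is never executed.
rendered_text = soup.select_one("#out").get_text()
script_source = soup.select_one("script").string

print(repr(rendered_text))              # text of the div as parsed
print("loaded by JS" in script_source)  # the script is visible only as source text
```

If the data you need only appears after scripts have run, a browser-driven tool such as Selenium is probably the better fit. For content that is already present in the HTML source, BeautifulSoup is usually enough.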