【Posted】: 2018-12-05 01:31:17
【Question】:
I'm new to Python and I'm trying to write scraped data to a CSV file, without success.
Here is the code:
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import os
import random
import re
from itertools import cycle

def cleanhtml(raw_html):
    cleanr = re.compile('<.*?>')  # cleaning the strings from these terms
    cleantext = re.sub(cleanr, '', raw_html)
    return cleantext

def scrape(url, filename, number_id):
    """
    This function scrapes a web page looking for text inside its html structure and saves it in .txt file.
    So it works only for static content, if you need text in a dynamic part of the web page (e.g. a banner)
    look at the other file. Pay attention that the retrieved text must be filtered out
    in order to keep only the part you need.
    url: url to scrape
    filename: name of file where to store text
    number_id: it is appended to the filename, to distinguish different filenames
    """
    # here there is a list of possible user agents
    user_agent = random.choice(user_agent_list)
    req = Request(url, headers={'User-Agent': user_agent})
    page = urlopen(req).read()
    # parse the html using beautiful soup and store in variable 'soup'
    soup = BeautifulSoup(page, "html.parser")
    row = soup.find_all(class_="row")
    for element in row:
        viaggio = element.find_all(class_="nowrap")
        Partenza = viaggio[0]
        Ritorno = viaggio[1]
        Viaggiatori = viaggio[2]
        Costo = viaggio[3]
        Title = element.find(class_="taglist bold")
        Content = element.find("p")
        Destination = Title.text
        Review = Content.text
        Departure = Partenza.text
        Arrival = Ritorno.text
        Travellers = Viaggiatori.text
        Cost = Costo.text
        TuristiPerCasoList = [Destination, Review, Departure, Arrival, Travellers, Cost]
        print(TuristiPerCasoList)
Up to this point everything works. Now I need to turn it into a CSV file. I tried this:
import csv
with open('turistipercaso','w') as file:
    writer = csv.writer(file)
    writer.writerows(TuristiPerCasoList)
But nothing ends up in the CSV file. Can someone help me understand how to write this to a CSV file?
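
A minimal sketch of one way to write such rows, assuming each scraped entry is collected as one list of six strings. `csv.writer.writerows` expects an iterable of rows, so passing a single flat list of strings makes every string its own row, split character by character. Since `TuristiPerCasoList` is reassigned on every loop iteration, only the last entry would survive the loop anyway. The sample rows, the column names, and the helper names `rows_to_csv_text` and `save_csv` are hypothetical. This may not be the only reason the file comes out empty, because the post doesn't show where the `with` block sits relative to `scrape`.

```python
import csv
import io

# Column names are an assumption, taken from the variable names in the post.
HEADER = ["Destination", "Review", "Departure", "Arrival", "Travellers", "Cost"]

# Hypothetical sample data standing in for the scraped values:
# one inner list per scraped entry.
sample_rows = [
    ["Roma", "Great trip, would go again", "01/05/2018", "10/05/2018", "2", "500"],
    ["Lisbona", "Lovely city", "03/06/2018", "09/06/2018", "4", "1200"],
]


def write_rows(rows, file_obj):
    """Write the header and then one CSV line per inner list."""
    writer = csv.writer(file_obj)
    writer.writerow(HEADER)   # a single row
    writer.writerows(rows)    # a list of rows (list of lists)


def rows_to_csv_text(rows):
    """Return the CSV content as a string, handy for checking the output."""
    buffer = io.StringIO()
    write_rows(rows, buffer)
    return buffer.getvalue()


def save_csv(rows, path):
    """Write the rows to a file on disk."""
    # newline="" is what the csv module documentation recommends for files,
    # so that no extra blank lines appear on Windows.
    with open(path, "w", newline="", encoding="utf-8") as file_obj:
        write_rows(rows, file_obj)


if __name__ == "__main__":
    print(rows_to_csv_text(sample_rows))
```

In `scrape`, this would amount to creating an empty list before the `for` loop, appending each `[Destination, Review, Departure, Arrival, Travellers, Cost]` list to it inside the loop, and calling something like `save_csv(rows, "turistipercaso.csv")` once the loop has finished.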
【Comments】:
- Is the last printed TuristiPerCasoList empty?
Tags: python web-scraping beautifulsoup export-to-csv