【Question title】: From scraping to a CSV file
【Posted】: 2018-12-05 01:31:17
【Question description】:

I am new to Python, and I am trying to write scraped data to a CSV file, without success.

Here is the code:

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import os
import random
import re
from itertools import cycle

def cleanhtml(raw_html):
  cleanr = re.compile('<.*?>') #cleaning the strings from these terms
  cleantext = re.sub(cleanr, '', raw_html)
  return cleantext

def scrape(url, filename, number_id):
    """
    This function scrapes a web page looking for text inside its html structure and saves it in .txt file. 
    So it works only for static content, if you need text in a dynamic part of the web page (e.g. a banner) 
    look at the other file. Pay attention that the retrieved text must be filtered out 
    in order to keep only the part you need. 

    url: url to scrape
    filename: name of file where to store text
    number_id: it is appended to the filename, to distinguish different filenames
    """
    # user_agent_list, a list of possible user agents, is defined here (omitted in the post)

    user_agent = random.choice(user_agent_list)
    req = Request(url, headers={'User-Agent': user_agent})
    page = urlopen(req).read()

    # parse the html using beautiful soup and store in variable 'soup'
    soup = BeautifulSoup(page, "html.parser")

    row = soup.find_all(class_="row")

    for element in row:
        viaggio = element.find_all(class_="nowrap")

        Partenza = viaggio[0]
        Ritorno = viaggio[1]
        Viaggiatori = viaggio[2]
        Costo = viaggio[3]

        Title = element.find(class_="taglist bold")
        Content = element.find("p")



        Destination = Title.text
        Review = Content.text
        Departure = Partenza.text
        Arrival = Ritorno.text
        Travellers = Viaggiatori.text
        Cost = Costo.text


        TuristiPerCasoList = [Destination, Review, Departure, Arrival, Travellers, Cost] 
        print(TuristiPerCasoList)

Up to this point everything works. Now I have to turn it into a CSV file. I tried this:

    import csv

    with open('turistipercaso','w') as file:
        writer = csv.writer(file)
        writer.writerows(TuristiPerCasoList)

But the resulting CSV file contains nothing. Can someone help me understand how to turn this into a CSV file?

【Question comments】:

  • Is the last print of TuristiPerCasoList empty?

Tags: python web-scraping beautifulsoup export-to-csv


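【Solution 1】:

On every iteration you reassign TuristiPerCasoList, so after the loop it holds only the values of the last row.
What you actually want is a list of lists of strings: each string is the value of one cell, each inner list holds the values of one row, and the outer list holds all the rows.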
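
To make that shape concrete, here is a small sketch of how `csv.writer.writerows` treats a flat list of strings compared with a list of lists. The helper `rows_to_csv_text` and the sample values are invented for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

def rows_to_csv_text(rows):
    """Write rows with csv.writer and return the resulting CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sample values, not real scraped data.
flat = ["Rome", "Paris"]
nested = [["Rome", "Nice trip"], ["Paris", "Too crowded"]]

# Flat list of strings: every string is taken as a row,
# so each character ends up in its own cell.
flat_text = rows_to_csv_text(flat)

# List of lists: every inner list becomes one row of cells.
nested_text = rows_to_csv_text(nested)

print(flat_text)
print(nested_text)
```

The flat list is split character by character because `writerows` iterates over each row, and iterating over a string yields its characters.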

To achieve this, append a list representing one row to the main list:

# create the outer list once, before the loop:
#     TuristiPerCasoList = []
# then, inside the loop, instead of
TuristiPerCasoList = [Destination, Review, Departure, Arrival, Travellers, Cost]
# use
TuristiPerCasoList.append([Destination, Review, Departure, Arrival, Travellers, Cost])
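
Putting the pieces together, a minimal sketch of the whole flow might look like the following. It is not the asker's scraper: the tuples in `scraped` are made-up stand-ins for the values extracted from each "row" element, `save_rows` and the header row are additions for illustration, and the file is written to the system temp directory so the sketch can run anywhere.

```python
import csv
import os
import tempfile

# Column names taken from the variable names in the question;
# writing them as a header row is an optional addition.
HEADER = ["Destination", "Review", "Departure", "Arrival", "Travellers", "Cost"]

def save_rows(path, rows):
    """Write a header line followed by one CSV line per row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(HEADER)
        writer.writerows(rows)

# Hypothetical stand-ins for the scraped values (not real data).
scraped = [
    ("Rome", "Nice trip", "2018-05-01", "2018-05-08", "2", "500"),
    ("Paris", "Too crowded", "2018-06-10", "2018-06-15", "4", "900"),
]

TuristiPerCasoList = []  # created once, before the loop
for destination, review, departure, arrival, travellers, cost in scraped:
    # Append one inner list per row instead of reassigning the variable.
    TuristiPerCasoList.append(
        [destination, review, departure, arrival, travellers, cost]
    )

csv_path = os.path.join(tempfile.gettempdir(), "turistipercaso.csv")
save_rows(csv_path, TuristiPerCasoList)  # written once, after the loop
```

Writing the file once after the loop, rather than reopening it in `'w'` mode on each iteration, avoids overwriting earlier rows.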

【Comments】:
