【Question title】: Writing Beautiful Soup output to CSV
【Posted】: 2016-08-13 19:19:59
【Question】:

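I want to write prices and their corresponding addresses to a CSV file that I can open in Excel. So far I have the code below, which gives the output shown in the screenshot.

What I want is the price in the first column and the address in the second column.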

from bs4 import BeautifulSoup
import requests
import csv


number = "1"
url = "http://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&v=list&134=1&nofilters=1&originalsidebar=1&key=1654466070&page=" + number + "&sort_order=prop_default&rptpath=350-5748-3399-"
r = requests.get(url)
soup = BeautifulSoup(r.content)


output_file = open("output.csv", "w")

price = soup.find_all("div", {"class": "property-card-price-container"})

address = soup.find_all("div", {"class": "property-card-subtitle"})


n = 1
while n != 150:
    b = (price[n].text)
    b = str(b)
    n = n + 1
    output_file.write(b)

output_file.close()

【Comments】:

    Tags: python


    【Answer 1】:

    Maybe something like this?

    from bs4 import BeautifulSoup
    import requests 
    import csv
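    # ... (build `url` as in the question)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly to avoid bs4's warning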
    price = soup.find_all("div",{"class":"property-card-price-container"})
    address = soup.find_all("div",{"class":"property-card-subtitle"})
    
    dataset = [(x.text, y.text) for x,y in zip(price, address)]
    
    with open("output.csv", "w", newline='') as csvfile:
        writer = csv.writer(csvfile)
        for data in dataset[:150]: #truncate to 150 rows
            writer.writerow(data)
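
The pairing step can be tried without any network access. Here is a minimal sketch, assuming made-up sample values: the `prices` and `addresses` lists are hypothetical stand-ins for the `.text` of the scraped tags, and an in-memory buffer takes the place of `output.csv`.

```python
import csv
import io

# Hypothetical stand-ins for the text of the scraped price and address tags.
prices = ["$450,000", "$620,000", "Price by negotiation"]
addresses = ["12 Example St, Hamilton", "3 Sample Rd, Cambridge", "7 Demo Ave, Taupo"]

# zip() pairs items by position: first price with first address, and so on.
rows = list(zip(prices, addresses))

# newline="" lets the csv module control line endings itself.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows[:150])  # at most 150 rows, as in the answer above

csv_text = buffer.getvalue()
print(csv_text)
```

Fields that contain a comma are quoted automatically, so each row still opens as two columns in Excel. Note that `zip()` stops at the shorter input, so if one listing has no price, later rows can silently pair the wrong price with an address.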
    

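    【Comments】:

      【Answer 2】:

      There are a few problems with your code. Collecting the prices and the addresses into separate lists risks mixing them up, for example if the site changes the order of the items or a listing is missing one of the two. When scraping entries like this, it is important to find the larger enclosing container first and then narrow down from there.

      Unfortunately, the URL you provided no longer works, so I browsed to a different set of listings for this example: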

      from bs4 import BeautifulSoup
      import requests
      import csv
      
      url = 'http://www.trademe.co.nz/property/residential-property-for-sale'
      url += '/waikato/view-list'
      
      r = requests.get(url)
      soup = BeautifulSoup(r.content, 'html5lib')
      
      with open('output.csv', 'w', newline='') as csvfile:
      
          propertyWriter = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
      
          for listing in soup.find_all('div',
                                       {'class': 'property-list-view-card'}):
              price = listing.find_all('div',
                                       {'class': 'property-card-price-container'})
              address = listing.find_all('div',
                                         {'class': 'property-card-subtitle'})
      
              propertyWriter.writerow([price[0].text.strip(),
                                       address[0].text.strip()])
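
To illustrate the point about enclosing containers, here is a small offline sketch. The HTML is made up (only the class names come from the code above). It uses Python's built-in `html.parser` instead of `html5lib` so no extra parser is required, and writing a blank cell for a missing price is my own assumption about the desired behaviour.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up HTML that reuses the class names from the answer.
# The second listing deliberately has no price element.
HTML = """
<div class="property-list-view-card">
  <div class="property-card-price-container"> $450,000 </div>
  <div class="property-card-subtitle"> 12 Example St </div>
</div>
<div class="property-list-view-card">
  <div class="property-card-subtitle"> 3 Sample Rd </div>
</div>
<div class="property-list-view-card">
  <div class="property-card-price-container"> $620,000 </div>
  <div class="property-card-subtitle"> 7 Demo Ave </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Page-wide lists: 2 prices but 3 addresses, so zip() pairs them wrongly.
all_prices = soup.find_all("div", {"class": "property-card-price-container"})
all_addresses = soup.find_all("div", {"class": "property-card-subtitle"})
misaligned = [(p.text.strip(), a.text.strip())
              for p, a in zip(all_prices, all_addresses)]

# Container-first: look up price and address inside each listing card.
rows = []
for card in soup.find_all("div", {"class": "property-list-view-card"}):
    price = card.find("div", {"class": "property-card-price-container"})
    address = card.find("div", {"class": "property-card-subtitle"})
    rows.append([
        price.text.strip() if price is not None else "",  # blank when missing
        address.text.strip() if address is not None else "",
    ])

buffer = io.StringIO(newline="")
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows(rows)
print(buffer.getvalue())
```

With the page-wide lists, the second price ends up next to the wrong address, while the per-card lookup keeps every row aligned. Using `find()` with a `None` check also avoids the `IndexError` that `price[0]` would raise for a card without a price.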
      

      【Comments】:
