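【Posted】: 2014-05-24 04:40:36
【Question】:
I am scraping a page with multiple categories into a CSV file. The first category is written to its column successfully, but the data for the second column is not written to the CSV. The code I am using: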
import urllib2
import csv
from bs4 import BeautifulSoup

url = "http://digitalstorage.journalism.cuny.edu/sandeepjunnarkar/tests/jazz.html"
page = urllib2.urlopen(url)
soup_jazz = BeautifulSoup(page)

all_years = soup_jazz.find_all("td", class_="views-field views-field-year")
all_category = soup_jazz.find_all("td", class_="views-field views-field-category-code")

with open("jazz.csv", 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow([u'Year Won', u'Category'])
    for years in all_years:
        year_won = years.string
        if year_won:
            csv_writer.writerow([year_won.encode('utf-8')])
    for categories in all_category:
        category_won = categories.string
        if category_won:
            csv_writer.writerow([category_won.encode('utf-8')])
It writes the column header to the second column, but not the category_won values.
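This symptom is consistent with how csv.writer behaves: each writerow call emits one complete row, so passing a one-element list produces a row with only the first column filled. The sketch below reproduces that with made-up values; the lists `years` and `categories` and the helper `write_separately` are hypothetical stand-ins for the scraped cells, and it is written for Python 3 rather than the Python 2 used in the question.

```python
import csv
import io

# Hypothetical stand-ins for the text of the scraped <td> cells
years = ["1999", "2000"]
categories = ["Best Jazz Album", "Best Solo"]

def write_separately(years, categories):
    """Write years and categories in two separate loops, as in the question."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year Won", "Category"])
    for y in years:
        writer.writerow([y])   # one-element row: only column 1 is filled
    for c in categories:
        writer.writerow([c])   # also lands in column 1, below the years
    return buf.getvalue()

output = write_separately(years, categories)
print(output)
```

Every category ends up in the first column beneath the years, which is why the second column holds nothing but its header.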
Following your suggestion, I changed it to:
with open("jazz.csv", 'w') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow([u'Year Won', u'Category'])
for years, categories in zip(all_years, all_category):
    year_won = years.string
    category_won = categories.string
    if year_won and category_won:
        csv_writer.writerow([year_won.encode('utf-8'), category_won.encode('utf-8')])
But now I get the following error:
    csv_writer.writerow([year_won.encode('utf-8'), category_won.encode('utf-8')])
ValueError: I/O operation on closed file
【Discussion】:
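The ValueError is raised when writerow is called after the with block has ended, because with closes f on exit. That happens if the for loop sits at the same indentation level as with instead of inside it. Below is a minimal sketch of the likely fix, assuming Python 3 and made-up data: the lists `years` and `categories` are hypothetical stand-ins for the `.string` values of the scraped cells, and the temporary path replaces "jazz.csv" so the example does not touch the working directory. On the Python 2 setup from the question, the same structure would apply with open("jazz.csv", 'wb') and the .encode('utf-8') calls kept.

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for the .string values of the scraped <td> cells;
# None mimics a cell whose .string is empty
years = ["1999", None, "2001"]
categories = ["Best Jazz Album", "Best Solo", "Best Ensemble"]

path = os.path.join(tempfile.mkdtemp(), "jazz.csv")

with open(path, "w", newline="") as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(["Year Won", "Category"])
    # The loop is indented under `with`, so f is still open for every writerow call
    for year_won, category_won in zip(years, categories):
        if year_won and category_won:
            csv_writer.writerow([year_won, category_won])

# f is closed at this point; read the file back to check both columns
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Note that zip pairs cells by position, so this only lines up correctly if the page has one year cell and one category cell per table row; skipping a pair when either value is missing keeps the two columns aligned.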
Tags: python csv beautifulsoup