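【Posted】: 2011-12-30 12:20:15
【Question】:
I am scraping some information from a website, and one of the fields ends up stored in my list like this: [u'Dover Park', u'30 \u2013 38 Dover Rise']
The \u2013 should be – (an en dash).
When I try to write this to a .csv file, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3: ordinal not in range(128).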
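
The error can be reproduced without any scraping or CSV code. The sketch below assumes only the sample address string shown above. It shows that the ASCII codec rejects the en dash at index 3, while UTF-8 encodes it without complaint. In Python 2, `csv.writer` converts unicode values with the default ASCII codec, which is where this failure comes from.

```python
# The address field from the question, as a unicode string.
text = u'30 \u2013 38 Dover Rise'

# ASCII has no representation for U+2013, so encoding fails.
try:
    text.encode('ascii')
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    failed_at = exc.start

# UTF-8 can represent the en dash; it becomes the three bytes E2 80 93.
utf8_bytes = text.encode('utf-8')
```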
Here is my code:
import re
import mechanize
from BeautifulSoup import BeautifulSoup
url = 'http://www.dummy.com'
br = mechanize.Browser()
page = br.open(url)
html = page.read()
html = html.decode('utf-8')
soup = BeautifulSoup(html)
table = soup.find('table', width='800')
property_list = []
for row in table.findAll('tr')[1:]:
    for field in row.findAll('td', width='255'):
        property_list.append(field.findAll(text=True))
for condo in property_list:
    for field in condo:
        if field == ' ':
            condo.remove(field)
for condo in property_list:
    if len(condo) < 2:
        condo.append(condo[0])
    if condo[1]:
        condo[1] = condo[1].replace(',','')
for condo in property_list:
    for field in condo:
        field = field.encode('utf-8')
import csv
myfile = open('condos.csv', 'wb')
try:
    wr = csv.writer(myfile)
    wr.writerow(('Name','Address'))
    for condo in property_list:
        print condo
        wr.writerow(condo)
finally:
    myfile.close()
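
One likely reason the encoding loop in the code above has no effect: `field = field.encode('utf-8')` only rebinds the loop variable, so the encoded value is thrown away and the list still holds unicode strings when it reaches `writerow`. The following is a minimal sketch of that behaviour, using the sample row from the question as assumed input. It is not the asker's final code.

```python
# Sample row from the question (assumed input).
condo = [u'Dover Park', u'30 \u2013 38 Dover Rise']

# Rebinding the loop variable does not modify the list.
for field in condo:
    field = field.encode('utf-8')
still_unicode = condo == [u'Dover Park', u'30 \u2013 38 Dover Rise']

# Building a new list keeps the encoded values.
encoded = [field.encode('utf-8') for field in condo]

# The same idea applied to a list of rows, as property_list is in the question.
property_list = [condo]
encoded_rows = [[field.encode('utf-8') for field in row] for row in property_list]
```

In Python 2, rows encoded this way are byte strings, which `csv.writer` can write to a file opened in 'wb' mode without falling back to the ASCII codec.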
【Comments】:
-
This is urlencoded data; you can use the library to decode it. As for your encoding problem, explicitly encode your strings as utf8.
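
To make the explicit UTF-8 suggestion concrete, here is a hedged sketch. It assumes Python 3, whereas the question's code is Python 2, and `write_condos` is a hypothetical helper name that does not appear in the question. In Python 3 the file itself carries the encoding, so the rows can stay as text.

```python
import csv
import os
import tempfile


def write_condos(path, rows):
    """Write rows to a UTF-8 encoded CSV file with a header line."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(('Name', 'Address'))
        writer.writerows(rows)


# Sample row from the question (assumed input).
rows = [['Dover Park', '30 \u2013 38 Dover Rise']]

# Write to a temporary directory so the example leaves the working directory alone.
out_path = os.path.join(tempfile.mkdtemp(), 'condos.csv')
write_condos(out_path, rows)

# Read the raw bytes back to confirm the en dash was stored as UTF-8.
with open(out_path, 'rb') as f:
    raw = f.read()
```

On Python 2 the rough counterpart is to keep the file in 'wb' mode and encode every field to UTF-8 bytes before passing the row to `writerow`.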
-
I have tried various combinations, but I still can't get it to work. Please see the updated question.
Tags: python csv beautifulsoup mechanize