【Posted】: 2011-03-10 15:53:36
【Question】:
I wrote the following trial code to retrieve the titles of legislative acts from the European Parliament.
import urllib2
from BeautifulSoup import BeautifulSoup
search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-%.4d&language=EN"
for number in xrange(1,10):
    url = search_url % number
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page)
    title = soup.findAll("title")
    print title
However, whenever I run it, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 20, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 70: ordinal not in range(128)
I have narrowed it down to BeautifulSoup being unable to read the fourth document in the loop. Can anyone explain what I am doing wrong?
Kind regards,
Thomas
【Discussion】:
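The traceback reports a UnicodeEncodeError, which suggests that the failure happens when the text is written out rather than when the page is parsed. The character u'\u2013' is an en dash, and it has no ASCII representation. The snippet below is only a minimal sketch of that mechanism, not a confirmed diagnosis: the `title` string is a made-up stand-in (the real title of the fourth document is not shown in the question), and `to_ascii_or_error` is a hypothetical helper introduced for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the real title contains an en dash (U+2013), as the traceback
# indicates. The text itself is invented for this sketch.
title = u"Report \u2013 example"


def to_ascii_or_error(text):
    """Attempt the ASCII encoding that an ASCII-only output stream would need."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.reason carries the same message seen at the end of the traceback.
        return "failed: %s" % exc.reason


# Encoding to ASCII fails because U+2013 is outside the 0-127 range.
ascii_result = to_ascii_or_error(title)

# Encoding explicitly to UTF-8 succeeds and yields bytes that can be written out.
utf8_bytes = title.encode("utf-8")
```

In the loop from the question, the equivalent step would probably be to encode the title text explicitly (for example with `.encode("utf-8")`) before printing it. Whether UTF-8 is the right choice depends on what the terminal or output file expects.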
Tags: python loops beautifulsoup web-scraping