【发布时间】:2010-10-02 13:40:04
【问题描述】:
使用 BeautifulSoup 3.1.0.1 和 Python 2.5.2,并尝试解析法语网页。但是,一旦我调用 findAll,我就会收到以下错误:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1146: ordinal not in range(128)
下面是我目前正在运行的代码:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://fr.encarta.msn.com/encyclopedia_761561798/Paris.html")
soup = BeautifulSoup(page, fromEncoding="latin1")
r = soup.findAll("table")
print r
有人知道为什么吗?
谢谢!
更新:根据要求,以下是完整的 Traceback
Traceback (most recent call last):
File "[...]\test.py", line 6, in <module>
print r
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1146-1147: ordinal not in range(128)
【问题讨论】:
标签: python encoding screen-scraping beautifulsoup