[Posted]: 2011-04-21 16:18:02
[Question]:
I am using BeautifulSoup to extract news stories (titles only) from Hacker News, and this is what I have so far:
import urllib2
from BeautifulSoup import BeautifulSoup

HN_url = "http://news.ycombinator.com"

def get_page():
    page_html = urllib2.urlopen(HN_url)
    return page_html

def get_stories(content):
    soup = BeautifulSoup(content)
    titles_html = []
    for td in soup.findAll("td", {"class": "title"}):
        titles_html += td.findAll("a")
    return titles_html

print get_stories(get_page())
However, when I run the code, it raises an error:
Traceback (most recent call last):
  File "terminalHN.py", line 19, in <module>
    print get_stories(get_page())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe2' in position 131: ordinal not in range(128)
How can I make this work?
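One likely reading of the traceback is that the scraped link text is Unicode, and that printing it forces a conversion to bytes with the ASCII codec, which fails on any character outside the ASCII range. The following is a minimal sketch under that assumption. The sample title and the helper names `encode_ascii` and `encode_utf8` are hypothetical, and the code needs neither the network nor BeautifulSoup:

```python
# Hypothetical stand-in for one scraped title; u'\xe2' is the character
# named in the traceback.
title = u"Caf\xe2 culture"


def encode_ascii(text):
    """Encode with the ASCII codec, as an implicit conversion would."""
    return text.encode("ascii")


def encode_utf8(text):
    """Encode explicitly with UTF-8, which can represent any character."""
    return text.encode("utf-8")


# The ASCII codec cannot represent u'\xe2', so this raises the same
# exception type as in the traceback.
try:
    encode_ascii(title)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Explicit UTF-8 encoding succeeds and yields a byte string.
encoded = encode_utf8(title)
```

If the assumption holds, encoding each title's text to UTF-8 before printing it, rather than printing the whole list of tags at once, may avoid the implicit ASCII conversion.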
[Comments]:
Tags: python beautifulsoup