[Posted]: 2014-02-20 10:10:07
[Question]:
This is a strange error. Here are the details:
ts.py:
#-*- coding: utf-8 -*-
import requests
from lxml import html
headers = {
    'Host': 'www.baidu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36',
}
def get_html(url, enable_proxy=None):
    r = requests.get(url, headers=headers)
    parser = html.HTMLParser(encoding='utf-8')
    return html.document_fromstring(r.text, parser=parser)
p = get_html('http://www.baidu.com')
print p.xpath(u'//*[@id="setf"]/text()')[0].encode('utf-8')
If I run ts.py on its own like this, lxml works perfectly.
But when I move get_html into another file, I get an error:
ts.py:
#-*- coding: utf-8 -*-
import requests
from util import get_html
p = get_html('http://www.baidu.com')
print p.xpath(u'//*[@id="setf"]/text()')[0].encode('utf-8')
util.py:
#-*- coding: utf-8 -*-
import requests
from lxml import html
headers = {
    'Host': 'www.baidu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36',
}
def get_html(url, enable_proxy=None):
    r = requests.get(url, headers=headers)
    parser = html.HTMLParser(encoding='uft-8')
    return html.document_fromstring(r.text, parser=parser)
Running ts.py outputs:
Traceback (most recent call last):
  File "C:\Users\mithril\Desktop\1\ts.py", line 8, in <module>
    p = get_html('http://www.baidu.com')
  File "C:\Users\mithril\Desktop\1\util.py", line 15, in get_html
    parser = html.HTMLParser(encoding='uft-8')
  File "E:\Python27\lib\site-packages\lxml\html\__init__.py", line 1662, in __init__
    super(HTMLParser, self).__init__(**kwargs)
  File "parser.pxi", line 1597, in lxml.etree.HTMLParser.__init__ (src\lxml\lxml.etree.c:99825)
  File "parser.pxi", line 792, in lxml.etree._BaseParser.__init__ (src\lxml\lxml.etree.c:92549)
LookupError: unknown encoding: 'uft-8'
My environment:
- Windows 7 x64
- Python 2.7
- lxml-3.3.1 from here
I tested both 32-bit and 64-bit Python 2.7, with the same result.
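[Comments]:
- uft-8 is not an encoding; utf-8 is. The util.py version spells it 'uft-8', which is exactly what the LookupError reports.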
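One way to catch this kind of typo before it reaches lxml is to check the encoding name against Python's codec registry. The snippet below is only a minimal sketch using the standard-library codecs.lookup; the helper name check_encoding is hypothetical and not part of the original code or of lxml.

```python
import codecs


def check_encoding(name):
    """Return the canonical codec name, or raise LookupError if unknown.

    check_encoding is a hypothetical helper for illustration only.
    """
    # codecs.lookup raises LookupError for names Python does not know.
    return codecs.lookup(name).name


# A valid name resolves to its canonical form.
print(check_encoding('utf-8'))

# The misspelled name from util.py is rejected by the registry.
try:
    check_encoding('uft-8')
except LookupError as exc:
    print('bad encoding name: %s' % exc)
```

Calling such a check once at import time in util.py would surface the misspelling without needing a network request, though simply correcting 'uft-8' to 'utf-8' is the actual fix here.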
Tags: python python-2.7 utf-8 character-encoding lxml