【Question title】: Python 2.7 on Google App Engine, cannot use lxml.etree
【Posted】: 2011-11-15 01:50:44
【Question】:

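I have been trying to use html5lib with lxml on Python 2.7 in Google App Engine. When I run the following code, it fails with "NameError: global name 'etree' is not defined". Is it not possible to use lxml.etree on Google App Engine, or am I missing something?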

app.yaml

application: testsite
version: 1
runtime: python27
api_version: 1
threadsafe: false

handlers:
- url: /.*
  script: index.py   

libraries:
- name: lxml
  version: "2.3"  # I thought this would allow me to use lxml.etree

index.py

from google.appengine.ext import webapp  # needed for webapp.WSGIApplication below
from testhandler import TestHandler
application = webapp.WSGIApplication([('/', TestHandler)], debug=True)

testhandler.py

import urllib2
import html5lib
from html5lib import treebuilders
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

from google.appengine.ext import webapp

class TestHandler(webapp.RequestHandler):
    def get(self):
        f = urllib2.urlopen("http://www.yahoo.com/").read()
        doc = html5lib.parse(f, treebuilder='lxml')
        elems = doc.xpath("//*[local-name() = 'a']")
        self.response.out.write(len(elems))

Error

running with cElementTree on Python 2.5+
Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 769

Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
    handler.get(*groups)
  File "/home/test/testhandler.py", line 38, in get
    parser = html5lib.HTMLParser(tree= treebuilders.getTreeBuilder('lxml'))
  File "/home/test/html5lib/html5parser.py", line 68, in __init__
    self.tree = tree(namespaceHTMLElements)
  File "/home/test/html5lib/treebuilders/etree_lxml.py", line 176, in __init__
    builder = etree_builders.getETreeModule(etree, fullTree=fullTree)
NameError: global name 'etree' is not defined
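
The first line of the output above, "running with cElementTree on Python 2.5+", suggests that the lxml import in testhandler.py raised ImportError and the fallback chain silently moved on. A minimal sketch of a fallback that also records why each candidate failed might look like the following. The helper name import_first_available is hypothetical and is not part of html5lib, lxml, or the App Engine SDK.

```python
import importlib


def import_first_available(module_names):
    """Import the first module that loads; remember why the others failed.

    Returns (name, module, errors), where errors maps each module name that
    failed to the text of its ImportError.
    """
    errors = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # Keep the reason instead of discarding it.
            errors[name] = str(exc)
        else:
            return name, module, errors
    raise ImportError("no candidate could be imported: %r" % (errors,))


# Hypothetical usage: prefer lxml, fall back to the standard library.
name, etree, errors = import_first_available(
    ["lxml.etree", "xml.etree.cElementTree", "xml.etree.ElementTree"])
print("using %s; failed imports: %r" % (name, errors))
```

Logging the collected errors makes it visible when lxml is missing, rather than discovering it later through a NameError inside another library.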

Added

I tried several ways to create a doc object, with no luck. In one attempt I used from lxml.html import document_fromstring, which gave me this error:

Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4143, in _HandleRequest
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch
    base_env_dict=env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch
    base_env_dict=base_env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch
    self._module_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/home/yoo/eclipse_workspace/website_checker/src/index.py", line 5, in <module>
    from handlers.updatecheck import UpdateCheckHandler
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/home/test/updatecheck.py", line 4, in <module>
    from lxml.html import document_fromstring
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 12, in <module>
    from lxml import etree
ImportError: cannot import name etree

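Judging from the error, App Engine for some reason does not let me load the etree module. I wanted to use XPath with lxml, but I can't spend much time figuring out what is going on here, and I don't know Python well enough. So I will try to find a way to use the "simpletree" version instead.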

f = urllib2.urlopen("http://www.yahoo.com/").read()
p = html5lib.HTMLParser()
doc = p.parse(f)
# do something with doc.childNodes
self.response.out.write(len(doc.childNodes))  

This is not a great approach, but at least it works when I test it on the live Google App Engine.
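
For comparison, the original XPath query //*[local-name() = 'a'] can be approximated without lxml when the parsed document is an ElementTree-style tree. The sketch below uses only the standard library. The function name count_by_local_name and the sample markup are made up for illustration, and it assumes the parser in use can produce an ElementTree-compatible tree.

```python
import xml.etree.ElementTree as ET


def count_by_local_name(root, local_name):
    """Count elements whose tag, ignoring any {namespace} prefix, matches."""
    count = 0
    for elem in root.iter():
        tag = elem.tag
        if not hasattr(tag, "rsplit"):
            # Comments and processing instructions have a non-string tag.
            continue
        if tag.rsplit("}", 1)[-1] == local_name:
            count += 1
    return count


# Made-up sample document in the XHTML namespace.
SAMPLE = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
          '<a href="/one">one</a><p><a href="/two">two</a></p>'
          '</body></html>')
root = ET.fromstring(SAMPLE)
print(count_by_local_name(root, "a"))  # prints 2
```

Stripping the namespace prefix by hand plays the role that local-name() plays in the XPath expression.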

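【Comments on the question】:

  • Which version of html5lib? In the repo, the line that raises the error is no longer line 176, and I can't see how such an error could occur in the current version: either the name is defined or the whole thing fails with an ImportError.
  • Sorry for not getting back to you sooner. Judging by line 13 of html5lib/__init__.py, __version__ = "0.90", I think the version is 0.90. I just got the library via pip install; could it be an old version?
  • I got this error when I forgot to put the correct entry in app.yaml, but I used the latest version rather than 2.3.

Tags: python google-app-engine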


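【Answer 1】:

Do you have lxml installed locally? I had the same error before: the import failed. You can download lxml here: http://pypi.python.org/pypi/lxml/

lxml works with GAE, which is great. But right now there really isn't any documentation or examples.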
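
One way to check for a local install is a short script like the sketch below, which reports whether lxml.etree can be imported and which version is found. It assumes the script is run with the same Python interpreter that the development server uses. The function name local_lxml_version is hypothetical.

```python
def local_lxml_version():
    """Return lxml's version tuple, or None if lxml.etree cannot be imported."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return etree.LXML_VERSION


version = local_lxml_version()
if version is None:
    print("lxml is not importable by this interpreter")
else:
    print("lxml %s is installed" % ".".join(str(part) for part in version))
```

The reported version can then be compared with the version requested under libraries in app.yaml.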

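【Comments】:

  • Yes. I tried the original code on my local machine and it ran fine, but when I uploaded it to the live Google App Engine, it gave me the error above.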
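【Answer 2】:

On Windows I ran into this problem because the python27 distribution does not include lxml. You can use the easy_install script, but then you have to compile the source, which gave me trouble.

I used this post that I found on Google Groups:

https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.python/Q8YeOIbn5Ds

However, if you want to spare yourself the pain of building it from source, just install a precompiled binary, such as the ones available from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Just download the executable from the site above and run the *.exe, and it will install all the necessary code.

【Comments】: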

【Answer 3】:

Try adding

import lxml

at the top of the test handler.

【Comments】:

【Answer 4】:

Install it with pip: pip install lxml

【Comments】:
