Errors with conditional etree lxml (answers)

【Question title】: Errors with conditional etree lxml
【Posted】: 2011-10-08 13:24:45
【Question】:

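I am trying to remove an entire question element whenever its answer contains -66:

I get the following error: TypeError: argument of type 'NoneType' is not iterable ... on the line if element.tag == 'answer' and '-66' in element.text:

What is wrong here? Any help?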

#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*- 

from lxml import etree

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>

"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)

【Question comments】:

Tags: python lxml xml.etree

【Solution 1】:

element.text is None on some iterations. The error says that Python cannot search None for '-66', so first check that element.text is not None, like this:

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():   
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)

The line in the XML that it fails on is <answer></answer>, because there is no text between the tags.
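To make the failure concrete, here is a minimal sketch (written in Python 3 syntax, unlike the Python 2 snippets in this thread). It parses one empty and one filled answer element and shows that the unguarded `in` test raises TypeError on the empty one. The helper name has_marker is made up for illustration and is not part of lxml.

```python
from lxml import etree

# An element with nothing between its tags has text None, not "".
empty = etree.fromstring("<answer></answer>")
filled = etree.fromstring("<answer>-66</answer>")


def has_marker(element, marker="-66"):
    """Hypothetical helper: True if the element has text containing marker."""
    return element.text is not None and marker in element.text


# Reproduce the unguarded test from the question on the empty element.
try:
    "-66" in empty.text
    raised = False
except TypeError:
    raised = True

print(empty.text, raised)                      # None True
print(has_marker(empty), has_marker(filled))   # False True
```

Testing `element.text is not None` rather than relying on truthiness makes the intent explicit; for this use case the two behave the same, since an empty string cannot contain '-66' either.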

Edit (regarding the second part of your question, about the combined tags):

You can use BeautifulSoup like this:

from lxml import etree
import BeautifulSoup

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():   
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)

soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()

This prints:

<questionaire>
 <question>
  <questiontext>
   What's up?
  </questiontext>
  <answer>
  </answer>
 </question>
</questionaire>

Here is a link where you can download the BeautifulSoup module.

Or, to do this in a more compact way:

from lxml import etree
import BeautifulSoup    

# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"

html = etree.fromstring(planhtmlclear_utf)
for question in html.xpath('/questionaire/question[answer/text()="-66"]'):
    question.getparent().remove(question)
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()

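【Comments】:

Wow, that was really helpful! Thank you very much!
Maybe you can help me one step further :-P The output I now get is: What's up? ....... so the answer is not displayed in full. Why is that?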

【Solution 2】:

An alternative to checking whether element.text is None is to refine your XPath:

questions = html.xpath('/questionaire/question[answer/text()="-66"]')
for question in questions:
    question.getparent().remove(question)

The brackets [...] mean "such that". So

question                          # find all question elements
[                                 # such that 
  answer                          # it has an answer subelement
    /text()                       # whose text 
  =                               # equals
  "-66"                           # "-66"
]
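As a runnable sketch of the predicate above (Python 3 syntax, with the sample document from the question condensed onto fewer lines), the following compares the exact match with a substring match. The contains() variant is an addition here, not part of the answer; it is closer to the `'-66' in element.text` test in the question, which also matches answers such as "x-66y".

```python
from lxml import etree

xml = """
<questionaire>
<question><questiontext>What's up?</questiontext><answer></answer></question>
<question><questiontext>Cool?</questiontext><answer>-66</answer></question>
</questionaire>
"""
root = etree.fromstring(xml)

# Exact match: the answer's text node must equal "-66".
exact = root.xpath('/questionaire/question[answer/text()="-66"]')

# Substring match: the answer's string value must contain "-66".
partial = root.xpath('/questionaire/question[contains(answer, "-66")]')

# Remove every exactly matching question from its parent.
for question in exact:
    question.getparent().remove(question)

remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(len(exact), len(partial), remaining)   # 1 1 ["What's up?"]
```

Neither predicate needs a None check: a question whose answer has no text simply does not match.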

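【Comments】:

This solves the problem, and it leaves the other answer elements untouched... With the example above I got the answer elements... but I don't know why... Either way, this solution works!
No, sorry... it is cutting the empty answer tags... Why does this always happen?
I'm not sure I understand the problem. Do you mean that <answer></answer> gets shortened to <answer/>? That's fine; they are equivalent.
Yes... that's what I mean... but what can I do to prevent it? I need the tags properly formatted. Thanks a lot!
import lxml.html as lh. Then lh.tostring(html).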
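The suggestion in the last comment can be sketched as follows (Python 3 syntax; the one-line sample document is a made-up reduction of the one in the question). It serializes the same tree with lxml's XML serializer and with lxml.html.tostring, which uses the HTML output method and writes an explicit closing tag for the empty element.

```python
from lxml import etree
import lxml.html as lh

root = etree.fromstring("<question><answer></answer></question>")

# The XML serializer collapses an element with no text and no children.
as_xml = etree.tostring(root)

# The HTML serializer keeps the explicit closing tag.
as_html = lh.tostring(root)

print(as_xml)    # b'<question><answer/></question>'
print(as_html)   # b'<question><answer></answer></question>'
```

Both calls return bytes. The HTML serializer follows HTML rules rather than XML rules, so this is probably best reserved for cases where a downstream consumer really cannot handle <answer/>; to an XML parser the two forms are equivalent.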