【Question title】: Errors with conditional etree lxml
【Posted】: 2011-10-08 13:24:45
【Question】:

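I am trying to remove an entire <question> element whenever its <answer> contains -66:

I get the following error: TypeError: argument of type 'NoneType' is not iterable ... on the line if element.tag == 'answer' and '-66' in element.text:

What is wrong with this? Any help?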

#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*- 

from lxml import etree

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>

"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html) 

【Question comments】:

    Tags: python lxml xml.etree


    【Solution 1】:

    element.text is None on some iterations. The error says that '-66' cannot be looked up inside None, so first check that element.text is not None, like this:

    html = etree.fromstring(planhtmlclear_utf)
    questions = html.xpath('/questionaire/question')
    for question in questions:
        for element in question:  # iterating an element yields its children
            # element.text is None for an empty tag, so test it before using "in"
            if element.tag == 'answer' and element.text and '-66' in element.text:
                question.getparent().remove(question)
                break  # the question is gone; stop scanning its children
    print etree.tostring(html)
    

    The line of the XML on which it fails is <answer></answer>, which has no text between the tags.
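To make the cause visible, here is a minimal sketch written for Python 3. It is an assumption-laden illustration, not the code from the answer: it uses the standard library's xml.etree.ElementTree instead of lxml (both report a missing text node as None), and the helper name `contains` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<questionaire>
<question><questiontext>What's up?</questiontext><answer></answer></question>
<question><questiontext>Cool?</questiontext><answer>-66</answer></question>
</questionaire>"""

root = ET.fromstring(XML)
answers = root.findall("question/answer")

# An element with nothing between its tags reports text as None, not "".
texts = [a.text for a in answers]


def contains(element, needle):
    """Hypothetical helper: membership test that tolerates text being None."""
    return needle in (element.text or "")


flags = [contains(a, "-66") for a in answers]
print(texts, flags)
```

Falling back to an empty string with `element.text or ""` has the same effect as the extra `element.text and ...` check shown above; which one reads better is a matter of taste.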


    (Edit, regarding the second part of your question about the collapsed tags)

    You can use BeautifulSoup like this:

    from lxml import etree
    import BeautifulSoup
    
    planhtmlclear_utf=u"""
    <questionaire>
    <question>
    <questiontext>What's up?</questiontext>
    <answer></answer>
    </question>
    <question>
    <questiontext>Cool?</questiontext>
    <answer>-66</answer>
    </question>
    </questionaire>"""
    
    html = etree.fromstring(planhtmlclear_utf)
    questions = html.xpath('/questionaire/question')
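    for question in questions:
        for element in question:  # iterating an element yields its children
            # element.text is None for an empty tag, so test it before using "in"
            if element.tag == 'answer' and element.text and '-66' in element.text:
                question.getparent().remove(question)
                break  # the question is gone; stop scanning its children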
    
    soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
    print soup.prettify()
    

    This prints:

    <questionaire>
     <question>
      <questiontext>
       What's up?
      </questiontext>
      <answer>
      </answer>
     </question>
    </questionaire>
    

    Here is a link where you can download the BeautifulSoup module.


    Or, to do this in a more compact way:

    from lxml import etree
    import BeautifulSoup    
    
    # abbreviating to reduce answer length...
    planhtmlclear_utf=u"<questionaire>.........</questionaire>"
    
    html = etree.fromstring(planhtmlclear_utf)
    [question.getparent().remove(question) for question in html.xpath('/questionaire/question[answer/text()="-66"]')]
    print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()
    

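    【Comments】:

    • Wow, this was really helpful! Thanks a lot!
    • Maybe you can help me one step further :-P I now get the output: What's up? ....... so the answer is not shown completely... why is that?
    【Solution 2】: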

    An alternative to checking whether element.text is None is to refine your XPath:

    questions = html.xpath('/questionaire/question[answer/text()="-66"]')
    for question in questions:
        question.getparent().remove(question)
    

    The brackets [...] mean "such that". So

    question                          # find all question elements
    [                                 # such that 
      answer                          # it has an answer subelement
        /text()                       # whose text 
      =                               # equals
      "-66"                           # "-66"
    ]
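The predicate can be tried out without lxml. The sketch below is for Python 3 and makes two labelled assumptions: it uses the standard library's xml.etree.ElementTree, whose limited XPath subset writes the same condition as `[answer='-66']` rather than `[answer/text()="-66"]`, and it removes matches from the root directly because ElementTree elements have no `getparent()`.

```python
import xml.etree.ElementTree as ET

XML = """<questionaire>
<question><questiontext>What's up?</questiontext><answer></answer></question>
<question><questiontext>Cool?</questiontext><answer>-66</answer></question>
</questionaire>"""

root = ET.fromstring(XML)

# "question elements such that they have an answer child whose text is -66"
matches = root.findall("question[answer='-66']")
matched_texts = [q.findtext("questiontext") for q in matches]

# Every match is a direct child of the root, so the root is its parent.
for question in matches:
    root.remove(question)

remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(matched_texts, remaining)
```

The question with the empty answer never matches the predicate, so no explicit None check is needed.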
    

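    【Comments】:

    • This solves the problem, and it does not touch the other answer elements... with the example above I was getting the answer elements... but I don't know why... anyway, this solution works!
    • No, sorry... it is cutting the empty answer tags... why does this always happen?
    • I'm not sure I understand the question. Do you mean that <answer></answer> is being shortened to <answer/>? That's okay; they are equivalent.
    • Yes... that's what I mean... but what can I do to prevent it? Because I need the tags formatted properly..? Thanks a lot!
    • import lxml.html as lh. Then lh.tostring(html).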
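The last comment relies on lxml's HTML serializer to keep the closing tag. As a rough standard-library analogue (an assumption: this uses xml.etree.ElementTree on Python 3.4 or later, not lxml), the `short_empty_elements` flag of `tostring` controls the same choice:

```python
import xml.etree.ElementTree as ET

answer = ET.fromstring("<answer></answer>")

# Default serialization collapses an empty element into a self-closing tag.
short = ET.tostring(answer, encoding="unicode")

# short_empty_elements=False keeps the explicit opening and closing pair.
long = ET.tostring(answer, encoding="unicode", short_empty_elements=False)

print(short)
print(long)
```

Both forms are equivalent to an XML parser; the flag only matters when a downstream consumer insists on the expanded form.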