【Question Title】: lxml cssselect Parsing
【Posted】: 2011-06-22 01:00:44
【Question】:

I have a document containing the following data:

<div class="ds-list">
    <b>1. </b> 
    A domesticated carnivorous mammal 
    <i>(Canis familiaris)</i> 
    related to the foxes and wolves and raised in a wide variety of breeds.
</div>

I want to get everything inside the ds-list div (without the <b> and <i> tags). My current code is doc.cssselect('div.ds-list'), but all I get from it is the newline before the <b>. How can I get what I want?
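A likely reason for seeing only the newline: in lxml's element model, an element's .text holds only the text that comes before its first child element, and the text that follows each child is stored on that child's .tail. The sketch below, assuming lxml is installed, parses the markup from the question (trailing spaces trimmed) and prints those pieces. For a single-element fragment like this one, lxml.html.fromstring returns the div element itself.

```python
import lxml.html as lh

# The markup from the question (trailing spaces trimmed).
content = '''<div class="ds-list">
    <b>1. </b>
    A domesticated carnivorous mammal
    <i>(Canis familiaris)</i>
    related to the foxes and wolves and raised in a wide variety of breeds.
</div>'''

# For a single-element fragment, fromstring() returns that element (the div).
div = lh.fromstring(content)

# .text is only the text before the first child element (<b>):
# here that is nothing but a newline and indentation.
print(repr(div.text))

# The text that follows each child is stored on that child's .tail.
for child in div:
    print(child.tag, repr(child.text), repr(child.tail))
```

So the sentence fragments the question is after are split across div.text, div[0].tail and div[1].tail, which is why reading .text alone looks empty.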

【Question Discussion】:

    Tags: python html parsing css-selectors lxml


    【Solution 1】:

    Perhaps you are looking for the text_content method?

    import lxml.html as lh
    content='''\
    <div class="ds-list">
        <b>1. </b> 
        A domesticated carnivorous mammal 
        <i>(Canis familiaris)</i> 
        related to the foxes and wolves and raised in a wide variety of breeds.
    </div>'''
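    doc=lh.fromstring(content)
    # cssselect() returns a list of matching elements; loop over it.
    for div in doc.cssselect('div.ds-list'):
        # text_content() returns the text of the element and all of its
        # descendants, with the tags themselves stripped out.
        print(div.text_content())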
    

    which yields:

    1.  
    A domesticated carnivorous mammal 
    (Canis familiaris) 
    related to the foxes and wolves and raised in a wide variety of breeds.
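text_content() keeps the words inside <b> and <i> and preserves the original line breaks and indentation. If a single tidy line is wanted, or if "without the <b> and <i> tags" was meant to exclude the text inside those tags as well (that reading is an assumption, not something the question confirms), here is a minimal sketch, assuming lxml is installed:

```python
import lxml.html as lh

content = '''<div class="ds-list">
    <b>1. </b>
    A domesticated carnivorous mammal
    <i>(Canis familiaris)</i>
    related to the foxes and wolves and raised in a wide variety of breeds.
</div>'''
div = lh.fromstring(content)

# Keep the text inside <b> and <i>, but collapse all runs of whitespace
# (newlines, indentation) into single spaces.
full = " ".join(div.text_content().split())

# The XPath "text()" selects only the div's own (direct) text nodes,
# so the text inside <b> and <i> is left out entirely.
own = " ".join(" ".join(div.xpath("text()")).split())

print(full)
print(own)
```

The first result still contains "1." and "(Canis familiaris)"; the second drops them because they are text nodes of the <b> and <i> children, not of the div itself.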
    

    【Discussion】:

      【Solution 2】:
      # cssselect() returns a list, so take the first match before calling text_content()
      doc.cssselect("div.ds-list")[0].text_content()
      

      【Discussion】:
