【Question title】: How to extract paragraph text in Python using lxml from an HTML file?
【Posted】: 2019-01-31 00:17:20
【Question description】:

I am trying to extract a paragraph, but I get [<Element p at 0x7f8c81a26548>] instead of the paragraph text. How do I extract the paragraph itself?

Selector_1 = "div.bloco-imovel-texto p"
tree.cssselect(Selector_1)

HTML:

<div class="bloco-imovel-texto">
  <h3 class="lbl_description">
    Description </h3>
  <p>At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia
    animi, id est laborum et dolorum fugaEt harum quidem rerum facilis est et expedita distinctio.Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est,
    omnis dolor repellendus.</p>
</div>

【Question comments】:

    Tags: python html lxml.html


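    【Solution 1】:

    cssselect returns a list of matching elements, not text, so index into the list and read the element's text. It should be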

    tree.cssselect(Selector_1)[0].text
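
    The one-liner above can be expanded into a small self-contained sketch. The markup here is a shortened stand-in for the HTML in the question, with a hypothetical <b> tag added to show that .text returns only the text before the first child element, while text_content() returns the text of the whole paragraph. The XPath fallback is an assumption for environments where the separate cssselect package is not installed.

```python
from lxml import html

# Shortened stand-in for the markup in the question (illustrative only;
# the <b> tag is added to show the difference between .text and text_content()).
SAMPLE_HTML = """
<div class="bloco-imovel-texto">
  <h3 class="lbl_description">Description</h3>
  <p>At vero eos et accusamus, <b>omnis dolor</b> repellendus.</p>
</div>
"""

Selector_1 = "div.bloco-imovel-texto p"
tree = html.fromstring(SAMPLE_HTML)

try:
    # cssselect() returns a list of matching elements, not strings.
    paragraphs = tree.cssselect(Selector_1)
except ImportError:
    # lxml's CSS support needs the separate "cssselect" package;
    # this XPath selects the same <p> elements for this sample.
    paragraphs = tree.xpath(
        'descendant-or-self::div[@class="bloco-imovel-texto"]//p'
    )

first = paragraphs[0]
direct_text = first.text          # text up to the first child element
full_text = first.text_content()  # text of the element and its descendants

# Text of every matched paragraph, with surrounding whitespace trimmed.
all_texts = [p.text_content().strip() for p in paragraphs]

print(repr(direct_text))
print(repr(full_text))
```

    For a paragraph with no nested tags, as in the question, .text and text_content() give the same string; text_content() is the safer choice when the paragraph may contain inline markup.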
    

    【Comments】:
