【Question title】: How do I scrape a paragraph that follows another paragraph containing a matching string?
【Posted】: 2016-05-07 11:36:28
【Question】:

I want to scrape a paragraph that comes right after another paragraph containing the specific text "Interested String ZZZ".

For example:

<p align="center"><strong><span style="text-decoration: underline;">Interested String ZZZ</span></strong></p>
<p style="text-align: justify;"><span style="font-size: small;">This is the paragraph string that i want to scrape out</span></p>

How can I do this in Python?

【Comments】:

    Tags: python web-scraping beautifulsoup


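    【Solution 1】:

    Use the text argument to match elements by their text content (newer Beautiful Soup releases also accept it under the name string), then call find_next_sibling() to get the next <p> sibling element: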

    >>> from bs4 import BeautifulSoup
    >>> raw = '''<div>
    ... <p align="center"><strong><span style="text-decoration: underline;">Interested String ZZZ</span></strong></p>
    ... <p style="text-align: justify;"><span style="font-size: small;">This is the paragraph string that i want to scrape out</span></p>
    ... </div>'''
    >>> soup = BeautifulSoup(raw, "lxml")
    >>> [s.find_next_sibling("p").string for s in soup("p", text="Interested String ZZZ")]
    [u'This is the paragraph string that i want to scrape out']
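
The same idea can be written as a small standalone script. This is only a sketch under a few assumptions: the helper name `paragraphs_after` and the `HTML` constant are made up for illustration, the built-in "html.parser" backend is used instead of "lxml" to avoid an extra dependency, and the marker is matched as a substring of each paragraph's text via get_text() rather than as an exact match with the text argument. It also skips a matching paragraph that has no following <p> sibling instead of raising an AttributeError.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question.
HTML = """<div>
<p align="center"><strong><span style="text-decoration: underline;">Interested String ZZZ</span></strong></p>
<p style="text-align: justify;"><span style="font-size: small;">This is the paragraph string that i want to scrape out</span></p>
</div>"""


def paragraphs_after(html, marker):
    """Return the text of the next <p> sibling of every <p> whose text contains marker."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find_all("p"):
        # get_text() joins all nested strings, so inline tags such as
        # <strong> or <span> do not get in the way of the comparison.
        if marker not in p.get_text():
            continue
        nxt = p.find_next_sibling("p")
        # find_next_sibling() returns None when there is no later <p> sibling.
        if nxt is not None:
            results.append(nxt.get_text(strip=True))
    return results


print(paragraphs_after(HTML, "Interested String ZZZ"))
```

Substring matching is more forgiving when the marker paragraph carries extra whitespace or surrounding words; if an exact match is needed, the text-based lookup from the answer above is the tighter choice.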
    

    【Comments】:
