【Question title】: Need to scrape text with BeautifulSoup [closed]
【Posted】: 2018-06-21 17:38:56
【Question】:

Can anyone help me extract the text below from this code using BeautifulSoup?

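“Disappointed coach Bert van Marwijk said Australia have to find the last part of the puzzle if they are to stay in the World Cup after a 1-1 draw with Denmark on Thursday. Australia captain Mile Jedinak hit a VAR-assisted penalty to earn the Socceroos first point in Russia after Christian Eriksens opener, giving Australia”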

<a href="website" target="_blank" rel="nofollow" onmouseover="ddrivetip('<em>Thu, 21 Jun 2018</em> <br/> Disappointed coach Bert van Marwijk said Australia have to find the last part of the puzzle if they are to stay in the World Cup after a 1-1 draw with Denmark on Thursday. Australia captain Mile Jedinak hit a VAR-assisted penalty to earn the Socceroos first point in Russia after Christian Eriksens opener, giving Australia []')" ;="" onmouseout="hideddrivetip()">Australias Van Marwijk says last part of puzzle missing at World Cup</a>

【Question discussion】:

    Tags: python html


    【Solution 1】:
    from bs4 import BeautifulSoup
    
    html = """
    <a href="website" 
        target="_blank" 
        rel="nofollow" 
        onmouseover="ddrivetip('<em>Thu, 21 Jun 2018</em> <br/> Disappointed coach Bert van Marwijk said Australia have to find the last part of the puzzle if they are to stay in the World Cup after a 1-1 draw with Denmark on Thursday. Australia captain Mile Jedinak hit a VAR-assisted penalty to earn the Socceroos first point in Russia after Christian Eriksens opener, giving Australia []')" ;="" 
        onmouseout="hideddrivetip()">
        Australias Van Marwijk says last part of puzzle missing at World Cup
    </a>
    """
    
    soup = BeautifulSoup(html, 'lxml')
    
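    for a in soup.find_all('a'):
        # The summary lives in the onmouseover attribute, not in the link text.
        # [43:-4] cuts the 43-character prefix "ddrivetip('<em>Thu, 21 Jun 2018</em> <br/> "
        # and the trailing "[]')", so it only works while the date has this exact length.
        attr_text = a.attrs['onmouseover'][43:-4]
        # get_text(strip=True) drops the newlines and indentation around the link text
        print(attr_text + a.get_text(strip=True))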
    

    Output

    Disappointed coach Bert van Marwijk said Australia have to find the
    last part of the puzzle if they are to stay in the World Cup after a 
    1-1 draw with Denmark on Thursday. Australia captain Mile Jedinak hit 
    a VAR-assisted penalty to earn the Socceroos first point in Russia 
    after Christian Eriksens opener, giving Australia Australias Van 
    Marwijk says last part of puzzle missing at World Cup
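
The [43:-4] slice in this answer counts characters, so it silently breaks as soon as the date has a different length (for example "Mon, 2 Jul 2018" is one character shorter than "Thu, 21 Jun 2018"). A minimal sketch that locates the markers instead of counting, assuming every tooltip has the shape ddrivetip('<em>date</em> <br/> summary []'); the helper name summary_from_tooltip and the two sample strings are illustrative, not taken from the site:

```python
def summary_from_tooltip(onmouseover: str) -> str:
    """Return the summary text from a ddrivetip('<em>date</em> <br/> summary []') string.

    Hypothetical helper: assumes the summary starts after the first "<br/>"
    and ends just before the last "[]')". Raises ValueError if either marker
    is missing (that is how str.index / str.rindex behave).
    """
    start_marker = "<br/>"
    start = onmouseover.index(start_marker) + len(start_marker)
    end = onmouseover.rindex("[]')")
    # Strip the spaces that surround the summary
    return onmouseover[start:end].strip()


if __name__ == "__main__":
    # Illustrative tooltips whose dates differ in length
    samples = [
        "ddrivetip('<em>Thu, 21 Jun 2018</em> <br/> First summary []')",
        "ddrivetip('<em>Mon, 2 Jul 2018</em> <br/> Second summary []')",
    ]
    for s in samples:
        print(summary_from_tooltip(s))
```

In the loop from this answer, the slice could then be replaced with print(summary_from_tooltip(a.attrs['onmouseover']) + ' ' + a.get_text(strip=True)).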
    

    【Discussion】:

      【Solution 2】:

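      You can read the attribute value with a.attrs['onmouseover'] and then pull the tooltip text out of it with a regular expression.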

      For example:

      from bs4 import BeautifulSoup
      import re
      s = """<a href="website" target="_blank" rel="nofollow" onmouseover="ddrivetip('<em>Thu, 21 Jun 2018</em> <br/> Disappointed coach Bert van Marwijk said Australia have to find the last part of the puzzle if they are to stay in the World Cup after a 1-1 draw with Denmark on Thursday. Australia captain Mile Jedinak hit a VAR-assisted penalty to earn the Socceroos first point in Russia after Christian Eriksens opener, giving Australia []')" ;="" onmouseout="hideddrivetip()">Australias Van Marwijk says last part of puzzle missing at World Cup</a>"""
      soup = BeautifulSoup(s, "html.parser")
      val = soup.a.attrs['onmouseover']
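      # Raw string so "\(" is not treated as an (invalid) string escape;
      # .*? stops at the first ")", and group() returns the whole match, parentheses included
      m = re.search(r"\((.*?)\)", val)
      if m:
          print(m.group())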
      

      Output:

      ('<em>Thu, 21 Jun 2018</em> <br/> Disappointed coach Bert van Marwijk said Australia have to find the last part of the puzzle if they are to stay in the World Cup after a 1-1 draw with Denmark on Thursday. Australia captain Mile Jedinak hit a VAR-assisted penalty to earn the Socceroos first point in Russia after Christian Eriksens opener, giving Australia []')
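
The match printed above still contains the parentheses, the quotes and the HTML tags. A sketch that keeps the regex approach but uses named groups to capture only the date and the summary, assuming the tooltip always has the shape ddrivetip('<em>date</em> <br/> summary []'); TOOLTIP_RE and parse_tooltip are hypothetical names:

```python
import re

# Hypothetical pattern: assumes every tooltip looks like
# ddrivetip('<em>DATE</em> <br/> SUMMARY []')
TOOLTIP_RE = re.compile(
    r"ddrivetip\('<em>(?P<date>.*?)</em>\s*<br/>\s*(?P<summary>.*?)\s*\[\]'\)",
    re.DOTALL,
)


def parse_tooltip(val: str) -> dict[str, str]:
    """Return {'date': ..., 'summary': ...} for a ddrivetip(...) string."""
    m = TOOLTIP_RE.search(val)
    if m is None:
        raise ValueError("tooltip does not match the expected ddrivetip(...) shape")
    # groupdict() returns only the named groups, without quotes or HTML tags
    return m.groupdict()


if __name__ == "__main__":
    # The onmouseover value from the question, split across lines for readability
    val = (
        "ddrivetip('<em>Thu, 21 Jun 2018</em> <br/> Disappointed coach Bert van "
        "Marwijk said Australia have to find the last part of the puzzle if they are "
        "to stay in the World Cup after a 1-1 draw with Denmark on Thursday. Australia "
        "captain Mile Jedinak hit a VAR-assisted penalty to earn the Socceroos first "
        "point in Russia after Christian Eriksens opener, giving Australia []')"
    )
    info = parse_tooltip(val)
    print(info["date"])  # Thu, 21 Jun 2018
    print(info["summary"])
```

With the soup from this answer, parse_tooltip(soup.a.attrs['onmouseover'])['summary'] gives the plain summary; a tooltip that deviates from the assumed shape raises ValueError instead of returning a partial string.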
      

      【Discussion】:
