【Question title】: Get structured data from HTML using Python and Beautiful Soup
【Posted】: 2014-12-22 23:01:50
【Question】:

I'm new to Python. I want to get a result like the following:

Score      Positive       Negative
  5         good            bad
  7       interesting
  3                       horrible

But my code produces no output. Where is the problem?

from bs4 import BeautifulSoup

text = """
<body>
    <div class="review">
        <p class="pos">good</p>
        <p class="neg">bad</p>
    </div>
    <div class="review">
        <p class="pos">interesting</p>
    </div>
    <div class="review">
        <p class="neg">horrible</p>
    </div>
</body>"""
soup = BeautifulSoup(text)
positive = []
negative = []
for parent in soup.find_all('div', attrs={'class': 'review'}):
    if parent.findNextSiblings('p', attrs={'class': 'pos'}):
        positive.append(parent.get_text())
    else:
        positive.append("")
    if parent.findNextSiblings('p', attrs={'class': 'neg'}):
        negative.append(parent.get_text())
    else:
        negative.append("")

    Tags: python beautifulsoup html-parsing


    【Answer 1】:

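    The p tags are not siblings of the div elements with class "review"; they are their children. Searching the siblings of each div therefore finds nothing, so look inside each div with find() instead: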

    positive = []
    negative = []
    for div in soup.find_all('div', attrs={'class': 'review'}):
        # find() searches the div's descendants and returns None when no match exists.
        pos = div.find('p', {'class': 'pos'})
        positive.append(pos.get_text() if pos else '')

        neg = div.find('p', {'class': 'neg'})
        negative.append(neg.get_text() if neg else '')

    print(positive)
    print(negative)

    Output:

    ['good', 'interesting', '']
    ['bad', '', 'horrible']
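
To get closer to the table layout the question asks for, the two lists can be collected as one row per review and printed in aligned columns. The following is only a minimal sketch: the helper name `extract_reviews`, the variable `html`, the column width, and the explicit `"html.parser"` argument are assumptions made for illustration, not part of the original answer. The Score column is left out because the HTML shown contains no score values.

```python
from bs4 import BeautifulSoup

html = """
<body>
    <div class="review">
        <p class="pos">good</p>
        <p class="neg">bad</p>
    </div>
    <div class="review">
        <p class="pos">interesting</p>
    </div>
    <div class="review">
        <p class="neg">horrible</p>
    </div>
</body>"""


def extract_reviews(markup):
    """Return one (positive, negative) tuple per div.review (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for div in soup.find_all("div", class_="review"):
        # Search the children of this div; a missing tag becomes an empty string.
        pos = div.find("p", class_="pos")
        neg = div.find("p", class_="neg")
        rows.append((
            pos.get_text(strip=True) if pos else "",
            neg.get_text(strip=True) if neg else "",
        ))
    return rows


rows = extract_reviews(html)

# Print a header and then one aligned line per review.
print("{:<14}{}".format("Positive", "Negative"))
for positive, negative in rows:
    print("{:<14}{}".format(positive, negative))
```

Keeping each review as a single tuple keeps the positive and negative texts paired, which makes it easier to add further columns later if the markup provides them.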
    
