【Question Title】: Get node inside div without duplicating
【Posted】: 2019-10-24 04:32:46
【Question】:

I have the following HTML:

<div class = "matches">
<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>

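I need to get the value of each parent div together with the values of its child divs, without duplicated results, so that every match's time is paired with its own parent.

This is my Python code: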

for nc in soup.find_all('div', attrs={'class': 'league-data'}):
    campeonato = nc.text
    for hr in soup.find('div', attrs={'class': 'match row cf'}).findAll("div",recursive=False):
        print(campeonato + "|" + hr.text)

【Comments】:

  • Why not keep a list of the items you have already collected and filter against it?
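
The suggestion in this comment could be sketched roughly as follows. This is a minimal illustration only: `unique_rows` and the `scraped` sample list are hypothetical names and data, not part of the original post, and the sketch assumes each scraped result has already been reduced to a hashable `(match, hour)` tuple.

```python
def unique_rows(rows):
    """Yield each (match, hour) tuple once, keeping first-seen order."""
    seen = set()  # tuples that have already been emitted
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Hypothetical scraped output containing a repeated entry
scraped = [
    ('Brazil vs. Colombia', '08:00 pm'),
    ('Chilex Argentina', '08:00 pm'),
    ('Brazil vs. Colombia', '08:00 pm'),
]

deduped = list(unique_rows(scraped))
for match, hour in deduped:
    print(match + "|" + hour)
```

Filtering like this hides the duplicates after the fact; it does not address why the nested loops produce them in the first place.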

Tags: python python-3.x beautifulsoup selenium-chromedriver


【Solution 1】:

You can use the zip() function to pair each match with its corresponding time:

from bs4 import BeautifulSoup

data = '''<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>'''

soup = BeautifulSoup(data, 'lxml')

# Pair the i-th match name with the i-th time; zip() stops at the shorter list
for match, hour in zip(soup.select('div.conf'), soup.select('div.targetHour')):
    print(match.text.strip(), hour.text.strip())

Prints:

Brazil vs. Colombia 08:00 pm
Chilex Argentina 08:00 pm
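
Note that zip() pairs purely by position, so a match with no time would shift every later pairing. A possible alternative, sketched here under the assumption that each match is wrapped in its own div.matches as in the question, is to search inside each parent rather than in the whole soup. The sample HTML below is made up for illustration (the second match deliberately has no time), and 'html.parser' is used only so the sketch does not require lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second match has no targetHour on purpose
html = '''
<div class="matches">
  <div class="conf">Brazil vs. Colombia</div>
  <div class="targetHour">08:00 pm</div>
</div>
<div class="matches">
  <div class="conf">Chilex Argentina</div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for block in soup.select('div.matches'):
    # Look up the children relative to this parent, not the whole document
    match = block.select_one('div.conf')
    hour = block.select_one('div.targetHour')
    if match is None:
        continue  # skip blocks without a match name
    rows.append((
        match.get_text(strip=True),
        hour.get_text(strip=True) if hour else None,
    ))

for name, time in rows:
    print(name, time or '(no time listed)')
```

Because every lookup is scoped to one parent, each match is visited once and can only be paired with a time from its own block.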


【Solution 2】:

Another option (assuming the list has an even number of items):

from bs4 import BeautifulSoup

data = '''<div class = "conf">
Brazil vs. Colombia
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>
<div class = "matches">
<div class = "conf">
Chilex Argentina
</div>
<div class = "targetHour"> 08:00 pm </div>
</div>
</div>'''

soup = BeautifulSoup(data, 'lxml')
# Collect names and times in document order: name, hour, name, hour, ...
items = [item.text.strip() for item in soup.select('.conf, .targetHour')]
# Step through the flat list two items at a time
for i in range(0, len(items), 2):
    print(items[i], items[i + 1])
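
The even-length assumption can be made explicit so that an odd list fails loudly instead of raising an IndexError on the last item. The following is only a sketch: `pair_up` is a hypothetical helper name, and the `items` list stands in for the scraped values.

```python
def pair_up(items):
    """Group a flat [name, hour, name, hour, ...] list into (name, hour) tuples."""
    if len(items) % 2:
        raise ValueError(f'expected an even number of items, got {len(items)}')
    # Even-indexed entries are names, odd-indexed entries are hours
    return list(zip(items[::2], items[1::2]))


# Stand-in for the values collected from '.conf, .targetHour'
items = ['Brazil vs. Colombia', '08:00 pm', 'Chilex Argentina', '08:00 pm']

pairs = pair_up(items)
for name, hour in pairs:
    print(name, hour)
```

An even count is necessary but not sufficient: if one match lacks a time and another lacks a name, the list is still even yet the pairs are misaligned, so this check only catches the simplest failure.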
    
