【Question title】: Scrape specific div with Selenium with Python
【Posted】: 2021-12-26 09:13:25
【Question description】:

I have HTML from which I need to scrape the values 1.30, 2.30, 1.31 and 2.31 out of the elements <div class="odds ng-star-inserted"> 1.30 </div><div class="odds ng-star-inserted"> 2.30 </div><div class="odds ng-star-inserted"> 1.31 </div><div class="odds ng-star-inserted"> 2.31 </div>, but my code returns only 1.30 and 2.30 for every row.

The result must be:

Netherlands\nSouth Korea 1.30\n2.30 Germany\nJapan 1.31\n2.31

But I get:

Netherlands\nSouth Korea 1.30\n2.30 Germany\nJapan 1.30\n2.30

This is the Python code:

teams = []
btts = []
odds_events = []

box = driver.find_element(By.XPATH, '//*[@id="page"]/div[2]')
#Looking for 'sports titles'
sport_title = box.find_element(By.CLASS_NAME, 'sport-name')

parent = sport_title.find_element(By.XPATH, './..')
grandparent = parent.find_element(By.XPATH, './..').find_element(By.XPATH, './..').find_element(By.XPATH, './..')

single_row_events = grandparent.find_elements(By.CLASS_NAME, 'event')

for match in single_row_events:
    odds_event = match.find_elements(By.CLASS_NAME, 'games')
    odds_events.append(odds_event)
    # Scrape teams
    for team in match.find_elements(By.CLASS_NAME, 'rivals'):
        teams.append(team.text)

for odds_event in odds_events:
    for n, box in enumerate(odds_event):
        rows = box.find_elements(By.XPATH, '//div[@class="game g2 ng-star-inserted"]')
        if n == 0:
            btts.append(rows[0].text)

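If I set rows = box.find_elements(By.XPATH, './/*') together with if n == 2:, I get this error:

ValueError: All arrays must be of the same length

But if I set if n == 0:, I get a good result, only for <div class="game g3 ng-star-inserted">, so in that case the result is the one below, which is not what I need.

Netherlands\nSouth Korea 1.10\n2.10\n3.10 Germany\nJapan 1.11\n2.11\n3.11

This is the HTML code: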

  <div id="events">
    <game-filter class="ng-star-inserted">
      <div id="sport-legend" class="single">
        <div class="sport-name"> Football </div>
        <div class="games g3">
          <div class="game ng-star-inserted">
            <div class="game-name"> KI </div>
            <div class="selections s3 ng-star-inserted">
              <div class="selection ng-star-inserted"> Home </div>
              <div class="selection ng-star-inserted"> Away </div>
            </div>
          </div>
          <div class="game ng-star-inserted">
            <div class="game-name"> UG </div>
            <div class="selections s3 ng-star-inserted">
              <div class="selection ng-star-inserted"> Over </div>
              <div class="selection ng-star-inserted"> O/U </div>
              <div class="selection ng-star-inserted"> Under </div>
            </div>
          </div>
          <div class="game ng-star-inserted">
            <div class="game-name"> BTTS </div>
            <div class="selections s2 ng-star-inserted">
              <div class="selection ng-star-inserted"> GG </div>
              <div class="selection ng-star-inserted"> NG </div>
            </div>
          </div>
        </div>
      </div>
    </game-filter>
    <standard-item-info class="event ng-star-inserted">
      <div class="details">
        <div class="info">
          <div class="time">01:01</div>
          <div class="date">01.01.</div>
        </div>
        <div class="rivals">
          <div class="league">
            <!---->
            <span class="time-special ng-star-inserted">VIRT 10'
            </span> EL
          </div>
          <div class="home"> Netherlands </div>
          <div class="away"> South Korea </div>
        </div>
      </div>
      <standard-item-games class="games g3 ng-star-inserted">
        <div class="game g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.10 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.10 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.10 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.20 </div>
            </standard-item-game>
            <div class="odds limit ng-star-inserted"> 2.20 </div>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.20 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.30 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.30 </div>
            </standard-item-game>
          </div>
        </div>
      </standard-item-games>
      <div class="show-all-expand ng-star-inserted">
        <div class="event-expand">
          <div class="icon"></div>
        </div>
      </div>
    </standard-item-info>
    <standard-item-info class="event ng-star-inserted">
      <div class="details">
        <div class="info">
          <div class="time">01:01</div>
          <div class="date">01.01.</div>
        </div>
        <div class="rivals">
          <div class="league">
            <!---->
            <span class="time-special ng-star-inserted">VIRT 10'
            </span> EL
          </div>
          <div class="home"> Germany </div>
          <div class="away"> Japan </div>
        </div>
      </div>
      <standard-item-games class="games g3 ng-star-inserted">
        <div class="game g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.11 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.11 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.11 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 g3 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.21 </div>
            </standard-item-game>
            <div class="odds limit ng-star-inserted"> 2.21 </div>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 3.21 </div>
            </standard-item-game>
          </div>
        </div>
        <div class="game g2 ng-star-inserted">
          <div class="ng-star-inserted">
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 1.31 </div>
            </standard-item-game>
            <standard-item-game class="ng-star-inserted">
              <div class="odds ng-star-inserted"> 2.31 </div>
            </standard-item-game>
          </div>
        </div>
      </standard-item-games>
      <div class="show-all-expand ng-star-inserted">
        <div class="event-expand">
          <div class="icon"></div>
        </div>
      </div>
    </standard-item-info>
  </div>
</div>

【Question comments】:

    Tags: python python-3.x selenium xpath scrape


    【Solution 1】:

    One solution:

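    from selenium.webdriver.common.by import By

    # find_elements_by_xpath was deprecated in Selenium 4 and later removed;
    # find_elements(By.XPATH, ...) is the equivalent call.
    teamsdiv = driver.find_elements(By.XPATH, "//div[@id='events']//div[@class='home' or @class='away']")
    notesdiv = driver.find_elements(By.XPATH, "//div[@id='events']//standard-item-games")
    
    # Team names arrive in document order: home, away, home, away, ...
    teams = []
    for i in range(0, len(teamsdiv), 2):
        teams.append([teamsdiv[i].text, teamsdiv[i+1].text])
    
    # Each container's text holds one odds value per line;
    # the last two lines are the odds of the final "game g2" block.
    notes = []
    for i in range(len(notesdiv)):
        notes.append(notesdiv[i].text.split('\n')[-2:])
    
    for i in range(len(notes)):
        print(teams[i], notes[i])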
    

    Result:

    ['Netherlands', 'South Korea'] ['1.30', '2.30']
    ['Germany', 'Japan'] ['1.31', '2.31']
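
A likely reason the question's loop repeats 1.30 and 2.30 is that its XPath starts with `//`, which searches the whole document, so `rows[0]` is always the first event's block. The sketch below illustrates the difference between a document-wide and a per-event search. It is only an illustration under assumptions: it uses the standard-library `xml.etree.ElementTree` as a stand-in for Selenium, and `HTML` is a trimmed, hypothetical copy of the page structure from the question (the wrapper elements around each `odds` div are omitted).

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the question's markup: two events,
# each with a three-way "game g3" block and a two-way "game g2" block.
HTML = """
<div id="events">
  <standard-item-info class="event ng-star-inserted">
    <div class="rivals">
      <div class="home"> Netherlands </div>
      <div class="away"> South Korea </div>
    </div>
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.10 </div>
        <div class="odds ng-star-inserted"> 2.10 </div>
        <div class="odds ng-star-inserted"> 3.10 </div>
      </div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.30 </div>
        <div class="odds ng-star-inserted"> 2.30 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
  <standard-item-info class="event ng-star-inserted">
    <div class="rivals">
      <div class="home"> Germany </div>
      <div class="away"> Japan </div>
    </div>
    <standard-item-games class="games g3 ng-star-inserted">
      <div class="game g3 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.11 </div>
        <div class="odds ng-star-inserted"> 2.11 </div>
        <div class="odds ng-star-inserted"> 3.11 </div>
      </div>
      <div class="game g2 ng-star-inserted">
        <div class="odds ng-star-inserted"> 1.31 </div>
        <div class="odds ng-star-inserted"> 2.31 </div>
      </div>
    </standard-item-games>
  </standard-item-info>
</div>
"""

G2_BLOCK = ".//div[@class='game g2 ng-star-inserted']"
ODDS = ".//div[@class='odds ng-star-inserted']"


def odds_text(game):
    """Return the stripped text of every odds div inside one game block."""
    return [odds.text.strip() for odds in game.findall(ODDS)]


root = ET.fromstring(HTML)
events = root.findall(".//standard-item-info")

# Document-wide search repeated for each event: index 0 is always
# the first event's block, so every row gets the same odds.
global_rows = [odds_text(root.findall(G2_BLOCK)[0]) for _ in events]

# Search scoped to each event: every row gets its own odds.
scoped_rows = [odds_text(event.find(G2_BLOCK)) for event in events]

print(global_rows)
print(scoped_rows)
```

In Selenium, the analogous change to the question's loop would be a leading dot in the XPath, for example `box.find_elements(By.XPATH, './/div[@class="game g2 ng-star-inserted"]')`, so that the search starts at `box` rather than at the document root.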
    

    【Comments】:
