【Question title】: Why is Selenium not looping properly?
【Posted】: 2021-10-27 21:01:21
【Question】:

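I ran this code to scrape a dynamic website with Selenium. Instead of iterating as the for loop directs and returning data from the other elements that share the same class name, it only repeats the data from the first element.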

Code

import time
from selenium import webdriver
from selenium.webdriver.chrome import service
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

ser = Service(r"C:\Program Files (x86)\chromedriver.exe")  # raw string so the backslashes are not read as escapes
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options,service=ser)
driver.get('https://soundcloud.com/jujubucks')
print(driver.title)

song_contents = driver.find_elements(By.CLASS_NAME, 'soundList__item')


song_list = []

for song in song_contents:
 search = driver.find_element(By.CLASS_NAME, 'soundTitle__usernameText').text
 search_song = driver.find_element(By.XPATH, '//span[@class=""]').text
 search_date = driver.find_element(By.CLASS_NAME, 'sc-visuallyhidden').text
 search_plays = driver.find_element(By.XPATH, '//*[@id="content"]/div/div[4]/div[1]/div/div[2]/div/div[2]/ul/li[1]/div/div/div/div[2]/div[4]/div[2]/div/ul/li/span/span[2]').text
 song ={
     'Artist': search, 
     'Song_title': search_song, 
     'Date': search_date,
     'Streams': search_plays
 }

 song_list.append(song)

df = pd.DataFrame(song_list)
print(df)

driver.quit()

This is the output it gives: the same set of data repeated, instead of moving on to the other entries.

Output

Stream Juju Bucks music | Listen to songs, albums, playlists for free on SoundCloud
       Artist                              Song_title               Date Streams
0  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
1  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
2  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
3  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
4  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31

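【Comments】:

  • Your locators are the same on every iteration. Maybe you want to do "song.find_element..."? I would also avoid reusing "song" for the data you build... (probably not the problem, it is just easy to confuse with the loop variable...) Maybe use "song-info" instead?
  • Shouldn't they be? All the songs have the same element names and structure.

Tags: python pandas selenium for-loop selenium-webdriver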


【Solution 1】:

To find an element within another element, start the XPath with a dot, like this:

driver.get("https://soundcloud.com/jujubucks")

wait = WebDriverWait(driver,30)

# Close Cookie pop-up
wait.until(EC.element_to_be_clickable((By.ID,"onetrust-accept-btn-handler"))).click()

song_contents = driver.find_elements(By.CLASS_NAME, 'soundList__item')

for option in song_contents:
    title = option.find_element(By.XPATH, ".//a[contains(@class,'soundTitle__title')]/span").text  # Extract the title from this particular song.
    print(title)
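
The effect of the search context can be reproduced without a browser. The sketch below is only an analogy: it uses Python's standard-library `xml.etree.ElementTree` as a stand-in for the WebDriver, and the markup in `PAGE` is made up for illustration (only the class names are taken from the question). Searching from the document root inside the loop returns the first match every time, which is what `driver.find_element(...)` does in the question; searching from each item returns that item's own title. In Selenium there is one extra catch: an XPath that starts with `//` is evaluated against the whole document even when it is called on an element, which is why the leading dot matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely shaped like a list of tracks.
PAGE = """
<ul>
  <li class="soundList__item"><a class="soundTitle__title"><span>Track A</span></a></li>
  <li class="soundList__item"><a class="soundTitle__title"><span>Track B</span></a></li>
  <li class="soundList__item"><a class="soundTitle__title"><span>Track C</span></a></li>
</ul>
"""

root = ET.fromstring(PAGE)
items = root.findall(".//li[@class='soundList__item']")

# Searching from the document root inside the loop:
# the first matching <span> is returned on every iteration.
repeated = [root.find(".//span").text for _ in items]

# Searching from each item: the path is resolved relative to that item.
scoped = [item.find(".//span").text for item in items]

print(repeated)  # ['Track A', 'Track A', 'Track A']
print(scoped)    # ['Track A', 'Track B', 'Track C']
```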

Update:

for i in range(1, 21):
    # XPath positions are 1-based, so this selects the i-th list item.
    song_contents = driver.find_element(By.XPATH, "//li[@class='soundList__item'][{}]".format(i))
    # Scroll the item into view so the page loads the next batch of songs.
    driver.execute_script("arguments[0].scrollIntoView(true);", song_contents)
    title = song_contents.find_element(By.XPATH, ".//a[contains(@class,'soundTitle__title')]/span").text  # Use a dot in the XPath to search within this element
    print(title)

Output

Squad Too Deep Ft. Cool Prince (Outro)
Tropikana ft. P-Dogg Amazing
Party Ka Mngani Ft. X-Poll
Joy Ft. Black Sushi & Gavin Bowden
Amazing ft. X-Poll
Owami
Phakade
Ain't No Thang ft. Musiholiq
Bhalela Ft. X-Poll
Piece Of Me ft. King Cobra
Put Me Down ft. Fabee (Interlude)
Way Up ft. Musiholiq
Carlito ft. Captain Blu
Blaze
Talk About Me (Ft. Cool Prince)
Get Em Up
Ntate Modimo
In Bucks We Trust (Gold Edition)
Intro (In Bucks We Trust )
Juju Bucks - Show Me (ft. Mbali Zondi)

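【Comments】:

  • Why does this return only 5 titles instead of the whole catalogue?
  • @HoustonKhanyile - Songs appear as you scroll down, and the list in the DOM is updated accordingly. Apply scrolling.
  • @HoustonKhanyile - Updated the answer. I ran into the same thing. You can refer to this - https://stackoverflow.com/q/69375160/16452840
  • This update made things worse rather than iterating. It only fetches the first element. I have tried my best to fix it, but I can't.
  • @HoustonKhanyile - It works fine for me. I have added the output of the code to the answer. Not sure how you applied the scrolling.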
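
One way to apply the scrolling described in these comments without hard-coding the number of songs is to keep scrolling the last loaded item into view until the item count stops growing. The sketch below is an assumption-laden illustration, not the answerer's code: `collect_titles`, `pause`, and `max_rounds` are hypothetical names, the fixed sleep is a crude stand-in for a proper wait, and the locator strategy is passed as the plain string `"xpath"` (the value behind Selenium's `By.XPATH`) so that the function only relies on the `find_elements`, `find_element`, and `execute_script` methods of whatever driver it is given.

```python
import time

XPATH = "xpath"  # the string value behind selenium's By.XPATH

# Locators taken from the answer; the item locator is loosened with contains().
ITEM_XPATH = "//li[contains(@class,'soundList__item')]"
TITLE_XPATH = ".//a[contains(@class,'soundTitle__title')]/span"


def collect_titles(driver, pause=1.0, max_rounds=50):
    """Scroll until no new items load, then return each item's title."""
    seen = 0
    for _ in range(max_rounds):
        items = driver.find_elements(XPATH, ITEM_XPATH)
        if len(items) == seen:
            # Nothing new appeared after the last scroll, so stop.
            break
        seen = len(items)
        # Bring the last loaded item into view so the page fetches more.
        driver.execute_script("arguments[0].scrollIntoView(true);", items[-1])
        time.sleep(pause)  # assumed to be long enough for the next batch

    # Re-query once at the end and read each title relative to its own item.
    items = driver.find_elements(XPATH, ITEM_XPATH)
    return [item.find_element(XPATH, TITLE_XPATH).text for item in items]
```

With a real browser session this would be called as `titles = collect_titles(driver)` after `driver.get(...)` and after dismissing the cookie pop-up. `max_rounds` only caps the loop in case the page keeps loading indefinitely.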