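[Posted]: 2021-10-27 21:01:21
[Question]:
I'm running this code to scrape a dynamic website with Selenium. Instead of stepping through the for loop as written and giving me the data from the other elements that share the same class name, it just repeats the data from the first element.
Code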
import time
from selenium import webdriver
from selenium.webdriver.chrome import service
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

ser = Service(r"C:\Program Files (x86)\chromedriver.exe")
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options, service=ser)
driver.get('https://soundcloud.com/jujubucks')
print(driver.title)

song_contents = driver.find_elements(By.CLASS_NAME, 'soundList__item')
song_list = []
for song in song_contents:
    search = driver.find_element(By.CLASS_NAME, 'soundTitle__usernameText').text
    search_song = driver.find_element(By.XPATH, '//span[@class=""]').text
    search_date = driver.find_element(By.CLASS_NAME, 'sc-visuallyhidden').text
    search_plays = driver.find_element(By.XPATH, '//*[@id="content"]/div/div[4]/div[1]/div/div[2]/div/div[2]/ul/li[1]/div/div/div/div[2]/div[4]/div[2]/div/ul/li/span/span[2]').text
    song = {
        'Artist': search,
        'Song_title': search_song,
        'Date': search_date,
        'Streams': search_plays
    }
    song_list.append(song)

df = pd.DataFrame(song_list)
print(df)
driver.quit()
This is the output it gives. There is only one set of data, repeated, instead of moving on to the other sets.
Output
Stream Juju Bucks music | Listen to songs, albums, playlists for free on SoundCloud
       Artist                              Song_title               Date Streams
0  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
1  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
2  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
3  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
4  Juju Bucks  Squad Too Deep Ft. Cool Prince (Outro)  Posted 1 year ago      31
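[Comments]:
-
Your locators are the same on every iteration. Perhaps you meant to do "song.find_element..."? I would also avoid using "song" as the array... (probably not the problem, it's just confusing next to the iterator variable...) Maybe use "song-info" instead?
-
Shouldn't they be? All the songs have the same element names or structure.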
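A minimal sketch of what the first comment seems to be suggesting: call find_element on each list item rather than on driver, so every lookup is scoped to one song card. The class names come from the question; the helper name extract_songs and the two BY_* constants are made up for this illustration, and whether these locators pick the right child inside each card has not been checked against the live page. The play count is left out because the question only gives an absolute XPath for it (hard-wired to li[1]), and a relative equivalent cannot be derived from the post.

```python
# Locator strategies. These are the string values behind Selenium's
# By.CLASS_NAME and By.XPATH, spelled out so the helper needs no imports.
BY_CLASS_NAME = "class name"
BY_XPATH = "xpath"


def extract_songs(driver):
    """Return one dict per song card, searching inside each card only."""
    song_list = []
    for item in driver.find_elements(BY_CLASS_NAME, "soundList__item"):
        # item.find_element(...) searches the descendants of this card;
        # driver.find_element(...) would return the first match on the
        # whole page every time, which is what produced the repeated rows.
        song_info = {
            "Artist": item.find_element(
                BY_CLASS_NAME, "soundTitle__usernameText").text,
            # The leading "." makes the XPath relative to this card;
            # a bare "//" would start from the document root again.
            "Song_title": item.find_element(
                BY_XPATH, './/span[@class=""]').text,
            "Date": item.find_element(
                BY_CLASS_NAME, "sc-visuallyhidden").text,
        }
        song_list.append(song_info)
    return song_list
```

With the driver from the question already on the page, song_list = extract_songs(driver) would stand in for the original loop, and pd.DataFrame(song_list) works as before. Note the dict is named song_info here (with an underscore, since a hyphen is not valid in a Python name) so it no longer shadows the loop variable.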
Tags: python pandas selenium for-loop selenium-webdriver