[Posted]: 2021-12-29 20:59:59
[Question]:
Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome(service=Service(executable_path=ChromeDriverManager().install()))
driver.maximize_window()
driver.get('https://quotes.toscrape.com/')

df = pd.DataFrame(
    {
        'Quote': [''],
        'Author': [''],
        'Tags': [''],
    }
)

quotes = driver.find_elements(By.CSS_SELECTOR, '.quote')
for quote in quotes:
    text = quote.find_element(By.CSS_SELECTOR, '.text')
    author = quote.find_element(By.CSS_SELECTOR, '.author')
    tags = quote.find_elements(By.CSS_SELECTOR, '.tag')
    for tag in tags:
        quote_tag = tag
    df = df.append(
        {
            'Quote': text.text,
            'Author': author.text,
            'Tags': quote_tag.text,
        },
        ignore_index=True
    )

df.to_csv('C:/Users/Jay/Downloads/Python/!Learn/practice/scraping/selenium/quotes.csv', index=False)
I expect to get this result:
| Quote | Author | Tags |
|---|---|---|
| “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” | Albert Einstein | change deep-thoughts thinking world |
Instead, I get this:
| Quote | Author | Tags |
|---|---|---|
| “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” | Albert Einstein | world |
I only get the last item in the Tags column instead of all four.
If I run:
quotes = driver.find_elements(By.CSS_SELECTOR, '.quote')
for quote in quotes:
    tags = quote.find_elements(By.CSS_SELECTOR, '.tag')
    for tag in tags:
        quote_tag = tag
        print(quote_tag.text)
I get:
change
deep-thoughts
thinking
world
etc
So that part of the code works.
Why isn't the Tags column being populated correctly?
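A likely explanation, sketched here without a browser so it runs on its own: the inner loop rebinds `quote_tag` on every pass, so by the time the row is appended only the last tag is left. The list `tag_texts` below is hypothetical stand-in data for the `.text` values of the first quote's `.tag` elements; it is not produced by Selenium.

```python
# Hypothetical stand-in for the texts of the first quote's `.tag` elements
# (the real script would read them from the page via Selenium).
tag_texts = ['change', 'deep-thoughts', 'thinking', 'world']

# What the inner loop does: each pass rebinds the same name,
# so only the final value remains once the loop has finished.
for tag in tag_texts:
    quote_tag = tag
last_only = quote_tag

# Joining every tag text into one string keeps all of them.
all_tags = ' '.join(tag_texts)

print(last_only)  # world
print(all_tags)   # change deep-thoughts thinking world
```

Under that assumption, the script would need something like `' '.join(tag.text for tag in tags)` as the `'Tags'` value instead of `quote_tag.text`. The `print` in the second snippet shows all four tags only because it runs inside the inner loop, once per tag.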
[Discussion]:
Tags: python pandas selenium web-scraping