Why am I getting just one item (instead of multiple items) in a pandas column? Answers

[Question title]: Why am I getting just one item (instead of multiple items) in a pandas column?
[Posted]: 2021-12-29 20:59:59
[Question]:

Here is my code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome(service=Service(executable_path=ChromeDriverManager().install()))
driver.maximize_window()
driver.get('https://quotes.toscrape.com/')

df = pd.DataFrame(
    {        
        'Quote': [''],        
        'Author': [''],
        'Tags': [''],
    }
)

quotes = driver.find_elements(By.CSS_SELECTOR, '.quote')
for quote in quotes:
    text = quote.find_element(By.CSS_SELECTOR, '.text')
    author = quote.find_element(By.CSS_SELECTOR, '.author')
    
    tags = quote.find_elements(By.CSS_SELECTOR, '.tag')
    for tag in tags:
        quote_tag = tag

    df = df.append(
        {            
            'Quote': text.text,
            'Author': author.text,            
            'Tags': quote_tag.text,
        },        
        ignore_index = True
    )

df.to_csv('C:/Users/Jay/Downloads/Python/!Learn/practice/scraping/selenium/quotes.csv', index=False)

I expected this result:

Quote	Author	Tags
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”	Albert Einstein	change deep-thoughts thinking world

Instead, I get this:

Quote	Author	Tags
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”	Albert Einstein	world

I only get the last item in the Tags column instead of all four.

If I run:

quotes = driver.find_elements(By.CSS_SELECTOR, '.quote')
for quote in quotes:        
    tags = quote.find_elements(By.CSS_SELECTOR, '.tag')
    for tag in tags:
        quote_tag = tag
        print(quote_tag.text)

I get:

change
deep-thoughts
thinking
world
etc

So that part of the code works.

Why isn't the Tags column populated correctly?

[Discussion]:

Tags: python pandas selenium web-scraping

[Answer 1]:

With your code

for tag in tags:
    quote_tag = tag

you reassign quote_tag to tag on every iteration of the for loop, overwriting whatever was previously stored in quote_tag. So after the last iteration, quote_tag contains only the last tag.
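The overwriting is easy to reproduce without a browser. A minimal sketch, assuming plain strings (the four tags from the question) stand in for the WebElement objects that `find_elements` returns; the names `tag_texts` and `collected` are illustrative only:

```python
# Stand-ins for the .text values of one quote's tag elements
# (plain strings here, so no browser is needed).
tag_texts = ["change", "deep-thoughts", "thinking", "world"]

# Reassigning inside the loop keeps only the most recent value.
quote_tag = None
for tag in tag_texts:
    quote_tag = tag
print(quote_tag)  # world

# Collecting every value instead keeps all of them.
collected = []
for tag in tag_texts:
    collected.append(tag)
print(" ".join(collected))  # change deep-thoughts thinking world
```

The first loop ends holding only "world", which is exactly the value that lands in the Tags column; the second loop keeps every tag so they can be joined afterwards.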

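If you want to join all the tags together, you need something like:

quote_tag = ''
for tag in tags:
    quote_tag += ' ' + tag.text  # add the element's text, not the WebElement itself
quote_tag = quote_tag.strip()  # drop the leading space

and then use 'Tags': quote_tag in df.append, since quote_tag is now a plain string and has no .text attribute.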

[Discussion]:

[Answer 2]:

For your loop, use the following code:

quote_tags = []
for tag in tags:
    quote_tags.append(tag.text)

df = df.append(
    {            
        'Quote': text.text,
        'Author': author.text,            
        'Tags': ' '.join(quote_tags),
    },        
    ignore_index = True
)

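You may have noticed that the only tag that gets added (world) happens to be the last one... That is no coincidence. You loop over the tags and assign each one to the quote_tag variable, but you never do anything with it, so each iteration simply overwrites the value set by the previous one. When the loop finishes, quote_tag holds the last tag.

[Discussion]: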
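The same ' '.join idea can also be combined with building the DataFrame in a single step. A minimal sketch, assuming hypothetical sample data in place of the live Selenium results (the names `scraped` and `rows` are illustrative): each row is collected as a dict and pd.DataFrame is called once after the loop. This also sidesteps DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0, and it avoids the empty first row created by the question's initial pd.DataFrame({'Quote': [''], ...}).

```python
import pandas as pd

# Hypothetical sample data standing in for what the Selenium loop scrapes:
# one (quote text, author, list of tag texts) tuple per .quote element.
scraped = [
    (
        "“The world as we have created it is a process of our thinking. "
        "It cannot be changed without changing our thinking.”",
        "Albert Einstein",
        ["change", "deep-thoughts", "thinking", "world"],
    ),
]

rows = []
for text, author, tags in scraped:
    rows.append(
        {
            "Quote": text,
            "Author": author,
            "Tags": " ".join(tags),  # all tags in one space-separated cell
        }
    )

# Build the DataFrame once from the collected rows instead of calling
# df.append inside the loop (and without a placeholder empty row).
df = pd.DataFrame(rows, columns=["Quote", "Author", "Tags"])
print(df.loc[0, "Tags"])  # change deep-thoughts thinking world
```

In the real scraper, tags would be [tag.text for tag in quote.find_elements(By.CSS_SELECTOR, '.tag')], and df.to_csv(..., index=False) can be called on the result unchanged.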