【Question title】: How to save Selenium web-scrape data to CSV instead of printing it to the terminal
【Posted】: 2020-04-12 06:42:27
【Question】:

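I was finally able to scrape data from the website and print the headlines and dates to the terminal. Now I want to save the data to a CSV file instead, with one column for the headlines and one column for the dates. How do I do that?

My code is below: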

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(
    chrome_options=options,
    executable_path=r"//usr/local/Caskroom/chromedriver/81.0.4044.69/chromedriver")

driver.get(
    "https://www.nytimes.com/search?dropmab=true&endDate=20180111&query=nyc&sections=New%20York%7Cnyt%3A%2F%2Fsection%2F39480374-66d3-5603-9ce1-58cfa12988e2&sort=best&startDate=20180107")

myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(
    (By.XPATH, "//figure[@class='css-tap2ym']//following::a[1]"))))

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
            (By.XPATH, "//div[@class='css-vsuiox']//button[@data-testid='search-show-more-button']"))).click()

        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_xpath(
            "//figure[@class='css-tap2ym']//following::a[1]")) > myLength)
        titles = driver.find_elements_by_xpath(
            "//figure[@class='css-tap2ym']//following::a[1]")

        myLength = len(titles)
    except TimeoutException:
        break

headlines_element = driver.find_elements_by_xpath('//p[@class="css-16nhkrn"]')
headlines = [x.text for x in headlines_element]
print('headlines:')
print(headlines, '\n')

dates_element = driver.find_elements_by_xpath("//time[@class='css-17ubb9w']")
dates = [x.text for x in dates_element]
print("dates:")
print(dates, '\n')

for headlines, dates in zip(headlines, dates):
    print("Headlines : Dates")
    print(headlines + ": " + dates, '\n')

driver.quit()

The last block of code is the part that gets the headlines and dates. Thanks in advance for your help!

【Comments】:

    Tags: python selenium web-scraping formatting export-to-csv


    【Solution 1】:

    You can use csv.writer to write the data to a CSV file.

    Usage:

    import csv

    # newline="" lets the csv module manage line endings itself
    with open("your_csv_file", "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Headlines", "Dates"])  # --> Write header
        for h, d in zip(headlines, dates):
            writer.writerow([h, d])  # --> Write data
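
To try the csv.writer approach without launching a browser, here is a minimal sketch that uses made-up sample lists in place of the scraped `headlines` and `dates`. The function name `write_headlines_csv`, the sample data, and the output file name are assumptions for illustration only; they are not part of the original answer.

```python
import csv
import os
import tempfile


def write_headlines_csv(path, headlines, dates):
    """Write a header row, then one row per (headline, date) pair."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Headlines", "Dates"])
        # zip pairs the two lists element by element; writerows writes each pair.
        writer.writerows(zip(headlines, dates))


# Made-up sample data standing in for the scraped lists (hypothetical values).
sample_headlines = ["Snow Blankets the City", "Subway Delays, Explained"]
sample_dates = ["Jan. 7, 2018", "Jan. 8, 2018"]

# Write to the system temp directory so the demo does not clutter the project.
out_path = os.path.join(tempfile.gettempdir(), "headlines_demo.csv")
write_headlines_csv(out_path, sample_headlines, sample_dates)

# Read the file back to confirm that it has two columns per row.
with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

Values that contain commas, such as the dates here, are quoted automatically by csv.writer, so they stay in a single column when the file is read back or opened in a spreadsheet.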
    

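    【Discussion】:

    • I get NameError: name 'csv' is not defined
    • I just added import csv, sorry
    • @shubbbby If this answered your question, you can accept it
    • I get an error where the data is saved vertically [i.stack.imgur.com/XfTn5.png]. I added the code to the bottom of my script like this [i.stack.imgur.com/qQwBp.png]
    • @shubbbby You have to replace for headlines, dates with for h, d in all of your for loops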
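
The last comment is about name shadowing. The sketch below, using made-up sample data, shows what happens when the loop variables reuse the list names: after the loop, `headlines` and `dates` each refer to a single string, so a later `zip` pairs individual characters. This is one plausible cause of the "vertical" output, though the screenshots are not available to confirm it.

```python
# Made-up sample data standing in for the scraped lists (hypothetical values).
headlines = ["First story", "Second story"]
dates = ["Jan. 7, 2018", "Jan. 8, 2018"]

# Problem: the loop variables reuse the list names, so each iteration
# rebinds `headlines` and `dates` to one string.
for headlines, dates in zip(headlines, dates):
    pass

# Both names now hold the last strings, not the original lists.
after_loop = (headlines, dates)

# Zipping two strings pairs them character by character.
shadowed_rows = list(zip(headlines, dates))

# Fix: keep the lists intact by using distinct loop names.
headlines = ["First story", "Second story"]
dates = ["Jan. 7, 2018", "Jan. 8, 2018"]
for h, d in zip(headlines, dates):
    pass
fixed_rows = list(zip(headlines, dates))

print(after_loop)
print(shadowed_rows[:3])
print(fixed_rows)
```

With distinct loop names the lists survive the print loop, so the csv.writer block can be placed after it and still see full headline and date lists.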