【Question title】: How to write each list to a CSV file
【Posted】: 2018-02-04 17:41:27
【Question】:

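I scraped three lists from a website and printed them with Selenium: the teams, the odds and the hrefs. However, these lists are not written to the CSV file correctly. I want each list to go into columns 1, 2 and 3. Any help?

Instead, I mostly get entries like: <selenium.webdriver.remote.webelement.WebElement (session="211dc26889dedb4d1d5db5f355c9b225", element="0.936313100855265-9")>

My data looks like this: https://ibb.co/iW6rbk

What I want it to look like: https://ibb.co/fhna2Q

I believe this is because it writes the web elements rather than what I actually want. Any suggestions on how to adjust my code so that it writes what I want (the scraped values)?

Thanks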

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    import csv
    import requests
    import time
    from selenium import webdriver
    driver = webdriver.Chrome(executable_path=r'C:\Brother\chromedriver.exe')
    driver.set_window_size(1024, 600)
    driver.maximize_window()

    driver.get('https://www.bookmaker.com.au/sports/soccer/37854435-football-australia-australian-npl-2-new-south-wales/')

    SCROLL_PAUSE_TIME = 0.5

    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    time.sleep( 5 )

    #link
    elems = driver.find_elements_by_css_selector("h3 a[Href*='/sports/soccer']")
    for elem in elems:
        print(elem.get_attribute("href"))

    #TEAM
    langs1 = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
    for lang in langs1:
        print (lang.text)

    time.sleep( 10)

    #ODDS
    langs = driver.find_elements_by_css_selector(".row:nth-child(1) span")
    for lang in langs:
        print (lang.text)

    time.sleep( 10 )

    import csv

    with open('I AM HERE12345.csv', 'w') as file:
        writer = csv.writer(file)
        for row in langs, langs1, elems:
            writer.writerow(row)

【Question comments】:

    Tags: python python-3.x selenium selenium-webdriver


    【Answer 1】:

    There are two problems in your code.

    #TEAM
    langs1 = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
    for lang in langs1:
        print (lang.text)
    

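    langs1 is a list of elements. You print each element's text, but the list itself still holds only the WebElement objects, not their text. Since you never stored the text anywhere, it cannot end up in the CSV. So I changed it as below. Not the most optimized code, but it works: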

    langs1 = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
    langs1_text = []
    
    for lang in langs1:
        print(lang.text)
        langs1_text.append(lang.text)
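The same fix can be written as a list comprehension over .text. Because a real Selenium session needs a browser, the sketch below assumes a hypothetical FakeElement class that only models .text. It also shows why the raw WebElement strings appeared in the file: csv.writer converts every non-string value with str(), which for such objects falls back to their repr.

```python
import csv
import io


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text is modelled)."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Mimics the kind of repr Selenium shows for a WebElement
        return f"<FakeElement text={self.text!r}>"


elements = [FakeElement("CSKA Moscow"), FakeElement("Zenit")]

# Writing the element objects: csv calls str() on them, so the repr lands in the file
raw = io.StringIO()
csv.writer(raw).writerow(elements)

# Extracting .text first writes the scraped values instead
texts = [el.text for el in elements]
clean = io.StringIO()
csv.writer(clean).writerow(texts)

print(raw.getvalue())
print(clean.getvalue())
```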
    

    Next, your CSV loop is wrong:

    for row in langs_text, langs1_text, elem_href:
        writer.writerow(row)
    

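    This loop writes each whole list as a single row, so you get three long rows (all odds, all teams, all links) instead of one row per match. What you need is one value from each list at a time: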

    for row in zip(langs_text, langs1_text, elem_href):
        writer.writerow(row)
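The difference between the two loops can be checked without a browser. The sketch below uses made-up sample lists (the odds and example.com links are hypothetical) and writes to an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical sample data standing in for the three scraped lists
odds = ["1.80", "2.10", "3.40"]
teams = ["CSKA Moscow", "Zenit", "Krasnodar"]
links = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# Wrong: iterating over the tuple of lists makes each whole list one row
wrong = io.StringIO()
writer = csv.writer(wrong)
for row in odds, teams, links:
    writer.writerow(row)

# Right: zip() pairs the i-th item of every list into one row
right = io.StringIO()
writer = csv.writer(right)
for row in zip(odds, teams, links):
    writer.writerow(row)

print(wrong.getvalue())
print(right.getvalue())
```

Note that zip() stops at the shortest list. If the lists can differ in length, itertools.zip_longest pads the missing cells with None instead, and csv writes None as an empty field.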
    

    Edit-1

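    That makes your code work, but the approach is not right. When the data you want comes in several sections (one per match here), you should loop over each section and collect the data from within that section, so that the values belonging to one match stay together in one row.

    So I changed the code to: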

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import csv
    import time
    
    driver = webdriver.Chrome()
    driver.set_window_size(1024, 600)
    driver.maximize_window()
    
    driver.get('https://www.bookmaker.com.au/sports/soccer/36116103-football-russia-russian-national-football-league/')
    
    SCROLL_PAUSE_TIME = 0.5
    
    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")
    
    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    
        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)
    
        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    
    time.sleep(5)
    
    # Each ".fullbox" block holds one match: its link, team name and odds
    sections = driver.find_elements(By.CSS_SELECTOR, ".fullbox")
    
    # newline='' stops Windows from inserting blank rows (see Edit-2)
    with open('I AM HERE12345.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        for section in sections:
            # Search inside this section only, so the three values stay aligned
            link = section.find_element(By.CSS_SELECTOR, "h3 a").get_attribute("href")
            team_name = section.find_element(By.CSS_SELECTOR, "tr.row[data-teamname]").get_attribute("data-teamname")
            bet = section.find_element(By.CSS_SELECTOR, "a.odds.quickbet").text
    
            writer.writerow((bet, team_name, link))
    

    The CSV is generated correctly.
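To see why per-section extraction matters, the sketch below compares it with page-wide flat lists when one match has no odds. The section values and example.com links are hypothetical. In the real Selenium code a missing element makes find_element raise NoSuchElementException rather than return None, so this assumes the extraction step has already turned a missing value into None.

```python
import csv
import io

# Hypothetical values already pulled out of three ".fullbox" sections;
# the second match has no odds posted
sections = [
    {"bet": "1.80", "team": "CSKA Moscow", "link": "https://example.com/a"},
    {"bet": None, "team": "Zenit", "link": "https://example.com/b"},
    {"bet": "3.40", "team": "Krasnodar", "link": "https://example.com/c"},
]

# Page-wide selectors return flat lists: the missing odds value simply vanishes,
# so zip() pairs the remaining odds with the wrong teams
flat_bets = [s["bet"] for s in sections if s["bet"] is not None]
flat_teams = [s["team"] for s in sections]
misaligned = list(zip(flat_bets, flat_teams))

# One row per section keeps each match's values together
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["bet", "team", "link"])
writer.writeheader()
for section in sections:
    writer.writerow(section)  # None is written as an empty field

print(misaligned)
print(out.getvalue())
```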

    Edit-2

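    The blank-row problem is Windows-specific, which is why it did not appear on my Mac. The csv module already ends every row with \r\n, and Windows text mode then translates the \n into \r\n again, giving \r\r\n. Either of the following disables that translation: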

    with open('I AM HERE12345.csv', 'w', newline='') as file:
    

    with open('I AM HERE12345.csv', 'w', newline='\n') as file:
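The doubled line ending can be reproduced on any platform. The sketch below assumes that passing newline="\r\n" to io.TextIOWrapper is a fair stand-in for Windows' default text mode (where os.linesep is "\r\n"), and compares it with newline="". The row values are hypothetical.

```python
import csv
import io


def csv_bytes(newline):
    r"""Write two CSV rows through a text layer and return the bytes produced.

    newline="\r\n" mimics Windows' default text-mode translation;
    newline="" disables translation, as recommended for csv files.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # csv ends every row with "\r\n" by default
    writer.writerow(["1.80", "CSKA Moscow"])
    writer.writerow(["2.10", "Zenit"])
    text.flush()
    return raw.getvalue()


windows_like = csv_bytes("\r\n")
fixed = csv_bytes("")

print(windows_like)  # rows end in \r\r\n, which spreadsheet tools typically show as blank rows
print(fixed)
```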
    

    【Comments】:

    • I can't seem to apply this to the href: elem = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)") elem_href = [] for elem in elem: print(elem.href) elem_href.append(elem.href)
    • It should be elem_href.append(elem.get_attribute("href")); a WebElement has no .href attribute.
    • I got the following working, but out of curiosity, why does it show None three times? elem = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)") elem_href = [] for elem in elem: print(elem.get_attribute("href")) elem_href.append(elem.get_attribute("href")). It gives: None None None SKA Energiya Khabarovsk CSKA Moscow Zenit Krasnodar
    • You are probably selecting blank elements that have no href. That's why you get None.
    • Is there a way to remove the blank rows in the CSV so the data sits directly below each other? ibb.co/fnt9U5
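Selenium's get_attribute returns None when the element has no attribute or property with that name, which is what happens for td cells. The cleaner fix is a selector that only matches links, such as the question's own "h3 a[Href*='/sports/soccer']"; failing that, the None entries can be filtered out. The list below is hypothetical sample output.

```python
# Hypothetical get_attribute("href") results for a selection that mixes
# table cells (no href, so None) with real <a> links
hrefs = [None, None, None, "https://example.com/a", "https://example.com/b"]

# Drop the None entries so only real links reach the CSV
links = [h for h in hrefs if h is not None]
print(links)
```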