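【Question title】: Python Web Scraping: Clicking links one by one on Ajax Site
【Posted】: 2016-07-09 15:42:34
【Question】:

I have a list of search criteria saved in a CSV file. I want to loop through each criterion to generate the corresponding search results on a website. For each set of search results (links), I want to click each link and then scrape data from the page that opens. Unfortunately, I am having trouble getting into each link. If anyone could offer some insight, I would greatly appreciate it.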

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

# Read the list of CAS numbers to be searched
data = pd.read_csv("NPRI CACs.csv", names=["CAS Number", "Chemical Name"])
# dropna() returns a new DataFrame; reset the index so CAS[i] stays contiguous
data = data.dropna().reset_index(drop=True)
CAS = data["CAS Number"]

# Login page URL
url = 'http://www.lifelabs.msdss.com/Login.aspx?ReturnUrl=%2fMainMenu.aspx%3ffm%3d0%26tb%3d0'

# Sign into SafeTec
browser = webdriver.Firefox()
browser.get(url)
browser.find_element_by_class_name("text").click()

# Conduct MSDS searches on SafeTec
for i in range(10):
    try:
        Ingredient_CAS_Number = browser.find_element_by_id("placeBody_dynField48_txtTextBox")
        Ingredient_CAS_Number.send_keys(CAS[i])
        browser.find_element_by_id("placeBody_linkSearchBottom").click()

        # Collect the href of every search result
        list_links = browser.find_elements_by_css_selector("a[href*='MSDSDetail']")
        links = [link.get_attribute('href') for link in list_links]

        # Visit each result page and grab the product name field
        Product_Name = []
        for link in links:
            browser.get(link)
            product = browser.find_element_by_id("placeBody_dynField1_txtTextBox")
            Product_Name.append(product)
        print(Product_Name)

        # Return to the start page for the next search
        browser.get(url)
    except Exception as exc:
        # Report which CAS number failed and why, then move on
        print(CAS[i], exc)
        continue

【Question comments】:

    Tags: python loops selenium hyperlink screen-scraping


    【Solution 1】:

    I managed to solve this with the code below, although the solution is somewhat inelegant.

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup

    # Read the list of CAS numbers to be searched
    data = pd.read_csv("NPRI CACs.csv", names=["CAS Number", "Chemical Name"])
    # dropna() returns a new DataFrame; reset the index so CAS[i] stays contiguous
    data = data.dropna().reset_index(drop=True)
    CAS = data["CAS Number"]

    # Login page URL
    url = 'http://www.lifelabs.msdss.com/Login.aspx?ReturnUrl=%2fMainMenu.aspx%3ffm%3d0%26tb%3d0'

    # Sign into SafeTec
    browser = webdriver.Firefox()
    browser.get(url)
    browser.find_element_by_class_name("text").click()

    # Conduct MSDS searches on SafeTec
    for i in range(2):
        Ingredient_CAS_Number = browser.find_element_by_id("placeBody_dynField48_txtTextBox")
        Ingredient_CAS_Number.send_keys(CAS[i])
        browser.find_element_by_id("placeBody_linkSearchBottom").click()

        # Store the link texts rather than the elements, because element
        # references go stale once the browser navigates away
        list_links = browser.find_elements_by_css_selector("a[href*='MSDSDetail']")
        all_results = [link.text for link in list_links]

        # Re-find each link by its text, open it, then go back to the results
        for result in all_results:
            browser.find_element_by_link_text(result).click()
            browser.back()

        # Return to the start page for the next search
        browser.get(url)
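
A possible alternative to clicking each link and calling `browser.back()` is to copy every `href` into a plain list of strings first and then open the URLs directly, which is roughly what the question's code attempts. The sketch below is only an illustration under stated assumptions: the helper names `collect_hrefs` and `read_field_from_each` are hypothetical, the short polling loop assumes the Ajax results may appear a moment after the search click, and reading the `value` attribute assumes the product name sits in a text input (use `.text` instead if it does not). The strings `"css selector"` and `"id"` are the values of Selenium's `By.CSS_SELECTOR` and `By.ID`, passed to the two-argument `find_element` / `find_elements` form.

```python
import time


def collect_hrefs(browser, css_selector, timeout=10.0, poll=0.5):
    """Poll until at least one matching link exists, then return the hrefs.

    The hrefs are returned as plain strings, so they remain usable after
    the browser has navigated to another page.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = browser.find_elements("css selector", css_selector)
        # Stop as soon as links are found, or give up once the timeout passes
        if elements or time.monotonic() >= deadline:
            return [el.get_attribute("href") for el in elements]
        time.sleep(poll)


def read_field_from_each(browser, hrefs, field_id):
    """Open each URL and return the `value` attribute of the given field."""
    values = []
    for href in hrefs:
        browser.get(href)
        field = browser.find_element("id", field_id)
        # Assumption: the field is an <input>, so its content is in `value`
        values.append(field.get_attribute("value"))
    return values
```

Inside the search loop this might be used as `links = collect_hrefs(browser, "a[href*='MSDSDetail']")` followed by `names = read_field_from_each(browser, links, "placeBody_dynField1_txtTextBox")`. Because only strings are kept between page loads, no element has to be looked up again after navigating.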
    

    【Comments】:
