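[Posted]: 2021-06-01 15:13:22
[Question]:
I have been following this guide to scrape LinkedIn through Google search. Google's search-result HTML has changed since the guide was written, so I had to modify the code slightly. I now need to get the links from the search results, but I have hit a problem: even after applying the code fix from this post, the program does not return any URLs. I'm not sure what I am doing wrong here.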
import Parameters
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from parsel import Selector
import csv
# defining new variable passing two parameters
writer = csv.writer(open(Parameters.file_name, 'w'))
# writerow() method to the write to the file object
writer.writerow(['Name', 'Job Title', 'Company', 'College', 'Location', 'URL'])
# specifies the path to the chromedriver.exe
driver = webdriver.Chrome('/Users/.../Python Scripts/chromedriver')
driver.get('https://www.linkedin.com')
sleep(0.5)
# locate the email field by id, then send_keys() to simulate keystrokes
username = driver.find_element_by_id('session_key')
username.send_keys(Parameters.linkedin_username)
sleep(0.5)
password = driver.find_element_by_id('session_password')
password.send_keys(Parameters.linkedin_password)
sleep(0.5)
sign_in_button = driver.find_element_by_class_name('sign-in-form__submit-button')
sign_in_button.click()
sleep(3)
driver.get('https://www.google.com')
sleep(3)
search_query = driver.find_element_by_name('q')
search_query.send_keys(Parameters.search_query)
sleep(0.5)
search_query.send_keys(Keys.RETURN)
sleep(3)
################# HERE IS WHERE THE ISSUE LIES ######################
#linkedin_urls = driver.find_elements_by_class_name('iUh30')
linkedin_urls = driver.find_elements_by_css_selector("yuRUbf > a")
for url_prep in linkedin_urls:
    url_prep.get_attribute('href')
#linkedin_urls = [url.text for url in linkedin_urls]
sleep(0.5)
print('Supposed to be URLs')
print(linkedin_urls)
The search parameter is:
search_query = 'site:linkedin.com/in/ AND "python developer" AND "London"'
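One possible reading of the lines marked as the issue, offered as a sketch and not a confirmed fix: in CSS a bare `yuRUbf` is a tag-name selector, so matching a class needs a leading dot (`.yuRUbf > a`). Separately, the loop calls `get_attribute('href')` without storing the result, so printing `linkedin_urls` shows WebElement objects, not URLs. The helper name `collect_result_links` and the constant `RESULT_LINK_SELECTOR` are hypothetical, and the class name `yuRUbf` is taken from the question; it is an assumption that it still matches Google's markup.

```python
# Assumption: result links are <a> elements directly inside an element
# with class "yuRUbf" (class name copied from the question).
RESULT_LINK_SELECTOR = ".yuRUbf > a"  # leading dot = class selector


def collect_result_links(driver, selector=RESULT_LINK_SELECTOR):
    """Return the href of every element matched by `selector`.

    `driver` is expected to expose find_elements_by_css_selector(),
    the same Selenium 3 style call used in the question.
    """
    elements = driver.find_elements_by_css_selector(selector)
    hrefs = []
    for element in elements:
        # Keep the return value instead of discarding it.
        href = element.get_attribute("href")
        if href:  # skip anchors without an href
            hrefs.append(href)
    return hrefs
```

With the question's `driver`, this would be called as `linkedin_urls = collect_result_links(driver)` after the search results have loaded. Newer Selenium 4 releases removed the `find_elements_by_*` helpers; the equivalent call there is `driver.find_elements(By.CSS_SELECTOR, selector)` with `By` imported from `selenium.webdriver.common.by`.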
Edit: If I use .find_elements_by_class_name or Sector97's first edit, this is the output.
[Comments]:
Tags: python python-3.x selenium web-scraping