【Question Title】: How to parse a slow-loading webpage with Scrapy in combination with Selenium?
【Posted】: 2020-05-27 16:06:36
【Question】:

Here is what I have tried:

import scrapy
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from scrapy_selenium import SeleniumRequest

class PdfxSpider(scrapy.Spider):
    name = 'pdf'
    urls = 'https://www.pdfdrive.com/living-in-the-light-a-guide-to-personal-transformation-d10172273.html'

    def start_requests(self):
        yield SeleniumRequest(
            url=self.urls,
            callback=self.parse,
            #wait_time=1000,
            wait_until=EC.element_to_be_clickable((By.ID, 'alternatives'))
        )

    def parse(self, response):
        print(response.css('a.btn-success').xpath('@href').get())

【Comments】:

  • What is the problem?

Tags: python selenium web dynamic scrapy


【Solution 1】:

I would try using requests and BeautifulSoup instead.

Something like this gives you the "similar" links from the page, and it is fast.

import requests
from bs4 import BeautifulSoup

url = 'https://www.pdfdrive.com/living-in-the-light-a-guide-to-personal-transformation-d10172273.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}

response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'lxml')
# Collect every <a> element that carries the "ai-similar" class
links = soup.find_all('a', {"class": "ai-similar"})
for link in links:
    print(link['href'])
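
If installing bs4 and lxml is not an option, roughly the same extraction can be sketched with the standard library alone. This is only an illustration, not part of the original answer: it swaps BeautifulSoup for `html.parser.HTMLParser`, the names `SimilarLinkParser`, `extract_similar_links`, and `SAMPLE_HTML` are made up for the example, and the sample markup is invented rather than copied from the real page. It assumes the links of interest are `<a>` tags whose class list includes `ai-similar`, as in the snippet above.

```python
from html.parser import HTMLParser


class SimilarLinkParser(HTMLParser):
    """Collect href values from <a> tags whose class list contains a target class."""

    def __init__(self, target_class="ai-similar"):
        super().__init__()
        self.target_class = target_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.target_class in classes and href:
            self.links.append(href)


def extract_similar_links(html, target_class="ai-similar"):
    """Return the hrefs of all matching anchors, in document order."""
    parser = SimilarLinkParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample markup, used only to exercise the function offline.
SAMPLE_HTML = """
<div>
  <a class="ai-similar" href="/book-one-d1.html">Book one</a>
  <a class="btn-success" href="/download">Download</a>
  <a class="ai-similar extra" href="/book-two-d2.html">Book two</a>
  <a class="ai-similar">No href here</a>
</div>
"""

if __name__ == "__main__":
    for link in extract_similar_links(SAMPLE_HTML):
        print(link)
```

Keeping the parsing in a function that takes an HTML string means it can be checked against a small sample first, and then fed `response.text` from whatever fetches the page.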
