[Posted]: 2018-02-12 23:36:17
[Question]:
How can I set the user agent for Scrapy with Splash, the same way I do it here with requests:
import requests
from bs4 import BeautifulSoup
ua = {"User-Agent":"Mozilla/5.0"}
url = "http://www.example.com"
page = requests.get(url, headers=ua)
soup = BeautifulSoup(page.text, "lxml")
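For comparison, here is a minimal sketch of the same idea sent straight to Splash's HTTP API, using only the standard library (urllib instead of requests). It assumes a Splash instance listening on its default port 8050 at localhost, and the helper name build_splash_request is hypothetical. Splash's render.html endpoint accepts its headers argument only in a JSON POST body, so the User-Agent goes into that argument rather than into the HTTP headers of the call to Splash.

```python
import json
import urllib.request

# Assumption: a local Splash instance on its default port 8050.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_splash_request(url, user_agent, wait=0.5):
    """Build (but do not send) a POST to Splash's render.html endpoint.

    The 'headers' argument is only accepted in a JSON POST body; these
    headers are used when Splash fetches the target page.
    """
    payload = {
        "url": url,
        "wait": wait,
        "headers": {"User-Agent": user_agent},
    }
    return urllib.request.Request(
        SPLASH_RENDER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_splash_request("http://www.example.com", "Mozilla/5.0")
# With Splash running: html = urllib.request.urlopen(req).read().decode("utf-8")
```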
My spider looks like this:
import scrapy
from scrapy_splash import SplashRequest


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )

    def parse(self, response):
        # Callback for the rendered page (placeholder).
        pass
[Discussion]:
- Have you tried the splash_headers argument of SplashRequest?
Tags: python-3.x web-scraping scrapy splash-screen