【Posted】:2017-05-08 12:42:46
【Question】:
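I have a list of URLs, some of which are .onion sites and others clearnet sites. I want to use a dedicated proxy for ordinary .com and .net sites, and a Socks5 proxy for .onion sites.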
def random_dedicate_proxy():
    dedicated_ips = [proxy1, proxy2, proxy3]
    dedicated_proxies = [{'http': 'http://' + ip, 'https': 'https://' + ip} for ip in dedicated_ips]
    return choice(dedicated_proxies)

def proxy_selector(url):
    TOR_CLIENT = 'socks5h://127.0.0.1:9050'
    if '.onion' in url:
        proxy = {'http': TOR_CLIENT, 'https': TOR_CLIENT}
    else:
        proxy = random_dedicate_proxy()
    return proxy

def get_urls_from_spreadsheet():
    fname = 'list_of_stuff.csv'
    url_df = pd.read_csv(fname, usecols=['URL'], keep_default_na=False)
    URL = url_df.URL.dropna()
    urls = [clean_url(url) for url in URL if url != '']
    return urls

class BasicSpider(scrapy.Spider):
    name = "basic"
    rotate_user_agent = True
    start_urls = get_urls_from_spreadsheet()

    def parse(self, response):
        item = StatusCehckerItem()
        item['url'] = response.url
        item['status_code'] = response.status
        item['time'] = time.time()
        response.meta['proxy'] = proxy_selector(response.url)
        return item
When I run this code, I get a DNSLookupError: DNS lookup failed: no results for hostname lookup: mqqrfjmfu2i73bjq.onion/.
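For reference, the per-URL selection described above can be sketched on its own, outside Scrapy. This is only an illustration under stated assumptions: the names `is_onion`, `select_proxy` and `DEDICATED_IPS` are hypothetical, the proxy addresses are placeholders from a documentation address range, and the function returns a single proxy URL string (the form Scrapy reads from `request.meta['proxy']`) rather than the `{'http': ..., 'https': ...}` dict used in the code above. It checks the parsed hostname instead of searching the whole URL for '.onion'.

```python
import random
from urllib.parse import urlparse

# Local Tor SOCKS endpoint, same value as in the question.
TOR_CLIENT = "socks5h://127.0.0.1:9050"

# Hypothetical placeholder addresses standing in for proxy1, proxy2, proxy3.
DEDICATED_IPS = ["198.51.100.1:8080", "198.51.100.2:8080", "198.51.100.3:8080"]


def is_onion(url):
    """Return True when the URL's hostname ends in .onion.

    Assumes the URL carries a scheme (http:// or https://); without one,
    urlparse reports no hostname and this returns False.
    """
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def select_proxy(url):
    """Pick the Tor endpoint for .onion hosts, a random dedicated proxy otherwise."""
    if is_onion(url):
        return TOR_CLIENT
    return "http://" + random.choice(DEDICATED_IPS)
```

In a spider, a value like this would have to be attached to the request before it is sent, for example `scrapy.Request(url, meta={'proxy': select_proxy(url)})` in `start_requests`. Assigning `response.meta['proxy']` inside `parse` happens after the download has already been attempted. Whether Scrapy's downloader accepts a `socks5h://` URL directly is not confirmed here and should be treated as an open point.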
【Comments】:
- What did you pass as proxy in {'proxy': proxy} here?
Tags: python proxy scrapy tor socks