PhantomJS returning blank page with HTTPS

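【Question title】: PhantomJS returning blank page with HTTPS
【Posted】: 2017-07-13 21:14:55
【Question description】:

I am using PhantomJS with Selenium and BeautifulSoup to print the page source, but over HTTPS it returns only blank HTML. Over HTTP it returns the page source as expected. I have read material such as this and this, but nothing helped.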

from selenium import webdriver
import urllib.request as urllib2
import requests
import urllib
from bs4 import BeautifulSoup
import csv
import time

browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any'])
browser.get('https://google.com')
browser.set_window_size(2000, 1500)

soup = BeautifulSoup(browser.page_source, "html.parser")

print(soup)

browser.quit()

Result

<html><head></head><body></body></html>
Complete

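【Question comments】:

Are you aware that Google goes to great lengths to prevent its content from being automated or scraped by unauthorized bots?
I only used Google as an example; it could be any HTTPS page. That is beside the point.

Tags: selenium selenium-webdriver web-scraping phantomjs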

【Solution 1】:

# Raw strings (r'...') keep the backslashes literal; without them '\t' in 'C:\tmp' becomes a tab.
browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true', r'--ssl-client-certificate-file=C:\tmp\clientcert.cer', r'--ssl-client-key-file=C:\tmp\clientcert.key', '--ssl-client-key-passphrase=1111'])

You have to point the SSL client certificate options at local files.
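As a rough sketch of how the flag list above could be assembled, the helper below builds the `service_args` from a certificate path, a key path, and an optional passphrase. The function name `build_phantomjs_ssl_args` and its parameters are hypothetical, introduced only for illustration; the flags themselves are the ones used in the question and in this answer. The paths and passphrase are the answer's example values, not real credentials. Note that `webdriver.PhantomJS` is only available in older Selenium releases (it was removed in Selenium 4), so the final call is left as a comment.

```python
def build_phantomjs_ssl_args(cert_file, key_file, passphrase=None,
                             ignore_ssl_errors=True):
    """Return PhantomJS command-line flags for a client-certificate setup.

    Hypothetical helper: it only formats strings and does not check
    that the files exist.
    """
    args = []
    if ignore_ssl_errors:
        args.append('--ignore-ssl-errors=true')
    args.append('--ssl-client-certificate-file=' + str(cert_file))
    args.append('--ssl-client-key-file=' + str(key_file))
    if passphrase is not None:
        args.append('--ssl-client-key-passphrase=' + str(passphrase))
    return args


# Raw strings keep the backslashes in Windows paths literal.
service_args = build_phantomjs_ssl_args(r'C:\tmp\clientcert.cer',
                                        r'C:\tmp\clientcert.key',
                                        passphrase='1111')
print(service_args)

# With an older Selenium release that still ships the PhantomJS driver:
# from selenium import webdriver
# browser = webdriver.PhantomJS(service_args=service_args)
```

Building the list in one place makes it easier to spot path-escaping mistakes, since every path passes through the same code instead of being typed inline.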

【Comments】: