【Question title】: httplib.BadStatusLine: '' with Selenium and PhantomJS
【Posted】: 2017-10-12 09:05:06
【Question】:

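I get this strange error when loading a URL and I don't know how to resolve it. My situation requires PhantomJS: I don't think I can use the Firefox driver on AWS Lambda, and my scrape hits a button that Chromedriver is unable to click.

If I switch PhantomJS to Chrome or Firefox, the URL loads fine.

Using selenium==3.4.1 and PhantomJS 2.1.1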

import os

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Present PhantomJS to the site with a desktop Firefox user agent.
user_agent = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:53.0) Gecko/20100101 Firefox/53.0")

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = user_agent
dcap["phantomjs.page.settings.javascriptEnabled"] = True

browser = webdriver.PhantomJS(service_log_path=os.path.devnull, service_args=[
    '--ignore-ssl-errors=true'], desired_capabilities=dcap)

browser.set_window_size(1120, 550)
browser.get('https://drizly.com/session/new')



  File "main.py", line 257, in <module>
    lambda_handler(None, None)
  File "main.py", line 103, in lambda_handler
    browser.get('https://drizly.com/session/new')
  File "/Users/aymon/Envs/drizly/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 264, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/aymon/Envs/drizly/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 250, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/Users/aymon/Envs/drizly/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute
    return self._request(command_info[0], url, body=data)
  File "/Users/aymon/Envs/drizly/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 526, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1200, in do_open
    r = h.getresponse(buffering=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1132, in getresponse
    response.begin()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 417, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''

【Discussion】:

    Tags: python selenium phantomjs aws-lambda


    【Solution 1】:

    BadStatusLine is a subclass of HTTPException, raised "if a server responds with a HTTP status code that we don't understand". You may want to catch it, as follows:

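    import httplib  # Python 2 module; named http.client in Python 3

    #...
    browser.set_window_size(1120, 550)
    try:
        browser.get('https://drizly.com/session/new')
    except httplib.BadStatusLine as bsl:
        # Report the error instead of letting it pass silently.
        # bsl.line holds the status line that could not be parsed.
        print('[!!!] {message}'.format(message=bsl.line))
    #...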
    

    Note that, as a good practice, errors "should never pass silently", hence the print.
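The catch-and-report idea above could be extended into a small retry helper. This is only a sketch under an assumption: that the empty status line means the peer closed the connection before sending any response, which may be transient. The helper name `get_with_retry` and its `attempts` and `delay` parameters are hypothetical and not part of Selenium. The last error is re-raised so it never passes silently.

```python
import logging
import time

try:
    import httplib  # Python 2, as in the question
except ImportError:
    import http.client as httplib  # Python 3 name of the same module

log = logging.getLogger(__name__)


def get_with_retry(browser, url, attempts=3, delay=1.0):
    """Call browser.get(url), retrying when BadStatusLine is raised.

    Hypothetical helper: `attempts` and `delay` are illustrative defaults.
    Returns the 1-based number of the attempt that succeeded. If every
    attempt fails, the last BadStatusLine is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            browser.get(url)
            return attempt
        except httplib.BadStatusLine as exc:
            # exc.line is the raw status line that could not be parsed.
            log.warning('attempt %d/%d for %s failed: BadStatusLine(%s)',
                        attempt, attempts, url, exc.line)
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(delay)
```

With the `browser` object from the question, a call might look like `get_with_retry(browser, 'https://drizly.com/session/new')`. If the site is deliberately rejecting PhantomJS, retrying will not help and the exception still reaches the caller.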

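    【Comments】:

    • The other drivers get nothing but a 200.
    • @AymonFournier. Many web developers build specific strategies to detect PhantomJS-like (headless) browsers. E.g., see Detecting PhantomJS Based Visitors. I think that is what you are experiencing. That is why you should try to get past the BadStatusLine exception and see whether you actually received the page source.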
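The suggestion in the last comment, to get past the exception and check whether any page source arrived, could be sketched as follows. The helper name `load_and_inspect` and its `marker` argument (a string the caller expects in the real page) are hypothetical. The code assumes only the standard WebDriver `get` method and `page_source` property. It also assumes that reading `page_source` may itself fail if the driver process is no longer reachable, so that read is guarded too.

```python
try:
    import httplib  # Python 2, as in the question
except ImportError:
    import http.client as httplib  # Python 3 name of the same module


def load_and_inspect(browser, url, marker):
    """Load url, tolerate BadStatusLine, then check what the browser holds.

    Hypothetical helper. Returns a tuple (raised, found):
      raised: True if browser.get() raised BadStatusLine
      found:  True if `marker` occurs in the page source afterwards
    """
    raised = False
    try:
        browser.get(url)
    except httplib.BadStatusLine:
        raised = True

    try:
        source = browser.page_source or ''
    except (httplib.HTTPException, IOError):
        # Assumption: the driver itself is unreachable, so nothing was loaded.
        source = ''

    return raised, marker in source
```

A result of `(True, True)` would suggest the page loaded despite the exception, while `(True, False)` would be consistent with the site, or the driver, refusing to serve the page.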