【Question title】:How to deal with httplib.BadStatusLine: ''
【Posted】:2017-05-28 15:29:19
【Question description】:

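I am scraping some web data with Python, BeautifulSoup, and Selenium. I am also using PyVirtualDisplay so that I do not need a physical display.

It works perfectly on my laptop, but when I run it on the server I get the following error:

httplib.BadStatusLine: ''

I got this the second time a page was fetched, and now it happens every time. What is the problem?

Edit

Code added: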

import requests, bs4
import csv
import re
import datetime
import time
import os 

from contextlib import closing
from selenium import webdriver
from selenium.webdriver import Firefox # pip install selenium
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1500, 1200))
display.start()

url_base = "https://www.seek.com.au/jobs?page="

# open web browser and login
binary = FirefoxBinary('/home/firefox/firefox/firefox')
driver = webdriver.Firefox(firefox_binary=binary)

overlap = False
page = 0

while not overlap:
    page += 1
    driver.get(url_base+str(page))

    ...

Here is the traceback:

Traceback (most recent call last):
  File "manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/django/core/management/base.py", line 345, in execute
    output = self.handle(*args, **options)
  File "/var/www/matt/matt/management/commands/mattv3.py", line 109, in handle
    driver.get(url_base+str(page))
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in get
    self.execute(Command.GET, {'url': url})
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/var/www/matt/env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 453, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
    raise BadStatusLine(line)
httplib.BadStatusLine: ''
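
The traceback ends inside Selenium's remote_connection.py, where the client is waiting for an HTTP response from the browser driver it controls. The empty string in the message means the connection was closed before any status line arrived. Below is a minimal sketch that reproduces that condition with only the standard library. It uses a throwaway local server, and `start_silent_server` and `request_raises_bad_status_line` are illustrative names, not part of Selenium.

```python
import socket
import threading

try:
    from httplib import HTTPConnection, BadStatusLine  # Python 2, as in the traceback
except ImportError:
    from http.client import HTTPConnection, BadStatusLine  # Python 3


def start_silent_server():
    """Start a local server that reads one request and hangs up without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        data = b""
        # Read until the end of the request headers, then close without answering.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        conn.close()
        listener.close()

    thread = threading.Thread(target=serve_once)
    thread.daemon = True
    thread.start()
    return port


def request_raises_bad_status_line():
    """Return True if a GET against the silent server raises BadStatusLine."""
    port = start_silent_server()
    client = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except BadStatusLine:
        return True
    finally:
        client.close()
    return False
```

On Python 3 the same condition surfaces as `http.client.RemoteDisconnected`, which is a subclass of `BadStatusLine`, so the `except` clause above covers both versions.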

【Comments】:

  • Post your code.

Tags: python-2.7 selenium beautifulsoup


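【Solution 1】:

I was running this on a very small server (512MB, 20GB SSD). I have since upgraded it, and it now runs fine. If anyone can explain the issue to me, I would be glad to understand it.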
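
One plausible reading, offered as an assumption rather than a confirmed diagnosis: on a 512MB machine the browser process may have crashed or been killed under memory pressure, so the driver connection closed without answering. If a larger server is not an option, a defensive pattern is to discard the dead driver and start a fresh one before retrying. Below is a minimal sketch of that idea. `get_with_restart` and the `make_driver` factory are hypothetical names, not part of Selenium, and the only driver methods used are `get` and `quit`.

```python
try:
    from httplib import BadStatusLine  # Python 2, as in the traceback
except ImportError:
    from http.client import BadStatusLine  # Python 3 name for the same exception


def get_with_restart(make_driver, url, driver=None, max_attempts=3):
    """Load url, recreating the driver if the connection to it has died.

    make_driver is a hypothetical zero-argument factory that returns a new
    driver, e.g. lambda: webdriver.Firefox(firefox_binary=binary).
    Returns the driver that successfully loaded the page.
    """
    last_error = None
    for _ in range(max_attempts):
        if driver is None:
            driver = make_driver()
        try:
            driver.get(url)
            return driver
        except BadStatusLine as exc:
            last_error = exc
            # The browser behind this driver is presumably gone; clean up
            # as well as possible and start over with a fresh instance.
            try:
                driver.quit()
            except Exception:
                pass
            driver = None
    raise last_error
```

In the loop from the question this would replace the bare `driver.get(url_base+str(page))` call with `driver = get_with_restart(make_driver, url_base + str(page), driver)`. Restarting only hides the symptom; if memory really is the cause, the browser will keep dying until the server has more headroom.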
