【Question Title】: Error when passing URLs into requests.get()
【Posted】: 2022-09-30 12:41:28
【Question】:

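I've been developing a program that reads URLs from a .csv file and counts the words on each web page. The URLs come from the rows under the "Article" column of a pandas DataFrame. Each URL is passed to requests.get(url), and the response is stored in a variable. While investigating the error, I found that the problem occurs when the URL is passed to requests.get().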

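import pandas as pd
import requests

def file_input(file):
    # Read the user's .csv file; the separator may be ';' or ','
    df = pd.read_csv(file, sep='[;,]', engine='python')
    for i in range(len(df)):
        df.at[i, "Word Count"] = word_counter(df.at[i, "Article"])
    return df  # without this, the new "Word Count" column is lost

def word_counter(url):
    # Keeps track of the page's word count
    count = 0
    # requests.get(url) takes the URL string and fetches the web page
    page = requests.get(url)
    # (the word-counting step was omitted from the original post)
    return count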

Here is the error message:

Traceback (most recent call last):
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/urllib3/response.py", line 406, in _decode
    data = self._decoder.decompress(data)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/urllib3/response.py", line 93, in decompress
    ret += self._obj.decompress(data)
zlib.error: Error -3 while decompressing data: incorrect header check

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/urllib3/response.py", line 627, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/urllib3/response.py", line 599, in read
    data = self._decode(data, decode_content, flush_decoder)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/urllib3/response.py", line 409, in _decode
    raise DecodeError(
urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 59, in <module>
    main()
  File "main.py", line 44, in main
    file_input(file)
  File "main.py", line 35, in file_input
    df.at[i, "Word Count"] = word_counter(df.at[i, "Article"])
  File "main.py", line 13, in word_counter
    page = requests.get(anything)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/sessions.py", line 745, in send
    r.content
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
  File "/home/runner/Article-Word-counter/venv/lib/python3.8/site-packages/requests/models.py", line 820, in generate
    raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
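The posted word_counter stops right after the request, and any exception raised there (such as the ContentDecodingError above) aborts the whole loop over the DataFrame. Below is a minimal sketch of one way to finish the function and skip URLs that fail, so one bad page does not stop the rest. It assumes that "words" means whitespace-separated tokens in the page text outside script and style tags; that counting rule, the 10-second timeout, and the names count_words_in_html and _TextExtractor are hypothetical choices, not the poster's method.

```python
from html.parser import HTMLParser

import requests


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def count_words_in_html(html):
    # Assumption: a word is any whitespace-separated token of the extracted text
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return len(" ".join(parser.parts).split())


def word_counter(url, timeout=10):
    try:
        page = requests.get(url, timeout=timeout)
        page.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # ContentDecodingError, MissingSchema, timeouts, HTTP errors, etc.
        print(f"Skipping {url}: {exc}")
        return None  # pandas will store this as a missing value
    return count_words_in_html(page.text)
```

Returning None instead of raising lets file_input fill in every other row; the skipped URLs can be found afterwards with df["Word Count"].isna().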

【Comments】:

  • What is the error? (Please provide the full stack trace.)

Tags: python-3.x pandas python-requests


【Answer 1】:

requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))

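It seems the server's response declares that it is gzip-encoded, but requests fails to decode it as gzip. This could be a server misconfiguration, or something more subtle. Try requesting an uncompressed response by setting the Accept-Encoding header (although the server may not honor the request):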

headers = { 'Accept-Encoding': 'identity' }
page = requests.get(url, headers=headers)
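One way to apply this workaround without giving up compression for servers that handle gzip correctly is to send the identity header only after the default request fails. This is a minimal sketch under that assumption; get_page and its get parameter (a hook for substituting a fake function in tests) are hypothetical names, not part of requests.

```python
import requests
from requests.exceptions import ContentDecodingError


def get_page(url, timeout=10, get=requests.get):
    """Fetch url; if the gzip body cannot be decoded, retry once without compression."""
    try:
        return get(url, timeout=timeout)
    except ContentDecodingError:
        # The server said "gzip" but sent something else; ask for an
        # uncompressed body. If the server ignores the header, this second
        # call raises ContentDecodingError again and the caller sees it.
        return get(url, timeout=timeout, headers={"Accept-Encoding": "identity"})
```

The retry costs an extra round trip only for the misbehaving URLs; every other page is still fetched with requests' default Accept-Encoding.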

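You can also check whether the URL can be opened with another tool (such as curl) or in your web browser. In addition, you can inspect the raw response (page.raw) explicitly to see what the server actually sent. Ultimately, though, contacting the administrator of the site in question may be the real solution.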
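Inspecting the raw response could look like the sketch below. With stream=True, requests does not read or decode the body up front, so the first bytes can be read undecoded through the underlying urllib3 response. A real gzip stream always begins with the bytes 1f 8b; if the server sends Content-Encoding: gzip but the body starts with something else (for example plain HTML), that mismatch explains the "incorrect header check" error. The function names and the 64-byte sample size are hypothetical choices.

```python
import requests

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_like_gzip(data):
    return data[:2] == GZIP_MAGIC


def inspect_raw(url, n=64, timeout=10):
    """Return the Content-Encoding header and the first n undecoded body bytes."""
    # stream=True defers reading the body, so nothing is decompressed yet
    with requests.get(url, stream=True, timeout=timeout) as resp:
        head = resp.raw.read(n, decode_content=False)
        return resp.headers.get("Content-Encoding"), head
```

For example, `encoding, head = inspect_raw(url)` followed by `print(encoding, looks_like_gzip(head), head[:20])` shows whether the declared encoding matches the bytes that actually arrived.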
