Code keeps producing an empty string (answers)

【Question title】: Code keeps producing an empty string
【Posted】: 2016-01-05 13:12:47
【Question】:

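I don't understand why the code below always produces an empty string. I'm trying to get it to extract a website's contents into a "txt" file, but it just keeps producing an empty string. Is there an error in the code?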

import urllib3
import certifi


# Function: Convert information within html document to a text file
# Append information to the file
def html_to_text(source_html, target_file):

    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',      # Force certificate check.
        ca_certs=certifi.where(),       # Path to the Certifi Bundle
        headers={'connection': 'keep-alive', 'user-agent': 'Mozilla/5.0', 'accept-encoding': 'gzip, deflate'},
    )

    r = http.urlopen('GET', source_html)
    print(source_html)
    response = r.read().decode('utf-8')
    # TODO: Find the problem that keeps making the code produce an empty string
    print(response)
    temp_file = open(target_file, 'w+')
    temp_file.write(response)


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0"
target_location = "C:\\Users\\Admin\\PycharmProjects\\TheLastPuff\\Source\\yahoo_ticker_symbols.txt"

html_to_text(source_address, target_location)

【Discussion】:

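When you say "produce", do you mean "print", "write to the file", or "print and write to the file"? Do print(source_html) and print(response) print anything?
Neither the print nor the write produces anything. "print(source_html)" does successfully print "source_address".
The r object seems to have an r.data attribute that contains the response body. urllib3.readthedocs.org/en/latest/#usage
@Cloud I just tested it on my computer and it works fine: it printed the website's source code and wrote it to the file.
Shouldn't you respect the webmaster's wish not to be scraped?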

Tags: python python-3.x urllib3

【Solution 1】:

I get a response with the following code. The only relevant change is using r.data instead of r.read().

import urllib3
import certifi


def html_to_text(source_html):

    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',      # Force certificate check.
        ca_certs=certifi.where(),       # Path to the Certifi Bundle
        headers={'connection': 'keep-alive', 'user-agent': 'Mozilla/5.0', 'accept-encoding': 'gzip, deflate'},
    )

    r = http.urlopen('GET', source_html)
    print(source_html)
    print(r.headers)
    response = r.data                   # instead of read().decode('utf-8')
    print(response)


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0"

html_to_text(source_address)

Versions used:

>>> certifi.__version__
'2015.11.20.1'
>>> urllib3.__version__
'1.14'
>>> sys.version
'3.5.1 (default, Dec  7 2015, 12:58:09) \n[GCC 5.2.0]'
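Why r.data works and r.read() does not: by default urllib3's urlopen() uses preload_content=True, so the whole body has already been read into r.data by the time the call returns, and a later r.read() has nothing left to read (hence the empty string after .decode('utf-8')). The sketch below shows both options under that assumption: read r.data, or pass preload_content=False so r.read() still has the body. It also writes the file inside a `with` block, because the question's open() is never closed explicitly and relies on garbage collection to close it. The names fetch_text, start_demo_server, _Handler and BODY, and the local http.server, are hypothetical stand-ins so the example runs offline; for the real HTTPS URL you would keep the cert_reqs/ca_certs arguments from the question.

```python
import http.server
import threading

import urllib3

# Hypothetical page served by a throwaway local server (stands in for the Yahoo URL).
BODY = "<html><body>héllo</body></html>".encode("utf-8")


class _Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Serve BODY on an ephemeral localhost port; returns (server, url)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"


def fetch_text(url, preload=True):
    """GET url and return the body decoded as UTF-8."""
    http = urllib3.PoolManager()
    if preload:
        # Default (preload_content=True): the body is already in r.data,
        # so a later r.read() on this response has nothing left to read.
        r = http.urlopen("GET", url)
        return r.data.decode("utf-8")
    # preload_content=False leaves the body unread, so r.read() returns it.
    r = http.urlopen("GET", url, preload_content=False)
    try:
        return r.read().decode("utf-8")
    finally:
        r.release_conn()  # hand the connection back to the pool


def html_to_text(source_html, target_file):
    """Fetch source_html and write its text to target_file."""
    text = fetch_text(source_html)
    # 'with' closes (and therefore flushes) the file deterministically.
    with open(target_file, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Usage: `server, url = start_demo_server()`, then `html_to_text(url, "page.txt")`, and finally `server.shutdown()`. Either branch of fetch_text returns the same text; preload_content=False is mainly useful when you want to stream a large body in chunks instead of holding it all in memory.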

【Discussion】:

This code seems to work, but now I get a different error: "urllib.error.HTTPError: HTTP Error 502: Server Hangup". I think the site is kicking me out.