Given a URL to a text file, what is the simplest way to read the contents of the text file?

【Question title】: Given a URL to a text file, what is the simplest way to read the contents of the text file?
【Posted】: 2020-09-15 05:30:25
【Question description】:

In Python, given the URL of a text file, what is the simplest way to access the file's contents and print them locally, line by line, without saving a local copy of the file?

TargetURL = "http://www.myhost.com/SomeFile.txt"
#read the file
#print first line
#print second line
#etc

【Question discussion】:

Tags: python

【Solution 1】:

I do think requests is the best option. Also note that you can set the encoding manually.

import requests
response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
# response.encoding = "utf-8"
hehe = response.text
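
To illustrate why the encoding matters, here is a minimal sketch that uses only the standard library and does not call requests. The sample bytes and the `decode_body` helper are hypothetical, introduced for illustration only. It shows the idea behind setting the encoding by hand: the same bytes decode to different text depending on the codec.

```python
def decode_body(raw, declared=None, fallback="utf-8"):
    """Decode raw response bytes, preferring the declared charset if there is one."""
    encoding = declared or fallback
    # errors="replace" keeps going on undecodable bytes instead of raising
    return raw.decode(encoding, errors="replace")


# Hypothetical payload: UTF-8 bytes as a server might send them
raw = "naïve café\n".encode("utf-8")

right = decode_body(raw, declared="utf-8")
wrong = decode_body(raw, declared="iso-8859-1")  # wrong guess gives garbled text

print(right)
print(wrong)
```

If the text you get back looks garbled in a similar way, the declared or guessed encoding is probably wrong, and setting it explicitly before reading the text is worth trying.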

【Discussion】:

【Solution 2】:

The requests package works really well, with a simple interface, as @Andrew Mao suggested.

import requests
response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
data = response.text
for i, line in enumerate(data.split('\n')):
    print(f'{i}   {line}')

Output:

0    The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
1    prices and the demand for clean air', J. Environ. Economics & Management,
2    vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
3    ...', Wiley, 1980.   N.B. Various transformations are used in the table on
4    pages 244-261 of the latter.
5   
6    Variables in order:
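
One detail of the loop above is worth a small sketch: splitting on '\n' leaves a stray '\r' on each line when the file uses Windows line endings, and yields an empty final element when the text ends with a newline. The sample string below is made up for illustration; `str.splitlines()` is one way to avoid both effects.

```python
# Hypothetical sample text with Windows line endings and a trailing newline
data = "first line\r\nsecond line\r\n"

by_split = data.split("\n")        # keeps '\r' on each line, plus a trailing ''
by_splitlines = data.splitlines()  # handles '\n', '\r\n' and '\r'

for i, line in enumerate(by_splitlines):
    print(f"{i}   {line}")
```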

Check out the Kaggle notebook on how to extract dataset/dataframe from URL.

【Discussion】:

【Solution 3】:

This just updates the solution that @ken-kinder suggested for Python 2 so that it works in Python 3:

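import urllib.request  # a bare "import urllib" does not load the request submodule
urllib.request.urlopen(target_url).read()  # returns bytes; call .decode() to get text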

【Discussion】:

【Solution 4】:

None of the answers above worked directly for me. Instead, I had to do the following (Python 3):

from urllib.request import urlopen

data = urlopen("[your url goes here]").read().decode('utf-8')

# Do what you need to do with the data.
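
The snippet above hard-codes 'utf-8'. As a hedged variation, the sketch below asks the response headers for a charset first and falls back to UTF-8 only when none is declared. The `read_text` name and the `fallback` parameter are assumptions made for this example, not part of the original answer.

```python
from urllib.request import urlopen


def read_text(url, fallback="utf-8"):
    """Fetch url and decode the body using the charset the server declares, if any."""
    with urlopen(url) as response:
        # headers is an email.message.Message; get_content_charset() returns
        # None when the Content-Type header names no charset
        charset = response.headers.get_content_charset() or fallback
        return response.read().decode(charset)
```

Usage would look like `data = read_text(target_url)`. Servers sometimes declare the wrong charset or none at all, so this is a reasonable default rather than a guarantee.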

【Discussion】:

【Solution 5】:

Edit 09/2016: In Python 3 and later, use urllib.request instead of urllib2.

Actually, the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

As Will suggested, you don't even need "readlines". You could even shorten it to: *

import urllib2

for line in urllib2.urlopen(target_url):
    print line

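But remember that in Python, readability counts.

However, this is the simplest way, not the safe way: most of the time in network programming you cannot be sure that the amount of data you expect will be respected. So you are generally better off reading a fixed, reasonable amount of data, one you know is enough for what you expect but that prevents your script from being flooded: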

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines

for line in data:
    print line
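
For Python 3, the same bounded-read idea might look like the sketch below. The `read_limited` helper and the `MAX_BYTES` value are assumptions chosen for illustration. Note that `read(n)` counts bytes, so the cap can cut the last line, or even a multi-byte character, short.

```python
from urllib.request import urlopen

MAX_BYTES = 20000  # assumed cap; pick one that fits the data you expect


def read_limited(url, max_bytes=MAX_BYTES, encoding="utf-8"):
    """Read at most max_bytes from url and return the decoded lines."""
    with urlopen(url) as response:
        raw = response.read(max_bytes)  # counts bytes, not characters
    # errors="replace" because the cap may cut a multi-byte character in half
    return raw.decode(encoding, errors="replace").splitlines()
```

A call such as `read_limited(target_url)` then returns a list of lines; if the body was longer than the cap, the final element may be an incomplete line.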

* The second example in Python 3:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is
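
In the loop above each line is decoded separately and keeps its trailing newline, so print adds a blank line after every line. One possible alternative, sketched here under the assumption that the stream is a binary file-like object, is to wrap it in `io.TextIOWrapper` and let that do the decoding. The `iter_text_lines` helper is a hypothetical name, and `io.BytesIO` stands in for the response so the sketch runs without a network connection.

```python
import io


def iter_text_lines(binary_stream, encoding="utf-8"):
    """Yield decoded lines, without trailing newlines, from a binary file-like object."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    for line in text_stream:
        yield line.rstrip("\r\n")


# Stand-in for a response body so the sketch runs offline
fake_response = io.BytesIO("línea uno\nlínea dos\n".encode("utf-8"))
lines = list(iter_text_lines(fake_response))
for line in lines:
    print(line)
```

With a real URL, the object returned by `urllib.request.urlopen(target_url)` for an HTTP request should be usable in place of `fake_response`, since it is a buffered binary stream.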

【Discussion】:

【Solution 6】:

The requests library has a simpler interface and works with both Python 2 and 3.

import requests

response = requests.get(target_url)
data = response.text

【Discussion】:

【Solution 7】:

Another way to do this in Python 3 is to use the urllib3 package.

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

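This may be a better choice than urllib, because urllib3 boasts:

Thread safety.

Connection pooling.

Client-side SSL/TLS verification.

File uploads with multipart encoding.

Helpers for retrying requests and dealing with HTTP redirects.

Support for gzip and deflate encoding.

Proxy support for HTTP and SOCKS.

100% test coverage.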

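【Discussion】:

The requests library is partly based on urllib3.
Actually, of the answers above, this is the only one that will install (urllibx) for the latest version of Python to date.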

【Solution 8】:

I'm new to Python, and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

or

from urllib.request import urlopen
data = urlopen(target_url)

Note that just import urllib does not work.

【Discussion】:

【Solution 9】:

There's really no need to read line by line. You can get the whole thing like this:

import urllib
txt = urllib.urlopen(target_url).read()

【Discussion】:

Doesn't work: AttributeError: module 'urllib' has no attribute 'urlopen'
This answer only works in Python 2. Edit: see Andrew Mao's answer for Python 3.
For Python 3, it would be: txt = urllib.request.urlopen(target_url).read()

【Solution 10】:

import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line

【Discussion】:

【Solution 11】:

import urllib2

f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l

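【Discussion】:

+1, but note that this is the simplest way, not the safest way. If anything goes wrong on the server side and it keeps sending content forever, you may end up stuck in an infinite loop.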
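
One way to guard against that, sketched here in Python 3 rather than the Python 2 of the answer, is to iterate lazily and stop after a fixed number of lines instead of calling readlines(). The `first_lines` helper and its `max_lines` parameter are hypothetical names, and `io.BytesIO` stands in for an over-long response so the sketch runs offline.

```python
import io
from itertools import islice


def first_lines(stream, max_lines=1000):
    """Return at most max_lines lines from an iterable stream, read lazily."""
    # islice stops pulling from the stream once max_lines have been taken,
    # unlike readlines(), which tries to read everything first
    return list(islice(stream, max_lines))


# Stand-in for a server that sends far more than expected
big_stream = io.BytesIO(b"spam\n" * 100000)
head = first_lines(big_stream, max_lines=3)
print(head)
```

For a real request, passing a timeout as in `urllib.request.urlopen(target_url, timeout=10)` also helps when the server stalls. A line cap alone does not protect against a single line that never ends; a limit on the number of bytes read would be needed for that case.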