How can I read the contents of a URL with Python? Answers

【Question title】: How can I read the contents of a URL with Python?
【Posted】: 2013-02-28 14:55:56
【Question】:

The following URL works when I paste it into a browser:

http://www.somesite.com/details.pl?urn=2344

But when I try to read the URL with Python, nothing happens:

 link = 'http://www.somesite.com/details.pl?urn=2344'
 f = urllib.urlopen(link)           
 myfile = f.readline()  
 print myfile

Do I need to encode the URL, or is there something I'm not seeing?

【Question comments】:

Tags: python

【Solution 1】:

To answer your question:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)
myfile = f.read()
print(myfile)

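You need read(), not readline(): readline() returns only the first line of the response, while read() returns the entire body.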
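The difference can be seen in a small sketch. It uses Python 3's urllib.request rather than the Python 2 call above, and a data: URL (an assumption made here so the example runs without network access) whose body starts with a blank line. The names SAMPLE_URL, first_line and whole_body are illustrative only.

```python
from urllib.request import urlopen

# A data: URL stands in for a real page so the sketch needs no network access.
# Its body is "\nHello": a blank first line followed by text (%0A is a newline).
SAMPLE_URL = "data:text/plain;charset=utf-8,%0AHello"


def first_line(url):
    """Return only the first line of the response body, as bytes."""
    with urlopen(url) as response:
        return response.readline()


def whole_body(url):
    """Return the complete response body, as bytes."""
    with urlopen(url) as response:
        return response.read()


print(repr(first_line(SAMPLE_URL)))  # only the blank first line
print(repr(whole_body(SAMPLE_URL)))  # the full body
```

If the first line of a page is blank, printing the result of readline() looks as if nothing was returned, which may be what happened in the question.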

Edit (2018-06-25): As of Python 3, the legacy urllib.urlopen() has been replaced by urllib.request.urlopen() (see the notes at https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen for details).

If you are using Python 3, see the answers by Martin Thoma or i.n.n.m in this question: https://stackoverflow.com/a/28040508/158111 (Python 2/3 compatible) https://stackoverflow.com/a/45886824/158111 (Python 3)

Or, get this library: http://docs.python-requests.org/en/latest/ and seriously use it :)

import requests

link = "http://www.somesite.com/details.pl?urn=2344"
f = requests.get(link)
print(f.text)

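【Comments】:

@KiranSubbaraman It's a really nice project, from the API to the code structure.
I would also recommend and encourage programmers to use the requests module; it lends itself to more Pythonic code.
I get the following error on Python 3.5.2: Traceback (most recent call last): File "/home/lars/parser.py", line 9, in <module> f = urllib.urlopen(link) AttributeError: module 'urllib' has no attribute 'urlopen' It seems there is no urlopen function in Python 3.5. Was it renamed? Edit: the snippet in the answer below solved it: from urllib.request import urlopen
@user7185318 Yes, in Python 3 the urllib package went through some refactoring and API changes. I will update the answer to emphasize Python 2.
What if the provided link asks for a username and password? How should the code change?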
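
On the username/password question: one possible approach with the Python 3 standard library is sketched below. It assumes the server uses HTTP Basic authentication, which the thread does not confirm; other schemes (login forms, tokens) need a different approach. The helper names build_auth_opener and read_protected_url are illustrative, not from the thread.

```python
import urllib.request


def build_auth_opener(top_level_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for top_level_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm under top_level_url.
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def read_protected_url(url, username, password, timeout=10):
    """Fetch url with the given credentials and return the body as bytes."""
    opener = build_auth_opener(url, username, password)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A call would look like read_protected_url("http://www.somesite.com/details.pl?urn=2344", "user", "password"), with placeholder credentials.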

【Solution 2】:

Python 3 users: to save time, use the following code:

from urllib.request import urlopen

link = "https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html"

f = urlopen(link)
myfile = f.read()
print(myfile)

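I know there are other threads about the error "NameError: urlopen is not defined", but I thought this might save time.

【Comments】:

This is not the best way to read data from a URL in Python 3, because it misses the benefits of the 'with' statement. See my answer: stackoverflow.com/a/56295038/908316
No, this doesn't work with a while loop. One call. Pretty bad, if you ask me.

【Solution 3】:

None of these answers is particularly well suited to Python 3 (tested on the latest version at the time of this post).

Here is how you do it: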

import urllib.error
import urllib.request

try:
    # The with statement closes the connection when the block ends
    with urllib.request.urlopen('http://www.python.org/') as f:
        print(f.read().decode('utf-8'))
except urllib.error.URLError as e:
    print(e.reason)

The above works for content served as UTF-8. Without .decode('utf-8'), f.read() returns the raw bytes; Python does not guess the encoding for you.
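
When the encoding is not known in advance, one option is to use the charset the server declares in its Content-Type header. This is a minimal sketch, assuming the server declares its charset correctly; the function name read_text and the UTF-8 fallback are choices made here, not part of the answer.

```python
from urllib.request import urlopen


def read_text(url, fallback_encoding="utf-8"):
    """Fetch url and decode the body with the charset declared by the server."""
    with urlopen(url) as response:
        raw = response.read()
        # get_content_charset() returns None when Content-Type names no charset.
        charset = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset)
```

For example, read_text('http://www.python.org/') would return the page as a str rather than bytes.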

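Documentation: https://docs.python.org/3/library/urllib.request.html#module-urllib.request

【Comments】:

Thanks. The original code was written for Python 2, but your contribution here is noted.

【Solution 4】:

A solution that works with both Python 2.X and Python 3.X, using the Python 2 and 3 compatibility library six: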

from six.moves.urllib.request import urlopen
link = "http://www.somesite.com/details.pl?urn=2344"
response = urlopen(link)
content = response.read()
print(content)

【Comments】:

【Solution 5】:

We can read a website's HTML content as follows:

from urllib.request import urlopen
response = urlopen('http://google.com/')
html = response.read()
print(html)

【Comments】:

This is the same as @i.n.n.m's answer.

【Solution 6】:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Works on Python 3 and Python 2.
# Case 1: the server accepts the request without a custom User-Agent header.

from __future__ import print_function

import sys
from contextlib import closing

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib import urlopen

# closing() is needed because Python 2's urlopen() result is not a context manager
with closing(urlopen('https://www.facebook.com/')) as url:
    data = url.read()

print(data)

# Case 2: the server needs a User-Agent header to identify where the request comes from.
# Works on Python 3 only.

import urllib.request

user_agent = \
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = 'https://www.facebook.com/'
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)
with urllib.request.urlopen(request) as response:
    data = response.read()
print(data)
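
The User-Agent part can be wrapped in a small reusable helper. This is only a sketch for Python 3: the names build_request and read_url and the "example-client/1.0" string are hypothetical, and whether a given site accepts a particular User-Agent is up to that site.

```python
import urllib.request

# Hypothetical identifier; replace it with whatever the target site expects.
DEFAULT_USER_AGENT = "example-client/1.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_url(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Fetch url with the given User-Agent and return the body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the Request separately from sending it makes it easy to check the headers before any network traffic happens.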

【Comments】:

【Solution 7】:

The URL should be a string:

import urllib

link = "http://www.somesite.com/details.pl?urn=2344"
f = urllib.urlopen(link)           
myfile = f.readline()  
print myfile

【Comments】:

Both ' and " delimit strings in Python.

【Solution 8】:

I used the following code:

import urllib  # Python 2 only

def read_text():
    quotes = urllib.urlopen("https://s3.amazonaws.com/udacity-hosted-downloads/ud036/movie_quotes.txt")
    contents_file = quotes.read()
    print contents_file

read_text()

【Comments】:

【Solution 9】:

# Retrieve data from a URL.
# Python 3 only.

import urllib.request

def main():
    url = "http://docs.python.org"

    # Open the URL and report the HTTP status code
    with urllib.request.urlopen(url) as webUrl:
        print("Result code: " + str(webUrl.getcode()))

        # Print the response body, decoded as UTF-8
        print("Returned data: -----------------")
        data = webUrl.read().decode("utf-8")
        print(data)

if __name__ == "__main__":
    main()
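
A variation that also reports failures might look like the sketch below. It assumes the body is UTF-8, as the code above does, and the function name fetch and its (status, text) return shape are choices made here for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status_code, text); text is None when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode(), response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code, None
    except urllib.error.URLError as error:
        # No HTTP response at all (DNS failure, refused connection, bad scheme).
        print("Request failed:", error.reason)
        return None, None
```

HTTPError is caught before URLError because it is a subclass of URLError; with the order reversed, the status code would never be reported.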

【Comments】:

【Solution 10】:

from urllib.request import urlopen

# read() returns bytes; decode them as UTF-8 so non-ASCII text such as Chinese prints correctly
html = urlopen("https://blog.csdn.net/qq_39591494/article/details/83934260").read().decode('utf-8')
print(html)

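【Comments】:

Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you have made.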