Python wget saves a file. How to get the data in a variable

【Question title】: Python wget saves a file. How to get the data in a variable
【Posted】: 2015-09-01 15:09:56
【Question description】:

I am using wget in Python like this:

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

down = wget.download(url)
print(down)

It then downloads the HTML data to a file, but I want it in a variable instead. I am new to Python. Any help would be appreciated. Thanks in advance.

【Comments】:

Possible duplicate of "What is the quickest way to HTTP GET in Python?"
Then why are you using wget? Why not use requests?
I want to scrape a Facebook page; I read about it at stackoverflow.com/questions/18990597/…

Tags: python

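【Solution 1】:

You don't need to use wget to download the HTML to a file and then read it back in; you can fetch the HTML directly. Here it is using requests (which, in my opinion, is much nicer than Python's urllib modules):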

import requests
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = requests.get(url).text
print(html)

Here is an example using urllib2, which is built into Python:

import urllib2
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = urllib2.urlopen(url).read()
print(html)
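
On Python 3, `urllib2` no longer exists; the same standard-library functionality lives in `urllib.request`. The following is a minimal sketch of that variant. The helper name `fetch_html`, the 10-second timeout, and the UTF-8 fallback are illustrative assumptions, not part of the original answer.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper, not from the answer)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, exactly as sent by the server
        # Use the charset declared in the Content-Type header, if any;
        # falling back to UTF-8 is an assumption.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage (performs a real network request):
# html = fetch_html("https://www.facebook.com/hellomeets/events")
# print(html)
```

Unlike `requests.get(url).text`, `urlopen(...).read()` returns bytes on Python 3, which is why the sketch decodes explicitly.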

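Edit

I see what you mean about the difference between the HTML fetched directly from the website and the HTML obtained through the wget module. Here is how to do it using the wget module: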

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

down = wget.download(url)

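# Read the downloaded file back in; the with block closes it automatically
with open(down, 'r') as f:
    htmlText = f.read()  # read() keeps the original line breaks; joining readlines() with "\n" doubled them
print(htmlText)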
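
If the file-based `wget` route is kept, the download and read-back steps can be wrapped so that a leftover copy from an earlier run is removed first. This is only a rough sketch: the helper names `remove_stale_download` and `download_to_string`, the default filename `events`, and the UTF-8 decoding are assumptions made for illustration. It relies on the third-party `wget` package and its `download(url, out=...)` call.

```python
import os


def remove_stale_download(path):
    """Delete a leftover file from a previous run. Returns True if one was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def download_to_string(url, filename="events"):
    """Download `url` with the wget module and return the saved file's text."""
    import wget  # third-party package: pip install wget

    remove_stale_download(filename)  # avoid colliding with an older copy
    saved_path = wget.download(url, out=filename)
    # Decoding as UTF-8 is an assumption; undecodable bytes are replaced.
    with open(saved_path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()


# Usage (performs a real network request):
# htmlText = download_to_string("https://www.facebook.com/hellomeets/events")
```

Importing `wget` inside the function keeps the cleanup helper usable even where the package is not installed.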

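【Comments】:

I used Inspect Element, but I did not get the exact text. At stackoverflow.com/questions/18990597/… I read that we have to use wget.
@Harish I see what you mean now... sorry about that; my updated answer should be what you want. The wget module also does not like having the same file twice, so make sure you always delete the events file before running the script, or have the script delete it before downloading.
Thanks @heinst, but when I scrape, it still shows different data, because the Facebook page requires a login to be authorized to access all the data. Please help me solve this. Thanks again.
@Harish That should be a whole new question, and it needs more code than there is now.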