Python BS4 not returning a Unicode value in find_all() function: answers

[Question title]: Python BS4 not returning a Unicode value in find_all() function
[Posted]: 2017-11-28 02:08:55
[Question]:

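For a school project, I want to write a Python program that extracts the current value of Bitcoin from this website: http://www.coindesk.com/price/. To do this, I installed the BeautifulSoup4 and Requests libraries to fetch and parse the HTML, but when it comes to actually getting the price, my program returns nothing. Here is a picture of what I'm trying to get. Here is my code: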

import requests as r
from bs4 import BeautifulSoup as bs
doc = r.get("http://www.coindesk.com/price/")
soup = bs(doc.content, "html.parser")
price = soup.find_all("a", {"class":"bpiUSD"})
text = []
contents = []
for item in price:
    text.append(item.text)
for item in price:
    contents.append(item.contents)
print "text:", type(text[0])
print "contents:", type(contents[0])
print "text[0]:", text[0]
print "contents[0]", contents[0]

Here is the output:

text: <type 'unicode'>
contents: <type 'list'>
text[0]: 
contents[0] []

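I've used this approach to extract strings and numbers before and it worked, but for this particular number it returns nothing. Also, I know the Bitcoin price is in Unicode (at least I assume it is). I tried converting it to a string, but nothing worked, even though type() does show that the items are unicode.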

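[Comments]:

price doesn't actually contain the price. Print price right after you create it to check. The price is probably filled in by some JavaScript code, so you may want to use something like Selenium WebDriver.
If you want to use requests and bs4, you'll have to find another site.
Wait, so the value I'm looking for is generated by a separate JavaScript program? Sorry if my question seems stupid, I'm just a beginner.
Very likely. You could also try something like dryscrape.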

Tags: python python-2.7 unicode python-requests bs4

[Answer 1]:

You either have to find another site or use Selenium WebDriver. The price is generated by JavaScript, which requests does not execute.

import requests as r
from bs4 import BeautifulSoup as bs

doc = r.get("http://www.coindesk.com/price/")
soup = bs(doc.content, "lxml")                  # the "lxml" parser requires the lxml package
price = soup.find_all(class_="currency-price")
print(price)

Prints:

[<div class="currency-price">
<a class="bpiUSD" href="/price/" style="color:white;"></a>
</div>, <div class="currency-price">
<a class="bpiUSD" href="/price/" style="color:white;"></a>
</div>]

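None of this contains your number. If you inspect the HTML in your browser, the number appears between the a tags, because the browser has already run the JavaScript. A library like Selenium lets you run that JavaScript too.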
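To see the same thing without a network call, here is a minimal sketch that parses the HTML printed above (copied verbatim into a string) and collects the text inside each a tag with class bpiUSD. It swaps in Python 3's standard-library html.parser.HTMLParser instead of BeautifulSoup so it runs with no extra packages; the class name AnchorTextCollector is hypothetical. Both anchors come back as empty strings, which is why text[0] printed nothing in the question.

```python
from html.parser import HTMLParser

# HTML copied from the output printed above: what requests actually receives.
STATIC_HTML = """
<div class="currency-price">
<a class="bpiUSD" href="/price/" style="color:white;"></a>
</div><div class="currency-price">
<a class="bpiUSD" href="/price/" style="color:white;"></a>
</div>
"""


class AnchorTextCollector(HTMLParser):
    """Collects the text inside every <a> tag whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.inside = False  # True while between a matching <a> and its </a>
        self.texts = []      # one entry per matching <a> tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.target_class in classes:
            self.inside = True
            self.texts.append("")

    def handle_data(self, data):
        # Only text that sits inside a matching <a> tag is recorded
        if self.inside:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside = False


collector = AnchorTextCollector("bpiUSD")
collector.feed(STATIC_HTML)
print(collector.texts)  # the anchors exist, but their text is empty
```

The tags are found, so find_all() is working; the value simply is not in the HTML the server sends.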


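[Answer 2]:

The site you're trying to parse with Beautiful Soup is rendered by JavaScript calls that fetch their data from a JSON representation published by an API, namely the CoinDesk API. That's why your Beautiful Soup call doesn't work.

To get this data, use requests to fetch the JSON, then iterate down to the data you need.

I've done that for you in the script below. I added comments so you can see what each part does. It could be done in fewer lines, but I think this version will help you better understand how to loop through JSON.

This is Python 3. If you want prettier output in Python 2.7, remove the parentheses around the print statement.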

import requests

jsonurl = 'http://api2.coindesk.com/site/headerdata.json?currency=BTC'
data = requests.get(jsonurl).json()                     # parse the JSON response into Python dicts

for key, value in data.items():                         # Loop through the first branch of the JSON
    if isinstance(value, dict):                         # Each branch holding a dictionary contains the currencies and rates
        for subKey, subValue in value.items():          # Loop through those dictionaries
            if isinstance(subValue, dict):              # If this value is itself a dictionary, loop through it
                for subKey1, subValue1 in subValue.items():   # Our final loop
                    if subKey1 == 'rate_float':         # The rate is stored under the 'rate_float' key
                        print('exchange: ' + subKey, 'rate: ' + str(subValue1))
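The nested loops above assume the rates sit exactly two dictionary levels below the top. A minimal sketch of a depth-independent alternative, assuming the same layout (a rate_float key inside a dictionary keyed by exchange name), is a recursive generator that yields (parent_key, rate) for every rate_float it finds at any depth. The function name find_rates and the sample data are hypothetical, made up for illustration and not taken from the real CoinDesk response.

```python
def find_rates(node, parent_key=None):
    """Yield (parent_key, value) for every 'rate_float' key found at any depth.

    parent_key is the dictionary key under which the 'rate_float' entry sits.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "rate_float":
                yield parent_key, value
            else:
                # Recurse into nested values, remembering which key led there
                yield from find_rates(value, key)


# Hypothetical data shaped like the branches the loops above walk through
sample = {
    "updated": "example",
    "example_branch": {
        "USD": {"code": "USD", "rate_float": 10000.5},
        "EUR": {"code": "EUR", "rate_float": 8500.25},
    },
}

for exchange, rate in find_rates(sample):
    print("exchange: " + str(exchange), "rate: " + str(rate))
```

With the real response you would pass the parsed data dictionary instead of sample. The trade-off is that recursion also picks up rate_float keys at unexpected depths, which may or may not be what you want.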
