【Question title】: Trying to find something specific in HTML code
【Posted】: 2021-05-04 05:59:25
【Question description】:

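I am trying to find the specific ID of an altcoin, but I don't know how to do it. When I print the page, I get a very long JSON script and get lost trying to find the ID in it. Is there an easier way?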

from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
import time


cmc = requests.get('https://coinmarketcap.com/')
soup = BeautifulSoup(cmc.content, 'html.parser')

print(soup.prettify())

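The output I want is the exact id that corresponds to a given altcoin. The output below is for a single coin, but the full list is very long, and it is hard to find the exact entry without searching through it by hand.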

{"id":1,"name":"Bitcoin","symbol":"BTC","slug":"bitcoin","max_supply":21000000,"circulating_supply":18614718,"total_supply":18614718,"last_updated":"2021-01-30T15:00:02.000Z","quote":{"USD":{"price":34177.31601866782,"volume_24h":83208963467.24487,"percent_change_1h":1.15037986,"percent_change_24h":-10.87555443,"percent_change_7d":7.03677315,"percent_change_30d":19.84946991,"market_cap":636201099684.3843,"last_updated":"2021-01-30T15:00:02.000Z"}},"rank":1,"noLazyLoad":true}
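Rather than scanning the printout by eye, one option is to parse the JSON and search it recursively for the object you want. The sketch below uses a hypothetical helper, `find_paths`, that yields the key path (dict keys and list indices) to every dict whose given key equals a given value. The sample document is invented and heavily trimmed for illustration; the real page data is much larger.

```python
import json


def find_paths(node, key, value, path=()):
    """Yield the key path to every dict in `node` where dict[key] == value.

    `node` is any structure returned by json.loads(): nested dicts, lists
    and scalars. Each path is a tuple of dict keys and list indices.
    """
    if isinstance(node, dict):
        if key in node and node[key] == value:
            yield path
        for k, child in node.items():
            yield from find_paths(child, key, value, path + (k,))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from find_paths(child, key, value, path + (i,))


# Invented, trimmed stand-in for the page's embedded JSON
doc = json.loads(
    '{"props": {"initialState": {"cryptocurrency": {"listingLatest": '
    '{"data": [{"id": 1, "symbol": "BTC"}, {"id": 1027, "symbol": "ETH"}]}}}}}'
)

for p in find_paths(doc, "symbol", "ETH"):
    print(p)  # ('props', 'initialState', 'cryptocurrency', 'listingLatest', 'data', 1)
```

Once a path is known, you can index straight into the parsed data with it instead of searching again.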

【Question discussion】:

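  • What is your desired output?
  • I updated the question @baduker
  • Your code should print HTML. It is not clear where you are getting this JSON from. Also, CoinMarketCap has an API, so you don't need BeautifulSoup: coinmarketcap.com/api
  • I know about this API, but getting historical data from it requires a paid plan.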

Tags: python json beautifulsoup


【Solution 1】:

I took a closer look at the HTML.

The JSON data you are looking for appears to be inside the <script> tag whose id is "__NEXT_DATA__".
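As an aside, the same <script> tag can also be pulled out using only the standard library's `html.parser.HTMLParser`, without BeautifulSoup. This is a minimal sketch, assuming the JSON is the entire text content of `<script id="__NEXT_DATA__">`; the class `NextDataExtractor`, the helper `extract_next_data`, and the sample HTML are illustrative names and data, not part of the page or of any library.

```python
import json
from html.parser import HTMLParser


class NextDataExtractor(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__">...</script>."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Only keep text that belongs to the __NEXT_DATA__ script
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Parse the __NEXT_DATA__ script's text as JSON and return it."""
    parser = NextDataExtractor()
    parser.feed(html)
    parser.close()
    return json.loads("".join(parser.chunks))


# Hypothetical, heavily trimmed page for illustration only
sample_html = (
    '<html><body>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"initialState": {"cryptocurrency": {"listingLatest": '
    '{"data": [{"id": 1, "symbol": "BTC"}]}}}}}'
    '</script>'
    '</body></html>'
)

data = extract_next_data(sample_html)
print(data["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"])
```

With the live site you would pass the downloaded HTML text instead of `sample_html`. The key path is site-specific and may change whenever the page is redesigned.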

I'm not very familiar with BeautifulSoup, so there may be a more elegant way to get the data. Here is the code I used:

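from bs4 import BeautifulSoup
import json
import requests

cmc = requests.get('https://coinmarketcap.com/')
soup = BeautifulSoup(cmc.content, 'html.parser')

# The page embeds its data as JSON inside <script id="__NEXT_DATA__">
for item in soup.select('script[id="__NEXT_DATA__"]'):
    data = json.loads(item.string)  # load JSON string as a dict
    # Key path as found on the page at the time; it may change if the site is redesigned
    desired_data = data["props"]["initialState"]["cryptocurrency"]["listingLatest"][
        "data"
    ]
    print(
        json.dumps(  # pretty output string
            desired_data,
            indent=2,
        ),
    )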

Truncated output:

[
  {
    "id": 1,
    "name": "Bitcoin",
    "symbol": "BTC",
    "slug": "bitcoin",
    "max_supply": 21000000,
    "circulating_supply": 18614718,
    "total_supply": 18614718,
    "last_updated": "2021-01-30T14:51:02.000Z",
    "quote": {
      "USD": {
        "price": 34138.18238095427,
        "volume_24h": 83651976977.0413,
        "percent_change_1h": 1.36922474,
        "percent_change_24h": -9.82670796,
        "percent_change_7d": 6.33079444,
        "percent_change_30d": 19.72629419,
        "market_cap": 635472638054.0323,
        "last_updated": "2021-01-30T14:51:02.000Z"
      }
    },
    "rank": 1,
    "noLazyLoad": true
  },
  {
    "id": 1027,
    "name": "Ethereum",
    "symbol": "ETH",
    "slug": "ethereum",
    "max_supply": null,
    "circulating_supply": 114465285.999,
    "total_supply": 114465285.999,
    "last_updated": "2021-01-30T14:51:02.000Z",
    "quote": {
      "USD": {
        "price": 1364.155096452962,
        "volume_24h": 38819994919.48616,
        "percent_change_1h": 1.95180621,
        "percent_change_24h": -3.86551103,
        "percent_change_7d": 10.22893483,
        "percent_change_30d": 85.96783538,
        "market_cap": 156148403262.48172,
        "last_updated": "2021-01-30T14:51:02.000Z"
      }
    },
    "rank": 2,
    "noLazyLoad": true
  },…
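To get the id of one particular coin, which is what the question asks for, the list can then be filtered by symbol, name, or slug. A minimal sketch, assuming `listing` is the `desired_data` list printed above; the helper name `find_coin_ids` is hypothetical, and the two sample entries are trimmed copies of that output. It returns a list because a ticker symbol may be shared by more than one coin.

```python
def find_coin_ids(listing, query):
    """Return (name, symbol, id) for every coin whose symbol, name or slug
    equals `query`, ignoring case. A list is returned because a ticker
    symbol may be shared by more than one coin."""
    q = query.lower()
    return [
        (coin["name"], coin["symbol"], coin["id"])
        for coin in listing
        if q in (coin["symbol"].lower(), coin["name"].lower(), coin["slug"].lower())
    ]


# Two entries trimmed from the output above
listing = [
    {"id": 1, "name": "Bitcoin", "symbol": "BTC", "slug": "bitcoin"},
    {"id": 1027, "name": "Ethereum", "symbol": "ETH", "slug": "ethereum"},
]

print(find_coin_ids(listing, "eth"))      # [('Ethereum', 'ETH', 1027)]
print(find_coin_ids(listing, "Bitcoin"))  # [('Bitcoin', 'BTC', 1)]

# For repeated lookups, a slug -> id dict avoids rescanning the list each time
id_by_slug = {coin["slug"]: coin["id"] for coin in listing}
print(id_by_slug["ethereum"])  # 1027
```

Inside the answer's loop you would call `find_coin_ids(desired_data, "eth")` instead of using the sample list.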

【Discussion】:

  • Wow!! Exactly what I wanted. Thank you so much!!!!