【Question title】: Getting data from World Bank API using pandas
【Posted】: 2022-01-04 19:13:25
【Question】:

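I'm trying to build a data table that contains only the country, year and value from the World Bank API, but I can't seem to filter out the data I want. I've seen that questions like this have been asked before, but none of the answers seem to work for me.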

I would really appreciate some help. Thank you!

import requests
import pandas as pd
from bs4 import BeautifulSoup
import json
url ="http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
country = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]

html = {}
for i in country:
    url_one = url.format(i)
    html[i] = requests.get(url_one).json()

my_values = []
for i in country:
    value = html[i][1][0]['value']
    my_values.append(value)

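Edit

My data currently looks like this, and I'm trying to extract the country name from {'country': {'id': 'AO', 'value': 'Angola'}}, together with 'date' and 'value'.

Edit 2: I got the data I was looking for, but each row is repeated twice.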

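【Comments】:

  • Pandas already has a built-in tool for reading JSON: pandas.read_json("your_url")
  • Hmm, no, that doesn't seem to work. I've added the message to the original question.
  • You are currently getting a list. Try opening the link to see exactly what you want, then use list slicing to extract the dictionary you need.
  • @AkmalSoliev Ah yes, I understand that, but I can't extract what I need because the data is nested in many arrays.
  • I see you found an answer. For now, a quick and dirty fix is to just drop the duplicates.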
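
To make the nesting discussed in these comments concrete: a World Bank v2 JSON response is a two-element list, with paging metadata first and the list of records second. The sketch below works on a hand-written, abbreviated stand-in for one response rather than a live request; the numeric value is a placeholder and `extract_rows` is a hypothetical helper name.

```python
# Abbreviated, hand-written stand-in for one API response.
# The numeric value is a placeholder, not real World Bank data.
sample_response = [
    {"page": 1, "pages": 1, "per_page": 50, "total": 1},  # element 0: paging metadata
    [                                                      # element 1: list of records
        {
            "indicator": {"id": "NY.GDP.PCAP.CD", "value": "GDP per capita (current US$)"},
            "country": {"id": "AO", "value": "Angola"},
            "date": "2015",
            "value": 1234.5,
        }
    ],
]


def extract_rows(response):
    """Return flat country/date/value dicts from one response (hypothetical helper)."""
    records = response[1] or []  # treat a null record list as empty
    return [
        {"country": r["country"]["value"], "date": r["date"], "value": r["value"]}
        for r in records
    ]


rows = extract_rows(sample_response)
print(rows)
```

Indexing with `[1]` skips the metadata, and the two nested lookups (`r["country"]["value"]`) reach the country name.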

Tags: python pandas dataframe api


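【Solution 1】:

Note: This assumes you store the data for all years at once rather than for a single year, which lets you simply filter later in your processing. Also, take a look at your country list: a "," is missing between "GRC""HUN".

There are different ways to achieve your goal; here are two of them to point you in the right direction.

Option #1

Pick the information you need from the JSON response, build a restructured dict and append() it to my_values: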
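
As a small illustration of why the missing comma matters: Python joins adjacent string literals into a single string, so the list silently ends up with one bogus code instead of two valid ones. The shortened list below is only for demonstration.

```python
# Adjacent string literals are concatenated by Python,
# so the missing comma produces one element, "GRCHUN".
codes = ["DEU", "GRC" "HUN", "ISL"]
print(codes)
print(len(codes))

# With the comma restored, both codes are separate elements.
fixed = ["DEU", "GRC", "HUN", "ISL"]
print(len(fixed))
```

No error is raised for the broken list, which is why the mistake is easy to overlook.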

for d in data[1]:

    my_values.append({
        'country':d['country']['value'],
        'date':d['date'],
        'value':d['value']
    })

Example

import requests
import pandas as pd


url = 'http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?format=json'
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]
    
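my_values = []
for country in countries:
    # Each response is [paging metadata, list of yearly records]
    data = requests.get(url % country).json()

    try:
        for d in data[1]:
            # Keep only the country name, year and indicator value
            my_values.append({
                'country': d['country']['value'],
                'date': d['date'],
                'value': d['value']
            })
    except Exception as err:
        # An invalid code or an empty result has no usable data[1]
        print(f'[ERROR] country ==> {country} error ==> {err}')

pd.DataFrame(my_values).sort_values(['country', 'date'], ascending=True)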
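
Once all years are in a single frame, narrowing it down to one year is a simple filter. The sketch below assumes rows shaped like `my_values` above; the three rows and their values are made up for illustration. Note that the API delivers the year as a string, so it is converted to an integer first.

```python
import pandas as pd

# Placeholder rows in the same shape as my_values; values are made up.
my_values = [
    {"country": "Angola", "date": "2014", "value": 1.0},
    {"country": "Angola", "date": "2015", "value": 2.0},
    {"country": "Algeria", "date": "2015", "value": 3.0},
]

df = pd.DataFrame(my_values)
df["date"] = df["date"].astype(int)  # years arrive as strings

# Keep only the rows for 2015, sorted by country name.
df_2015 = df[df["date"] == 2015].sort_values("country").reset_index(drop=True)
print(df_2015)
```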

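Option #2

Create a DataFrame directly from each record in the JSON response, concatenate them, and make a few adjustments to the final DataFrame: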

for d in data[1]:
    my_values.append(pd.DataFrame(d))

...

pd.concat(my_values).loc[['value']][['country','date','value']].sort_values(['country', 'date'], ascending=True)
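
Why the `.loc[['value']]` step is needed can be seen on a single record. The sketch below uses a hand-written record with a placeholder value: because `indicator` and `country` are nested dicts with the keys `id` and `value`, `pd.DataFrame(d)` yields two rows indexed by those keys, and only the `value` row carries the country name.

```python
import pandas as pd

# Hand-written record in the API's shape; the numeric value is a placeholder.
record = {
    "indicator": {"id": "NY.GDP.PCAP.CD", "value": "GDP per capita (current US$)"},
    "country": {"id": "AO", "value": "Angola"},
    "date": "2015",
    "value": 1234.5,
}

frame = pd.DataFrame(record)
print(frame)  # two rows: index 'id' and index 'value'

# Keep the row that holds the names, then the three columns of interest.
name_row = frame.loc[["value"]][["country", "date", "value"]]
print(name_row)
```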

Output

country date value
Algeria 1971 341.389
Algeria 1972 442.678
Algeria 1973 554.293
Algeria 1974 818.008
Algeria 1975 936.79
... ... ...
Zimbabwe 2016 1464.59
Zimbabwe 2017 1235.19
Zimbabwe 2018 1254.64
Zimbabwe 2019 1316.74
Zimbabwe 2020 1214.51

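【Comments】:

  • This is what I was looking for. Thank you so much! (Thanks also for pointing out the missing ','.) Yes, I was looking for all years rather than just one. Thanks again.

【Solution 2】: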

The pandas read_json method expects a valid JSON string, a path object or a file-like object, but you passed in a plain string. https://pandas.pydata.org/docs/reference/api/pandas.read_json.html

Try this:

import requests
import pandas as pd


url = "http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP", "CZE","DNK","FIN","FRA","GEO","DEU",
          "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
          "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
          "GBR","USA","VNM","ZWE"]

datas = []
for country in countries:
    data = requests.get(url %country).json()
    try:
        values = data[1][0]
        datas.append(pd.DataFrame(values))
    except Exception as err:
        print(f"[ERROR] country ==> {country} with error ==> {err}")

df = pd.concat(datas)

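【Comments】:

  • Hmm, I'm still getting an error. Is there a typo in for country in countrys:?
  • @Jadewest How about now?
  • It's working. Thank you so much! But the same item seems to appear twice, once with the country name and once with the country code (see the screenshot added under Edit 2). Any idea how to fix this?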
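
The doubled rows reported in the last comment are consistent with how `pd.DataFrame` expands a record whose `country` and `indicator` fields are nested dicts: one row per nested key (`id` and `value`). One possible way around this, sketched below on a hand-written record with a placeholder value, is `pd.json_normalize`, which flattens nested keys into dotted column names and keeps one row per record.

```python
import pandas as pd

# Hand-written record in the API's shape; the numeric value is a placeholder.
records = [
    {
        "indicator": {"id": "NY.GDP.PCAP.CD", "value": "GDP per capita (current US$)"},
        "country": {"id": "AO", "value": "Angola"},
        "date": "2015",
        "value": 1234.5,
    }
]

# Nested keys become columns such as 'country.id' and 'country.value'.
flat = pd.json_normalize(records)

# Keep one row per record with just the country name, year and value.
df = flat.rename(columns={"country.value": "country"})[["country", "date", "value"]]
print(df)
```

In the loop above, this would correspond to calling `pd.json_normalize(values)` instead of `pd.DataFrame(values)`; dropping the duplicates afterwards, as suggested in the question's comments, is the quicker alternative.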