【Question title】: How do I convert a JSON from a GET request into a pandas DataFrame
【Posted】: 2020-01-03 16:53:55
【Question】:

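I'm using the GoToWebinar API to get data from webinars. Everything else works; the only piece still missing from my script is converting the JSON I get back into a pandas DataFrame so that I can analyze it.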

The JSON I get is structured as follows (I masked the data):

{
  "_embedded": {
    "webinars": [
      {
        "webinarKey": "GGGGGGGGGGGGGGGG",
        "webinarId": "BBBBBBBBBBB",
        "organizerKey": "RRRRRRRRRRRRR",
        "omid": "RRRRRRRRRRR",
        "accountKey": "WWWWWWWWWWW",
        "recurrenceKey": "EEEEEEEEEEEEEEEEE21",
        "subject": "LEEEEEEEEEESEon",
        "description": "EEEEEEEEEEEEE",
        "times": [
          {
            "startTime": "2019-07-01T13:00:00Z",
            "endTime": "2019-07-01T13:30:00Z"
          }
        ],
        "timeZone": "America/New_York",
        "locale": "en_US",
        "status": "UPDATED",
        "approvalType": "AUTOMATIC",
        "registrationUrl": "https://attendee.gotowebinar.com/rt/XXXXXXXXXXXXXXXX",
        "impromptu": false,
        "isPasswordProtected": false,
        "recurrenceType": "series",
        "experienceType": "broadcast",
        "registrationSettingsKey": "DDDDDDDD"
      },
      {
        "webinarKey": "GGGGGGGGGGGGGGGG",
        "webinarId": "BBBBBBBBBBB",
        "organizerKey": "RRRRRRRRRRRRR",
        "omid": "RRRRRRRRRRR",
        "accountKey": "WWWWWWWWWWW",
        "recurrenceKey": "EEEEEEEEEEEEEEEEE21",
        "subject": "LEEEEEEEEEESEon",
        "description": "EEEEEEEEEEEEE",
        "times": [
          {
            "startTime": "2019-07-01T13:00:00Z",
            "endTime": "2019-07-01T13:30:00Z"
          }
        ],
        "timeZone": "America/New_York",
        "locale": "en_US",
        "status": "UPDATED",
        "approvalType": "AUTOMATIC",
        "registrationUrl": "https://attendee.gotowebinar.com/rt/XXXXXXXXXXXXXXXX",
        "impromptu": false,
        "isPasswordProtected": false,
        "recurrenceType": "series",
        "experienceType": "broadcast",
        "registrationSettingsKey": "DDDDDDDD"
      },
      ...other webinars...
    ]
  },
  "page": {
    "size": 10,
    "totalElements": 26,
    "totalPages": 3,
    "number": 0
  }
}

Here is my code; I basically don't know how to proceed from here. I tried DataFrame.from_dict, read_json, and the solution proposed here: Convert JSON data from Request into Pandas DataFrame

# Getting the webinar list
# Assumes session (e.g. a requests.Session), account_key and access_token
# were set up earlier in the script.
base_url = 'https://api.getgo.com/G2W/rest/v2'

## setting up parameters (already URL-encoded: %3A stands for ':')
param_1 = '2019-07-01T10%3A00%3A00Z'
param_2 = '2019-09-01T10%3A00%3A00Z'

## building the path
path = base_url + '/accounts/' + account_key + '/webinars?fromTime=' + param_1 + '&toTime=' + param_2
print(path)

headers = {'accept': 'application/json', 'Authorization': access_token}

webinars_req = session.get(path, headers=headers)

# .json() parses the response body into a Python dict
webinars_json = webinars_req.json()
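The query string above is percent-encoded by hand. As a hedged sketch (standard library only, with a placeholder account key that is not a real value), `urllib.parse.urlencode` produces the same encoding from the raw ISO-8601 timestamps, which avoids typing `%3A` manually:

```python
from urllib.parse import urlencode

base_url = 'https://api.getgo.com/G2W/rest/v2'
account_key = 'ACCOUNT_KEY'  # placeholder, not a real key

# urlencode percent-encodes ':' as '%3A' (and '&'-joins the pairs)
query = urlencode({
    'fromTime': '2019-07-01T10:00:00Z',
    'toTime': '2019-09-01T10:00:00Z',
})
path = base_url + '/accounts/' + account_key + '/webinars?' + query
print(path)
```

With requests, passing `params={'fromTime': ..., 'toTime': ...}` to `session.get(...)` performs this encoding for you, so the path can stop at `/webinars`.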

I want a DataFrame that has all the inner keys (e.g. webinarKey, webinarId, etc.) as columns, with their corresponding values...

Hope you can help!

【Comments】:

  • Hi, could you update the question with the desired DataFrame? That would make it easier to work with.

Tags: python json pandas post python-requests


【Answer 1】:

You could try the requests module:

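import io

import pandas as pd
import requests

webinars_req = requests.get(path, headers=headers)
# read_json has no ignore_index parameter; recent pandas versions also expect
# literal JSON text to be wrapped in a file-like object such as StringIO
df = pd.read_json(io.StringIO(webinars_req.text))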

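【Comments】:

  • Thanks for the reply. Unfortunately this doesn't work, because the returned type is actually a dictionary rather than a JSON object. I've posted my solution below in case it's useful :-)

【Answer 2】: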

OK, I got it! Basically I just needed to take the list at the webinars level out of the dictionary and then put it into a DataFrame:

import pandas as pd

webinars_json = webinars_req.json()

## put all webinar data in a DataFrame: one row per webinar, one column per key

webinars_list = webinars_json.get('_embedded').get('webinars')
df_webinars = pd.DataFrame(webinars_list)

Works great :-) Hope this helps someone.
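The `page` block in the sample response (`size` 10, `totalElements` 26, `totalPages` 3) suggests that a single response holds only the first 10 webinars. The sketch below gathers the `_embedded` → `webinars` list from every page into one DataFrame. `collect_webinars` and `fetch_page` are hypothetical names; how a specific page is requested from GoToWebinar (for example a `page` query parameter) is an assumption to check against the API documentation, so the fetching is injected as a callable and exercised here with fake data.

```python
import pandas as pd


def collect_webinars(fetch_page):
    """Gather the '_embedded' -> 'webinars' list from every page.

    fetch_page is a hypothetical callable: it takes a 0-based page
    number and returns the parsed JSON dict for that page.
    """
    first = fetch_page(0)
    rows = list(first.get('_embedded', {}).get('webinars', []))
    total_pages = first.get('page', {}).get('totalPages', 1)
    for number in range(1, total_pages):
        payload = fetch_page(number)
        # .get with defaults tolerates a page without '_embedded'
        rows.extend(payload.get('_embedded', {}).get('webinars', []))
    return pd.DataFrame(rows)


# Fake pages mimicking the structure in the question (3 pages, 26 items)
def fake_fetch(number):
    sizes = [10, 10, 6]
    webinars = [{'webinarKey': f'{number}-{i}', 'subject': 'demo'}
                for i in range(sizes[number])]
    return {'_embedded': {'webinars': webinars},
            'page': {'size': 10, 'totalElements': 26,
                     'totalPages': 3, 'number': number}}


df_webinars = collect_webinars(fake_fetch)
print(df_webinars.shape)
```

In a real script `fetch_page` might wrap `session.get(...).json()` with whatever page parameter the API actually accepts.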

【Comments】:

【Answer 3】:

js = { "_embedded": { "webinars": [ { "webinarKey": "GGGGGGGGGGGGGGGG", "webinarId": "BBBBBBBBBBB", "organizerKey": "RRRRRRRRRRRRR", "omid": "RRRRRRRRRRR", "accountKey": "WWWWWWWWWWW", "recurrenceKey": "EEEEEEEEEEEEEEEEE21", "subject": "LEEEEEEEEEESEon", "description": "EEEEEEEEEEEEE", "times": [ { "startTime": "2019-07-01T13:00:00Z", "endTime": "2019-07-01T13:30:00Z" } ], "timeZone": "America/New_York", "locale": "en_US", "status": "UPDATED", "approvalType": "AUTOMATIC", "registrationUrl": "https://attendee.gotowebinar.com/rt/XXXXXXXXXXXXXXXX", "impromptu": "false", "isPasswordProtected": "false", "recurrenceType": "series", "experienceType": "broadcast", "registrationSettingsKey": "DDDDDDDD" }, { "webinarKey": "GGGGGGGGGGGGGGGG", "webinarId": "BBBBBBBBBBB", "organizerKey": "RRRRRRRRRRRRR", "omid": "RRRRRRRRRRR", "accountKey": "WWWWWWWWWWW", "recurrenceKey": "EEEEEEEEEEEEEEEEE21", "subject": "LEEEEEEEEEESEon", "description": "EEEEEEEEEEEEE", "times": [ { "startTime": "2019-07-01T13:00:00Z", "endTime": "2019-07-01T13:30:00Z" } ], "timeZone": "America/New_York", "locale": "en_US", "status": "UPDATED", "approvalType": "AUTOMATIC", "registrationUrl": "https://attendee.gotowebinar.com/rt/XXXXXXXXXXXXXXXX", "impromptu": "false", "isPasswordProtected": "false", "recurrenceType": "series", "experienceType": "broadcast", "registrationSettingsKey": "DDDDDDDD" } ] } }

import pandas as pd

# js is already a Python dict, so no json.dumps/json.loads round trip is needed.
# json_normalize lives in the top-level pandas namespace since pandas 1.0;
# the old pandas.io.json import path is deprecated.
# Also look at the meta argument of json_normalize.
df = pd.json_normalize(data=js['_embedded'], record_path=['webinars'])
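Following the hint about `meta`, here is a hedged sketch (using a trimmed, made-up copy of the question's structure) that flattens the nested `times` list: `record_path='times'` yields one row per start/end pair, and `meta` copies the parent webinar's fields onto each of those rows. Converting the timestamps with `pd.to_datetime` is an extra step added for analysis, not part of the answer above.

```python
import pandas as pd

# Trimmed, made-up copy of the structure in the question
js = {'_embedded': {'webinars': [
    {'webinarKey': 'AAA', 'subject': 'First',
     'times': [{'startTime': '2019-07-01T13:00:00Z',
                'endTime': '2019-07-01T13:30:00Z'}]},
    {'webinarKey': 'BBB', 'subject': 'Second',
     'times': [{'startTime': '2019-07-08T13:00:00Z',
                'endTime': '2019-07-08T13:30:00Z'},
               {'startTime': '2019-07-15T13:00:00Z',
                'endTime': '2019-07-15T13:30:00Z'}]},
]}}

# One row per entry in each webinar's 'times' list; meta repeats the
# parent webinar's keys on every row that webinar produces.
df_times = pd.json_normalize(
    js['_embedded']['webinars'],
    record_path='times',
    meta=['webinarKey', 'subject'],
)

# Parse the ISO-8601 strings ('Z' suffix) into timezone-aware datetimes
df_times['startTime'] = pd.to_datetime(df_times['startTime'])
df_times['endTime'] = pd.to_datetime(df_times['endTime'])
print(df_times)
```

A recurring series with several sessions therefore shows up as several rows sharing the same `webinarKey`.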
    

【Comments】:
