【Question title】: Turn pandas nested JSON structure into a data frame
【Posted】: 2021-08-26 20:44:31
【Question】:

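My output is nested JSON. How can I take this nested JSON structure and turn it into a data frame?

I think there are two main levels, "Quotes" and "Carriers". I am interested in getting the "Quotes" as rows in the data frame.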

{
  "Quotes" : [ {
    "QuoteId" : 1,
    "MinPrice" : 1765,
    "Direct" : false,
    "OutboundLeg" : {
      "CarrierIds" : [ 881 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-08-31T00:00:00"
    },
    "QuoteDateTime" : "2021-06-09T09:15:00"
  }, {
    "QuoteId" : 2,
    "MinPrice" : 1774,
    "Direct" : false,
    "OutboundLeg" : {
      "CarrierIds" : [ 881 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-07-06T00:00:00"
    },
    "QuoteDateTime" : "2021-06-08T11:49:00"
  }, {
    "QuoteId" : 3,
    "MinPrice" : 1792,
    "Direct" : false,
    "OutboundLeg" : {
      "CarrierIds" : [ 881 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-10-12T00:00:00"
    },
    "QuoteDateTime" : "2021-06-07T01:22:00"
  }, {
    "QuoteId" : 4,
    "MinPrice" : 1792,
    "Direct" : false,
    "OutboundLeg" : {
      "CarrierIds" : [ 881 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2022-03-01T00:00:00"
    },
    "QuoteDateTime" : "2021-06-07T03:28:00"
  }, {
    "QuoteId" : 5,
    "MinPrice" : 2458,
    "Direct" : false,
    "OutboundLeg" : {
      "CarrierIds" : [ 881 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-06-19T00:00:00"
    },
    "QuoteDateTime" : "2021-06-07T19:28:00"
  }, {
    "QuoteId" : 6,
    "MinPrice" : 2462,
    "Direct" : false,
    "OutboundLeg" : {
      "CarrierIds" : [ 881 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-12-06T00:00:00"
    },
    "QuoteDateTime" : "2021-06-06T19:16:00"
  }, {
    "QuoteId" : 7,
    "MinPrice" : 2734,
    "Direct" : true,
    "OutboundLeg" : {
      "CarrierIds" : [ 234 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-06-19T00:00:00"
    },
    "QuoteDateTime" : "2021-06-06T20:26:00"
  }, {
    "QuoteId" : 8,
    "MinPrice" : 2734,
    "Direct" : true,
    "OutboundLeg" : {
      "CarrierIds" : [ 234 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-08-02T00:00:00"
    },
    "QuoteDateTime" : "2021-06-06T20:27:00"
  }, {
    "QuoteId" : 9,
    "MinPrice" : 2760,
    "Direct" : true,
    "OutboundLeg" : {
      "CarrierIds" : [ 234 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-07-02T00:00:00"
    },
    "QuoteDateTime" : "2021-06-07T06:11:00"
  }, {
    "QuoteId" : 10,
    "MinPrice" : 4126,
    "Direct" : true,
    "OutboundLeg" : {
      "CarrierIds" : [ 234 ],
      "OriginId" : 56949,
      "DestinationId" : 45348,
      "DepartureDate" : "2021-12-15T00:00:00"
    },
    "QuoteDateTime" : "2021-06-06T19:16:00"
  } ],
  "Carriers" : [ {
    "CarrierId" : 234,
    "Name" : "Airlink"
  }, {
    "CarrierId" : 881,
    "Name" : "British Airways"
  } ],
  "Places" : [ {
    "Name" : "Cape Town",
    "Type" : "Station",
    "PlaceId" : 45348,
    "IataCode" : "CPT",
    "SkyscannerCode" : "CPT",
    "CityName" : "Cape Town",
    "CityId" : "CPTA",
    "CountryName" : "South Africa"
  }, {
    "Name" : "Harare",
    "Type" : "Station",
    "PlaceId" : 56949,
    "IataCode" : "HRE",
    "SkyscannerCode" : "HRE",
    "CityName" : "Harare",
    "CityId" : "HREA",
    "CountryName" : "Zimbabwe"
  } ],
  "Currencies" : [ {
    "Code" : "ZAR",
    "Symbol" : "R",
    "ThousandsSeparator" : ",",
    "DecimalSeparator" : ".",
    "SymbolOnLeft" : true,
    "SpaceBetweenAmountAndSymbol" : true,
    "RoundingCoefficient" : 0,
    "DecimalDigits" : 2
  } ]
}

Edit 1:

I tried something like the code below, but I do not understand how to use it to convert these nested JSON structures into a data frame:

import json
import pandas as pd

with open('myJson.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data, 'Quotes',
                       ["QuoteId", "MinPrice", "Direct", "DestinationId",
                        "DepartureDate", "QuoteDateTime"],
                       record_prefix='Quotes_')

I also found a similar question here

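【Comments】:

  • I suggest googling "pandas nested json"
  • Thanks @AlexHall, I am looking at it now. It looks like it may be exactly the solution.
  • Reminder: if the API key is a real value, it is now compromised!
  • @AlexHall, I think that is the solution I am struggling to implement. I have simplified the question.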

Tags: python json pandas data-wrangling json-normalize


【Solution 1】:

This should give what you want:

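import pandas as pd

# Columns to keep, taken from both the quote and its nested OutboundLeg
COLS = ["QuoteId", "MinPrice", "Direct", "DestinationId",
        "DepartureDate", "QuoteDateTime"]

# One row per quote; "OutboundLeg" is still a column of dicts at this point
df1 = pd.DataFrame(data["Quotes"])
# Expand the OutboundLeg dicts into their own columns
df11 = pd.DataFrame(df1["OutboundLeg"].to_list())

quotes = pd.concat([df1, df11], axis="columns")[COLS].add_prefix("Quotes_")
>>> quotes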
   Quotes_QuoteId  Quotes_MinPrice  Quotes_Direct  Quotes_DestinationId Quotes_DepartureDate Quotes_QuoteDateTime
0               1             1765          False                 45348  2021-08-31T00:00:00  2021-06-09T09:15:00
1               2             1774          False                 45348  2021-07-06T00:00:00  2021-06-08T11:49:00
2               3             1792          False                 45348  2021-10-12T00:00:00  2021-06-07T01:22:00
3               4             1792          False                 45348  2022-03-01T00:00:00  2021-06-07T03:28:00
4               5             2458          False                 45348  2021-06-19T00:00:00  2021-06-07T19:28:00
5               6             2462          False                 45348  2021-12-06T00:00:00  2021-06-06T19:16:00
6               7             2734           True                 45348  2021-06-19T00:00:00  2021-06-06T20:26:00
7               8             2734           True                 45348  2021-08-02T00:00:00  2021-06-06T20:27:00
8               9             2760           True                 45348  2021-07-02T00:00:00  2021-06-07T06:11:00
9              10             4126           True                 45348  2021-12-15T00:00:00  2021-06-06T19:16:00
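
An alternative sketch, not part of the original answer: pd.json_normalize can flatten the nested "OutboundLeg" dict directly when it is given the list stored under "Quotes" rather than the whole document. The nested keys then show up as dotted column names such as "OutboundLeg.DestinationId". The two-record data dict below is a shortened stand-in for the JSON in the question, and stripping the "OutboundLeg." prefix is only an assumption about the column names you want.

```python
import pandas as pd

# Shortened stand-in for the parsed JSON from the question (two quotes only).
data = {
    "Quotes": [
        {
            "QuoteId": 1,
            "MinPrice": 1765,
            "Direct": False,
            "OutboundLeg": {
                "CarrierIds": [881],
                "OriginId": 56949,
                "DestinationId": 45348,
                "DepartureDate": "2021-08-31T00:00:00",
            },
            "QuoteDateTime": "2021-06-09T09:15:00",
        },
        {
            "QuoteId": 7,
            "MinPrice": 2734,
            "Direct": True,
            "OutboundLeg": {
                "CarrierIds": [234],
                "OriginId": 56949,
                "DestinationId": 45348,
                "DepartureDate": "2021-06-19T00:00:00",
            },
            "QuoteDateTime": "2021-06-06T20:26:00",
        },
    ]
}

COLS = ["QuoteId", "MinPrice", "Direct", "DestinationId",
        "DepartureDate", "QuoteDateTime"]

# Nested dicts become dotted columns ("OutboundLeg.DestinationId", ...);
# list values such as CarrierIds are left as lists inside a single column.
flat = pd.json_normalize(data["Quotes"])

# Keep only the last part of each dotted name, then select and prefix.
flat.columns = [name.split(".")[-1] for name in flat.columns]
quotes = flat[COLS].add_prefix("Quotes_")
print(quotes)
```

This keeps everything in one call and avoids the intermediate df11 frame, at the cost of having to clean up the dotted column names afterwards.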

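【Comments】:

  • I get an error on df1: TypeError: string indices must be integers
  • But that is exactly what I am trying to achieve, @Corralien. I will add the function that pulls the data so you can see how the data is imported.
  • It works with your example (data
  • I am using the statement above to import the data. I was only trying to simplify the question, since the import statement works. @Corralien, could that be the problem?
  • That is because data is a string... not a dict. You need to convert your JSON string to a Python dict: data = json.loads(data)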
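
The last comment can be illustrated with a minimal sketch. The raw string below is a made-up stand-in for whatever the asker's download function (not shown in the thread) returns. The point is only that indexing a str with "Quotes" raises the reported TypeError, while the dict returned by json.loads can be indexed as the answer expects.

```python
import json
import pandas as pd

# Hypothetical stand-in for an API response that is still a JSON *string*.
raw = '{"Quotes": [{"QuoteId": 1, "MinPrice": 1765, "Direct": false}]}'

try:
    raw["Quotes"]  # indexing a str with a str key fails
except TypeError as exc:
    # The exact wording of the message depends on the Python version.
    error_message = str(exc)
    print("TypeError:", error_message)

# Parse the string into a dict first; after that the answer's code applies.
data = json.loads(raw)
df1 = pd.DataFrame(data["Quotes"])
print(type(data).__name__, df1.shape)
```

Note the difference between json.load, which reads from an open file object as in the question, and json.loads, which parses a string already held in memory.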