【Question Title】: Transforming nested JSON to Pandas df
【Posted】: 2020-10-17 03:02:23
【Question】:

I have a JSON that looks like this:

{
  "4.0": {
    "A1": {
      "dR-14": 1.181,
      "ev": 1.102,
      "move11": 1.259,
      "move6": 1.259,
      "sILo": 1.259,
      "tR-14": 1.04
    },
    "A2": {
      "dR-03": 0.418,
      "ev": -0.177,
      "move11": 1.663,
      "move6": 1.663,
      "sILo": 0.418,
      "tR-03": 0.818
    },
    "A3": {
      "dR-16": 3.956,
      "ev": 3.667,
      "move11": 4.179,
      "sILo": 4.246,
      "tR-16": 3.465
    },
...

I'm trying to turn it into a pandas df that looks like this:

var1 var2 dR     ev     move11 move6 sILo   tR
4.0  A1   1.181  1.102  1.259  1.259 1.259  1.04
4.0  A2   0.418  -0.177 1.663  1.663 0.418  0.818
4.0  A3   3.956  3.667  4.179  NaN   4.246  3.465

I tried using pandas json_normalize like this:

js = pd.read_json('path', orient='index', typ='series', convert_dates=False, convert_axes = True)
pd.json_normalize(js, record_prefix = True)

But this concatenates the first and second index levels, so I end up with a df that looks like this:

    A1.0.2          A2.0.8 ... 
0   1.0             1.0
1   NaN             NaN

I've tried several different argument combinations for read_json and json_normalize, all with similar results.
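
For context, here is a minimal sketch of what json_normalize does with a plain nested dict (not the Series returned by read_json in the attempt above). The `nested` dict is a cut-down, illustrative version of the data. Every level of nesting is flattened into a single dotted column name, which is consistent with the joined-up headers described above.

```python
import pandas as pd

# Cut-down, illustrative version of the nested structure from the question.
nested = {"4.0": {"A1": {"ev": 1.102, "move6": 1.259}, "A2": {"ev": -0.177}}}

# json_normalize flattens nested dicts into one row, joining the keys with "."
flat = pd.json_normalize(nested)

print(flat.columns.tolist())
# ['4.0.A1.ev', '4.0.A1.move6', '4.0.A2.ev']
```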

【Question Comments】:

    Tags: python json pandas dataframe


    【Solution 1】:

    Use:

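    import pandas as pd

    # `data` is the nested dict parsed from the JSON in the question

    # STEP 1: one row per (inner key, outer key) pair, each holding the innermost dict
    df = pd.DataFrame(data).stack()
    
    # STEP 2: expand the dicts into columns; swap the index levels so that
    # var1 is the outer key ("4.0") and var2 is the inner key ("A1", ...)
    df = df.apply(pd.Series).swaplevel().rename_axis(['var1', 'var2']).reset_index()
    
    # STEP 3: collapse the dR-*/tR-* columns into one column each
    # (assumes exactly one non-NaN value per row in each group)
    df['dR'] = df.filter(like='dR').stack().reset_index(drop=True)
    df['tR'] = df.filter(like='tR').stack().reset_index(drop=True)
    
    # STEP 4: drop the original suffixed columns
    m = df.columns.str.contains(r'^dR-\d+') | df.columns.str.contains(r'^tR-\d+')
    df = df.loc[:, ~m]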
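
STEP 3 depends on stack() dropping NaN values and on every row holding exactly one dR-* and one tR-* value. If either assumption fails (stack()'s NaN handling may differ between pandas versions), the values can end up misaligned with the rows. A hedged alternative is to back-fill along each row and take the first column, which picks the first non-NaN value per row. The `wide` frame below is an illustrative stand-in for the dR-* columns of the STEP 2 output:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the dR-* columns of the STEP 2 output.
wide = pd.DataFrame({
    "dR-14": [1.181, np.nan, np.nan],
    "dR-03": [np.nan, 0.418, np.nan],
    "dR-16": [np.nan, np.nan, 3.956],
})

# Back-fill across columns, then keep the first column:
# each row ends up with its first non-NaN dR-* value.
dR = wide.filter(like="dR").bfill(axis=1).iloc[:, 0].rename("dR")

print(dR.tolist())
# [1.181, 0.418, 3.956]
```

The same expression with `like="tR"` would cover the tR-* columns. Rows with no match at all stay NaN instead of shifting later values upward.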
    

    Steps:

    # STEP 1
    A1  4.0    {'dR-14': 1.181, 'ev': 1.102, 'move11': 1.259,...
    A2  4.0    {'dR-03': 0.418, 'ev': -0.177, 'move11': 1.663...
    A3  4.0    {'dR-16': 3.956, 'ev': 3.667, 'move11': 4.179,...
    
    
    # STEP 2
      var1 var2  dR-14     ev  move11  move6   sILo  tR-14  dR-03  tR-03  dR-16  tR-16
    0  4.0   A1  1.181  1.102   1.259  1.259  1.259   1.04    NaN    NaN    NaN    NaN
    1  4.0   A2    NaN -0.177   1.663  1.663  0.418    NaN  0.418  0.818    NaN    NaN
    2  4.0   A3    NaN  3.667   4.179    NaN  4.246    NaN    NaN    NaN  3.956  3.465
    
    # STEP 3
      var1 var2  dR-14     ev  move11  move6   sILo  tR-14  dR-03  tR-03  dR-16  tR-16     dR     tR
    0  4.0   A1  1.181  1.102   1.259  1.259  1.259   1.04    NaN    NaN    NaN    NaN  1.181  1.040
    1  4.0   A2    NaN -0.177   1.663  1.663  0.418    NaN  0.418  0.818    NaN    NaN  0.418  0.818
    2  4.0   A3    NaN  3.667   4.179    NaN  4.246    NaN    NaN    NaN  3.956  3.465  3.956  3.465
    
    # STEP 4 (RESULT)
      var1 var2     ev  move11  move6   sILo     dR     tR
    0  4.0   A1  1.102   1.259  1.259  1.259  1.181  1.040
    1  4.0   A2 -0.177   1.663  1.663  0.418  0.418  0.818
    2  4.0   A3  3.667   4.179    NaN  4.246  3.956  3.465
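
A different route to the same result, sketched here as an assumption rather than as part of the answer above, is to strip the numeric suffix from the keys before building the DataFrame. The helper name `strip_suffix` is hypothetical, and the sample data is copied from the three entries shown in the question:

```python
import re
import pandas as pd

# Sample data copied from the three entries shown in the question.
data = {
    "4.0": {
        "A1": {"dR-14": 1.181, "ev": 1.102, "move11": 1.259,
               "move6": 1.259, "sILo": 1.259, "tR-14": 1.04},
        "A2": {"dR-03": 0.418, "ev": -0.177, "move11": 1.663,
               "move6": 1.663, "sILo": 0.418, "tR-03": 0.818},
        "A3": {"dR-16": 3.956, "ev": 3.667, "move11": 4.179,
               "sILo": 4.246, "tR-16": 3.465},
    }
}


def strip_suffix(key):
    """Drop a trailing '-<digits>' so 'dR-14' and 'dR-03' both become 'dR'."""
    return re.sub(r"-\d+$", "", key)


# One flat record per (outer key, inner key) pair, with normalized column names.
rows = [
    {"var1": outer, "var2": inner,
     **{strip_suffix(k): v for k, v in values.items()}}
    for outer, group in data.items()
    for inner, values in group.items()
]

df = pd.DataFrame(rows)
print(df)
```

Because the keys are normalized up front, no intermediate dR-*/tR-* columns are created, and a missing key such as move6 for A3 simply becomes NaN.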
    

    【Comments】:

    • @LMGagne Does this answer your question?