【Question title】:Pandas convert Dataframe to Nested Json
【Posted】:2014-06-27 22:00:40
【Question description】:

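My question is essentially the reverse of this one:

Create a Pandas DataFrame from deeply nested JSON

I would like to know whether the conversion can be done in the other direction. Given a table like this: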

     Library  Level           School Major  2013 Total
200  MS_AVERY  UGRAD  GENERAL STUDIES  GEST        5079
201  MS_AVERY  UGRAD  GENERAL STUDIES  HIST           5
202  MS_AVERY  UGRAD  GENERAL STUDIES  MELC           2
203  MS_AVERY  UGRAD  GENERAL STUDIES  PHIL          10
204  MS_AVERY  UGRAD  GENERAL STUDIES  PHYS           1
205  MS_AVERY  UGRAD  GENERAL STUDIES  POLS          53

Is it possible to generate a nested dictionary (or JSON) such as:

dict:

{'MS_AVERY': 
    { 'UGRAD' :
        {'GENERAL STUDIES' : {'GEST' : 5}
                             {'MELC' : 2}

 ...

【Question discussion】:

Tags: python json dictionary pandas


【Solution 1】:

Given a DataFrame object, it does not seem hard to write a function that builds the nested dictionary level by level:

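def fdrec(df):
    # every column except the last is a key level; the last column is the leaf value
    drec = dict()
    ncols = df.values.shape[1]
    for line in df.values:
        d = drec
        for j, col in enumerate(line[:-1]):
            if col not in d:
                if j != ncols - 2:
                    # intermediate level: create the sub-dict and descend into it
                    d[col] = {}
                    d = d[col]
                else:
                    # last key column: store the value
                    d[col] = line[-1]
            elif j != ncols - 2:
                # key already present: just descend
                d = d[col]
    return drec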

This yields:

{'MS_AVERY':
    {'UGRAD':
        {'GENERAL STUDIES': {'PHYS': 1L, 
                             'POLS': 53L,
                             'PHIL': 10L,
                             'HIST': 5L,
                             'MELC': 2L,
                             'GEST': 5079L}}}}
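
To try the function, the sample table can be rebuilt by hand. The sketch below assumes the six rows shown in the question and repeats `fdrec` so that it runs on its own. On Python 3 the integers print without the `L` suffix seen above, which is Python 2 notation for long integers.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "Library": ["MS_AVERY"] * 6,
        "Level": ["UGRAD"] * 6,
        "School": ["GENERAL STUDIES"] * 6,
        "Major": ["GEST", "HIST", "MELC", "PHIL", "PHYS", "POLS"],
        "2013 Total": [5079, 5, 2, 10, 1, 53],
    },
    index=range(200, 206),
)


def fdrec(df):
    # every column except the last is a key level; the last column is the leaf value
    drec = dict()
    ncols = df.values.shape[1]
    for line in df.values:
        d = drec
        for j, col in enumerate(line[:-1]):
            if col not in d:
                if j != ncols - 2:
                    d[col] = {}
                    d = d[col]
                else:
                    d[col] = line[-1]
            elif j != ncols - 2:
                d = d[col]
    return drec


result = fdrec(df)
print(result)
```

Note that the column order matters: the function nests by column position, so the value column has to come last.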

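【Discussion】:

  • Thanks for the reply, saullo. I was wondering whether there is a built-in function that does this, but this works well!
  • This is a lovely function, but for JSON all values must be enclosed in double quotes.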
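
On the quoting point, the standard-library `json` module can handle the conversion once the nested dictionary exists. A small sketch, assuming a dictionary shaped like the output above (the literal is abbreviated to two majors for illustration): `json.dumps` emits double-quoted keys and strings, while numbers stay unquoted, which is still valid JSON.

```python
import json

# Abbreviated version of the nested dictionary from the answer above.
nested = {"MS_AVERY": {"UGRAD": {"GENERAL STUDIES": {"GEST": 5079, "HIST": 5}}}}

# Serialize to JSON text; keys are written with double quotes.
json_text = json.dumps(nested, indent=2)
print(json_text)

# Parsing the text back gives an equal dictionary.
round_trip = json.loads(json_text)
```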
【Solution 2】:

Here is a solution I came up with while working on this question:

def rollup_to_dict_core(x, values, columns, d_columns=None):
    if d_columns is None:
        d_columns = []

    if len(columns) == 1:
        if len(values) == 1:
            return x.set_index(columns)[values[0]].to_dict()
        else:
            return x.set_index(columns)[values].to_dict(orient='index')
    else:
        res = x.groupby([columns[0]] + d_columns).apply(lambda y: rollup_to_dict_core(y, values, columns[1:]))
        if len(d_columns) == 0:
            return res.to_dict()
        else:
            res.name = columns[1]
            res = res.reset_index(level=range(1, len(d_columns) + 1))
            return res.to_dict(orient='index')

def rollup_to_dict(x, values, d_columns=None):
    if d_columns is None:
        d_columns = []

    columns = [c for c in x.columns if c not in values and c not in d_columns]
    return rollup_to_dict_core(x, values, columns, d_columns)

>>> from pprint import pprint
>>> pprint(rollup_to_dict(df, ['2013 Total']))
{'MS_AVERY': {'UGRAD': {'GENERAL STUDIES': {'GEST': 5079,
                                            'HIST': 5,
                                            'MELC': 2,
                                            'PHIL': 10,
                                            'PHYS': 1,
                                            'POLS': 53}}}}
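
For the common case of a single value column, the same recursive idea can be sketched more compactly. This is not the answer's code: `nest_frame` is a hypothetical helper name, and the `GRAD` / `LAW` / `LAWS` row is made-up data added only to show a second branch.

```python
import pandas as pd


def nest_frame(frame, key_columns, value_column):
    """Hypothetical helper: nest value_column under each key column in turn."""
    first, rest = key_columns[0], key_columns[1:]
    if not rest:
        # innermost level: map the last key column straight to the values
        return dict(zip(frame[first].tolist(), frame[value_column].tolist()))
    # otherwise split on the first key column and recurse into each group
    return {
        key: nest_frame(group, rest, value_column)
        for key, group in frame.groupby(first, sort=False)
    }


df = pd.DataFrame(
    {
        "Library": ["MS_AVERY", "MS_AVERY", "MS_AVERY"],
        "Level": ["UGRAD", "UGRAD", "GRAD"],
        "School": ["GENERAL STUDIES", "GENERAL STUDIES", "LAW"],
        "Major": ["GEST", "HIST", "LAWS"],
        "2013 Total": [5079, 5, 7],
    }
)

nested = nest_frame(df, ["Library", "Level", "School", "Major"], "2013 Total")
print(nested)
```

Calling `.tolist()` on each column returns plain Python values rather than NumPy scalars, so the result can be passed to `json.dumps` directly.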

【Discussion】:

    【Solution 3】:
    import json

    key = ['Library', 'Level', 'School']
    # one list of {'Major': ..., '2013 Total': ...} records per (Library, Level, School) group
    series = (df.groupby(key, sort=False)[['Major', '2013 Total']]
                .apply(lambda x: x.to_dict('records'))
             )
    
    # build: {Major: Total} (only the first group is handled here)
    major = {}
    for record in series.values[0]:
        major[record['Major']] = record['2013 Total']
    
    # build the recursive dictionary by wrapping from the innermost key outwards
    index = series.index[0]
    d = major
    for k in reversed(index):
        d = {k: d}
    print(json.dumps(d, indent=2))
    

    It produces:

    {
      "MS_AVERY": {
        "UGRAD": {
          "GENERAL STUDIES": {
            "GEST": 5079,
            "HIST": 5,
            "MELC": 2,
            "PHIL": 10,
            "PHYS": 1,
            "POLS": 53
          }
        }
      }
    }
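
The code above reads only the first group (`series.values[0]` and `series.index[0]`). When the table contains several Library / Level / School combinations, one possible extension is to walk every group and merge it into a shared dictionary. A minimal sketch, assuming the same column names; the `MS_BUTLER` row is made-up data used only to show a second group.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "Library": ["MS_AVERY", "MS_AVERY", "MS_AVERY", "MS_BUTLER"],
        "Level": ["UGRAD", "UGRAD", "UGRAD", "GRAD"],
        "School": ["GENERAL STUDIES", "GENERAL STUDIES", "GENERAL STUDIES", "LAW"],
        "Major": ["GEST", "HIST", "MELC", "LAWS"],
        "2013 Total": [5079, 5, 2, 7],
    }
)

key = ["Library", "Level", "School"]
nested = {}
for group_key, group in df.groupby(key, sort=False):
    # group_key is a tuple such as ("MS_AVERY", "UGRAD", "GENERAL STUDIES")
    node = nested
    for part in group_key:
        # create the level if it is missing, then descend into it
        node = node.setdefault(part, {})
    # innermost level: {Major: Total}, using plain Python values
    node.update(zip(group["Major"].tolist(), group["2013 Total"].tolist()))

print(json.dumps(nested, indent=2))
```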
    

    【Discussion】:

      【Solution 4】:

      Here is a general way to produce the format below, which may be what others are looking for. Desired format:

      { "data":
          [
              {
                  "NAME": [1, 2, 3]
              },
              {
                  "NAME": [1, 2, 3]
              }
          ]
      }
      

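      To do this:

      import json

      # build the JSON text one column at a time
      jsonstr = '{"data":['
      for (columnName, columnData) in df.items():  # iteritems() was removed in pandas 2.0
          jsonstr+='{"'
          jsonstr+=columnName
          jsonstr+='":'
          # tolist() converts NumPy scalars to plain Python values that json can serialize
          jsonstr+=json.dumps(columnData.tolist())
          jsonstr+='},'
      jsonstr = jsonstr.rstrip(',')  # drop the trailing comma
      jsonstr+=']}'
      jsonobject = json.loads(jsonstr)
      jsonobject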
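
The same structure can also be assembled as plain Python objects and serialized once, which avoids building the JSON string by hand and escapes column names automatically. A minimal sketch, assuming a small illustrative DataFrame (the column names `A` and `B` are made up for the example):

```python
import json

import pandas as pd

# Illustrative data; any DataFrame with JSON-serializable values should work.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})

# One {column name: list of values} object per column, wrapped under "data".
payload = {"data": [{str(name): column.tolist()} for name, column in df.items()]}

json_text = json.dumps(payload)
print(json_text)
```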
      

      【Discussion】:
