【Question Title】: How to convert pandas DataFrame to nested dictionary?
【Posted】: 2023-01-17 23:14:20
【Question】:

I have a pandas DataFrame like this:

id unit step phase start_or_end_of_phase op_name occurence
1 A 50l LOAD start P12load5 2
2 A 50l LOAD end P12load5 2
3 A 50l STIR start P12s5 4
4 A 50l STIR end P13s5 3
5 A 50l COLLECT start F7_col1 1
6 A 50l COLLECT end H325_col1 1
7 A 1000l SET_TEMP start xyz 2
8 A 1000l SET_TEMP end qwe 3
9 A 1000l SET_TEMP2 start asf 4
10 A 1000l SET_TEMP2 end fdsa 5
11 A 1000l FILTER start 4fags 1
11 A 1000l FILTER end mllsgrs_1 1
12 B MACHINE1 ... ... ... ...

...and would like to create a nested dictionary like this:

A = {50l : {
       'LOAD' :
                {'start':{'op_name' : 'p12load5',
                           'occurrence': 2},
                 'end':{'op_name': 'P12load5',
                        'occurrence': 2}},
        'STIR': 
                {'start':{'op_name' : 'P12s5',
                           'occurrence': 4},
                 'end':{'op_name': 'P13s5',
                        'occurrence': 3}},
        'COLLECT': 
                {'start':{'op_name' : 'F7_col1',
                           'occurrence': 1},
                 'end':{'op_name': 'H325_col1',
                        'occurrence': 1}}
          }, 
    1000l : {
       'SET_TEMP' : ....

I have been trying to combine groupby() with to_dict() but can't get my head around it. My last attempt was this (based on "How to convert pandas dataframe to nested dictionary"):

populated_dict = process_steps_table.groupby(['unit', 'step', 'phase', 'start_or_end_phase']).apply(lambda x: x.set_index('start_or_end_phase').to_dict(orient='index')).to_dict()

and got this error: DataFrame index must be unique for orient='index'.

I'm not sure whether I have to apply the set_index() lambda function to the groups, and if so, why.
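For context, to_dict(orient='index') turns every index label into a dictionary key, so it refuses an index with repeated labels. The sketch below shows one way this error can arise; the two-row frame is made up for illustration and is not the table from the question.

```python
import pandas as pd

# Hypothetical frame in which the same phase marker appears twice.
df = pd.DataFrame({
    "start_or_end_of_phase": ["start", "start"],
    "op_name": ["P12load5", "P12s5"],
})

try:
    # Both rows get the index label "start", so the labels cannot be dict keys.
    df.set_index("start_or_end_of_phase").to_dict(orient="index")
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If a group produced by groupby() still holds more than one row with the same label, the same check fails inside the lambda.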

【Comments】:

  • The linked question deals with a 2-level index/key combination. If you want another level, you need to add another groupby.

Tags: python pandas dictionary
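The "one groupby per level" idea from this comment can be sketched as a small recursive helper. The function name nest and the sample rows are assumptions for illustration (the rows are a subset of the table in the question), and the leaf step assumes each key combination identifies exactly one row.

```python
import pandas as pd

def nest(frame, keys, value_cols):
    """Group `frame` by each key in turn and return nested dicts."""
    if not keys:
        # Leaf level: assumes one row is left for this key combination.
        return frame[value_cols].iloc[0].to_dict()
    first, rest = keys[0], keys[1:]
    return {
        key: nest(group, rest, value_cols)
        for key, group in frame.groupby(first, sort=False)
    }

df = pd.DataFrame(
    [
        ("A", "50l", "LOAD", "start", "P12load5", 2),
        ("A", "50l", "LOAD", "end", "P12load5", 2),
        ("A", "50l", "STIR", "start", "P12s5", 4),
        ("A", "50l", "STIR", "end", "P13s5", 3),
    ],
    columns=["unit", "step", "phase", "start_or_end_of_phase", "op_name", "occurence"],
)

result = nest(
    df,
    ["unit", "step", "phase", "start_or_end_of_phase"],
    ["op_name", "occurence"],
)
print(result["A"]["50l"]["STIR"]["end"])
```

Each recursion level adds one groupby, so the depth of the dictionary follows the length of the key list.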


【Solution 1】:

You have to reshape the dataframe before exporting it as a dictionary:

nested_cols = ['step', 'phase', 'start_or_end_of_phase']
value_cols = ['op_name', 'occurence']

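# Reshape the dataframe: index by the nesting columns, then stack the value
# columns so each entry is keyed by (step, phase, start/end, value column)
df1 = df.set_index(nested_cols)[value_cols].stack()

# Build the nested dict by walking each key tuple
d = {}
for t, v in df1.items():
    e = d.setdefault(t[0], {})   # top level: step
    for k in t[1:-1]:            # intermediate levels: phase, start/end
        e = e.setdefault(k, {})
    e[t[-1]] = v                 # leaf: value column name -> value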

Output

import json  # only used for a more readable representation
print(json.dumps(d, indent=4))

# Output
{
    "50l": {
        "LOAD": {
            "start": {
                "op_name": "P12load5",
                "occurence": 2
            },
            "end": {
                "op_name": "P12load5",
                "occurence": 2
            }
        },
        "STIR": {
            "start": {
                "op_name": "P12s5",
                "occurence": 4
            },
            "end": {
                "op_name": "P13s5",
                "occurence": 3
            }
        },
        "COLLECT": {
            "start": {
                "op_name": "F7_col1",
                "occurence": 1
            },
            "end": {
                "op_name": "H325_col1",
                "occurence": 1
            }
        }
    },
    "1000l": {
        "SET_TEMP": {
            "start": {
                "op_name": "xyz",
                "occurence": 2
            },
            "end": {
                "op_name": "qwe",
                "occurence": 3
            }
        },
        "SET_TEMP2": {
            "start": {
                "op_name": "asf",
                "occurence": 4
            },
            "end": {
                "op_name": "fdsa",
                "occurence": 5
            }
        },
        "FILTER": {
            "start": {
                "op_name": "4fags",
                "occurence": 1
            },
            "end": {
                "op_name": "mllsgrs_1",
                "occurence": 1
            }
        }
    }
}
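The output above starts at the step level, while the question asks for the unit ("A") as the outermost key. A possible variant, sketched under the assumption that each (unit, step, phase, start/end) combination is unique, adds 'unit' to the nesting columns and uses to_dict(orient='index') on the MultiIndex instead of stack(). The sample rows are a subset of the table in the question.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("A", "50l", "LOAD", "start", "P12load5", 2),
        ("A", "50l", "LOAD", "end", "P12load5", 2),
        ("A", "50l", "STIR", "start", "P12s5", 4),
        ("A", "50l", "STIR", "end", "P13s5", 3),
        ("A", "1000l", "SET_TEMP", "start", "xyz", 2),
        ("A", "1000l", "SET_TEMP", "end", "qwe", 3),
    ],
    columns=["unit", "step", "phase", "start_or_end_of_phase", "op_name", "occurence"],
)

nested_cols = ["unit", "step", "phase", "start_or_end_of_phase"]
value_cols = ["op_name", "occurence"]

# Flat dict keyed by (unit, step, phase, start/end) tuples.
# Raises ValueError if any key tuple occurs more than once.
flat = df.set_index(nested_cols)[value_cols].to_dict(orient="index")

# Walk each key tuple and create the intermediate dicts on demand.
d = {}
for path, values in flat.items():
    node = d
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = values

print(d["A"]["50l"]["LOAD"])
```

Here the leaf dictionaries come straight from to_dict(), so the loop only has to build the outer levels.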

【Comments】:

  • Note: I'm pretty sure I have already answered a similar question, but I can't find it, even though I kept this code :-(