Exporting a nested dictionary with multiple values and variable keys to Excel

【Question title】: Exporting a nested dictionary with multiple values and variable keys to Excel
【Posted】: 2019-12-25 23:07:33
【Question description】:

This is a second attempt at the question asked here. What I need is to export the following dictionary to Excel.

{1: {'Field Cluster': ['This', 'This', 'This'], 
     'Exploration Block': ['Is', 'Is', 'Is'], 
     'Producing since': [1923.0, 1923.0, 1923.0], 
     'Fluids': ['A ', 'A ', 'A '], 
     'Reservoirs': ['Test', 'Test', 'Test'], 
     'Area (km2)': ['File', 'File', 'File'], 
     'Depth (m)': ['A\nHuge\nDepth', 'A\nHuge\nDepth', 'A\nHuge\nDepth'],   
     'Concession License No.': ['UNIX license', 'UNIX license', 'UNIX license'], 
     'License Expiry Date / Extension': ['Everlasting', 'Everlasting', 'Everlasting'], 
     'Working Interest with SB': ['There is one\n', 'There is one\n', 'There is one\n'], 
     'Government approval:': ['It is!', 'It is!', 'It is!'], 
     'Last study:': ['Million years ago', 'Million years ago', 'Million years ago'], 
     'Parameters': ['Horizon1', 'Horizon2', 'Horizon3'], 
     'Reservoir rock': ['First', 'Second', 'Third'], 
     'Net pay thickness (m)': [1.0, 21.0, 41.0], 
     'Avr. porosity (%)': [2.0, 22.0, 42.0], 
     'Average absolute permeability  (mD)': [3.0, 23.0, 43.0], 
     'Swi (%)': [4.0, 24.0, 44.0], 
     'Initial pressure (at)': [5.0, 25.0, 45.0], 
     'Bubble Pressure (at.)': [6.0, 26.0, 46.0], 
     'Dew Point Pressure (at)': [7.0, 27.0, 47.0], 
     'Initial Solution Ratio (Stm3/m3)': [8.0, 28.0, 48.0], 
     'Initial Condensate Gas Ratio (g/Stm3)': [9.0, 29.0, 49.0], 
     'Oil density (kg/cm)': [10.0, 30.0, 50.0], 
     'Oil viscosity (Pb) (cP)': [11.0, 31.0, 51.0], 
     'Contaminants (H2S, CO2)': [12.0, 32.0, 52.0], 
     'Initial Oil in Place (e3 to)': [13.0, 33.0, 53.0], 
     'Initial NGL in Place (e3 to)': [14.0, 34.0, 54.0]}, 
 2: {'Field Cluster': ['This fff', 'This fff', 'This fff', 'This fff'],                 
     'Exploration Block': ['fff', 'fff', 'fff', 'fff'], 
     'Producing since': ['1923fff', '1923fff', '1923fff', '1923fff'],     
     'Fluids': ['A fff', 'A fff', 'A fff', 'A fff'],
     'Reservoirs': ['Test', 'Test', 'Test', 'Test'], 
     'Area (km2)': ['File', 'File', 'File', 'File'], 
     'Depth (m)': ['A\nHuge\nDepthfff', 'A\nHuge\nDepthfff', 'A\nHuge\nDepthfff', 'A\nHuge\nDepthfff'], 
     'Concession License No.': ['UNIX license', 'UNIX license', 'UNIX license', 'UNIX license'], 
     'License Expiry Date / Extension': ['Everlastingfff', 'Everlastingfff', 'Everlastingfff', 'Everlastingfff'], 
     'Working Interest': ['There is one\n', 'There is one\n', 'There is one\n', 'There is one\n'], 
     'Gouvernment approval:': ['ffff', 'ffff', 'ffff', 'ffff'], 
     'Last study:': ['Million years fffff', 'Million years fffff', 'Million years fffff', 'Million years fffff'], 
     'Parameters': ['Horizon1', 'Horizon2', 'Horizon3', 'Horizon4'],     
     'Reservoir rock': ['First', 'Second', 'Third', 'Fourth'], 
     'Net pay thickness (m)': [1.0, 21.0, 41.0, 61.0], 
     'Avr. porosity (%)': [2.0, 22.0, 42.0, 62.0], 
     'Average absolute permeability  (mD)': [3.0, 23.0, 43.0, 63.0], 
     'Swi (%)': [4.0, 24.0, 44.0, 64.0], 
     'Initial Oil in Place (e3 to)': [13.0, 33.0, 53.0, 73.0], 
     'Initial NGL in Place (e3 to)': [14.0, 34.0, 54.0, 74.0], 
     'Initial Gas (assoc.) in Place (e6 m3) sol.gas/gas cap': [15.0, 35.0, 55.0, 75.0], 
     'Initial Gas (non assoc.) in Place (e6 m3)': [16.0, 36.0, 56.0, 76.0],    
     'Primary recovery / drive mechanism\nNone': ['Wow\nA', 'Recovery\nNone', 'Mechanism\nNone', 'Nice\nNone', ''], 
     'Secondary recovery': ['Another one', '', '', '', ''], 
     'Total Wells': ['1000', '-', '-', '-', ''], 
     'Productive wells (oil/gas)': ['500', '-', '-', '-', ''], 
     'Injection wells (water/gas)': ['500', '-', '-', '-', ''], 
     'Rate of best producer in the field (tons / e3 Sm3/day)': ['30', '-', '-', '-', ''], 
     'WOW Production (Something)': ['1', 2.0, '3', '4', '']}}

The previous post received two answers. The first:

df=pd.DataFrame(d) # assuming d is the name of the dict
cols=df.columns
final=pd.concat([pd.DataFrame(df[i].dropna().tolist()) for i in cols],axis=1,keys=cols)
final.index=df.index
print(final)

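This one only works for the first nested dictionary. The key problem is that some keys are missing from the second sub-dictionary, and the values are laid out in the key order of the first dictionary. As a result, values end up next to the wrong parameters.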

The other answer is very similar. It works for the test dictionary, but not for the dictionary above:

df=pd.DataFrame(d) # assuming d is the name of the dict
cols=df.columns
final=pd.concat([pd.DataFrame(v).T for k,v in d.items()],axis=1,sort=False,keys=d.keys())
final.index=df.index
print(final)

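For the real dictionary, this code returns only two rows, with the parameters packed into tuples. It also only takes the second sub-dictionary into account.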

In short, here is what I want. Suppose we have this small dictionary, which closely resembles the real one:

{1: 
    {'Parameter 1': ['Value 1', 'Value 2', 'Value 3'], 
     'Parameter 2': ['Value 11', 'Value 22', 'Value 33'], 
     'Parameter 3': ['Num1', 'Num2', 'Num3']},
 2:
    {'Parameter 1': ['Data 1', 'Data 2', 'Data 3'], 
     'Parameter 2': ['Data 11', 'Data 22', 'Data 33'], 
     'Parameter 4': ['Numb11', 'Numb22', 'Numb33']}
}

I want to get a table like this from it:

            |               1                |               2                |
-------------------------------------------------------------------------------
Parameter 1 | Value 1  | Value 2  | Value 3  | Data 1   | Data 2   | Data 3   |
-------------------------------------------------------------------------------
Parameter 2 | Value 11 | Value 22 | Value 33 | Data 11  | Data 22  | Data 33  |
-------------------------------------------------------------------------------
Parameter 3 | Num1     | Num2     | Num3     |          |          |          |
-------------------------------------------------------------------------------
Parameter 4 |          |          |          | Numb11   | Numb22   | Numb33   |
-------------------------------------------------------------------------------

So every value lines up with its own parameter, and all parameters are listed in the first column without repetition.

【Question discussion】:

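Could you provide a small dictionary, since the last one fails to give you what you want? The code you provided works on the small dataframe.
@ndclt I just edited my question. As you can see, the two sub-dictionaries have similar fields, but only the first has the 'Working Interest with SB' parameter, and only the second has 'WOW Production (Something)'.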

Tags: python-3.x pandas dataframe dictionary xlrd

【Solution 1】:

The following does the same thing as the code you provided (but in two fewer lines):

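import pandas as pd

# d is the nested dictionary; each sub-dictionary becomes one frame with parameters as rows
df_to_concat = {k: pd.DataFrame(v).transpose() for (k, v) in d.items()}
df = pd.concat(df_to_concat.values(), keys=df_to_concat.keys(), axis='columns')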
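As a rough check, the two lines above can be run against the small dictionary from the question. This is only a sketch: the dictionary is copied from the question, and the variable name `frames` is chosen here for illustration.

```python
import pandas as pd

# Small dictionary from the question: 'Parameter 3' exists only under key 1,
# 'Parameter 4' only under key 2.
d = {
    1: {'Parameter 1': ['Value 1', 'Value 2', 'Value 3'],
        'Parameter 2': ['Value 11', 'Value 22', 'Value 33'],
        'Parameter 3': ['Num1', 'Num2', 'Num3']},
    2: {'Parameter 1': ['Data 1', 'Data 2', 'Data 3'],
        'Parameter 2': ['Data 11', 'Data 22', 'Data 33'],
        'Parameter 4': ['Numb11', 'Numb22', 'Numb33']},
}

# One frame per top-level key, transposed so that parameters become the row index.
frames = {k: pd.DataFrame(v).transpose() for k, v in d.items()}

# Concatenating along the columns aligns rows by parameter name;
# a parameter missing from one sub-dictionary is filled with NaN there.
df = pd.concat(frames.values(), keys=frames.keys(), axis='columns')
print(df)
```

Because the rows are aligned on the parameter names rather than on position, each parameter appears once in the index, and the columns form a two-level header (top-level key, then position in the list).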

But your large dictionary contains lists of unequal length, which raises the following error:

ValueError: arrays must all be same length
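The error can be reproduced on a tiny example. The sub-dictionary below is hypothetical and only mimics the shape of the problem; the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical miniature of the problem: the second list carries
# one extra trailing empty string.
sub = {'Parameter 1': ['a', 'b'], 'Parameter 2': ['c', 'd', '']}

try:
    pd.DataFrame(sub)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Listing the length of each list shows which keys are out of step.
lengths = {key: len(values) for key, values in sub.items()}
print(lengths)
```

Running the same length check on each sub-dictionary of the real data should point to the keys whose lists are longer than the others.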

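The last keys each have a trailing empty value. When I removed it by hand, the code worked. If you want to do this programmatically, you can run something like the following before creating the dataframes. It truncates every list to the length of the shortest list in its sub-dictionary, which drops the last value of the lists that contain too many items: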

# Length of the shortest list within each sub-dictionary
min_length = {k: min(len(one_list) for one_list in v.values()) for (k, v) in d.items()}
# Truncate every list to that length so pd.DataFrame(v) receives equal-length lists
new_d = {
    k: {k2: one_list[:min_length[k]] for (k2, one_list) in v.items()}
    for (k, v) in d.items()
}
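Putting the trimming step and the concatenation together, the whole export might look like the sketch below. The function names `trim_to_shortest`, `build_table` and `export_table` are made up for this illustration, the sample dictionary is hypothetical, and `to_excel` assumes that an Excel writer engine such as openpyxl is installed.

```python
import pandas as pd


def trim_to_shortest(d):
    """Truncate every list to the shortest list length in its sub-dictionary."""
    trimmed = {}
    for key, sub in d.items():
        shortest = min(len(values) for values in sub.values())
        trimmed[key] = {name: values[:shortest] for name, values in sub.items()}
    return trimmed


def build_table(d):
    """Return one row per parameter and one column group per top-level key."""
    frames = {key: pd.DataFrame(sub).transpose()
              for key, sub in trim_to_shortest(d).items()}
    return pd.concat(frames.values(), keys=frames.keys(), axis='columns')


def export_table(d, path):
    """Write the table to an Excel file (needs an engine such as openpyxl)."""
    build_table(d).to_excel(path)


# Hypothetical sample: under key 1, 'Parameter 3' has one trailing empty value.
d = {
    1: {'Parameter 1': ['Value 1', 'Value 2'],
        'Parameter 3': ['Num1', 'Num2', '']},
    2: {'Parameter 1': ['Data 1', 'Data 2', 'Data 3'],
        'Parameter 4': ['Numb11', 'Numb22', 'Numb33']},
}
table = build_table(d)
print(table)
```

Calling `export_table(d, 'output.xlsx')` would then write the file, where `'output.xlsx'` is a placeholder path. Note that trimming discards whatever sits past the shortest length; in the dictionary from the question those extra items are empty strings, but other data should be checked before it is trimmed this way.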

【Discussion】: