Converting a pandas DataFrame to a dictionary: answers

【Question title】: Pivoted pandas dataframe to dictionary
【Posted】: 2015-10-20 00:48:40
【Question description】:

I have a pandas.DataFrame like this:

df
#     col3 2000 5000 7500 10000 12000 15000 20000 30000
#col1 col2                              
#  22   0   NaN  NaN  NaN   NaN   NaN   NaN     1   NaN
#       1   NaN  NaN  NaN   NaN   NaN   NaN     1   NaN
#  24   0     1  NaN  NaN   NaN   NaN     1   NaN   NaN
#       1     1  NaN  NaN   NaN   NaN   NaN     1   NaN
#  26   0   NaN  NaN  NaN   NaN   NaN     1   NaN   NaN
#       1   NaN  NaN  NaN   NaN   NaN     1   NaN   NaN
#  29   0     1  NaN  NaN   NaN   NaN   NaN   NaN   NaN
#  31   1   NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN

I first need to map each record as follows (pseudocode): if df.ix[row, col] == 1: df.ix[row, col] = col.

Then I want to store the mapped records in a list, ignoring NaN values, for example like

[ ('col2_0' , 20000), ('col2_1' , 20000),
  ('col2_0' , 2000), ('col2_1', 2000),
  ('col2_0' , 15000), ('col2_1' , 20000),
  ('col2_0' , 15000), ('col2_1' , 15000),
  ('col2_0' , 2000), ('col2_1' , 2000),

Any help is much appreciated.

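【Question discussion】:

Welcome to Stack Overflow. Please take the tour and visit the help center, because your question lacks some of the quality attributes we expect from posts. The links contain guidance that will help you improve your question by editing it.
I edited the post to make it clearer. You may want to accept the edit so that you can get some help. Most importantly, you should know that a dict cannot look like what is described above (with duplicate keys).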

Tags: python python-2.7 numpy pandas dataframe

【Solution 1】:

This should get you on your way. Suppose you have a dataframe

d
#           2000  3000
#col1 col2            
#0    0        1     1
#1    0        1     1
#     1        1   NaN
#2    0        1     1
#     1        1   NaN
#3    0      NaN     1
#     1        1   NaN
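
The answer does not show how d was built. A minimal sketch that reproduces a frame of this shape follows; it assumes the column labels are the strings '2000' and '3000' (consistent with the mycols list used later) and that the index levels are named col1 and col2.

```python
import numpy as np
import pandas as pd

# Assumption: column labels are strings, matching mycols = ['2000', '3000'] later on
index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)],
    names=['col1', 'col2'],
)
d = pd.DataFrame(
    {
        '2000': [1, 1, 1, 1, 1, np.nan, 1],
        '3000': [1, 1, np.nan, 1, np.nan, 1, np.nan],
    },
    index=index,
)
print(d)
```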

Next, reset the index

d_flat = d.reset_index()
#   col1  col2  2000  3000
#0     0     0     1     1
#1     1     0     1     1
#2     1     1     1   NaN
#3     2     0     1     1
#4     2     1     1   NaN
#5     3     0   NaN     1
#6     3     1     1   NaN
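
As a small illustration of what reset_index does here: each level of the MultiIndex becomes an ordinary column and the rows get a fresh integer index. The three-row frame d_small below is a hypothetical stand-in, not the data from the answer.

```python
import pandas as pd

# Hypothetical three-row frame with a two-level index
index = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 0), (1, 1)], names=['col1', 'col2']
)
d_small = pd.DataFrame({'2000': [1, 1, 1]}, index=index)

# Index levels move into regular columns; rows are renumbered from 0
flat = d_small.reset_index()
print(flat.columns.tolist())
print(flat.index.tolist())
```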

Now you can map the col2 column:

d_flat.col2 = d_flat.col2.map(lambda x: 'col2_%d'%x)

#d_flat.col2
#0    col2_0
#1    col2_0
#2    col2_1
#3    col2_0
#4    col2_1
#5    col2_0
#6    col2_1
#Name: col2, dtype: object
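
The same labels can also be produced without a lambda by converting the column to strings and prepending the prefix. This is only an alternative sketch; the Series col2 below is a hypothetical stand-in for d_flat.col2 before it is mapped.

```python
import pandas as pd

# Hypothetical stand-in for d_flat.col2 before mapping
col2 = pd.Series([0, 0, 1, 0, 1, 0, 1], name='col2')

# Convert to strings, then prepend the prefix element-wise
labels = 'col2_' + col2.astype(str)
print(labels.tolist())
```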

Next, you want to create a list of dictionaries, one per row. Do the following

mycols = ['2000', '3000']
d_dict = d_flat[mycols].to_dict(orient='records')
#[{'2000': 1.0, '3000': 1.0},
# {'2000': 1.0, '3000': 1.0},
# {'2000': 1.0, '3000': nan},
# {'2000': 1.0, '3000': 1.0},
# {'2000': 1.0, '3000': nan},
# {'2000': nan, '3000': 1.0},
# {'2000': 1.0, '3000': nan}]

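The orient='records' option stores each row as its own dictionary, so the same key can appear more than once across rows (which is why you get a list of dictionaries rather than a single dictionary).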
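
To make the duplicate-key point concrete, here is a short sketch. The frame small is a hypothetical two-row example with the same string column labels; it is not the answer's data.

```python
import numpy as np
import pandas as pd

# A plain dict keeps only the last value written for a repeated key
collapsed = dict([('col2_0', '2000'), ('col2_0', '3000')])
print(collapsed)

# Hypothetical two-row frame; orient='records' yields one dict per row,
# so repeated column keys across rows are preserved
small = pd.DataFrame({'2000': [1.0, np.nan], '3000': [1.0, 1.0]})
records = small.to_dict(orient='records')
print(records)
```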

Now for the fun part. You want to carefully filter out the nan values, which you can do in a comprehension.

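from itertools import izip  # Python 2 only
import pandas

# Pair each row's col2 label with every column key whose value is not NaN
mylist = [(col, key)
          for col, records in izip(d_flat.col2, d_dict)
          for key, val in records.iteritems()
          if not pandas.np.isnan(val)]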
#[('col2_0', '2000'),
# ('col2_0', '3000'),
# ('col2_0', '2000'),
# ('col2_0', '3000'),
# ('col2_1', '2000'),
# ('col2_0', '2000'),
# ('col2_0', '3000'),
# ('col2_1', '2000'),
# ('col2_0', '3000'),
# ('col2_1', '2000')]
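
The comprehension above targets Python 2 (the question is tagged python-2.7): izip and iteritems do not exist in Python 3, and pandas.np is not available in recent pandas releases. A possible Python 3 port is sketched below, swapping in the built-in zip, dict.items, and math.isnan. The d_flat frame is rebuilt inline, on the assumption that the column labels are strings, so that the block runs on its own.

```python
import math
import pandas as pd

nan = float('nan')

# Rebuilt stand-in for d_flat after the col2 mapping step (string column labels assumed)
d_flat = pd.DataFrame({
    'col2': ['col2_0', 'col2_0', 'col2_1', 'col2_0', 'col2_1', 'col2_0', 'col2_1'],
    '2000': [1, 1, 1, 1, 1, nan, 1],
    '3000': [1, 1, nan, 1, nan, 1, nan],
})
d_dict = d_flat[['2000', '3000']].to_dict(orient='records')

# Pair each row's col2 label with every column key whose value is not NaN
mylist = [(col, key)
          for col, records in zip(d_flat['col2'], d_dict)
          for key, val in records.items()
          if not math.isnan(val)]
print(mylist)
```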

【Discussion】: