Pandas: create a dictionary with a list of columns as values

【Question title】: Pandas: create a dictionary with a list of columns as values
【Posted】: 2017-07-05 15:46:05
【Question description】:

Given this DataFrame:

import pandas as pd
first=[0,1,2,3,4]
second=[10.2,5.7,7.4,17.1,86.11]
third=['a','b','c','d','e']
fourth=['z','zz','zzz','zzzz','zzzzz']
df=pd.DataFrame({'first':first,'second':second,'third':third,'fourth':fourth})
df=df[['first','second','third','fourth']]

   first  second third fourth
0      0   10.20     a      z
1      1    5.70     b     zz
2      2    7.40     c    zzz
3      3   17.10     d   zzzz
4      4   86.11     e  zzzzz

I can create a dictionary from df with

a=df.set_index('first')['second'].to_dict()

This way I can decide which column supplies the keys and which supplies the values. But what if you want the values to be a list of columns, for example second AND third?

If I try this

b=df.set_index('first')[['second','third']].to_dict()

I get a strange dictionary

{'second': {0: 10.199999999999999,
  1: 5.7000000000000002,
  2: 7.4000000000000004,
  3: 17.100000000000001,
  4: 86.109999999999999},
 'third': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'}}

Instead, I want a dictionary of lists

{0: [10.199999999999999, 'a'],
 1: [5.7000000000000002, 'b'],
 2: [7.4000000000000004, 'c'],
 3: [17.100000000000001, 'd'],
 4: [86.109999999999999, 'e']}

How can I do this?

【Question discussion】:

Tags: python list pandas dictionary

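【Solution 1】:

Others may come up with a pure-pandas solution, but in a pinch I think this should work for you. You basically build the dictionary on the fly, indexing into the values of each row.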

d = {df.loc[idx, 'first']: [df.loc[idx, 'second'], df.loc[idx, 'third']] for idx in range(df.shape[0])}

d
Out[5]: 
{0: [10.199999999999999, 'a'],
 1: [5.7000000000000002, 'b'],
 2: [7.4000000000000004, 'c'],
 3: [17.100000000000001, 'd'],
 4: [86.109999999999999, 'e']}
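
The comprehension above looks rows up with `df.loc[idx, ...]` for `idx in range(df.shape[0])`, which only works while the frame keeps its default 0..n-1 index. Below is a minimal sketch of the same idea that does not depend on the index, using `itertuples`. The sample frame is rebuilt so the snippet runs on its own, and the letter index labels are made up purely to illustrate the point.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'first': [0, 1, 2, 3, 4],
        'second': [10.2, 5.7, 7.4, 17.1, 86.11],
        'third': ['a', 'b', 'c', 'd', 'e'],
        'fourth': ['z', 'zz', 'zzz', 'zzzz', 'zzzzz'],
    },
    index=list('vwxyz'),  # hypothetical non-default index labels
)

# itertuples yields one namedtuple per row, so no label lookup is needed
d = {row.first: [row.second, row.third] for row in df.itertuples(index=False)}
print(d)
```

Attribute access such as `row.first` relies on the column names being valid Python identifiers; with other names, positional access on the tuple would be needed instead.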

Edit: you can also do this:

df['new'] = list(zip(df['second'], df['third']))

df
Out[25]: 
   first  second third fourth         new
0      0   10.20     a      z   (10.2, a)
1      1    5.70     b     zz    (5.7, b)
2      2    7.40     c    zzz    (7.4, c)
3      3   17.10     d   zzzz   (17.1, d)
4      4   86.11     e  zzzzz  (86.11, e)

df = df[['first', 'new']]

df
Out[27]: 
   first         new
0      0   (10.2, a)
1      1    (5.7, b)
2      2    (7.4, c)
3      3   (17.1, d)
4      4  (86.11, e)

df.set_index('first').to_dict()
Out[28]: 
{'new': {0: (10.199999999999999, 'a'),
  1: (5.7000000000000002, 'b'),
  2: (7.4000000000000004, 'c'),
  3: (17.100000000000001, 'd'),
  4: (86.109999999999999, 'e')}}

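In this approach, you first create the lists (or tuples) you want to keep, then "drop" the other columns. It is basically your original approach, modified.

If you really want lists instead of tuples, just map list over the values going into the 'new' column: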

df['new'] = list(map(list, zip(df['second'], df['third'])))
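
Note that the `to_dict()` call shown earlier in this answer returns the result wrapped in an outer `{'new': ...}` layer. A minimal end-to-end sketch of the list variant that also removes that layer by selecting the single `'new'` column before converting (the frame is rebuilt here so the snippet is self-contained; `result` is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'first': [0, 1, 2, 3, 4],
    'second': [10.2, 5.7, 7.4, 17.1, 86.11],
    'third': ['a', 'b', 'c', 'd', 'e'],
    'fourth': ['z', 'zz', 'zzz', 'zzzz', 'zzzzz'],
})

# Pair up the two value columns row by row, as lists rather than tuples
df['new'] = list(map(list, zip(df['second'], df['third'])))

# Selecting one column gives a Series; its to_dict() maps index -> value
# directly, without the outer {'new': ...} layer
result = df.set_index('first')['new'].to_dict()
print(result)
```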

【Discussion】:

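My real first is a column of numbers encoded as strings (alphanumeric values, to be honest). So when they are moved into the dictionary they look like u'112233'. How do I get rid of the u (unicode)?
That 'u' doesn't really affect the "integrity" of those strings, but if you want it gone, I would try map(str, df['first']). Or even df['first'] = [str(x) for x in df['first']]
This should probably be a separate question, but what if you want tuples (first, second) as the dictionary keys?
Similar to what I did above. Create a 'key' column in df whose values are (first, second) tuple pairs. df = df[['key', 'third']]; df.set_index('key').to_dict() should get you what you want. Just try a few different things on your sample dataset before trying them on the larger one. You are capable of that much :)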
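
For the tuple-key case raised in the last two comments, here is a minimal sketch of one alternative. It does not build the explicit 'key' column described above; instead it assumes that setting a two-column index and converting the resulting Series is acceptable, since iterating a two-level index yields tuples. The name `tuple_keyed` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [0, 1, 2, 3, 4],
    'second': [10.2, 5.7, 7.4, 17.1, 86.11],
    'third': ['a', 'b', 'c', 'd', 'e'],
    'fourth': ['z', 'zz', 'zzz', 'zzzz', 'zzzzz'],
})

# A Series with a two-level index converts to a dict keyed by tuples
tuple_keyed = df.set_index(['first', 'second'])['third'].to_dict()
print(tuple_keyed)
```

Float values make fragile dictionary keys in general, so this is mainly useful when the key columns hold integers or strings.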

【Solution 2】:

You can zip the values:

In [118]:
b=df.set_index('first')[['second','third']].values.tolist()
dict(zip(df['first'],b))

Out[118]:
{0: [10.2, 'a'], 1: [5.7, 'b'], 2: [7.4, 'c'], 3: [17.1, 'd'], 4: [86.11, 'e']}
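
A pandas-only variant of the same result, offered as a sketch rather than as part of this answer: transpose the indexed frame so that each value of first becomes a column, then ask `to_dict` for one list per column via `orient='list'`. It assumes the values in first are unique, since duplicate column labels cannot all survive as dictionary keys.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [0, 1, 2, 3, 4],
    'second': [10.2, 5.7, 7.4, 17.1, 86.11],
    'third': ['a', 'b', 'c', 'd', 'e'],
    'fourth': ['z', 'zz', 'zzz', 'zzzz', 'zzzzz'],
})

# After .T the columns are the values of 'first'; orient='list'
# then produces {column label: [values in that column]}
by_first = df.set_index('first')[['second', 'third']].T.to_dict(orient='list')
print(by_first)
```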

【Discussion】:

【Solution 3】:

You can get a numpy array via values, zip it with the first column, and convert the result to a dict:

a = dict(zip(df['first'], df[['second','third']].values.tolist()))
print (a)
{0: [10.2, 'a'], 1: [5.7, 'b'], 2: [7.4, 'c'], 3: [17.1, 'd'], 4: [86.11, 'e']}

【Discussion】: