Unpacking Dictionaries within a Data Frame: Answers

【Question title】: Unpacking Dictionaries within a Data Frame
【Posted】: 2019-02-08 16:45:10
【Question】:

I have a Pandas data frame with a column that holds a series of dictionaries, like this:

df.head()

Index                 params                    score            
0   {'n_neighbors': 1, 'weights': 'uniform'}    0.550
1   {'n_neighbors': 1, 'weights': 'distance'}   0.550
2   {'n_neighbors': 2, 'weights': 'uniform'}    0.575
3   {'n_neighbors': 2, 'weights': 'distance'}   0.550
4   {'n_neighbors': 3, 'weights': 'uniform'}    0.575

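The goal is a data frame that has "n_neighbors" and "weights" as attributes (columns) for each instance, with the params column removed. I achieved this by creating empty numpy arrays, looping, and appending: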

n_neighbors = np.array([])
weights = np.array([])

count = sum(df["score"].value_counts())

for x in range(count):
     n_neighbors = np.append(n_neighbors, df["params"][x]["n_neighbors"])

for x in range(count):
     weights = np.append(weights, df["params"][x]["weights"])

df["n_neighbors"] = n_neighbors
df["weights"] = weights
df = df.drop(["params"], axis=1)

This feels dirty and inefficient. Is there a more elegant way to achieve this?

【Comments on the question】:

Tags: python pandas numpy dictionary dataframe

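【Solution 1】:

Construct a new data frame from df['params'] and join it to your original data frame. Conveniently, pd.DataFrame.pop returns the column as a Series and removes it from your data frame in the same step.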

import pandas as pd

df = pd.DataFrame({'Index': [0, 1],
                   'params': [{'n_neighbors': 1, 'weights': 'uniform'},
                              {'n_neighbors': 1, 'weights': 'distance'}],
                   'score': [0.550, 0.550]})

res = df.join(pd.DataFrame(df.pop('params').tolist()))

print(res)

   Index  score  n_neighbors   weights
0      0   0.55            1   uniform
1      1   0.55            1  distance
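
One point worth checking before reusing this: df.join matches rows by index label, and a frame built from a plain list gets a fresh 0..n-1 index. The sketch below is not part of the original answer; it assumes a data frame whose index is not the default (the labels 10 and 20 are made up for illustration) and passes the original index along so the rows still line up.

```python
import pandas as pd

# Same toy data as in the answer, but with a non-default index.
# The labels 10 and 20 are hypothetical, chosen only to show the alignment issue.
df = pd.DataFrame(
    {
        "params": [
            {"n_neighbors": 1, "weights": "uniform"},
            {"n_neighbors": 1, "weights": "distance"},
        ],
        "score": [0.550, 0.550],
    },
    index=[10, 20],
)

# pop() removes the column from df and returns it as a Series.
params = df.pop("params")

# Reuse the original index so that join() pairs each row with its own dictionary.
expanded = pd.DataFrame(params.tolist(), index=params.index)
res = df.join(expanded)

print(res)
```

Without index=params.index, the expanded frame would be labelled 0 and 1, and the join would find no matching rows for labels 10 and 20, leaving the new columns filled with NaN.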

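【Comments】:

Fantastic! There is clearly a gap in my knowledge here. What resources would you recommend to help me close it? As it stands, I don't think I would ever have arrived at your solution on my own...
The official Pandas cookbook is a good place to start. And SO, of course.

【Solution 2】:

Simple: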

datapoints = list(dataframe['params'])
data = pd.DataFrame(datapoints)
data['score'] = list(dataframe['score'])
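
The snippet above copies only the score column across. If the frame has further columns, one possible variation (a sketch, not the answerer's code) is to concatenate the expanded dictionaries with everything except params. The fit_time column here is a hypothetical extra column added purely for illustration.

```python
import pandas as pd

# Toy input; "fit_time" is a made-up extra column to show that it survives.
dataframe = pd.DataFrame(
    {
        "params": [
            {"n_neighbors": 1, "weights": "uniform"},
            {"n_neighbors": 2, "weights": "distance"},
        ],
        "score": [0.550, 0.575],
        "fit_time": [0.01, 0.02],
    }
)

# Expand the dictionaries into columns, keeping the original row labels.
datapoints = list(dataframe["params"])
data = pd.DataFrame(datapoints, index=dataframe.index)

# Put the new columns next to every remaining original column.
result = pd.concat([data, dataframe.drop(columns="params")], axis=1)

print(result)
```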

【Comments】:

Thanks! Very helpful!

【Solution 3】:

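In your case you don't need numpy; plain Python lists are a better fit. Keep in mind that a df can be built either from a dictionary of columns or from a list of dictionaries, where each row is one dictionary in the list. See the example in the docs: d = {'col1': [1, 2], 'col2': [3, 4]}. So follow that pattern and pass your data straight to the pd.DataFrame() constructor.

That, I would guess, is the right way to do it.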
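
To make the two input shapes concrete, here is a minimal sketch (not from the original answer) that builds the same small frame both ways and then applies the list-of-dictionaries form to params-style data. The variable names are illustrative.

```python
import pandas as pd

# Shape 1: a dictionary of columns, as in the docs example.
by_column = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Shape 2: a list of row dictionaries, one dictionary per row.
by_row = pd.DataFrame([{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}])

print(by_column)
print(by_row)

# The params column in the question is already a list of row dictionaries,
# so it can go to the constructor directly.
params = [
    {"n_neighbors": 1, "weights": "uniform"},
    {"n_neighbors": 1, "weights": "distance"},
]
unpacked = pd.DataFrame(params)
print(unpacked)
```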

【Comments】: