Filling a dataframe from dictionary keys and values: answers

【Question title】: Filling a dataframe from a dictionary's keys and values
【Posted】: 2019-03-20 21:51:30
【Question description】:

I have the following dataframe as an example.

import pandas as pd

df_test = pd.DataFrame(index=["green", "yellow", "red", "pink"], columns=["bear", "dog", "cat"])

I also have the following dictionary, whose keys and values match (or relate to) the index and columns of my dataframe.

d = {"green":["bear","dog"], "yellow":["bear"], "red":["bear"]}

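I want to fill the dataframe according to the given keys and values, and if a key is not present, I want to fill its row with Empty.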

Desired output

All I can think of is building lists and looping. Is there a simple way to do this, or a function that would help?

【Question discussion】:

Tags: python pandas dictionary dataframe

【Solution 1】:

You can get what you want with:

import numpy as np
import pandas as pd

# You can use labels that are not in the original dataframe;
# their rows will be filled with 'Empty'

index_list = ["green", "yellow", "red", "pink", "purple"]

replace_dict = {True: 'Yes', False: 'No', np.nan:'Empty'}

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
        index=x.index), axis=1).reindex(index_list).replace(replace_dict) 

         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
purple  Empty  Empty  Empty

Explanation

For each dict key, check which of the dataframe's column names appear in that key's list:

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
    index=x.index), axis=1)

        bear    dog    cat
green   True   True  False
yellow  True  False  False
red     True  False  False
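You can also build the same boolean table without `apply`. The sketch below assumes the `df_test` and `d` from the question, redefined so the block runs on its own. It makes one `Index.isin` call per dict entry and stacks the results row by row. The name `membership` is only illustrative.

```python
import pandas as pd

df_test = pd.DataFrame(index=["green", "yellow", "red", "pink"],
                       columns=["bear", "dog", "cat"])
d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}

# For each dict entry, Index.isin gives a bool array aligned with df_test.columns:
# True where the column name is in that entry's list.
membership = pd.DataFrame(
    [df_test.columns.isin(values) for values in d.values()],
    index=list(d.keys()),
    columns=df_test.columns,
)
print(membership)
```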

Then reindex with the full list of colors. Rows for colors that are missing from the dict are added and filled with NaN:

index_list = ["green","yellow","red","pink", "purple"]

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
       index=x.index), axis=1).reindex(index_list)

        bear    dog    cat
green   True   True  False
yellow  True  False  False
red     True  False  False
pink     NaN    NaN    NaN
purple   NaN    NaN    NaN

Finally, if you want different labels, you can replace the values with a dictionary like this:

replace_dict = {True: 'Yes', False: 'No', np.nan:'Empty'}

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
        index=x.index), axis=1).reindex(index_list).replace(replace_dict) 

         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
purple  Empty  Empty  Empty
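Here is a variant of the last two steps, sketched under the same assumptions as the answer. It labels the booleans with `np.where` first and then lets `reindex` insert the missing colors directly via `fill_value="Empty"`. This removes the need for the `replace` step and its `np.nan` key. The names `membership`, `labelled` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

columns = pd.Index(["bear", "dog", "cat"])
d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
index_list = ["green", "yellow", "red", "pink", "purple"]

# Boolean table: one row per dict key, True where the column is listed.
membership = pd.DataFrame(
    [columns.isin(values) for values in d.values()],
    index=list(d.keys()),
    columns=columns,
)

# Turn True/False into 'Yes'/'No', keeping the same row and column labels.
labelled = pd.DataFrame(
    np.where(membership.to_numpy(), "Yes", "No"),
    index=membership.index,
    columns=membership.columns,
)

# Rows for colors that are not dict keys are created and filled with 'Empty'.
result = labelled.reindex(index_list, fill_value="Empty")
print(result)
```

In this variant the frame holds only strings by the time `reindex` runs, so `fill_value` can supply the final label directly.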

【Discussion】:

@may - so does it work for you with index=["green","yellow","red","pink"]?
Yes, it does! Just pass the list to reindex; any label that does not exist will be filled with Empty. I've added an example.

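【Solution 2】:

Loop over the dictionary and set 'Yes' for each listed pair. Then use mask to replace rows that are entirely missing with Empty, and finally use fillna to replace the remaining missing values with No: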

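for k, v in d.items():
    for x in v:
        # mark each (color, animal) pair listed in the dict
        df_test.loc[k, x] = 'Yes'

# rows still all-NaN were not in the dict -> 'Empty'; remaining NaN cells -> 'No'
df_test = df_test.mask(df_test.isnull().all(axis=1), 'Empty').fillna('No')
print(df_test)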
         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
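For reuse, you can wrap the loop in a function that builds its own empty frame. This is a sketch; the name `fill_from_dict` is hypothetical. One caveat: `.loc` assignment with a label that is not yet in the index enlarges the frame. A dict key missing from `index` would therefore be appended as a new row rather than ignored.

```python
import pandas as pd


def fill_from_dict(index, columns, mapping):
    """Hypothetical helper: 'Yes' for listed (row, column) pairs, 'No' for other
    cells in a listed row, 'Empty' for rows whose key is absent from mapping."""
    df = pd.DataFrame(index=index, columns=columns, dtype=object)
    for key, values in mapping.items():
        for col in values:
            df.loc[key, col] = "Yes"
    # Rows never touched are still all-NaN: mark them 'Empty' first,
    # then turn the remaining NaN cells (in touched rows) into 'No'.
    untouched = df.isna().all(axis=1)
    return df.mask(untouched, "Empty").fillna("No")


d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
res = fill_from_dict(["green", "yellow", "red", "pink"], ["bear", "dog", "cat"], d)
print(res)
```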

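【Discussion】:

@may - Solution modified, please check.
Thanks! I think it works now! I'll accept it as soon as I can :)
@may - Do you mean empty lists in the dictionary? Or NaN values?
It works fine for me; the last pink row is not missing. Is the problem with your real data, or with the sample?
Only yours works! Sorry, this is the best answer to my question :D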

【Solution 3】:

Here is a largely vectorized solution using pd.get_dummies and pd.DataFrame.reindex:

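import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict(d, orient='index')

# get_dummies on df itself: rows absent from d are added later by reindex (NaN -> 'Empty');
# reindexing first would turn them into all-zero rows, i.e. 'No'
res = pd.get_dummies(df, prefix='', prefix_sep='', dtype=int)\
        .reindex(columns=df_test.columns, fill_value=0)\
        .replace({0: 'No', 1: 'Yes'})\
        .reindex(index=np.hstack((df_test.index, df.index.difference(df_test.index))))\
        .fillna('Empty')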

print(res)

         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
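The sketch below swaps in a different vectorized route; it is not taken from the answer above. `Series.explode` (pandas 0.25+) turns the dict into one entry per (color, animal) pair, and `pd.crosstab` counts those pairs. Unlike `get_dummies` on the `from_dict` frame, this does not depend on where an animal sits within each list. With `get_dummies`, the same animal in different list positions (e.g. `"dog"` first in one list and second in another) would produce duplicate column names. The variable names are illustrative.

```python
import pandas as pd

index = ["green", "yellow", "red", "pink"]
columns = ["bear", "dog", "cat"]
d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}

# One entry per (color, animal) pair, with the color as the index label.
pairs = pd.Series(d).explode()

# Count each pair; with no duplicates inside a list every count is 0 or 1.
counts = pd.crosstab(pairs.index, pairs.values)

res = (
    counts.reindex(columns=columns, fill_value=0)  # add animals nobody listed (cat)
    .replace({0: "No", 1: "Yes"})
    .reindex(index=index)                          # colors missing from d become NaN rows
    .fillna("Empty")
    .rename_axis(index=None, columns=None)         # drop crosstab's row_0/col_0 axis names
)
print(res)
```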

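【Discussion】:

Same problem. pink is not in the dictionary, and with this solution it disappears.
@may, no, I did not make up my results. pink does appear in the last row; that is what the index=np.hstack((df_test.index, df.index.difference(df_test.index))) part is for.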