【Question title】: Appending data from dictionaries to a DataFrame
【Posted】: 2023-02-25 09:49:14
【Question description】:

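I am trying to create a DataFrame from dictionaries. Each dictionary can have many key-value pairs; the number of pairs depends on a list of names.

Suppose I have the following lists of names: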

names = [["name_0", "name_1"], ["name_2", "name_3"], ["name_2", "name_3", "name_4"]]

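Since I have 3 lists of names, I create 3 dictionaries and pass in some values. The keys in these dictionaries match the names in the lists above. For this example I only pass 2 values per key, but the lists can be longer than that.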

dict_1 = {"name_0" : [1,2], "name_1" : [1,2]}
dict_2 = {"name_2" : [2,3], "name_3" : [1,3]}
dict_3 = {"name_2" : [2,3], "name_3" : [1,3], "name_4" : [2,3]}
#adding all dictionaries to a list
data_3 = [dict_1, dict_2, dict_3]

Expected output:

                      names     values  multi
0          [name_0, name_1]     [1, 1]      1
1          [name_0, name_1]     [2, 2]      4
2          [name_2, name_3]     [2, 1]      2
3          [name_2, name_3]     [3, 3]      9
4  [name_2, name_3, name_4]  [2, 1, 2]      4
5  [name_2, name_3, name_4]  [3, 3, 3]     27

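The values column holds the combinations of the values in each dictionary (in the expected output above they are paired position by position). The multi column is the product of those values.

What I have tried: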
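As a small illustration of the description above, here is a sketch for a single dictionary. It assumes the expected output is the authority, so values are paired by position with `zip`; the `itertools.product` variant is shown only for contrast, in case "all possible combinations" was meant literally. The variable names `positional`, `cartesian` and `positional_multi` are made up for this example.

```python
from itertools import product
from math import prod

dict_3 = {"name_2": [2, 3], "name_3": [1, 3], "name_4": [2, 3]}

# Position-by-position pairing, which is what the expected output shows
positional = [list(vals) for vals in zip(*dict_3.values())]

# Every possible combination (Cartesian product), shown only for contrast
cartesian = [list(vals) for vals in product(*dict_3.values())]

# Product of each positional combination, i.e. the "multi" column
positional_multi = [prod(vals) for vals in positional]

print(positional)        # two rows, one per position
print(positional_multi)
print(len(cartesian))    # more rows than the expected output has for this dict
```

With `zip` each dictionary contributes one row per list position, which matches the two rows per dictionary in the expected output; the Cartesian variant would produce more rows.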

import pandas as pd

names = [["name_0", "name_1"], ["name_2", "name_3"], ["name_2", "name_3", "name_4"]]
dict_1 = {"name_0" : [1,2], "name_1" : [1,2]}
dict_2 = {"name_2" : [2,3], "name_3" : [1,3]}
dict_3 = {"name_2" : [2,3], "name_3" : [1,3], "name_4" : [2,3]}
#adding all dictionaries to a list
data_3 = [dict_1, dict_2, dict_3]


def dict_operation(dictionary, names):
    df_data = []
    for i in names:
        for d in dictionary:
            for v in d.values():
                if len(i) > 2:
                    x = 0  # not sure how to do this part
                    df_data.append({"names": i, "values": v, "multi": x})
                else:
                    x = 0 # not sure how to do this part
                    df_data.append({"names" : i, "values": v, "multi" : x})
    #         if len(i) > 1:
    #             df_data.append({"names": i, "values" : v, "multi" : [2]})
    #         else:
    #             df_data.append({"names": i, "values": v, "multi": [2]})
    df=pd.DataFrame(df_data)
    print(df)
    return df

dict_operation(data_3, names)

I cannot think of a better approach than nested for loops. Any help would be greatly appreciated!

【Question discussion】:

    Tags: python dataframe dictionary


    【Solution 1】:

    I made a few updates to simplify the code and added comments in the code to explain the changes. Hope this helps.

    import pandas as pd
    import numpy as np
    
    dict_1 = {"name_0" : [1,2], "name_1" : [1,2]}
    dict_2 = {"name_2" : [2,3], "name_3" : [1,3]}
    dict_3 = {"name_2" : [2,3], "name_3" : [1,3], "name_4" : [2,3]}
    #adding all dictionaries to a list
    data_3 = [dict_1, dict_2, dict_3]
    
    
    def dict_operation(dictionaries):
        df_data = []
        for d in dictionaries:
            # Names are already in the keys of each dict, so don't need to pass a list of names
            names = list(d.keys())
            # Zip the values (lists) within a dict to get combinations of elements by position
            for vals in zip(*d.values()):
                
                df_data.append({
                    "names": names,
                    "values": list(vals),  # zip will output a tuple, so convert to list
                    "multi": np.prod(vals)  # numpy prod will take the product of all elements
                })
        df=pd.DataFrame(df_data)
        print(df)
        return df
    
    dict_operation(data_3)
    

    Thanks for providing the desired output - that was very helpful.

    【Discussion】:

    • Looks good! I just thought of one scenario: dict_1 = {"name_0" : 2, "name_1" : 1} -> I just noticed that np.prod does not like having only a single value
    【Solution 2】:
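One possible way to handle the scalar case raised in the comment above is sketched here. As far as I can tell, the failure with bare numbers comes from `zip(*d.values())`, which cannot iterate over an `int`, rather than from `np.prod` itself. The helper `as_list` and the function name `dict_operation_scalar_safe` are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def as_list(value):
    """Wrap a bare scalar in a list; copy lists and tuples as-is (hypothetical helper)."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def dict_operation_scalar_safe(dictionaries):
    rows = []
    for d in dictionaries:
        names = list(d.keys())
        # Normalise every value to a list so zip can iterate over it
        columns = [as_list(v) for v in d.values()]
        for vals in zip(*columns):
            rows.append({
                "names": names,
                "values": list(vals),
                "multi": int(np.prod(vals)),  # plain int instead of a NumPy scalar
            })
    return pd.DataFrame(rows)


df = dict_operation_scalar_safe([
    {"name_0": 2, "name_1": 1},            # scalar values, as in the comment
    {"name_2": [2, 3], "name_3": [1, 3]},  # list values, as in the question
])
print(df)
```

This sketch assumes that values are either plain numbers, lists or tuples; other iterables such as NumPy arrays would need their own branch in `as_list`.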

    I don't understand the relationship between names and dict_1, dict_2, dict_3, but here is what I came up with:

    import pandas as pd
    import numpy as np
    
    names = [["name_0", "name_1"], ["name_2", "name_3"], ["name_2", "name_3", "name_4"]]
    
    dict_1 = {"name_0" : [1,2], "name_1" : [1,2]}
    dict_2 = {"name_2" : [2,3], "name_3" : [1,3]}
    dict_3 = {"name_2" : [2,3], "name_3" : [1,3], "name_4" : [2,3]}
    data_3 = [dict_1, dict_2, dict_3]
    
    data_dict = {
        'names': [],
        'values': [],
        'multi': []
    }
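    for dict_ in data_3:
        # Cover every position instead of hard-coding 2 (assumes equal-length lists per dict)
        n_positions = len(next(iter(dict_.values())))
        for i in range(n_positions):
            # Names are stored as a string here; drop str() to keep them as a list
            data_dict['names'].append(str(list(dict_.keys())))
            # Pick the i-th element from each value list
            values_list = [value[i] for value in dict_.values()]
            data_dict['values'].append(values_list)
            data_dict['multi'].append(np.prod(values_list))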
        
    data_df = pd.DataFrame(data_dict)
    print(data_df)
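For comparison, the same table can probably be built by letting pandas do the row-wise product. This is only a sketch of an alternative, not what either answer proposes; it assumes every list inside one dictionary has the same length, and the names `block`, `frames` and `result` are illustrative.

```python
import pandas as pd

dict_1 = {"name_0": [1, 2], "name_1": [1, 2]}
dict_2 = {"name_2": [2, 3], "name_3": [1, 3]}
dict_3 = {"name_2": [2, 3], "name_3": [1, 3], "name_4": [2, 3]}
data_3 = [dict_1, dict_2, dict_3]

frames = []
for d in data_3:
    # One column per name, one row per position in the value lists
    block = pd.DataFrame(d)
    frames.append(pd.DataFrame({
        "names": [list(block.columns)] * len(block),
        "values": block.to_numpy().tolist(),       # each row as a plain list
        "multi": block.prod(axis=1).tolist(),      # row-wise product
    }))

# Stack the per-dictionary blocks into one table with a fresh 0..n-1 index
result = pd.concat(frames, ignore_index=True)
print(result)
```

The explicit loops in the answers are arguably easier to read for small inputs; this variant mainly saves the manual indexing.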
    

    【Discussion】:
