Faster way to iterate over a dictionary and a DataFrame together? Answers

【Question title】: Faster way to iterate over dictionary and dataframe together?
【Posted】: 2020-06-10 22:55:30
【Question description】:

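I have a dictionary and a DataFrame that share the same keys/columns. The DataFrame is missing some data, which I want to fill in using the dictionary. Here is a minimal example; my real dataset is much larger.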

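import pandas as pd

mydict = {'one': ['foo', 'bar'], 'two': ['foo', 'bar']}
mydf = pd.DataFrame({'one': ['N/A', 'bar'], 'two': ['foo', 'N/A'], 'foo': ['foo', 'bar'], 'bar': ['foo', 'bar']})

def myfunc(mydict):
    # For each target column, replace 'N/A' with values from the listed columns
    for i, k in mydict.items():
        for m in k:
            mydf[i].replace(to_replace='N/A', value=mydf[m], inplace=True)


# Calls myfunc once for every non-missing cell, which is what makes this slow
for f, g in mydf.iterrows():
    for h in g:
        if h != 'N/A':
            myfunc(mydict)

# Drop the columns that were only used as fill sources
for i, v in mydict.items():
    mydf.drop(columns=v, inplace=True, errors='ignore')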

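When I run this on the larger dataset, the kernel never finishes. What is a faster way to do this? I would like to try df.apply() or a vectorized function, but I don't know how. The output for the example above looks like this: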

    one two
0   foo foo
1   bar bar

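【Comments】:

Does the dict have as many values as there are rows in the df?
No. The dict has as many values as there are columns in the df.
Why not just build the DataFrame directly from the dictionary? pd.DataFrame(mydict)
Why are you using the string "N/A" instead of a proper NaN value? "I would like to try df.apply() or a vectorized function, but I don't know how." What do you mean you don't know how? Have you tried anything or done any research?

Tags: python pandas loops dictionary iteration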

【Solution 1】:

Try this; it should give you what you want.

# Fill the values using your dictionary
for k, v in mydict.items():
    mydf[k] = v  

# Drop the columns you don't want
mydf.drop(columns=['foo','bar'], inplace=True)

You will get this:

    one two
0   foo foo
1   bar bar
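
Note that the loop above assigns the dictionary's lists to whole columns, so it overwrites existing values too. If the goal is to fill only the 'N/A' cells, the fill can be done one column at a time instead of one cell at a time. The following is a minimal sketch, not the answerer's code. It assumes that each key of mydict names a column to fill, that each value lists fallback columns to read from in order, and that the string 'N/A' marks a missing value. The names filled, fallback_cols, and result are made up for illustration.

```python
import pandas as pd

mydict = {'one': ['foo', 'bar'], 'two': ['foo', 'bar']}
mydf = pd.DataFrame({'one': ['N/A', 'bar'], 'two': ['foo', 'N/A'],
                     'foo': ['foo', 'bar'], 'bar': ['foo', 'bar']})

# Turn the 'N/A' placeholder into real missing values (NaN)
filled = mydf.mask(mydf == 'N/A')

# Assumption: key = column to fill, value = fallback columns, tried in order
for target, sources in mydict.items():
    for source in sources:
        # fillna with a Series aligns on the index, so each row is filled
        # from the same row of the fallback column
        filled[target] = filled[target].fillna(filled[source])

# Drop the fallback columns once they are no longer needed
fallback_cols = list(dict.fromkeys(c for cols in mydict.values() for c in cols))
result = filled.drop(columns=fallback_cols, errors='ignore')

print(result)
```

On the sample data this produces the same two-column frame as the expected output in the question. The work per column is a single fillna call, so the cost no longer grows with the number of non-missing cells.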

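【Discussion】:

I also want to drop the df[v] columns. If you look at the output in my question, it is a bit different.
Oh, you want to drop the foo and bar columns?
@RohanGupta That is a different question entirely and has nothing to do with this one. If you are going to drop the column anyway, why fill in the missing values?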