Pandas: filter column based on category and replace values from another data frame column

【Question title】: Pandas: filter column based on category and replace values from another data frame column
【Posted】: 2021-11-12 13:09:52
【Question】:

I have two data frames with 3 columns in total: dataframe1 has 2 columns and dataframe2 has 1 column.

Data frame 1

   name category
1  abc   fruit
2  def   animal
3  cfg    nan
4  abc   fruit
5  def   animal
6  cfg    nan
7  abc   fruit
8  def   animal
9  cfg    nan
10 abc   fruit
11 def   animal
12 cfg    nan

Data frame 2

  actual_cat
1 plant
2 plant
3 plant
4 plant

The final output data frame should be:

  name category
1 abc   fruit
2 def   animal
3 cfg    **PLANT**
4 abc   fruit
5 def   animal
6 cfg    **PLANT**
7 abc   fruit
8 def   animal
9 cfg    **PLANT**
10 abc   fruit
11 def   animal
12 cfg    **PLANT**

I tried using a filter condition such as:

data.loc(data(['name']=='cfg') & data2['actual])

but I am running into problems. I need help.

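【Comments】:

Can you edit your question and put the data frames in as text (so we can copy and paste them)?
@AndrejKesely Sure, I will do that right away. Thank you for your reply.
@AndrejKesely I have just updated the data. Could you take a look?
Do you want to fill all the cfg values with values from df2? Is the number of values in df2 the same as the number of cfg values in df1?
@AndrejKesely Yes. The name and category columns are spread across 3 data frames, but the rows where name is cfg have null values in the category column, because those category values sit in a different column. So I want to join in the "plant" value for the cfg rows.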

Tags: python pandas join pandas-groupby inner-join

【Solution 1】:

If data frame 2 has exactly as many rows as data frame 1 has null values, you can do the following:

counter = 0
for i in data[data["category"].isnull()].index:
    # .iloc reads by position, so this works whatever index labels data2 has
    data.at[i, "category"] = data2["actual_cat"].iloc[counter]
    counter += 1

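This loops over the index labels of data frame 1 where category is NaN and, for each one, sets category at that label to the value of actual_cat in data frame 2 at position counter, which is incremented on every pass. If data frame 1 has more NaN values than data frame 2 has rows, you will get an error.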
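The same positional fill can also be written without an explicit loop. The following is a minimal sketch, not part of the original answer: it rebuilds the sample frames from the question (assuming the nan entries are real missing values rather than the string "nan"), checks the lengths up front, and assigns through a boolean mask.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (index labels start at 1, as shown there)
data = pd.DataFrame(
    {
        "name": ["abc", "def", "cfg"] * 4,
        "category": ["fruit", "animal", np.nan] * 4,
    },
    index=range(1, 13),
)
data2 = pd.DataFrame({"actual_cat": ["plant"] * 4}, index=range(1, 5))

# Boolean mask marking the rows whose category is missing
mask = data["category"].isna()

# Fail early if the two lengths do not match
if mask.sum() != len(data2):
    raise ValueError("data2 must have exactly one value per missing category")

# to_numpy() drops data2's index, so values are assigned by position, in order
data.loc[mask, "category"] = data2["actual_cat"].to_numpy()

print(data)
```

Converting to a NumPy array first matters here: assigning the Series directly would align on index labels (1 to 4) instead of filling the masked rows in order.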

Based on your example, you might be able to get away with:

data.fillna("plant", inplace=True)

This fills every missing value in data frame 1 with "plant".
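Note that calling fillna on the whole frame fills gaps in every column, not only category. If only category should change, a narrower variant is to fill that one column. This is a small sketch on a hypothetical three-row frame (not the asker's data) that has a gap in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in both columns, to show the difference
data = pd.DataFrame(
    {
        "name": ["abc", np.nan, "cfg"],
        "category": ["fruit", "animal", np.nan],
    }
)

# Fill only the category column; a missing name stays missing
data["category"] = data["category"].fillna("plant")

print(data)
```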

【Comments】: