Compare two data frames, find the common elements, and fill column value if not present: answers

【Question title】: Compare two data frames, find the common elements, and fill column value if not present
【Posted】: 2021-08-24 11:37:50
【Question】:

I have two data frames that share some keywords.

For example:

df1 = {'keyword': ['Computer','Phone','Printer'],
       'Price1':   [1200,800,200],
       'category':['first','second','first']
       }


df2= {'keyword': ['Computer','Phone','Printer','chair'],
      'Price2': [1200,800,200,40]
      }

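As you can see above, one df has a category feature and the other does not. What I want to do is combine the two dfs, keeping the common items as they are. If a keyword exists in one df but not in the other ('chair' in our case), I want to add the values from the df where that keyword exists and fill the categorical feature (category) with a specific value, e.g. 'third'.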

【Comments on the question】:

Tags: python pandas dataframe sklearn-pandas data-wrangling

【Solution 1】:

It's not entirely clear, but I think you want combine_first:

df2.combine_first(df1)

NB: I first converted the dictionaries to DataFrames using dfX = pd.DataFrame(dfX).

Output:

   Price1  Price2 category   keyword
0  1200.0    1200    first  Computer
1   800.0     800   second     Phone
2   200.0     200    first   Printer
3     NaN      40      NaN     chair
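
One caveat: combine_first aligns rows by index label, not by the keyword column, so the result above relies on both frames listing their keywords in the same order. Below is a minimal sketch, assuming keywords are unique within each frame, that makes the alignment explicit by indexing on keyword first. The reordered df2 is only an illustration and is not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price1': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})

# Same data as df2 in the question, but deliberately in a different row order
# (illustrative assumption) to show that positional alignment is not relied on.
df2 = pd.DataFrame({'keyword': ['chair', 'Printer', 'Phone', 'Computer'],
                    'Price2': [40, 200, 800, 1200]})

# Index both frames by keyword so combine_first matches rows on the keyword,
# then turn the keyword index back into a regular column.
combined = (df2.set_index('keyword')
               .combine_first(df1.set_index('keyword'))
               .reset_index())

print(combined)
```

With the keyword as the index, the row order of the two inputs no longer matters.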

Alternatively, use merge:

df1.merge(df2, on='keyword', how='outer')

Output:

    keyword  Price1 category  Price2
0  Computer  1200.0    first    1200
1     Phone   800.0   second     800
2   Printer   200.0    first     200
3     chair     NaN      NaN      40
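
To cover the last part of the question as well, the missing category can be filled in after the merge. This is a minimal sketch, assuming 'third' is the desired placeholder (it is the example value from the question) and that both dictionaries have already been converted with pd.DataFrame:

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price1': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price2': [1200, 800, 200, 40]})

# Outer merge keeps keywords that appear in either frame.
merged = df1.merge(df2, on='keyword', how='outer')

# Fill only the category column; Price1 remains NaN for 'chair'
# because df1 has no price for that keyword.
merged['category'] = merged['category'].fillna('third')

print(merged)
```

Unlike combine_first, merge matches rows on the keyword column itself, so it does not depend on the two frames being in the same row order.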

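【Comments】:

【Solution 2】:

Building on mozway's answer: if the prices of the items do not differ between the DataFrames, there is no need to distinguish Price1 and Price2 in the column names. In addition, after joining the data you can use fillna() to fill the remaining NA values in the category column with any word you like.

Here is simplified code: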

import pandas as pd


df1 = pd.DataFrame({'keyword': ['Computer','Phone','Printer'],
       'Price':   [1200,800,200],
       'category':['first','second','first']
       })


df2 = pd.DataFrame({'keyword': ['Computer','Phone','Printer','chair'],
      'Price': [1200,800,200,40]
      })

df_combined = df1.combine_first(df2)

# Arbitrarily sets the word for unknown categories
keyword = "third"

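# Assign the result back instead of calling fillna(inplace=True) on a selected
# column, which newer pandas versions warn about and may not apply.
df_combined["category"] = df_combined["category"].fillna(keyword)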

Here is its output:

    Price category   keyword
0  1200.0    first  Computer
1   800.0   second     Phone
2   200.0    first   Printer
3    40.0    third     chair
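
Because combine_first pairs rows by index label, the code above depends on both frames listing the keywords in the same order. A hedged alternative that matches on the keyword itself is an outer merge on both shared columns. This sketch assumes, as stated in this answer, that prices agree between the two frames; if a price differed, the keyword would appear in two rows.

```python
import pandas as pd

df1 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer'],
                    'Price': [1200, 800, 200],
                    'category': ['first', 'second', 'first']})
df2 = pd.DataFrame({'keyword': ['Computer', 'Phone', 'Printer', 'chair'],
                    'Price': [1200, 800, 200, 40]})

# Merge on both shared columns so a single Price column is kept
# (assumes each keyword has the same price in both frames).
df_merged = df1.merge(df2, on=['keyword', 'Price'], how='outer')

# Placeholder for keywords that have no category in df1.
df_merged['category'] = df_merged['category'].fillna('third')

print(df_merged)
```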

【Comments】: