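【Question Title】: Flag similarities between dataframes in new column
【Posted】: 2016-07-07 15:24:58
【Question】:
I want to compare two pandas DataFrames of different lengths and identify the index values they have in common. Where the values match, I want to flag them in a new column.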
df1:
Index Column 1
41660 Apple
41935 Banana
42100 Strawberry
42599 Pineapple
df2:
Index Column 1
42599 Pineapple
Output:
Index Column 1 'Matching Index?'
41660 Apple
41935 Banana
42100 Strawberry
42599 Pineapple True
【Comments】:
Tags:
python
pandas
jupyter
【Solution 1】:
If these really are the index of each DataFrame, you can use intersection on the indices:
In [61]:
df1.loc[df1.index.intersection(df2.index), 'flag'] = True
df1
Out[61]:
Column 1 flag
Index
41660 Apple NaN
41935 Banana NaN
42100 Strawberry NaN
42599 Pineapple True
Otherwise, if Index is an ordinary column, use isin:
In [63]:
df1.loc[df1['Index'].isin(df2['Index']), 'flag'] = True
df1
Out[63]:
Index Column 1 flag
0 41660 Apple NaN
1 41935 Banana NaN
2 42100 Strawberry NaN
3 42599 Pineapple True
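The index-based variant above leaves NaN in the non-matching rows. If a full True/False column is preferred, one possible sketch is to call isin on the index itself. The DataFrame construction below is an assumption made only to keep the example self-contained; it rebuilds the sample data from the question with Index set as the actual index, and the `flagged` name is illustrative.

```python
import pandas as pd

# Sample data from the question, with "Index" as the real index (assumed setup).
df1 = pd.DataFrame(
    {'Column 1': ['Apple', 'Banana', 'Strawberry', 'Pineapple']},
    index=pd.Index([41660, 41935, 42100, 42599], name='Index'),
)
df2 = pd.DataFrame(
    {'Column 1': ['Pineapple']},
    index=pd.Index([42599], name='Index'),
)

# Variant from the answer: only matching rows receive True, the rest stay NaN.
flagged = df1.copy()
flagged.loc[flagged.index.intersection(df2.index), 'flag'] = True

# Alternative: Index.isin returns one boolean per row, so every row is filled.
df1['flag'] = df1.index.isin(df2.index)
print(df1)
```

Which form is better depends on whether the blank cells from the question's desired output matter; the NaN variant matches that layout more closely, while the boolean variant is easier to filter on.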
【Solution 2】:
+1 to @EdChum's answer. If you can accept values other than True in the matching column, try:
>>> df1.merge(df2,how='outer',indicator='Flag')
Index Column Flag
0 41660 Apple left_only
1 41935 Banana left_only
2 42100 Strawberry left_only
3 42599 Pineapple both
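If a boolean flag is still wanted after the merge, the indicator column can be compared against 'both'. The following is a minimal sketch, assuming Index is an ordinary column and that both frames share the column names from the question. It uses how='left' rather than 'outer' so that only the rows of df1 are kept; that is a swapped-in choice, not part of the answer above. The 'Matching Index?' column name is taken from the question's desired output.

```python
import pandas as pd

# Sample data from the question, with "Index" as a regular column (assumed setup).
df1 = pd.DataFrame({
    'Index': [41660, 41935, 42100, 42599],
    'Column 1': ['Apple', 'Banana', 'Strawberry', 'Pineapple'],
})
df2 = pd.DataFrame({'Index': [42599], 'Column 1': ['Pineapple']})

# With no "on" argument, merge joins on all shared columns (Index and Column 1).
# indicator='Flag' adds a column holding 'left_only', 'right_only' or 'both'.
merged = df1.merge(df2, how='left', indicator='Flag')

# Convert the indicator into a plain boolean column.
merged['Matching Index?'] = merged['Flag'] == 'both'
print(merged)
```

Note that because the merge joins on every shared column, a row only counts as matching when both Index and Column 1 agree.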
【Solution 3】:
Use the isin() method:
import pandas as pd
df1 = pd.DataFrame(
    data=[
        [41660, 'Apple'],
        [41935, 'Banana'],
        [42100, 'Strawberry'],
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)
df2 = pd.DataFrame(
    data=[
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)
df1['Matching'] = df1['Index'].isin(df2['Index'])
print(df1)
Output:
Index Column 1 Matching
0 41660 Apple False
1 41935 Banana False
2 42100 Strawberry False
3 42599 Pineapple True