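【Posted】: 2016-12-25 11:12:02
【Question】:
I am currently merging two data frames with an inner join. However, after merging, I see that all the rows are duplicated, even though the column I merged on contains the same values in both frames.
Specifically, I have the following code.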
merged_df = pd.merge(df1, df2, on=['email_address'], how='inner')
Here are the two data frames and the result.
df1
          email_address   name  surname
0  john.smith@email.com   john    smith
1  john.smith@email.com   john    smith
2       elvis@email.com  elvis  presley
df2
          email_address   street  city
0  john.smith@email.com  street1    NY
1  john.smith@email.com  street1    NY
2       elvis@email.com  street2    LA
merged_df
          email_address   name  surname   street  city
0  john.smith@email.com   john    smith  street1    NY
1  john.smith@email.com   john    smith  street1    NY
2  john.smith@email.com   john    smith  street1    NY
3  john.smith@email.com   john    smith  street1    NY
4       elvis@email.com  elvis  presley  street2    LA
5       elvis@email.com  elvis  presley  street2    LA
My question is: shouldn't the result look like this instead?
This is what I would like merged_df to look like.
          email_address   name  surname   street  city
0  john.smith@email.com   john    smith  street1    NY
1  john.smith@email.com   john    smith  street1    NY
2       elvis@email.com  elvis  presley  street2    LA
Is there any way to achieve this?
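One possible approach, offered as a minimal sketch rather than a definitive answer: an inner merge pairs every matching row on the left with every matching row on the right, so repeated keys multiply. If the intent is to pair the n-th occurrence of an address in df1 with the n-th occurrence in df2, a per-key counter can be added to both frames and used as a second merge key. The frames below are rebuilt from the tables above; the helper column name `occurrence` is a hypothetical name chosen for illustration.

```python
import pandas as pd

# Rebuild the two example frames from the question.
df1 = pd.DataFrame({
    "email_address": ["john.smith@email.com", "john.smith@email.com", "elvis@email.com"],
    "name": ["john", "john", "elvis"],
    "surname": ["smith", "smith", "presley"],
})
df2 = pd.DataFrame({
    "email_address": ["john.smith@email.com", "john.smith@email.com", "elvis@email.com"],
    "street": ["street1", "street1", "street2"],
    "city": ["NY", "NY", "LA"],
})

# Plain inner merge: each of the two john.smith rows in df1 matches both
# john.smith rows in df2, giving 2 x 2 = 4 rows for that address.
plain = pd.merge(df1, df2, on=["email_address"], how="inner")

# Number the repeats of each address within each frame (0, 1, ...).
# "occurrence" is a hypothetical helper column, not part of the original data.
left = df1.assign(occurrence=df1.groupby("email_address").cumcount())
right = df2.assign(occurrence=df2.groupby("email_address").cumcount())

# Merging on the address plus the counter pairs the n-th repeat on the left
# with the n-th repeat on the right, then the helper column is dropped.
merged_df = (
    pd.merge(left, right, on=["email_address", "occurrence"], how="inner")
    .drop(columns="occurrence")
)
print(merged_df)
```

This assumes the repeated rows are meant to line up by position within each address. If the repeats are simply unwanted duplicates, calling `drop_duplicates()` on each frame before merging may be the simpler option, though that would leave only one john.smith row in the result.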
【Comments】:
-
My comment might seem silly, but shouldn't your merge be merge_list = pd.merge(list_1 , list_2 , on=['email_address'], how='inner')?
-
Yes, I made a mistake in my description, fixed! Anyway, my actual query in Python is just as you said :D Thanks!
Tags: python python-2.7 python-3.x pandas merge