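【Posted】: 2016-10-17 01:19:38
【Question】:
I am trying to "merge-concatenate" two pandas DataFrames. Basically, I want to stack the two DataFrames, but keep only the rows of each DataFrame whose value matches a value in the other DataFrame. For example: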
data1:
+---+------------+-----------+-------+
| | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Alex | Anderson | 1 |
| 1 | Amy | Ackerman | 2 |
| 2 | Allen | Ali | 3 |
| 3 | Alice | Aoni | 4 |
| 4 | Andrew | Andrews | 4 |
| 5 | Ayoung | Atiches | 5 |
+---+------------+-----------+-------+
data2:
+---+------------+-----------+-------+
| | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Billy | Bonder | 4 |
| 1 | Brian | Black | 5 |
| 2 | Bran | Balwner | 6 |
| 3 | Bryce | Brice | 7 |
| 4 | Betty | Btisan | 8 |
| 5 | Bruce | Bronson | 8 |
+---+------------+-----------+-------+
The DataFrame produced by performing this operation on data1 and data2 should then look like this:
result:
+---+------------+-----------+-------+
| | first_name | last_name | class |
+---+------------+-----------+-------+
| 3 | Alice | Aoni | 4 |
| 4 | Andrew | Andrews | 4 |
| 5 | Ayoung | Atiches | 5 |
| 0 | Billy | Bonder | 4 |
| 1 | Brian | Black | 5 |
+---+------------+-----------+-------+
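Basically, I am trying to join the two datasets and then stack the columns. I can think of a few ways to do this, but they all feel hack-y. I could merge data1 and data2 and then stack the columns, or build boolean masks with map, like this: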
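import pandas as pd
# Boolean masks: True where a row's class also appears in the other frame
map1 = data1['class'].map(lambda x: x in list(data2['class']))
map2 = data2['class'].map(lambda x: x in list(data1['class']))
# Stack the matching rows from both frames
pd.concat([data1[map1], data2[map2]])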
But is there a more elegant solution?
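One possible simplification, offered as a sketch rather than a definitive answer: `Series.isin` builds the same kind of boolean mask without a per-row lambda. It assumes the shared key is the `class` column shown in the tables; the two frames are rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Sample frames copied from the tables in the question
data1 = pd.DataFrame({
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Andrew", "Ayoung"],
    "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Andrews", "Atiches"],
    "class": [1, 2, 3, 4, 4, 5],
})
data2 = pd.DataFrame({
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty", "Bruce"],
    "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan", "Bronson"],
    "class": [4, 5, 6, 7, 8, 8],
})

# Keep only rows whose class also appears in the other frame,
# then stack the two filtered frames (original indexes are preserved).
result = pd.concat([
    data1[data1["class"].isin(data2["class"])],
    data2[data2["class"].isin(data1["class"])],
])
print(result)
```

Passing `ignore_index=True` to `pd.concat` would renumber the rows from 0 if the original index labels are not needed.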
【Comments】: