【Posted】: 2022-01-10 13:28:22
【Question】:
I want to achieve what is described here: stackoverflow question, but using only standard pandas.
I have two dataframes. First:
first_employee target_employee relationship
0 Andy Claude 0
1 Andy Frida 20
2 Andy Georgia -10
3 Andy Joan 30
4 Andy Lee -10
5 Andy Pablo -10
6 Andy Vincent 20
7 Claude Frida 0
8 Claude Georgia 90
9 Claude Joan 0
10 Claude Lee 0
11 Claude Pablo 10
12 Claude Vincent 0
13 Frida Georgia 0
14 Frida Joan 0
15 Frida Lee 0
16 Frida Pablo 50
17 Frida Vincent 60
18 Georgia Joan 0
19 Georgia Lee 10
20 Georgia Pablo 0
21 Georgia Vincent 0
22 Joan Lee 70
23 Joan Pablo 0
24 Joan Vincent 10
25 Lee Pablo 0
26 Lee Vincent 0
27 Pablo Vincent -20
Second:
first_employee target_employee book_count
0 Vincent Frida 2
1 Vincent Pablo 1
2 Andy Claude 1
3 Andy Joan 1
4 Andy Pablo 1
5 Andy Lee 1
6 Andy Frida 1
7 Andy Georgia 1
8 Claude Georgia 3
9 Joan Lee 3
10 Pablo Frida 2
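I want to join these two dataframes so that the final dataframe is the same as the first one, but with an additional book_count column holding the corresponding values (NaN where no value is available).
I have already written something like this: joined_df = first_df.merge(second_df, on = ['first_employee', 'target_employee'], how = 'outer') and I got: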
first_employee target_employee relationship book_count
0 Andy Claude 0.0 1.0
1 Andy Frida 20.0 1.0
2 Andy Georgia -10.0 1.0
3 Andy Joan 30.0 1.0
4 Andy Lee -10.0 1.0
5 Andy Pablo -10.0 1.0
6 Andy Vincent 20.0 NaN
7 Claude Frida 0.0 NaN
8 Claude Georgia 90.0 3.0
9 Claude Joan 0.0 NaN
10 Claude Lee 0.0 NaN
11 Claude Pablo 10.0 NaN
12 Claude Vincent 0.0 NaN
13 Frida Georgia 0.0 NaN
14 Frida Joan 0.0 NaN
15 Frida Lee 0.0 NaN
16 Frida Pablo 50.0 NaN
17 Frida Vincent 60.0 NaN
18 Georgia Joan 0.0 NaN
19 Georgia Lee 10.0 NaN
20 Georgia Pablo 0.0 NaN
21 Georgia Vincent 0.0 NaN
22 Joan Lee 70.0 3.0
23 Joan Pablo 0.0 NaN
24 Joan Vincent 10.0 NaN
25 Lee Pablo 0.0 NaN
26 Lee Vincent 0.0 NaN
27 Pablo Vincent -20.0 NaN
28 Vincent Frida NaN 2.0
29 Vincent Pablo NaN 1.0
30 Pablo Frida NaN 2.0
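This is fairly close to what I want to achieve. However, the order of the values in first_employee and target_employee does not matter: if the first dataframe has (Frida, Vincent) and the second has (Vincent, Frida), those two rows should be merged together (what matters is the pair of values, not which column each one is in).
In my resulting dataframe I get three extra rows: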
28 Vincent Frida NaN 2.0
29 Vincent Pablo NaN 1.0
30 Pablo Frida NaN 2.0
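These come from my merge, which treats the key columns as ordered when joining. The 3 extra rows should instead be merged into the already existing pairs (Frida, Vincent), (Pablo, Vincent) and (Frida, Pablo).
Is there a way to do this using only standard pandas functions? (The question I referenced at the beginning uses sqldf.)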
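One possible approach, shown here only as a minimal sketch on a small subset of the data above: build an order-independent key for each pair by putting the alphabetically smaller name first, then do a left merge on that key so the row set of the first dataframe is preserved. The helper with_pair_key and the temporary columns key_a and key_b are names made up for this illustration. The sketch also assumes each unordered pair appears at most once in second_df; otherwise the merge would duplicate rows.

```python
import pandas as pd

# Small subset of the data from the question
first_df = pd.DataFrame({
    "first_employee": ["Andy", "Frida", "Frida", "Pablo", "Lee"],
    "target_employee": ["Claude", "Pablo", "Vincent", "Vincent", "Vincent"],
    "relationship": [0, 50, 60, -20, 0],
})
second_df = pd.DataFrame({
    "first_employee": ["Vincent", "Vincent", "Andy", "Pablo"],
    "target_employee": ["Frida", "Pablo", "Claude", "Frida"],
    "book_count": [2, 1, 1, 2],
})


def with_pair_key(df):
    """Hypothetical helper: add key_a/key_b holding the two names in sorted order."""
    out = df.copy()
    a, b = df["first_employee"], df["target_employee"]
    in_order = a <= b                      # True where the pair is already sorted
    out["key_a"] = a.where(in_order, b)    # alphabetically smaller name
    out["key_b"] = b.where(in_order, a)    # alphabetically larger name
    return out


left = with_pair_key(first_df)
# Keep only the key and the value column from the second dataframe
right = with_pair_key(second_df)[["key_a", "key_b", "book_count"]]

# how="left" keeps exactly the rows of first_df, in their original order
joined_df = (
    left.merge(right, on=["key_a", "key_b"], how="left")
        .drop(columns=["key_a", "key_b"])
)
print(joined_df)
```

With how="left" instead of how="outer", no rows are added for pairs that exist only in the second dataframe, and relationship keeps its integer dtype because no NaN is introduced into that column.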
【Comments】:
Tags: python pandas dataframe join merge