[Posted]: 2022-01-04 03:57:02
[Question]:
I want to join two datasets, shown below:
Dataset 1:
PIN LOCATION
1234 Germany
2356 Poland
2894 England
3452 Bloomberg
Dataset 2:
MAIL STARTLOCATION ENDLOCATION
ami@test.com 1234 2894
asd@test.com 2356 1234
cddv@test.com 3452 2894
The output should be:
MAIL STARTLOCATION ENDLOCATION LOCATION1 LOCATION2
ami@test.com 1234 2894 Germany England
asd@test.com 2356 1234 Poland Germany
cddv@test.com 3452 2894 Bloomberg England
What I tried:
condi = [((df1.PIN == df2.STARTLOCATION) | (df1.PIN == df2.ENDLOCATION))]
joindata = df1.join(df2, on = condi, how = 'outer').select('*')
But it gives NULL in LOCATION1 and LOCATION2.
[Discussion]:
Tags: python-3.x apache-spark pyspark apache-spark-sql