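【Posted】: 2020-06-17 06:51:12
【Question】:
I have two tables, and I want to read only the records from the source table that are not already in the target table. Both tables contain null values.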
Source table:
name | age | degree | dept
aaa  | 20  | ece    | null
bbb  | 20  | it     | null
ccc  | 30  | mech   | null
Target table:
name | age | degree | dept
aaa  | 20  | ece    | null
bbb  | 20  | it     | null
source_df.join(target_df, Seq("name", "age", "degree"), "leftanti")          -> works
source_df.join(target_df, Seq("name", "age", "degree", "dept"), "leftanti")  -> does not work
I need to pick only the third record (ccc) from the source table.
If I use name, age, and degree as the join keys, it works as expected.
But when I include dept, which is null in every row, the join returns all the records from the source table.
Please help me.
【Discussion】:
Tags: apache-spark pyspark apache-spark-sql