【Question title】:How to combine two dataframes df1 and df2 keeping common columns from df2
【Posted】:2021-04-26 20:40:03
【Question】:

I have df1:

+-----+------------------------+---------------------------+----------------------+---------+
|JobId|TotalRecordType1Count   |TotalRecordType2Count      |TotalRecordType3Count |JobStatus|
+-----+------------------------+---------------------------+----------------------+---------+
|  100|                       0|                          0|                     0|Success  |
+-----+------------------------+---------------------------+----------------------+---------+

and df2:

+---------------------------+----------------------+
|TotalRecordType1Count      |TotalRecordType2Count |
+---------------------------+----------------------+
|                        800|                   900|
+---------------------------+----------------------+

Both df1 and df2 contain exactly one row.

I want to combine df1 and df2 on the count columns they have in common, keeping the counts from df2:

+-----+------------------------+---------------------------+----------------------+---------+
|JobId|TotalRecordType1Count   |TotalRecordType2Count      |TotalRecordType3Count |JobStatus|
+-----+------------------------+---------------------------+----------------------+---------+
|  100|                     800|                        900|                     0|Success  |
+-----+------------------------+---------------------------+----------------------+---------+

【Comments】:

    Tags: scala dataframe apache-spark apache-spark-sql


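    【Solution 1】:

    Since both DataFrames have a single row, you can cross join them and then select each column from df2 when df2 has it, otherwise from df1: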

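    // For each column of df1, take it from df2 when df2 has a column of the same name
    val cols = df1.columns.map(x => if (df2.columns.contains(x)) df2(x) else df1(x))

    // Both DataFrames have one row, so the cross join also yields one row
    val result = df1.crossJoin(df2).select(cols: _*)

    result.show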
    +-----+---------------------+---------------------+---------------------+---------+
    |JobId|TotalRecordType1Count|TotalRecordType2Count|TotalRecordType3Count|JobStatus|
    +-----+---------------------+---------------------+---------------------+---------+
    |  100|                  800|                  900|                    0|  Success|
    +-----+---------------------+---------------------+---------------------+---------+
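
For anyone who wants to reproduce this end to end, here is a minimal sketch that builds the two single-row DataFrames from the question and applies the same cross-join-and-select idea. The object name `CombineExample`, the helper `overlay`, and the local `SparkSession` settings are assumptions made for illustration; they are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CombineExample {
  // Hypothetical helper: keep the column order of `base`, but read a column
  // from `overrides` whenever `overrides` has a column with the same name.
  // Assumes `overrides` has exactly one row; with more rows the cross join
  // would multiply the rows of `base`.
  def overlay(base: DataFrame, overrides: DataFrame): DataFrame = {
    val shared = overrides.columns.toSet
    val cols = base.columns.map(c => if (shared.contains(c)) overrides(c) else base(c))
    base.crossJoin(overrides).select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings)
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("combine-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question
    val df1 = Seq((100, 0, 0, 0, "Success")).toDF(
      "JobId", "TotalRecordType1Count", "TotalRecordType2Count",
      "TotalRecordType3Count", "JobStatus")
    val df2 = Seq((800, 900)).toDF(
      "TotalRecordType1Count", "TotalRecordType2Count")

    overlay(df1, df2).show()
    spark.stop()
  }
}
```

Collecting the names of df2 into a `Set` once avoids rescanning `df2.columns` for every column of df1. The name comparison here is case-sensitive, exactly as in the one-liner above.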
    

    【Discussion】:
