You can convert each column whose values differ into a string, then concatenate all of those strings:
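import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell
// ---- data ---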
val leftDF = Seq(
(1, 1, 5, 0),
(2, 0, 4, 2)
).toDF("ID", "Col1", "col3", "col7")
val rightDF = Seq(
(1, 2, 10, 0),
(2, 0, 2, 6)
).toDF("ID", "Col1", "col3", "col7")
// Builds "{name: [left,right]}" when the two sides differ, otherwise an empty string.
def getDifferenceForColumn(name: String): Column =
  when(
    col("l." + name) =!= col("r." + name),
    concat(lit("{" + name + ": ["), col("l." + name), lit(","), col("r." + name), lit("]}")))
    .otherwise(lit(""))
// Join the per-column fragments, inserting "," only between two non-empty fragments.
val diffColumn = leftDF
  .columns
  .filter(_ != "ID")
  .map(name => getDifferenceForColumn(name))
  .reduce((l, r) => concat(
    l,
    when(length(r) =!= 0 && length(l) =!= 0, lit(",")).otherwise(lit("")),
    r))
val diffColumnWithBraces = concat(lit("["), diffColumn, lit("]"))
leftDF
  .alias("l")
  .join(rightDF.alias("r"), Seq("ID"))
  .select(col("ID"), diffColumnWithBraces.alias("DIFF"))
  .show(false)
Output:
+---+------------------------------+
|ID |DIFF |
+---+------------------------------+
|1 |[{Col1: [1,2]},{col3: [5,10]}]|
|2 |[{col3: [4,2]},{col7: [2,6]}] |
+---+------------------------------+
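
To make the joining rule easier to follow, here is a minimal plain-Scala sketch of the same logic applied to ordinary strings for the row with ID 1. It does not use Spark; the helper names `diffFragment` and `joinFragments` are hypothetical and exist only for this illustration.

```scala
// Plain-Scala model of the per-column diff and the comma-joining reduce.
// diffFragment and joinFragments are illustrative names, not Spark functions.
def diffFragment(name: String, left: Int, right: Int): String =
  if (left != right) s"{$name: [$left,$right]}" else ""

// Insert "," only when both neighbouring fragments are non-empty.
def joinFragments(l: String, r: String): String = {
  val sep = if (l.nonEmpty && r.nonEmpty) "," else ""
  l + sep + r
}

val names = Seq("Col1", "col3", "col7")
val leftRow = Seq(1, 5, 0)   // row with ID 1 from leftDF
val rightRow = Seq(2, 10, 0) // row with ID 1 from rightDF

val fragments = names.indices.map(i => diffFragment(names(i), leftRow(i), rightRow(i)))
val diff = "[" + fragments.reduce(joinFragments) + "]"
println(diff) // [{Col1: [1,2]},{col3: [5,10]}]
```

Because unchanged columns contribute an empty string, the separator check keeps leading, trailing, and doubled commas out of the result.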
If the column values can never contain "}{", two variables in the solution above can be changed as follows, which may perform better:
val diffColumns = leftDF
.columns
.filter(_ != "ID")
.map(name => getDifferenceForColumn(name))
val diffColumnWithBraces = concat(
  lit("["),
  regexp_replace(concat(diffColumns: _*), "\\}\\{", "},{"),
  lit("]"))
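
The effect of the regular-expression variant can be sketched with plain Scala strings, using the fragments for the row with ID 2. This is only an illustration outside Spark, and the name `rowFragments` is hypothetical.

```scala
// Fragments for ID 2: Col1 is unchanged, so its fragment is empty.
val rowFragments = Seq("", "{col3: [4,2]}", "{col7: [2,6]}")

// Concatenate everything without separators, then insert "," wherever
// one fragment's closing brace meets the next fragment's opening brace.
val joined = rowFragments.mkString.replaceAll("\\}\\{", "},{")
val result = "[" + joined + "]"
println(result) // [{col3: [4,2]},{col7: [2,6]}]
```

This shows why the approach relies on the data never containing "}{": any such sequence inside a value would also receive a comma.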