Apache Spark SQL - Multiple arrays explode and 1:1 mapping

【Question title】: Apache Spark SQL - Multiple arrays explode and 1:1 mapping
【Posted】: 2018-04-13 04:58:48
【Question】:

I am new to Apache Spark SQL and am trying to achieve the following. I have the DF below, which I want to convert to an intermediate DF and then to JSON.

array [a,b,c,d,e] and array [1,2,3,4,5]

I need them to become

a 1
b 2
c 3

I tried the explode option, but I was only able to explode one array.

Thanks for your help.

【Question comments】:

Maybe this answer helps.
Hi @sarashan, did the answer below work for you?

Tags: apache-spark-sql

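【Solution 1】:

To join two DataFrames in Spark you need a common column that exists in both, and since you do not have one, you need to create it. Since version 1.6.0, Spark supports this through the monotonically_increasing_id() function. Note that the generated IDs are guaranteed to be unique and increasing but not consecutive, and they depend on how the data is partitioned, so the two DataFrames line up only when their rows are distributed across partitions in the same way (as in this small example). The code below illustrates this case: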

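    import org.apache.spark.sql.functions._
    import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

    // Each single-column DataFrame gets a generated "id" column to join on.
    val df = Seq("a", "b", "c", "d", "e")
      .toDF("val1")
      .withColumn("id", monotonically_increasing_id())

    val df2 = Seq(1, 2, 3, 4, 5)
      .toDF("val2")
      .withColumn("id", monotonically_increasing_id())

    // Join on the generated id, sort by it, then keep only the paired values.
    df.join(df2, "id").orderBy("id").select($"val1", $"val2").show(false)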

Output:

+----+----+
|val1|val2|
+----+----+
|a   |1   |
|b   |2   |
|c   |3   |
|d   |4   |
|e   |5   |
+----+----+
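
Because monotonically_increasing_id() yields unique but not necessarily consecutive IDs, the join above relies on both DataFrames being partitioned identically. A minimal sketch of one alternative, assuming it is acceptable to drop to the RDD API, uses zipWithIndex to attach a consecutive 0-based index instead. The object name IndexedJoinSketch and the helper withRowIndex are hypothetical names introduced only for this illustration:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object IndexedJoinSketch {
  // Hypothetical helper: appends a consecutive 0-based row index as a new Long column.
  def withRowIndex(df: DataFrame, name: String = "id"): DataFrame = {
    // zipWithIndex pairs every row with its position in the RDD.
    val indexed = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ idx)
    }
    // Extend the original schema with the index column.
    val schema = StructType(df.schema.fields :+ StructField(name, LongType, nullable = false))
    df.sparkSession.createDataFrame(indexed, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell a session named `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("indexed-join").getOrCreate()
    import spark.implicits._

    val letters = withRowIndex(Seq("a", "b", "c", "d", "e").toDF("val1"))
    val numbers = withRowIndex(Seq(1, 2, 3, 4, 5).toDF("val2"))

    // Join on the consecutive index and keep only the paired values.
    letters.join(numbers, "id").orderBy("id").select("val1", "val2").show(false)

    spark.stop()
  }
}
```

The trade-off is an extra pass through the RDD API, in exchange for an index that does not depend on how each DataFrame happens to be partitioned.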

Good luck.
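
If the two arrays are columns of the same row, as the question's wording suggests, a join may not be needed at all. The following is a sketch under that assumption: it uses posexplode (available since Spark 2.1) to explode one array together with each element's position, then uses that position to look up the matching element of the other array. The column names letters and numbers, and the object and method names, are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, posexplode}

object ZipArraysSketch {
  // Hypothetical helper: pairs the i-th element of `letters` with the i-th element of `numbers`.
  def pairUp(df: DataFrame): DataFrame =
    df
      // posexplode emits one row per element, with its 0-based position.
      .select(posexplode(col("letters")).as(Seq("pos", "letter")), col("numbers"))
      // Use the position to pick the matching element from the second array.
      .select(col("letter"), expr("numbers[pos]").as("number"))

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell a session named `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("zip-arrays").getOrCreate()
    import spark.implicits._

    // Assumed input shape: a single row holding both arrays.
    val df = Seq((Seq("a", "b", "c", "d", "e"), Seq(1, 2, 3, 4, 5))).toDF("letters", "numbers")

    val pairs = pairUp(df)
    pairs.show(false)

    // The intermediate DataFrame can then be rendered as one JSON string per row.
    pairs.toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

On Spark 2.4 and later, arrays_zip followed by explode is another option for the same task.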

【Discussion】: