【Question Title】: How to produce cross join / self join output using the Spark DSL
【Posted】: 2021-08-23 23:31:18
【Question】:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions._

object crossjoin {
  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("SparkByExamples.com")
      // Set on the builder: a SparkConf created after getOrCreate() is never applied
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    val df1 = List("IN", "PK", "AU", "SL").toDF("country")
    df1.show()

    // Attempt that did not give the desired result:
    // df1.withColumn("combinations", collect_set("country").over(Window.orderBy()))
    //   .show(false)
  }
}

Input:
+-------+
|country|
+-------+
|     IN|
|     PK|
|     AU|
|     SL|
+-------+

Output:

+--------+
|  result|
+--------+
|AU vs SL|
|AU vs PK|
|AU vs IN|
|IN vs PK|
+--------+

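The result should not contain duplicates. I think some kind of cross join is needed. I tried but could not solve it. This is the SQL query I came up with: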

select concat(c1.country,' vs ',c2.country) as result from country c1 
left join country c2 on c1.country!=c2.country 
where c1.country!='PK' and c2.country!='IN' and (c1.country!='SL' or c2.country='PK') 
order by result

【Comments】:

  • By duplicates, do you mean that SL vs AU is a duplicate of AU vs SL? Why does the output not include IN vs SL and SL vs PK?

Tags: scala apache-spark apache-spark-sql


【Solution 1】:

You can use a window function to get the desired result.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

import spark.implicits._

List("IN", "PK", "AU", "SL").toDF("country")
  // The combinations column holds an array of the countries from all following rows
  .withColumn("combinations", collect_set("country")
    .over(Window.rowsBetween(Window.currentRow + 1, Window.unboundedFollowing)))
  // The last row has an empty array; filter it out
  .where(size('combinations) > 0)
  // Prefix each array element with this row's country
  .withColumn("combinations",
    expr("transform(combinations, c -> concat_ws(' vs ', country, c))"))
  // Explode the array so each element becomes its own row
  .select(explode('combinations))
  .show(false)
/*
+--------+
|col     |
+--------+
|IN vs SL|
|IN vs AU|
|IN vs PK|
|PK vs SL|
|PK vs AU|
|AU vs SL|
+--------+*/
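
Since the question mentions a cross join, the same set of unordered pairs can probably also be produced with a plain self-join. The following is a minimal sketch, not the answer's method: the names `countries` and `pairs` are made up for illustration, and it assumes that "no duplicates" means `SL vs AU` and `AU vs SL` count as the same pair. Like the window approach, it yields all six pairs rather than the four rows shown in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.implicits._

// Hypothetical name; same sample data as in the question
val countries = List("IN", "PK", "AU", "SL").toDF("country")

// Self-join on a strict inequality: keeping only rows where the left value
// sorts before the right one drops both "X vs X" and the mirrored pair.
val pairs = countries.as("c1")
  .join(countries.as("c2"), col("c1.country") < col("c2.country"))
  .select(concat_ws(" vs ", col("c1.country"), col("c2.country")).as("result"))
  .orderBy("result")

pairs.show(false)
// Rows, in order: AU vs IN, AU vs PK, AU vs SL, IN vs PK, IN vs SL, PK vs SL
```

Because the join condition references both sides, this should not need `spark.sql.crossJoin.enabled`. The trade-off is that the left side of each pair is always the alphabetically smaller code, whereas the window version follows the row order of the input.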

【Comments】:

  • Thanks for the solution.