【Question title】: Spark Dataframes: CASE statement while using Window PARTITION function syntax
【Posted】: 2018-11-06 17:16:30
【Question】:

I need to check a condition: if ReasonCode is "YES", use ProcessDate as one of the PARTITION columns; otherwise, do not.

The equivalent SQL query is:

SELECT PNum, SUM(SIAmt) OVER (PARTITION BY PNum,
                                           ReasonCode , 
                                           CASE WHEN ReasonCode = 'YES' THEN ProcessDate ELSE NULL END 
                              ORDER BY ProcessDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) SumAmt 
from TABLE1

So far I have tried the following query, but I could not work the condition "CASE WHEN ReasonCode = 'YES' THEN ProcessDate ELSE NULL END" into the Spark DataFrame version:

val df = inputDF.select("PNum")
.withColumn("SumAmt", sum("SIAmt").over(Window.partitionBy("PNum","ReasonCode").orderBy("ProcessDate")))

Input data:

---------------------------------------
Pnum    ReasonCode  ProcessDate SIAmt
---------------------------------------
1       No          1/01/2016   200
1       No          2/01/2016   300
1       Yes         3/01/2016   -200
1       Yes         4/01/2016   200
---------------------------------------

Expected output:

---------------------------------------------
Pnum    ReasonCode  ProcessDate SIAmt  SumAmt
---------------------------------------------
1       No          1/01/2016   200     200 
1       No          2/01/2016   300     500
1       Yes         3/01/2016   -200    -200
1       Yes         4/01/2016   200      200
---------------------------------------------

Any suggestions or help on doing this with Spark DataFrames rather than a spark-sql query?

【Question comments】:

    Tags: scala apache-spark apache-spark-sql


    【Solution 1】:

    You can apply an exact copy of the SQL in API form:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.expressions._
    // ProcessDate joins the partition key only for "Yes" rows; other rows get null there.
    val w = Window
      .partitionBy(col("PNum"), col("ReasonCode"), when(col("ReasonCode") === "Yes", col("ProcessDate")).otherwise(null))
      .orderBy("ProcessDate")
    val df = inputDF.withColumn("SumAmt", sum("SIAmt").over(w))
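
The conditional partition column can also be written with expr, reusing the CASE text from the SQL almost verbatim. The following is only a sketch under assumptions: the object name ConditionalPartitionSketch, the helper withSumAmt, and the local SparkSession are illustrative and not part of the original answer, and the sample rows are copied from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, expr, sum}

object ConditionalPartitionSketch {
  // Hypothetical helper: adds SumAmt as a running sum per partition.
  def withSumAmt(df: DataFrame): DataFrame = {
    val w = Window
      .partitionBy(
        col("PNum"),
        col("ReasonCode"),
        // CASE without ELSE yields NULL for rows that do not match.
        expr("CASE WHEN ReasonCode = 'Yes' THEN ProcessDate END"))
      .orderBy("ProcessDate")
    df.withColumn("SumAmt", sum("SIAmt").over(w))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just to make the sketch runnable.
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    // Sample rows from the question.
    val inputDF = Seq(
      (1, "No", "1/01/2016", 200),
      (1, "No", "2/01/2016", 300),
      (1, "Yes", "3/01/2016", -200),
      (1, "Yes", "4/01/2016", 200)
    ).toDF("PNum", "ReasonCode", "ProcessDate", "SIAmt")

    withSumAmt(inputDF).orderBy("ProcessDate").show()
    spark.stop()
  }
}
```

Note that the comparison is case-sensitive: the sample data contains 'Yes', so the literal in the condition has to match that spelling rather than 'YES'.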
    

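    You can also add .rowsBetween(Long.MinValue, 0) to the window specification (when orderBy is present, the default frame is already RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, as in the SQL). This should give you: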

    +----+----------+-----------+-----+------+
    |Pnum|ReasonCode|ProcessDate|SIAmt|SumAmt|
    +----+----------+-----------+-----+------+
    |   1|       Yes|  4/01/2016|  200|   200|
    |   1|        No|  1/01/2016|  200|   200|
    |   1|        No|  2/01/2016|  300|   500|
    |   1|       Yes|  3/01/2016| -200|  -200|
    +----+----------+-----------+-----+------+
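
To see why the rows end up in these partitions, the logic can be mimicked on plain Scala collections, without Spark. This is only an illustration: Rec, key, and runningSums are hypothetical names, None stands in for SQL NULL, and dates are compared as strings, which happens to order the sample data correctly.

```scala
// Hypothetical stand-in for one row of the input table.
final case class Rec(pNum: Int, reasonCode: String, processDate: String, siAmt: Int)

object PartitionKeySketch {
  // ProcessDate is part of the key only for "Yes" rows; None plays the role of NULL.
  def key(r: Rec): (Int, String, Option[String]) =
    (r.pNum, r.reasonCode, if (r.reasonCode == "Yes") Some(r.processDate) else None)

  // Running sum of siAmt inside each partition, ordered by processDate.
  def runningSums(rows: Seq[Rec]): Seq[(Rec, Int)] =
    rows.groupBy(key).values.toSeq.flatMap { part =>
      val sorted = part.sortBy(_.processDate)
      // scanLeft emits the initial 0 first, so drop it with tail.
      sorted.zip(sorted.scanLeft(0)(_ + _.siAmt).tail)
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Rec(1, "No", "1/01/2016", 200),
      Rec(1, "No", "2/01/2016", 300),
      Rec(1, "Yes", "3/01/2016", -200),
      Rec(1, "Yes", "4/01/2016", 200))
    runningSums(rows).sortBy(_._1.processDate).foreach(println)
  }
}
```

Both "No" rows share the key (1, "No", None) and therefore accumulate, while each "Yes" row carries its own ProcessDate in the key and forms a single-row partition.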
    

    【Comments】:
