Question title: Scala Spark: how do I sum two columns?
Posted: 2020-10-14 17:45:05
Question description:

I have a table that looks like this:

+------+-----+
|  ColA| ColB|
+------+-----+
|    5 |    1| 
|    8 |    2| 
+------+-----+

I need to add a SUM column that adds the values in each row together, like this:

+------+-----+-----+
|  ColA| ColB| SUM |
+------+-----+-----+
|    5 |    1|    6|
|    8 |    2|   10|
+------+-----+-----+

Here is my attempt:

var foo = df.withColumn("SUM", sum(df("ColA"), df("ColB")))

But I get the error: overloaded method value sum with alternatives:
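The error arises because `org.apache.spark.sql.functions.sum` is an aggregate function: its overloads take a single `Column` or a single column name, and it adds values down one column across rows (as in `df.agg(...)` or `groupBy(...).agg(...)`), not across columns within one row. A call with two columns therefore matches no overload. Below is a minimal sketch contrasting the two operations, assuming a local `SparkSession` and the question's `ColA`/`ColB` data; the names `SumDemo`, `addSumColumn`, and `totalOfColA` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object SumDemo {
  // Row-wise: Column + Column builds an expression evaluated once per row,
  // which is what the question's SUM column needs.
  def addSumColumn(df: DataFrame): DataFrame =
    df.withColumn("SUM", col("ColA") + col("ColB"))

  // Aggregate: functions.sum takes ONE column and collapses it across all rows.
  // Summing an integer column yields a LongType result, hence getLong.
  def totalOfColA(df: DataFrame): Long =
    df.agg(sum(col("ColA"))).first().getLong(0)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sum-demo")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((5, 1), (8, 2)).toDF("ColA", "ColB")
    addSumColumn(df).show()   // one SUM value per row
    println(totalOfColA(df))  // a single number for the whole column
    spark.stop()
  }
}
```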


    Tags: scala apache-spark apache-spark-sql


    Solution 1:

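    One way is to add the two columns with the `Column` `+` operator; another is to write a SQL expression in `selectExpr`: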

    // Assumes an existing SparkSession named `spark`, as in spark-shell
    import spark.implicits._
    import org.apache.spark.sql.functions._
    
    val data = List((1,5), (4,3), (6,2))
    val df = spark.sparkContext.parallelize(data).toDF("ColA", "ColB")
    
    val foo = df.select("ColA", "ColB")
        .withColumn("SUM", col("ColA") + col("ColB"))
    foo.show()
    /*
    +----+----+---+
    |ColA|ColB|SUM|
    +----+----+---+
    |   1|   5|  6|
    |   4|   3|  7|
    |   6|   2|  8|
    +----+----+---+
    */
    // or
    
    val foo2 = df.selectExpr(
        "ColA",
        "ColB",
        "ColA + ColB as SUM"
      )
    foo2.show()
    /*
    +----+----+---+
    |ColA|ColB|SUM|
    +----+----+---+
    |   1|   5|  6|
    |   4|   3|  7|
    |   6|   2|  8|
    +----+----+---+
    */
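    To sum more than two columns, or to stop nulls from propagating (in Spark, `null + 5` evaluates to `null`), one option is to fold a list of column names into a single `Column` expression with `reduce`. This is a sketch under those assumptions: `SumManyColumns`, `rowSum`, and `withSum` are hypothetical names, and treating nulls as 0 via `coalesce` is a design choice, not something the question requires.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object SumManyColumns {
  // Builds ColA + ColB + ... from a list of column names.
  // coalesce(..., lit(0)) treats nulls as 0; without it, any null makes that row's sum null.
  // reduce throws on an empty list, so `names` must be non-empty.
  def rowSum(names: Seq[String]): Column =
    names.map(n => coalesce(col(n), lit(0))).reduce(_ + _)

  // Appends the row-wise sum as a new column (named "SUM" by default).
  def withSum(df: DataFrame, names: Seq[String], out: String = "SUM"): DataFrame =
    df.withColumn(out, rowSum(names))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sum-many")
      .getOrCreate()
    import spark.implicits._

    // Option[Int] gives a nullable ColB; None becomes null in the DataFrame
    val df = Seq((1, Option(5), 2), (4, None, 3)).toDF("ColA", "ColB", "ColC")
    withSum(df, Seq("ColA", "ColB", "ColC")).show()
    spark.stop()
  }
}
```

    Passing `df.columns.toSeq` as `names` would sum every column, which only makes sense when all of them are numeric.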
    
