Spark: add a new column with the same value in Scala [duplicate] (answers)

【Question title】: Spark, add new Column with the same value in Scala [duplicate]
【Posted】: 2016-11-29 23:13:34
【Question】:

I'm having trouble with the withColumn function in a Spark/Scala environment. I want to add a new column to my DataFrame, turning this:

+---+----+---+
|  A|   B|  C|
+---+----+---+
|  4|blah|  2|
|  2|    |  3|
| 56| foo|  3|
|100|null|  5|
+---+----+---+

into this:

+---+----+---+-----+
|  A|   B|  C|  D  |
+---+----+---+-----+
|  4|blah|  2|  750|
|  2|    |  3|  750|
| 56| foo|  3|  750|
|100|null|  5|  750|
+---+----+---+-----+

In other words, column D holds a single value that is repeated on every row of my DataFrame (N times for N rows).

Here is the code:

var totVehicles : Double = df_totVehicles(0).getDouble(0); // returns 750

The totVehicles variable holds the correct value, so that part works.
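The question does not show how `df_totVehicles` is built. Below is a minimal sketch of one way to get such a scalar, assuming it is a global `sum` over the `n_vehicles` column. The helper names `totalVehicles` and `run`, and the sample rows, are hypothetical. Summing an integer column yields a `Long`, so the sketch casts to `double` before calling `getDouble(0)`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

object TotalVehiclesSketch {
  // Hypothetical helper: total of the "n_vehicles" column as a Double.
  def totalVehicles(df: DataFrame): Double =
    df.agg(sum("n_vehicles").cast("double")) // one row, one column
      .first()                               // the single Row
      .getDouble(0)                          // safe because of the cast above

  // Builds a small hypothetical car-park table (not data from the question)
  // and returns its vehicle total.
  def run(): Double = {
    val spark = SparkSession.builder().master("local[1]").appName("tot-vehicles").getOrCreate()
    import spark.implicits._
    try {
      val df_carPark = Seq(("2012-01", "00100", 400), ("2011-06", "00200", 350))
        .toDF("id_time", "id_zipcode", "n_vehicles")
      totalVehicles(df_carPark)
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = println(run()) // 400 + 350 = 750.0
}
```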

The second DataFrame computes 2 fields (id_zipcode, n_vehicles), and I want to add a third column holding the same value (750) on every row:

var df_nVehicles =
df_carPark.filter(
      substring($"id_time",1,4) < 2013
    ).groupBy(
      $"id_zipcode"
    ).agg(
      sum($"n_vehicles") as 'n_vehicles
    ).select(
      $"id_zipcode" as 'id_zipcode,
      'n_vehicles
    ).orderBy(
      'id_zipcode,
      'n_vehicles
    );

Finally, I add the new column with the withColumn function:

var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

But Spark returns this error:

 error: value withColumn is not a member of Unit
         var df_nVehicles2 = df_nVehicles.withColumn(totVehicles, df_nVehicles("n_vehicles") + df_nVehicles("id_zipcode"))

Can you help me? Thanks a lot!

【Question discussion】:

Tags: scala apache-spark spark-dataframe

【Solution 1】:

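withColumn takes the new column's name (a String) as its first argument and a Column as its second. Use the lit function to wrap a literal value in a Column: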

import org.apache.spark.sql.functions._
df.withColumn("D", lit(750))
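Here is a self-contained sketch that rebuilds the question's sample table and adds column D with `lit`. The names `ConstantColumnSketch`, `addConstant` and `run` are hypothetical, and `totVehicles` is hard-coded to stand in for the value computed in the question. Because the value is a `Double`, column D comes out as `750.0`. Use `lit(750)` instead if an integer column is wanted.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.lit

object ConstantColumnSketch {
  // withColumn(colName: String, col: Column): name first, then a Column.
  // A plain Scala value must be wrapped with lit(...) to become a Column.
  def addConstant(df: DataFrame, name: String, value: Double): DataFrame =
    df.withColumn(name, lit(value))

  // Rebuilds the question's A/B/C table, adds D, prints it and
  // returns the collected rows.
  def run(): Array[Row] = {
    val spark = SparkSession.builder().master("local[1]").appName("lit-demo").getOrCreate()
    import spark.implicits._
    try {
      val df = Seq[(Int, String, Int)]((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5))
        .toDF("A", "B", "C")
      val totVehicles: Double = 750 // placeholder for the computed total
      val withD = addConstant(df, "D", totVehicles)
      withD.show()
      withD.collect()
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit = run()
}
```

The same call applies to the aggregated DataFrame in the question, for example `df_nVehicles.withColumn("D", lit(totVehicles))`, provided `df_nVehicles` is a DataFrame.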

【Discussion】: