【Question title】: How to set a struct to null in Spark when all of its fields are null
【Posted】: 2022-01-18 02:39:11
【Question】:

I have this DataFrame:

+----+--------------------------------+
|name|dates                           |
+----+--------------------------------+
|A   |[[1994, 12, 11], [,,]]          |
|B   |[[1994, 12, 11], [1994, 12, 15]]|
+----+--------------------------------+

with this schema:

root
 |-- name: string (nullable = true)
 |-- dates: struct (nullable = true)
 |    |-- start_date: struct (nullable = true)
 |    |    |-- year: integer (nullable = true)
 |    |    |-- month: integer (nullable = true)
 |    |    |-- day: integer (nullable = true)
 |    |-- end_date: struct (nullable = true)
 |    |    |-- year: integer (nullable = true)
 |    |    |-- month: integer (nullable = true)
 |    |    |-- day: integer (nullable = true)

I want the output below: when all fields inside end_date are null, end_date itself should be set to null.

+----+--------------------------------+
|name|dates                           |
+----+--------------------------------+
|A   |[[1994, 12, 11],]               |
|B   |[[1994, 12, 11], [1994, 12, 15]]|
+----+--------------------------------+

【Discussion】:

    Tags: scala apache-spark apache-spark-sql


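    【Solution 1】:

    You can update the struct column dates by rebuilding the struct from its existing attributes, using a when expression to check whether all attributes of end_date are null: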

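    import org.apache.spark.sql.functions.{col, lit, struct, when}

    val df2 = df.withColumn(
      "dates",
      struct(
        col("dates.start_date"), // keep start_date unchanged
        when(
          // true only if year, month and day of end_date are all null
          Seq("year", "month", "day")
            .map(x => col(s"dates.end_date.$x").isNull)
            .reduce(_ and _),
          // typed null, so the column keeps its original struct type
          lit(null).cast("struct<year:int,month:int,day:int>")
        ).otherwise(col("dates.end_date")).alias("end_date") // otherwise keep end_date as is
      )
    )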
    
    df2.show(false)
    //+----+--------------------------------+
    //|name|dates                           |
    //+----+--------------------------------+
    //|A   |[[1994, 12, 11],]               |
    //|B   |[[1994, 12, 11], [1994, 12, 15]]|
    //+----+--------------------------------+
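
If the field names and types should not be hard-coded, the same check can be derived from the DataFrame's schema. The following is only a sketch under assumptions: `nullIfAllFieldsNull` is a hypothetical helper name (not part of Spark), `df` is the DataFrame from the question, and the field names are assumed to contain no dots or other characters that would require backtick quoting.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit, struct, when}
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not a Spark API): yields the struct at `path`,
// or a typed null when every direct field of that struct is null.
def nullIfAllFieldsNull(df: DataFrame, path: String): Column = {
  // Read the struct type from the schema instead of spelling it out as a string.
  // This cast fails at runtime if `path` does not point to a struct column.
  val structType = df.select(col(path)).schema.head.dataType.asInstanceOf[StructType]

  // One isNull test per field, combined with AND.
  // Note: reduce throws on a struct with no fields.
  val allNull = structType.fieldNames
    .map(f => col(s"$path.$f").isNull)
    .reduce(_ && _)

  when(allNull, lit(null).cast(structType)).otherwise(col(path))
}

// Usage with the question's DataFrame `df` (assumed to be in scope).
val df3 = df.withColumn(
  "dates",
  struct(
    col("dates.start_date"),
    nullIfAllFieldsNull(df, "dates.end_date").alias("end_date")
  )
)
```

Casting with the `StructType` taken from the schema keeps the result type identical to the original end_date column, so the helper should not need changes if fields are added to the struct later.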
    
