【Question title】: Spark: create a nested schema
【Posted】: 2019-11-26 11:50:57
【Question description】:

With Spark, the following code:

import spark.implicits._
val data = Seq(
  (1, ("value11", "value12")),
  (2, ("value21", "value22")),
  (3, ("value31", "value32"))
)

val df = data.toDF("id", "v1")
df.printSchema()

prints this schema:

root
|-- id: integer (nullable = false)
|-- v1: struct (nullable = true)
|    |-- _1: string (nullable = true)
|    |-- _2: string (nullable = true)

Now, if I want to define the schema myself, how should I write the nested part?

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", ???)
))

Thanks.

【Question discussion】:

    Tags: apache-spark dataframe apache-spark-sql schema


    【Solution 1】:

    Based on the example here: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/StructType.html

     import org.apache.spark.sql._
     import org.apache.spark.sql.types._
    
     val innerStruct =
       StructType(
         StructField("f1", IntegerType, true) ::
         StructField("f2", LongType, false) ::
         StructField("f3", BooleanType, false) :: Nil)
    
     val struct = StructType(
       StructField("a", innerStruct, true) :: Nil)
    
     // Create a Row with the schema defined by struct
     val row = Row(Row(1, 2, true))
    

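    In your case, a nested struct is just a StructField whose data type is another StructType, so the schema would be: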

    import org.apache.spark.sql._
    import org.apache.spark.sql.types._
    
    val schema = StructType(Array(
      StructField("id", IntegerType),
      StructField("nested", StructType(Array(
          StructField("value1", StringType),
          StructField("value2", StringType)
      )))
    ))
    

    Output (line breaks added for readability):

    StructType(
      StructField(id,IntegerType,true), 
      StructField(nested,StructType(
        StructField(value1,StringType,true), 
        StructField(value2,StringType,true)
      ),true)
    )
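
To show the schema above in use, here is a minimal sketch that applies it to rows matching the question's data. The object name `NestedSchemaExample`, the local `SparkSession` settings, and the app name are assumptions made for illustration and are not part of the original answer. Each nested value is supplied as a `Row` inside the outer `Row`.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical wrapper object, used only to make the sketch self-contained.
object NestedSchemaExample {

  // Same schema as in the answer: "nested" is a struct with two string fields.
  val schema: StructType = StructType(Array(
    StructField("id", IntegerType),
    StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
    )))
  ))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-schema-sketch")
      .getOrCreate()

    // The inner Row carries the fields of the "nested" struct.
    val rows = Seq(
      Row(1, Row("value11", "value12")),
      Row(2, Row("value21", "value22")),
      Row(3, Row("value31", "value32"))
    )

    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.printSchema()

    // Nested fields can be addressed with dot notation.
    df.select("id", "nested.value1").show()

    spark.stop()
  }
}
```

Because neither `StructField` sets `nullable` explicitly, both the outer and the inner fields default to `nullable = true`, which matches the output shown above.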
    

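    【Discussion】:

    • Thanks, this is what I expected. Now another question: starting from a flat DataFrame, is it possible to build the nested part with .withColumn() (that is, without using an RDD and Row(Row(1, 2, true)))?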
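
The thread does not include a reply to this follow-up. One possible approach, sketched here under assumptions, is the `struct` function from `org.apache.spark.sql.functions`, which combines existing columns into a single struct column. The object name `NestWithColumnExample`, the helper `nest`, and the flat column names `value1` and `value2` are hypothetical and chosen only to mirror the schema above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object NestWithColumnExample {

  // Hypothetical helper: folds the flat columns "value1" and "value2"
  // into one struct column named "nested", then drops the originals.
  def nest(flat: DataFrame): DataFrame =
    flat
      .withColumn("nested", struct(col("value1"), col("value2")))
      .drop("value1", "value2")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nest-with-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // A flat DataFrame with the same values as in the question.
    val flat = Seq(
      (1, "value11", "value12"),
      (2, "value21", "value22"),
      (3, "value31", "value32")
    ).toDF("id", "value1", "value2")

    nest(flat).printSchema()

    spark.stop()
  }
}
```

The struct fields take their names from the input columns, so the result should have an `id` column plus a `nested` struct containing `value1` and `value2`, with no RDD or explicit `Row` construction involved.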