【发布时间】:2019-12-21 18:45:28
【问题描述】:
我想用这个架构创建一个数据框:
|-- Col1 : string (nullable = true)
|-- Col2 : string (nullable = true)
|-- Col3 : struct (nullable = true)
| |-- 513: long (nullable = true)
| |-- 549: long (nullable = true)
代码:
val someData = Seq(
Row("AAAAAAAAA", "BBBBB", Seq(513, 549))
)
val col3Fields = Seq[StructField](StructField.apply("513",IntegerType, true), StructField.apply("549",IntegerType, true))
val someSchema = List(
StructField("Col1", StringType, true),
StructField("Col2", StringType, true),
StructField("Col3", StructType.apply(col3Fields), true)
)
val someDF = spark.createDataFrame(
spark.sparkContext.parallelize(someData),
StructType(someSchema)
)
someDF.show
但是someDF.show 抛出:
错误执行程序:阶段 0.0 (TID 0) 中任务 0.0 中的异常 java.lang.RuntimeException:编码时出错: java.lang.RuntimeException: scala.collection.immutable.$colon$colon 是 不是 struct 架构的有效外部类型 if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 否则 staticinvoke(类 org.apache.spark.unsafe.types.UTF8String, 字符串类型,来自字符串, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, Col1), StringType), true, false) AS Col1#0 if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, Col2), StringType), true, false) AS Col2#1 if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else named_struct(513, if (validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)).isNullAt) null else validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)), 0, 513), IntegerType), 549, 如果 (validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)).isNullAt) null else validateexternaltype(getexternalrowfield(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 2, Col3), StructField(513,IntegerType,true), StructField(549,IntegerType,true)), 1, 549), IntegerType)) AS Col3#2 在 org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:291)
编辑:
513 和 549 应该是子列名而不是值。这是我期望的输出示例:
someDF.select("Col1","Col2","Col3.*").show
+-----------+--------+------+------+
| Col1| Col1| 513| 549|
+-----------+--------+------+------+
| AAAAAAAAA | BBBBB | 39| 38|
+-----------+--------+------+------+
【问题讨论】:
标签: java scala apache-spark