【Posted】: 2019-01-14 00:16:21
【Question】:
I am trying to convert a simple DataFrame to a Dataset, following the example in the Spark documentation: https://spark.apache.org/docs/latest/sql-programming-guide.html
case class Person(name: String, age: Int)
import spark.implicits._
val path = "examples/src/main/resources/people.json"
val peopleDS = spark.read.json(path).as[Person]
peopleDS.show()
However, this fails with the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast `age` from bigint to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "age")
- root class: ....
Can anyone help me?
Edit: I noticed that it works if I declare age as Long instead of Int. Why is that?
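Some background on the Long observation: Spark's JSON reader infers whole numbers as LongType (bigint), and `.as[Person]` only allows casts that cannot lose data, so it refuses to narrow bigint to Int on its own. Below is a minimal sketch of two ways around this, assuming Spark 2.2 or later. The names `PersonLong` and `UpcastSketch` and the inline JSON rows (a stand-in for people.json) are illustrative and not part of the original example.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Case classes are declared at top level so Spark can derive encoders for them.
case class Person(name: String, age: Int)
// Hypothetical variant whose field type matches what the JSON reader infers.
case class PersonLong(name: String, age: Long)

object UpcastSketch {
  // Inline rows standing in for people.json (an assumption made for this sketch).
  private val rows = Seq(
    """{"name":"Andy","age":30}""",
    """{"name":"Justin","age":19}"""
  )

  // Option 1: keep the inferred bigint column and use a Long field.
  def asLong(spark: SparkSession): Dataset[PersonLong] = {
    import spark.implicits._
    spark.read.json(rows.toDS()).as[PersonLong]
  }

  // Option 2: narrow the column explicitly, then map onto the Int field.
  def asInt(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    spark.read.json(rows.toDS())
      .withColumn("age", col("age").cast("int"))
      .as[Person]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("upcast-sketch")
      .getOrCreate()

    spark.read.json(spark.createDataset(rows)(org.apache.spark.sql.Encoders.STRING))
      .printSchema() // the age column is reported as long
    asLong(spark).show()
    asInt(spark).show()

    spark.stop()
  }
}
```

The explicit cast makes the possible truncation a deliberate choice rather than something Spark does silently. If the real file contains missing ages, as people.json does, `Option[Long]` or `Option[Int]` is probably the safer field type.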
Also:
val primitiveDS = Seq(1,2,3).toDS()
val augmentedDS = primitiveDS.map(i => ("var_" + i.toString, (i + 1).toLong))
augmentedDS.show()
augmentedDS.as[Person].show()
This prints:
+-----+---+
|   _1| _2|
+-----+---+
|var_1|  2|
|var_2|  3|
|var_3|  4|
+-----+---+
Exception in thread "main"
org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [_1, _2];
Can anyone help me understand what is happening here?
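On the second error: `.as[Person]` resolves columns by name, and a Dataset of tuples exposes its columns as `_1` and `_2`, so there is no `name` column to bind. A minimal sketch of two possible workarounds follows, assuming the same `Person(name: String, age: Int)` case class. The object name `RenameSketch` is illustrative, and the `.toLong` from the original snippet is dropped here so that the second column stays an Int and does not run into the bigint-to-int up-cast error as well.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Declared at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object RenameSketch {
  // Option 1: rename the tuple columns so they match the case class fields.
  def renamed(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    val primitiveDS = Seq(1, 2, 3).toDS()
    // (String, Int) tuples; the columns are named _1 and _2 at this point.
    val augmentedDS = primitiveDS.map(i => ("var_" + i.toString, i + 1))
    augmentedDS.toDF("name", "age").as[Person]
  }

  // Option 2: build Person values directly and skip the tuple step.
  def mapped(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    Seq(1, 2, 3).toDS().map(i => Person("var_" + i.toString, i + 1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-sketch")
      .getOrCreate()

    renamed(spark).show()
    mapped(spark).show()

    spark.stop()
  }
}
```

Renaming with `toDF` keeps the existing pipeline intact, while mapping straight to `Person` avoids the intermediate tuple altogether. Which one reads better likely depends on where the tuples come from.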
【Comments】:
Tags: scala apache-spark apache-spark-sql