这会在 Dataframe 中创建一个“ArrayType”列。
来自scaladocs:An ArrayType object comprises two fields, elementType: DataType and containsNull: Boolean. The field of elementType is used to specify the type of array elements. The field of containsNull is used to specify if the array has null values.
因此,ArrayType 只接受 Array 中的一种类型的列。
如果有不同类型的值传递给array 函数,它将首先尝试将列转换为字段中最适合的类型。如果列完全不兼容,则 Spark 将抛出异常。下面的例子
val df = spark.sql("""select array(45, 46L, 45.45)""")
df.printSchema()
root
|-- array(CAST(45 AS DECIMAL(22,2)), CAST(46 AS DECIMAL(22,2)), CAST(45.45 AS DECIMAL(22,2))): array (nullable = false)
| |-- element: decimal(22,2) (containsNull = false)
df: org.apache.spark.sql.DataFrame = [array(CAST(45 AS DECIMAL(22,2)), CAST(46 AS DECIMAL(22,2)), CAST(45.45 AS DECIMAL(22,2))): array<decimal(22,2)>]
下一个,错误:
val df = spark.sql("""select array(45, 46L, True)""")
df.printSchema()
org.apache.spark.sql.AnalysisException: cannot resolve 'array(45, 46L, true)' due to data type mismatch: input to function array should all be the same type, but it's [int, bigint, boolean]; line 1 pos 7;
'Project [unresolvedalias(array(45, 46, true), None)]
+- OneRowRelation
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:111)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:303)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:354)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:352)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:94)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:94)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$3.apply(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$3.apply(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)