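[Posted]: 2018-03-21 15:03:01
[Question]:
Given a certain number of fields, I would like to programmatically select each column and, for some of those fields, also pass the column to another function that returns a case class of (String, String). So far I have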
val myList = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc","dd"))
val df = myList.toDF("col1","col2","col3","col4")
val fields= "col1,col2"
val myDF = df.select(df.columns.map(c => if (fields.contains(c)) { df.col(s"$c") && someUDFThatReturnsAStructTypeOfStringAndString(df.col(s"$c")).alias(s"${c}_processed") } else { df.col(s"$c") }): _*)
This currently throws an exception:
org.apache.spark.sql.AnalysisException: cannot resolve '(col1 AND UDF(col1))' due to data type mismatch: differing types in '(col1 AND UDF(col1))' (string and struct< STRING1:string,STRING2:string > )
What I would like to select is:
col1 | | col2 | | col3 | col4
"a" | | "b" | | "c" | "d"
[Comments]:
- Why was this downvoted? Would you mind explaining why?
Tags: scala apache-spark apache-spark-sql spark-dataframe apache-spark-2.0
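The exception in the question comes from `&&`: on a Spark `Column` that operator is a boolean AND, so Spark tries to evaluate `col1 AND UDF(col1)` and rejects the string/struct operand pair. One possible way around it is to emit two columns per selected field with `flatMap` instead of combining them into one expression. The following is only a sketch under assumptions: `Processed`, `process`, and `selectWithProcessed` are hypothetical names, the UDF body is a stand-in for `someUDFThatReturnsAStructTypeOfStringAndString` (which the question does not show), and it assumes the blank cells in the desired output are the `*_processed` columns.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical stand-in for the case class of two strings returned by the UDF.
case class Processed(STRING1: String, STRING2: String)

object SelectWithProcessed {
  // Hypothetical UDF body; the real one is not shown in the question.
  val process = udf((s: String) => Processed(s, s.toUpperCase))

  // For each column named in `fields`, keep the column and add a
  // "<name>_processed" struct column next to it; pass other columns through.
  def selectWithProcessed(df: DataFrame, fields: Set[String]): DataFrame = {
    val cols: Seq[Column] = df.columns.toSeq.flatMap { c =>
      if (fields.contains(c)) Seq(col(c), process(col(c)).alias(s"${c}_processed"))
      else Seq(col(c))
    }
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-with-processed")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc", "dd"))
      .toDF("col1", "col2", "col3", "col4")

    // Split into a Set so membership is an exact name match rather than
    // the substring match that String.contains performs.
    val fields = "col1,col2".split(",").map(_.trim).toSet

    val myDF = selectWithProcessed(df, fields)
    myDF.printSchema()
    myDF.show(false)

    spark.stop()
  }
}
```

Splitting `fields` into a `Set` is a deliberate change from the original `fields.contains(c)`, which checks whether the column name is a substring of `"col1,col2"` and could match unintended names.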