【Posted】: 2019-10-14 02:11:40
【Question】:
I am doing something like this:
val domainList = data1.select("columnname","domainvalues").where(col("domainvalues").isNotNull).map(r => (r.getString(0), r.getList[String](1).asScala.toList)).collect()
I expect domainList to be of type Array[(String, List[String])].
For this input DF:
+-------------+----------------------------------------+
|columnname |domainvalues |
+-------------+----------------------------------------+
|predchurnrisk|Very High,High,Medium,Low |
|userstatus |Active,Lapsed,Renew |
|predinmarket |Very High,High,Medium,Low |
|predsegmentid|High flyers,Watching Pennies,Big pockets|
|usergender |Male,Female,Others |
+-------------+----------------------------------------+
The error I get is:
java.lang.ClassCastException: java.lang.String cannot be cast to scala.collection.Seq
at org.apache.spark.sql.Row$class.getSeq(Row.scala:283)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
at org.apache.spark.sql.Row$class.getList(Row.scala:291)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166)
at com.fis.sdi.ade.batch.SFTP.Test$$anonfun$6.apply(Test.scala:53)
at com.fis.sdi.ade.batch.SFTP.Test$$anonfun$6.apply(Test.scala:53)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.mapelements_doConsume_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.deserializetoobject_doConsume_0$(Unknown Source)
How should I fix this?
【Discussion】:
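-
This means that element 1 of the Row is a String, not a collection.
-
Please share some input data and the expected output.
-
I have updated the question. Could you take a look?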
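-
One possible fix, sketched under the assumption (consistent with the stack trace and the first comment) that domainvalues is a plain comma-separated String column: read it with getString and split it per row, instead of calling getList on a value that is not an array. DomainListSketch and splitDomain are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object DomainListSketch {
  // Hypothetical helper: turns "Very High,High" into List("Very High", "High").
  // Trimming and dropping empty entries are extra safety, not required by the question.
  def splitDomain(raw: String): List[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Assumes data1 has String columns "columnname" and "domainvalues".
  def domainList(data1: DataFrame): Array[(String, List[String])] = {
    val spark = data1.sparkSession
    import spark.implicits._ // encoder for the (String, List[String]) result

    data1
      .select("columnname", "domainvalues")
      .where(col("domainvalues").isNotNull)
      // getString matches the column's actual type, so no ClassCastException is raised here.
      .map(r => (r.getString(0), splitDomain(r.getString(1))))
      .collect()
  }
}
```

Calling DomainListSketch.domainList(data1) should then yield the Array[(String, List[String])] described in the question. Another option would be to convert the column to an array first with split(col("domainvalues"), ",") from org.apache.spark.sql.functions, in which case getSeq[String](1) would be the matching accessor.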
Tags: scala apache-spark