[Posted]: 2019-12-23 22:56:28
[Question]:
I am having trouble converting my DataFrame to a Dataset so that I can run the KMeans clustering algorithm. My code is below:
import org.apache.spark.sql.{Dataset, Encoder, Encoders}
case class MyCase(sId: Int, tId: Int, label: Double,
  sAuthors: String, sYear: Int, sJournal: String,
  tAuthors: String, tYear: Int, tJournal: String,
  yearDiff: Int, nCommonAuthors: Int, isSelfCitation: Boolean, isSameJournal: Boolean,
  cosSimTFIDF: Double, sInDegrees: Int, sNeighbors: Array[Long],
  tInDegrees: Int, tNeighbors: Array[Long],
  inDegreesDiff: Int, commonNeighbors: Int, jaccardCoefficient: Double)
val men = Encoders[MyCase]
val ds: Dataset[MyCase] = transformedTrainingSetDF.as(men)
When I try this, I get the following error:
Error:(208, 23) object Encoders does not take type parameters.
val men = Encoders[MyCase]
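For context on the message: `Encoders` is a singleton object of factory methods, so it cannot be given a type argument directly, while its `product` method can. Below is a minimal sketch of that form, assuming Spark 2.x or later on the classpath and a local session. `Pair` and `EncoderSketch` are hypothetical names, with `Pair` a trimmed-down stand-in for `MyCase`; this is not the asker's actual data or a confirmed fix for their project.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical, shortened stand-in for MyCase. It is declared at top level
// (not inside a method) so that Spark can derive an encoder for it.
case class Pair(sId: Int, tId: Int, label: Double, sNeighbors: Array[Long])

object EncoderSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("encoder-sketch")
      .getOrCreate()
    import spark.implicits._

    // A one-row DataFrame whose column names and types match Pair.
    val df = Seq((1, 2, 1.0, Array(3L, 4L)))
      .toDF("sId", "tId", "label", "sNeighbors")

    // The type parameter goes on the factory method, not on the object.
    val enc: Encoder[Pair] = Encoders.product[Pair]
    val ds: Dataset[Pair] = df.as[Pair](enc)

    ds.show()
    spark.stop()
  }
}
```

With `import spark.implicits._` in scope, `df.as[Pair]` alone should also resolve an encoder implicitly; passing `enc` explicitly just mirrors the shape of the code in the question.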
[Comments]:
- Where do you define the case class? Are these lines typed into spark-shell, or are they part of a Spark application?
Tags: scala apache-spark apache-spark-sql