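【Posted】: 2014-10-18 05:14:21
【Question】:
I am new to Spark. I can train a model on my dataset, but I can't figure out how to use the trained model to make predictions.
Here is the code that trains on a 1800x4000 data matrix.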
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/myfile.txt")
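val parsedData = data.map { line =>
  // Each line is assumed to be space-separated: the label first, then the feature values
  val parts = line.split(' ')
  // Use every token after the label as a feature (parts(1) alone would give a single feature)
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}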
val firstDataPoint = parsedData.take(1)(0)
// Building the model
val numIterations = 100
val model = SVMWithSGD.train(parsedData, numIterations)
//val model = LinearRegressionWithSGD.train(parsedData,numIterations)
val labelAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
println("Training Error = " + trainErr)
Now I load the data I want predictions for. It is a vector of 1800 values:
val test = sc.textFile("data/mllib/ridge-data/data.txt")
But I am not sure how to use this data to make a prediction. Please help.
【Discussion】:
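One possible approach, as a minimal sketch rather than a definitive answer: parse each line of the test file into a feature vector with no label, then pass the resulting RDD to `model.predict`. The object `PredictSketch` and its helpers `parseFeatures` and `predictFile` are hypothetical names introduced here for illustration. The sketch assumes the test file holds only space-separated feature values, one example per line, and that each line has the same number of features the model was trained on.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.SVMModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object PredictSketch {
  // Hypothetical helper: turn one line of whitespace-separated numbers into a
  // dense feature vector. Assumes the line has NO label column.
  def parseFeatures(line: String): Vector =
    Vectors.dense(line.trim.split("\\s+").map(_.toDouble))

  // Hypothetical helper: load an unlabeled text file and predict one value per line.
  def predictFile(sc: SparkContext, model: SVMModel, path: String): RDD[Double] = {
    val features = sc.textFile(path)
      .filter(_.trim.nonEmpty) // skip blank lines
      .map(parseFeatures)
    // With the model's default threshold, each prediction is 0.0 or 1.0
    model.predict(features)
  }
}
```

In the same spark-shell session as the training code, this could be called as `PredictSketch.predictFile(sc, model, "data/mllib/ridge-data/data.txt").collect().foreach(println)`. For a single example already in memory, `model.predict(PredictSketch.parseFeatures(line))` returns one `Double`. If the vector length differs from the training feature count, the prediction will fail, so it is worth checking the sizes of the training and test vectors first.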