【Question title】: Spark: How to perform prediction using a trained data set (MLlib: SVMWithSGD)
【Posted】: 2014-10-18 05:14:21
【Question】:

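I am new to Spark. I was able to train a model on my data set, but I cannot work out how to use the trained model for prediction.

Here is the code that trains on the data, an 1800x4000 matrix.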

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/myfile.txt")
val parsedData = data.map { line =>
  // Assumes each line is space-separated: the label first, then the feature values
  val parts = line.split(' ')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}

val firstDataPoint = parsedData.take(1)(0)

// Building the model
val numIterations = 100
val model = SVMWithSGD.train(parsedData, numIterations)
//val model = LinearRegressionWithSGD.train(parsedData,numIterations)


val labelAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
println("Training Error = " + trainErr)

Now I load the data I want to run prediction on; it is a vector of 1800 values:

val test = sc.textFile("data/mllib/ridge-data/data.txt")

But I am not sure how to use this data for prediction. Please help.

【Comments】:

    Tags: apache-spark prediction


    【Solution 1】:

    First, load the labeled points from a text file (remember that the RDD must have been saved with saveAsTextFile):

    JavaRDD<LabeledPoint> test = MLUtils.loadLabeledPoints(init.context, "hdfs://../test/", 30).toJavaRDD();
    JavaRDD<Tuple2<Object, Object>> scoreAndLabels = test.map(
      new Function<LabeledPoint, Tuple2<Object, Object>>() {
        public Tuple2<Object, Object> call(LabeledPoint p) {
          Double score = model.predict(p.features());
          return new Tuple2<Object, Object>(score, p.label());
        }
      }
    );
    

    Now collect the scores and iterate over them:

    List<Tuple2<Object, Object>> scores = scoreAndLabels.collect();
    for (Tuple2<Object, Object> score : scores) {
        System.out.println(score._1 + " \t" + score._2);
    }
    

    It is written in Java, but maybe you can convert it :)
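For the Scala code in the question, a conversion might look roughly like the following. This is a minimal sketch, assuming it runs in the same spark-shell session as the training code (so `sc` and `model` are in scope) and that each line of data.txt holds only space-separated feature values with no label. That file layout is an assumption; the question does not state it. The number of values per line has to equal the number of features the model was trained on.

```scala
import org.apache.spark.mllib.linalg.Vectors

// Assumption: `sc` (SparkContext) and `model` (the trained SVMModel)
// already exist, as in the question's spark-shell session.
val test = sc.textFile("data/mllib/ridge-data/data.txt")

// Assumption: one unlabeled example per line, values separated by whitespace.
val testVectors = test
  .map(_.trim)
  .filter(_.nonEmpty)
  .map(line => Vectors.dense(line.split("\\s+").map(_.toDouble)))

// predict accepts an RDD[Vector] and returns an RDD[Double] of predictions.
val predictions = model.predict(testVectors)

predictions.collect().foreach(println)
```

If the test file also carries labels, the same `map` over `LabeledPoint` used for the training error in the question applies unchanged.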

    However, the predicted values do not make sense to me:

    -18.841544889249917 	0.0
    168.32916035523283 	1.0
    420.67763915879794 	1.0
    -974.1942589201286 	0.0
    71.73602841256813 	1.0
    233.13636224524993 	1.0
    -1000.5902168199027 	0.0

    Does anyone know what they mean?
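One possible reading, offered as an assumption rather than something the answer confirms: the first value in each pair looks like the raw SVM margin (weights · features + intercept), which `SVMModel.predict` returns once `model.clearThreshold()` has been called. With a threshold set (the default is 0.0, and `model.setThreshold(0.0)` restores it), `predict` returns 1.0 for margins above the threshold and 0.0 otherwise. The sketch below is plain Scala with no Spark dependency. It applies that rule to the listed scores using a hypothetical helper, `classify`, and checks the result against the labels in the second column.

```scala
// Hypothetical helper mirroring the thresholding rule:
// margin > threshold => 1.0, otherwise 0.0.
def classify(margin: Double, threshold: Double = 0.0): Double =
  if (margin > threshold) 1.0 else 0.0

// (score, label) pairs copied from the output quoted above.
val scoreAndLabels = Seq(
  (-18.841544889249917, 0.0),
  (168.32916035523283, 1.0),
  (420.67763915879794, 1.0),
  (-974.1942589201286, 0.0),
  (71.73602841256813, 1.0),
  (233.13636224524993, 1.0),
  (-1000.5902168199027, 0.0)
)

// Threshold every score at 0.0 and compare with the true labels.
val predicted = scoreAndLabels.map { case (score, _) => classify(score) }
val allMatch = predicted == scoreAndLabels.map(_._2)

println(s"predicted = $predicted, all match labels = $allMatch")
```

On these seven pairs the sign of the score agrees with the label every time, which is consistent with the scores being unthresholded margins.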

    【Discussion】:
