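【Posted】: 2015-12-05 12:14:32
【Question】:
I wrote some machine learning code that runs perfectly in the Scala shell. I am now using SBT to compile the code and build a JAR. To test the setup, I compiled some of the Spark examples (such as LocalLR and SparkPi) in new project folders. They all compiled successfully, but for some reason my own code does not. I followed all the directory conventions, but still no luck.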
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.tree._
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.model._
import org.apache.spark.rdd._
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.LogisticRegressionModel

object PredictOOS {
  // Pair each prediction with its true label and wrap the result in MulticlassMetrics.
  def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
      MulticlassMetrics = {
    val predictionsAndLabels = data.map(example =>
      (model.predict(example.features), example.label)
    )
    new MulticlassMetrics(predictionsAndLabels)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Predict OOS")
    val spark = new SparkContext(conf)
    val data = spark.textFile("D:/data/g1-svm.csv")
    // Each CSV row: the label in the first column, the features in the rest.
    val parsedData = data.map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts(0), Vectors.dense(parts.tail))
    }
    // 80/20 train/test split with a fixed seed.
    val splits = parsedData.randomSplit(Array(0.8, 0.2), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)
    // 2 classes, no categorical features, Gini impurity, maxDepth = 20, maxBins = 300.
    val model = DecisionTree.trainClassifier(training, 2, Map[Int, Int](), "gini", 20, 300)
    val metrics = getMetrics(model, test)
    println(" confusionMatrix is generated")
    spark.stop()
  }
}
The errors are shown below:
D:\ScalaApps\sparklr>cd ../oos
D:\ScalaApps\oos>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Set current project to Proj_oos (in build file:/D:/ScalaApps/oos/)
> compile
[info] Compiling 1 Scala source to D:\ScalaApps\oos\target\scala-2.11\classes...
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:5: not found: type MulticlassMetrics
[error] MulticlassMetrics = {
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:4: not found: type DecisionTreeModel
[error] def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:4: not found: type RDD
[error] def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:9: not found: type MulticlassMetrics
[error] new MulticlassMetrics(predictionsAndLabels)
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:19: not found: value LabeledPoint
[error] LabeledPoint(parts(0), Vectors.dense(parts.tail))
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:19: not found: value Vectors
[error] LabeledPoint(parts(0), Vectors.dense(parts.tail))
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:25: not found: value DecisionTree
[error] val model = DecisionTree.trainClassifier(training, 2, Map[Int,Int](), "gini", 20, 300)
[error] ^
[error] 7 errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 5 s, completed Dec 4, 2015 10:39:22 PM
>
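Please let me know if I am missing something. I have been stuck on this compilation step for a long time. Any help would be much appreciated.
This is an edit to the original post. The code above now compiles successfully, but compilation fails once I add a line that writes the output to a file.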
metrics.confusionMatrix.saveAsTextFile("D:/spark4/confMatrix2")
Error:
D:\ScalaApps\oos>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Set current project to Proj_oos (in build file:/D:/ScalaApps/oos/)
> compile
[info] Compiling 1 Scala source to D:\ScalaApps\oos\target\scala-2.10\classes...
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:44: value saveAsTextFile is not a member of org.apache.spark.mllib.linalg.Matrix
[error] metrics.confusionMatrix.saveAsTextFile("D:/spark4/confMatrix2")
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 5 s, completed Dec 5, 2015 9:21:03 AM
>
Do I need to import another package to make saveAsTextFile work?
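The compiler message itself suggests that an extra import is unlikely to help: saveAsTextFile is defined on RDD, while metrics.confusionMatrix is a local org.apache.spark.mllib.linalg.Matrix. One possible workaround, sketched here under assumptions, is to write the matrix's string form to a local file. SaveText and saveText are hypothetical names introduced only for this illustration, and the ".txt" output path is likewise an assumption.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of Spark: writes a string to a local file as UTF-8.
object SaveText {
  def saveText(path: String, text: String): Unit = {
    Files.write(Paths.get(path), text.getBytes(StandardCharsets.UTF_8))
  }
}

// Assumed usage inside main, after `metrics` has been computed:
//   SaveText.saveText("D:/spark4/confMatrix2.txt", metrics.confusionMatrix.toString)
//
// Alternative that keeps saveAsTextFile by wrapping the text in a one-partition RDD:
//   spark.parallelize(Seq(metrics.confusionMatrix.toString), 1)
//     .saveAsTextFile("D:/spark4/confMatrix2")
```

The first variant produces a single plain file on the driver machine. The second produces a directory of part files, as saveAsTextFile always does.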
【Comments】:
-
What is the error message?
-
I just edited the original post to include the errors.
-
It looks like you are missing the Spark dependency in build.sbt.
-
name := "Proj_oos" version := "1.0" scalaVersion := "2.11.7" libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"
-
That is my sbt file. I am running Spark 1.4.0 and Scala 2.11.7.
Tags: scala apache-spark sbt
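The build.sbt quoted above lists only spark-core, while most of the types reported as "not found" (MulticlassMetrics, DecisionTreeModel, LabeledPoint, Vectors, DecisionTree) live under org.apache.spark.mllib, which ships as a separate spark-mllib artifact. A minimal sketch of a build file that adds it is shown below. It assumes the same versions as the quoted file and that Scala 2.11 builds of both artifacts are available for Spark 1.4.0. It is a plausible fix, not one confirmed in the thread.

```scala
// build.sbt (sketch; versions copied from the file quoted in the comments)
name := "Proj_oos"

version := "1.0"

scalaVersion := "2.11.7"

// Assumption: spark-mllib at the same version as spark-core.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.4.0",
  "org.apache.spark" %% "spark-mllib" % "1.4.0"
)
```

After changing the file, running reload and then compile in the sbt shell picks up the new dependency.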