【Posted】: 2015-07-04 07:34:57
【Question】:
This is my data:
0,2 # Spark is more intelligent about how it operates on data.
1,5 # it always looks to limit how much work it has to do.
2,3 # Sometimes a data analyst just record for the Chicago store.
...
I want to extract a matrix like the following from this data:
0 2
1 5
2 3
...
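The target shape can be produced by splitting each line at `#`, discarding the comment, and then splitting the remaining prefix at `,`. Below is a minimal sketch of just this parsing step in plain Scala (no Spark or Breeze). It assumes every line has the form `id,score # comment`. `MatrixPrep`, `parseLine` and `toMatrix` are hypothetical names, and nested arrays stand in for a real matrix type:

```scala
object MatrixPrep {
  // Parses a line such as "0,2 # some comment" into (0, 2.0).
  // Assumption: the part before '#' is exactly "id,score".
  def parseLine(line: String): (Int, Double) = {
    val data = line.split("#")(0).trim        // "0,2"
    val fields = data.split(",").map(_.trim)  // Array("0", "2")
    // Only indices 0 and 1 exist; the score is fields(1).
    (fields(0).toInt, fields(1).toDouble)
  }

  // Builds an n x 2 matrix (nested arrays) whose row i is (id, score),
  // keeping the order of the input lines.
  def toMatrix(lines: Seq[String]): Array[Array[Double]] =
    lines.map(parseLine).map { case (id, score) => Array(id.toDouble, score) }.toArray
}
```

In Spark, the same function could be applied per line with `sc.textFile("data.txt").map(MatrixPrep.parseLine)`.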
I tried:
def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("prep").setMaster("local")
  val sc = new SparkContext(conf)
  val sample1 = sc.textFile("data.txt")
  val cnt = sample1.count()
  val tt = DenseMatrix.zeros[Double](cnt.toInt,1)
  var doc_val = sample1.flatMap({ (line) =>
    val tuple = line.split("#")
    val ss = tuple(0).split(",")
    val docid = ss(0).toInt
    val docscore = ss(2)
    tt(docid, 0) = docscore
  })
  println(tt)
}
But it doesn't compile. What is wrong?
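Several things go wrong in the attempt above. `flatMap` expects its function to return a collection, but the closure ends with an assignment, whose type is `Unit`. `docscore` is a `String`, while `tt` holds `Double`s. `ss(2)` would fail at runtime, because each line has only two comma-separated fields (indices 0 and 1). Even a compiling version would not reliably fill `tt`: the closure runs inside Spark tasks, and the Spark programming guide describes mutating driver-side variables from a closure as undefined behavior. Moreover, `flatMap` is a lazy transformation, so nothing runs until an action such as `collect()` is called. A common fix is to parse each line inside `map` (returning an `(Int, Double)` pair), bring the small result back to the driver with `collect()`, and fill the matrix there. The sketch below shows only that driver-side step in plain Scala, with nested arrays standing in for Breeze's `DenseMatrix`. `DriverFill` and `fillMatrix` are hypothetical names, and it assumes ids are non-negative row indices:

```scala
object DriverFill {
  // Mirrors the asker's tt(docid, 0) = score, but runs on the driver
  // after the parsed pairs have been collected.
  // Assumption: ids are non-negative; the matrix gets (maxId + 1) rows,
  // and rows with no matching id stay 0.0.
  def fillMatrix(pairs: Array[(Int, Double)]): Array[Array[Double]] = {
    val rows = if (pairs.isEmpty) 0 else pairs.map(_._1).max + 1
    val m = Array.fill(rows, 1)(0.0)  // rows x 1, all zeros
    for ((id, score) <- pairs) m(id)(0) = score
    m
  }
}
```

With Breeze, the same loop would write into `DenseMatrix.zeros[Double](rows, 1)` through `m(id, 0) = score`. Sizing by the largest id rather than by `count()` avoids an out-of-bounds write if the ids have gaps.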
【Discussion】:
-
“It doesn't compile” - more details needed.
Tags: scala matrix apache-spark