[Posted]: 2016-05-03 03:42:37
[Question]:
I'm new to Spark, and my current version is 1.3.1. I want to implement logistic regression with PySpark, so I found this example from Spark Python MLlib:
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint
from numpy import array
# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in line.split(' ')]
    return LabeledPoint(values[0], values[1:])
data = sc.textFile("data/mllib/sample_svm_data.txt")
parsedData = data.map(parsePoint)
# Build the model
model = LogisticRegressionWithLBFGS.train(parsedData)
# Evaluating the model on training data
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
# Index into the (label, prediction) pair; tuple-unpacking lambdas only work in Python 2
trainErr = labelsAndPreds.filter(lambda vp: vp[0] != vp[1]).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))
I found that the attributes of model are:
In [21]: model.<TAB>
model.clearThreshold model.predict model.weights
model.intercept model.setThreshold
How can I get the coefficients of the logistic regression?
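For context, here is a minimal sketch of how the `weights` and `intercept` attributes listed above would presumably combine into a prediction. It assumes the model is a standard binary logistic regression, where the logistic function is applied to the linear margin and the result is compared to a threshold of 0.5. The helper name `predict_with_coefficients` and the numbers are hypothetical and for illustration only. The sketch is plain Python and does not need Spark.

```python
import math

def predict_with_coefficients(weights, intercept, features, threshold=0.5):
    """Hypothetical helper: binary logistic prediction from coefficients."""
    # Linear margin: dot product of weights and features, plus the intercept
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Logistic (sigmoid) function maps the margin to a score in (0, 1)
    score = 1.0 / (1.0 + math.exp(-margin))
    # Predict class 1 when the score exceeds the threshold, else class 0
    return (1 if score > threshold else 0), score

# Made-up coefficients, standing in for model.weights and model.intercept
weights = [0.5, -1.0]
intercept = 0.25

label, score = predict_with_coefficients(weights, intercept, [2.0, 1.0])
print(label, score)
```

With a trained model, the made-up values would be replaced by something like `model.weights.toArray()` and `model.intercept`, assuming `model.weights` is an MLlib vector as it is in the versions I am aware of.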
[Discussion]:
Tags: python apache-spark pyspark apache-spark-mllib