【Question title】:How to get the coefficients of the best logistic regression in a spark-ml CrossValidatorModel?
【Posted】:2017-06-14 17:28:19
【Question】:

I trained a simple CrossValidatorModel using logistic regression and a spark-ml pipeline. I can predict on new data, but I would like to go beyond the black box and analyze the coefficients.

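import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler

// maxIter and alpha are defined elsewhere
val lr = new LogisticRegression().
  setFitIntercept(true).
  setMaxIter(maxIter).
  setElasticNetParam(alpha).
  setStandardization(true).
  setFamily("binomial").
  setWeightCol("weight").
  setFeaturesCol("features").
  setLabelCol("response")

// Assemble the raw feature columns into the single vector column read by lr
val assembler = new VectorAssembler().
  setInputCols(Array("feat1", "feat2")).
  setOutputCol("features")

val modelPipeline = new Pipeline().
  setStages(Array(assembler, lr))

// The default metric of BinaryClassificationEvaluator is areaUnderROC
val evaluator = new BinaryClassificationEvaluator().
  setLabelCol("response")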
I then define a parameter grid and train over it to get the best model with respect to AUC:

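import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// lambdas (the candidate regParam values), nfolds and train are defined elsewhere
val paramGrid = new ParamGridBuilder().
  addGrid(lr.regParam, lambdas).
  build()

val pipeline = new CrossValidator().
  setEstimator(modelPipeline).
  setEvaluator(evaluator).
  setEstimatorParamMaps(paramGrid).
  setNumFolds(nfolds)

val cvModel = pipeline.fit(train)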

How can I get the coefficients (betas) of the best logistic regression model?

【Comments】:

    Tags: scala apache-spark logistic-regression cross-validation apache-spark-ml


    【Answer 1】:

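    Extract the best model:

    import org.apache.spark.ml.PipelineModel

    // bestModel is typed as Model[_]; match to recover the fitted pipeline
    val bestModel = cvModel.bestModel match {
      case pm: PipelineModel => Some(pm)
      case _ => None
    }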
    

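    Find the logistic regression model:

    import org.apache.spark.ml.classification.LogisticRegressionModel

    // Keep only the stages that are logistic regression models and take the first one
    val lrm = bestModel
      .map(_.stages.collect { case lrm: LogisticRegressionModel => lrm })
      .flatMap(_.headOption)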
    

    Extract the coefficients:

    lrm.map(m => (m.intercept, m.coefficients))
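
To make the extracted coefficients easier to read, they can be paired with the feature names. The following is only a sketch under stated assumptions: `namedCoefficients` is a hypothetical helper (not part of Spark), the best model is assumed to be a `PipelineModel` containing a `VectorAssembler` and a binomial `LogisticRegressionModel` as in the question, and every assembler input column is assumed to be a plain numeric column, so that names and coefficients line up one to one.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.CrossValidatorModel

// Hypothetical helper (not a Spark API): returns (featureName, coefficient) pairs,
// or None if the best model does not have the expected pipeline layout.
def namedCoefficients(cvModel: CrossValidatorModel): Option[Seq[(String, Double)]] =
  cvModel.bestModel match {
    case pm: PipelineModel =>
      for {
        // The assembler is a plain Transformer, so it appears unchanged among the fitted stages
        assembler <- pm.stages.collectFirst { case va: VectorAssembler => va }
        model     <- pm.stages.collectFirst { case m: LogisticRegressionModel => m }
      } yield
        // Assumption: one scalar column per input name, so the order matches the coefficient vector
        assembler.getInputCols.toSeq.zip(model.coefficients.toArray)
    case _ => None
  }
```

With the pipeline from the question, `namedCoefficients(cvModel)` would be expected to yield one pair each for `feat1` and `feat2`. If any assembler input were itself a vector column, the names would no longer align with the coefficients and this sketch would not apply.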
    

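    Quick and dirty equivalent:

    // Assumes bestModel is a PipelineModel whose last stage is the logistic regression;
    // otherwise these casts throw a ClassCastException.
    val lrm: LogisticRegressionModel = cvModel
      .bestModel.asInstanceOf[PipelineModel]
      .stages
      .last.asInstanceOf[LogisticRegressionModel]
    
    (lrm.intercept, lrm.coefficients)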
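
When analyzing the best model it can also help to see how each candidate in the grid scored. The sketch below is not part of the original answer: `gridScores` is a hypothetical helper that pairs each candidate `ParamMap` with the corresponding entry of `avgMetrics`, the metric averaged across folds (AUC for the evaluator used in the question).

```scala
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.CrossValidatorModel

// Hypothetical helper (not a Spark API): each candidate parameter map
// together with its metric averaged over the cross-validation folds.
def gridScores(cvModel: CrossValidatorModel): Seq[(ParamMap, Double)] =
  cvModel.getEstimatorParamMaps.toSeq.zip(cvModel.avgMetrics)

// Hypothetical usage helper: print one line per candidate.
def printGridScores(cvModel: CrossValidatorModel): Unit =
  gridScores(cvModel).foreach { case (params, metric) =>
    println(s"$params -> $metric")
  }
```

Calling `printGridScores(cvModel)` lists one line per value in `lambdas`. Since a larger AUC is better, the candidate with the highest score should correspond to the model returned by `cvModel.bestModel`.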
    

    【Comments】:
