【Posted】: 2018-05-18 01:17:39
【Question】:
I am currently using the following code to train a classifier model:
final String iterations = "1000";
final String cutoff = "0";
InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
// try-with-resources flushes and closes the stream even if serialization fails
try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"))) {
    model.serialize(modelOut);
}
return model;
Everything works, and after every run I get the following output:
Indexing events with TwoPass using cutoff of 0
Computing event counts... done. 1474 events
Indexing... done.
Collecting events... Done indexing in 0,03 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1474
Number of Outcomes: 2
Number of Predicates: 4149
Computing model parameters...
Stats: (998/1474) 0.6770691994572592
...done.
Can someone explain what this output means? Does it say anything about accuracy?
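One part of the log can be checked by hand: the decimal on the `Stats:` line is the first number in the parentheses divided by the second, and the second number (1474) matches the event count reported earlier in the log. A plausible reading is that 998 of the 1474 training events were classified correctly, which would make this an accuracy measured on the training data itself, but that interpretation is an assumption about the trainer's logging and is not confirmed by the output. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
public class StatsLineCheck {

    // Hypothetical helper: divides the first number on the "Stats:" line by the second.
    static double ratio(int numerator, int denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Values copied from the log line "Stats: (998/1474) 0.6770691994572592".
        double r = ratio(998, 1474);
        // Prints the ratio so it can be compared with the decimal in the log.
        System.out.println(r);
    }
}
```

If that reading is right, the figure only describes how well the model fits the data it was trained on; measuring accuracy on documents held out from training would be a separate step.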
【Discussion】:
Tags: java text machine-learning opennlp categorization