[Posted]: 2017-11-03 19:34:22
[Question]:
I am trying to use the Mallet Naive Bayes classifier API. I have laid out my training and test sets as follows:
- Training: [ID] [label] [data]
- Test: [ID] [ ] [data]
Below is the code I am using:
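import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import cc.mallet.classify.Classifier;
import cc.mallet.classify.ClassifierTrainer;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.CsvIterator;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;
import cc.mallet.types.Label;
import cc.mallet.types.Labeling;

// The class declaration was not shown in the post; this name is a placeholder.
public class MalletNaiveBayesExample {
    public static void main(String[] args) throws FileNotFoundException {
        classify();
        System.out.println("Finished");
    }

    public static void classify() throws FileNotFoundException {
        // Prepare the instance transformation pipeline
        ArrayList<Pipe> pipes = new ArrayList<Pipe>();
        pipes.add(new Target2Label());
        pipes.add(new CharSequence2TokenSequence());
        pipes.add(new TokenSequence2FeatureSequence());
        pipes.add(new FeatureSequence2FeatureVector());
        SerialPipes pipe = new SerialPipes(pipes);

        // Prepare training instances; 3, 2, 1 are the (data, label, name) group indices
        InstanceList trainingInstanceList = new InstanceList(pipe);
        trainingInstanceList.addThruPipe(new CsvIterator(new FileReader("resources/training.csv"), "(\\w+)\\s+(\\w+)\\s+(.*)", 3, 2, 1));

        // Prepare test instances through the same pipe so the alphabets are shared
        InstanceList testingInstanceList = new InstanceList(pipe);
        testingInstanceList.addThruPipe(new CsvIterator(new FileReader("resources/testing.csv"), "(\\w+)\\s+(\\w+)\\s+(.*)", 3, 2, 1));

        ClassifierTrainer trainer = new NaiveBayesTrainer();
        Classifier classifier = trainer.train(trainingInstanceList);

        for (Instance testInstance : testingInstanceList) {
            // classify() returns a Classification; take its Labeling rather than casting
            Labeling labeling = classifier.classify(testInstance).getLabeling();
            Label l = labeling.getBestLabel();
            System.out.println(testInstance.getName() + " = " + l);
        }
        System.out.println("Accuracy: " + classifier.getAccuracy(testingInstanceList));
    }
}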
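Somehow this throws an error saying Line 'x' does not match regex. I understand that this is a problem with how the data is imported. But what is the actual format expected for the training and test sets when using Mallet?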
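One way to narrow this down is to run the same pattern over the input lines outside of Mallet and see which ones it rejects. The following is a minimal sketch using only java.util.regex. The class RegexLineCheck and its methods are hypothetical names, and the use of Matcher.find() is an assumption about how CsvIterator applies the pattern, so treat the result as a diagnostic hint rather than an exact reproduction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLineCheck {
    // Same pattern as the one passed to CsvIterator in the question:
    // group 1 = name (ID), group 2 = label, group 3 = data.
    static final Pattern LINE = Pattern.compile("(\\w+)\\s+(\\w+)\\s+(.*)");

    /** Returns the 1-based numbers of the lines in which the pattern is not found. */
    static List<Integer> nonMatching(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Assumption: find() mirrors how CsvIterator tests each line.
            if (!LINE.matcher(lines.get(i)).find()) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    /** Returns what the pattern captures as the label (group 2), or null if no match. */
    static String labelOf(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "id1 spam buy cheap pills now", // three fields: pattern is found
            "",                             // blank line: pattern is not found
            "id2 hello");                   // only two fields: pattern is not found
        System.out.println("Non-matching lines: " + nonMatching(sample));

        // A test line with an empty label column still matches, but the first
        // word of the data is captured as the label.
        System.out.println("Captured label: " + labelOf("t1  some text here"));
    }
}
```

With this pattern, every line needs at least an ID, a label token, and a whitespace separator before the data. A test file written as [ID] [ ] [data] therefore either fails to match (when the data is a single word) or has its first data word read as the label, so putting a placeholder label in the test rows may be a simpler layout.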
[Discussion]:
Tags: classification java mallet