【Posted】: 2013-02-16 20:17:36
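【Question】:
I want to set the -C parameter of J48 and run three feature selection algorithms that are stored in a hash table. I want to compare the three on accuracy, true positives, true negatives, F1, and so on. But when I run them, all of the feature selection algorithms return the same output... Am I doing something wrong?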
// Evaluators to compare, keyed by a display name.
Hashtable<String, ASEvaluation> search = new Hashtable<String, ASEvaluation>();
Instances training_data = new Instances(new BufferedReader(
        new FileReader("test.arff")));
// The last attribute is the class.
training_data.setClassIndex(training_data.numAttributes() - 1);
topAttributes = new int[training_data.numAttributes()];
AttributeSelectedClassifier classifier = new AttributeSelectedClassifier();
AttributeSelection attsel = new AttributeSelection();
search.put("Infogain", new InfoGainAttributeEval());
search.put("SymmetricalUncertAttribute", new SymmetricalUncertAttributeEval());
search.put("Chisquared", new ChiSquaredAttributeEval());
for (String key : search.keySet()) {
    try {
        Ranker attribute_search = new Ranker();
        J48 base = new J48();
        // Tune J48's -C (pruning confidence) from 0.1 to 0.5 in 5 steps, using 5-fold CV.
        CVParameterSelection ps = new CVParameterSelection();
        ps.setClassifier(base);
        ps.setNumFolds(5);
        ps.addCVParameter("C 0.1 0.5 5");
        ps.buildClassifier(training_data);
        System.out.println("---------------- " + search.get(key).toString() + " ----------------");
        // Wrap the tuned J48 together with the current evaluator and the Ranker search.
        classifier.setClassifier(ps);
        classifier.setEvaluator(search.get(key));
        classifier.setSearch(attribute_search);
        attsel.setEvaluator(search.get(key));
        attsel.setSearch(attribute_search);
        attsel.setInputFormat(training_data);
        Evaluation evaluation = new Evaluation(training_data);
        // Note: this cross-validates `ps` (J48 plus parameter search), not `classifier`,
        // so the evaluator configured above has no influence on the reported numbers.
        evaluation.crossValidateModel(ps, training_data, 10, new Random(1));
        System.out.println("\nevaluation ->");
        System.out.println(evaluation.toSummaryString());
        System.out.println("MAE: " + evaluation.meanAbsoluteError());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
【Discussion】:
-
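Have you checked whether the feature selection algorithms actually make a difference on your dataset? They may well all return the same feature set. Even if they don't,
J48 may simply end up using the same subset of attributes in the tree it builds.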
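To make the point in the comment above concrete, here is a minimal plain-Java sketch (no Weka involved) that scores three attributes with hand-written information gain and symmetrical uncertainty. The class name `RankingComparison`, its helper methods, and the 8-row toy data are all hypothetical and only for illustration; the formulas are the textbook definitions, not necessarily what Weka's evaluators compute internally, and chi-squared is left out to keep it short. On this toy data the two measures order the attributes differently, yet the two best attributes are the same set either way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RankingComparison {

    // Hypothetical toy data: 8 instances, 3 nominal attributes, binary class.
    static final String[] CLASS = {"y", "y", "y", "y", "n", "n", "n", "n"};
    static final String[][] ATTRIBUTES = {
        {"1", "2", "3", "4", "5", "6", "7", "8"}, // id-like, one value per row
        {"p", "p", "p", "p", "q", "q", "q", "p"}, // mostly predictive
        {"u", "v", "u", "v", "u", "v", "u", "v"}, // unrelated to the class
    };

    // Shannon entropy, in bits, of a column of nominal values.
    static double entropy(String[] column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : column) {
            counts.merge(value, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / column.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain: H(class) + H(attribute) - H(attribute, class).
    static double infoGain(String[] attribute, String[] cls) {
        String[] joint = new String[attribute.length];
        for (int i = 0; i < attribute.length; i++) {
            joint[i] = attribute[i] + "|" + cls[i];
        }
        return entropy(cls) + entropy(attribute) - entropy(joint);
    }

    // Symmetrical uncertainty: 2 * gain / (H(attribute) + H(class)).
    static double symmetricalUncertainty(String[] attribute, String[] cls) {
        double denominator = entropy(attribute) + entropy(cls);
        return denominator == 0.0 ? 0.0 : 2.0 * infoGain(attribute, cls) / denominator;
    }

    // Scores every toy attribute with one of the two measures.
    static double[] scores(boolean symmetrical) {
        double[] result = new double[ATTRIBUTES.length];
        for (int i = 0; i < ATTRIBUTES.length; i++) {
            result[i] = symmetrical
                    ? symmetricalUncertainty(ATTRIBUTES[i], CLASS)
                    : infoGain(ATTRIBUTES[i], CLASS);
        }
        return result;
    }

    // Attribute indices ordered from highest to lowest score.
    static List<Integer> rank(double[] scores) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    // The k best-scoring attribute indices as a set, ignoring their order.
    static Set<Integer> topK(double[] scores, int k) {
        return new TreeSet<>(rank(scores).subList(0, k));
    }

    public static void main(String[] args) {
        double[] ig = scores(false);
        double[] su = scores(true);
        System.out.println("InfoGain ranking:   " + rank(ig));
        System.out.println("SymmUncert ranking: " + rank(su));
        System.out.println("Same top-2 set:     " + topK(ig, 2).equals(topK(su, 2)));
    }
}
```

Two things follow for the code in the question. First, Weka's `Ranker` keeps every attribute by default, so unless you call `setNumToSelect(...)` or `setThreshold(...)` on it, each evaluator hands J48 the same attributes in a different order. Second, the posted loop passes `ps` to `crossValidateModel`, so the `AttributeSelectedClassifier` is never evaluated; passing `classifier` instead is probably what was intended.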
Tags: machine-learning classification weka text-mining