Weka sparse ARFF file: answers

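【Question title】: weka sparse arff file
【Posted】: 2014-08-12 04:12:53
【Question description】:

I am building a sparse ARFF file, but it won't load into Weka. I get an error saying that I have the wrong number of values on the @attribute class line: it expects 1 and refuses to accept 12. What am I doing wrong? My file looks like this: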

%ARFF file for questions data
%

@relation brazilquestions

@attribute att0 numeric
@attribute att1 numeric
@attribute att2 numeric
@attribute att3 numeric
%there are 469 attributes which represent my bag of words
@attribute class {Odontologia_coletiva, Periodontia, Pediatria, Estomatologia,   
Dentistica, Ortodontia, Endodontia, Cardiologia, Terapeutica, 
Terapeutica_medicamentosa, Odontopediatria, Cirurgia}


@data
{126 1, 147 1, 199 1, 56 1, 367 1, 400 1 , Estomatologia}
{155 1, 76 1, 126 1, 78 1, 341 1, 148 1, Odontopediatria}
%and then 81 more instances of data

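Any ideas about what is wrong with my syntax? I followed the example in the Witten/Frank/Hall book "Data Mining" exactly. Thanks in advance!

【Question discussion】:

Can you post the error?
The error reads: weka.core.converters.CSVLoader failed to load 'ARFF file for question data.txt'. Reason: wrong number of values. Read 12, expected 1, read Token[EOL], line 477.
Line 477 is blank, but line 476 contains the @attribute class line.
Have you tried supplying @data rows with 7 attributes?
The data represents questions, and every question has a different set of values associated with it, because each question contains different words (these are the attributes) and word frequencies. So each data row is really (word1 frequency, word2 frequency, ..., class of the question).

Tags: weka sparse-matrix arff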

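【Solution 1】:

The problem is in the data section. In a sparse row every value, including the class value, must be preceded by the index of its attribute, so you have to give the index of the class attribute.

For example, this row:

{126 1, 147 1, 199 1, 56 1, 367 1, 400 1 , Estomatologia}

should be corrected like this:

{126 1, 147 1, 199 1, 56 1, 367 1, 400 1, 470 Estomatologia}

Here 470 stands for the zero-based index of the class attribute; use whatever position it actually has in your header.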
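
In case it helps, here is a minimal Python sketch of how such rows could be generated instead of typed by hand. The function name `format_sparse_row` and its parameters are hypothetical, and it assumes the class attribute is declared last, so every word index must be smaller than `class_index`. The value 470 is simply the index used in the answer above. The sketch also sorts the indices, since as far as I know Weka's ARFF reader expects them in ascending order within a row.

```python
def format_sparse_row(word_counts, class_index, class_label):
    """Build one sparse ARFF data row as a string.

    word_counts: dict mapping zero-based attribute index -> count
    class_index: zero-based index of the class attribute (assumed to be last)
    class_label: one of the nominal labels declared for the class attribute
    """
    # Zero counts are dropped: omitted entries in a sparse row are read as 0.
    pairs = [(index, count) for index, count in word_counts.items() if count != 0]

    for index, _ in pairs:
        if not 0 <= index < class_index:
            raise ValueError(f"attribute index {index} is out of range")

    # Emit indices in ascending order.
    pairs.sort()

    cells = [f"{index} {count}" for index, count in pairs]
    # The class value needs its attribute index too, like every other value.
    cells.append(f"{class_index} {class_label}")
    return "{" + ", ".join(cells) + "}"


row = format_sparse_row(
    {126: 1, 147: 1, 199: 1, 56: 1, 367: 1, 400: 1},
    class_index=470,
    class_label="Estomatologia",
)
print(row)  # {56 1, 126 1, 147 1, 199 1, 367 1, 400 1, 470 Estomatologia}
```

Passing the class index in explicitly, rather than hard-coding it, keeps the row writer in step with the header if the vocabulary size changes.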

【Discussion】:

【Solution 2】:

Your file declares 5 attributes, but your @data rows supply 7 values, so you need to fill in the remaining values in @data. You can see this in the manual.

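【Discussion】:

My attribute list has 469 words, because that is the total number of words in my bag of words. My file is supposed to be sparse.
But in @data you need to fill in 0 for the attributes you don't use.
The list is complete; I left out the full list for brevity and noted that in the original post with "%there are 469 attributes which represent my bag of words". Would it help to see the whole file?
According to the manual: "Note that the omitted values in a sparse instance are 0, they are not "missing" values! If a value is unknown, you must explicitly represent it with a question mark (?)."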
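
To make the manual's point concrete, here is a rough Python sketch that expands a sparse row into its dense equivalent. The helper `expand_sparse_row` is hypothetical and deliberately simplified: it assumes all attributes are numeric and that values contain no commas or quotes. Omitted indices come out as 0, while an explicit `?` stays a missing value.

```python
def expand_sparse_row(row, num_attributes):
    """Expand a sparse ARFF row such as '{1 2, 3 ?}' into a dense list of strings.

    Simplified illustration only: numeric attributes, no quoted values.
    """
    body = row.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("a sparse row must be wrapped in curly braces")

    # Every attribute that is not listed in the row defaults to 0.
    dense = ["0"] * num_attributes

    inner = body[1:-1].strip()
    if inner:
        for cell in inner.split(","):
            # Each cell is '<index> <value>'; the index is zero-based.
            index_text, value = cell.split(None, 1)
            dense[int(index_text)] = value.strip()
    return dense


print(expand_sparse_row("{1 2, 3 ?}", 5))  # ['0', '2', '0', '?', '0']
```

So a sparse file does not need explicit zeros for unused attributes; only unknown values have to be written out, as `?`.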

【Solution 3】:

The attribute name for an instance's class value also needs to be listed. (See the Sparse ARFF file description.)

Your file:

@attribute myclass {Odontologia_coletiva, Periodontia, Pediatria, Estomatologia,   
Dentistica, Ortodontia, Endodontia, Cardiologia, Terapeutica, 
Terapeutica_medicamentosa, Odontopediatria, Cirurgia}

@data
{126 1, 147 1, 199 1, 56 1, 367 1, 400 1 , Estomatologia}

should be:

@data
{126 1, 147 1, 199 1, 56 1, 367 1, 400 1 , myclass Estomatologia}

【Discussion】:

【Solution 4】:

@ATTRIBUTE class string

Try using the line above instead of:

@attribute class {Odontologia_coletiva, Periodontia, Pediatria, Estomatologia,  Dentistica, Ortodontia, Endodontia, Cardiologia, Terapeutica, Terapeutica_medicamentosa, Odontopediatria, Cirurgia}

【Discussion】: