【Question title】: MAXENT model in R for Classification
【Posted】: 2014-06-25 07:58:21
【Question】:

I am trying to classify text in R using the RTextTools package.

I have already done this with SVM (the code below works fine):

matrix[[i]] <- create_matrix(trainingdata[[i]][,1], language="english",removeNumbers=FALSE, stemWords=FALSE,weighting=weightTf,minWordLength=3)
container[[i]] <- create_container(matrix[[i]],trainingdata[[i]][,2],trainSize=1:length(trainingdata[[i]][,1]),virgin=FALSE)
models[[i]] <- train_models(container[[i]], algorithms=c("SVM"))

But when I do the same thing with the MAXENT algorithm:

models[[i]] <- train_models(container[[i]], algorithms=c("MAXENT"))

it throws an error:

Error in Module(module, mustStart = TRUE) : 
  function 'setCurrentScope' not provided by package 'Rcpp'  

When I run a traceback, I get the following details:

Module(module, mustStart = TRUE) 
.getModulePointer(x) 
maximumentropy$add_samples 
maximumentropy$add_samples 
train_maxent(feature_matrix, code_vector, l1_regularizer, l2_regularizer,  
maxent(container@training_matrix, as.vector(container@training_codes),  
train_model(container, algorithm, ...) 
train_models(container[[i]], algorithms = c("MAXENT")) 

Update:

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252    LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Singapore.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tm_0.5-10        hash_3.0.1       RTextTools_1.4.2 SparseM_1.03    

loaded via a namespace (and not attached):
 [1] bitops_1.0-6       caTools_1.16       class_7.3-9        e1071_1.6-1        glmnet_1.9-5       grid_3.0.2        
 [7] ipred_0.9-3        KernSmooth_2.23-10 lattice_0.20-23    lava_1.2.4         MASS_7.3-29        Matrix_1.1-2      
[13] maxent_1.3.3.1     nnet_7.3-7         parallel_3.0.2     prodlim_1.4.2      randomForest_4.6-7 Rcpp_0.10.6       
[19] rpart_4.1-5        slam_0.1-31        splines_3.0.2      survival_2.37-7    tau_0.0-16         tools_3.0.2       
[25] tree_1.0-34

Is there any way to fix this?

【Comments】:

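  • Could you please post the output of sessionInfo()? This is most likely related to a missing or outdated version of Rcpp.
  • @Vivek I have updated the question with the session info.
  • You should file a bug report with the package maintainer.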
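
The first comment points at a missing or outdated Rcpp. A minimal sketch for checking which versions are actually installed, using only base R. The helper name `installed_version` is hypothetical and not part of any package, and the commented-out reinstall step is an assumption, not a fix confirmed in this thread:

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not installed in the current library paths.
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Packages involved in the error above.
pkgs <- c("Rcpp", "maxent", "RTextTools")
versions <- vapply(pkgs, installed_version, character(1))
print(versions)

# Assumption: if Rcpp or maxent looks stale, reinstalling both may help.
# install.packages(c("Rcpp", "maxent"))
```

The result can be compared with the versions in the sessionInfo() output above (Rcpp_0.10.6, maxent_1.3.3.1).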

Tags: r svm document-classification maxent


【Answer 1】:

Not really an answer, but I am posting it here because the sessionInfo() output is too long for a comment.

library(RTextTools)
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RTextTools_1.4.1   tau_0.0-15         glmnet_1.9-5       Matrix_1.0-14      lattice_0.20-23    maxent_1.3.3      
 [7] Rcpp_0.10.5        caTools_1.14       ipred_0.9-2        e1071_1.6-1        class_7.3-9        tm_0.5-9.1        
[13] nnet_7.3-7         tree_1.0-34        randomForest_4.6-7 SparseM_1.03      

loaded via a namespace (and not attached):
 [1] bitops_1.0-6       grid_3.0.2         KernSmooth_2.23-10 MASS_7.3-29        parallel_3.0.2     prodlim_1.3.7     
 [7] rpart_4.1-3        slam_0.1-30        splines_3.0.2      survival_2.37-4    tools_3.0.2 

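In my case, all the required packages are listed under other attached packages, whereas in yours they appear under loaded via a namespace (and not attached).

In the second case, R itself can access the packages, but the user cannot call their functions directly (that is, not without a pkg:: prefix). For more explanation, see In R, what does "loaded via a namespace (and not attached)" mean?

I don't know why these packages are not attached in your case, but as a workaround you can try this: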
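
The difference between an attached package and one that is only loaded via a namespace can be checked with base R. A minimal sketch, assuming nothing beyond a standard R installation; the helper name `attach_status` is hypothetical:

```r
# Hypothetical helper: classify how a package is visible in this session.
attach_status <- function(pkg) {
  if (paste0("package:", pkg) %in% search()) {
    "attached"        # on the search path, functions callable by bare name
  } else if (pkg %in% loadedNamespaces()) {
    "loaded only"     # namespace loaded, but not on the search path
  } else {
    "not loaded"
  }
}

# Load a namespace without attaching it (tools ships with R).
invisible(loadNamespace("tools"))
print(attach_status("tools"))

# Functions of a loaded-only package are still reachable with the :: prefix.
print(tools::file_ext("report.txt"))
```

Calling `attach_status("Rcpp")` after `library(RTextTools)` would show which of the two sessionInfo() sections Rcpp falls into.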

#grab list of package names required for RTextTools
# not_attached_list<-dput(names(sessionInfo()$otherPkgs))
#c("RTextTools", "tau", "glmnet", "Matrix", "lattice", "maxent", 
#"Rcpp", "caTools", "ipred", "e1071", "class", "tm", "nnet", "tree", 
#"randomForest", "SparseM")

not_attached_list<-c("RTextTools", "tau", "glmnet", "Matrix", "lattice", "maxent", 
"Rcpp", "caTools", "ipred", "e1071", "class", "tm", "nnet", "tree", 
"randomForest", "SparseM")

#Load (attach) the packages manually; returns TRUE/FALSE per package
sapply(not_attached_list, require, character.only=TRUE)

#Check in sessionInfo if they have been attached now under 'other attached packages'
sessionInfo()

Let us know whether this works.

【Comments】:

  • If Rcpp is supposed to be available, the package authors should list it in the Imports section of the DESCRIPTION file (which they currently do not).
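
The comment above refers to the dependency fields of a package's DESCRIPTION file. A minimal sketch for inspecting them with base R; the helper name `declared_deps` is hypothetical, and `stats` is used only because it ships with every R installation (substitute `"maxent"` if it is installed):

```r
# Hypothetical helper: return the dependency fields an installed package
# declares in its DESCRIPTION file (NA where a field is absent).
declared_deps <- function(pkg) {
  fields <- c("Depends", "Imports", "LinkingTo")
  desc <- utils::packageDescription(pkg, fields = fields)
  vapply(fields, function(f) as.character(desc[[f]]), character(1))
}

# The package must be installed for this lookup to work.
print(declared_deps("stats"))
```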