【Question title】: Is the number of predicted values from the test set correct for SVM?
【Posted】: 2020-08-16 09:10:51
【Question description】:

So I have a dataset with nrow = 218, and I am working through [this][1] example ([git here][2]). I have split my data into train (nrow = 163; ~75%) and test (nrow = 55; ~25%).

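When I get to "pred <- predict(model_svm, test)", pred ends up with 163 values rather than the 55 I expect from the test set. Is that correct?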

Some fake data:

featuredata_all <- matrix(rexp(218 * 23, rate=.1), ncol=23) # 218 rows x 23 columns

Part of the code:


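library(data.table)
library(e1071)  # svm()
library(caret)  # confusionMatrix()
library(pROC)   # multiclass.roc(), plot.roc(), lines.roc()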

pt1 <- scale(featuredata_all[,1:22],center=T)
pt2 <- as.character(featuredata_all[,23]) #since the label is a string I kept it separate 

ft<-cbind.data.frame(pt1,pt2) #to preserve the label in text
colnames(ft)[23]<- "Cluster"

## 75% of the sample size
smp_size <- floor(0.75 * nrow(ft))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(ft)), size = smp_size)

train <- ft[train_ind,1:22] #163 reads
test  <- ft[-train_ind,1:22] #55 reads

trainlabel<- ft[train_ind,23] #163 labels
testlabel <- ft[-train_ind,23] #55 labels

#ftID <- cbind(ft, seq.int(nrow(ft))
#colnames(ftID)[24]<- "RowID"
#ftIDtestrows <- ftID[-train_ind,24]

#Support Vector Machine for classification
model_svm <- svm(trainlabel ~ as.matrix(train) )
summary(model_svm)

#Use the predictions on the data
# ---------------- This is where the question is ---------------- #
pred <- predict(model_svm, test)
# ----------------------------------------------------------------#

print(confusionMatrix(pred[1:nrow(test)],testlabel))

#ROC and AUC curves and their plots
#-----------------also------------->  was trying to get this to work as pred doesn't naturally end up with the expected 55 nrow from test set
roc.multi<-multiclass.roc(testlabel, as.numeric(pred[1:55])) 
rs <- roc.multi[['rocs']]
plot.roc(rs[[1]])
sapply(2:length(rs),function(i) lines.roc(rs[[i]],col=i))


 [1]: https://iamnagdev.com/2018/01/02/sound-analytics-in-r-for-animal-sound-classification-using-vector-machine/
 [2]: https://github.com/nagdevAmruthnath

【Discussion】:

    Tags: r testing svm predict


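    【Solution 1】:

    OK, I realized I was training the model on my training dataset and then testing it on my test set. I need to test it first by re-predicting the training set, and only then feed it the test set.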

     summary(model_svm)
    #Use the predictions on the data
    pred <- predict(model_svm, train)
    
    model_svm <- svm(trainlabel ~ as.matrix(test) )
     summary(model_svm)
    #Use the predictions on the data
    pred <- predict(model_svm, test)
    

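    【Discussion】:

      【Solution 2】:

      I was actually able to get a result with 55 rows using the code below. The changes I made: for pt2 I used as.factor instead of as.character, and instead of pred <- predict(model_svm, test) I used pred <- predict(model_svm, as.matrix(test)).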

      # load libraries
      library(data.table)
      library(e1071)
      
      # create dataset with random values
      featuredata_all <- matrix(rnorm(23*218), ncol=23)
      
      # scale features
      pt1 <- scale(featuredata_all[,1:22],center=T)
      
      # make column as factor
      pt2 <- as.factor(ifelse(featuredata_all[,23]>0, 0,1)) #since the label is a string I kept it separate 
      
      # join data (optional)
      ft<-cbind.data.frame(pt1,pt2) #to preserve the label in text
      colnames(ft)[23]<- "Cluster"
      
      ## 75% of the sample size
      smp_size <- floor(0.75 * nrow(ft))
      
      ## set the seed to make your partition reproducible
      set.seed(123)
      train_ind <- sample(seq_len(nrow(ft)), size = smp_size)
      
      # split features into train and test
      train <- ft[train_ind,1:22] #163 reads
      test  <- ft[-train_ind,1:22] #55 reads
      dim(train)
      # [1] 163  22
      
      dim(test)
      # [1] 55  22
      
      # split labels into train and test
      trainlabel<- ft[train_ind,23] #163 labels
      testlabel <- ft[-train_ind,23] #55 labels
      length(trainlabel)
      # [1] 163
      
      length(testlabel)
      # [1] 55
      
      #Support Vector Machine for classification
      model_svm <- svm(x= as.matrix(train), y = trainlabel, probability = T)
      summary(model_svm)
      
      # Call:
      #   svm.default(x = as.matrix(train), y = trainlabel, probability = T)
      # 
      # 
      # Parameters:
      #   SVM-Type:  C-classification 
      # SVM-Kernel:  radial 
      # cost:  1 
      # 
      # Number of Support Vectors:  159
      # 
      # ( 78 81 )
      # 
      # 
      # Number of Classes:  2 
      # 
      # Levels: 
      #   0 1
      
      #Use the predictions on the data
      # ---------------- This is where the question is ---------------- #
      pred <- predict(model_svm, as.matrix(test))
      length(pred)
      # [1] 55
      # ----------------------------------------------------------------#
      
      print(table(pred[1:nrow(test)],testlabel))
      #    testlabel
      #    0  1
      # 0 14 14
      # 1 11 16
      

      Hope this helps.

      【Discussion】:
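The code above also differs from the question in a way the text does not mention: it calls svm() through the x/y interface rather than through a formula. The sketch below is a rough comparison of the two, not a definitive diagnosis. The data and the names `x`, `y`, `model_formula` and `model_xy` are made up for illustration. The assumption is that a formula such as `trainlabel ~ as.matrix(train)` refers to the object `train` itself rather than to columns of `newdata`, so predict() rebuilds its inputs from `train` whatever is passed in.

```r
library(e1071)

set.seed(123)
# Hypothetical stand-in data: 218 rows, 22 numeric features, a two-level label
x <- matrix(rnorm(218 * 22), ncol = 22)
y <- factor(sample(c("a", "b"), 218, replace = TRUE))

train_ind  <- sample(seq_len(nrow(x)), size = floor(0.75 * nrow(x)))
train      <- x[train_ind, ]   # 163 rows
test       <- x[-train_ind, ]  # 55 rows
trainlabel <- y[train_ind]

# Formula interface as written in the question: the right-hand side names
# the object `train`, not a column that could be looked up in newdata
model_formula <- svm(trainlabel ~ as.matrix(train))
pred_formula  <- predict(model_formula, test)

# x/y interface as used in this answer: newdata is used directly
model_xy <- svm(x = train, y = trainlabel)
pred_xy  <- predict(model_xy, test)

# Compare the number of predictions with the number of test rows
length(pred_formula)  # the thread reports 163 for this kind of call
length(pred_xy)
nrow(test)
```

If the formula interface is preferred, fitting on a data frame that holds the label as a column (for example `svm(Cluster ~ ., data = ...)`) and passing a data frame with the same feature column names to predict() should avoid the problem as well.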

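      • So I updated the factor part, but the as.matrix bit for pred doesn't seem to do anything different. pred still has 163 rows when using the test set of 55. Not sure if that's correct.
      • Also, @not_dave, when I try this on the animal sounds from the site you pointed to, as well as on some real data, the ROC curves I get look like this: imgur.com/a/XPoj3mf, which I assume is not what's wanted (I'm looking for curves with a clear shoulder).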
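On the ROC shape, one possible reason is that the question passes hard class predictions (`as.numeric(pred)`) to the ROC function, which leaves very few thresholds to sweep. A minimal sketch, assuming a two-class problem and made-up data (the thread itself is multiclass), that feeds class probabilities to pROC instead. The names `x`, `y`, `prob` and `roc_obj` are illustrative only.

```r
library(e1071)
library(pROC)

set.seed(123)
# Hypothetical data: the label depends loosely on the first feature
x <- matrix(rnorm(218 * 22), ncol = 22)
y <- factor(ifelse(x[, 1] + rnorm(218) > 0, "pos", "neg"))

train_ind <- sample(seq_len(nrow(x)), size = floor(0.75 * nrow(x)))

# probability = TRUE is needed both when fitting and when predicting
model <- svm(x = x[train_ind, ], y = y[train_ind], probability = TRUE)
pred  <- predict(model, x[-train_ind, ], probability = TRUE)

# Matrix of class probabilities, one column per class level
prob <- attr(pred, "probabilities")

# ROC on the continuous score for the "pos" class
roc_obj <- roc(response = y[-train_ind], predictor = prob[, "pos"],
               levels = c("neg", "pos"), direction = "<")
plot(roc_obj)
auc(roc_obj)
```

With a continuous score the curve has one step per distinct probability value, so it can show a shoulder when the classes are separable. Whether it does on the real data depends on how well the model separates them.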