【Posted】: 2018-08-17 22:53:33
【Question】:
I am reading about the caret package and came across this usage:
createDataPartition(y, times = 1, p = 0.5, list = TRUE, groups = min(5,
length(y)))
I would like to understand what the "times" argument does. If I use this code:
inTrain2 <- createDataPartition(y = MyData$Class, times = 3, p = .70, list = FALSE)
training2 <- MyData[inTrain2, ]      # ≈ 67% (train)
testing2 <- MyData[-inTrain2[2], ]   # ≈ 33% (test)
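will it cause an overfitting problem? Or is it intended for some kind of (unbiased) resampling method?
Thanks a lot.
Edit:
To add some detail: with this code,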
inTrain2 <- createDataPartition(y = MyData$Class, times = 1, p = .70, list = FALSE)
training2 <- MyData[inTrain2, ]    # 142 samples, ≈ 67% (train)
testing2 <- MyData[-inTrain2, ]    # 69 samples, ≈ 33% (test)
I get 211 samples in total and ≈ 52% accuracy, whereas with this code,
inTrain2 <- createDataPartition(y = MyData$Class, times = 3, p = .70, list = FALSE)
training2 <- MyData[inTrain2, ]      # 426 samples, ≈ 67% (train)
testing2 <- MyData[-inTrain2[2], ]   # 210 samples, ≈ 33% (test)
I get 636 samples in total and ≈ 98% accuracy.
Thanks.
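For reference, here is a minimal sketch of what the two indexing expressions above select. It uses a hypothetical 211-row stand-in for MyData: the column x and the random Class labels are made up for illustration, and the caret package is assumed to be installed. With list = FALSE and times = 3, createDataPartition returns a matrix with one column of row indices per resample, so inTrain2[2] is a single index rather than the second resample.

```r
library(caret)  # assumption: the caret package is installed

set.seed(42)
# Hypothetical stand-in for MyData: 211 rows, two-class outcome
MyData <- data.frame(
  x = rnorm(211),
  Class = factor(sample(c("A", "B"), 211, replace = TRUE))
)

idx <- createDataPartition(y = MyData$Class, times = 3, p = .70, list = FALSE)

dim(idx)   # rows = training-set size, columns = one per resample (3 here)
idx[2]     # a single row index (second element of the first column)

# Indexing with the whole matrix stacks all three resamples into one data frame
stacked <- MyData[idx, ]
nrow(stacked)        # 3 * nrow(idx), with many rows repeated

# Removing one row leaves almost the entire data set as the "test" set
almost_all <- MyData[-idx[2], ]
nrow(almost_all)     # 210

# Using a single column keeps train and test disjoint for that resample
train1 <- MyData[idx[, 1], ]
test1  <- MyData[-idx[, 1], ]
overlap <- length(intersect(rownames(train1), rownames(test1)))
overlap
```

Under these assumptions, nearly every row of the 210-row test set also appears in the stacked training set, which would be consistent with the jump in accuracy reported above.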
【Comments】:
Tags: r machine-learning r-caret resampling data-partitioning