【问题标题】:xgb.cv with no folds and return the results based on a split of the dataxgb.cv 没有折叠并根据数据的拆分返回结果
【发布时间】:2019-12-16 18:35:58
【问题描述】:

我有一些看起来像这样的数据:

# A tibble: 50 x 28
   sanchinarro date       holiday weekday weekend workday_on_holi… weekend_on_holi… protocol_active
         <dbl> <date>       <dbl>   <dbl>   <dbl>            <dbl>            <dbl>           <dbl>
 1      -1.01  2010-01-01       1       1       0                1                0               0
 2       0.832 2010-01-02       0       0       1                0                0               0
 3       1.29  2010-01-03       0       0       1                0                0               0
 4       1.04  2010-01-04       0       1       0                0                0               0
 5       0.526 2010-01-05       0       1       0                0                0               0
 6      -0.292 2010-01-06       1       1       0                1                0               0
 7      -0.394 2010-01-07       0       1       0                0                0               0
 8      -0.547 2010-01-08       0       1       0                0                0               0
 9      -0.139 2010-01-09       0       0       1                0                0               0
10       0.628 2010-01-10       0       0       1                0                0               0

我想在前 40 行运行 xgb.cv 并在最后 10 行验证它。

我尝试以下方法:

library(xgboost)
library(dplyr)
X_Val <- ddd %>% select(-c(1:2))
Y_Val <- ddd %>% select(c(1)) %>% pull()

dVal <- xgb.DMatrix(data = as.matrix(X_Val), label = as.numeric(Y_Val))
xgb.cv(data = dVal, nround = 30, folds = NA, params = list(eta = 0.1, max_depth = 5))

这给了我这个错误:

xgb.cv 中的错误(数据 = dVal,nround = 30,折叠 = NA,eta = 0.1, max_depth = 5) : 'folds' 必须是包含 2 个或更多元素的列表 是每个 CV 折叠的索引向量

如何在前 40 行运行一个简单的 xgb.cv 并在最后 10 行进行测试。

我最终想使用参数列表应用网格搜索并将结果保存在列表中。由于我正在处理时间序列数据,我不想混淆折叠,我只想要一个简单的火车和 40:10 的样本内测试。

数据:

        ddd <- structure(list(sanchinarro = c(-1.00742964973274, 0.832453587904369, 
1.29242439731365, 1.03688505875294, 0.525806381631517, -0.291919501762755, 
-0.394135237187039, -0.547458840323464, -0.138595898626329, 0.628022117055801, 
1.19020866188936, 1.5990716035865, 1.5990716035865, -0.70078244345989, 
2.11015028070792, 1.95682667757149, 0.985777191040795, 0.883561455616511, 
0.985777191040795, 0.270267043070807, 2.51901322240505, 2.41679748698077, 
0.372482778495091, -0.291919501762755, -0.905213914308458, -0.905213914308458, 
-0.649674575747748, 1.2413165296015, 1.54796373587436, -0.70078244345989, 
-0.905213914308458, -0.0363801632020448, 1.54796373587436, 2.00793454528363, 
1.54796373587436, -0.445243104899181, -0.445243104899181, 1.03688505875294, 
0.628022117055801, -0.496350972611323, 0.168051307646523, -0.649674575747748, 
0.0658355722222391, -1.00742964973274, -0.291919501762755, 0.0147277045100972, 
0.168051307646523, -0.189703766338471, 0.219159175358665, 0.679129984767943
), date = structure(c(14610, 14611, 14612, 14613, 14614, 14615, 
14616, 14617, 14618, 14619, 14620, 14621, 14622, 14623, 14624, 
14625, 14626, 14627, 14628, 14629, 14630, 14631, 14632, 14633, 
14634, 14635, 14636, 14637, 14638, 14639, 14640, 14641, 14642, 
14643, 14644, 14645, 14646, 14647, 14648, 14649, 14650, 14651, 
14652, 14653, 14654, 14655, 14656, 14657, 14658, 14659), class = "Date"), 
    holiday = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(1, 
    0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 
    1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 
    1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1), weekend = c(0, 1, 1, 0, 
    0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 
    1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 
    0, 1, 1, 0, 0, 0, 0, 0), workday_on_holiday = c(1, 0, 0, 
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(0, 
    1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 
    0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 
    1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1), text_clear = c(0, 0, 0, 
    0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 
    0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 1, 1), text_fog = c(0, 1, 0, 1, 1, 0, 
    0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
    0, 0, 1, 0, 1, 0), text_partly_cloudy = c(0, 1, 0, 0, 0, 
    1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), text_partly_sunny = c(1, 1, 1, 1, 1, 
    0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 
    0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 
    0, 0, 0, 0, 1, 1, 1), text_passing_clouds = c(1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 
    0, 0, 0, 0, 0, 1, 1, 1), text_scattered_clouds = c(1, 1, 
    0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 
    0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1), text_sunny = c(0, 0, 0, 0, 
    0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 
    0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 1), month_1 = c(1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), month_2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1), month_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_9 = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-50L)) 

编辑:列出数据:

最终数据以列表的形式出现。

datalst <- list(structure(list(sanchinarro = c(-1.00742964973274, 0.832453587904369, 
1.29242439731365, 1.03688505875294, 0.525806381631517, -0.291919501762755, 
-0.394135237187039, -0.547458840323464, -0.138595898626329, 0.628022117055801, 
1.19020866188936, 1.5990716035865, 1.5990716035865, -0.70078244345989
), date = structure(c(14610, 14611, 14612, 14613, 14614, 14615, 
14616, 14617, 14618, 14619, 14620, 14621, 14622, 14623), class = "Date"), 
    holiday = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(1, 
    0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1), weekend = c(0, 1, 
    1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0), workday_on_holiday = c(1, 
    0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(0, 
    1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0), text_clear = c(0, 
    0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0), text_fog = c(0, 1, 
    0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0), text_partly_cloudy = c(0, 
    1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0), text_partly_sunny = c(1, 
    1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1), text_passing_clouds = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), text_scattered_clouds = c(1, 
    1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1), text_sunny = c(0, 
    0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0), month_1 = c(1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), month_2 = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_3 = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), month_9 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-14L)), structure(list(sanchinarro = c(0.832179838392013, 1.29225734336885, 
1.03665872949283, 0.525461501740789, -0.292454062662475, -0.394693508212883, 
-0.548052676538495, -0.139094894336863, 0.627700947291197, 1.19001789781844, 
1.59897568002007, 1.59897568002007, -0.701411844864107, 2.11017290777211
), date = structure(c(14611, 14612, 14613, 14614, 14615, 14616, 
14617, 14618, 14619, 14620, 14621, 14622, 14623, 14624), class = "Date"), 
    holiday = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(0, 
    0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1), weekend = c(1, 1, 
    0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0), workday_on_holiday = c(0, 
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(1, 
    0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_clear = c(0, 
    0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1), text_fog = c(1, 0, 
    1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0), text_partly_cloudy = c(1, 
    0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0), text_partly_sunny = c(1, 
    1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0), text_passing_clouds = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), text_scattered_clouds = c(1, 
    0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0), text_sunny = c(0, 
    0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1), month_1 = c(1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), month_2 = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_3 = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), month_9 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-14L)), structure(list(sanchinarro = c(1.29293502084952, 1.03729933727253, 
0.526027970118536, -0.292006217327851, -0.394260490758649, -0.547641900904846, 
-0.138624807181653, 0.628282243549334, 1.19068074741873, 1.59969784114192, 
1.59969784114192, -0.701023311051044, 2.11096920829591, 1.95758779814971
), date = structure(c(14612, 14613, 14614, 14615, 14616, 14617, 
14618, 14619, 14620, 14621, 14622, 14623, 14624, 14625), class = "Date"), 
    holiday = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(0, 
    1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0), weekend = c(1, 0, 
    0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1), workday_on_holiday = c(0, 
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(0, 
    1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), text_clear = c(0, 
    0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0), text_fog = c(0, 1, 
    1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0), text_partly_cloudy = c(0, 
    0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0), text_partly_sunny = c(1, 
    1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1), text_passing_clouds = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), text_scattered_clouds = c(0, 
    0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0), text_sunny = c(0, 
    0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0), month_1 = c(1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), month_2 = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_3 = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), month_9 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-14L)))

编辑:

认为这给了我我所追求的 - 我需要双重/三次检查它。 (如果您发现任何错误,请告诉我)

splt <- 0.80 * nrow(ddd)
ddd[c(1:splt), "id"] = 1
ddd$id[is.na(ddd$id)] = 2


fold.ids <- unique(ddd$id)

custom.folds <- vector("list", length(fold.ids))
i <- 1
for( id in fold.ids){
  custom.folds[[i]] <- which( ddd$id %in% id )
  i <- i+1
}

custom.folds

cv <- xgb.cv(params = list(eta = 0.1, max_depth = 5), dVal, nround = 10, folds = custom.folds, prediction = TRUE)
cv$evaluation_log

我现在需要找到一种方法将其应用于“新”添加数据中的所有 3 个列表。

【问题讨论】:

    标签: r xgboost


    【解决方案1】:

    首先,您应该将数据拆分为 dtrain(前 40 行)和 dval(最后 10 行)。其次,你需要的是xgb.train,而不是xgb.cv

    所以,你的代码应该修改成这样:

    library(xgboost)
    library(dplyr)
    # you code regarding ddd
    X <- ddd %>% select(-c(1:2))
    Y <- ddd %>% select(c(1)) %>% pull()
    dtrain <- xgb.DMatrix(data = as.matrix(X[1:40,]), label = as.numeric(Y[1:40,]))
    dval   <- xgb.DMatrix(data = as.matrix(X[41:50,]), label = as.numeric(Y[41:50,]))
    watchlist <- list(train=dtrain, val=dval)
    model <- xgb.train(data=dtrain, watchlist=watchlist, nround = 30, eta = 0.1, max_depth = 5)
    

    恕我直言,只有 40+10 行,因此稀疏特征没有希望使用 XGBoost 获得好的结果。

    【讨论】:

    • 谢谢!我有比这里显示的 40 + 10 更多的数据。我如何通过网格搜索使用xgb.train。我的目标是找到最佳参数,但我不知道如何使用 xgb.train 来做到这一点?我还对原始帖子进行了小修改。
    • 在时间序列的情况下使用基于xgb.cv的网格搜索没有意义 - 结果将被过度拟合,因为大多数有效的折叠将在内部创建,而不是在数据集的边缘.尝试手动优化,如果结果乐观,使用贝叶斯优化,包rBayesianOptimization
    • 谢谢,一旦我可以让代码工作,我将真正使用贝叶斯优化。我有 1000 个不同的时间序列数据集。每个数据集包含 365 * 5 年。每个数据集移动 1 天。假设数据从 2000 年 1 月 1 日开始,那么第一个数据集将在 2005 年 12 月 31 日结束。第二个数据集从 2000 年 1 月 2 日开始,到 2006 年 1 月 1 日结束,第三个数据集从 2000 年 1 月 3 日开始,到 2006 年 1 月 2 日结束,等等。我无法运行所有数据集上的 CV,所以我随机抽样它们以应用 CV,最终我想平均每个随机生成的交叉验证模型的结果
    • 我希望这会给我一个数据样本的“最佳”参数。这将作为我的 KFold 交叉验证,同时保留时间序列结构。但是,不确定这是否有意义。
    • 关于您使用 xgb.cv 的自定义折叠的实际版本,它不会像您希望的那样工作。它将创建两个运行,第一个具有 80% 训练和 20% 有效,第二个具有 20% 训练和 80% 有效(原文如此!)。
    猜你喜欢
    • 2012-09-23
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2018-02-19
    • 2020-11-20
    • 1970-01-01
    • 2015-01-24
    • 1970-01-01
    相关资源
    最近更新 更多