Multiple maps to mutate a data frame and add a column

[Question title]: Multiple maps to mutate a data frame and add a column
[Posted]: 2019-10-19 15:21:43
[Question description]:

I am working with some data that looks like this:

# A tibble: 2 x 3
  splits          id        inner_resamples 
  <named list>    <chr>     <named list>    
1 <split [20/20]> Resample1 <tibble [6 x 2]>
2 <split [20/20]> Resample2 <tibble [6 x 2]>

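What I want to do is map over the inner_resamples column, and then map again over the splits column inside each element of inner_resamples. In other words, for each list I want to map a second time.

The data inside each split can be extracted with the analysis function from the rsample package: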

map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x)) %>% tail()

What I would like to do is map over each of these outputs and add a new column to the data:

    > map(cv_rolling$inner_resamples$`1`$splits, ~ analysis(.x)) %>% tail()
[[1]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-13 CAT1   796.     1
2 2016-12-14 CAT1   797.     0

[[2]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-15 CAT1   798.     1
2 2016-12-16 CAT1   791.     0

[[3]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-19 CAT1   794.     1
2 2016-12-20 CAT1   796.     0

[[4]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-21 CAT1   795.     0
2 2016-12-22 CAT1   791.     0

[[5]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-23 CAT1   790.     0
2 2016-12-27 CAT1   792.     1

[[6]]
# A tibble: 2 x 4
  time       ID    Value   out
  <date>     <chr> <dbl> <dbl>
1 2016-12-28 CAT1   785.     0
2 2016-12-29 CAT1   783.     0

The expected output would be (shown for one of the outputs):

[[6]]
# A tibble: 2 x 5
  time       ID    Value   out NEWCOL
  <date>     <chr> <dbl> <dbl>  <dbl>
1 2016-12-28 CAT1   785.     0   8677
2 2016-12-29 CAT1   783.     0   8757

However, I also want to do this for every N in the data:

map(cv_rolling$inner_resamples$`N`$splits, ~ analysis(.x)) %>% tail()

Here each N can be accessed with:

cv_rolling$inner_resamples[[1]]
cv_rolling$inner_resamples[[2]]
cv_rolling$inner_resamples[[N]]
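
Iterating over every N amounts to one map over inner_resamples with a second map over each element's splits. The following is a minimal sketch of that idea, not the asker's actual setup: `toy`, `toy_cv`, and `all_analysis` are hypothetical names, and the small data frame is a made-up stand-in for the real data.

```r
library(rsample)
library(purrr)

# Hypothetical stand-in data: two groups with ten rows each.
toy <- data.frame(
  ID    = rep(c("CAT1", "CAT2"), each = 10),
  Value = c(seq(700, 790, by = 10), seq(50, 59, by = 1))
)

# Same resampling layout as in the question, applied to the toy data.
toy_cv <- nested_cv(
  toy,
  outside = group_vfold_cv(group = "ID"),
  inside  = rolling_origin(
    initial    = 2,
    assess     = 1,
    cumulative = FALSE,
    skip       = 1
  )
)

# The outer map walks over inner_resamples[[1]], [[2]], ..., [[N]];
# the inner map applies analysis() to every split in that element.
all_analysis <- map(toy_cv$inner_resamples, ~ map(.x$splits, analysis))

# One element per outer resample, each holding a list of data frames.
length(all_analysis)
all_analysis[[1]][[1]]
```

The result is a list of lists of data frames, so `all_analysis[[N]][[k]]` corresponds to the k-th analysis set of the N-th inner resample.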

New data:

df <- structure(list(time = structure(c(17136, 17137, 17140, 17141, 
17142, 17143, 17144, 17147, 17148, 17149, 17150, 17151, 17154, 
17155, 17156, 17157, 17158, 17162, 17163, 17164, 17165, 17136, 
17137, 17140, 17141, 17142, 17143, 17144, 17147, 17148, 17149, 
17150, 17151, 17154, 17155, 17156, 17157, 17158, 17162, 17163, 
17164, 17165), class = "Date"), ID = c("CAT1", "CAT1", "CAT1", 
"CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", 
"CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT1", 
"CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", 
"CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", 
"CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2", "CAT2"), Value = c(747.919983, 
750.5, 762.52002, 759.109985, 771.190002, 776.419983, 789.289978, 
789.27002, 796.099976, 797.070007, 797.849976, 790.799988, 794.200012, 
796.419983, 794.559998, 791.26001, 789.909973, 791.549988, 785.049988, 
782.789978, 771.820007, 56.283112, 56.330643, 57.252861, 56.996159, 
58.346195, 58.003925, 58.916634, 59.106773, 59.876858, 59.591648, 
59.496574, 59.230362, 60.485325, 60.409275, 60.409275, 60.418777, 
60.124058, 60.162071, 59.886375, 59.800812, 59.078251), out = c(0, 
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0)), row.names = c(NA, 
-42L), index_quo = ~date, index_time_zone = "UTC", class = c("tbl_time", 
"tbl_df", "tbl", "data.frame"))

You also need to run:

library(rsample)
library(purrr)
library(tibbletime)

periods_train <- 2
periods_test  <- 1
skip_span     <- 1

cv_rolling <- nested_cv(df, 
                        outside = group_vfold_cv(group = "ID"),
                        inside = rolling_origin(
                          initial    = periods_train,
                          assess     = periods_test,
                          cumulative = FALSE,
                          skip       = skip_span))

After which you can run:

map(cv_rolling$inner_resamples$`2`$splits, ~ analysis(.x))

I am trying to modify these data frames and add the new column to each of them.

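[Question discussion]:

Have you looked at purrr::map_depth?
No, I'll do that now :)
I'm not sure whether it will work for this data, because at depth = 3 I have cv_rolling$inner_resamples[[1]], which can change to cv_rolling$inner_resamples[[2]] ... cv_rolling$inner_resamples[[N]]. I'm still reading up on the function, though.
I'm trying to read in your data, but I get 'Error in is_null(vars) : argument "data" is missing, with no default'
I'll upload better data now. I edited the original post quite a bit, because what I said originally was correct but I was accessing the "wrong" data. I'll get the latest data now.

Tags: r purrr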

[Solution 1]:

I'm not sure what kind of function you want to apply to generate NEWCOL, but here is a toy example that divides the original Value column by 10:

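library(dplyr)  # mutate(); map() and map_depth() come from purrr

cv_rolling %>% 
  # data: the list of splits from each inner resample
  mutate(data  = map(inner_resamples, "splits"),
         # data2: analysis() applied to every split, two levels down
         data2 = map_depth(data, 2, rsample::analysis),
         data3 = map_depth(data2, 2, ~ mutate(.x, NEWCOL = Value/10)))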

If the mutate call is fairly complex, you can put it in a helper function:

mutate_helper <- function(df) {
  mutate(df, NEWCOL = Value/10)
}

cv_rolling %>% 
  mutate(data  = map(inner_resamples, "splits"),
         data2 = map_depth(data, 2, rsample::analysis),
         data3 = map_depth(data2, 2, mutate_helper))
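
To see what map_depth(x, 2, f) does here, it can help to compare it with two explicitly nested map calls on a plain list. This is only an illustrative sketch: `nested` and `add_newcol` are hypothetical names, and the small data frames stand in for the output of analysis().

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for a list of lists of analysis data frames.
nested <- list(
  a = list(data.frame(Value = c(10, 20)), data.frame(Value = c(30, 40))),
  b = list(data.frame(Value = c(50, 60)))
)

# Same toy transformation as above: divide Value by 10.
add_newcol <- function(df) {
  mutate(df, NEWCOL = Value / 10)
}

# Apply the helper two levels down with map_depth() ...
via_map_depth <- map_depth(nested, 2, add_newcol)

# ... and with two nested map() calls, which visit the same elements.
via_nested_map <- map(nested, ~ map(.x, add_newcol))

# Compare the two results and inspect one transformed data frame.
all.equal(via_map_depth, via_nested_map)
via_map_depth$a[[1]]
```

Depth 2 is the right level in the answer because inner_resamples is a list (level 1) whose elements each hold a list of splits (level 2).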

[Discussion]: