【Question title】: For loop for filling specific cells into data frame (large dataset)
【Posted】: 2021-10-12 16:11:28
【Question】:

version  R version 4.0.5 (2021-03-31)
os       Windows 10 x64
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  English_United Kingdom.1252
ctype    English_United Kingdom.1252
tz       Europe/London
date     2021-08-08

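Hi everyone,

I am trying to compute some variables in a data frame imported from Excel, but I am missing something that I cannot seem to find. I suppose this is a very specific case, because searching YouTube tutorials on "for loop", StackOverflow posts, and Google in general has not helped so far. So I am posting here as a last resort, hoping more experienced programmers can point me to a solution.

I have a dataset with 3192 rows and many columns: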

> summary(twente_1)
 Player.Number   Playing.Position       Date                Week      Training.type      Total.Distance 
 Min.   : 1.00   Length:3192        Length:3192        Min.   : 1.0   Length:3192        Min.   :    0  
 1st Qu.: 3.75   Class :character   Class :character   1st Qu.:10.0   Class :character   1st Qu.:    0  
 Median : 7.00   Mode  :character   Mode  :character   Median :19.5   Mode  :character   Median : 3669  
 Mean   : 9.25                                         Mean   :19.5                      Mean   : 3757  
 3rd Qu.:15.50                                         3rd Qu.:29.0                      3rd Qu.: 5500  
 Max.   :19.00                                         Max.   :38.0                      Max.   :19226  
 NA's   :3180                                          NA's   :2736                                     
      HSR             SD         High.Intensity.Actions..acc.dec.  Player.Load     SUM.Weekly.Total.Distance
 Min.   :   0   Min.   :  0.00   Min.   : 0.00                    Min.   :   0.0   Min.   :    0            
 1st Qu.:   0   1st Qu.:  0.00   1st Qu.: 0.00                    1st Qu.:   0.0   1st Qu.:21304            
 Median :  22   Median :  0.00   Median :12.00                    Median :  89.0   Median :27969            
 Mean   : 123   Mean   : 23.21   Mean   :15.18                    Mean   : 191.2   Mean   :26298            
 3rd Qu.: 168   3rd Qu.: 20.00   3rd Qu.:24.00                    3rd Qu.: 240.0   3rd Qu.:32727            
 Max.   :1590   Max.   :475.00   Max.   :90.00                    Max.   :1777.0   Max.   :50194            
                                                                                   NA's   :2736             
    SUM.HSR           SUM.SD        SUM.ACC.DEC    SUM.Player.Load Daily.Mean     St.Deviation  
 Min.   :   0.0   Min.   :  0.00   Min.   :  0.0   Min.   :   0    Mode:logical   Mode:logical  
 1st Qu.: 552.0   1st Qu.: 57.25   1st Qu.: 67.0   1st Qu.: 876    NA's:3192      NA's:3192     
 Median : 843.5   Median :142.00   Median :104.0   Median :1318                                 
 Mean   : 861.0   Mean   :162.50   Mean   :106.3   Mean   :1339                                 
 3rd Qu.:1164.2   3rd Qu.:235.00   3rd Qu.:147.0   3rd Qu.:1799                                 
 Max.   :3504.0   Max.   :711.00   Max.   :259.0   Max.   :3373                                 
 NA's   :2736     NA's   :2736     NA's   :2736    NA's   :2736                                 
 Monotony.Total.Distance Monotony.HSR   Monotony.SD    Monotony.High.Intensity.Actions Monotony.Player.Load
 Mode:logical            Mode:logical   Mode:logical   Mode:logical                    Mode:logical        
 NA's:3192               NA's:3192      NA's:3192      NA's:3192                       NA's:3192           
                                                                                                           
                                                                                                           
                                                                                                           
                                                                                                           
                                                                                                           
 Strain.Total.Distance Strain.HSR     Strain.SD      Strain.High.Intensity.Actions Strain.Player.Load
 Mode:logical          Mode:logical   Mode:logical   Mode:logical                  Mode:logical      
 NA's:3192             NA's:3192      NA's:3192      NA's:3192                     NA's:3192
> head(twente_1)
  Player.Number Playing.Position       Date Week Training.type Total.Distance HSR  SD
1             1               ED 11/08/2018    1         'OFF'              0   0   0
2            NA                  12/08/2018   NA         'OFF'              0   0   0
3            NA                  13/08/2018   NA          'TT'           4599  72   0
4            NA                  14/08/2018   NA          'TT'           6328 213 104
5            NA                  15/08/2018   NA          'TT'           5522 264  22
6            NA                  16/08/2018   NA          'TT'           2873  14   0
  High.Intensity.Actions..acc.dec. Player.Load SUM.Weekly.Total.Distance SUM.HSR SUM.SD SUM.ACC.DEC
1                                0           0                     31953    1205    298         113
2                                0           0                        NA      NA     NA          NA
3                               16         141                        NA      NA     NA          NA
4                               25         362                        NA      NA     NA          NA
5                               15         283                        NA      NA     NA          NA
6                               16          66                        NA      NA     NA          NA
  SUM.Player.Load Daily.Mean St.Deviation Monotony.Total.Distance Monotony.HSR Monotony.SD
1            1843         NA           NA                      NA           NA          NA
2              NA         NA           NA                      NA           NA          NA
3              NA         NA           NA                      NA           NA          NA
4              NA         NA           NA                      NA           NA          NA
5              NA         NA           NA                      NA           NA          NA
6              NA         NA           NA                      NA           NA          NA
  Monotony.High.Intensity.Actions Monotony.Player.Load Strain.Total.Distance Strain.HSR Strain.SD
1                              NA                   NA                    NA         NA        NA
2                              NA                   NA                    NA         NA        NA
3                              NA                   NA                    NA         NA        NA
4                              NA                   NA                    NA         NA        NA
5                              NA                   NA                    NA         NA        NA
6                              NA                   NA                    NA         NA        NA
  Strain.High.Intensity.Actions Strain.Player.Load player_load_sd
1                            NA                 NA             NA
2                            NA                 NA             NA
3                            NA                 NA             NA
4                            NA                 NA             NA
5                            NA                 NA             NA
6                            NA                 NA             NA

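I want to compute some new variables and store them in specific cells. For example, I want the standard deviation for each week (3192 rows in total, i.e. 456 weeks of 7 days).

I "came up with" code that does this by hand: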

twente_1$player_load_sd[1] = sd(twente_1$Player.Load[1:7])
twente_1$player_load_sd[2] = sd(twente_1$Player.Load[8:14])
twente_1$player_load_sd[3] = sd(twente_1$Player.Load[15:21])
twente_1$player_load_sd[4] = sd(twente_1$Player.Load[22:28])
twente_1$player_load_sd[5] = sd(twente_1$Player.Load[29:35])
twente_1$player_load_sd[6] = sd(twente_1$Player.Load[36:42])
twente_1$player_load_sd[7] = sd(twente_1$Player.Load[43:49])
twente_1$player_load_sd[8] = sd(twente_1$Player.Load[50:56])
twente_1$player_load_sd[9] = sd(twente_1$Player.Load[57:63])
twente_1$player_load_sd[10] = sd(twente_1$Player.Load[64:70])

I am sure I could do this with a for loop, but I cannot get it to work. I have tried the code below, but it gives me NA:

x <- 1
y <- 7
for (i in 1:456) {
        twente_1$player_load_sd[i] = sd(twente_1$Player.Load[x:y])
        x <- x+7
        y <- y+7
}

Thank you in advance for your time and help.

【Question comments】:

    Tags: r loops for-loop dplyr data-wrangling


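    【Solution 1】:

    Rather than a for loop, I would 1) create a week variable, 2) group the dataset by week, and 3) compute each week's SD on the grouped dataset. Here is what that looks like.

    Here is an example dataset with 10 weeks of data. (Install the tidyverse library if you do not have it yet.)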

    library(tidyverse)
    df <- tibble(
      day = 1:70,
      x = runif(70, 0, 100)
    )
    

    First, let's create a week variable by splitting the rows into groups of 7.

    df <- 
      df %>% 
      mutate(
        week = rep(1:(nrow(df)/7), each = 7)
      )
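
The `rep()` call above assumes the number of rows is an exact multiple of 7. When it is not, `rep()` returns a vector of a different length than the data and `mutate()` rejects it. A minimal sketch of an alternative that derives the week from the row position by integer division; `df_demo` and its values are hypothetical and only serve as an illustration:

```r
library(dplyr)

# Hypothetical stand-in data: 17 rows, deliberately not a multiple of 7
df_demo <- tibble(
  day = 1:17,
  x   = seq(10, 170, by = 10)
)

# Rows 1-7 become week 1, rows 8-14 week 2, and the 3 leftover rows week 3
df_demo <- df_demo %>%
  mutate(week = (row_number() - 1) %/% 7 + 1)

table(df_demo$week)
```

The trailing partial week still gets its own index, so it is worth checking the group sizes before trusting an SD computed on fewer than 7 days.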
    

    Next, group the dataset by week and compute the SD of x. Don't forget to ungroup at the end!

    df <- 
      df %>% 
      group_by(week) %>% 
      mutate(week_sd = sd(x)) %>% 
      ungroup()
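
In the question's data, `Player.Number` and `Week` appear to be filled in only on the first row of each block and are `NA` elsewhere. If that reading is right, one possible approach is to carry those identifiers downwards with `tidyr::fill()` and then group by them instead of building a week index from row positions. The miniature `twente_demo` below is a hypothetical stand-in; the second week of `Player.Load` values is invented for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of twente_1: identifiers only on the first row of each block
twente_demo <- tibble(
  Player.Number = c(1, rep(NA, 13)),
  Week          = c(1, rep(NA, 6), 2, rep(NA, 6)),
  Player.Load   = c(0, 0, 141, 362, 283, 66, 0,      # first rows as in head(twente_1), 7th invented
                    120, 95, 0, 210, 305, 180, 0)    # invented second week
)

twente_demo <- twente_demo %>%
  fill(Player.Number, Week) %>%               # carry the last seen value downwards
  group_by(Player.Number, Week) %>%
  mutate(player_load_sd = sd(Player.Load)) %>% # same SD repeated on every row of the week
  ungroup()

twente_demo
```

Grouping by both player and week keeps week 1 of one player separate from week 1 of another, which a plain row-based week counter would also do only as long as every player has complete 7-day weeks.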
    

    We can look at the first 14 days (i.e. two weeks) and see that each week's SD is stored in every row of that week.

    head(df, 14)
    

    If you want a new dataset with one row per week, you can group and summarize instead:

    df_week <- 
      df %>% 
      group_by(week) %>% 
      summarize(week_sd = sd(x)) %>% 
      ungroup()
    
    df_week
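
For comparison with the loop in the question, the same one-value-per-week result can also be produced in base R. This is only a sketch under the assumption that the rows are ordered by day and every week has exactly 7 rows; the names `week_sd`, `week_sd2` and `week_id` are made up for this example. It computes the row range from the week index rather than updating two counters, and preallocates the result vector:

```r
set.seed(1)                      # reproducible stand-in data
x <- runif(70, 0, 100)           # 70 daily values = 10 weeks

n_weeks <- length(x) %/% 7
week_sd <- numeric(n_weeks)      # preallocate one slot per week

for (i in seq_len(n_weeks)) {
  rows <- ((i - 1) * 7 + 1):(i * 7)   # rows 1:7, 8:14, ...
  week_sd[i] <- sd(x[rows])
}

# Loop-free base R equivalent: split the values by a week index
week_id  <- rep(seq_len(n_weeks), each = 7)
week_sd2 <- as.numeric(tapply(x, week_id, sd))

all.equal(week_sd, week_sd2)
```

Both versions return a vector with one SD per week, which matches the shape of `df_week` rather than the per-row column created with `mutate()`.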
    

    【Comments】:

    • Awesome @Jacob! Thank you so much, I really appreciate it! I have now tried it myself and it works great. You're a lifesaver!