【Question title】: Create new data frame with mean of multiple rows
【Posted】: 2017-10-10 16:17:16
【Question】:

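I have a data frame with three columns: ID, Trial, and a difference measure (diff_DT). I have 19 participants, each of whom completed 30 trials. This is what my data frame looks like: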

    ID     Trial     diff_DT
    01      005       37,5
    01      006       40,5
    01      007       16,5
    ...     ...       ...
    02      005       16,5 
    ...     ...       ...
    02      016       27,9

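The 30 trials fall into five blocks of six trials each: block 1: trials 5-10, block 2: trials 16-21, block 3: trials 26-31, block 4: trials 36-41, block 5: trials 46-51 (note: the trial numbers exceed 30 because participants completed more trials in total).

Now I need the mean of diff_DT for each participant in each block, which gives five means per participant. I don't know how to do this properly. Thanks for your suggestions!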

【Question comments】:

    Tags: r dataframe mean


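    【Solution 1】:

    You can create a separate key data frame (or matrix) that maps trials to blocks, merge it into your original table, and then run aggregate to get the mean per participant and block.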

     ID <- c(rep(1, 3), 2, 2)
     Trial <- c(5, 6, 7, 5, 16)
     diff_DT <- c(37.5, 40.5, 16.5, 16.5, 27.9)
     Trial.key <- c(5:10, 16:21, 26:31, 36:41, 46:51)
     block <- rep(1:5, each = 6)
    
     df <- data.frame(ID, Trial, diff_DT)
     blocks <- data.frame(Trial.key, block)
    
     df.blocks <- merge(df, blocks, by.x = "Trial", by.y = "Trial.key", all.x = TRUE,
                        all.y = FALSE)
     df.blocks
    #  Trial ID diff_DT block
    #     5  1    37.5     1
    #     5  2    16.5     1
    #     6  1    40.5     1
    #     7  1    16.5     1
    #    16  2    27.9     2
    
     # Group by participant and block (not by trial) to get one mean per block
     df.agg <- with(df.blocks, aggregate(diff_DT, by = list(ID, block), 
                                         FUN = mean))
     names(df.agg) <- c("ID", "block", "mean.diff_DT")
     df.agg
    #  ID block mean.diff_DT
    #  1     1         31.5
    #  2     1         16.5
    #  2     2         27.9
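
Since the goal is five means per participant, a wide layout (one row per participant, one column per block) may be easier to read. The following is a minimal base R sketch using tapply; the merged table is rebuilt by hand here so the snippet runs on its own, and the name block.means is only illustrative.

```r
# Toy version of the merged table: the five example rows with their block labels
df.blocks <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  Trial   = c(5, 6, 7, 5, 16),
  diff_DT = c(37.5, 40.5, 16.5, 16.5, 27.9),
  block   = c(1, 1, 1, 1, 2)
)

# Rows are participants, columns are blocks; cells with no data are NA
block.means <- with(df.blocks,
                    tapply(diff_DT, list(ID = ID, block = block), mean))
block.means
```

With the full data set this should give a 19 x 5 matrix, which can be turned into a data frame with as.data.frame(block.means) if needed.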
    

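    【Comments】:

      【Solution 2】:

      If you want to use only base R, one approach is to add a block column to your data frame and then apply mean for each participant within each block. If Trial is numeric (which may not be the case, given that your trials are written 001, 002, ...), you can do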

      # Assign each trial to its block; 0 marks trials outside every block
      df$block <- ifelse(df$Trial >= 5  & df$Trial <= 10, 1,
                  ifelse(df$Trial >= 16 & df$Trial <= 21, 2,
                  ifelse(df$Trial >= 26 & df$Trial <= 31, 3,
                  ifelse(df$Trial >= 36 & df$Trial <= 41, 4,
                  ifelse(df$Trial >= 46 & df$Trial <= 51, 5, 0)))))
      

      If Trial is not numeric (e.g. character or factor), convert it to numeric first, before creating the block column

      df$Trial <- as.numeric(as.character(df$Trial))
      

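      Then all you need is

      # Average diff_DT (not the trial number) within each block and participant
      aggregate(df$diff_DT, by = list(block = df$block, ID = df$ID), FUN = mean)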
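
The sample in the question shows zero-padded trial labels and decimal commas, so the columns may have been read in as text. The sketch below assumes exactly that (character ID, Trial, and diff_DT, which is a guess about the asker's data) and uses match() against the list of block trials instead of nested ifelse; block_starts, block_trials, and block_means are illustrative names.

```r
# Toy data mimicking the question's sample, with everything stored as text (an assumption)
df <- data.frame(
  ID      = c("01", "01", "01", "02", "02"),
  Trial   = c("005", "006", "007", "005", "016"),
  diff_DT = c("37,5", "40,5", "16,5", "16,5", "27,9"),
  stringsAsFactors = FALSE
)

df$Trial   <- as.integer(df$Trial)                                 # "005" -> 5
df$diff_DT <- as.numeric(sub(",", ".", df$diff_DT, fixed = TRUE))  # "37,5" -> 37.5

# Each block is six consecutive trials starting at these trial numbers
block_starts <- c(5, 16, 26, 36, 46)
block_trials <- unlist(lapply(block_starts, function(s) s:(s + 5)))

# Position in block_trials gives the block; trials outside every block become NA
df$block <- rep(1:5, each = 6)[match(df$Trial, block_trials)]

# The formula interface drops rows whose block is NA
block_means <- aggregate(diff_DT ~ ID + block, data = df, FUN = mean)
block_means
```

If the file itself uses decimal commas, reading it with read.csv2() or read.table(..., dec = ",") would avoid the manual conversion.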
      

      【Comments】:

        【Solution 3】:

        See whether this helps.

        bd <- data.frame(ID = rep(1:6, each = 30),
                     Trial = c(sample(c(5:10,16:21,26:31,36:41,46:51), 30), 
                               sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                               sample(c(5:10,16:21,26:31,36:41,46:51), 30), 
                               sample(c(5:10,16:21,26:31,36:41,46:51), 30),
                               sample(c(5:10,16:21,26:31,36:41,46:51), 30), 
                               sample(c(5:10,16:21,26:31,36:41,46:51), 30)),
                     diff_DT = rnorm(n = 180, mean = 30, sd = 2))
        
        library(dplyr)
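        # Label each trial with its block. The upper-bound checks are enough here
        # because the simulated data contain only trials that belong to a block.
        bd <- bd %>% 
          mutate(block = ifelse(Trial <= 10, 1, 
                            ifelse(Trial <= 21, 2, 
                                   ifelse(Trial <= 31, 3,
                                          ifelse(Trial <= 41, 4, 5)))))
        # One mean of diff_DT per participant and block
        bd %>% 
          group_by(ID, block) %>% 
          summarise(Mean = mean(diff_DT))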
        

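        【Comments】:

          【Solution 4】:

          I wrote this data frame as an example (you should provide code that generates your data, which makes it easier to answer accurately):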

          ID <- rep(1:3, each = 47)        # 3 participants, 47 rows each
          trial <- rep(5:51, times = 3)    # trials 5 to 51 for every participant
          diff_DT <- sample(1:10, 47 * 3, replace = TRUE)
          df <- data.frame(ID, trial, diff_DT)
          

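          Then I wrote a function to compute the blocks. Blocks are assigned as you described in the question (trials outside every block get NA); if you need any clarification, just ask: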

          computeBlocks <- function(df){
            block <- rep(NA, nrow(df))
            for(i in 1:length(block)){
              for(j in 1:4){
                if(as.numeric(df$trial[i]) >= 6+10*j && as.numeric(df$trial[i]) <= 11+10*j){
                  block[i] <- j+1
                  break
                }
              }
              if(as.numeric(df$trial[i]) >= 5 && as.numeric(df$trial[i]) <= 10){
                block[i] <- 1
              }
            }
            df <- cbind(df, block)
            return(df)
          }
          

          I computed the blocks:

          df <- computeBlocks(df)
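
For larger data sets a vectorised version may be preferable to the double loop. The following is only a sketch of one alternative, not the function above: assign_block, starts, and width are hypothetical names, and it relies on findInterval() to locate the block start at or below each trial.

```r
# Vectorised block assignment: returns the block number, or NA for trials
# that fall in the gaps between blocks or outside the overall range
assign_block <- function(trial, starts = c(5, 16, 26, 36, 46), width = 6) {
  trial <- as.numeric(trial)
  idx <- findInterval(trial, starts)        # index of the last start <= trial, 0 if none
  block <- rep(NA_integer_, length(trial))
  ok <- !is.na(idx) & idx > 0
  # Keep only trials that lie within `width` trials of their block start
  ok[ok] <- trial[ok] < starts[idx[ok]] + width
  block[ok] <- idx[ok]
  block
}

# Same simulated layout as the example data: 3 participants, trials 5 to 51
df <- data.frame(ID = rep(1:3, each = 47), trial = rep(5:51, times = 3))
df$block <- assign_block(df$trial)
table(df$block, useNA = "ifany")
```

Each participant should end up with six trials in each of the five blocks and NA for the remaining trials.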
          

          Finally, using the reshape2 package, I computed the mean for each participant in each block:

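          # install.packages("reshape2")
          library(reshape2)
          # Long format: one row per ID, block, and variable (trial or diff_DT)
          df_melt <- melt(df, id.vars = c("ID", "block"))
          # Average within each ID/block cell, then drop column 3 (the mean of trial)
          means <- dcast(df_melt, ID + block ~ variable, mean)[, -3]
          means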
          

          Your question is not entirely clear; let me know if this needs improvement.

          【Comments】:

            【Solution 5】:

            I think this may be a simple way to do it:

            library(dplyr)
            
            # Create a table that maps each trial to its Block
            Trial <- c(5:10, 16:21, 26:31, 36:41, 46:51)
            Block <- rep(1:5, each = 6)
            map <- tibble(Trial, Block)  # tibble() replaces the deprecated data_frame()
            
            # Join the map onto the original data frame to add each row's Block
            # (df$Trial must be numeric for the join to match), then group by
            # participant ID and Block and summarise by the mean
            df2 <- df %>%
                     left_join(map, by = "Trial") %>%
                     group_by(ID, Block) %>%
                     summarise(mean = mean(diff_DT))
            

            【Comments】:
