【Question title】: Averaging column values based on multiple criteria in other columns in R
【Posted】: 2018-01-10 03:59:44
【Question】:

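my.df1 is a data.frame with many unique observations that nevertheless share similar characteristics (Colour, Type and Size in this example). For every combination of characteristics in my.df2, I would like to calculate the mean and SD of all observations in my.df1 that match the criteria. For example, for the first row of my.df2 I want the mean and SD of PriceOne and PriceTwo over all observations in my.df1 with Colour Blue, Type 1 and Size S. Note: for a row such as Blue, -, - (row 6 of my.df2 below) I want the mean and SD over all observations in my.df1 with Colour Blue, regardless of their Type and Size. My original dataset has many more observations, criteria variables and price columns, so a scalable solution would be much appreciated.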

    my.df1 <- data.frame(Colour = c('Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red'),
                         Type = c(1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2),
                         Size = c('S','S','S','S','S','S','M','M','M','M','M','M','S','S','S','S','S','S','M','M','M','M','M','M'),
                         PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
                         PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))

    head(my.df1, 5)
      Colour Type Size PriceOne PriceTwo
    1   Blue    1    S       10       10
    2   Blue    1    S       15       15
    3   Blue    2    S       20       10
    4   Blue    2    S       18       18
    5   Blue    1    S       19       20

    my.df2 <- data.frame(Colour = c('Blue','Blue','Blue','Blue','Blue','Blue','Red','Red','Red','Red','Red','Red'),
                         Type = c(1,1,2,2,2,'-',1,1,2,2,2,'-'),
                         Size = c('S','M','S','M','-','-','S','M','S','M','-','-'),
                         PriceOneMean = NA,
                         PriceOneStDev = NA,
                         PriceTwoMean = NA,
                         PriceTwoStDev = NA)

    my.df2
       Colour Type Size PriceOneMean PriceOneStDev PriceTwoMean PriceTwoStDev
    1    Blue    1    S           NA            NA           NA            NA
    2    Blue    1    M           NA            NA           NA            NA
    3    Blue    2    S           NA            NA           NA            NA
    4    Blue    2    M           NA            NA           NA            NA
    5    Blue    2    -           NA            NA           NA            NA
    6    Blue    -    -           NA            NA           NA            NA
    7     Red    1    S           NA            NA           NA            NA
    8     Red    1    M           NA            NA           NA            NA
    9     Red    2    S           NA            NA           NA            NA
    10    Red    2    M           NA            NA           NA            NA
    11    Red    2    -           NA            NA           NA            NA
    12    Red    -    -           NA            NA           NA            NA

EDIT: I have added rows 5 and 11 to my.df2 to better match my original dataset. How can I make my question above work for these rows as well?

【Question comments】:

    Tags: r tidyverse


    【Solution 1】:

    You can try:

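    library(tidyverse)
    # as_tibble() replaces as.tbl(), which is deprecated in dplyr >= 1.0
    as_tibble(my.df1) %>% 
      mutate(Type=NA, Size=NA) %>%      # copy of the data with Type and Size blanked out
      bind_rows(my.df1) %>%             # stack the original rows underneath
      group_by(Colour, Type, Size) %>% 
      summarise_all(c("mean", "sd"))    # mean and sd of every remaining (price) column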
    # A tibble: 10 x 7
    # Groups:   Colour, Type [?]
       Colour  Type   Size PriceOne_mean PriceTwo_mean PriceOne_sd PriceTwo_sd
       <fctr> <dbl> <fctr>         <dbl>         <dbl>       <dbl>       <dbl>
     1   Blue     1      M      12.66667      15.33333    3.055050    5.507571
     2   Blue     1      S      14.66667      15.00000    4.509250    5.000000
     3   Blue     2      M      17.33333      19.33333    5.507571    8.504901
     4   Blue     2      S      16.33333      14.00000    4.725816    4.000000
     5   Blue    NA   <NA>      15.25000      15.91667    4.287932    5.534328
     6    Red     1      M      15.33333      12.66667    5.507571    3.055050
     7    Red     1      S      15.00000      14.66667    5.000000    4.509250
     8    Red     2      M      19.33333      17.33333    8.504901    5.507571
     9    Red     2      S      14.00000      16.33333    4.000000    4.725816
    10    Red    NA   <NA>      15.91667      15.25000    5.534328    4.287932
    

    Following your edit, I would do:

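    # as_tibble() replaces as.tbl(), which is deprecated in dplyr >= 1.0
    as_tibble(my.df1) %>% 
      bind_rows(mutate(my.df1, Type=NA, Size=NA)) %>%   # colour-only level
      bind_rows(mutate(my.df1, Size=NA)) %>%            # colour-and-type level
      group_by(Colour, Type, Size) %>% 
      summarise_all(c("mean", "sd"))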
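
As a variation on the stacking idea above, here is a minimal sketch that uses `across()` (assuming dplyr 1.0 or later) and labels the wildcard levels with "-" instead of NA, so that the result resembles the layout of my.df2. The names `detail`, `stacked` and `result`, and the `Mean`/`StDev` suffixes, are choices made for this illustration and are not part of the original answer.

```r
library(dplyr)

# Same data as in the question, written compactly with rep()
my.df1 <- data.frame(
  Colour   = rep(c("Blue", "Red"), each = 12),
  Type     = rep(c(1, 1, 2, 2, 1, 2), times = 4),
  Size     = rep(rep(c("S", "M"), each = 6), times = 2),
  PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
  PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11)
)

# Type becomes character so that "-" can stand for "any value"
detail <- my.df1 %>% mutate(Type = as.character(Type))

# Three stacked copies: full detail, Size wildcarded, Type and Size wildcarded
stacked <- bind_rows(
  detail,
  detail %>% mutate(Size = "-"),
  detail %>% mutate(Type = "-", Size = "-")
)

# One grouped summary covers all three levels and every Price* column
result <- stacked %>%
  group_by(Colour, Type, Size) %>%
  summarise(across(starts_with("Price"), list(Mean = mean, StDev = sd),
                   .names = "{.col}{.fn}"),
            .groups = "drop")
```

Because the price columns are selected with `starts_with("Price")`, further price columns should be picked up without changes, provided they follow the same naming pattern. If only the combinations listed in my.df2 are wanted, a `semi_join()` on Colour, Type and Size against my.df2 is one way to drop the rest.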
    

    【Comments】:

    • Thanks, perfect! I have edited my MRE, see the last line starting with "EDIT". How can I make your code work for rows 5 and 11?
    【Solution 2】:

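    The dplyr library lets you group, summarise and bind. Edited to add the extra grouping. I prefer @Jimbou's answer for its brevity; it would probably take only a one-line edit on his/her part.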

    my.df1 <- data.frame(Colour = c('Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red'),
                         Type = c(1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2),
                         Size = c('S','S','S','S','S','S','M','M','M','M','M','M','S','S','S','S','S','S','M','M','M','M','M','M'),
                         PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
                         PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))
    
    library(dplyr)
    # make detailed summaries
    my.df1.ColourTypeSize = my.df1 %>%
      group_by(Colour, Type, Size) %>%
      summarise(
        PriceOneMean = mean(PriceOne),
        PriceOneStDev = sd(PriceOne),
        PriceTwoMean = mean(PriceTwo),
        PriceTwoStDev = sd(PriceTwo))
    
    my.df1.ColourType = my.df1 %>%
      group_by(Colour, Type) %>%
      summarise(
        PriceOneMean = mean(PriceOne),
        PriceOneStDev = sd(PriceOne),
        PriceTwoMean = mean(PriceTwo),
        PriceTwoStDev = sd(PriceTwo)) %>%
      mutate(Size = NA)
    
    # Make summary for colour alone and add NA for Size and Type
    my.df1.Colour = my.df1 %>% 
      group_by(Colour) %>%
      summarise(
        PriceOneMean = mean(PriceOne),
        PriceOneStDev = sd(PriceOne),
        PriceTwoMean = mean(PriceTwo),
        PriceTwoStDev = sd(PriceTwo)) %>%
      mutate(Type = NA, Size = NA)
    
    # Bind the summaries together and sort and arrange to make it look nice
    my.df2 = 
      my.df1.Colour %>% 
      bind_rows(my.df1.ColourTypeSize) %>%
      bind_rows(my.df1.ColourType) %>%
      arrange(Colour, Type, Size) %>%
      select(Colour, Type, Size, everything())
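
The three near-identical `summarise()` calls above can also be folded into one helper. The following is only a sketch, assuming dplyr 1.0 or later; `summarise_by`, `key_sets` and `all_summaries` are hypothetical names introduced here, not part of the original answer.

```r
library(dplyr)

# Same data as in the answer, written compactly with rep()
my.df1 <- data.frame(
  Colour   = rep(c("Blue", "Red"), each = 12),
  Type     = rep(c(1, 1, 2, 2, 1, 2), times = 4),
  Size     = rep(rep(c("S", "M"), each = 6), times = 2),
  PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
  PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11)
)

# Hypothetical helper: mean and sd of every Price* column for one set of grouping columns
summarise_by <- function(data, keys) {
  data %>%
    group_by(across(all_of(keys))) %>%
    summarise(across(starts_with("Price"), list(Mean = mean, StDev = sd),
                     .names = "{.col}{.fn}"),
              .groups = "drop")
}

# One entry per level of detail, from most to least specific
key_sets <- list(c("Colour", "Type", "Size"), c("Colour", "Type"), "Colour")

# bind_rows() fills the grouping columns that a coarser summary lacks with NA
all_summaries <- key_sets %>%
  lapply(function(keys) summarise_by(my.df1, keys)) %>%
  bind_rows() %>%
  arrange(Colour, Type, Size)
```

Adding another level of detail then only means adding another entry to `key_sets`, which may scale better when there are many criteria variables.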
    

    【Comments】:

      【Solution 3】:

      Build all available combinations of characteristics as strings, to be called inside the subset function:

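      call_combo <- function(frame) {
        combo_list <- list()
        for (i in seq_len(nrow(frame))) {
          # the three criteria columns (Colour, Type, Size) of row i
          combo <- frame[i, c(1, 2, 3)]
          # keep only the criteria that are not the '-' wildcard
          combo_left <- combo[combo != '-']
          # assumes wildcards only appear in the trailing columns
          combo_left_cols <- names(combo[1:length(combo_left)])
          # build e.g. "Colour == Blue & Type == 1 &", then drop the final '&'
          call_string <- paste(combo_left_cols, '==', combo_left, '&', sep=' ', collapse=' ')
          ind <- unlist(gregexpr('&', call_string))
          res <- substring(call_string, 1, ind[length(ind)] - 1)
          combo_list[i] <- list(res)
        }
        return(combo_list)
      }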
      

      The characteristic combinations:

      combo_list <- call_combo(my.df2)
      
      combo_list
      

      Evaluate every combination inside subset and attach the results to the second data frame:

      # define attributes as objects, so that the unquoted values in combo_list
      # (e.g. Colour == Blue) resolve to the matching strings
      Blue <- 'Blue'
      Red <- 'Red'
      S <- 'S'
      M <- 'M'
      L <- 'L'
      
      # evaluate combo_list entries inside subset function
      for (p in seq_along(combo_list)) {
        sub_frame <- subset(my.df1, eval(parse(text=combo_list[[p]])))

        # calculate sd and mean for each combination and attach to 2nd frame
        my.df2[p,]$PriceOneStDev <- sd(sub_frame$PriceOne)
        my.df2[p,]$PriceTwoStDev <- sd(sub_frame$PriceTwo)
        my.df2[p,]$PriceOneMean <- mean(sub_frame$PriceOne)
        my.df2[p,]$PriceTwoMean <- mean(sub_frame$PriceTwo)
      }
      

      Result:

      my.df2
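
The same row-by-row idea can be sketched without `eval(parse())` and without defining one object per attribute value, by building a logical mask for each criteria row and treating "-" as a wildcard. This is an illustrative base R alternative rather than the answer's own method; `matches_row`, `keys`, `prices` and `stat_cols` are hypothetical names introduced here.

```r
# Same data as in the question, written compactly with rep()
my.df1 <- data.frame(
  Colour   = rep(c("Blue", "Red"), each = 12),
  Type     = rep(c(1, 1, 2, 2, 1, 2), times = 4),
  Size     = rep(rep(c("S", "M"), each = 6), times = 2),
  PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
  PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11)
)

# Criteria table from the question; "-" means "any value"
my.df2 <- data.frame(
  Colour = rep(c("Blue", "Red"), each = 6),
  Type   = rep(c("1", "1", "2", "2", "2", "-"), times = 2),
  Size   = rep(c("S", "M", "S", "M", "-", "-"), times = 2)
)

keys   <- c("Colour", "Type", "Size")
prices <- c("PriceOne", "PriceTwo")

# Pre-create the result columns: PriceOneMean, PriceOneStDev, PriceTwoMean, PriceTwoStDev
stat_cols <- paste0(rep(prices, each = 2), c("Mean", "StDev"))
my.df2[stat_cols] <- NA_real_

# TRUE for the rows of `data` that match one criteria row; "-" matches anything
matches_row <- function(data, criteria) {
  keep <- rep(TRUE, nrow(data))
  for (k in names(criteria)) {
    wanted <- as.character(criteria[[k]])
    if (wanted != "-") keep <- keep & as.character(data[[k]]) == wanted
  }
  keep
}

# Fill in mean and sd for every criteria row and every price column
for (i in seq_len(nrow(my.df2))) {
  hit <- matches_row(my.df1, my.df2[i, keys])
  for (p in prices) {
    my.df2[i, paste0(p, "Mean")]  <- mean(my.df1[[p]][hit])
    my.df2[i, paste0(p, "StDev")] <- sd(my.df1[[p]][hit])
  }
}
```

Unlike the string-building version, this sketch does not rely on the wildcards appearing only in the trailing columns, and extending it to more criteria or price columns should only require longer `keys` and `prices` vectors.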
      

      【Comments】:
