【Question title】:Means multiple columns by multiple groups [duplicate]
【Posted】:2022-01-14 18:31:40
【Question】:

I am trying to find the means of multiple columns of a data frame, grouped by multiple variables and excluding NAs.

airquality <- data.frame(City = c("CityA", "CityA","CityA",
                                  "CityB","CityB","CityB",
                                  "CityC", "CityC"),
                         year = c("1990", "2000", "2010", "1990", 
                                  "2000", "2010", "2000", "2010"),
                         month = c("June", "July", "August",
                                   "June", "July", "August",
                                   "June", "August"),
                         PM10 = c(runif(3), rnorm(5)),
                         PM25 = c(runif(3), rnorm(5)),
                         Ozone = c(runif(3), rnorm(5)),
                         CO2 = c(runif(3), rnorm(5)))
airquality

I then built a numbered list of the column names, so I know which columns to select:

nam<-names(airquality)
namelist <- data.frame(matrix(t(nam)));namelist

I want to calculate the mean of PM25, Ozone, and CO2 by City and year. That means I need columns 1, 2, and 5:7. I tried:

acast(datadf, year ~ city, mean, na.rm=TRUE)

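But this is not really what I want, because it includes means of things I don't need, and the result is not a data frame. I could convert it and then drop the extra columns, but that seems like a very inefficient approach.

Is there a better way?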

【Comments】:

  • Perhaps library(dplyr); airquality %>% group_by(City, year) %>% summarise_at(vars("PM25", "Ozone", "CO2"), mean)

Tags: r dplyr sapply dcast


【Solution 1】:

We can use summarise_at from dplyr to get the mean of the columns of interest after grouping:

library(dplyr)
airquality %>%
   group_by(City, year) %>% 
   summarise_at(vars("PM25", "Ozone", "CO2"), mean)

Or use across from the devel version of dplyr (version ‘0.8.99.9000’); across has since been released in dplyr 1.0.0:

airquality %>%
     group_by(City, year) %>%
     summarise(across(PM25:CO2, mean))
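The question asks for NAs to be excluded, and both snippets above call mean without na.rm. A minimal sketch of one way to handle that, assuming a released dplyr (1.0.0 or later) and the airquality data frame from the question. The set.seed call and the injected NA are illustrative additions, not part of the original post:

```r
library(dplyr)

set.seed(1)  # illustrative: makes the random values reproducible
airquality <- data.frame(
  City  = c("CityA", "CityA", "CityA", "CityB", "CityB", "CityB", "CityC", "CityC"),
  year  = c("1990", "2000", "2010", "1990", "2000", "2010", "2000", "2010"),
  month = c("June", "July", "August", "June", "July", "August", "June", "August"),
  PM10  = c(runif(3), rnorm(5)),
  PM25  = c(runif(3), rnorm(5)),
  Ozone = c(runif(3), rnorm(5)),
  CO2   = c(runif(3), rnorm(5))
)
airquality$PM25[2] <- NA  # illustrative: inject one missing value

result <- airquality %>%
  group_by(City, year) %>%
  # pass na.rm through an anonymous function so missing values are dropped
  summarise(across(c(PM25, Ozone, CO2), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")  # return an ungrouped tibble
result
```

Note that a group whose values are all missing comes back as NaN rather than NA, because the mean of an empty vector is NaN.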

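【Comments】:

  • So I tested the suggested answer by adding two data points for City A in 2010 and two data points for City C in 2000.

【Solution 2】:

I tested the suggestions in the comments above and added more replicates to the original dataset, since I want to calculate means by City and year. Here is the updated dataset: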

airquality <- data.frame(City = c("CityA", "CityA", "CityA", "CityA",
                                  "CityB", "CityB", "CityB", "CityB",
                                  "CityC", "CityC", "CityC"),
                         year = c("1990", "2000", "2010", "2010",
                                  "1990", "2000", "2010", "2010",
                                  "1990", "2000", "2000"),
                         month = c("June", "July", "August", "August",
                                   "June", "July", "August", "August",
                                   "June", "August", "August"),
                         PM10 = c(runif(6), rnorm(5)),
                         PM25 = c(runif(6), rnorm(5)),
                         Ozone = c(runif(6), rnorm(5)),
                         CO2 = c(runif(6), rnorm(5)))
airquality

Of the answers above, the ones from AK run and Colin work.
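With replicates in the data, the grouped means can also be cross-checked without dplyr. A minimal sketch using base R's aggregate, assuming the updated airquality data frame above; set.seed and the helper names pollutants and groups are illustrative additions:

```r
set.seed(42)  # illustrative: makes the random values reproducible
airquality <- data.frame(
  City  = c("CityA", "CityA", "CityA", "CityA",
            "CityB", "CityB", "CityB", "CityB",
            "CityC", "CityC", "CityC"),
  year  = c("1990", "2000", "2010", "2010",
            "1990", "2000", "2010", "2010",
            "1990", "2000", "2000"),
  month = c("June", "July", "August", "August",
            "June", "July", "August", "August",
            "June", "August", "August"),
  PM10  = c(runif(6), rnorm(5)),
  PM25  = c(runif(6), rnorm(5)),
  Ozone = c(runif(6), rnorm(5)),
  CO2   = c(runif(6), rnorm(5))
)

pollutants <- c("PM25", "Ozone", "CO2")  # columns to average
groups     <- c("City", "year")          # grouping columns

# na.rm = TRUE is forwarded to mean(), so NAs are dropped within each group
means <- aggregate(airquality[pollutants],
                   by = airquality[groups],
                   FUN = mean, na.rm = TRUE)
means
```

The result is an ordinary data frame with one row per City and year combination, so replicated combinations such as CityA in 2010 collapse into a single averaged row.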

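【Comments】:

【Solution 3】:

Colin's summarise_at solution is the most straightforward, though there are of course several ways to do this. Here is another solution that uses tidyr to reshape the data and then compute the means: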

library(dplyr)
library(tidyr)

airquality %>%
  select(City, year, PM25, Ozone, CO2) %>%
  gather(var, value, -City, -year) %>% # reshape to long format
  group_by(City, year, var) %>%
  summarise(avg = mean(value, na.rm = TRUE)) %>% # can stop here if you want
  spread(var, avg) # optional to make this into a wider table
# A tibble: 8 x 5
# Groups:   City, year [8]
    City   year          CO2       Ozone         PM25
* <fctr> <fctr>        <dbl>       <dbl>        <dbl>
1  CityA   1990  0.275981522  0.19941717  0.826008441
2  CityA   2000  0.090342153  0.50949094  0.005052771
3  CityA   2010  0.007345704  0.21893117  0.625373926
4  CityB   1990  1.148717447 -1.05983482 -0.961916973
5  CityB   2000 -2.334429324  0.28301220 -0.828515418
6  CityB   2010  1.110398814 -0.56434523 -0.804353609
7  CityC   2000 -0.676236740  0.20661529 -0.696816058
8  CityC   2010  0.229428142  0.06202997 -1.396357288
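In current tidyr, gather and spread are superseded by pivot_longer and pivot_wider. A sketch of the same reshape-then-summarise pipeline with the newer verbs, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later, with the question's airquality data frame recreated here (set.seed is an illustrative addition):

```r
library(dplyr)
library(tidyr)

set.seed(1)  # illustrative: makes the random values reproducible
airquality <- data.frame(
  City  = c("CityA", "CityA", "CityA", "CityB", "CityB", "CityB", "CityC", "CityC"),
  year  = c("1990", "2000", "2010", "1990", "2000", "2010", "2000", "2010"),
  month = c("June", "July", "August", "June", "July", "August", "June", "August"),
  PM10  = c(runif(3), rnorm(5)),
  PM25  = c(runif(3), rnorm(5)),
  Ozone = c(runif(3), rnorm(5)),
  CO2   = c(runif(3), rnorm(5))
)

wide <- airquality %>%
  select(City, year, PM25, Ozone, CO2) %>%
  # long format: one row per City, year, and pollutant
  pivot_longer(c(PM25, Ozone, CO2), names_to = "var", values_to = "value") %>%
  group_by(City, year, var) %>%
  summarise(avg = mean(value, na.rm = TRUE), .groups = "drop") %>%
  # back to wide format: one column per pollutant
  pivot_wider(names_from = var, values_from = avg)
wide
```

As with the gather and spread version, the pipeline can stop after summarise if the long layout is what is needed.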
    

【Comments】:

【Solution 4】:

You should try dplyr::summarise_at:

library(dplyr)
airquality %>%
  group_by(City, year) %>%
  summarise_at(.vars = c("PM10", "PM25", "Ozone", "CO2"), .funs = mean)

# A tibble: 8 x 6
# Groups:   City [?]
    City   year         PM10       PM25      Ozone         CO2
  <fctr> <fctr>        <dbl>      <dbl>      <dbl>       <dbl>
1  CityA   1990  0.004087379  0.5146409 0.44393422  0.61196671
2  CityA   2000  0.039414194  0.8865582 0.06754322  0.69870187
3  CityA   2010  0.116901563  0.6608619 0.51499227  0.32952099
4  CityB   1990 -1.535888778 -0.9601897 1.17183649  0.08380664
5  CityB   2000  0.226046487  0.4037230 0.86554997 -0.05698204
6  CityB   2010 -0.824719956  0.1508471 0.32089806 -0.12871853
7  CityC   2000 -0.824509111 -0.6928741 0.85553837  0.12137923
8  CityC   2010 -1.626150294  1.5176198 0.21183149 -0.63859910
      

【Comments】:
