【Question title】: How to calculate average of data fields of different stations separately [duplicate]
【Posted】: 2020-03-10 04:12:30
【Question】:

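I am trying to average RAIN by HOUR. The data contains rainfall recorded over 24 hours at more than 1,000 stations. Each HOUR usually has 4 records, but in some places there are only 1, 2, or 3. I have to average RAIN for each HOUR of each STATION. Sample data is given below: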

STN,     HOBLINAME,   LATI,      LONG_,    RAINDATE, HOUR,  RAIN
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3.5
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    2.5
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    1
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    2
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  0,   7.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  1,   7
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  1,   6.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   6
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   6
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   5.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   2
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   2.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   3

I tried:

copy14   <- read.csv("/home/14copy.csv")
aggregate( RAIN ~ HOUR, copy14, FUN = mean )

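But this averages each hour across all stations together (for example, hour 0 of every station is pooled into one mean). What I want is the hourly average for each station separately: here RAIN must be averaged on its own for station 4471 and on its own for station 804. Finally, how should I write out these final averages together with all the related fields?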

【Comments】:

  • Please share the output of dput(head(copy14))
  • structure(list(STN = c(4471L, 4471L, 4471L, 4471L, 4471L, 4471L), HOBLINAME = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("BADAMI", "Adagal (GP)"), class = "factor"), LATI = c(15.952089, 15.952089, 15.952089, 15.952089, 15.952089, 15.952089), LONG_ = c(75.673282, 75.673282, 75.673282, 75.673282, 75.673282, 75.673282), RAINDATE = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "14-08-17", class = "factor"), HOUR = c(0L, 0L, 0L, 0L, 1L, 1L), RAIN = c(3.5, 3, 3, 2.5, 0, 1)), row.names = c(NA, 6L), class = "data.frame")
  • aggregate( RAIN ~ STN + HOUR, copy14, FUN = mean )
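The formula suggested in the comment above can also cover the second part of the question, writing the averages out with the related fields. The following is a minimal sketch, not a definitive answer: it assumes the descriptive columns (HOBLINAME, LATI, LONG_, RAINDATE) are constant within each station and day, so listing them on the right-hand side of the formula does not split any group. The inline data is a small subset of the sample, and `hourly_mean` and the output file name are placeholders.

```r
# A small subset of the sample data, typed inline so the example is self-contained.
copy14 <- data.frame(
  STN       = c(4471, 4471, 4471, 4471, 4471, 4471, 4471, 804, 804, 804),
  HOBLINAME = c(rep("Adagal (GP)", 7), rep("BADAMI", 3)),
  LATI      = c(rep(15.952089, 7), rep(15.919473, 3)),
  LONG_     = c(rep(75.673282, 7), rep(75.683335, 3)),
  RAINDATE  = "14-08-17",
  HOUR      = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1),
  RAIN      = c(3.5, 3, 3, 2.5, 0, 1, 2, 7.5, 7, 6.5)
)

# Every column on the right-hand side is a grouping key, so the descriptive
# fields are carried into the result alongside STN and HOUR.
hourly_mean <- aggregate(
  RAIN ~ STN + HOBLINAME + LATI + LONG_ + RAINDATE + HOUR,
  data = copy14,
  FUN  = mean
)

# Sort by station, then hour, for readability.
hourly_mean <- hourly_mean[order(hourly_mean$STN, hourly_mean$HOUR), ]

# Placeholder output location; replace with a real path as needed.
out_file <- file.path(tempdir(), "rain_hourly_mean.csv")
write.csv(hourly_mean, out_file, row.names = FALSE)
```

If a descriptive field does vary within a station (for example, slightly different coordinates), it would create extra groups, so it is worth checking the row count of the result against the number of distinct STN-HOUR pairs.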

Tags: r average mean


【Solution 1】:

Using data.table:

library(data.table)
setDT(copy14)  # convert the data frame to a data.table by reference

copy14[, .(MeanRain = mean(RAIN)), .(STN, HOUR)]
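Because the number of records per hour varies (1 to 4 in the question), it can help to report the count next to each mean. The sketch below assumes the data.table package is installed and uses a small inline subset of the sample data; `Records` and `hourly_mean` are illustrative names, not part of the original answer.

```r
library(data.table)

# Inline subset of the sample data (only the columns needed here).
copy14 <- data.table(
  STN  = c(4471, 4471, 4471, 4471, 4471, 4471, 4471, 804, 804, 804),
  HOUR = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1),
  RAIN = c(3.5, 3, 3, 2.5, 0, 1, 2, 7.5, 7, 6.5)
)

# .N is the number of rows in each group, i.e. how many records
# contributed to the mean for that station and hour.
hourly_mean <- copy14[, .(MeanRain = mean(RAIN), Records = .N), by = .(STN, HOUR)]

print(hourly_mean)
```

Groups appear in order of first occurrence in the data, so station 4471 comes before station 804 here.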

【Comments】:

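    【Solution 2】:

    I give this solution to continue your first attempt with aggregate. aggregate expects a list or data frame in its by argument, which it then uses to group the given data. In my view, group_by plus summarise is the smoother solution. Nevertheless, this approach should also be shown here.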

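    library(dplyr)

    copy14 <- read.csv("R/rain.csv")

    # Average only RAIN; passing every column to mean() would return NA
    # (with warnings) for the non-numeric fields.
    data <- copy14 %>%
      select(RAIN) %>%
      aggregate(by = copy14 %>%
                  select(STN, HOUR),
                FUN = mean)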

    【Comments】:

      【Solution 3】:

      Using the dplyr library, we simply group and summarise as follows:

      library(dplyr)
      copy14 <- read.csv("rain.csv")
      copy14 %>%
        group_by(HOUR, STN) %>%
        summarise(RAIN = mean(RAIN))
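To keep the other fields in the dplyr result while still grouping only by station and hour, one option is to carry the first value of each descriptive column through `summarise`. This is a sketch under assumptions: it needs dplyr 1.0.0 or later for the `.groups` argument, it assumes the descriptive columns do not vary within a STN-HOUR group, and the inline data is a subset of the sample.

```r
library(dplyr)

# Inline subset of the sample data.
copy14 <- data.frame(
  STN       = c(4471, 4471, 4471, 4471, 4471, 4471, 4471, 804, 804, 804),
  HOBLINAME = c(rep("Adagal (GP)", 7), rep("BADAMI", 3)),
  LATI      = c(rep(15.952089, 7), rep(15.919473, 3)),
  LONG_     = c(rep(75.673282, 7), rep(75.683335, 3)),
  RAINDATE  = "14-08-17",
  HOUR      = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1),
  RAIN      = c(3.5, 3, 3, 2.5, 0, 1, 2, 7.5, 7, 6.5)
)

hourly_mean <- copy14 %>%
  group_by(STN, HOUR) %>%
  summarise(
    # Descriptive fields: take the first value in each group
    # (assumed identical across the group).
    HOBLINAME = first(HOBLINAME),
    LATI      = first(LATI),
    LONG_     = first(LONG_),
    RAINDATE  = first(RAINDATE),
    # The quantity of interest: mean rainfall per station and hour.
    RAIN      = mean(RAIN),
    .groups   = "drop"  # return an ungrouped tibble
  )

print(hourly_mean)
```

The result is sorted by the grouping keys, so station 804 is listed before station 4471. It can be written out with `write.csv(hourly_mean, <path>, row.names = FALSE)`.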
      

      【Comments】:

      • The OP wants to group by STN-HOUR pairs, so you should add STN.
      • @JDG Right, I didn't see that, but thanks to thomas for adding it.