【Question title】:Calculating two different means in R
【Posted】:2020-06-05 08:27:59
【Question description】:

I am trying to calculate two different means from the following dataset in R.

Plot   Date        Time  Canopyheight  mean  pre   post  Diff
103B1  11/12/2019  1     50
103B1  11/12/2019  4     50
103B1  11/12/2019  6     78
103B1  11/12/2019  22    100           69.5
103B1  11/13/2019  1     60
103B1  11/13/2019  4     70
103B1  11/13/2019  6     80
103B1  11/13/2019  22    100           77.5  73.5
103B1  11/14/2019  1     50
103B1  11/14/2019  4     50
103B1  11/14/2019  6     78
103B1  11/14/2019  22    100           69.5
103B1  11/15/2019  1     60
103B1  11/15/2019  4     80
103B1  11/15/2019  6     90
103B1  11/15/2019  22    120           87.5        78.5  5.0

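I was able to get the daily mean, but I cannot get the pre and post values.

Expected result

The code should produce the value '73.5', which is the mean of '69.5 and 77.5'; the other values are calculated the same way. Diff is the difference between the Pre and Post values.

Code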

Prepost <- Prepost %>% group_by(Plot, Date) %>% 
  mutate(meancanopyheight = mean(Canopyheight, na.rm = T))
Prepost$Preharvest <- lapply(Prepost$Date, function(m) mean(Prepost$meanCanopyheight[Prepost$Date >= m |Prepost$Date <= m+4| Prepost$Date == m+8], na.rm = TRUE))

I tried to compute this but could not get it to work; I have added my code here for reference.

Thank you for your help.

【Question discussion】:

    Tags: r mean


    【Solution 1】:

    You can use dplyr like this:

    library(dplyr)
    
    df %>% 
      group_by(Date) %>% 
      summarize(mean = mean(Canopyheight)) %>%
      mutate(group = rep(c("pre", "post"), each = 2)) %>%
      group_by(group) %>%
      summarize(mean = mean(mean))
    #> # A tibble: 2 x 2
    #>   group  mean
    #>   <chr> <dbl>
    #> 1 post   78.5
    #> 2 pre    73.5
    

    Created on 2020-02-20 by the reprex package (v0.3.0)
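The question also asks for Diff, which the snippet above does not compute. A minimal sketch of one way to get pre, post, and Diff in a single row, using the toy data from the question. The column names `daily_mean` and `group` are illustrative, and the split of the four dates into two "pre" and two "post" days is an assumption taken from the expected values in the question.

```r
library(dplyr)

# Toy data copied from the question: four readings per day on one plot
df <- data.frame(
  Plot = "103B1",
  Date = rep(c("11/12/2019", "11/13/2019", "11/14/2019", "11/15/2019"), each = 4),
  Time = rep(c(1, 4, 6, 22), times = 4),
  Canopyheight = c(50, 50, 78, 100,
                   60, 70, 80, 100,
                   50, 50, 78, 100,
                   60, 80, 90, 120)
)

result <- df %>%
  group_by(Date) %>%
  summarize(daily_mean = mean(Canopyheight)) %>%
  # Assumption: the first two dates are "pre", the last two are "post"
  mutate(group = rep(c("pre", "post"), each = 2)) %>%
  summarize(
    pre  = mean(daily_mean[group == "pre"]),
    post = mean(daily_mean[group == "post"]),
    Diff = post - pre
  )

result
```

With this data the pre mean is 73.5, the post mean is 78.5, and Diff is 5, matching the table in the question.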

    Based on further data from the OP, here is a more general version of the solution:

    library(dplyr)
    
    
    df <- structure(list(Plot = c("TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", "TF_103B1", 
    "TF_103B1", "TF_103B1"), Date = structure(c(18217, 18217, 18217, 
    18217, 18218, 18218, 18218, 18218, 18219, 18219, 18219, 18219, 
    18220, 18221, 18221, 18221, 18221, 18222, 18222, 18222, 18222, 
    18246, 18246, 18246, 18246, 18247, 18247, 18247, 18247, 18248, 
    18248, 18248, 18248, 18249, 18250, 18250, 18250, 18250, 18251, 
    18251, 18251, 18251), class = "Date"), Time = c("1", "4", "6", 
    "22", "1", "4", "6", "22", "1", "4", "6", "22", "22", "1", "4", 
    "6", "22", "1", "4", "6", "22", "1", "4", "6", "22", "1", "4", 
    "6", "22", "1", "4", "6", "22", "22", "1", "4", "6", "22", "1", 
    "4", "6", "22"), Canopyheight = c(2064.55, 2064.51, 2063.03, 
    2063.62, 2065.94, 2064.83, 2061.58, 2064.07, 2066.97, 2063.99, 
    2065.37, 2064.7, 2067.8, 2065.6, 2067.05, 2064.95, 2075.76, 2073.06, 
    2079.23, 2072.75, 2068.81, 2065.66, 2065.85, 2065.65, 2063.65, 
    2063.44, 2068.05, 2072.38, 2067.2, 2068.1, 2067.26, 2069.27, 
    2063.05, 2088.45, 2086.24, 2088.91, 2092.04, 2092, 2092.67, 2090.7, 
    2091.59, 2090.99)), row.names = c(NA, 42L), class = "data.frame")
    
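    # Daily means, then label the dates in blocks of six: three "pre", three "post"
    df <- df %>%
      group_by(Date) %>%
      summarize(mean = mean(Canopyheight)) %>%
      mutate(prepost = rep(rep(c("pre", "post"), each = 3), length.out = n()))
    
    # Tag each block of six dates with the date in its sixth row
    # (this assumes the number of distinct dates is a multiple of six)
    df$start_date <- rep(df$Date[seq(nrow(df)) %% 6 == 0], each = 6)
    
    # Average the daily means within each block and pre/post label
    df %>%
      group_by(start_date, prepost) %>%
      summarize(mean = mean(mean))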
    #> # A tibble: 4 x 3
    #> # Groups:   start_date [2]
    #>   start_date prepost  mean
    #>   <date>     <chr>   <dbl>
    #> 1 2019-11-22 post    2070.
    #> 2 2019-11-22 pre     2064.
    #> 3 2019-12-21 post    2090.
    #> 4 2019-12-21 pre     2067.
    

    Created on 2020-02-21 by the reprex package (v0.3.0)
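In the code above, `start_date` ends up holding the date in the sixth row of each block, and the `rep(..., each = 6)` assignment only works when the number of distinct dates is a multiple of six. A possible alternative, sketched below on made-up daily means, derives the block and the pre/post label from the row index with integer arithmetic. The data frame `daily` and the columns `idx`, `block`, and `daily_mean` are hypothetical names, and the six-day block with a three/three split is the same assumption the answer makes.

```r
library(dplyr)

# Hypothetical daily means: two windows of six consecutive days each
daily <- data.frame(
  Date = as.Date("2019-11-17") + c(0:5, 29:34),
  daily_mean = c(10, 20, 30, 40, 50, 60, 1, 2, 3, 4, 5, 6)
)

blocks <- daily %>%
  arrange(Date) %>%
  mutate(
    idx     = row_number() - 1,                     # zero-based position
    block   = idx %/% 6,                            # 0 for the first six days, 1 for the next six
    prepost = ifelse(idx %% 6 < 3, "pre", "post")   # first three days of a block are "pre"
  ) %>%
  group_by(block) %>%
  mutate(start_date = min(Date)) %>%                # first date of each block
  group_by(start_date, prepost) %>%
  summarize(mean = mean(daily_mean)) %>%
  ungroup()

blocks
```

Because the labels come from `row_number()`, an incomplete final block is simply averaged over the rows it has instead of raising a length error.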

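    【Discussion】:

    • Thank you very much. When I used this on my data I got the error "Column group must be length 18 (the number of rows) or one, not 2". Any idea why I am getting it?
    • Ah yes, @SonisaSharma, that must be because your data has 18 different dates rather than just 2. You could change each = 2 to each = 18. You will probably want a more robust way of doing this, though. That would require assigning a grouping variable that logically ties together the matching "pre" and "post" measurements.
    • Thanks, Allan. I tried that as well but still get the same message.
    • @SonisaSharma I take it you can get as far as summarize(mean = mean(CanopyHeight))? How many rows does that output?
    • I have 846 rows.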
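
The comment above mentions a grouping variable that logically ties matching "pre" and "post" measurements together. One way this could be sketched, assuming that measurement windows are runs of consecutive days separated by longer gaps and that the first half of each window is "pre", is to start a new window whenever the gap to the previous date exceeds one day. The data and the names `daily`, `window`, and `daily_mean` are hypothetical; the real data may need a different rule.

```r
library(dplyr)

# Hypothetical daily means with a long gap between two measurement windows
daily <- data.frame(
  Date = as.Date("2019-11-17") + c(0:3, 30:33),
  daily_mean = c(1, 3, 5, 7, 10, 20, 30, 40)
)

grouped <- daily %>%
  arrange(Date) %>%
  # Start a new window whenever the gap to the previous date exceeds one day
  mutate(window = cumsum(c(TRUE, as.numeric(diff(Date)) > 1))) %>%
  group_by(window) %>%
  # Assumption: the first half of each window is "pre", the second half "post"
  mutate(prepost = ifelse(row_number() <= n() / 2, "pre", "post")) %>%
  group_by(window, prepost) %>%
  summarize(mean = mean(daily_mean)) %>%
  ungroup()

grouped
```

Since every label is computed per row inside `mutate()`, its length always matches the number of rows, which avoids the "must be length ... or one" error quoted in the first comment.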