Calculate mean after last time of specific column specification: answers

[Question title]: Calculate mean after last time of specific column specification
[Posted]: 2018-10-03 07:57:26
[Question description]:

example.df <- data.frame(GY = sample(300:600, 200, replace = TRUE),
                         sacc = rep("f", 200),
                         trial.number = rep(1:2, each = 100),
                         stringsAsFactors = FALSE)
example.df$sacc[50:70] <- "s"
example.df$sacc[164:170] <- "s"

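My data looks similar to this. I want to calculate the mean of GY over all the remaining rows where sacc is "f", after the last occurrence of "s". In this example I could of course just average over indices 71:100, but that is not possible in the real data.

What I tried after Ronak's comment (thanks!):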

library(dplyr)
example.df %>%
   group_by(trial.number) %>%
   summarise(mean_tr = mean(GY[(max(which(sacc == "s")) + 1) : n()])) %>%
   data.frame()

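I can't get it to work. Can anyone help me? My original data.frame has 70k rows and consists of many variables. Its class is data.frame.

[Question discussion]:

Tags: r

[Solution 1]:

Update

Since this needs to be done by group, we can split the data on trial.number and then apply the same operation to each group.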

sapply(split(example.df, example.df$trial.number), function(x)
         mean(x$GY[(max(which(x$sacc == "s")) + 1) : nrow(x)]))

#   1        2 
#446.2333 471.7000

The same can be done with dplyr:

library(dplyr)
example.df %>%
   group_by(trial.number) %>%
   summarise(mean_tr = mean(GY[(max(which(sacc == "s")) + 1) : n()])) %>%
   data.frame()

# trial.number  mean_tr
#1            1 446.2333
#2            2 471.7000
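
The parentheses around max(...) + 1 in the expression above matter. As a small sketch with made-up numbers (last_s and n are hypothetical stand-ins for the last "s" position and the group size), R's `:` operator binds more tightly than binary `+`, so dropping the parentheses gives a different index vector:

```r
last_s <- 5   # hypothetical position of the last "s" in a group
n      <- 8   # hypothetical number of rows in that group

# Intended range: every position after the last "s"
intended <- (last_s + 1):n

# Without parentheses this is parsed as last_s + (1:n),
# which has n elements and runs past the end of the group
unparenthesized <- last_s + 1:n

# Inside summarise(), GY is a plain vector, and nrow() of a vector is NULL,
# which is why n() (or length(GY)) is used for the group size instead
vector_has_no_nrow <- is.null(nrow(c(1, 2, 3)))
```

Here intended covers positions 6 to 8, while unparenthesized has eight elements starting at 6.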

To confirm again,

mean(example.df$GY[71:100])
#[1] 446.2333

mean(example.df$GY[171:200])
#[1] 471.7

Original answer

We can do

mean(example.df$GY[(max(which(example.df$sacc == "s")) + 1) : nrow(example.df)])
#[1] 443.6667

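Here, we first get all the indices where sacc is "s", then take their max to get the last occurrence. We then take the mean of the GY values from the index after that one to the end of the data frame (nrow(example.df)).

To confirm,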
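
The individual steps can be traced on a toy example. This is only an illustrative sketch; the vectors below are made up and are not taken from the question's data:

```r
# Toy vectors (hypothetical values for illustration)
sacc <- c("f", "s", "s", "f", "s", "f", "f")
GY   <- c(10, 20, 30, 40, 50, 60, 80)

s_idx  <- which(sacc == "s")        # positions of every "s"
last_s <- max(s_idx)                # position of the last "s"
after  <- (last_s + 1):length(GY)   # positions after the last "s"

result <- mean(GY[after])           # mean of the GY values at those positions
```

With these values, s_idx is 2 3 5, last_s is 5, after is 6 7, and result is the mean of 60 and 80.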

mean(example.df$GY[71:100])
#[1] 443.6667

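[Discussion]:

I should have known that. I tried to do the same with dplyr's "last", but that yields a value rather than an index number. Thanks, and sorry that it was this simple ;)
I can't get it to work in dplyr. For data.frames similar to example.df, what I did was group_by(trial.number) %>% and then summarise(meanGY = mean(GY[max(which(sacc == "s"))+1:nrow(GY)])), but it doesn't work :( Any suggestions?
@BartR There are no groups in the example you shared. Do you want to calculate this mean by group? Can you update the post with such an example?
This does work for example.df. Oddly, when I try exactly the same thing with my normal data.frame, it results in an evaluation error: result would be too long a vector, and a warning: in max(which(sacc == "s")): no non-missing arguments to max; returning -Inf. I don't know why.
@BartR It looks like your data is very large, which is causing problems when calculating the mean.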
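
Another possible explanation for the error reported above, which the thread does not confirm: the warning "no non-missing arguments to max; returning -Inf" is what max() gives when which() finds no match, so some groups in the real data may contain no "s" at all. The sketch below guards against that case. The helper name mean_after_last and the toy data are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: mean of `values` after the last position where
# `flag == target`. Returns NA when the target never occurs in the group
# or when it sits on the group's final row (nothing comes after it).
mean_after_last <- function(values, flag, target = "s") {
  idx <- which(flag == target)
  if (length(idx) == 0) return(NA_real_)         # no "s" in this group
  last <- max(idx)
  if (last == length(values)) return(NA_real_)   # "s" is on the last row
  mean(values[(last + 1):length(values)])
}

# Toy data (made up): trial 2 contains no "s"
toy <- data.frame(
  GY = c(10, 20, 30, 40, 50, 60),
  sacc = c("f", "s", "f", "f", "f", "f"),
  trial.number = rep(1:2, each = 3),
  stringsAsFactors = FALSE
)

# Same split/sapply pattern as in the answer, with the guarded helper
res <- sapply(split(toy, toy$trial.number),
              function(x) mean_after_last(x$GY, x$sacc))
```

The same helper could be called inside the dplyr version as summarise(mean_tr = mean_after_last(GY, sacc)). Groups without an "s" then show NA instead of stopping the whole computation.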