How to subtract the first entry from the last entry in grouped data

【Question title】: How to subtract first entry from last entry in grouped data
【Posted】: 2014-05-13 14:21:13
【Question】:

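I would like some help with the following task: from the data frame (C) below, for each id, I want to subtract the first entry in column d_2 from the final entry and store the results in another data frame containing the same ids. I can then merge it with my initial data frame. Note that the subtraction must be in this order (last entry minus first entry for each id).

The code is as follows: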

id <- c("A1", "A1", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")

d_1 <- c( rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))

set.seed(2)

d_2 <- round(runif(13, -1, 1), 2)

C <- data.frame(id, d_1, d_2)

id   d_1   d_2
A1   1.15 -0.63
A1   1.15  0.40
B10  1.44  0.15
B10  1.44 -0.66
B500 1.34  0.89
B500 1.34  0.89
C100 1.50 -0.74
C100 1.50  0.67
C100 1.50 -0.06
D40  1.90  0.10
D40  1.90  0.11
G100 1.59 -0.52
G100 1.59  0.52

Desired result:

id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")

difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)

diff_df <- data.frame(id2, difference)

id2    difference
A1        1.03
B10      -0.81
B500      0.00
C100      0.68
D40       0.01
G100      1.04

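I tried using ddply to get the first and last entries, but I am really struggling to write the function argument in the second call (below) to get the desired result.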

C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])

ddply(C_1, .(patient), function )

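To be honest, I am not very familiar with ddply (from the plyr package); I got the code above from another post on Stack Exchange.

My original data is a groupedData object, and I believe another approach would be to use gapply, but again I am stuck on the third argument (usually a function).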

grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean, labels = list( x = "", y = ""), units = list(""))

x1 <- gapply(grouped_C, "d_2", first_entry)

x2 <- gapply(grouped_C, "d_2", last_entry)

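first_entry and last_entry would be functions that return the first and last entries. I could then get the difference with x2 - x1. However, I am not sure what to supply as first_entry and last_entry in the code above (possibly something involving head or tail?).

Any help would be much appreciated.

【Comments】:

Tags: r dataframe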

【Solution 1】:

This is easy to do with dplyr. Its last and first functions are well suited to this task.

library(dplyr)               # load dplyr (install it first with install.packages("dplyr"))

diff_df <- C %>%             # start from your data.frame C and store the final result in diff_df; the %>% operator chains the steps so you don't have to reference C each time
  group_by(id) %>%            # group the whole data.frame C by id
  summarize(difference = last(d_2) - first(d_2))     # for each id, create a one-row summary: the last entry of d_2 minus the first entry of d_2 in that group

#    id difference             #this is the result stored in diff_df
#1   A1       1.03
#2  B10      -0.81
#3 B500       0.00
#4 C100       0.68
#5  D40       0.01
#6 G100       1.04

Edit note: updated the post to use %>% instead of the deprecated %.%.
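For readers without dplyr, here is a minimal base R sketch of the same idea. It is not part of the original answer. The helper names first_entry and last_entry are hypothetical, borrowed from the placeholders in the question, and the d_2 values are typed in from the table in the question rather than regenerated with runif. The last line merges the result back into C, as the question intends.

```r
# Data copied from the table in the question (d_2 entered literally,
# so the example does not depend on the random number generator).
C <- data.frame(
  id  = c("A1", "A1", "B10", "B10", "B500", "B500",
          "C100", "C100", "C100", "D40", "D40", "G100", "G100"),
  d_1 = c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2),
          rep(1.50, 3), rep(1.90, 2), rep(1.59, 2)),
  d_2 = c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89,
          -0.74, 0.67, -0.06, 0.10, 0.11, -0.52, 0.52)
)

# Hypothetical helpers, named after the placeholders in the question.
first_entry <- function(x) head(x, 1)
last_entry  <- function(x) tail(x, 1)

# Last minus first d_2 within each id; tapply returns one named value per id.
difference <- tapply(C$d_2, C$id, function(x) last_entry(x) - first_entry(x))

# Rounding removes floating-point noise from the subtraction.
diff_df <- data.frame(id = names(difference),
                      difference = round(as.vector(difference), 2))

# Attach the per-id difference to every row of the original data.
C_merged <- merge(C, diff_df, by = "id")
```

Because the rows within each id are used in their existing order, this relies on C already being sorted the way you want "first" and "last" to be interpreted.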

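【Comments】:

Thank you for your answer, much appreciated. Would you mind explaining the syntax and the individual parts of the code? This is my first time using this package.
@John Sure. I added comments to my answer explaining the operations. For more details on dplyr, see the package introduction (cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html).
@beginneR Cool name, by the way! Thank you very much, the comments are very helpful, but I will go through the documentation as well. Thanks!
@docendo Warning message: 1: '%.%' is deprecated. Please use '%>%' instead.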

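【Solution 2】:

If some ids have only a single row (singletons) and you need to leave their values unchanged, this will handle it. It is the same as docendo discimus's answer, but adds an if-else to deal with the singleton case: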

library(dplyr)               
diff_df <- C %>%             
   group_by(id) %>%
   summarize(difference = if(n() > 1) last(d_2) - first(d_2) else d_2)
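To make the singleton behaviour concrete, here is a small base R sketch of the same if-else rule. The data frame S and the id "Z9" are made up for illustration, and last_minus_first is a hypothetical helper name, not something from the original answer.

```r
# Hypothetical data: id "Z9" has only one row.
S <- data.frame(
  id  = c("A1", "A1", "Z9"),
  d_2 = c(-0.63, 0.40, 0.25)
)

# Same rule as the if-else above: last minus first for groups with more
# than one row, the value itself for a single-row group.
last_minus_first <- function(x) {
  if (length(x) > 1) tail(x, 1) - head(x, 1) else x
}

res <- tapply(S$d_2, S$id, last_minus_first)
singleton_df <- data.frame(id = names(res), difference = as.vector(res))
```

Without the length check, a single-row group would yield 0, because its first and last entries are the same element. With the check, "Z9" keeps its original d_2 value.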

【Comments】: