【Posted】: 2016-06-19 07:35:03
【Problem description】:
My data is structured as follows:
library(dplyr)  # provides %>%, arrange, group_by, mutate, lag

Athletes <- c("Gus", "Hudson", "Bobby", "Tom")
set.seed(400)
# Shorter vectors are recycled to the 1600 rows implied by Name
RawData <- data.frame(Name = rep(Athletes, each = 400),
                      Quarter = as.numeric(rep(1:4, each = 100)),
                      Sample = as.numeric(rep(1:100, each = 1)),
                      X = runif(400, 26, 30),
                      Y = runif(400, 12, 16))
I want to calculate the displacement between successive X and Y pairs, by Sample, for each Athlete in each Quarter. To do this, I set up the following code:
DistanceOutput <- RawData %>%
  arrange(Name, Sample, Quarter) %>%
  group_by(Name, Quarter) %>%
  mutate(lagX = lag(X, order_by = Sample),
         lagY = lag(Y, order_by = Sample)) %>%
  rowwise() %>%
  mutate(Distance = dist(matrix(c(X, Y, lagX, lagY), nrow = 2, byrow = TRUE))) %>%
  select(-lagX, -lagY)
However, this returns a data.frame structured as follows:
> head(DistanceOutput, n=10)
Source: local data frame [10 x 6]
Name Quarter Sample X Y Distance
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Bobby 1 1 27.82656 13.85830 NA
2 Bobby 2 1 27.37298 15.67940 NA
3 Bobby 3 1 28.74274 12.25703 NA
4 Bobby 4 1 26.63564 13.07924 NA
5 Bobby 1 2 26.32446 12.64722 1.929508
6 Bobby 2 2 26.88957 14.52096 NA
7 Bobby 3 2 27.53932 15.57959 3.533781
8 Bobby 4 2 28.03031 12.70763 1.443328
9 Bobby 1 3 29.68239 13.82739 3.559287
10 Bobby 2 3 29.43869 12.60890 3.186531
Instead, I would like my data to be arranged as follows:
> head(DistanceOutput, n=3)
Source: local data frame [10 x 6]
Name Quarter Sample X Y Distance
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
1 Bobby 1 1 27.82656 13.85830 NA
2 Bobby 1 2 26.32446 12.64722 1.929508
3 Bobby 1 3 29.68239 13.82739 3.559287
How do I correctly set up the group_by and arrange statements in dplyr so that the result reflects my desired output?
Thanks.
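For reference, the desired row order suggests sorting by Quarter before Sample. Below is a minimal sketch of one possible approach, not a confirmed answer. It assumes the displacement meant here is the Euclidean distance between consecutive samples within each Name and Quarter group. The closed-form sqrt expression is swapped in for the rowwise dist call, and the ungroup() step is an addition for illustration.

```r
library(dplyr)

# Same data as in the question
Athletes <- c("Gus", "Hudson", "Bobby", "Tom")
set.seed(400)
RawData <- data.frame(Name = rep(Athletes, each = 400),
                      Quarter = as.numeric(rep(1:4, each = 100)),
                      Sample = as.numeric(rep(1:100, each = 1)),
                      X = runif(400, 26, 30),
                      Y = runif(400, 12, 16))

DistanceOutput <- RawData %>%
  # Sort by Quarter before Sample so rows within a quarter are consecutive
  arrange(Name, Quarter, Sample) %>%
  group_by(Name, Quarter) %>%
  # Assumed definition: Euclidean distance to the previous sample in the group;
  # the first sample of each group has no predecessor, so it is NA
  mutate(Distance = sqrt((X - lag(X))^2 + (Y - lag(Y))^2)) %>%
  ungroup()

head(DistanceOutput, n = 3)
```

Because the rows are already sorted before grouping, lag() needs no order_by argument in this sketch, and the vectorised formula avoids the rowwise() step.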
【Discussion】:
-
Sorry, and thank you for letting me know that I had not included set.seed.