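【Posted at】: 2018-04-04 14:44:08
【Problem description】:
I want to create a rolling function that conditionally counts occurrences across two columns in the preceding rows.
For example, I have a dataset like the one below.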
# Generate data
set.seed(123)
test <- data.frame(
  Round = rep(1:5, times = 3),
  Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = sample(sample(c("Venue A", "Venue B"), 15, replace = TRUE))
)
   Round   Team   Venue
1      1 Team 1 Venue B
2      2 Team 1 Venue B
3      3 Team 1 Venue A
4      4 Team 1 Venue A
5      5 Team 1 Venue B
6      1 Team 2 Venue B
7      2 Team 2 Venue B
8      3 Team 2 Venue A
9      4 Team 2 Venue A
10     5 Team 2 Venue A
11     1 Team 3 Venue B
12     2 Team 3 Venue A
13     3 Team 3 Venue B
14     4 Team 3 Venue B
15     5 Team 3 Venue B
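I want a new column that shows, for each row, how many times that row's team has played at that row's venue in the current round and the 3 rounds before it (so the current row always counts once).
I can do this easily with a for loop.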
window <- 3
# Pre-allocate the result column
test$VenueCount <- NA_integer_
for (i in seq_len(nrow(test))) {
  # Rows to search: from `window` rows back through the current row (never before row 1)
  index <- max(i - window, 1):i
  # Count rows in that range that match the current row's team and venue
  test$VenueCount[i] <- sum(test$Team[i] == test$Team[index] & test$Venue[i] == test$Venue[index])
}
   Round   Team   Venue VenueCount
1      1 Team 1 Venue B          1
2      2 Team 1 Venue B          2
3      3 Team 1 Venue A          1
4      4 Team 1 Venue A          2
5      5 Team 1 Venue B          2
6      1 Team 2 Venue B          1
7      2 Team 2 Venue B          2
8      3 Team 2 Venue A          1
9      4 Team 2 Venue A          2
10     5 Team 2 Venue A          3
11     1 Team 3 Venue B          1
12     2 Team 3 Venue A          1
13     3 Team 3 Venue B          2
14     4 Team 3 Venue B          3
15     5 Team 3 Venue B          3
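However, I would like to avoid the for loop, mainly because my actual dataset is relatively large (about 30k rows). I think this should be possible with one of zoo, dplyr, purrr, or apply, but I have not been able to work it out.
Thanks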
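For reference, here is a minimal base R sketch of the same count without a per-row loop. It assumes the semantics of the for loop above: the current row plus up to `window` rows before it, matched on both team and venue. The helper name `rolling_venue_count` is hypothetical, and the example data is typed in from the table above rather than regenerated with `set.seed`, since `sample` output can differ between R versions. It loops only over the window offsets (3 iterations here) and is vectorized over rows.

```r
# Hypothetical helper: counts, for each row, how many rows in
# (i - window):i share that row's team and venue (current row included).
rolling_venue_count <- function(team, venue, window = 3) {
  team <- as.character(team)
  venue <- as.character(venue)
  n <- length(team)
  count <- rep(1L, n)  # every row matches itself
  for (k in seq_len(max(0, min(window, n - 1)))) {
    cur <- (k + 1):n   # rows that have a row k positions earlier
    prev <- 1:(n - k)  # the rows k positions earlier
    hit <- team[cur] == team[prev] & venue[cur] == venue[prev]
    count[cur] <- count[cur] + hit
  }
  count
}

# Example data copied from the table in the question
example <- data.frame(
  Round = rep(1:5, times = 3),
  Team = rep(c("Team 1", "Team 2", "Team 3"), each = 5),
  Venue = paste("Venue", c("B", "B", "A", "A", "B",
                           "B", "B", "A", "A", "A",
                           "B", "A", "B", "B", "B")),
  stringsAsFactors = FALSE
)
example$VenueCount <- rolling_venue_count(example$Team, example$Venue, window = 3)
print(example$VenueCount)
# [1] 1 2 1 2 2 1 2 1 2 3 1 1 2 3 3
```

Like the loop, this works on row positions, so it relies on the rows being sorted by team and then by round.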
【Discussion】: