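OK, this works for the sample data. It would be best to run it on data with more subjects and with values in the columns greater than 1. I'm assuming it's a data.table object called dt.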
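To make the steps reproducible, here is a minimal sketch that rebuilds the sample data from the input columns shown in the output table at the end of this answer. The original column order and column types are assumptions; only the values come from that table.

```r
library(data.table)

# Sample data reconstructed from the printed output at the end of this answer.
# Column order and numeric types are assumed, not taken from the question.
dt <- data.table(
  subject = 1021L,
  stim1   = c(51, 48, 49, 48, 49, 46),
  stim2   = c(50, 50, 47, 46, 51, 47),
  Chosen  = c(50, 50, 49, 48, 49, 46)
)
```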
1. Index
Merge operations can easily change the row ordering, so don't rely on row numbers; instead create a rowid by subject. .N is data.table syntax for the length / number of rows.
# order matters, so make a rowid
dt[, rowid := 1:.N, by=subject]
# sets orders and indexing to make it quicker
setkey(dt, subject, rowid)
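2. Seen cols
stim1 and stim2 need to be combined into a single column. Do this with melt, going from wide to long format.
Then assign seen:=0:(.N-1) grouped by subject and value, which gives the cumulative number of occurrences by row. The count starts at 0 rather than 1 because we only want prior occurrences.
Then we merge twice, because we want to compare against both stim cols.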
# for seen, melt wide to long
dt_seen <- melt(dt,
id.vars = c("subject", "rowid"),
measure.vars = c("stim1", "stim2"))
# interested in finding occurrences
dt_seen <- unique(dt_seen[, .(subject, rowid, value)])
setorder(dt_seen, rowid)
dt_seen[, seen:=0:(.N-1), by=.(subject, value)]
# merge across twice
dt <- merge(dt, dt_seen,
by.x=c("subject", "rowid", "stim1"),
by.y=c("subject", "rowid", "value"),
all.x=TRUE, sort=FALSE)
setnames(dt, "seen", "stim1_seen")
dt <- merge(dt, dt_seen,
by.x=c("subject", "rowid", "stim2"),
by.y=c("subject", "rowid", "value"),
all.x=TRUE, sort=FALSE)
setnames(dt, "seen", "stim2_seen")
dt[]
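
As a small illustration of the 0:(.N-1) step above, here is a sketch on a toy table (the toy object and its values are made up for illustration and are not part of the question's data). Each row gets the number of earlier rows in the same subject/value group:

```r
library(data.table)

# Hypothetical toy data: value 50 appears on rows 1, 3 and 4
toy <- data.table(subject = 1L, rowid = 1:4, value = c(50, 48, 50, 50))

# within each (subject, value) group, number the rows starting from 0,
# so each row holds the count of prior occurrences of its value
toy[, seen := 0:(.N - 1), by = .(subject, value)]
toy$seen  # 0 0 1 2
```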
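3. Chosen
I've been lazy and effectively done the same as section (2), but first filtering to the rows where Chosen matches the stim value. I've also done the columns one at a time rather than together, as these cols are independent. The process is the same for stim1 and stim2, so it could be tidied up a bit.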
# turn Chosen from wide to long
dt_chosen <- melt(dt,
id.vars = c("subject", "rowid"),
measure.vars = c("Chosen"))
# interested in finding occurrences
# need to expand
dt_chosen[, variable := NULL]
# going to expand the grid, so can look at e.g. value 50 for all rowids
library(tidyr)
dt_chosen[, chosen_row := 1]
dt_chosen_full <- expand(dt_chosen, nesting(subject, rowid), value) %>% setDT
# pull in the actual data and fill rest with 0's
dt_chosen_full <- merge(dt_chosen_full, dt_chosen, by=c("subject", "rowid", "value"),
all.x=TRUE)
dt_chosen_full[is.na(chosen_row), chosen_row := 0]
# use cumsum to identify now the cumulative count of these across the full row set
dt_chosen_full[, chosen := cumsum(chosen_row), by=.(subject, value)]
# we want the prior count, so on the row where the value is chosen subtract one (the update happens after the row)
dt_chosen_full[chosen_row==1, chosen := chosen-1]
# merge across twice
dt <- merge(dt, dt_chosen_full[, -"chosen_row"],
by.x=c("subject", "rowid", "stim1"),
by.y=c("subject", "rowid", "value"),
all.x=TRUE, sort=FALSE)
setnames(dt, "chosen", "stim1_chosen")
dt[is.na(stim1_chosen), stim1_chosen := 0]
dt <- merge(dt, dt_chosen_full[, -"chosen_row"],
by.x=c("subject", "rowid", "stim2"),
by.y=c("subject", "rowid", "value"),
all.x=TRUE, sort=FALSE)
setnames(dt, "chosen", "stim2_chosen")
dt[is.na(stim2_chosen), stim2_chosen := 0]
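
The cumsum-then-subtract step above can be checked on a plain vector. This is only a sketch with a made-up 0/1 indicator; subtracting the indicator itself from its running total is equivalent to subtracting one on the rows where the indicator is 1:

```r
# Hypothetical indicator: 1 on rows where a given value was chosen
chosen_row <- c(1, 1, 0, 0, 1)

# the running total includes the current row, so take the current row off
# to get the number of times the value was chosen before this row
prior_chosen <- cumsum(chosen_row) - chosen_row
prior_chosen  # 0 1 2 2 2
```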
Output
dt[]
   subject rowid stim2 stim1 Chosen stim1_seen stim2_seen stim1_chosen stim2_chosen
1:    1021     1    50    51     50          0          0            0            0
2:    1021     2    50    48     50          0          1            0            1
3:    1021     3    47    49     49          0          0            0            0
4:    1021     4    46    48     48          1          0            0            0
5:    1021     5    51    49     49          1          1            1            0
6:    1021     6    47    46     46          1          1            0            0