【Posted】: 2020-09-19 05:32:14
【Problem description】:
I have two data frames:
das <- data.frame(val=1:20,
type =c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C"),
weigh=c(20,22,23,32,34,54,19,22,24,26,31,34,36,37,51,54,31,35,43,45))
mapper <- data.frame(type=c("A","A","A","A","B","B","B","B","C","C","C","C"),
start = c(19,23,27,37, 17,25,39,50, 17,23,33,39),
end = c(23,27,37,55, 25,39,50,60, 23,33,39,48))
The expected output is:
   val type weigh labelweight
1    1    A    20        A_19
2    2    A    22        A_19
3    3    A    23        A_23
4    4    A    32        A_27
5    5    A    34        A_27
6    6    A    54        A_37
7    7    B    19        B_17
8    8    B    22        B_17
9    9    B    24        B_17
10  10    B    26        B_25
11  11    B    31        B_25
12  12    B    34        B_25
13  13    B    36        B_25
14  14    B    37        B_25
15  15    B    51        B_50
16  16    B    54        B_50
17  17    C    31        C_23
18  18    C    35        C_33
19  19    C    43        C_39
20  20    C    45        C_39
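I can get the expected output with the following code:
library(dplyr)
# Join every row of das to every mapper row of the same type
p <- left_join(das, mapper, by = "type")
# Keep only the interval that contains weigh (start inclusive, end exclusive), then build the label
q <- p %>% filter(weigh >= start & weigh < end) %>% mutate(labelweight = paste0(type, "_", start))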
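On large datasets, this code fails with "Error: vector memory exhausted (limit reached?)".
Is there a more efficient way to get the desired output without doing the join?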
【Discussion】:
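One possible direction, offered as a sketch rather than a definitive answer: the join pairs every row of das with every mapper row of the same type (80 intermediate rows for the 20-row example), and that intermediate table is the likely source of the memory error on large data. Base R's findInterval() can look up the matching interval directly, one type at a time, without building it. The helper name label_weights is made up for illustration, and the sketch assumes the intervals within each type do not overlap. Unlike the filter() version, rows whose weigh falls outside every interval are kept with an NA label instead of being dropped.

```r
das <- data.frame(
  val = 1:20,
  type = rep(c("A", "B", "C"), times = c(6, 10, 4)),
  weigh = c(20, 22, 23, 32, 34, 54, 19, 22, 24, 26,
            31, 34, 36, 37, 51, 54, 31, 35, 43, 45)
)
mapper <- data.frame(
  type = rep(c("A", "B", "C"), each = 4),
  start = c(19, 23, 27, 37, 17, 25, 39, 50, 17, 23, 33, 39),
  end = c(23, 27, 37, 55, 25, 39, 50, 60, 23, 33, 39, 48)
)

# Hypothetical helper: label each das row with "<type>_<start>" of the
# interval [start, end) that contains its weigh value.
# Assumes intervals within a type do not overlap.
label_weights <- function(das, mapper) {
  out <- rep(NA_character_, nrow(das))
  for (ty in as.character(unique(mapper$type))) {
    m <- mapper[mapper$type == ty, ]
    m <- m[order(m$start), ]             # findInterval needs sorted breakpoints
    rows <- which(das$type == ty)
    w <- das$weigh[rows]
    idx <- findInterval(w, m$start)      # index of the last start <= w, 0 if none
    idx[idx == 0] <- NA
    ok <- !is.na(idx) & w < m$end[idx]   # enforce the exclusive upper bound
    out[rows[ok]] <- paste0(ty, "_", m$start[idx[ok]])
  }
  out
}

das$labelweight <- label_weights(das, mapper)
print(das)
```

Memory use stays proportional to the size of das, because only one label vector is allocated. If many types are involved, a non-equi join in data.table may be another option worth comparing.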