[Posted]: 2021-02-10 07:02:38
[Question]:
Hi, I'm struggling with a "Dictionary is full" error.
Here is the head of the data:
           V1 V2 V3  scaf_name
1: scaffold_0  1  1 scaffold_0
2: scaffold_0  2  1 scaffold_0
3: scaffold_0  3  1 scaffold_0
4: scaffold_0  4  1 scaffold_0
5: scaffold_0  5  1 scaffold_0
6: scaffold_0  6  1 scaffold_0
Here is the code I tried:
tab3 <- tab %>%
  group_by(scaf_name) %>%
  summarise(Avg_group = mean(V3), Length = last(V2))
And this is the error message I get:
Error: Internal error: Dictionary is full!
These are the dimensions of tab:
> dim(tab)
[1] 852355422 4
It seems the data frame is too large for dplyr. Does anyone know how I can solve this?
Thanks a lot.
Here is a small part of the df:
> dput(tab_bis)
structure(list(V1 = c("scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0"), V2 = 1:30, V3 = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), scaf_name = c("scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0",
"scaffold_0", "scaffold_0", "scaffold_0", "scaffold_0")), row.names = c(NA,
-30L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x556f4666b340>)
[Comments]:
-
Can you show a small reproducible example with dput?
-
@akrun Sure, I added a short excerpt of the df at the end.
-
With this data I don't get the error. The size of the data may be what matters.
-
Yes, of course, given that the real data has 852,355,422 rows. Maybe someone knows a way to do the same thing with data this large? ...
-
Since it is a data.table, have you tried the data.table method, i.e. tab[, avg := mean(V3), scaf_name]
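-
For reference, here is a minimal sketch of how the data.table approach suggested in the comment above could reproduce the full dplyr summary (both the mean of V3 and the last V2 per scaffold). `tab_small` is a made-up stand-in for `tab`, invented only for illustration. Nothing in this thread confirms whether this avoids the error on the full 852 million rows, so treat it as something to try rather than a verified fix.

```r
library(data.table)

# Hypothetical toy stand-in for `tab`; the real table has ~852 million rows.
tab_small <- data.table(
  V1 = rep(c("scaffold_0", "scaffold_1"), times = c(4, 2)),
  V2 = c(1:4, 1:2),
  V3 = c(1L, 1L, 2L, 4L, 3L, 5L)
)
tab_small[, scaf_name := V1]  # add the grouping column by reference

# One row per scaffold: mean of V3 and the last V2 value in each group.
# .() builds a summary table; := would instead add a column to every row.
tab3 <- tab_small[, .(Avg_group = mean(V3), Length = last(V2)), by = scaf_name]
print(tab3)
```

Note that `.()` returns one row per group, like `summarise()`, whereas `:=` keeps all rows and only adds a column to the existing table.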