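【Posted】: 2020-11-19 23:54:12
【Question】:
I have a data frame with 2 columns (30 rows): ID and foldChange. For each ID, I want to count how many values it has in total, and how many of them are at or below -2.5, at or above 2.5, or strictly between -2.5 and 2.5.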
dput(df)
structure(list(ID = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB",
"GeneA", "GeneC", "GeneA", "GeneA", "GeneA", "GeneC", "GeneB",
"GeneD", "GeneD", "GeneD", "GeneB", "GeneC", "GeneC", "GeneB",
"GeneE", "GeneB", "GeneC", "GeneE", "GeneD", "GeneD", "GeneD",
"GeneD", "GeneD", "GeneA", "GeneA"), foldChange = c(-5.1600815,
0.2356138, 0.2994572, -1.5287992, 1.1800347, 1.1895113, 0.9141108,
0.9755535, 1.8635915, 3.2866096, -0.8132076, 3.6282988, 0.9746175,
2.023966, -2.1919911, 0.5949673, 1.2257918, -1.3623925, -0.2271354,
1.2196725, 0.8754267, -2.2295773, 1.1893983, 1.5627226, 1.5744269,
0.7333871, 10.8201467, 0.7695394, -1.3149008, -1.3092684)), class = "data.frame", row.names = c(NA,
-30L))
ID foldChange
GeneA -5.1600815
GeneA 0.2356138
GeneA 0.2994572
GeneA -1.5287992
GeneB 1.1800347
GeneA 1.1895113
GeneC 0.9141108
GeneA 0.9755535
GeneA 1.8635915
This gives the number of times each ID occurs:
freq_df = df %>%
group_by(ID) %>%
dplyr::summarise(n = n())
ID n
GeneA 10
GeneB 5
GeneC 5
GeneD 8
GeneE 2
To count, for each ID, how many values have foldChange <= -2.5, foldChange >= 2.5, or fall between these two values, I tried this:
df %>%
group_by(ID) %>%
dplyr::summarise(n = n()) %>%
summarize(up = sum(df$foldChange >= 2.5),
down = sum(df$foldChange <= -2.5),
nosig = sum(df$foldChange > -2.5 & df$foldChange < 2.5))
`summarise()` ungrouping output (override with `.groups` argument)
up down nosig
1 3 1 26
But as you can see, it doesn't work: it just counts over the whole df instead of per ID.
Desired output:
ID n up down nosig
GeneA 10 1 1 8
GeneB 5 1 0 4
GeneC 5 0 0 5
GeneD 8 1 0 7
GeneE 2 0 0 2
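I hope someone can help me with this. Thanks!
【Comments】:
- Try removing every df$. With df$ you are referring to the whole df, not to each group.
- Doesn't work: Error in eval(cols[[col]], .data, parent.frame()) : object 'foldChange' not found
- @Amaranta_Remedios Did you specify dplyr::summarize in both cases?
- @AllanCameron That did it! Thanks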
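Putting the comments together, here is a minimal sketch of what the working version might look like. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and rebuilds df from the dput output in the question; the name `counts` is just illustrative. The two changes are that every df$ is dropped, so each sum() sees only the rows of the current group, and that all four counts are computed in a single dplyr::summarise() call, since foldChange is no longer available once a first summarise has reduced the data to ID and n. The "object 'foldChange' not found" error is consistent with another package's summarize (for example plyr's, if it was loaded after dplyr) masking dplyr's, but that is an assumption about the asker's session.

```r
library(dplyr)

# Data copied from the dput() output in the question.
df <- data.frame(
  ID = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB",
         "GeneA", "GeneC", "GeneA", "GeneA", "GeneA", "GeneC", "GeneB",
         "GeneD", "GeneD", "GeneD", "GeneB", "GeneC", "GeneC", "GeneB",
         "GeneE", "GeneB", "GeneC", "GeneE", "GeneD", "GeneD", "GeneD",
         "GeneD", "GeneD", "GeneA", "GeneA"),
  foldChange = c(-5.1600815, 0.2356138, 0.2994572, -1.5287992, 1.1800347,
                 1.1895113, 0.9141108, 0.9755535, 1.8635915, 3.2866096,
                 -0.8132076, 3.6282988, 0.9746175, 2.023966, -2.1919911,
                 0.5949673, 1.2257918, -1.3623925, -0.2271354, 1.2196725,
                 0.8754267, -2.2295773, 1.1893983, 1.5627226, 1.5744269,
                 0.7333871, 10.8201467, 0.7695394, -1.3149008, -1.3092684)
)

counts <- df %>%
  group_by(ID) %>%
  # Namespace-qualified so a masking summarise from another package cannot interfere.
  dplyr::summarise(
    n     = n(),                                        # values per ID
    up    = sum(foldChange >= 2.5),                     # at or above 2.5
    down  = sum(foldChange <= -2.5),                    # at or below -2.5
    nosig = sum(foldChange > -2.5 & foldChange < 2.5),  # strictly in between
    .groups = "drop"                                    # return an ungrouped result
  )

print(counts)
```

Because the column is referenced by its bare name inside summarise(), dplyr evaluates each expression once per group, which should reproduce the desired table from the question.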