【Question title】: Count frequency by ID and numbers in range
【Posted】: 2020-11-19 23:54:12
【Question description】:

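I have a data frame with 2 columns (30 rows): ID and foldChange. For each ID, I want to count how many values it has in total, and how many of them are at or below -2.5, at or above 2.5, or between -2.5 and 2.5.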

dput(df)
structure(list(ID = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB", 
"GeneA", "GeneC", "GeneA", "GeneA", "GeneA", "GeneC", "GeneB", 
"GeneD", "GeneD", "GeneD", "GeneB", "GeneC", "GeneC", "GeneB", 
"GeneE", "GeneB", "GeneC", "GeneE", "GeneD", "GeneD", "GeneD", 
"GeneD", "GeneD", "GeneA", "GeneA"), foldChange = c(-5.1600815, 
0.2356138, 0.2994572, -1.5287992, 1.1800347, 1.1895113, 0.9141108, 
0.9755535, 1.8635915, 3.2866096, -0.8132076, 3.6282988, 0.9746175, 
2.023966, -2.1919911, 0.5949673, 1.2257918, -1.3623925, -0.2271354, 
1.2196725, 0.8754267, -2.2295773, 1.1893983, 1.5627226, 1.5744269, 
0.7333871, 10.8201467, 0.7695394, -1.3149008, -1.3092684)), class = "data.frame", row.names = c(NA, 
-30L))


ID  foldChange
GeneA   -5.1600815
GeneA   0.2356138
GeneA   0.2994572
GeneA   -1.5287992
GeneB   1.1800347
GeneA   1.1895113
GeneC   0.9141108
GeneA   0.9755535
GeneA   1.8635915

This way I can see how often each ID occurs:

freq_df = df %>%
    group_by(ID) %>%
    dplyr::summarise(n = n()) 

ID      n
GeneA   10
GeneB   5
GeneC   5
GeneD   8
GeneE   2

To get, for each ID, how many values have foldChange <= -2.5, foldChange >= 2.5, or lie between those two values, I do this:

df %>%
    group_by(ID) %>%
    dplyr::summarise(n = n()) %>%
    summarize(up = sum(df$foldChange >= 2.5),
              down = sum(df$foldChange <= -2.5),
              nosig = sum(df$foldChange > -2.5 & df$foldChange < 2.5))
`summarise()` ungrouping output (override with `.groups` argument)
  up down nosig
1  3    1    26

But as you can see, it doesn't work; it just counts over the whole df.

Desired output:

ID  n   up  down    nosig
GeneA   10  1   1   8
GeneB   5   1   0   4
GeneC   5   0   0   5
GeneD   8   1   0   7
GeneE   2   0   0   2

I hope someone can help me with this. Thanks!

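【Comments】:

  • Try removing all the df$. With them you are referring to the whole df, not to each group.
  • Not working: Error in eval(cols[[col]], .data, parent.frame()) : object 'foldChange' not found
  • @Amaranta_Remedios Did you specify dplyr::summarize in both cases?
  • @AllanCameron That did it! Thanks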
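
To illustrate the first comment, here is a minimal sketch of the difference between a bare column name and `df$column` inside a grouped `summarise()`. The data frame `toy` and its column `x` are made up for this example and are not the question's data; the calls are namespace-qualified so that masking by another package cannot interfere.

```r
library(dplyr)

# Toy data for illustration only (not the question's values).
toy <- data.frame(ID = c("a", "a", "b"), x = c(1, 3, 5))

res <- toy %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(
    per_group = sum(x > 2),      # bare x: only the rows of the current group
    whole_df  = sum(toy$x > 2)   # toy$x: always all 3 rows, whatever the group
  )
res
# per_group is 1 for "a" and 1 for "b"; whole_df is 2 in both rows
```

The `whole_df` column repeats the same total for every group, which is the same effect the question ran into with `df$foldChange`.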

Tags: r dplyr


【Solution 1】:

You were close. You can put n() in the same summarise() call in which you calculate up, down and nosig, and, as @Rui Barradas mentioned, remove the df$ inside summarise.

library(dplyr)
df <- structure(list(ID = c("GeneA", "GeneA", "GeneA", "GeneA", "GeneB", 
  "GeneA", "GeneC", "GeneA", "GeneA", "GeneA", "GeneC", "GeneB", 
  "GeneD", "GeneD", "GeneD", "GeneB", "GeneC", "GeneC", "GeneB", 
  "GeneE", "GeneB", "GeneC", "GeneE", "GeneD", "GeneD", "GeneD", 
  "GeneD", "GeneD", "GeneA", "GeneA"), foldChange = c(-5.1600815, 
  0.2356138, 0.2994572, -1.5287992, 1.1800347, 1.1895113, 0.9141108, 
  0.9755535, 1.8635915, 3.2866096, -0.8132076, 3.6282988, 0.9746175, 
  2.023966, -2.1919911, 0.5949673, 1.2257918, -1.3623925, -0.2271354, 
  1.2196725, 0.8754267, -2.2295773, 1.1893983, 1.5627226, 1.5744269, 
  0.7333871, 10.8201467, 0.7695394, -1.3149008, -1.3092684)),
  class = "data.frame", row.names = c(NA, -30L))
df %>% 
  group_by(ID) %>% 
  summarize(
    n = n(),
    up = sum(foldChange >= 2.5),
    down = sum(foldChange <= -2.5),
    nosig = sum(foldChange > -2.5 & foldChange < 2.5)
  )

# A tibble: 5 x 5
  ID        n    up  down nosig
  <chr> <int> <int> <int> <int>
1 GeneA    10     1     1     8
2 GeneB     5     1     0     4
3 GeneC     5     0     0     5
4 GeneD     8     1     0     7
5 GeneE     2     0     0     2
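
The counting in this answer relies on `sum()` treating `TRUE` as 1 and `FALSE` as 0, so summing a comparison gives the number of elements that satisfy it. A small base R sketch, using a made-up vector `x` rather than the question's data:

```r
# Made-up values for illustration only.
x <- c(-5.2, 0.3, 3.3, 1.9)

x >= 2.5                  # FALSE FALSE  TRUE FALSE
sum(x >= 2.5)             # 1: one value is at or above 2.5
sum(x <= -2.5)            # 1: one value is at or below -2.5
sum(x > -2.5 & x < 2.5)   # 2: the remaining two values lie in between
```

The three conditions are mutually exclusive and cover every non-missing value, so up + down + nosig equals n for each ID as long as foldChange contains no NA.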

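【Comments】:

  • Error: `n()` must only be used inside dplyr verbs. Run `rlang::last_error()` to see where the error occurred.
  • I don't get that error message. Which versions of R and dplyr are you using? Do you have other packages loaded?
  • This is dplyr_1.0.2 with R version 4.0.3 (2020-10-10). Thanks
  • It worked when I added dplyr::group_by(ID) %>% dplyr::summarize(....). Thanks!!
  • Glad to hear it. It was probably a naming conflict with the summarize function.
【Solution 2】:

Please try this: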

df%>%
    group_by(ID)%>%
    summarise(n = length(foldChange),
              up = length(foldChange[foldChange>=2.5]),
              down = length(foldChange[foldChange<= -2.5]),
              nosig = length(foldChange[foldChange> -2.5 & foldChange < 2.5])
    )
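
For comparison, the same per-ID counts can be sketched in base R without dplyr, which also sidesteps any question of which package's `summarise` is being called. This is an alternative sketch, not the method from either answer; the helper `count_ranges`, its `cutoff` argument and the data frame `toy` are hypothetical names and values introduced here.

```r
# Toy data for illustration only (not the question's values).
toy <- data.frame(
  ID = c("GeneA", "GeneA", "GeneB", "GeneB"),
  foldChange = c(-5.2, 0.3, 3.6, 2.5)
)

# Hypothetical helper: counts for one ID's vector of fold changes.
count_ranges <- function(x, cutoff = 2.5) {
  c(n     = length(x),
    up    = sum(x >= cutoff),
    down  = sum(x <= -cutoff),
    nosig = sum(x > -cutoff & x < cutoff))
}

# split() gives one vector per ID; sapply() applies the helper to each;
# t() turns the result into one row per ID.
res <- t(sapply(split(toy$foldChange, toy$ID), count_ranges))
res
```

The result is an integer matrix with the IDs as row names; wrapping it in `as.data.frame()` gives a data frame if that is more convenient.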

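【Comments】:

  • Thank you very much! It doesn't work on my machine, but I think that is a dplyr problem. It only works when I write dplyr::summarize.
  • @Amaranta_Remedios Did you load the package with library(dplyr)?
  • Yes, but I usually load other packages the same way. I also loaded plyr; I wonder whether that is causing the problem?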
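
On the plyr question: plyr also exports functions named `summarise` and `summarize`, so if it is attached after dplyr, the bare names resolve to plyr's versions, which do not understand dplyr groups or `n()`. The `eval(cols[[col]], .data, parent.frame())` error quoted in the question's comments looks like it comes from plyr's `summarise`, although that is an inference from the message rather than something the thread confirms. A minimal sketch for checking which package a bare name resolves to, and for calling dplyr explicitly; `toy` is made-up data, not the question's.

```r
library(dplyr)

# Toy data for illustration only (not the question's values).
toy <- data.frame(
  ID = c("GeneA", "GeneA", "GeneB"),
  foldChange = c(-5.2, 0.3, 3.6)
)

# Which package does the bare name `summarise` point to right now?
# This gives "dplyr" unless a package attached later (such as plyr) masks it.
environmentName(environment(summarise))

# Namespace-qualified calls always use dplyr, whatever else is attached.
res <- toy %>%
  dplyr::group_by(ID) %>%
  dplyr::summarise(
    n     = dplyr::n(),
    up    = sum(foldChange >= 2.5),
    down  = sum(foldChange <= -2.5),
    nosig = sum(foldChange > -2.5 & foldChange < 2.5)
  )
res
```

If both packages are needed, attaching plyr before dplyr, or always writing the `dplyr::` prefix as above, avoids the clash.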