【Question title】: Find the number of occurrences of a specific value where the count is greater than a given frequency in R
【Posted】: 2020-12-16 03:32:34
【Question description】:

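I am trying to get the frequency of a value in each column, but only where that frequency exceeds a certain number. My data has multiple columns, and I want code that reports the count of "0" values for every column in which that count is greater than 3.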

My dataset looks like this:

```
a   b   c   d   e   f   g   h 
0   1   0   1   1   1   1   1
2   0   0   0   0   0   0   0
0   1   2   2   2   1   0   1
0   0   0   0   1   0   0   0
1   0   2   1   1   0   0   0
1   1   0   0   1   0   0   0
0   1   2   2   2   2   2   2
```

The output I need is:
```
Variable     Frequency
a            4 
c            4 
f            4
g            5
h            4
```

In other words, it should show the number of "0" values in each column of the data frame, but only for columns where that count is greater than 3.

Thank you.

【Comments】:

    Tags: r dplyr data.table data-cleaning


    【Solution 1】:

    You can use colSums to count the number of 0s in each column, then keep only the counts greater than 3.

    subset(stack(colSums(df == 0, na.rm = TRUE)), values > 3)
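
The one-liner above can be unpacked step by step. The following is a minimal sketch, assuming the same data as in the question; the intermediate names (`is_zero`, `zero_counts`, `keep`, `result`) are illustrative, and the column names `Variable` and `Frequency` are taken from the output requested in the question.

```r
# Same data as in the question (7 rows, columns a to h).
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  c = c(0, 0, 2, 0, 2, 0, 2),
  d = c(1, 0, 2, 0, 1, 0, 2),
  e = c(1, 0, 2, 1, 1, 1, 2),
  f = c(1, 0, 1, 0, 0, 0, 2),
  g = c(1, 0, 0, 0, 0, 0, 2),
  h = c(1, 0, 1, 0, 0, 0, 2)
)

# Step 1: logical matrix that is TRUE wherever a cell equals 0.
is_zero <- df == 0

# Step 2: count the TRUE values per column (NA cells are ignored).
zero_counts <- colSums(is_zero, na.rm = TRUE)

# Step 3: keep only the columns with more than 3 zeros.
keep <- zero_counts[zero_counts > 3]

# Step 4: turn the named vector into a two-column data frame.
result <- data.frame(
  Variable = names(keep),
  Frequency = unname(keep),
  stringsAsFactors = FALSE
)
print(result)
```

Building the data frame by hand instead of using `stack()` is only a naming convenience here: `stack()` returns columns called `values` and `ind`, which would otherwise need renaming.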
    

    The tidyverse way is:

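    library(dplyr)
    df %>%
      # count the 0s in every column (newer dplyr versions expect .cols to be given explicitly)
      summarise(across(everything(), ~sum(.x == 0, na.rm = TRUE))) %>%
      # reshape to one row per column: name = column, value = count
      tidyr::pivot_longer(cols = everything()) %>%
      # keep only the columns with more than 3 zeros
      filter(value > 3)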
    
    #  name  value
    #  <chr> <int>
    #1 a         4
    #2 c         4
    #3 f         4
    #4 g         5
    #5 h         4
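
Since the question is also tagged data.table, the same idea can be sketched there. This is not part of the original answer, only an assumed equivalent; the names `dt`, `zero_counts`, `long`, and `result_dt` are illustrative, and it requires the data.table package to be installed.

```r
library(data.table)

# Same data as in the question, built directly as a data.table.
dt <- data.table(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  c = c(0, 0, 2, 0, 2, 0, 2),
  d = c(1, 0, 2, 0, 1, 0, 2),
  e = c(1, 0, 2, 1, 1, 1, 2),
  f = c(1, 0, 1, 0, 0, 0, 2),
  g = c(1, 0, 0, 0, 0, 0, 2),
  h = c(1, 0, 1, 0, 0, 0, 2)
)

# One-row table holding the number of 0s in each column.
zero_counts <- dt[, lapply(.SD, function(x) sum(x == 0, na.rm = TRUE))]

# Reshape to long form: one row per original column.
long <- melt(
  zero_counts,
  measure.vars = names(zero_counts),
  variable.name = "Variable",
  value.name = "Frequency"
)

# Keep only the columns with more than 3 zeros.
result_dt <- long[Frequency > 3]
print(result_dt)
```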
    

    Data

    df <- structure(list(a = c(0L, 2L, 0L, 0L, 1L, 1L, 0L), b = c(1L, 0L, 
    1L, 0L, 0L, 1L, 1L), c = c(0L, 0L, 2L, 0L, 2L, 0L, 2L), d = c(1L, 
    0L, 2L, 0L, 1L, 0L, 2L), e = c(1L, 0L, 2L, 1L, 1L, 1L, 2L), f = c(1L, 
    0L, 1L, 0L, 0L, 0L, 2L), g = c(1L, 0L, 0L, 0L, 0L, 0L, 2L), h = c(1L, 
    0L, 1L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA, -7L))
    

    【Comments】:
