Find the count of a specific value in each column where it exceeds a specific frequency in R

【Question title】: Find the count of a specific value in each column where it exceeds a specific frequency in R
【Posted】: 2020-12-16 03:32:34
【Question】:

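I am trying to get the frequency of a value in each column, but only where that frequency exceeds a certain number. My data has multiple columns, and I want code that reports the number of "0" values for every column in which "0" occurs more than 3 times.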

My dataset looks like this:

```
a   b   c   d   e   f   g   h 
0   1   0   1   1   1   1   1
2   0   0   0   0   0   0   0
0   1   2   2   2   1   0   1
0   0   0   0   1   0   0   0
1   0   2   1   1   0   0   0
1   1   0   0   1   0   0   0
0   1   2   2   2   2   2   2
```

The output I need is:
```
Variable     Frequency
a            4 
c            4 
f            4
g            5
h            4
```

This shows the number of "0" values in each column of the data frame, for the columns where that count is greater than 3.

Thank you.

【Comments】:

Tags: r dplyr data.table data-cleaning

【Solution 1】:

You can use `colSums` to count the number of 0s in each column, then subset to the counts that are greater than 3.

```
subset(stack(colSums(df == 0, na.rm = TRUE)), values > 3)
```
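To make the one-liner easier to follow, here is a sketch that splits it into separate steps. The data frame holds the same values as the Data section of this answer; the intermediate names `zero_counts`, `counts_long`, and `threshold` are illustrative and not part of the original answer.

```r
# Sample data with the same values as the answer's Data section.
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  c = c(0, 0, 2, 0, 2, 0, 2),
  d = c(1, 0, 2, 0, 1, 0, 2),
  e = c(1, 0, 2, 1, 1, 1, 2),
  f = c(1, 0, 1, 0, 0, 0, 2),
  g = c(1, 0, 0, 0, 0, 0, 2),
  h = c(1, 0, 1, 0, 0, 0, 2)
)

# Step 1: df == 0 is a logical matrix; colSums() counts the TRUEs per column.
zero_counts <- colSums(df == 0, na.rm = TRUE)

# Step 2: stack() turns the named vector into a data frame with the
# columns `values` (the counts) and `ind` (the original column names).
counts_long <- stack(zero_counts)

# Step 3: keep only the columns whose zero count exceeds the threshold.
threshold <- 3  # illustrative name for the cutoff used in the question
result <- subset(counts_long, values > threshold)
print(result)
```

Keeping the cutoff in a variable makes it simple to reuse the same steps with a different threshold or a different target value.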

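The tidyverse way is:

```
library(dplyr)

df %>%
  # Count the zeros in every column; everything() makes the column selection explicit.
  summarise(across(everything(), ~ sum(. == 0, na.rm = TRUE))) %>%
  # Reshape the one-row summary into one row per column.
  tidyr::pivot_longer(cols = everything()) %>%
  # Keep only the columns with more than 3 zeros.
  filter(value > 3)

#  name  value
#  <chr> <int>
#1 a         4
#2 c         4
#3 f         4
#4 g         5
#5 h         4
```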
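If the result should carry the `Variable` and `Frequency` headers requested in the question, one option is to name the columns while reshaping. The following is a sketch under that assumption, using the `names_to` and `values_to` arguments of `tidyr::pivot_longer`; the data frame again holds the same values as the Data section.

```r
library(dplyr)
library(tidyr)

# Sample data with the same values as the answer's Data section.
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  c = c(0, 0, 2, 0, 2, 0, 2),
  d = c(1, 0, 2, 0, 1, 0, 2),
  e = c(1, 0, 2, 1, 1, 1, 2),
  f = c(1, 0, 1, 0, 0, 0, 2),
  g = c(1, 0, 0, 0, 0, 0, 2),
  h = c(1, 0, 1, 0, 0, 0, 2)
)

result <- df %>%
  # Count the zeros in every column.
  summarise(across(everything(), ~ sum(.x == 0, na.rm = TRUE))) %>%
  # Reshape to long form and name the output columns as in the question.
  pivot_longer(
    cols = everything(),
    names_to = "Variable",
    values_to = "Frequency"
  ) %>%
  # Keep only the columns with more than 3 zeros.
  filter(Frequency > 3)

print(result)
```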

Data

```
df <- structure(list(a = c(0L, 2L, 0L, 0L, 1L, 1L, 0L), b = c(1L, 0L, 
1L, 0L, 0L, 1L, 1L), c = c(0L, 0L, 2L, 0L, 2L, 0L, 2L), d = c(1L, 
0L, 2L, 0L, 1L, 0L, 2L), e = c(1L, 0L, 2L, 1L, 1L, 1L, 2L), f = c(1L, 
0L, 1L, 0L, 0L, 0L, 2L), g = c(1L, 0L, 0L, 0L, 0L, 0L, 2L), h = c(1L, 
0L, 1L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA, -7L))
```

【Discussion】: