plyr summarize count error row length: answers

【Question title】: plyr summarize count error row length
【Posted】: 2018-10-26 23:36:27
【Question description】:

Suppose I have the following data:

A <- c(4,4,4,4,4)
B <- c(1,2,3,4,4)
C <- c(1,2,4,4,4)
D <- c(3,2,4,1,4)

filt <- c(1,1,10,8,10)


data <- as.data.frame(rbind(A,B,C,D,filt))
data <- t(data)
data <- as.data.frame(data)

> data
   A B C D filt
V1 4 1 1 3    1
V2 4 2 2 2    1
V3 4 3 4 4   10
V4 4 4 4 1    8
V5 4 4 4 4   10

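After filtering, I want to count how many times each of the values 1, 2, 3 and 4 occurs in each variable. When I try to do this as shown below, I get the error: length(rows) == 1 is not TRUE.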

  data %>%
     dplyr::filter(filt ==1) %>%
      plyr::summarize(A_count = count(A),
                      B_count = count(B))

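I get the error because some of my columns do not contain all of the values 1 to 4. Is there a way to specify which values it should look for, and return 0 for a value that is not found? I'm not sure whether this is possible or how to do it, or whether there is some other workaround.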
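One base-R way to get a 0 for values that never occur is to convert the column to a factor with fixed levels 1:4 before tabulating: `table()` then reports every level, including absent ones. A minimal sketch, assuming the same data as above (rebuilt here with `data.frame()` for brevity; the names `filtered` and `a_counts` are illustrative):

```r
# Same values as in the question, built directly as a data frame
data <- data.frame(
  A = c(4, 4, 4, 4, 4),
  B = c(1, 2, 3, 4, 4),
  C = c(1, 2, 4, 4, 4),
  D = c(3, 2, 4, 1, 4),
  filt = c(1, 1, 10, 8, 10)
)

# Keep only the rows where filt == 1
filtered <- data[data$filt == 1, ]

# levels = 1:4 fixes the set of values to look for, so table()
# returns a count for every level, with 0 for levels that are absent
a_counts <- table(factor(filtered$A, levels = 1:4))
a_counts
```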

Any help is greatly appreciated!!!

【Question discussion】:

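It's not the data %>% dplyr::filter(filt == 1) part that raises the error, so you could drop it and simplify the question to get more to the point (smaller sample data, a single function call, etc.). That will increase your chances of getting an answer.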
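A minimal reproduction along those lines only needs the two filtered columns. The sketch below uses base `table()` as a stand-in for `plyr::count()` (an assumption made to avoid the plyr dependency; both give one row per distinct value). It shows that the per-column summaries end up with different numbers of rows, which appears to be what the `length(rows) == 1` check inside `plyr::summarize()` rejects:

```r
# Columns A and B after filtering on filt == 1
A <- c(4, 4)
B <- c(1, 2)

# One row per distinct value: table() stands in for plyr::count() here
rows_A <- nrow(as.data.frame(table(A)))
rows_B <- nrow(as.data.frame(table(B)))

# 1 row vs 2 rows: summaries of different lengths cannot be combined
# column-wise into a single data frame
c(A = rows_A, B = rows_B)
```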

Tags: r count plyr summarize

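【Solution 1】:

This is a bit unusual, and I didn't use classic plyr, but I think this is roughly what you're looking for. I removed the filter column, filt, so that it doesn't get counted: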

library(dplyr)

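data %>% 
  filter(filt == 1) %>%    # keep only rows where filt == 1
  select(-filt) %>%        # drop filt so it is not counted
  purrr::map_df(function(a_column) {
    # for each value 1..4, count the entries of this column equal to it
    purrr::map_int(1:4, function(num) sum(a_column == num))
  })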

# A tibble: 4 x 4
      A     B     C     D
  <int> <int> <int> <int>
1     0     1     1     0
2     0     1     1     1
3     0     0     0     1
4     2     0     0     0
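For comparison, a base-R sketch of the same result without purrr, assuming the question's data (rebuilt here with `data.frame()`). It swaps the `map_int()` loop for `table()` on a factor with fixed levels, so values that never occur still get a 0:

```r
data <- data.frame(
  A = c(4, 4, 4, 4, 4),
  B = c(1, 2, 3, 4, 4),
  C = c(1, 2, 4, 4, 4),
  D = c(3, 2, 4, 1, 4),
  filt = c(1, 1, 10, 8, 10)
)

# Filter the rows, then drop the filt column so it is not counted
filtered <- data[data$filt == 1, setdiff(names(data), "filt")]

# For each column, count the occurrences of each value 1..4;
# sapply() simplifies the per-column tables into a 4 x 4 matrix
counts <- sapply(filtered, function(col) table(factor(col, levels = 1:4)))
counts
```

Unlike the tibble above, the matrix carries the values 1 to 4 as row names, which makes it explicit which row corresponds to which value.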

【Discussion】:

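Thanks!! I've never used purrr before. Could you explain what function(a_column) is doing? @zack
No worries. purrr is a package that brings functional programming to R in a way that fits the tidyverse coding style. The function(a_column)... part of my code applies the function defined there to each column of the data.frame. Inside it, I wrote an anonymous function that maps over the values you care about (1:4) and counts how often each one appears in that column. I also edited the answer to add braces to make it clearer; I hope that helps.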