【Question title】: How can I summarise numbers of levels (nlevels) in grouped data using dplyr?
【Posted】: 2017-12-22 13:19:24
【Question】:

After grouping, I want to use dplyr's summarise function to get the number of levels of each variable in a data frame. Here is a copy of the data frame:

x=c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y=c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b")
z=c("x","x","x","y","y","p","p","p","p","t","v","v","m","m","n","o","o")
d=data.frame(x,y,z)

This is the code I am using:

   library(dplyr)
   d %>%
   group_by(x) %>%
   summarise(total=n(),
          Y=nlevels(y),
          Z=nlevels(z))

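However, this produces Y and Z columns that report the number of levels in the whole data frame "d", not the number of levels within each group.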
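A minimal base R sketch of why this happens, assuming y is stored as a factor (the default for data.frame() before R 4.0). Subsetting a factor keeps its full set of levels, so nlevels() gives the same answer for every group. The name y_A is only illustrative.

```r
# Build the vectors explicitly as factors so the behaviour does not depend on the R version
x <- c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y <- factor(c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b"))

# Values of y that belong to group "A" (hypothetical helper name)
y_A <- y[x == "A"]

nlevels(y)                # levels in the whole column
nlevels(y_A)              # same number: subsetting does not drop unused levels
nlevels(droplevels(y_A))  # levels actually present in group "A"
length(unique(y_A))       # same count, obtained from the values themselves
```

Only the last two calls reflect what is present in group "A"; the first two both describe the full column.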

The data frame I would like to produce looks like this:

 x=c("A","B","C","D","E")
 total=c(5,4,3,3,2)
 Y=c(3,4,3,3,2)
 Z=c(2,1,2,2,1)
 d2=data.frame(x,total,Y,Z)
 d2

Thanks!

【Comments】:

    Tags: r dplyr


    【Answer 1】:

    You need n_distinct() for this. It counts the unique values actually present in each group:

    d %>%
      group_by(x) %>%
      summarise(total = n(),
                Y = n_distinct(y),
                Z = n_distinct(z))
    

    Result:

    # A tibble: 5 x 4
           x total     Y     Z
      <fctr> <int> <int> <int>
    1      A     5     3     2
    2      B     4     4     1
    3      C     3     3     2
    4      D     3     3     2
    5      E     2     2     1
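If a levels-based count is specifically wanted, one possible variant is to drop unused levels inside each group before calling nlevels(). The sketch below assumes dplyr is installed and sets stringsAsFactors = TRUE so that the columns are factors on R 4.0 and later as well. The column names Y_levels and Z_levels and the object name res are illustrative, not part of the original answer.

```r
library(dplyr)

x <- c("A","A","A","A","A","B","B","B","B","C","C","C","D","D","D","E","E")
y <- c("a","b","c","a","b","a","b","c","d","c","b","e","b","d","f","a","b")
z <- c("x","x","x","y","y","p","p","p","p","t","v","v","m","m","n","o","o")

# stringsAsFactors = TRUE reproduces the pre-4.0 default that the question relies on
d <- data.frame(x, y, z, stringsAsFactors = TRUE)

res <- d %>%
  group_by(x) %>%
  summarise(total    = n(),
            Y        = n_distinct(y),             # unique values in the group
            Y_levels = nlevels(droplevels(y)),    # levels left after dropping unused ones
            Z        = n_distinct(z),
            Z_levels = nlevels(droplevels(z)))

res
```

For this data the two approaches agree. They can differ when a column contains NA: n_distinct() counts NA as a value unless na.rm = TRUE is passed, whereas NA is not a factor level by default.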
    

    【Comments】:
