Count number of unique levels of a variable: answers

【Question title】: Count number of unique levels of a variable
【Posted】: 2016-07-21 00:15:28
【Question】:

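I'm looking for a simple way to count the number of distinct categories in a column of a data frame.

For example, the iris data frame has 150 rows, and one of its columns is Species, which contains 3 different species. I want to be able to run a piece of code and determine that there are 3 different species in that column. I don't care how many rows each unique entry corresponds to, only how many distinct values there are; per-value row counts are mostly what I found in my research.

I was thinking of something like this: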

df <- iris
choices <- count(unique(iris$Species))

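Does a solution this simple exist? I have looked at the posts below, but they either examine the whole data frame rather than a single column of it, or offer more complicated solutions than I was hoping for.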

count number of instances in data frame

Count number of occurrences of categorical variables in data frame (R)

How to count number of unique character vectors within a subset of data

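【Comments】:

Try choices <- length(unique(iris$Species))
@ImranAli As long as I specify choices <- as.numeric(length(unique(iris$Species))) it works perfectly. If you post your comment as an answer, I will mark it as correct.
I have added my comment as an answer.
To get the counts for all columns: lengths(lapply(iris, unique)) stackoverflow.com/questions/22196078/…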

Tags: r dataframe

【Solution 1】:

The following should do the job:

choices <- length(unique(iris$Species))
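
As a hedged aside, `length(unique(x))` counts the values actually present, while `nlevels(x)` counts the levels declared on a factor, so the two can disagree after subsetting. The sketch below uses base R only; the names `species`, `sub`, and `x` are illustrative and do not come from the original post.

```r
# Count distinct values in one column (base R only).
species <- iris$Species

n_values <- length(unique(species))  # values actually present
n_levels <- nlevels(species)         # levels declared on the factor

# length() already returns an integer, so is.numeric() is TRUE
# without an as.numeric() wrapper.
is.numeric(n_values)

# After subsetting, a factor keeps its unused levels:
sub <- species[species != "setosa"]
length(unique(sub))       # 2: only the values still present
nlevels(sub)              # 3: "setosa" is still a declared level
nlevels(droplevels(sub))  # 2: unused levels dropped

# unique() treats NA as a value; filter it out if it should not count.
x <- c("a", "b", NA, "a")
length(unique(x))             # 3
length(unique(x[!is.na(x)]))  # 2
```

Which count is appropriate depends on whether unused factor levels and missing values should be treated as categories.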

【Solution 2】:

If you need the number of unique values in every column of a data.frame, you can use sapply:

sapply(iris, function(x) length(unique(x)))
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
##           35           23           43           22            3

For one specific column, the code suggested by @Imran Ali (in the comments) works perfectly well.
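
One possible variation, offered as a sketch rather than as the answerer's method: `vapply()` states the expected result type up front, so it fails loudly if a column ever yields something other than a single integer. The names `n_unique` and `n_unique_alt` are made up for this example.

```r
# Per-column distinct counts with a declared integer result type.
n_unique <- vapply(iris, function(col) length(unique(col)), integer(1))

# The form suggested in the comments: unique() per column, then lengths().
n_unique_alt <- lengths(lapply(iris, unique))

all(n_unique == n_unique_alt)  # TRUE: both approaches agree
n_unique[["Species"]]          # 3
```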

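【Solution 3】:

If we use dplyr, n_distinct gives the number of unique elements in each column:

library(dplyr)
# Note: summarise_each() and funs() are deprecated in current dplyr;
# the across() form later in this thread is the modern equivalent.
iris %>%
      summarise_each(funs(n_distinct))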
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1           35          23           43          22       3

【Solution 4】:

It is even easier with data.table:

require(data.table)
uniqueN(iris$Species)
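
The same function can, as far as I can tell, be applied to every column at once through `.SD`. This is a sketch that assumes the data.table package is installed; `dt` and `counts` are illustrative names.

```r
library(data.table)  # assumed to be installed

dt <- as.data.table(iris)

# Apply uniqueN() to every column via .SD; the result is a one-row data.table.
counts <- dt[, lapply(.SD, uniqueN)]
counts$Species  # 3

# na.rm = TRUE leaves NA out of the count.
uniqueN(c("a", NA, "a"), na.rm = TRUE)  # 1
```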

【Solution 5】:

Another way to count the unique values of every column in 'iris':

> df <- iris
> df$Species <- as.character(df$Species)
> aggregate(values ~ ind, unique(stack(df)), length)
           ind values
1 Petal.Length     43
2  Petal.Width     22
3 Sepal.Length     35
4  Sepal.Width     23
5      Species      3

【Solution 6】:

Another simple way to count, using the tidyverse packages:

library(dplyr)  # provides count() and %>%

iris %>% 
  count(Species)

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50
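
Note that this output lists the number of rows per species rather than the number of species. Since `count()` returns one row per level, the number of levels can be read off as the row count of that result. A minimal sketch, assuming dplyr is installed; `per_level` and `n_species` are names invented for this example.

```r
library(dplyr)  # assumed to be installed

# count() returns one row per species, with n holding the row count.
per_level <- iris %>% count(Species)

# The number of distinct species is the number of rows in that result.
n_species <- nrow(per_level)  # 3

# Or skip the per-level table entirely:
n_distinct(iris$Species)  # 3
```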

【Solution 7】:

dplyr 1.0.0 introduced across(), which, combined with n_distinct(), makes this task relatively simple:

library(dplyr)

# for a specific column
iris %>% 
  summarise(across(Species, n_distinct))
#   Species
# 1       3

# only for factors
iris %>% 
  summarise(across(where(is.factor), nlevels))
#   Species
# 1       3

# for all columns 
iris %>% 
  summarise(across(everything(), n_distinct))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           35          23           43          22       3