Turn strings into categorical variables in R without specifying column names [duplicate] - Answers

【Question title】: Turn strings into categorical variables in R without specifying column names [duplicate]
【Posted】: 2020-07-22 15:32:34
【Question】:

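I have a data frame named df that contains 70 character variables. I am trying to write a function that converts all of these character columns to categorical variables without specifying each column name. Here is an example: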

df
  fruits   cars 
1 apple    volvo
2 pear     bwm
3 apple    bwm
4 orange   volvo
5 orange   fiat

The output I want looks like this:

df
  fruits   cars 
1 1        1
2 2        2
3 1        2
4 3        1
5 3        3

I tried converting to a factor and then reassigning the levels. That works on a single column when I do it without apply. Here is my attempt:

x <- apply(df$fruit, 2, factor)
levels(x) <- 1:length(levels(x))

It fails when I put it inside a function:

label_num <- function(x){
assigned <- 1:length(levels(x))
return(assigned)
}
x <- apply(df, 2, factor)
apply(levels(x), 2, label_num)

I get the following error:

Error in apply(levels(x), 2, label_num) : 
  dim(X) must have a positive length

Could someone help me with this? I am very new to R. Many thanks.

【Question comments】:
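A likely reason for the error in the question: apply() first coerces the data frame to a matrix, and the factors returned for each column are flattened back into a plain character matrix. That matrix has no levels, so levels(x) is NULL and the second apply() has nothing with dimensions to iterate over. A minimal sketch that reproduces this with the question's data (the variable name err_msg is only illustrative):

```r
# Sample data from the question
df <- data.frame(
  fruits = c("apple", "pear", "apple", "orange", "orange"),
  cars   = c("volvo", "bwm", "bwm", "volvo", "fiat"),
  stringsAsFactors = FALSE
)

# apply() works on a matrix, so the result is a character matrix, not factors
x <- apply(df, 2, factor)
class(x)       # matrix (and array in R >= 4.0)
is.factor(x)   # FALSE
levels(x)      # NULL

# Calling apply() on NULL reproduces the reported error
err_msg <- tryCatch(
  apply(levels(x), 2, length),
  error = function(e) conditionMessage(e)
)
err_msg
```

Because the factor structure is lost at the first step, iterating over the columns with lapply() tends to be the safer route for data frames.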

Tags: r function dataframe apply

【Solution 1】:

I suggest taking a look at the dplyr package. You can do this quickly with mutate_if:

df <- data.frame(
  fruits = c('apple', 'pear', 'apple', 'orange', 'orange'),
  cars = c('volvo', 'bwm', 'bmw', 'volvo', 'fiat'),
  stringsAsFactors = FALSE
)

str(df)

'data.frame':   5 obs. of  2 variables:
 $ fruits: chr  "apple" "pear" "apple" "orange" ...
 $ cars  : chr  "volvo" "bwm" "bmw" "volvo" ...

library(dplyr)
dfFactors <- df %>% 
  mutate_if(is.character, as.factor)

str(dfFactors)

'data.frame':   5 obs. of  2 variables:
 $ fruits: Factor w/ 3 levels "apple","orange",..: 1 3 1 2 2
 $ cars  : Factor w/ 4 levels "bmw","bwm","fiat",..: 4 2 1 4 3

【Comments】:

mutate_if was superseded by across() in dplyr 1.0.0, so use mutate(across(where(is.character), as.factor)) instead.
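For reference, here is a rough sketch of what the across() form might look like on the question's data, assuming dplyr 1.0.0 or later is installed. The second call goes one step further and returns integer codes in order of first appearance, which is what the question's desired output shows. The names df_factors and df_codes are just illustrative.

```r
library(dplyr)

df <- data.frame(
  fruits = c("apple", "pear", "apple", "orange", "orange"),
  cars   = c("volvo", "bwm", "bwm", "volvo", "fiat"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor (levels sorted alphabetically)
df_factors <- df %>%
  mutate(across(where(is.character), as.factor))

# Convert every character column to integer codes, numbered by first appearance
df_codes <- df %>%
  mutate(across(
    where(is.character),
    ~ as.integer(factor(.x, levels = unique(.x)))
  ))

str(df_factors)
df_codes
```

Wrapping the predicate in where() avoids the deprecation warning that newer tidyselect versions emit for bare predicate functions.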

【Solution 2】:

Try this base R solution:

#Data
df <- structure(list(fruits = c("apple", "pear", "apple", "orange", 
"orange"), cars = c("volvo", "bwm", "bwm", "volvo", "fiat")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

#Code
# Number the values of each column by order of first appearance
as.data.frame(apply(df, 2, function(x) as.numeric(factor(x, levels = unique(x)))))

It produces:

  fruits cars
1      1    1
2      2    2
3      1    2
4      3    1
5      3    3
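One thing to watch with apply() is that it coerces the whole data frame to a character matrix, so any numeric columns would get recoded as well. If the real data frame mixes column types, a sketch along these lines might be closer to what is needed. The helper name chars_to_codes and the extra price column are hypothetical, added only for illustration.

```r
# Hypothetical helper: recode only character columns, leave the rest untouched
chars_to_codes <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.character(col)) {
      # Integer codes numbered by order of first appearance
      as.integer(factor(col, levels = unique(col)))
    } else {
      col
    }
  })
  data
}

df <- data.frame(
  fruits = c("apple", "pear", "apple", "orange", "orange"),
  cars   = c("volvo", "bwm", "bwm", "volvo", "fiat"),
  price  = c(1.5, 2.0, 1.5, 3.0, 3.0),  # illustrative non-character column
  stringsAsFactors = FALSE
)

result <- chars_to_codes(df)
result
```

Assigning into data[] keeps the object a data frame with its original column names and row names, and the function works for any number of columns without naming them.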

【Comments】:

【Solution 3】:

A base R solution:

df <- read.table(text="  fruits   cars 
apple    volvo
pear     bwm
apple    bwm
orange   volvo
orange   fiat", header=TRUE, stringsAsFactors=FALSE)

x <- as.data.frame(lapply(df, function(x) factor(x, labels = seq_along(unique(x)))))
x
#  fruits cars
#1      1    3
#2      3    1
#3      1    1
#4      2    3
#5      2    2
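Note that these codes differ from the output requested in the question. By default factor() sorts its levels alphabetically, so "orange" gets 2 and "pear" gets 3. Passing levels = unique(x) numbers the values by first appearance instead. A small sketch of the difference on the fruits column (the variable names are illustrative):

```r
x <- c("apple", "pear", "apple", "orange", "orange")

# Default: levels are sorted, i.e. apple, orange, pear
alphabetical <- as.integer(factor(x))

# Levels in order of first appearance, i.e. apple, pear, orange
by_appearance <- as.integer(factor(x, levels = unique(x)))

# match() against the unique values gives the same appearance-order codes
via_match <- match(x, unique(x))

alphabetical   # 1 3 1 2 2
by_appearance  # 1 2 1 3 3
```

Which ordering is right depends on whether the codes only need to be distinct or also need to follow the order in which values occur in the data.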

【Comments】: