How to overwrite levels of multiple factor variables in R using a for loop? Answers

【Question title】: How to overwrite levels of multiple factor variables in R using for loop?
【Posted】: 2021-06-17 12:27:55
【Question description】:

I have a data frame with multiple factor variables that all share the same levels. The levels are listed below:

"Completely Agree (3)" "Do not Agree (1)" "Somewhat Agree (2)"

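About 18 variables share these three levels. I want to use a for loop to overwrite the levels as follows:

Do not Agree and Somewhat Agree should become 0

Completely Agree should become 1

I tried the following code: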

for (i in LoopVec.St){
  levels(data[,i]) <- c(1,0,0)
}

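LoopVec.St holds the column names of all 18 variables that share these levels.

levels(data[,i]) <- c(1,0,0) works when I apply it to a single variable, but when I use it inside the for loop it throws the following error.

Error in `levels<-`(`*tmp*`, value = c(1, 0, 0)) : factor level [3] is duplicated

Please help me out.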

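【Question discussion】:

Without sight of your data, it's difficult to give you specific advice. But a tidyverse solution using across will let you modify all 18 variables in a single line, with no need for a loop. As an aside, this is almost certainly a situation in which tidying your data would help.
@Limey - can you please elaborate?

Tags: r loops for-loop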

【Solution 1】:

In response to OP's request in the comments...

library(tidyverse)

# Test data.  3 questions just to demonstrate the principle.
d <- tibble(
       Participant=1:10,
       Q1=factor(sample(1:3, 10, TRUE), labels=c("Do not Agree (1)", "Somewhat Agree (2)","Completely Agree (3)")),
       Q2=factor(sample(1:3, 10, TRUE), labels=c("Do not Agree (1)", "Somewhat Agree (2)","Completely Agree (3)")),
       Q3=factor(sample(1:3, 10, TRUE), labels=c("Do not Agree (1)", "Somewhat Agree (2)","Completely Agree (3)"))
)

Recoding the factors

# Recode untidy data
d %>% mutate(
        across(
          starts_with("Q"), 
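          # levels= is given explicitly so both labels apply even if a column happens to contain only TRUE or only FALSE
          function(x) factor(as.numeric(x) == 3, levels=c(FALSE, TRUE), labels=c("Do not completely agree (1&2)", "Completely agree (3)"))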
        )
      )

# Tidy the data
dTidy <- d %>% 
           pivot_longer(
             cols=starts_with("Q"),
             values_to="Response",
             names_to="Question"
           )
dTidy

# Recode tidy data            
dTidy %>% 
   mutate(
     Response=factor(
                as.numeric(Response) == 3, 
                levels=c(FALSE, TRUE),  # explicit levels keep both labels valid even if one outcome is absent
                labels=c("Do not completely agree (1&2)", "Completely agree (3)")
              )
   )
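
The recodes above keep descriptive labels, whereas the question asks for 0 and 1. Here is a minimal sketch of that variant, assuming a plain integer indicator is acceptable in place of a factor. The small data frame and the name d01 are made up for illustration. Comparing against the label text rather than using as.numeric(x) == 3 avoids relying on the order of the levels.

```r
library(dplyr)

# Hypothetical stand-in for the survey data: three questions, same three levels.
lv <- c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")
d <- data.frame(
  Participant = 1:6,
  Q1 = factor(lv[c(1, 2, 3, 3, 1, 2)], levels = lv),
  Q2 = factor(lv[c(3, 3, 2, 1, 1, 3)], levels = lv),
  Q3 = factor(lv[c(2, 2, 2, 3, 1, 1)], levels = lv)
)

# 1 for "Completely Agree (3)", 0 for the other two levels, in every Q column.
d01 <- d %>%
  mutate(across(starts_with("Q"), function(x) as.integer(x == "Completely Agree (3)")))

d01
```

Participant is left untouched because its name does not start with "Q".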

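Not much difference so far. The benefit of tidy data becomes more obvious when we try to do something with it. As a simple example, plot a bar chart of the responses to each question. The untidy data isn't particularly amenable to this. Here is a simple attempt: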

# Plot untidy data
doPlots <- function(data) {
  print(data %>% ggplot() + geom_bar(aes(x=Q1)))
  print(data %>% ggplot() + geom_bar(aes(x=Q2)))
  print(data %>% ggplot() + geom_bar(aes(x=Q3)))
}

d %>% doPlots()

Anything else is awkward. With the tidy data, it's straightforward:

# Plot tidy data
dTidy %>% 
  ggplot() +
  geom_bar(aes(x=Response)) +
  facet_grid(rows=vars(Question))

# Or
dTidy %>% 
  ggplot() +
  geom_bar(aes(x=Response, fill=Question))

Furthermore, suppose a different dataset arrives that contains more questions than the original.

# Now add another Question  
d <- d %>% mutate(Q4=factor(sample(1:3, 10, TRUE), labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")))

dTidy  <- dTidy %>% 
            bind_rows(
              tibble(
                Participant=1:10, 
                Question="Q4", 
                Response=factor(sample(1:3, 10, TRUE), labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)"))
              )
            )

The doPlots function needs rewriting: it ignores Q4.

d %>% doPlots()

But the tidy code is robust and needs no modification:

dTidy %>% 
  ggplot() +
  geom_bar(aes(x=Response)) +
  facet_grid(rows=vars(Question))

In my opinion, working with tidy data means your code is

more compact
easier to understand
more robust
easier to maintain
more flexible

【Discussion】:

【Solution 2】:

I think you should use fct_recode():

library(tidyverse)
var <- factor(rep(letters[1:3], each = 5))
fct_recode(var, "new1" = "a", "new2" = "b", "new2" = "c")

#> [1] new1 new1 new1 new1 new1 new2 new2 new2 new2 new2 new2 new2 new2 new2 new2
#> Levels: new1 new2

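For more examples, see the r4ds book.

【Discussion】:

Hi, I tried this. Again, it works if I run it on its own, but when executed inside a for loop it throws the following error. > Error: f must be a factor (or character vector).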
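
One possible cause of the error in this comment, assuming data is a tibble: data[, i] then returns a one-column tibble rather than a factor, whereas data[[i]] always extracts the column itself. Below is a minimal sketch of the loop under that assumption. The two-column tibble is made up, and LoopVec.St is filled with hypothetical column names.

```r
library(forcats)

lv <- c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")

# Hypothetical example data; the real data has about 18 such columns.
data <- tibble::tibble(
  Q1 = factor(lv[c(1, 2, 3, 3)], levels = lv),
  Q2 = factor(lv[c(3, 3, 2, 1)], levels = lv)
)
LoopVec.St <- c("Q1", "Q2")  # assumed: a character vector of column names

for (i in LoopVec.St) {
  # data[[i]] is the factor itself, so fct_recode() receives a factor.
  data[[i]] <- fct_recode(
    data[[i]],
    "0" = "Do not Agree (1)",
    "0" = "Somewhat Agree (2)",
    "1" = "Completely Agree (3)"
  )
}

data
```

The same [[ ]] indexing also works on an ordinary data frame, so the loop does not need to know which kind of table it was given.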

【Solution 3】:

You can use the factor function as follows:

data[,i] = factor(data[,i], 
                  levels= c("Do not Agree (1)",
                            "Somewhat Agree (2)",
                            "Completely Agree (3)"
                  ))

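You can also look at the various releveling functions in the tidyverse, for example the fct_relevel function. https://forcats.tidyverse.org/reference/fct_relevel.html

【Discussion】:

I don't want to relevel the factors here. Instead, I am trying to recode them. I want to merge the levels Somewhat Agree and Do not Agree and code them as 0, and code Completely Agree as 1.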
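
For the recoding described in this comment, base R can also merge levels by name when levels<- is given a named list, which does not depend on the order of the existing levels. Here is a minimal sketch, assuming the columns are factors with exactly the three labels from the question. The example data frame and the name recode_map are made up for illustration.

```r
lv <- c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")

# Hypothetical example data; factor() sorts the levels alphabetically,
# matching the level order shown in the question.
data <- data.frame(
  Q1 = factor(lv[c(1, 2, 3, 3)]),
  Q2 = factor(lv[c(3, 3, 2, 1)])
)
LoopVec.St <- c("Q1", "Q2")  # assumed: a character vector of column names

# Names are the new levels; values are the old levels merged into each.
recode_map <- list(
  "0" = c("Do not Agree (1)", "Somewhat Agree (2)"),
  "1" = "Completely Agree (3)"
)

for (i in LoopVec.St) {
  levels(data[[i]]) <- recode_map
}

data
```

If numeric values are needed afterwards, as.integer(as.character(data[[i]])) converts the "0"/"1" labels to numbers; as.integer() alone would return the internal level codes 1 and 2.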