【Question title】: How to create a new variable and assign it a value corresponding to another variable in R?
【Posted】: 2021-06-04 18:39:08
【Question】:

Here is some mock data that corresponds to the real dataset I am working with:

Mock dataset

    a <- c("a","b","c","d","e","f","g","h","i","j")
    b <- 1:10
    names <- c("Alex","Ale","Alexandra","Alexander","Ali","Amanda","Alix","Ajax","Aley","Ajay")
    data <- data.frame(a,b,names)

Creating the new gender variable

    library(dplyr)  # provides %>% and mutate()

    data <- data %>% 
      mutate(gender = NA)

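I want to assign a "gender" value based on the names variable in my dataset. I don't want to do this by hand because I am working with 1000 observations. However, I do have these vectors, which hold the names values that correspond to each gender:

    male <- c("Alex", "Ale", "Alexander")
    female <- c("Alexandra", "Ali", "Amanda")
    noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

But I don't know how to use them to assign the "gender" value that corresponds to each specific name in my dataset.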

Here is what I tried:

    data$gender[data$names == male] <- "Male"

And also:

    data$gender[data$names == c("Alex", "Ale", "Alexander")] <- "Male"

This code does not assign "Male" to all of the matching values. I get a warning message:

    Warning message:
    In data$names == c("Alex", "Ale", "Alexander") :
      longer object length is not a multiple of shorter object length

Does anyone know how I can assign values to the gender variable that correspond to the names variable?

【Comments】:

    Tags: r dataframe factors


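    【Solution 1】:

    We can create a named list, stack it into a two-column data frame, and then use that data frame in a join.

    library(dplyr)

    # stack() returns the columns `values` (the names) and `ind` (the list element each name came from)
    new <- stack(list(male = male, female = female, noanswer = noanswer))
    names(new) <- c("names", "gender")
    data <- data %>% 
        left_join(new, by = "names")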
    

    -output

    data
       a  b     names   gender
    1  a  1      Alex     male
    2  b  2       Ale     male
    3  c  3 Alexandra   female
    4  d  4 Alexander     male
    5  e  5       Ali   female
    6  f  6    Amanda   female
    7  g  7      Alix noanswer
    8  h  8      Ajax noanswer
    9  i  9      Aley noanswer
    10 j 10      Ajay noanswer
    
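    The same lookup idea can be sketched in base R without a join, using a named character vector as the lookup table. This is only an illustrative alternative, not part of the answer above; the name `lookup` is introduced here for the example, and the data is rebuilt so the block runs on its own.

```r
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

data <- data.frame(
  a = letters[1:10],
  b = 1:10,
  names = c("Alex", "Ale", "Alexandra", "Alexander", "Ali",
            "Amanda", "Alix", "Ajax", "Aley", "Ajay")
)

# `lookup` (hypothetical name): the element names are the keys,
# the element values are the gender labels
lookup <- c(
  setNames(rep("male", length(male)), male),
  setNames(rep("female", length(female)), female),
  setNames(rep("noanswer", length(noanswer)), noanswer)
)

# Index the lookup vector by name; as.character() guards against
# `names` being stored as a factor. Names missing from `lookup` give NA.
data$gender <- unname(lookup[as.character(data$names)])
```

    Unlike the `stack()` approach, this yields a plain character column rather than a factor.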

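    Regarding the OP's warning: == is an element-wise comparison. It behaves as intended mainly when one side has length 1 (and is recycled) or when both sides have the same length. Here the lengths differ, so the shorter vector is recycled, and because the longer length is not a multiple of the shorter one, R issues the warning. Note that sometimes there is no warning at all (when the longer length happens to be an exact multiple of the shorter one), yet the result is still incorrect, because the elements are paired by position. Recycling pairs them as follows, shown here for a first vector of length 5 and a second vector of length 3: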

    v1[1] == v2[1]
    v1[2] == v2[2]
    v1[3] == v2[3]
    v1[4] == v2[1]
    ...
    
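    To make the silent failure concrete, here is a small sketch using a subset of the names from the question. The vectors `v1` and `v2` are chosen for illustration so that the longer length (6) is an exact multiple of the shorter (3), which means R emits no warning even though == gives the wrong answer for "Alexander".

```r
v1 <- c("Alex", "Ale", "Alexandra", "Alexander", "Ali", "Amanda")
v2 <- c("Alex", "Ale", "Alexander")

# Element-wise comparison with recycling of v2:
# "Alex" vs "Alex", "Ale" vs "Ale", "Alexandra" vs "Alexander",
# "Alexander" vs "Alex", "Ali" vs "Ale", "Amanda" vs "Alexander"
eq <- v1 == v2

# Membership test: is each element of v1 found anywhere in v2?
inn <- v1 %in% v2

print(eq)
print(inn)
```

    Only the first two positions of `eq` are TRUE, so "Alexander" is missed, while `inn` flags all three male names.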

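    Instead, we can use %in%, which tests each name for membership in the vector:

    # Start from an all-NA character column so the indexed assignments
    # below also work on a data frame that has no gender column yet
    data$gender <- NA_character_
    data$gender[data$names %in% male] <- "Male"
    data$gender[data$names %in% female] <- "Female"
    data$gender[data$names %in% noanswer] <- "noanswer"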
    

    data

    data <- structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h", 
    "i", "j"), b = 1:10, names = c("Alex", "Ale", "Alexandra", "Alexander", 
    "Ali", "Amanda", "Alix", "Ajax", "Aley", "Ajay")),
      class = "data.frame", row.names = c(NA, 
    -10L))
    

    【Comments】:

      【Solution 2】:

      You can also use the following solution:

      library(dplyr)
      
      male <- c("Alex", "Ale", "Alexander")
      female <- c("Alexandra", "Ali", "Amanda")
      noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
      
      data %>%
        mutate(gender = case_when(
          names %in% male ~ "Male",
          names %in% female ~ "Female",
          names %in% noanswer ~ "Noanswer"
        ))
      
         a  b     names   gender
      1  a  1      Alex     Male
      2  b  2       Ale     Male
      3  c  3 Alexandra   Female
      4  d  4 Alexander     Male
      5  e  5       Ali   Female
      6  f  6    Amanda   Female
      7  g  7      Alix Noanswer
      8  h  8      Ajax Noanswer
      9  i  9      Aley Noanswer
      10 j 10      Ajay Noanswer
      
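      One behaviour of `case_when()` worth keeping in mind: a row that matches none of the conditions gets NA. The sketch below illustrates this with a hypothetical name, "Zed", that is in neither vector, and shows a catch-all `TRUE ~` branch as one possible way to label such rows. The data frame `df`, the column `gender_filled`, and the label "Unknown" are assumptions made for the example.

```r
library(dplyr)

male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")

# "Zed" is a hypothetical name that appears in neither vector
df <- data.frame(names = c("Alex", "Ali", "Zed"))

res <- df %>%
  mutate(
    # No branch matches "Zed", so it becomes NA
    gender = case_when(
      names %in% male ~ "Male",
      names %in% female ~ "Female"
    ),
    # A final TRUE condition catches every row not matched earlier
    gender_filled = case_when(
      names %in% male ~ "Male",
      names %in% female ~ "Female",
      TRUE ~ "Unknown"
    )
  )

print(res)
```

      Conditions are evaluated in order and the first match wins, so the catch-all branch has to come last.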

      【Comments】:
