【Question title】: Recode data values in a dataframe into combined values in R
【Posted】: 2020-02-03 01:00:32
【Question】:

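I am trying to compare marital status, and my variable takes the values "married", "not married", "engaged", "single", and "not married". How can I make this data read only "married" and "not married"? (Engaged should count as married; single and not married should count as not married.)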

Sample dataset

# Assign the sample to `df` so the code below can refer to it
df <- data.frame(mstatus = sample(x = c("married", 
                                        "not married", 
                                        "engaged", 
                                        "single", 
                                        "not married"), 
                                  size = 15, replace = TRUE))

This is what I have so far

library(dplyr)

# Lowercase every value so that case variants compare equal
df2 <- df %>%
  mutate(mstatus = tolower(mstatus))

【Question discussion】:

    Tags: r dataframe


    【Solution 1】:

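    You can use the mutate() function from dplyr (part of the tidyverse) together with case_when():

    library(dplyr)

    # Map "engaged" onto "married" and "single" onto "not married"
    df <- df %>% mutate(mstatus = case_when(
        mstatus == "married" | mstatus == "engaged"  ~ "married",
        mstatus == "not married" | mstatus == "single" ~ "not married"
    ))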
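
One thing to keep in mind with this approach is that case_when() returns NA for any value that matches none of the conditions, so a stray capital letter or trailing space would silently become NA. The following is only a sketch of one way to guard against that, not part of the original answer: the sample values (including "Not married" and "widowed"), the helper vectors `married_values` and `not_married_values`, and the result name `df_recoded` are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical sample data: one capitalised variant and one unexpected value
df <- data.frame(
  mstatus = c("married", "Not married", "engaged", "single", "widowed"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vectors listing which raw values belong to each group
married_values     <- c("married", "engaged")
not_married_values <- c("not married", "single")

df_recoded <- df %>%
  mutate(
    # Normalise case and surrounding whitespace before comparing
    mstatus = tolower(trimws(mstatus)),
    mstatus = case_when(
      mstatus %in% married_values     ~ "married",
      mstatus %in% not_married_values ~ "not married",
      # Anything unrecognised is made NA explicitly rather than by accident
      TRUE                            ~ NA_character_
    )
  )
```

Using `%in%` with a vector per group keeps each condition on one line as the number of raw values grows, and the explicit fallback row makes it obvious what happens to values that were not anticipated.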
    

    【Discussion】:

      【Solution 2】:

      I think the simplest base R approach is an ifelse() call:

      df2$mstatus_new <- ifelse(df2$mstatus=="engaged"|df2$mstatus=="married", "married", "not married")
      

      Data:

      df2 <- data.frame(
        mstatus = c("married", "not married", "engaged", "single", "nota married"))
      df2
             mstatus
      1      married
      2  not married
      3      engaged
      4       single
      5 nota married
      

      Result:

      df2
             mstatus mstatus_new
      1      married     married
      2  not married not married
      3      engaged     married
      4       single not married
      5 nota married not married
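
Note that the ifelse() call above sends every value other than "engaged" or "married" to "not married", which is why the misspelled "nota married" row ends up there too. If unrecognised values should be flagged instead, one possible base R alternative is a named lookup vector. This is a sketch under that assumption rather than the answer's own method; `lookup` and `unmatched` are hypothetical names introduced here.

```r
df2 <- data.frame(
  mstatus = c("married", "not married", "engaged", "single", "nota married"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the raw values, elements are the targets
lookup <- c(
  "married"     = "married",
  "engaged"     = "married",
  "not married" = "not married",
  "single"      = "not married"
)

# Indexing by name returns NA for any value that has no entry in the table
df2$mstatus_new <- unname(lookup[as.character(df2$mstatus)])

# Raw values that were not recognised and may need cleaning
unmatched <- unique(df2$mstatus[is.na(df2$mstatus_new)])
```

Here the "nota married" row gets NA in `mstatus_new` and shows up in `unmatched`, so it can be corrected explicitly instead of being absorbed into a category.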
      

      【Discussion】:

        【Solution 3】:

        If we need to recode 'mstatus', one option is forcats:

        library(dplyr)
        library(forcats)
        df2 %>%
              mutate(mstatus = fct_recode(mstatus, married = "engaged",
                 `not married` = "single"))
        #      mstatus
        #1     married
        #2 not married
        #3     married
        #4 not married
        #5 not married
        

        Or, if there are many values to change, use fct_collapse, which can take a vector of values for each new level:

        df2 %>%
           mutate(mstatus = fct_collapse(mstatus, married = c('engaged'), 
                 `not married` = c("single")))
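
To make the "vector of values" point concrete, here is a small sketch in which each new level is given more than one old level. It assumes the forcats package is installed; the standalone factor `mstatus` and the result name `collapsed` are hypothetical and used only for illustration.

```r
library(forcats)

# Hypothetical standalone factor with the same values as the example data
mstatus <- factor(c("married", "not married", "engaged", "single", "not married"))

# Each argument names a new level and lists every old level folded into it
collapsed <- fct_collapse(
  mstatus,
  married       = c("married", "engaged"),
  `not married` = c("not married", "single")
)
```

Listing the existing "married" and "not married" levels in their own groups is redundant here, but it keeps the full mapping visible in one place.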
        

        Data

        df2 <- structure(list(mstatus = structure(c(2L, 3L, 1L, 4L, 3L), .Label = c("engaged", 
        "married", "not married", "single"), class = "factor")),
        class = "data.frame", row.names = c(NA, 
        -5L))
        

        【Discussion】:
