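【Question title】: Create multiple columns (dummies) out of a multiple-answer survey question [duplicate]
【Posted】: 2020-08-26 12:23:09
【Question】:

How can I create multiple (dummy) columns from a single column that holds multiple answers? I want this to be fully automatic, so that the answers are detected without my listing them. Something like this function: create_multiple_columns(df$posessions, sep = " ")

To illustrate, it would automatically create the "hat", "table" and "pen" dummies shown below.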


df = data.frame(person = c(1, 2, 3, 4), 
                posessions = c("hat table pen", "hat", "table", "hat pen"), 
                hat = c(1,1,0,1),
                table = c(1,0,1,0),
                pen = c(1,0,0,1)
                )
      
              

【Comments】:

    Tags: r


    【Solution 1】:

    I would suggest a tidyverse approach, in which reshaping the data gets you close to what you want:

    library(tidyverse)
    #Data
    df2 <- structure(list(person = c(1, 2, 3, 4), posessions = structure(c(3L, 
    1L, 4L, 2L), .Label = c("hat", "hat pen", "hat table pen", "table"
    ), class = "factor")), class = "data.frame", row.names = c(NA, 
    -4L))
    

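    Code:

    # Note: into = c('v1','v2','v3') assumes at most three answers per person.
    # fill = 'right' pads shorter answers with NA instead of warning.
    df2 %>% separate(posessions,into = c('v1','v2','v3'),sep = ' ',fill = 'right') %>%
      pivot_longer(cols = -1) %>% filter(!is.na(value)) %>%   # one row per person/answer
      group_by(person,value) %>% summarise(N=n()) %>%         # count each answer
      pivot_wider(names_from = value, values_from=N) %>%      # one column per answer
      replace(is.na(.),0)                                     # missing answers become 0
    

    Output: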

    # A tibble: 4 x 4
    # Groups:   person [4]
      person   hat   pen table
       <dbl> <int> <int> <int>
    1      1     1     1     1
    2      2     1     0     0
    3      3     0     0     1
    4      4     1     1     0
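
The fixed v1..v3 columns above cap the number of answers at three. A minimal sketch of a variant without that cap, assuming tidyr's separate_rows() and pivot_wider() are available. The names df_chr, present and dummies are made up for illustration and are not from the original answer:

```r
library(dplyr)
library(tidyr)

# Illustrative data: same values as the question, with posessions as character
df_chr <- data.frame(
  person = c(1, 2, 3, 4),
  posessions = c("hat table pen", "hat", "table", "hat pen"),
  stringsAsFactors = FALSE
)

dummies <- df_chr %>%
  separate_rows(posessions, sep = " ") %>%  # one row per person/answer pair
  distinct() %>%                            # guard against repeated answers
  mutate(present = 1L) %>%                  # marker that becomes the dummy value
  pivot_wider(names_from = posessions,
              values_from = present,
              values_fill = list(present = 0L))  # absent answers become 0

dummies
```

Because the answers are spread into rows rather than into a fixed set of columns, the number of answers per person does not have to be known in advance.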
    

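    【Comments】:

      【Solution 2】:

      Using data.table:

      library(data.table)
      library(stringr)  # str_split(), str_detect(), and the %>% pipe
      library(purrr)    # map()

      df = data.table(
        person = c(1, 2, 3, 4), 
        posessions = c("hat table pen", "hat", "table", "hat pen")
      )
      
      # Collect every distinct answer, then add one 0/1 column per answer.
      # Note: str_detect() matches substrings, so "pen" would also match "pencil".
      all_words <- df$posessions %>% str_split(" ") %>% unlist() %>% unique()
      df[, (all_words) := map(all_words, ~str_detect(posessions, .x) * 1L)]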
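
Pattern matching on the raw string treats one answer as present whenever it occurs inside another (for example "pen" inside "pencil"). A rough sketch of a whole-word variant, assuming the answers are separated by single spaces. The data and the names dt and words_by_row are illustrative only:

```r
library(data.table)

# Illustrative data where one answer is a substring of another
dt <- data.table(
  person = c(1, 2, 3),
  posessions = c("pen", "pencil", "pen pencil")
)

# Split each response once, then test membership of whole words
words_by_row <- strsplit(dt$posessions, " ", fixed = TRUE)
all_words <- unique(unlist(words_by_row))

# Add one integer 0/1 column per distinct answer, by reference
dt[, (all_words) := lapply(all_words, function(w) {
  vapply(words_by_row, function(ws) as.integer(w %in% ws), integer(1))
})]

dt
```

Here person 2 gets pen = 0, because "pencil" is compared as a whole word and not searched for the substring "pen".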
      

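      【Comments】:

        【Solution 3】:

        You can split the strings with strsplit, take the unique words, and test whether each one is present using %in%.

        x <- strsplit(df$posessions, " ")   # list of words, one element per person
        y <- unique(unlist(x))              # all distinct answers
        z <- +(do.call(rbind, lapply(x, "%in%", x=y)))  # y %in% each person's words; + turns TRUE/FALSE into 1/0
        colnames(z) <- y
        cbind(df[1:2], z)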
        #  person    posessions hat table pen
        #1      1 hat table pen   1     1   1
        #2      2           hat   1     0   0
        #3      3         table   0     1   0
        #4      4       hat pen   1     0   1
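
The same steps can be wrapped into the kind of function the question asks for. This is only a sketch: the name create_multiple_columns comes from the call in the question, while the body, the fixed (non-regex) separator and the as.character() conversion are assumptions made here:

```r
# Hypothetical helper named after the call sketched in the question
create_multiple_columns <- function(x, sep = " ") {
  # as.character() lets the function accept factor columns as well
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  answers <- unique(unlist(parts))
  # One row per respondent, one 0/1 column per distinct answer
  out <- do.call(rbind, lapply(parts, function(p) as.integer(answers %in% p)))
  colnames(out) <- answers
  as.data.frame(out)
}

# Usage with the data from the question (without the hand-made dummies)
df_in <- data.frame(
  person = c(1, 2, 3, 4),
  posessions = c("hat table pen", "hat", "table", "hat pen")
)
result <- cbind(df_in, create_multiple_columns(df_in$posessions, sep = " "))
result
```

The helper returns only the dummy columns, so the caller decides whether to bind them to the original data frame.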
        

        【Comments】:

          【Solution 4】:

          Here is a base R option using strsplit + table

          p <- strsplit(df$posessions, " ")
          cbind(
            df,
            do.call(
              rbind,
              Map(
                function(x) table(factor(x, levels = unique(unlist(p)))),
                p
              )
            )
          )
          

          which gives

            person    posessions hat table pen
          1      1 hat table pen   1     1   1
          2      2           hat   1     0   0
          3      3         table   0     1   0
          4      4       hat pen   1     0   1
          

          【Comments】:
