【Question title】: How to add indicator variable columns and add names?
【Posted】: 2020-10-12 06:27:39
【Question description】:

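I am working with a data frame in which I have created binary variables that indicate whether a given individual appears in the "Players" column.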

Layer       Grade       Players                    Var 2             NYAL 08   NYAL 27        
Top           A         NYAL 08; NYAL 27; NYAL 80  NYAL 08; MAAC 48    1       1      ...
Bottom        D         MAAC 27; MAAC 45; MAAC 65  NYAL 27             0       0      ...    
Middle        B         NYAL 08; MAAC 48; NYAL 66  MAAC 48;MAAC 22     0       0      ...       
...

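I would like to add binary variables to the same data set that simply indicate whether an individual is present in Var 2. However, since most of the individuals are the same, I want to append the letter "B" to the column names to keep these new indicator columns separate from the existing ones. How can this be done?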

Layer       Grade       Players             Var 2            NYAL 08 NYAL 27 NYAL 08B NYAL 27B    
Top           A         NYAL 08; NYAL 27   NYAL 08; MAAC 48    1       1      1       0
Bottom        D         MAAC 27; MAAC 45   NYAL 27             0       0      0       1
Middle        B         NYAL 08; MAAC 48   NYAL 27; MAAC 22    0       0      0       1

【Question discussion】:

    Tags: r dataframe


    【Solution 1】:

    Based on the example shown

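    library(qdapTools)
    # Tabulate the names per row; players_out must exist because nm1 uses it below
    players_out <- mtabulate(strsplit(df1$Players, ";\\s+"))
    var2_out <- mtabulate(strsplit(df1$Var2, ";\\s+"))
    # Names that already have an indicator column (everything after the first four columns)
    nm1 <- intersect(names(players_out), names(df1)[-(1:4)])
    # Add the Var2 indicators under the same names with a "B" suffix
    df1[paste0(nm1, "B")] <- var2_out[nm1]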
    

    -Output

    df1
    #    Layer Grade          Players             Var2 NYAL 08 NYAL 27 NYAL 08B NYAL 27B
    #1    Top     A NYAL 08; NYAL 27 NYAL 08; MAAC 48       1       1        1        0
    #2 Bottom     D MAAC 27; MAAC 45          NYAL 27       0       0        0        1
    #3 Middle     B NYAL 08; MAAC 48 NYAL 27; MAAC 22       0       0        0        1
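
For readers who prefer not to install qdapTools, the tabulation step can be approximated in base R. This is a minimal sketch, not the answerer's code: it rebuilds the example data, splits on `;\\s*` (an assumption, so that entries without a space after the semicolon are also split), and intersects against the names found in Var2 so that a name missing from Var2 does not raise an error.

```r
# Example data, same values as the answer's df1
df1 <- data.frame(
  Layer = c("Top", "Bottom", "Middle"),
  Grade = c("A", "D", "B"),
  Players = c("NYAL 08; NYAL 27", "MAAC 27; MAAC 45", "NYAL 08; MAAC 48"),
  Var2 = c("NYAL 08; MAAC 48", "NYAL 27", "NYAL 27; MAAC 22"),
  `NYAL 08` = c(1L, 0L, 0L),
  `NYAL 27` = c(1L, 0L, 0L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Split each Var2 entry into individual names
var2_items <- strsplit(df1$Var2, ";\\s*")

# Count matrix: one row per record, one column per distinct name
lv <- sort(unique(unlist(var2_items)))
var2_out <- do.call(rbind, lapply(var2_items, function(v) {
  as.integer(table(factor(v, levels = lv)))
}))
colnames(var2_out) <- lv

# Existing indicator columns that also occur somewhere in Var2
nm1 <- intersect(names(df1)[-(1:4)], colnames(var2_out))

# Add one "B"-suffixed column per matching name
for (nm in nm1) {
  df1[[paste0(nm, "B")]] <- var2_out[, nm]
}
```

With this data, `NYAL 08B` comes out as 1, 0, 0 and `NYAL 27B` as 0, 1, 1, matching the output shown above.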
    

    Data

    df1 <- structure(list(Layer = c("Top", "Bottom", "Middle"), Grade = c("A", 
    "D", "B"), Players = c("NYAL 08; NYAL 27", "MAAC 27; MAAC 45", 
    "NYAL 08; MAAC 48"), Var2 = c("NYAL 08; MAAC 48", "NYAL 27", 
    "NYAL 27; MAAC 22"), `NYAL 08` = c(1L, 0L, 0L), `NYAL 27` = c(1L, 
    0L, 0L)), row.names = c(NA, -3L), class = "data.frame")
    

    【Discussion】:

      【Solution 2】:

      A base R option

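      # For each row, test whether each of the last two column names occurs in Var2; unary + turns TRUE/FALSE into 1/0
      u <- t(sapply(strsplit(df$Var2,";\\s+"),function(v) +sapply(tail(names(df),2),`%in%`, v)))
      # Bind the new indicators to df with "B" appended to each column name
      df <- cbind(df,`colnames<-`(u,paste0(colnames(u),"B")))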
      

      which gives

         Layer Grade          Players             Var2 NYAL 08 NYAL 27 NYAL 08B
      1    Top     A NYAL 08; NYAL 27 NYAL 08; MAAC 48       1       1        1
      2 Bottom     D MAAC 27; MAAC 45          NYAL 27       0       0        0
      3 Middle     B NYAL 08; MAAC 48 NYAL 27; MAAC 22       0       0        0
        NYAL 27B
      1        0
      2        1
      3        1
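
The one-liner above can also be unpacked into a small reusable function. The sketch below is an illustration under stated assumptions rather than the answerer's code: `add_indicators` and its arguments are hypothetical names, the target names are passed explicitly instead of being taken from `tail(names(df), 2)`, and the split pattern `;\\s*` is chosen so that an entry such as "NYAL 27;MAAC 22" (no space after the semicolon, altered here for illustration) is still split.

```r
# Hypothetical helper: add one 0/1 column per target name,
# flagging whether that name occurs in the delimited source column
add_indicators <- function(df, source_col, targets, suffix = "B") {
  # Split on ";" followed by optional whitespace
  items <- strsplit(df[[source_col]], ";\\s*")
  for (nm in targets) {
    df[[paste0(nm, suffix)]] <- vapply(
      items,
      function(v) as.integer(nm %in% trimws(v)),
      integer(1)
    )
  }
  df
}

# Example data; the third Var2 entry deliberately has no space after ";"
df <- data.frame(
  Layer = c("Top", "Bottom", "Middle"),
  Grade = c("A", "D", "B"),
  Players = c("NYAL 08; NYAL 27", "MAAC 27; MAAC 45", "NYAL 08; MAAC 48"),
  Var2 = c("NYAL 08; MAAC 48", "NYAL 27", "NYAL 27;MAAC 22"),
  `NYAL 08` = c(1L, 0L, 0L),
  `NYAL 27` = c(1L, 0L, 0L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

out <- add_indicators(df, "Var2", c("NYAL 08", "NYAL 27"))
```

Passing the targets explicitly avoids relying on column position, which matters once the data frame gains more columns; a target that never appears in the source column simply yields a column of zeros.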
      

      Data

      > dput(df)
      structure(list(Layer = c("Top", "Bottom", "Middle"), Grade = c("A", 
      "D", "B"), Players = c("NYAL 08; NYAL 27", "MAAC 27; MAAC 45",
      "NYAL 08; MAAC 48"), Var2 = c("NYAL 08; MAAC 48", "NYAL 27",
      "NYAL 27; MAAC 22"), `NYAL 08` = c(1L, 0L, 0L), `NYAL 27` = c(1L,
      0L, 0L)), row.names = c(NA, -3L), class = "data.frame")
      

      【Discussion】:
