【Question title】:Selecting multiple columns based on column names in R
【Posted】:2020-05-11 09:46:57
【Question】:

I want to create a data frame myID that contains the columns SNP, trait, protein, protein.x, protein.y, metabolite, metabolite.x, metabolite.y of merged. The following code works:

myID <- subset(merged, select = c(SNP, trait, protein, protein.x, protein.y, metabolite, metabolite.x, metabolite.y))  

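However, I would like to do this with code that does not require writing out every column name (I will need to select more than 100 columns later).

Something like starts_with("SNP","trai","protei","metabolit") would be perfect (but this does not work).

My data: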

dput(merged[1:4,])
structure(list(SNP = c("rs1001567", "rs1002076", "rs1002365", 
"rs1002480"), trait = c("complex", "complex", "complex", "complex"
), beta_g = c(-0.0021, 2e-04, -0.0141, -0.0082), df_g = c(699247, 709315, 708183, 695786
), protein.x = c("IL16", "IL16", 
"IL16", "IL16"
), beta_p.x = c(-0.0874, 0.0335, -0.0268, 0.0923), df_p.x = c(3392, 3392, 3392, 3392), 
    protein.y = c("IL18", "IL18", 
    "IL18", "IL18"
    ), beta_p.y = c(-0.0542, 0.0257, 0.0124, 0.0846), df_p.y = c(3392, 3392, 3392, 3392
    ), protein = c("IL6", "IL6", 
    "IL6", "IL6"
    ), beta_p = c(0.0323, 0.0371, -0.0368, 6e-04), df_p = c(3392, 3392, 3392, 3392), 
    metabolite.x = c("Ile", "Ile", 
    "Ile", "Ile"), beta_m.x = c(0.006018, 
    -0.01177, 0.008134, 0.001025), df_m.x = c(21354, 23576, 21355, 
    23577), metabolite.y = c("Leu", "Leu", 
    "Leu", "Leu"), beta_m.y = c(0.010107, 
    -0.000184, 0.017055, -0.000436), df_m.y = c(21306, 23528, 21306, 
    23530), metabolite = c("Val", "Val", 
    "Val", "Val"), beta_m = c(0.007908, 
    -0.002337, 0.01489, 0.0028), df_m = c(21478, 23700, 21479, 
    23704)), .internal.selfref = <pointer: (nil)>, sorted = "SNP", row.names = c(NA, 
4L), class = c("data.table", "data.frame"))

【Comments】:

    Tags: r dataframe select subset


    【Solution 1】:

    How about a slight modification of your code in base R:

    subset(merged, select = grep("^SNP|^trai|^protei|^metabolit", names(merged), value = TRUE))
    
             SNP   trait protein.x protein.y protein metabolite.x metabolite.y metabolite
    1: rs1001567 complex      IL16      IL18     IL6          Ile          Leu        Val
    2: rs1002076 complex      IL16      IL18     IL6          Ile          Leu        Val
    3: rs1002365 complex      IL16      IL18     IL6          Ile          Leu        Val
    4: rs1002480 complex      IL16      IL18     IL6          Ile          Leu        Val
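
The same prefix matching can also be written without a regular expression, which may be easier to maintain once the prefix list grows. The following is only a minimal sketch: it uses a small toy data.frame (a few of the column names from the question, not the full data), and the names prefixes, hits and keep are made up for illustration.

```r
# Toy data frame with a handful of the column names from the question
merged <- data.frame(
  SNP = c("rs1001567", "rs1002076"),
  trait = c("complex", "complex"),
  beta_g = c(-0.0021, 2e-04),
  protein.x = c("IL16", "IL16"),
  beta_p.x = c(-0.0874, 0.0335),
  protein = c("IL6", "IL6"),
  metabolite.y = c("Leu", "Leu"),
  stringsAsFactors = FALSE
)

# Prefixes to keep (hypothetical name; extend this vector as needed)
prefixes <- c("SNP", "trai", "protei", "metabolit")

# One logical column per prefix: does each column name start with it?
hits <- vapply(prefixes,
               function(p) startsWith(names(merged), p),
               logical(ncol(merged)))

# Keep a column if at least one prefix matched
keep <- rowSums(hits) > 0

myID <- merged[, keep, drop = FALSE]
names(myID)
```

This sketch assumes a plain data.frame. If merged is a data.table, as in the question, column indexing works differently, so something like merged[, names(merged)[keep], with = FALSE] would be needed instead of the last indexing step.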
    

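    【Comments】:

      【Solution 2】:

      Since you seem to be using data.table, you can use .SD to select the columns. First, pick the column names with a regular expression (there may be a more elegant way):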

      cols <- colnames(merged)[grepl(pattern = paste0("^(", paste(c("SNP","trai","protein","metabolit"), collapse = "|"), ")"), colnames(merged))]
      

      Then select your columns:

      merged[, .SD, .SDcols = cols]
      
               SNP   trait protein.x protein.y protein metabolite.x metabolite.y metabolite
      1: rs1001567 complex      IL16      IL18     IL6          Ile          Leu        Val
      2: rs1002076 complex      IL16      IL18     IL6          Ile          Leu        Val
      3: rs1002365 complex      IL16      IL18     IL6          Ile          Leu        Val
      4: rs1002480 complex      IL16      IL18     IL6          Ile          Leu        Val
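
As a possibly shorter variant of the two steps above, recent versions of data.table (1.12.0 and later, as far as I know) accept patterns() directly in .SDcols, so the intermediate vector of column names is not needed. This is a sketch on a small toy table with only some of the question's columns, not the original data.

```r
library(data.table)

# Toy data.table with a handful of the column names from the question
merged <- data.table(
  SNP = c("rs1001567", "rs1002076"),
  trait = c("complex", "complex"),
  beta_g = c(-0.0021, 2e-04),
  protein.x = c("IL16", "IL16"),
  beta_p.x = c(-0.0874, 0.0335),
  protein = c("IL6", "IL6"),
  metabolite.y = c("Leu", "Leu")
)

# patterns() matches the regular expression against the column names;
# "^" anchors each alternative to the start of the name
myID <- merged[, .SD, .SDcols = patterns("^(SNP|trai|protei|metabolit)")]
names(myID)
```

The columns come back in the order in which they appear in merged, which matches the output shown above.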
      

      【Comments】:
