【Question title】: Conditional replacement of column name in tibble using dplyr
【Posted】: 2017-09-20 13:58:15
【Question】:

I have the following tibble:

    df <- structure(list(gene_symbol = c("0610005C13Rik", "0610007P14Rik", 
"0610009B22Rik", "0610009L18Rik", "0610009O20Rik", "0610010B08Rik"
), foo.control.cv = c(1.16204038288333, 0.120508045270669, 0.205712615954009, 
0.504508040948641, 0.333956330117591, 0.543693011377001), foo.control.mean = c(2.66407458486012, 
187.137728870855, 142.111269303428, 16.7278587043453, 69.8602872478098, 
4.77769028710622), foo.treated.cv = c(0.905769898934564, 0.186441944401973, 
0.158552512842753, 0.551955061149896, 0.15743983656006, 0.290447431974039
), foo.treated.mean = c(2.40658723367692, 180.846795140269, 139.054032348287, 
11.8584348984435, 76.8141734599118, 2.24088124240385)), .Names = c("gene_symbol", 
"foo.control.cv", "foo.control.mean", "foo.treated.cv", "foo.treated.mean"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
6L))

It looks like this:

# A tibble: 6 × 5
    gene_symbol foo.control.cv foo.control.mean foo.treated.cv foo.treated.mean
*         <chr>          <dbl>            <dbl>          <dbl>            <dbl>
1 0610005C13Rik      1.1620404         2.664075      0.9057699         2.406587
2 0610007P14Rik      0.1205080       187.137729      0.1864419       180.846795
3 0610009B22Rik      0.2057126       142.111269      0.1585525       139.054032
4 0610009L18Rik      0.5045080        16.727859      0.5519551        11.858435
5 0610009O20Rik      0.3339563        69.860287      0.1574398        76.814173
6 0610010B08Rik      0.5436930         4.777690      0.2904474         2.240881

What I want to do is replace mean with mean_expr in every column name that contains it, resulting in:

    gene_symbol foo.control.cv foo.control.mean_expr foo.treated.cv foo.treated.mean_expr
1 0610005C13Rik      1.1620404              2.664075      0.9057699              2.406587
2 0610007P14Rik      0.1205080            187.137729      0.1864419            180.846795
3 0610009B22Rik      0.2057126            142.111269      0.1585525            139.054032
4 0610009L18Rik      0.5045080             16.727859      0.5519551             11.858435
5 0610009O20Rik      0.3339563             69.860287      0.1574398             76.814173
6 0610010B08Rik      0.5436930              4.777690      0.2904474              2.240881

How can I achieve this?

【Question discussion】:

    Tags: r dplyr tidyverse


    【Solution 1】:

    With magrittr, you can do:

    library(magrittr)
    names(df)[df %>% names %>% grep(pattern = "mean")] %<>% paste0("_expr")
    df
    # A tibble: 6 x 5
      gene_symbol   foo.control.cv foo.control.mean_expr foo.treated.cv foo.treated.mean_expr
    * <chr>                  <dbl>                 <dbl>          <dbl>                 <dbl>
    1 0610005C13Rik          1.16                   2.66          0.906                  2.41
    2 0610007P14Rik          0.121                187.            0.186                181.  
    3 0610009B22Rik          0.206                142.            0.159                139.  
    4 0610009L18Rik          0.505                 16.7           0.552                 11.9 
    5 0610009O20Rik          0.334                 69.9           0.157                 76.8 
    6 0610010B08Rik          0.544                  4.78          0.290                  2.24
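
The compound assignment above packs several steps into one line. As a rough sketch in base R only (no magrittr), the same idea can be spelled out step by step. The vector nms is an illustrative stand-in for names(df), using the column names from the question:

```r
# Illustrative stand-in for names(df), using the column names from the question
nms <- c("gene_symbol", "foo.control.cv", "foo.control.mean",
         "foo.treated.cv", "foo.treated.mean")

# grep() returns the integer positions of the names that contain "mean"
idx <- grep("mean", nms)

# Append the suffix only at those positions; the other names are untouched
nms[idx] <- paste0(nms[idx], "_expr")
print(nms)
```

With a real data frame, the last assignment would presumably be written as names(df)[idx] <- paste0(names(df)[idx], "_expr").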
    

    【Discussion】:

    • I don't understand the downvote; please leave feedback.
    【Solution 2】:

    With the current version of dplyr, you can use rename_at:

    library(dplyr)
    
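    # Note: funs() has been deprecated since dplyr 0.8.0; the formula notation shown further down avoids it.
    df %>% rename_at(vars(contains('mean')), funs(sub('mean', 'mean_expr', .)))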
    #> # A tibble: 6 × 5
    #>     gene_symbol foo.control.cv foo.control.mean_expr foo.treated.cv
    #> *         <chr>          <dbl>                 <dbl>          <dbl>
    #> 1 0610005C13Rik      1.1620404              2.664075      0.9057699
    #> 2 0610007P14Rik      0.1205080            187.137729      0.1864419
    #> 3 0610009B22Rik      0.2057126            142.111269      0.1585525
    #> 4 0610009L18Rik      0.5045080             16.727859      0.5519551
    #> 5 0610009O20Rik      0.3339563             69.860287      0.1574398
    #> 6 0610010B08Rik      0.5436930              4.777690      0.2904474
    #> # ... with 1 more variables: foo.treated.mean_expr <dbl>
    

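    Really, you could also use rename_all, since names that don't match are left unchanged anyway. Also, for .funs you can pass a quosure or anything that rlang::as_function can coerce to a function, so you can use purrr-style notation: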

    df %>% rename_all(~sub('mean', 'mean_expr', .x))
    

    Since a data frame is a list, purrr::set_names can do the same thing:

    library(purrr)    # or library(tidyverse)
    
    df %>% set_names(~sub('mean', 'mean_expr', .x))
    

    All of these return the same thing.
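
For readers on a newer dplyr, here is a hedged sketch of the same rename using rename_with, which assumes dplyr 1.0.0 or later and is not part of the original answer. The tibble df_small is a made-up, shortened stand-in for the question's data:

```r
library(dplyr)

# Made-up, shortened stand-in for the question's tibble (values are illustrative)
df_small <- tibble(
  gene_symbol = c("a", "b"),
  foo.control.cv = c(0.1, 0.2),
  foo.control.mean = c(1, 2)
)

# Apply the renaming function only to the columns selected by contains("mean")
renamed <- df_small %>%
  rename_with(~ sub("mean", "mean_expr", .x), contains("mean"))

print(names(renamed))
```

The column selection comes after the function here, which is the reverse of the argument order in rename_at.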

    【Discussion】:

      【Solution 3】:

      Another option is dplyr::select_all():

      df %>% select_all(~gsub("mean", "mean_expr", .))
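
This answer uses gsub where the others use sub. For the column names in the question the result is the same, but the two functions differ when a name contains the pattern more than once. A small base R sketch, with a hypothetical name that does not appear in the question's data:

```r
# Hypothetical column name containing "mean" twice
nm <- "mean.of.mean"

# sub() replaces only the first match
first_only <- sub("mean", "mean_expr", nm)

# gsub() replaces every match
all_matches <- gsub("mean", "mean_expr", nm)

print(first_only)   # "mean_expr.of.mean"
print(all_matches)  # "mean_expr.of.mean_expr"
```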
      

      【Discussion】:

        【Solution 4】:

        Another option is to use sprintf inside rename_at (using the devel version of dplyr):

        library(dplyr)
        df %>%
            rename_at(vars(matches('mean')), funs(sprintf('%s_expr', .)))
        # A tibble: 6 × 5
        #    gene_symbol foo.control.cv foo.control.mean_expr foo.treated.cv foo.treated.mean_expr
        #*         <chr>          <dbl>                 <dbl>          <dbl>                 <dbl>
        #1 0610005C13Rik      1.1620404              2.664075      0.9057699              2.406587    
        #2 0610007P14Rik      0.1205080            187.137729      0.1864419            180.846795
        #3 0610009B22Rik      0.2057126            142.111269      0.1585525            139.054032
        #4 0610009L18Rik      0.5045080             16.727859      0.5519551             11.858435
        #5 0610009O20Rik      0.3339563             69.860287      0.1574398             76.814173
        #6 0610010B08Rik      0.5436930              4.777690      0.2904474              2.240881
        

        Or use rename_if:

        df %>%
           rename_if(grepl("mean", names(.)), funs(sprintf("%s_expr", .)))
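
Note that appending a suffix with sprintf is only equivalent to substituting mean when mean sits at the end of the name, as it does in the question. A minimal base R sketch of the difference, using a hypothetical name that is not in the question's data:

```r
# Hypothetical column name where "mean" is not at the end
nm <- "mean.control"

# Suffix approach: the whole name gets "_expr" appended
suffixed <- sprintf("%s_expr", nm)

# Substitution approach: only the matched text is rewritten
substituted <- sub("mean", "mean_expr", nm)

print(suffixed)     # "mean.control_expr"
print(substituted)  # "mean_expr.control"
```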
        

        【Discussion】:

          【Solution 5】:

          Here is a base R approach without dplyr:

          names(df) <- sub("mean$", "mean_expr", names(df))
          # or names(df) <- sub("mean", "mean_expr", names(df)) if the mean doesn't have to be at the 
          # end of the string
          
          names(df)
          #[1] "gene_symbol"           "foo.control.cv"        "foo.control.mean_expr"
          #[4] "foo.treated.cv"        "foo.treated.mean_expr"
          

          If you want it to be part of a pipe, you can use the setNames function:

          df %>% setNames(sub("mean", "mean_expr", names(.))) %>% names(.)
          #[1] "gene_symbol"           "foo.control.cv"        "foo.control.mean_expr"
          #[4] "foo.treated.cv"        "foo.treated.mean_expr"
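
To make the effect of the trailing $ in the first pattern concrete, here is a small sketch comparing the anchored and unanchored patterns. The names are hypothetical and chosen only to show where the two differ:

```r
# Hypothetical names: "mean" at the end, at the start, and inside a longer word
nms <- c("foo.mean", "mean.foo", "foo.meaning")

# Anchored: only names that END in "mean" are changed
anchored <- sub("mean$", "mean_expr", nms)

# Unanchored: the first "mean" anywhere in each name is changed
unanchored <- sub("mean", "mean_expr", nms)

print(anchored)    # "foo.mean_expr" "mean.foo" "foo.meaning"
print(unanchored)  # "foo.mean_expr" "mean_expr.foo" "foo.mean_expring"
```

The anchored form is the safer choice if other columns might contain mean as part of a longer word.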
          

          【Discussion】:
