【Question title】: Increment colnames of a data.frame by 1
【Posted】: 2015-03-13 06:02:24
【Question】:

I have a data.frame with these colnames:

nam <- c("a", paste0("a_", seq(12)))
"a" "a_1" "a_2" "a_3" "a_4" "a_5" "a_6" "a_7" "a_8" "a_9" "a_10" "a_11" "a_12"

How can I add 1 to the number in every name that contains one?

The expected result is

"a" "a_2" "a_3" "a_4" "a_5" "a_6" "a_7" "a_8" "a_9" "a_10" "a_11" "a_12" "a_13"

My solution so far looks complicated... Is there a simpler way than the following?

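increment_names <- function(nam){
  where <- regexpr("\\d", nam)   # position of the first digit, -1 if none
  ind <- which(where > 0)        # names that contain a digit
  increment <- as.numeric(substring(nam[ind], where[ind])) + 1
  # Rebuild each name from its prefix plus the new number; `substring<-`
  # cannot lengthen a string, so "a_9" would otherwise become "a_1".
  nam[ind] <- paste0(substring(nam[ind], 1, where[ind] - 1), increment)
  nam
}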

> increment_names(nam)
 [1] "a" "a_2" "a_3" "a_4" "a_5" "a_6" "a_7" "a_8" "a_9" "a_10" "a_11" "a_12" "a_13"

【Question comments】:

    Tags: regex r string dataframe increment


    【Solution 1】:

    A regmatches solution:

    r <- regexpr("\\d+", nam)
    regmatches(nam, r) <- as.numeric(regmatches(nam, r)) + 1
    nam
    # [1] "a"    "a_2"  "a_3"  "a_4"  "a_5"  "a_6"  "a_7"  "a_8"  ...
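
Since the question is about the column names of a data.frame, the same replacement can be applied to colnames() directly. This is only a minimal sketch: the data frame df and the helper increment_first_number() are hypothetical names made up for illustration, and it assumes the numbers fit in an integer.

```r
# Hypothetical example data; only the column names matter here.
df <- data.frame(a = 1:2, a_1 = 3:4, a_9 = 5:6)

# Hypothetical helper: add 1 to the first run of digits in each string.
increment_first_number <- function(x) {
  r <- regexpr("[0-9]+", x)
  # Only elements with a match are touched; the values are coerced to character.
  regmatches(x, r) <- as.integer(regmatches(x, r)) + 1L
  x
}

colnames(df) <- increment_first_number(colnames(df))
colnames(df)
# [1] "a"    "a_2"  "a_10"
```

Names without digits, such as "a", are left unchanged because regexpr() reports no match for them.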
    

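    【Comments】:

    • Great, I hadn't thought of using the replacement form of this.
    【Solution 2】:

    You could try the "ore" package, where the replacement can be a function, like this: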

    nam <- c("a", paste0("a_", seq(12)))
    nam
    library(ore)
    ore.subst("-?\\d+", function(x) as.numeric(x) + 1, nam, all = TRUE)
    #  [1] "a"    "a_2"  "a_3"  "a_4"  "a_5"  "a_6"  "a_7"  "a_8"  "a_9" 
    # [10] "a_10" "a_11" "a_12" "a_13"
    

    This is similar in functionality to the "gsubfn" package but (at least in this case) much more efficient. Here are some benchmarks:

    library(stringi)
    library(gsubfn)  # needed for f_GSUBFN() below
    set.seed(1)
    nam <- stri_rand_strings(10000, 5, pattern = "[A-J0-9]")
    
    f_ORE <- function(invec = nam) {
      ore.subst("-?\\d+", function(x) as.numeric(x) + 1, invec, all = TRUE)
    } 
    
    f_GSUBFN <- function(invec = nam) {
      gsubfn("\\d+", function(x) as.numeric(x) + 1, invec)
    }
    
    f_BASE <- function(invec = nam) {
      r <- regexpr("\\d+", invec)
      regmatches(invec, r) <- as.numeric(regmatches(invec, r))+1
      invec
    }
    
    system.time(f_GSUBFN())
    #    user  system elapsed 
    #    5.48    0.01    5.50 
    
    library(microbenchmark)
    microbenchmark(f_BASE(), f_ORE())
    # Unit: milliseconds
    #      expr       min        lq      mean    median        uq      max neval
    #  f_BASE() 141.79743 149.58914 161.49041 152.81038 162.10550 357.6483   100
    #   f_ORE()  57.35309  59.58433  65.84678  60.92218  68.40062 116.7714   100
    

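    Note that while the "ore" and "gsubfn" approaches give identical results, they differ slightly from the base R approach: regexpr() matches only the first run of digits in each string, so f_BASE() leaves any later numbers untouched.

    Consider: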

    > identical(f_ORE(), f_GSUBFN())
    [1] TRUE
    
    ## Edge case...
    > nam[988]
    [1] "0G019"
    > f_ORE()[988]     ## 019 becomes 20 (without the leading zero)
    [1] "1G20"
    > f_GSUBFN()[988]  ## Same
    [1] "1G20"
    > f_BASE()[988]    ## This seems off...
    [1] "1G019"
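
The difference in the edge case comes from regexpr() stopping at the first match. A base R variant that increments every run of digits can be sketched with gregexpr(). This is an illustration rather than part of the original answer: the function name increment_all() is made up, and it assumes each number fits in an integer.

```r
# Hypothetical helper: add 1 to every run of digits in each string.
increment_all <- function(x) {
  m <- gregexpr("[0-9]+", x)
  # With gregexpr() matches, the replacement value is a list with one
  # character vector per input string (empty where nothing matched).
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(d) as.character(as.integer(d) + 1L)
  )
  x
}

increment_all(c("0G019", "a", "a_9"))
# [1] "1G20" "a"    "a_10"
```

As with the "ore" and "gsubfn" results, a leading zero is dropped because the digits are converted to a number before being incremented.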
    

    【Comments】:

      【Solution 3】:

      With the gsubfn package you can do something as simple as

      library(gsubfn) 
      gsubfn("\\d+", function(x) as.numeric(x) + 1, nam)
      ## [1] "a"    "a_2"  "a_3"  "a_4"  "a_5"  "a_6"  "a_7"  "a_8"  "a_9"  "a_10" "a_11" "a_12" "a_13"
      

      This works for any pattern; you don't need to assume the "nonnumbers_numbers" pattern above. For example:

      (nam <- c("a", paste0(seq(12), "_a")))
      ## [1] "a"    "1_a"  "2_a"  "3_a"  "4_a"  "5_a"  "6_a"  "7_a"  "8_a"  "9_a"  "10_a" "11_a" "12_a"
      gsubfn("\\d+", function(x) as.numeric(x) + 1, nam)
      ## [1] "a"    "2_a"  "3_a"  "4_a"  "5_a"  "6_a"  "7_a"  "8_a"  "9_a"  "10_a" "11_a" "12_a" "13_a"
      

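      【Comments】:

      • (+1) for not requiring the "nonnumbers_numbers" pattern. Since I do have that pattern, I used the base solution.
      • Interesting: gsubfn converts NA to "" rather than to "NA" as as.character does. +1 for reminding me of gsubfn.
      • @BrodieG, FYI, I was about to post exactly the same solution as yours, but then I saw your edit and realized we would be posting the same thing, so thanks to that I came up with this gem :)
      • You already upvoted this a few days ago, but I thought you might be interested in a package I recently came across that is similar in functionality to what you did here, but way faster. See my answer.
      • @AnandaMahto Interesting (+1). Did you benchmark it? Do you know what makes it more efficient?
      【Solution 4】:

      As long as your pattern is "nonnumbers_numbers", this will do it: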

      nums <- as.numeric(gsub("[^0-9]", "", nam))
      nam[!is.na(nums)] <- paste0(gsub("[0-9]", "", nam), nums + 1)[!is.na(nums)]
      

      Produces:

       [1] "a"    "a_2"  "a_3"  "a_4"  "a_5"  "a_6"  "a_7"  "a_8"  "a_9"  "a_10" "a_11" "a_12"
       [13] "a_13"
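
To illustrate why the pattern assumption matters, here is a small sketch that runs the same two lines on a made-up vector where one number comes before the letters. The names nam2 and out are hypothetical and not from the original post.

```r
# Hypothetical input: "1_a" does not follow the "nonnumbers_numbers" pattern.
nam2 <- c("a", "1_a", "a_9")

# Strip everything except digits; names without digits become NA.
nums <- as.numeric(gsub("[^0-9]", "", nam2))

out <- nam2
# The incremented number is always pasted at the end of the name.
out[!is.na(nums)] <- paste0(gsub("[0-9]", "", nam2), nums + 1)[!is.na(nums)]
out
# [1] "a"    "_a2"  "a_10"
```

The name "1_a" turns into "_a2" instead of "2_a", so this approach is only safe when the digits sit at the end of each name.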
      

      【Comments】:
