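【Question title】: Repeating certain part of string conditionally
【Posted】: 2020-11-24 18:57:06
【Question description】:

I want to repeat the part of a string that lies between ] and ;, once for each element separated by ; inside the preceding []. So for [A1, AB11; A2, AB22] I1, C1 the desired output is [A1, AB11] I1, C1; [A2, AB22] I1, C1. Any hint to get started would be appreciated. Below, df1 is the input and df2 is the desired output. Thanks.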

df1 <-
  data.frame(
   String = c(
    "[A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1"
  , "[A4, AB44] I4, C4; [A5, AB55; A6, AB66; A7, AB77] I7, C7"
  )
  )
df1

                                                    String
1           [A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1
2 [A4, AB44] I4, C4; [A5, AB55; A6, AB66; A7, AB77] I7, C7


df2 <-
  data.frame(
   String = c(
    "[A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1"
  , "[A4, AB44] I4, C4; [A5, AB55] I7, C7;[A6, AB66] I7, C7; [A7, AB77] I7, C7"
  )
  )

df2

                                                                     String
1                   [A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1
2 [A4, AB44] I4, C4; [A5, AB55] I7, C7;[A6, AB66] I7, C7; [A7, AB77] I7, C7

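【Question comments】:

  • How many times do you want to repeat it?
  • As many times as the number of elements separated by ; inside the [].
  • OK, I get it.

Tags: r dplyr tidyverse stringr tidytable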


【Solution 1】:

Here is a base R solution:

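# Append ";" so the text after the last "]" also ends in a separator, then split on "[" and "]".
sapply(strsplit(paste0(df1$String, ";"), "\\[|\\]"), function(x) {
  for (i in seq_along(x)) {
    # Even-numbered pieces are the bracket contents.
    if (i %% 2 == 0) {
      # Replace each ";" inside the brackets with "]", the text that follows the brackets, and " [".
      x[i] <- paste0("[", gsub(";", paste0("]", x[i + 1], " ["), x[i]), "]")
    }
  }
  paste(x, collapse = "")
})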
#> [1] "[A1, AB11] I1, C1;  [ A2, AB22] I1, C1; [A3, AB33] I3, C1;"                   
#> [2] "[A4, AB44] I4, C4; [A5, AB55] I7, C7; [ A6, AB66] I7, C7; [ A7, AB77] I7, C7;"
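
The output above still carries a blank after some opening brackets and a trailing semicolon, so it does not match the desired df2 exactly. A minimal sketch of a variant that tidies this up, assuming every separator inside the brackets is a semicolon followed by one space, as in the sample data. The helper name expand_groups is hypothetical and not part of the original answer.

```r
df1 <- data.frame(
  String = c(
    "[A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1",
    "[A4, AB44] I4, C4; [A5, AB55; A6, AB66; A7, AB77] I7, C7"
  ),
  stringsAsFactors = FALSE
)

# expand_groups is a hypothetical helper name used only for this sketch.
expand_groups <- function(s) {
  # Append "; " so the text after every "]", including the last one, ends in "; ".
  parts <- strsplit(paste0(s, "; "), "\\[|\\]")
  out <- sapply(parts, function(x) {
    for (i in seq_along(x)) {
      # Even-numbered pieces are the bracket contents.
      if (i %% 2 == 0) {
        # Match "; " literally (with the space) so no blank is left after "[".
        repl <- paste0("]", x[i + 1], "[")
        x[i] <- paste0("[", gsub("; ", repl, x[i], fixed = TRUE), "]")
      }
    }
    paste(x, collapse = "")
  })
  # Drop the separator that was appended at the start.
  sub(";\\s*$", "", out)
}

expand_groups(df1$String)
```

Matching the separator together with its space is what keeps the rebuilt brackets tight; input with irregular spacing inside the brackets would need a regular expression instead of the fixed match.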

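【Comments】:

    【Solution 2】:

    I tried something similar in the past and thought it might be fun to adapt it using the glue and unglue packages.

    The initial strsplit splits on semicolons, ignoring any semicolons between brackets.

    For each resulting piece, unglue then separates the content inside the brackets, which holds the elements to split, from the content after the brackets, which is appended to each element.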

    library(glue)
    library(unglue)
    library(purrr)
    
    my_fun <- function(inside, outside) {
      glue("[{inside}] {outside}")
    }
    
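    # Split on "; " only outside brackets: (*SKIP)(*F) discards matches of whole [...] groups.
    sapply(strsplit(df1$String, '\\[[^]]*\\](*SKIP)(*F)|;\\s', perl = TRUE), function(x) {
      # Parse each piece into the bracket content (Inside) and the trailing text (Outside).
      ud <- unglue_data(x, patterns = "[{Inside}] {Outside}")
      ud_in <- map(ud[['Inside']], strsplit, split = "; ")
      # Pair every inside element with the outside text of its own piece.
      ud_map <- map(seq_along(ud[['Inside']]), function(y) {
        map2(unlist(ud_in[y]), ud[['Outside']][y], my_fun)
      })
      paste(unlist(ud_map), collapse = '; ')
    })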
    

    Output

    [1] "[A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1"                   
    [2] "[A4, AB44] I4, C4; [A5, AB55] I7, C7; [A6, AB66] I7, C7; [A7, AB77] I7, C7"
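
The splitting step in this answer can be checked on its own in base R. The sketch below runs the same strsplit pattern on the first sample string and then uses two sub() calls as a rough stand-in for what unglue_data extracts. Those sub() calls are an assumption made for illustration, not what unglue does internally.

```r
s <- "[A1, AB11; A2, AB22] I1, C1; [A3, AB33] I3, C1"

# Bracketed groups are matched and then skipped via (*SKIP)(*F),
# so only the "; " outside brackets acts as a split point.
pieces <- strsplit(s, "\\[[^]]*\\](*SKIP)(*F)|;\\s", perl = TRUE)[[1]]
print(pieces)

# Base R stand-ins for the Inside and Outside fields (illustrative only).
inside  <- sub("^\\[(.*)\\] .*$", "\\1", pieces)
outside <- sub("^\\[.*\\] (.*)$", "\\1", pieces)
print(inside)
print(outside)
```

The semicolon inside the first pair of brackets survives the split, which is why the later steps can still separate A1, AB11 from A2, AB22 within that piece.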
    

    【Comments】:

      【Solution 3】:

      Not the tidiest solution, but it uses stringr:

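      library(stringr)
      library(purrr)

      # Split before each "[", capture bracket content and trailing text, then rebuild.
      # str_split() already returns a list, so it is passed to map2() directly
      # (wrapping it in paste0() would deparse the list elements).
      str_split(df1$String, ";(?= *\\[)") %>%
        map(str_match, "\\[(.+?)\\] (.+)") %>%
        map(~ paste(unlist(map2(str_split(.x[, 2], "; ?"), .x[, 3], ~ paste0("[", .x, "] ", .y))), collapse = "; "))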
      

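      A better solution:

      library(dplyr)
      library(tidyr)
      library(stringr)
      library(purrr)

      # ...2 and ...3 are the automatic names unnest_wider() gave the unnamed
      # elements here; newer tidyr versions may require names_sep instead.
      as_tibble(df1) %>%
        mutate(splits = str_split(String, "; *(?=\\[)")) %>%
        unnest_longer(col = splits) %>%
        mutate(splits = map(str_split(splits, "\\[|\\] ?"), str_split, "; ?")) %>%
        unnest_wider(splits) %>%
        mutate(val = map2(...2, ...3, ~ paste0("[", .x, "] ", .y, collapse = "; "))) %>%
        group_by(String) %>%
        summarise(val = paste0(val, collapse = "; "))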
      # A tibble: 2 x 2
        String                             val                                        
        <fct>                              <chr>
      1 [A1, AB11; A2, AB22] I1, C1; [A3,… [A1, AB11] I1, C1; [A2, AB22] I1, C1; [A3, AB33] I3, C1
      2 [A4, AB44] I4, C4; [A5, AB55; A6,… [A4, AB44] I4, C4; [A5, AB55] I7, C7; [A6, AB66] I7, C7; [A7, AB77] I7, C7
      

      【Comments】:
