【Question title】: R - sequentially replace string using a data frame of strings
【Posted】: 2016-12-29 12:01:50
【Question】:

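I am trying to build a function F that performs replacements in a target string 'str' using a data frame of strings 'df', column by column and row by row, with the column names as the substrings to be replaced and the column values as the replacements. The result is a character vector of length 'rownum', where each element is the string after 'colnum' replacements.

An example illustrates this best: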

str <- "Hi, I am name and I am age years old! - said name "

df <- data.frame(name = c('John', 'Richard','Edward'), age =c('10','26','12'))

F(str,df)

"Hi, I am John and I am 10 years old! - said John "

"Hi, I am Richard and I am 26 years old! - said Richard "

"Hi, I am Edward and I am 12 years old! - said Edward "

I have already written a function for this job:

F <- function(str,df)
{
  x <- str
  for(i in names(df)){
    x <- unname(mapply(gsub,i,df[[i]],x))
  }
  return(x)
}

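It seems to work, but I have the impression that it is neither efficient nor elegant.

  1. Is there a way to avoid the loop?
  2. Is mapply necessary?
  3. Will F work when 'str' is a multi-line text rather than a single line?

Thanks for your help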

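【Comments on the question】:

  • Actually, it would be better to use word boundaries so that name and age are matched as whole words.
  • If possible, I would use str <- "Hi, I am %s and I am %s years old! - said %s "; sprintf(str, df$name, df$age, df$name)
  • You can do this programmatically with sprintf(gsub("name|age", "%s", str), df$name, df$age, df$name)
  • @RomanLuštrik I have added your suggestion as a cw-answer. Hope you don't mind.
  • @h3rm4n I don't mind. Everything I write is open source. :)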
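
The word-boundary suggestion in the first comment could look roughly like the sketch below. The function name `replace_words` and the sample sentence are made up for illustration. It assumes the column names contain no regex metacharacters and the replacement values contain no backslashes.

```r
# Hypothetical variant of the question's F(): replace column names only
# where they occur as whole words.
replace_words <- function(str, df) {
  out <- rep(str, nrow(df))
  for (nm in names(df)) {
    pattern <- paste0("\\b", nm, "\\b")
    # gsub() is not vectorised over the replacement, so pair each
    # replacement value with its own copy of the string via mapply()
    out <- mapply(function(value, s) gsub(pattern, value, s, perl = TRUE),
                  as.character(df[[nm]]), out, USE.NAMES = FALSE)
  }
  out
}

df <- data.frame(name = c("John", "Richard"), age = c("10", "26"),
                 stringsAsFactors = FALSE)
res <- replace_words("name is age; username and page stay intact", df)
print(res)
```

Without the `\\b` anchors, the plain patterns would also rewrite the "name" inside "username" and the "age" inside "page".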

Tags: r regex


【Solution 1】:

Perhaps another option, one that "hides" the for loop:

library(stringi)
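f <- function(str, df)
  # vectorize_all = FALSE applies every pattern/replacement pair in turn to the same string
  apply(df, 1, stri_replace_all, str = str, fixed = names(df), vectorize_all = FALSE)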
f("Hi, I am name and I am age years old! - said name ", df)
# [1] "Hi, I am John and I am 10 years old! - said John "      
# [2] "Hi, I am Richard and I am 26 years old! - said Richard "
# [3] "Hi, I am Edward and I am 12 years old! - said Edward "

str <- "Hi, I am name and I am age years old! - said name\n
Hi, I am name and I am age years old! - said name"
f(str, df)
# [1] "Hi, I am John and I am 10 years old! - said John\n\nHi, I am John and I am 10 years old! - said John"            
# [2] "Hi, I am Richard and I am 26 years old! - said Richard\n\nHi, I am Richard and I am 26 years old! - said Richard"
# [3] "Hi, I am Edward and I am 12 years old! - said Edward\n\nHi, I am Edward and I am 12 years old! - said Edward"

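【Comments】:

    【Solution 2】:

    Mustache is a great solution for this kind of string manipulation via templates. For simple strings/templates I would also use sprintf. For more complex templates I would definitely use Mustache.

    The R implementation of Mustache is the whisker package.

    In your case, it can be done like this, for example: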

    #install.packages("whisker")
    library(whisker)
    template <- 
    "Hi, I am {{name}} and I am {{age}} years old! - 
    said {{name}}"
    
    df <- data.frame(name = c('John', 'Richard','Edward'), age =c('10','26','12'))
    
    out <- apply(df, 1, function(x) whisker.render(template, x))
    

    which gives you:

    [1] "Hi, I am John and I am 10 years old! -\nsaid John"      
    [2] "Hi, I am Richard and I am 26 years old! -\nsaid Richard"
    [3] "Hi, I am Edward and I am 12 years old! -\nsaid Edward" 
    

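    The line break (\n) is preserved in the output.

    You could also use readLines to read your template in beforehand instead of hard-coding it in the script.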
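
A minimal sketch of the readLines idea, assuming the template lives in a plain text file. The temporary file and the `.mustache` extension are invented here so the example is self-contained. The rendering step runs only if whisker is installed.

```r
# Write the template to a temporary file so the sketch is self-contained;
# in practice the file would already exist on disk (the path is hypothetical).
template_file <- tempfile(fileext = ".mustache")
writeLines(c("Hi, I am {{name}} and I am {{age}} years old! -",
             "said {{name}}"), template_file)

# readLines() returns one element per line, so re-join the lines with "\n"
template <- paste(readLines(template_file), collapse = "\n")

df <- data.frame(name = c("John", "Richard", "Edward"),
                 age = c("10", "26", "12"), stringsAsFactors = FALSE)

if (requireNamespace("whisker", quietly = TRUE)) {
  # each row arrives as a named character vector; convert it to a named list
  out <- apply(df, 1, function(x) whisker::whisker.render(template, as.list(x)))
  print(out)
}
```

Re-joining with `paste(..., collapse = "\n")` matters because whisker.render expects the template as a single string, not one element per line.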

    【Comments】:

      【Solution 3】:

      The most straightforward way (suggested by @RomanLustrik in the comments):

      str <- "Hi, I am %s and I am %s years old! - said %s "
      sprintf(str, df$name, df$age, df$name)
      

      Result:

      [1] "Hi, I am John and I am 10 years old! - said John "      
      [2] "Hi, I am Richard and I am 26 years old! - said Richard "
      [3] "Hi, I am Edward and I am 12 years old! - said Edward "  
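
As a small variation, R's sprintf also accepts positional conversion specifications such as `%1$s`, so a value that appears twice in the template does not have to be passed twice. A sketch using the same data (the variable names `fmt` and `res` are made up for illustration):

```r
df <- data.frame(name = c("John", "Richard", "Edward"),
                 age = c("10", "26", "12"), stringsAsFactors = FALSE)

# %1$s refers to the first argument after the format, %2$s to the second
fmt <- "Hi, I am %1$s and I am %2$s years old! - said %1$s "
res <- sprintf(fmt, df$name, df$age)
print(res)
```

Numbered and unnumbered specifications should not be mixed within one format string.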
      

      【Comments】:

        【Solution 4】:

        We can do this programmatically (inspired by @RomanLustrik's idea):

        do.call(sprintf, c(cbind(df, name2=df$name), fmt = gsub("name|age", "%s", str)))
        #[1] "Hi, I am John and I am 10 years old! - said John "    
        #[2] "Hi, I am Richard and I am 26 years old! - said Richard "
        #[3] "Hi, I am Edward and I am 12 years old! - said Edward "  
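
The one-liner above relies on the placeholders appearing in the order name, age, name. A possible generalisation, sketched below, reads the order of the placeholders from the string itself. The helper name `fill_template` is hypothetical, and the sketch assumes 'str' has length one and the column names contain no regex metacharacters.

```r
# Hypothetical helper: build the sprintf() call from whatever order the
# column names occur in within the template string.
fill_template <- function(str, df) {
  pattern <- paste(names(df), collapse = "|")
  # placeholders in order of appearance, e.g. "name", "age", "name"
  keys <- regmatches(str, gregexpr(pattern, str))[[1]]
  # escape literal percent signs first, then turn placeholders into %s
  fmt <- gsub(pattern, "%s", gsub("%", "%%", str, fixed = TRUE))
  args <- lapply(keys, function(k) as.character(df[[k]]))
  do.call(sprintf, c(list(fmt), args))
}

df <- data.frame(name = c("John", "Richard", "Edward"),
                 age = c("10", "26", "12"), stringsAsFactors = FALSE)
res1 <- fill_template("Hi, I am name and I am age years old! - said name ", df)
res2 <- fill_template("age-year-old name", df)
print(res1)
print(res2)
```

Like the original, this matches substrings rather than whole words, so a template containing words such as "page" or "username" would need word boundaries added to the pattern.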
        

        【Comments】:
