Function: splitting strings, duplicating rows and replacing the original string with the split ones

【Question title】: function : splitting strings, duplicating rows and replacing the original string by the split ones
【Posted】: 2020-11-26 00:51:28
【Question】:

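I'm trying to write a function for the first time. It should split a string into several strings and return each piece in its own tibble row.

For example, suppose I have data like this: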

nasty_entry <- tibble(ID = 1:3, Var = c("ABC", "AB", "A"))

And I want this:

nice_entry <- tibble(ID = c(1, 1, 1, 2, 2, 3), Var = c("A", "B", "C", "A", "B", "A"))

So I tried to write a function using different kinds of loops (for practice), because my original data has about 300 entries.

nice_entry <- function(data, var, pattern)
  
  #--------------------DECLARATION--------------------#   
  
  # data : The tibble containing the data to split.
  # var : The variable containing the data to split.
  # pattern : The pattern to use for the spliting.
  
  if(!require(tidyverse)){install.packages("tidyverse")}
  library(tidyverse)
  if(!require(magrittr)){install.packages("tidyverse")}
  library(magrittr)
  
  c1 <- 0 # Reset the counter #1
  c2 <- 0 # Reset the counter #2
  unchanged_rows <- 0 # The number of rows that has been unchanged.
  changed_rows <- 0 # The number of rows that has been changed.
  new_data <- tibble() # The tibble where the data will be stored.
  
  repeat{
    c1 <- c1 +1 # Increase the counter #1 by one at each loop.
    c2 <- 0 # Reset the counter #2 at each loop.

    # Split the string into several strings.
    splited_str <- str_split(string = data %>% select({{ var }}) %>% slice(c1), pattern = pattern) %>% 
                   unlist()
    
    # Add the row into the "new_data" variable if the original string hasn't been splited.
    if(length(splited_str) <= 1) {
      unchanged_rows <- unchanged_rows +1
      new_data <- new_data %>% 
                  bind_rows(slice(data, c1))
      next
    }
    
    # Duplicate the row of the original string. It duplicates it several times according to the 
    # number of times the original string has been splited.
    if(length(splited_str) > 1){
      changed_rows <- changed_rows +1
      duplicated_rows <- data %>% 
                         slice(rep(c1, each = length(splited_str)))
    
      # Replace each original string with the new splited strings.
      while (c2 < length(splited_str)) {
        c2 <- c2 +1
        duplicated_rows <- duplicated_rows %>% 
                           mutate({{ var }} = replace(x = {{ var }}, list = c2, values = splited_str[c2]))
        new_data <- new_data %>% 
                    bind_rows(slice(duplicated_rows, c2))
      }
    }
    
    # Break the loop if the entire tibble has been analyse and return the "new_data" variable.
    if(c1 == length(nrow(data))) {
      break
      return(new_data)
    }
  }
}

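I tried the same code with "real variables" inside the loop and it seemed to work. The problem appears when I put everything into a function. I get these errors:

Error: object 'c1' not found

Error: unexpected '}' in "}"

What am I doing wrong? Maybe an indexing problem?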

I would also appreciate some advice on writing functions, and on any alternative ways to do this.

Thanks a lot!

Mathieu
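A few things in the function above are likely behind the errors. The `function(data, var, pattern)` line has no opening `{`, so R takes only the next expression as the function body and evaluates the remaining lines at top level, which is consistent with the stray `}` error. In addition, `length(nrow(data))` is always 1, `return(new_data)` placed after `break` is never reached, `next` skips the exit check at the bottom of the `repeat` loop, and `mutate()` needs `:=` when the left-hand side is `{{ var }}`. Below is a minimal loop-based sketch with those points addressed. The name `split_into_rows()` is hypothetical, and the sketch assumes dplyr, stringr and tibble are already installed (packages are loaded once, outside the function, instead of calling `install.packages()` inside it).

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical name: a loop-based sketch, not a definitive implementation.
# data: tibble to split; var: unquoted column holding the strings;
# pattern: passed to stringr::str_split() ("" splits into single characters).
split_into_rows <- function(data, var, pattern) {  # note the opening brace
  new_data <- tibble()  # accumulator for the output rows
  # Loop over every row: nrow(data), not length(nrow(data)), which is always 1
  for (i in seq_len(nrow(data))) {
    row <- slice(data, i)
    # pull() returns a character vector; select() would return a one-column tibble
    pieces <- str_split(pull(row, {{ var }}), pattern)[[1]]
    # Duplicate the row once per piece, then overwrite the column with the pieces;
    # `:=` is required when the left-hand side is {{ var }}
    duplicated_rows <- slice(row, rep(1, length(pieces)))
    duplicated_rows <- mutate(duplicated_rows, {{ var }} := pieces)
    new_data <- bind_rows(new_data, duplicated_rows)
  }
  new_data  # returned after the loop finishes
}

nasty_entry <- tibble(ID = 1:3, Var = c("ABC", "AB", "A"))
split_into_rows(nasty_entry, Var, pattern = "")
```

A `for` loop over `seq_len(nrow(data))` removes the need for manual counters and an exit condition. Growing `new_data` with `bind_rows()` inside the loop copies the tibble on every iteration; that is unlikely to matter for about 300 rows, but the vectorised answers avoid it.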

【Question discussion】:

Tags: r string function loops indexing

【Solution 1】:

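We can use separate_rows with a regex lookaround as the separator. (?<=.) is a lookbehind requiring a character before the current position, and (?=.) is a lookahead requiring a character after it; the . in a regex matches any character. Together they match the empty position between two adjacent characters, so each string is split between every pair of neighbouring characters.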

library(dplyr)
library(tidyr)
nasty_entry %>% 
    separate_rows(Var, sep="(?<=.)(?=.)")
# A tibble: 6 x 2
#     ID Var  
#  <int> <chr>
#1     1 A    
#2     1 B    
#3     1 C    
#4     2 A    
#5     2 B    
#6     3 A
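To see what the lookaround pattern does on its own, it can be passed to `stringr::str_split()` directly (using stringr here is an assumption; the answer itself loads only dplyr and tidyr). Because the match is zero-width, no characters are consumed: each string is cut at every position that has a character on both sides, and a one-character string is left as-is.

```r
library(stringr)

x <- c("ABC", "AB", "A")

# (?<=.) : lookbehind, a character must precede the position
# (?=.)  : lookahead, a character must follow the position
# Together they match the empty position between two adjacent characters.
pieces <- str_split(x, "(?<=.)(?=.)")
pieces

# For single-character pieces this gives the same result as splitting on ""
identical(pieces, str_split(x, ""))
```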

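【Discussion】:

Hi, thanks for your answer. Could you point me to some documentation about the parentheses "(?
@MathieuBernier It is a regex lookaround that matches the position between two characters. . matches any character, so it splits at the junction between two characters.
Sorry for the delay. I need to explore the stringr package. Thanks for your solution, it works great! I didn't know we could do that.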

【Solution 2】:

Here is another approach that may give you what you want:

library(tidyverse)
nasty_entry2 <- nasty_entry %>% 
  mutate(Var = strsplit(as.character(Var), "")) %>%
  tidyr::unnest(Var)

# A tibble: 6 x 2
#      ID Var  
#   <int> <chr>
# 1     1 A    
# 2     1 B    
# 3     1 C    
# 4     2 A    
# 5     2 B    
# 6     3 A
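The same split-then-repeat idea can be written in base R, which may help show what `unnest()` does here: `strsplit()` yields one character vector per row, `lengths()` counts the pieces, and `rep()` duplicates each `ID` that many times. This is a minimal sketch using a plain `data.frame` instead of a tibble; `stringsAsFactors = FALSE` is set explicitly because `strsplit()` rejects factors in R versions before 4.0, which is also why the answer wraps `Var` in `as.character()`.

```r
# Base R sketch: no tidyverse packages needed
nasty_entry <- data.frame(ID = 1:3, Var = c("ABC", "AB", "A"),
                          stringsAsFactors = FALSE)

pieces <- strsplit(nasty_entry$Var, "")  # list of character vectors, one per row

nice_entry <- data.frame(
  ID  = rep(nasty_entry$ID, lengths(pieces)),  # 1 1 1 2 2 3
  Var = unlist(pieces),                        # "A" "B" "C" "A" "B" "A"
  stringsAsFactors = FALSE
)
nice_entry
```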

【Discussion】: