【Posted】: 2020-11-26 00:51:28
【Question】:
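This is my first attempt at writing a function. It should split a string into several strings and return each piece in its own tibble row.
For example, suppose I have data like this: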
nasty_entry <- tibble(ID = 1:3, Var = c("ABC", "AB", "A"))
This is what I would like to get:
nice_entry <- tibble(ID = c(1, 1, 1, 2, 2, 3), Var = c("A", "B", "C", "A", "B", "A"))
Since my real data has about 300 entries, I tried to write a function for this, using different kinds of loops (for practice):
nice_entry <- function(data, var, pattern)
#--------------------DECLARATION--------------------#
# data : The tibble containing the data to split.
# var : The variable containing the data to split.
# pattern : The pattern to use for the splitting.
if(!require(tidyverse)){install.packages("tidyverse")}
library(tidyverse)
if(!require(magrittr)){install.packages("magrittr")}
library(magrittr)
c1 <- 0 # Reset the counter #1
c2 <- 0 # Reset the counter #2
unchanged_rows <- 0 # The number of rows that have been left unchanged.
changed_rows <- 0 # The number of rows that have been changed.
new_data <- tibble() # The tibble where the data will be stored.
repeat{
c1 <- c1 +1 # Increase the counter #1 by one at each loop.
c2 <- 0 # Reset the counter #2 at each loop.
# Split the string into several strings.
splited_str <- str_split(string = data %>% select({{ var }}) %>% slice(c1), pattern = pattern) %>%
unlist()
# Add the row to the "new_data" variable if the original string has not been split.
if(length(splited_str) <= 1) {
unchanged_rows <- unchanged_rows +1
new_data <- new_data %>%
bind_rows(slice(data, c1))
next
}
# Duplicate the row of the original string, once for each piece
# the original string has been split into.
if(length(splited_str) > 1){
changed_rows <- changed_rows +1
duplicated_rows <- data %>%
slice(rep(c1, each = length(splited_str)))
# Replace each copy of the original string with one of the split strings.
while (c2 < length(splited_str)) {
c2 <- c2 +1
duplicated_rows <- duplicated_rows %>%
mutate({{ var }} = replace(x = {{ var }}, list = c2, values = splited_str[c2]))
new_data <- new_data %>%
bind_rows(slice(duplicated_rows, c2))
}
}
# Break the loop once the entire tibble has been processed and return the "new_data" variable.
if(c1 == length(nrow(data))) {
break
return(new_data)
}
}
}
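A note on the definition as posted: `nice_entry <- function(data, var, pattern)` is not followed by an opening `{`. In R, a function written without braces takes only the next complete expression as its body, so everything after the first `if(...)` line would be evaluated at top level instead of inside the function. The toy functions below are hypothetical names used only to illustrate the difference; they are not part of the question.

```r
# Without braces, the body is only the next complete expression.
no_braces <- function(x)
  x + 1
leftover <- "evaluated at top level, not inside no_braces()"

# With braces, every line up to the matching `}` belongs to the body.
with_braces <- function(x) {
  y <- x + 1
  y * 2
}

# deparse(body(...)) shows what each function actually contains.
body_no_braces <- deparse(body(no_braces))
body_with_braces <- deparse(body(with_braces))

print(body_no_braces)
print(body_with_braces)
```

If the missing brace is indeed the cause, the final `}` of the posted code has no partner, which would fit an "unexpected '}'" message.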
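I tried the same code outside a function, using concrete values in the loops, and it seemed to work. The problem appears once I wrap it in a function. I get these errors:
Error: object 'c1' not found
} Error: unexpected '}' in "}"
} Error: unexpected '}' in "}"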
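The "unexpected" messages are parse errors, which R raises before any code runs. Besides an unmatched `}`, one construct in the posted code that R's parser rejects is `{{ var }} = ...` inside `mutate()`: within a call, the left-hand side of `=` has to be a plain name or a string. dplyr and rlang use the `:=` operator for this situation. The check below is a minimal sketch using base R's `parse()`; the `parses()` helper is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R, FALSE otherwise.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# `=` needs a bare name (or string) on its left inside a call,
# so this form is rejected by the parser.
with_equals <- parses("mutate(df, {{ var }} = toupper({{ var }}))")

# `:=` is an ordinary operator for the parser, so this form is accepted;
# dplyr/rlang give it its meaning when mutate() is evaluated.
with_walrus <- parses("mutate(df, {{ var }} := toupper({{ var }}))")

# A closing brace without a matching opening brace is also a parse error.
stray_brace <- parses("}")

print(c(with_equals = with_equals, with_walrus = with_walrus, stray_brace = stray_brace))
```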
What am I doing wrong? Is it perhaps an indexing problem?
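On the indexing question: the stop condition `c1 == length(nrow(data))` compares the counter with the length of a single number, which is always 1, so it cannot track the number of rows. In addition, a `return()` placed after `break` is never reached, and the `next` in the unchanged-row branch skips the stop check entirely. The base-R sketch below uses toy data shaped like the example and assumes a `for` loop over `seq_len(nrow(...))` as a replacement for the manual counter.

```r
# Toy data shaped like the example in the question.
df <- data.frame(ID = 1:3, Var = c("ABC", "AB", "A"))

n_rows <- nrow(df)             # the number of rows
len_of_nrow <- length(n_rows)  # always 1: nrow() returns a single number

# seq_len() yields 1, 2, ..., n_rows (and nothing for an empty data frame),
# so the loop needs neither a manual counter nor an explicit break.
visited <- integer(0)
for (i in seq_len(n_rows)) {
  visited <- c(visited, i)
}

print(n_rows)
print(len_of_nrow)
print(visited)
```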
I would also appreciate any advice on writing functions, and on alternative ways to do this.
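For readers who want to keep the loop-based approach, here is one possible corrected sketch, not a definitive implementation. It assumes the packages are loaded once at the top of the script rather than installed from inside the function, that `pattern` is passed straight to `stringr::str_split()`, and it uses the hypothetical name `split_rows` to avoid clashing with the `nice_entry` tibble.

```r
library(dplyr)
library(stringr)
library(tibble)

# Loop-based sketch: one output row per piece of the split string.
split_rows <- function(data, var, pattern) {
  pieces_per_row <- vector("list", nrow(data))
  for (i in seq_len(nrow(data))) {
    row_i <- slice(data, i)
    # pull() extracts the column value as a plain character vector.
    pieces <- str_split(pull(row_i, {{ var }}), pattern = pattern)[[1]]
    # Repeat the row once per piece, then overwrite the column with the pieces.
    pieces_per_row[[i]] <- row_i %>%
      slice(rep(1, length(pieces))) %>%
      mutate({{ var }} := pieces)
  }
  # Combine the per-row results once, instead of growing a tibble inside the loop.
  bind_rows(pieces_per_row)
}

nasty_entry <- tibble(ID = 1:3, Var = c("ABC", "AB", "A"))
result <- split_rows(nasty_entry, Var, pattern = "")
print(result)
```

Rows whose string does not split need no special branch here: they yield a single piece and are kept as one row.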
Thank you very much!
马修
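As for alternatives, the same reshaping can probably be done without an explicit loop. The sketch below assumes the splitting pattern is the empty string (one character per piece), as in the example data, and combines `stringr::str_split()` with `tidyr::unnest()`.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tibble)

nasty_entry <- tibble(ID = 1:3, Var = c("ABC", "AB", "A"))

# str_split() turns Var into a list-column holding one character vector per row;
# unnest() then gives every element of those vectors its own row,
# repeating the other columns (here ID) as needed.
nice_entry <- nasty_entry %>%
  mutate(Var = str_split(Var, pattern = "")) %>%
  unnest(Var)

print(nice_entry)
```

When the pieces are separated by a real delimiter such as a comma, `tidyr::separate_rows()` with its `sep` argument may be a shorter option.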
【Comments】:
Tags: r string function loops indexing