This is a slightly more general solution, but it can easily be adapted to target "Banana" explicitly.
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")
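First, split the words apart by finding every uppercase letter that is not at a word boundary and replacing it with a space followed by that same letter: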
splits <- gsub("(?:\\B)([[:upper:]])"," \\1" , V1, perl=TRUE)
[1] "Apple" "Orange Banana" "Banana Banana Banana" "Watermelon" "Grape Banana"
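To see why the first letter of each string is left alone, it can help to look at where the pattern actually matches. The following is a minimal sketch using base R's gregexpr; the names hits and positions are only illustrative and are not part of the original answer.

```r
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")

# Start positions of uppercase letters that are NOT at a word boundary.
# gregexpr reports -1 when a string contains no match at all.
hits <- gregexpr("\\B[[:upper:]]", V1, perl = TRUE)

# Drop the match attributes and label each result with its input string.
positions <- lapply(hits, as.integer)
names(positions) <- V1
print(positions)
```

The leading capital sits right after the start of the string, which is a word boundary, so \B fails there and only the inner capitals (for example position 7 in "OrangeBanana") are matched.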
Then split on the space character and flatten the resulting list into a vector:
unlist(strsplit(splits, " "))
[1] "Apple" "Orange" "Banana" "Banana" "Banana" "Banana" "Watermelon" "Grape" "Banana"
Or, in one line:
unlist(strsplit(gsub("(?:\\B)([[:upper:]])"," \\1" , V1, perl=TRUE), " "))
Edit: a regex that targets "Banana" explicitly:
gsub("(?:\\B)(Banana)"," \\1" , V1, perl=TRUE)
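As a rough sketch of how the "Banana"-specific pattern fits into the same two-step approach, the result can be passed through strsplit and unlist just like before. The variable names banana_splits and banana_words, and the extra input V2, are made up here for illustration.

```r
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")

# Insert a space only before a "Banana" that is not at a word boundary.
banana_splits <- gsub("(?:\\B)(Banana)", " \\1", V1, perl = TRUE)

# Split on the inserted spaces and flatten the list into a character vector.
banana_words <- unlist(strsplit(banana_splits, " "))
print(banana_words)

# Hypothetical extra input showing where the two patterns differ:
# the general pattern splits at every inner capital, this one only at "Banana".
V2 <- "AppleOrangeBanana"
general_v2 <- gsub("(?:\\B)([[:upper:]])", " \\1", V2, perl = TRUE)
banana_v2 <- gsub("(?:\\B)(Banana)", " \\1", V2, perl = TRUE)
print(c(general_v2, banana_v2))
```

For V1 both patterns give the same nine words; they only diverge on inputs such as V2, where other capitalised words are glued together and should stay that way.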