Assuming you're dealing with something like this:
mydf <- data.frame(
  V1 = c("peanut butter sandwich", "peanut butter and jam sandwich"),
  V2 = c("2 slices of bread 1 tablespoon peanut butter",
         "2 slices of bread 1 tablespoon peanut butter 1 tablespoon jam"))
mydf
##                               V1
## 1         peanut butter sandwich
## 2 peanut butter and jam sandwich
##                                                              V2
## 1                  2 slices of bread 1 tablespoon peanut butter
## 2 2 slices of bread 1 tablespoon peanut butter 1 tablespoon jam
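You can first insert a delimiter into "V2" that you don't expect to appear in the data, and then use cSplit from my "splitstackshape" package to get the "long" dataset format.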
library(splitstackshape)
mydf$V2 <- gsub(" (\\d+)", "|\\1", mydf$V2)
cSplit(mydf, "V2", "|", "long")
##                                V1                         V2
## 1:         peanut butter sandwich          2 slices of bread
## 2:         peanut butter sandwich 1 tablespoon peanut butter
## 3: peanut butter and jam sandwich          2 slices of bread
## 4: peanut butter and jam sandwich 1 tablespoon peanut butter
## 5: peanut butter and jam sandwich           1 tablespoon jam
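To make the two steps easier to follow, here is a rough base R sketch of the same idea (mark the split points, then split and repeat "V1"). It is only an illustration, not how cSplit is implemented; the names marked, pieces and long are made up for this example, and stringsAsFactors = FALSE is an assumption added so the columns stay character.

```r
# Rebuild the sample data (same values as above; character columns assumed)
mydf <- data.frame(
  V1 = c("peanut butter sandwich", "peanut butter and jam sandwich"),
  V2 = c("2 slices of bread 1 tablespoon peanut butter",
         "2 slices of bread 1 tablespoon peanut butter 1 tablespoon jam"),
  stringsAsFactors = FALSE)

# Step 1: replace the space in front of each run of digits with "|"
marked <- gsub(" (\\d+)", "|\\1", mydf$V2)

# Step 2: split on the literal "|" and repeat V1 once per resulting piece
pieces <- strsplit(marked, "|", fixed = TRUE)
long <- data.frame(
  V1 = rep(mydf$V1, lengths(pieces)),
  V2 = unlist(pieces),
  stringsAsFactors = FALSE)
long
```

The leading quantity in each string has no space in front of it, so it is never replaced and no empty first piece is produced.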
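The following aren't different enough to post as separate answers, since they are variations on @Jota's approach, but I'm sharing them here for completeness: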
strsplit within "data.table"
The list produced by the split is flattened (here with unlist) into a single column....
library(data.table)
as.data.table(mydf)[, list(
  V2 = unlist(strsplit(as.character(V2), '\\s(?=\\d)', perl = TRUE))), by = V1]
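The pattern used above matches a whitespace character only when a digit follows it; the lookahead (?=\\d) checks for the digit without consuming it, so each quantity stays attached to its ingredient. A small sketch on a single string (the names x and pieces are just for illustration):

```r
# One ingredient string, same text as row 2 of the sample data
x <- "2 slices of bread 1 tablespoon peanut butter 1 tablespoon jam"

# Split at whitespace that is followed by a digit; perl = TRUE is
# needed because lookahead is not available in the default regex engine
pieces <- strsplit(x, "\\s(?=\\d)", perl = TRUE)[[1]]
pieces
```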
"dplyr" + "tidyr"
You can use unnest from "tidyr" to expand a list column into long format....
library(dplyr)
library(tidyr)
mydf %>%
  mutate(V2 = strsplit(as.character(V2), " (?=\\d)", perl = TRUE)) %>%
  unnest(V2)