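1) dplyr/tidyr Replace each number with a semicolon, the number itself and another semicolon, then separate on semicolons together with any surrounding spaces.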
library(dplyr)
library(tidyr)
# input
df <- data.frame(V1 = c("25 Edgemont 52 Sioux County",
                        "57 Burke 88 Papillion-LaVista South"))
df %>%
  mutate(V1 = gsub("(\\d+)", ";\\1;", V1)) %>%
  separate(V1, c(NA, "No1", "Let1", "No2", "Let2"), sep = " *; *")
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
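To see what separate actually receives, here is a small base R sketch (the names x, marked and pieces are illustrative, not from the original) that prints the intermediate strings and splits them with strsplit on the same separator. The empty first field is why the into vector above starts with NA.

```r
# Illustrative input: the same strings as df$V1
x <- c("25 Edgemont 52 Sioux County",
       "57 Burke 88 Papillion-LaVista South")

# Surround every run of digits with semicolons
marked <- gsub("(\\d+)", ";\\1;", x)
# marked[1] is ";25; Edgemont ;52; Sioux County"
print(marked)

# Split on a semicolon plus any spaces around it
pieces <- strsplit(marked, " *; *")
# pieces[[1]] is c("", "25", "Edgemont", "52", "Sioux County"):
# the leading "" comes from the ";" at the very start of the string
print(pieces)
```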
1a) read.table We can use the same gsub as in (1) and then use read.table to split the result apart. No packages are used.
read.table(text = gsub("(\\d+)", ";\\1;", df$V1), sep = ";", as.is = TRUE,
           strip.white = TRUE, col.names = c(NA, "No1", "Let1", "No2", "Let2"))[-1]
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
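The [-1] at the end drops the column produced by the empty field before the first semicolon. A minimal sketch (x and tab are illustrative names) runs the same read.table call without col.names or [-1] so that extra column is visible; because blank fields count as missing in non-character columns, it comes back as all NA.

```r
# Illustrative input: the same strings as df$V1
x <- c("25 Edgemont 52 Sioux County",
       "57 Burke 88 Papillion-LaVista South")

# Same gsub + read.table call as above, but keeping every column
tab <- read.table(text = gsub("(\\d+)", ";\\1;", x), sep = ";",
                  as.is = TRUE, strip.white = TRUE)

# V1 is the empty leading field (all NA); V2..V5 hold the real data
str(tab)
```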
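2) strcapture We can use strcapture from base R. It matches the pattern against each string and turns the four capture groups into columns whose names and types are taken from proto: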
proto <- list(No1 = integer(0), Let1 = character(0),
              No2 = integer(0), Let2 = character(0))
strcapture("(\\d+) (.*) (\\d+) (.*)", df$V1, proto)
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
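strcapture does its matching with regexec. The sketch below does roughly the same job by hand with regexec and regmatches, converting the numeric groups with as.integer; it illustrates the idea and is not strcapture's actual source (pattern, caps, m and out are illustrative names). Note that the greedy (.*) in the second group still stops at "Edgemont": a later " (\\d+) " must match, and " 52 " is the only place it can.

```r
# Illustrative input: the same strings as df$V1
x <- c("25 Edgemont 52 Sioux County",
       "57 Burke 88 Papillion-LaVista South")
pattern <- "(\\d+) (.*) (\\d+) (.*)"

# regexec locates the match and its groups; regmatches extracts them.
# caps[[1]] is c(<full match>, "25", "Edgemont", "52", "Sioux County")
caps <- regmatches(x, regexec(pattern, x))

# Drop the full match (element 1) and stack the groups into a 2 x 4 matrix
m <- do.call(rbind, lapply(caps, `[`, -1))

# Convert the columns by hand, as proto does for strcapture
out <- data.frame(No1 = as.integer(m[, 1]), Let1 = m[, 2],
                  No2 = as.integer(m[, 3]), Let2 = m[, 4],
                  stringsAsFactors = FALSE)
print(out)
```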
2a) read.pattern We can use read.pattern from the gsubfn package with the same pattern as in (2):
library(gsubfn)
read.pattern(text = format(df$V1), pattern = "(\\d+) (.*) (\\d+) (.*)",
             col.names = c("No1", "Let1", "No2", "Let2"), as.is = TRUE, strip.white = TRUE)
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
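format() pads character strings with trailing blanks to a common width, so the final (.*) group of the shorter string ends in spaces; strip.white = TRUE removes them. The base R sketch below makes the padding visible; sub() is used only to show the effect and is not how read.pattern works internally (x, padded and last_raw are illustrative names).

```r
# Illustrative input: the same strings as df$V1
x <- c("25 Edgemont 52 Sioux County",
       "57 Burke 88 Papillion-LaVista South")

# format() left-justifies and pads to the longest string (35 characters)
padded <- format(x)
print(nchar(padded))

# Pull out the fourth capture group: the greedy (.*) keeps the padding
last_raw <- sub("(\\d+) (.*) (\\d+) (.*)", "\\4", padded)
# last_raw[1] is "Sioux County" followed by 8 spaces

# Trimming gives the clean values, which is what strip.white = TRUE achieves
print(trimws(last_raw))
```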