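【Posted】: 2021-07-05 23:23:03
【Question】:
I have a character column that I need to split into several columns using regular expressions. Here is a sample of the raw data: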
library(tidyverse)  # provides tribble(), mutate(), str_extract() and %>%

data_raw <- tribble(
  ~census_geo,
  "Division No. 1, Subd. V (SNO), Newfoundland and Labrador",
  "Portugal Cove South (T), Newfoundland and Labrador",
  "Division No. 1, Subd. U, Reserve (SNO), Newfoundland and Labrador"
)
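I want to extract three columns. The first is everything before the parentheses. The second is the word inside the parentheses. The last is everything after the final comma (equivalently, everything after the parenthesized word). Here is a sample of the clean output: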
data_clean <- tribble(
  ~csd_name, ~csd_type, ~province,
  "Division No. 1, Subd. V", "SNO", "Newfoundland and Labrador",
  "Portugal Cove South", "T", "Newfoundland and Labrador",
  "Division No. 1, Subd. U, Reserve", "SNO", "Newfoundland and Labrador"
)
I can extract the second column (the word in parentheses) with this code:
data_raw %>%
  mutate(csd_type = str_extract(census_geo, pattern = "(?<=\\().*(?=\\))"))
But I can't work out how to get the other two columns.
Any help would be greatly appreciated.
【Discussion】:
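One possible approach, offered as a sketch rather than a definitive answer: capture all three pieces with a single regular expression and `tidyr::extract()`. This assumes every value contains exactly one parenthesized type, immediately followed by a comma and the province, as in the three sample rows. The regex and the choice of `extract()` are my own suggestion and are not taken from the question.

```r
library(tibble)
library(tidyr)

# Sample data copied from the question
data_raw <- tribble(
  ~census_geo,
  "Division No. 1, Subd. V (SNO), Newfoundland and Labrador",
  "Portugal Cove South (T), Newfoundland and Labrador",
  "Division No. 1, Subd. U, Reserve (SNO), Newfoundland and Labrador"
)

# Three capture groups, one per output column:
#   1. ^(.*?)       lazily takes everything up to the opening parenthesis
#   2. \(([^()]*)\) takes the text inside the parentheses
#   3. ,\s*(.*)$    takes everything after the comma that follows ")"
# Assumption: one "(TYPE)," per value; rows that do not match become NA.
data_clean <- extract(
  data_raw,
  census_geo,
  into  = c("csd_name", "csd_type", "province"),
  regex = "^(.*?)\\s*\\(([^()]*)\\),\\s*(.*)$"
)

print(data_clean)
```

The lazy `.*?` in the first group stops at the earliest point where the rest of the pattern can match, so commas inside the name (as in "Division No. 1, Subd. U, Reserve") stay in `csd_name`. If some rows have no parenthesized type at all, they would come back as `NA` and need separate handling.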