【Posted】: 2018-06-04 17:44:45
【Problem description】:
I am trying to extract some variable names and numbers from the following vector and store them in two new variables:
unique_strings <- c("PM_1_PMS5003_S_Avg", "PM_2_5_PMS5003_S_Avg", "PM_10_PMS5003_S_Avg",
"PM_1_PMS5003_A_Avg", "PM_2_5_PMS5003_A_Avg", "PM_10_PMS5003_A_Avg",
"PNC_0_3_PMS5003_Avg", "PNC_0_5_PMS5003_Avg", "PNC_1_0_PMS5003_Avg",
"PNC_2_5_PMS5003_Avg", "PNC_5_0_PMS5003_Avg", "PNC_10_0_PMS5003_Avg",
"PM_1_PMS7003_S_Avg", "PM_2_5_PMS7003_S_Avg", "PM_10_PMS7003_S_Avg",
"PM_1_PMS7003_A_Avg", "PM_2_5_PMS7003_A_Avg", "PM_10_PMS7003_A_Avg",
"PNC_0_3_PMS7003_Avg", "PNC_0_5_PMS7003_Avg", "PNC_1_0_PMS7003_Avg",
"PNC_2_5_PMS7003_Avg", "PNC_5_0_PMS7003_Avg", "PNC_10_0_PMS7003_Avg"
)
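For the first variable, I want to extract every character that comes before PMS. That covers the strings beginning with PM or PNC, along with the underscores and numbers that follow. I want to store these results in a variable named pollutant.
Desired output: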
unique(pollutant)
[1] "PM_1" "PM_2_5" "PM_10" "PNC_0_3" "PNC_0_5" "PNC_1_0" "PNC_2_5" "PNC_5_0" "PNC_10"
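For the second variable, I want to extract everything that comes after PMS.
To do this, I first tried extracting just the model number from each string (a four-digit number ending in 003); however, it would also be useful to include A_Avg or S_Avg in the extraction.
Here is my first attempt: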
library(stringr)  # provides str_extract()
model_id <- str_extract(unique_strings, "[0-9]{4,}")
unique(model_id)
[1] "5003" "7003"
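I have not used regular expressions before, and I have had trouble working through the existing documentation and Stack Overflow posts. Thanks for your input!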
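For illustration, here is a minimal sketch of one way the two pieces could be split. It assumes every string contains the literal text "_PMS" exactly once. It uses base R `sub()` rather than stringr so that it runs without extra packages. The names `sample_strings` and `model_info` are made up for this example and the vector is only a subset of the original. Note that it returns "PNC_10_0" rather than the "PNC_10" listed in the desired output, since "_0" also sits before PMS.

```r
# Subset of the original vector, for illustration only (hypothetical name)
sample_strings <- c("PM_1_PMS5003_S_Avg",
                    "PM_2_5_PMS5003_A_Avg",
                    "PNC_10_0_PMS7003_Avg")

# Pollutant: delete "_PMS" and everything after it
pollutant <- sub("_PMS.*$", "", sample_strings)

# Model info: delete everything up to and including "PMS"
model_info <- sub("^.*PMS", "", sample_strings)

print(pollutant)
# [1] "PM_1"     "PM_2_5"   "PNC_10_0"
print(model_info)
# [1] "5003_S_Avg" "5003_A_Avg" "7003_Avg"
```

If only the four-digit model number is wanted, the pattern from the first attempt above already isolates it.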
【Comments】:
Tags: r regex split stringr stringi