R search subset string from data.table column for Capitalized words: answers

【Question title】: R search subset string from data.table column for Capitalized words
【Posted】: 2021-03-24 03:51:21
【Question description】:

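I have a data.table with a "message" column.
I need to extract the messages that follow this pattern:

"THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message"

How can I extract the messages that match this pattern and store the bold part (THIS_IS_IMPORTANT in the example above) in a vector?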

【Question discussion】:

Tags: r string datatable subset extract

【Solution 1】:

Does this work:

library(dplyr)
library(stringr)
df %>% mutate(c2 = str_extract(c1, '(?<=:\\s)[A-Z_]+\\b'))
                                                           c1                  c2
1   THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message   THIS_IS_IMPORTANT
2 THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message THIS_IS_UNIMPORTANT

Data used:

df
                                                           c1
1   THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message
2 THISIsNotImportant: THIS_IS_UNIMPORTANT Rest of the Message

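【Discussion】:

I'm still having trouble with some rows where there are multiple spaces after the ":". Does \\s match only a single space?
@Ram Yes. You can try something like this: df %>% mutate(c2 = str_extract(c1, '(?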
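
For rows with several spaces after the colon, one option that sidesteps the lookbehind entirely is to match `:\\s+` and pull out a capture group. The following is only a sketch, not the commenter's truncated suggestion: it assumes stringr is installed, and the sample vector `x` and the name `token` are made up for illustration.

```r
library(stringr)

# Hypothetical sample messages: one space, several spaces, and no match.
x <- c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "THISIsNotImportant:    THIS_IS_UNIMPORTANT Rest of the Message",
  "a message without the pattern"
)

# ":\\s+" accepts one or more whitespace characters after the colon.
# str_match() returns a matrix; column 2 holds the first capture group.
token <- str_match(x, ":\\s+([A-Z_]+)\\b")[, 2]
token
# [1] "THIS_IS_IMPORTANT"   "THIS_IS_UNIMPORTANT" NA
```

Rows that do not contain the pattern come back as NA, which makes them easy to drop afterwards.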

【Solution 2】:

str_extract(s, '\\b[A-Z_]+\\b')
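
This pattern is not anchored to the colon, so it returns the first standalone run of uppercase letters and underscores anywhere in the string. A small sketch of that behavior, assuming stringr is installed; the sample vector `s` and the second message are invented for illustration:

```r
library(stringr)

# Hypothetical inputs: the second one starts with a one-letter uppercase word.
s <- c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "A note: THIS_IS_IMPORTANT Rest of the Message"
)

# Unanchored: "THISIsNotImportant" is skipped because no word boundary
# follows its leading capitals, but the standalone "A" is a valid match.
unanchored <- str_extract(s, "\\b[A-Z_]+\\b")
unanchored
# [1] "THIS_IS_IMPORTANT" "A"

# Anchoring on the colon (as in Solution 1) avoids the stray match.
anchored <- str_extract(s, "(?<=:\\s)[A-Z_]+\\b")
anchored
# [1] "THIS_IS_IMPORTANT" "THIS_IS_IMPORTANT"
```

If the messages can contain other all-caps words before the colon, the anchored form is probably the safer choice.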

【Discussion】:

【Solution 3】:

Using sub in base R:

x <- "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message"
sub('.*:\\s([A-Z_]+).*', '\\1', x)
#[1] "THIS_IS_IMPORTANT"

To add it as a new column for all rows in the data.table, you can do:

library(data.table)
dt[, imp_message := sub('.*:\\s([A-Z_]+).*', '\\1', message)]
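
Note that sub() returns its input unchanged when the pattern does not match, so rows without the pattern would keep their full message in imp_message. A minimal sketch of filtering first and then storing the uppercase part in a vector, assuming data.table is installed and the column is named message as above; the sample rows, the pattern `pat`, and the names `matched` and `imp` are assumptions made for illustration:

```r
library(data.table)

# Hypothetical sample data; the middle row does not follow the pattern.
dt <- data.table(message = c(
  "THISIsNotImportant: THIS_IS_IMPORTANT Rest of the Message",
  "an ordinary message without the pattern",
  "THISIsNotImportant:   THIS_IS_UNIMPORTANT Rest of the Message"
))

# Prefix up to the first colon, one or more whitespace characters,
# then a run of uppercase letters/underscores ending at a word boundary.
pat <- "^[^:]*:\\s+([A-Z_]+)\\b.*$"

# Keep only the rows whose message matches the pattern.
matched <- dt[grepl(pat, message, perl = TRUE)]

# Store the captured uppercase token of each matching row in a vector.
imp <- sub(pat, "\\1", matched$message, perl = TRUE)
imp
# [1] "THIS_IS_IMPORTANT"   "THIS_IS_UNIMPORTANT"
```

Using `\\s+` rather than `\\s` also covers rows with more than one space after the colon.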

【Discussion】: