【Posted】: 2014-02-06 15:27:05
【Question】:
I have the following text file:
[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)
[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)
[01/29/14 16:43:00, 10.100.120.120, unknown]: spatial_monitor: Kurt entered Conference Room (Computer desk contains Person role)
[01/29/14 16:43:02, 10.100.120.120, unknown]: spatial_monitor: Kurt left Conference Room (Computer desk contains Person role)
[01/29/14 16:43:03, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)
[01/29/14 16:43:08, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)
[01/29/14 16:46:07, 10.100.120.120, unknown]: spatial_monitor: Fred entered Conference Room (Zone Role contains Person role)
[01/29/14 16:46:08, 10.100.120.120, unknown]: spatial_monitor: Fred left Conference Room (Zone Role contains Person role)
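I am trying to use str_extract (from the stringr library) in R to extract the name of the location ("Conference Room" in the example above). The logic is to pull out the part of the string that follows "entered" or "left". For this, I have the following regular expression: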
(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+
This works fine in Notepad++, but when I embed it in R, I get the following error:
> tt <- "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)"
> str_extract(tt, '(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+')
Error in regexpr("(?<=entered\\s)[A-Z][a-z]+\\s[A-Z][a-z]+", "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)", :
invalid regular expression '(?<=entered\s)[A-Z][a-z]+\s[A-Z][a-z]+', reason 'Invalid regexp'
Other answers tell me that lookahead and lookbehind only work with Perl. So the question is: how do I enable Perl with str_extract? Or is there a better way to do this? Thanks in advance.
【Comments】:
-
This works and does not use lookahead/lookbehind. Wrap the part you want to extract in parentheses, as shown:
library(gsubfn); strapplyc(tt, 'entered\\s([A-Z][a-z]+\\s[A-Z][a-z]+)', simplify = TRUE)
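If the lookbehind is preferred, one possible route is base R rather than stringr: regexpr() accepts perl = TRUE, which switches to the PCRE engine, and regmatches() pulls out the matched text. The following is only a minimal sketch. The variable names log_lines, pattern and locations are made up for illustration, and the alternation inside the lookbehind (to cover "left" as well as "entered") is an addition to the pattern from the question.

```r
# Two sample log lines copied from the question.
log_lines <- c(
  "[01/29/14 16:42:55, 10.100.120.120, unknown]: spatial_monitor: Alan entered Conference Room (Zone Role contains Person role)",
  "[01/29/14 16:42:57, 10.100.120.120, unknown]: spatial_monitor: Alan left Conference Room (Zone Role contains Person role)"
)

# Lookbehind for either "entered " or "left ", then two capitalized words.
# Assumption: location names are always exactly two capitalized words.
pattern <- "(?<=entered\\s|left\\s)[A-Z][a-z]+\\s[A-Z][a-z]+"

# perl = TRUE makes regexpr() use PCRE, which understands lookbehind.
# regmatches() returns only the text of the first match in each line.
locations <- regmatches(log_lines, regexpr(pattern, log_lines, perl = TRUE))
print(locations)
```

Note that regmatches() silently drops lines with no match, so the result can be shorter than the input vector.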