【Posted】: 2015-11-16 21:06:09
【Question】:
I have a dataset like this:
df <- data.frame(
text = c("Update AV Line 204 to Los Angeles will be ...",
"91 Line 700 to RiversideDowntown is delayed 15 minutes ...",
"VC Line 102 to Los Angeles is delayed 1520 minutes ...",
"Update AV Line 227 to Lancaster is terminated Via Princessa ",
"RIV Line 411 to Los Angeles is delayed 10 minutes ...",
"SB Line 312 to San Bernardino is delayed up to ...",
"SB Line 327 to Los Angeles is delayed up to 15..."), stringsAsFactors = TRUE)
df
I need to extract the key phrase into a new field, so that the final result looks like this:
> df
  text                                                         LinesExtracted
1 Update AV Line 204 to Los Angeles will be ...                Line 204 to Los Angeles
2 91 Line 700 to RiversideDowntown is delayed 15 minutes ...   Line 700 to Riverside Downtown
3 VC Line 102 to Los Angeles is delayed 1520 minutes ...       Line 102 to Los Angeles
4 Update AV Line 227 to Lancaster is terminated Via Princessa  Line 227 to Lancaster
5 RIV Line 411 to Los Angeles is delayed 10 minutes ...        Line 411 to Los Angeles
6 SB Line 312 to San Bernardino is delayed up to ...           Line 312 to San Bernardino
7 SB Line 327 to Los Angeles is delayed up to 15...            Line 327 to Los Angeles
Thanks.
【Comments】:
-
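IMO this pattern is not "regular" in the regular-expression sense. If you build a set of stop words such as "is" and "will" to mark where the non-city part of the string begins, you may get a good answer. Otherwise you risk false matches. You could also include a list of known cities to match against.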
-
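Take a look at this link: How do I ask a good question?. It looks like you are asking someone to write the code for you, and that is not how SO works. Try to solve the problem yourself first, then ask about the specific issue you run into.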
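For what it's worth, the stop-word idea from the first comment could be sketched in base R roughly as below. This is only an illustration under stated assumptions, not a tested solution for the full feed: the stop-word list (`stop_words`) contains just the two words named in the comment, every destination is assumed to be followed by one of them (or by the end of the string), and the names `stop_words`, `pattern`, and `m` are made up for this example. The text column is built as character rather than factor to keep the string handling simple.

```r
# Sample data from the question, stored as character strings.
df <- data.frame(
  text = c("Update AV Line 204 to Los Angeles will be ...",
           "91 Line 700 to RiversideDowntown is delayed 15 minutes ...",
           "VC Line 102 to Los Angeles is delayed 1520 minutes ...",
           "Update AV Line 227 to Lancaster is terminated Via Princessa ",
           "RIV Line 411 to Los Angeles is delayed 10 minutes ...",
           "SB Line 312 to San Bernardino is delayed up to ...",
           "SB Line 327 to Los Angeles is delayed up to 15..."),
  stringsAsFactors = FALSE)

# Assumption: the destination ends right before one of these words.
stop_words <- c("is", "will")

# "Line <digits> to <destination>", where the destination is matched lazily
# and stops at the first stop word (lookahead) or at the end of the string.
pattern <- paste0("Line \\d+ to .+?(?=\\s+(?:",
                  paste(stop_words, collapse = "|"),
                  ")\\b|\\s*$)")

# perl = TRUE is needed for the lazy quantifier and the lookahead.
m <- regexpr(pattern, df$text, perl = TRUE)

# Rows without a match stay NA instead of silently shifting the results.
df$LinesExtracted <- NA_character_
df$LinesExtracted[m != -1] <- regmatches(df$text, m)

df$LinesExtracted
```

Note that row 2 comes out as "Line 700 to RiversideDowntown", because the pattern copies the text as-is and does not insert the space shown in the desired output. As the comment warns, a destination that is not followed by a listed stop word would be over-matched up to the end of the string, so a list of known cities would be the safer route if one is available.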