【Posted】: 2018-02-11 02:03:17
【Question】:
I have a vector of text strings, such as:
Sentences <- c("I would have gotten the promotion, but TEST my attendance wasn’t good enough.Let me help you with your baggage.",
"Everyone was busy, so I went to the movie alone. Two seats were vacant.",
"TEST Rock music approaches at high velocity.",
"I am happy to take your TEST donation; any amount will be greatly TEST appreciated.",
"A purple pig and a green donkey TEST flew a TEST kite in the middle of the night and ended up sunburnt.",
"Rock music approaches at high velocity TEST.")
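I would like to extract n (e.g., three) words AROUND (i.e., before and after) a specific term (e.g., "TEST"), where a word is any run of characters with a space before and after it. Important: several matches should be allowed (i.e., if the specific term occurs more than once in a string, the solution should cover each occurrence).
The result could look like this (the format can be improved):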
S1 <- c(before = "the promotion, but", after = "my attendance wasn’t")
S2 <- c(before = "", after = "")
S3 <- c(before = "", after = "Rock music approaches")
S4a <- c(before = "to take your", after = "donation; any amount")
S4b <- c(before = "will be greatly", after = "appreciated.")
S5a <- c(before = "a green donkey", after = "flew a TEST")
S5b <- c(before = "TEST flew", after = "kite in the")
S6 <- c(before = "at high velocity", after = "")
How can I do this? I have already found other posts, but they either handle only a single match per string or rely on fixed sentence structures.
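To make the requirement concrete, here is a minimal base-R sketch of one way the extraction could work. It assumes that words are separated by whitespace and that a token counts as a match when it equals the term after leading and trailing punctuation is stripped (so "TEST." matches). The helper name `words_around` and its arguments `term` and `n` are hypothetical, not taken from any package.

```r
# Hypothetical helper (not from any package): returns one named vector
# c(before = ..., after = ...) per occurrence of `term` in `sentence`.
words_around <- function(sentence, term = "TEST", n = 3) {
  # Assumption: a "word" is anything between runs of whitespace.
  words <- strsplit(trimws(sentence), "\\s+")[[1]]
  # Assumption: compare tokens with surrounding punctuation removed,
  # so that "TEST." at the end of a sentence still counts as a match.
  bare <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", words)
  hits <- which(bare == term)
  lapply(hits, function(i) {
    before <- words[seq_len(i - 1)]                  # tokens left of the match
    after  <- words[seq_len(length(words) - i) + i]  # tokens right of the match
    c(before = paste(tail(before, n), collapse = " "),
      after  = paste(head(after, n), collapse = " "))
  })
}

# Example with a string that contains the term twice.
x <- "I am happy to take your TEST donation; any amount will be greatly TEST appreciated."
res <- words_around(x, term = "TEST", n = 3)
print(res)
```

Applied to the whole vector with `lapply(Sentences, words_around)`, this yields one list per string. A string without the term gives an empty list rather than a pair of empty strings as in S2, and for the second match in the fifth string the sketch returns the full three tokens "TEST flew a" as `before`, which differs slightly from S5b.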
【Comments】:
Tags: r text text-mining tm