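[Posted]: 2023-03-30 19:39:02
[Question]:
I am trying to remove words from a sentence based on specific conditions.
Suppose we have this data frame (strictly a character matrix, since it is built with cbind):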
responses <- c("The Himalaya", "The Americans", "A bird", "The Pacific ocean")
questions <- c("The highest mountain in the world","A cold war serie from 2013","A kiwi which is not a fruit", "Widest liquid area on earth")
df <- cbind(questions,responses)
> df
     questions                           responses
[1,] "The highest mountain in the world" "The Himalaya"
[2,] "A cold war serie from 2013"        "The Americans"
[3,] "A kiwi which is not a fruit"       "A bird"
[4,] "Widest liquid area on earth"       "The Pacific ocean"
and the following lists of specific words:
articles <- c("The","A")
geowords <- c("mountain","liquid area")
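I want to do two things:
Remove the article in the first position of the responses column when it is followed by a word starting with a lowercase letter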
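Remove the article in the first position of the responses column when the corresponding question contains one of the geowords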
The expected result would be:
     questions                           responses
[1,] "The highest mountain in the world" "Himalaya"
[2,] "A cold war serie from 2013"        "The Americans"
[3,] "A kiwi which is not a fruit"       "bird"
[4,] "Widest liquid area on earth"       "Pacific ocean"
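I tried gsub without success, because I am not at all familiar with regular expressions. I searched Stack Overflow but did not find a really similar question. If an R and regex all-star could help me, I would be very grateful!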
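For reference, here is a minimal base R sketch of the two rules. It assumes the second rule applies when the question contains one of the geowords, which is what the expected result suggests. The names article_pat, rule1, rule2 and cleaned are made up for illustration, and this is only one possible approach.

```r
responses <- c("The Himalaya", "The Americans", "A bird", "The Pacific ocean")
questions <- c("The highest mountain in the world", "A cold war serie from 2013",
               "A kiwi which is not a fruit", "Widest liquid area on earth")
articles <- c("The", "A")
geowords <- c("mountain", "liquid area")

# Leading article followed by whitespace, i.e. the regex ^(The|A)\s+
article_pat <- paste0("^(", paste(articles, collapse = "|"), ")\\s+")

# Rule 1: the leading article is followed by a lowercase letter.
# The lookahead (?=...) requires perl = TRUE.
rule1 <- grepl(paste0(article_pat, "(?=[a-z])"), responses, perl = TRUE)

# Rule 2 (assumption): the matching question contains any geoword.
# The geowords are used as regex alternatives, so this assumes they
# contain no regex metacharacters.
rule2 <- grepl(paste(geowords, collapse = "|"), questions)

# Strip the leading article only where at least one rule applies.
cleaned <- ifelse(rule1 | rule2,
                  sub(article_pat, "", responses, perl = TRUE),
                  responses)

df <- cbind(questions, responses = cleaned)
```

sub is enough here because only the first match at the start of each string has to go; gsub would behave the same with the ^ anchor.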
[Comments]:
- How do you end up keeping The in The Americans?