Welcome to StackOverflow! If you add more details I can refine my answer, but here is something to get you started.
library(data.table)
## Load your csv file
#search_in <- fread("path/to/file.csv")
## In lieu of a csv, create a table of example text values to search within
search_in <- data.table(text=c(
"Visit the U.S. Capital and see Congress in action",
"Santa Clause is (a) real (movie)",
"The Marines were founded in 1775",
"What does the fox say?",
"The United States Senate is the upper chamber of the United States Congress"))
## Create a table of your search terms and the corresponding values
search_for <- data.table(
word=c("U.S. Capital", "Biden", "Congress", "Marines", "Senate", "Santa"),
value=c(-0.5, -0.6, -0.4, -0.2, -0.4, -0.03))
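## Cross join: give both tables the same dummy key (id:=1L adds the column by reference),
## so merging on it pairs every text with every search term. Then flag matches
## (%like% treats each word as a regular expression), keep the matching rows,
## and collapse the matched terms and sum their values per text.
search_res <- merge(search_in[, id:=1L], search_for[, id:=1L], by="id", allow.cartesian=TRUE)[,
match:=text %like% word, by=.(text, word, value)][
match==TRUE, .(words=paste(sort(word), collapse=", "), value=sum(value)), by=text]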
## Left join the original texts (minus the dummy id) back in, so unmatched texts show up with NA
search_res <- merge(search_in[, -"id"], search_res, by="text", all.x=TRUE)
search_res
## text words value
##1: Visit the U.S. Capital and see Congress in action Congress, U.S. Capital -0.90
##2: Santa Clause is (a) real (movie) Santa -0.03
##3: The Marines were founded in 1775 Marines -0.20
##4: The United States Senate is the upper chamber of the United States Congress Congress, Senate -0.80
##5: What does the fox say? <NA> NA
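The first statement, which creates search_res, joins every row of search_in with every row of search_for, adds a column indicating whether each search term matches the text column, keeps only the matching rows, and then sums the values for each text.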
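If the chained data.table call is hard to follow, here is a sketch of the same cross join / flag / filter / sum steps split into named intermediate tables. It assumes base R only (no data.table), and it swaps in grepl(..., fixed = TRUE) for %like%, so the dots in "U.S. Capital" are matched literally rather than as regex wildcards. The intermediate names (pairs, hits, words_by_text, value_by_text) are just for illustration.

```r
# Base R sketch (no data.table); fixed-string matching instead of %like%
search_in <- data.frame(text = c(
  "Visit the U.S. Capital and see Congress in action",
  "Santa Clause is (a) real (movie)",
  "The Marines were founded in 1775",
  "What does the fox say?",
  "The United States Senate is the upper chamber of the United States Congress"),
  stringsAsFactors = FALSE)
search_for <- data.frame(
  word  = c("U.S. Capital", "Biden", "Congress", "Marines", "Senate", "Santa"),
  value = c(-0.5, -0.6, -0.4, -0.2, -0.4, -0.03),
  stringsAsFactors = FALSE)

# 1. Cross join: by = NULL gives every (text, word) pair
pairs <- merge(search_in, search_for, by = NULL)

# 2. Flag whether each word occurs literally (fixed = TRUE) in its text
pairs$match <- mapply(grepl, pairs$word, pairs$text,
                      MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# 3. Keep only the matching pairs
hits <- pairs[pairs$match, ]

# 4. Per text: sorted, comma-separated terms and the summed value
words_by_text <- tapply(hits$word, hits$text,
                        function(w) paste(sort(w), collapse = ", "))
value_by_text <- tapply(hits$value, hits$text, sum)
search_res <- data.frame(text  = names(value_by_text),
                         words = as.vector(words_by_text),
                         value = as.vector(value_by_text),
                         stringsAsFactors = FALSE)
print(search_res)
```

On the example data this should give the same four matched texts, term lists and totals as the output above, just without the unmatched text (that is what the left join adds).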
The second merge() call then joins the original search_in back onto the result, so you can also see the rows of text where no keyword matched.
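To see why all.x = TRUE matters, here is a minimal base R illustration with two made-up rows (the tables and values are hypothetical). data.table's merge() method takes the same by and all.x arguments, so it behaves the same way here: the default inner join drops a text with no keyword hit, while all.x = TRUE keeps it and fills the missing columns with NA.

```r
# Hypothetical mini tables: "fox text" has no keyword hit
texts  <- data.frame(text = c("Marines text", "fox text"), stringsAsFactors = FALSE)
scored <- data.frame(text = "Marines text", words = "Marines", value = -0.2,
                     stringsAsFactors = FALSE)

inner <- merge(texts, scored, by = "text")                # unmatched text is dropped
left  <- merge(texts, scored, by = "text", all.x = TRUE)  # unmatched text kept, NA-filled
print(left)
```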
Depending on the size of your data, this may be enough. If you're on Linux or macOS, you could also look into grep or a similar bash solution.
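As a rough sketch of the grep route, assuming a Unix-like system with grep on the PATH, you can call it from R with system2(): -F treats each search term as a fixed string and -f reads the terms from a file, one per line, so only lines containing at least one term come back. The temporary files and their contents are hypothetical stand-ins for your CSV. Note that grep only tells you which lines matched, not which terms or their values, so you would still run the R code above on the lines it returns; on a large file this kind of pre-filter may cut down how much you have to load.

```r
# Hypothetical stand-ins for the real CSV and the list of search terms
csv_file  <- tempfile(fileext = ".csv")
term_file <- tempfile(fileext = ".txt")
writeLines(c("text",
             "Visit the U.S. Capital and see Congress in action",
             "What does the fox say?",
             "The Marines were founded in 1775"), csv_file)
writeLines(c("U.S. Capital", "Biden", "Congress", "Marines", "Senate", "Santa"), term_file)

# grep -F: fixed strings (no regex); -f: read the terms from term_file.
# stdout = TRUE captures the matching lines as a character vector.
hits <- system2("grep", c("-F", "-f", shQuote(term_file), shQuote(csv_file)),
                stdout = TRUE)
print(hits)
```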