[Posted]: 2020-11-05 01:27:45
[Question]:
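I'm writing a function in R (NextWordPrediction) that predicts the next word given some input words. The basic structure is:
1. If the input exists in dat (nrow(dat) != 0), return the input together with the predicted word.
2. If the input does not exist (nrow(dat) == 0), call the function recursively on the input minus its first word (e.g. if the input is "hello great world", try "great world", and so on) until nrow(dat) != 0.
3. If nrow(dat) == 0 still holds after step 2, return the string "Word not in dictionary. We added this to our database!" and add the original input to the dataset.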
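The "and so on" in step 2 is a back-off over ever-shorter suffixes of the input. As a minimal sketch, assuming plain whitespace-separated words, the hypothetical helpers drop_first_word() and backoff_candidates() below (not part of the original code) list every phrase step 2 would try, longest first. drop_first_word() returns "" once only one word is left, which gives the loop a clean stopping point without a sentinel value.

```r
# Hypothetical helper (not from the original code): remove the leading word.
# A single-word phrase becomes "", which marks the end of the back-off.
drop_first_word <- function(phrase) {
  sub("^\\S+\\s*", "", trimws(phrase))
}

# Hypothetical helper: every phrase step 2 would try, longest first.
backoff_candidates <- function(phrase) {
  out <- character(0)
  phrase <- trimws(phrase)
  while (nzchar(phrase)) {
    out <- c(out, phrase)
    phrase <- drop_first_word(phrase)
  }
  out
}

backoff_candidates("hello great world")
# returns c("hello great world", "great world", "world")
```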
Here is the full code:
    NextWordPrediction <- function(input) {
      # n-grams one word longer than the input whose Word starts with the input
      dat <- training %>%
        filter(., N_gram == str_count(input, "\\S+") + 1) %>%
        filter(grepl(paste("^", tolower(str_squish(input)), sep = ""), Word)) %>%
        arrange(., desc(Prop))
      if (nrow(dat) != 0) {
        # Step 1: match found; bump Frequency and recompute Prop within each N_gram
        assign("training",
               training %>%
                 mutate(Frequency = ifelse(Word == input &
                                             N_gram == str_count(input, "\\S+"),
                                           Frequency + 1,
                                           Frequency)) %>%
                 group_by(., N_gram) %>%
                 mutate(., Prop = Frequency / sum(Frequency)) %>%
                 data.frame(.),
               envir = .GlobalEnv)
        val <- dat$Word_to_Predict[1]
        ans <- paste(str_squish(input), val)
        return(list(ans, head(dat, 5)))
      } else if (nrow(dat) == 0 & word(input, 1) != "NA") {
        # Step 2: drop the first word and recurse on the shorter input
        input_1 <- Reduce(paste, word(input, 2:str_count(input, "\\S+")))
        return(NextWordPrediction(input_1))
      } else if (nrow(dat) == 0 & word(input, 1) == "NA") {
        # Step 3: nothing matched; add this call's input to training
        assign("training",
               training %>%
                 add_row(., Word = tolower(input), Frequency = 1,
                         N_gram = str_count(input, "\\S+")),
               envir = .GlobalEnv)
        ans <- paste("Word not in dictionary. We added this to our database!")
        return(ans)
      }
    }
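The problem occurs between steps 2 and 3. If the input is still not found after the recursive calls, the input that gets added to the database is the shortened one (e.g. "great world"), but I want the original input ("hello great world"). This is my first attempt at implementing recursion, and I'd like to understand the mistake in my code.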
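Each recursive call only sees its own (already shortened) input, so by the time step 3 runs, the full phrase is no longer in scope. One common pattern, offered here as an assumption rather than the asker's or commenter's method, is to pass the untouched input down as an extra argument whose default is the input itself. The sketch below swaps the training data frame for a hypothetical named vector lookup and records misses in a hypothetical vector unknown, mirroring the original's global assign() with <<-.

```r
# Hypothetical stand-in for `training`: phrase -> predicted next word.
lookup <- c("she was" = "great", "this is" = "good")
# Hypothetical store for phrases that were not found.
unknown <- character(0)

next_word <- function(input, original = input) {
  force(original)  # capture the caller's phrase before anything else happens
  phrase <- trimws(input)
  if (phrase %in% names(lookup)) {
    # Step 1: found; return the phrase plus its predicted next word.
    return(paste(phrase, lookup[[phrase]]))
  }
  words <- strsplit(phrase, "\\s+")[[1]]
  if (length(words) > 1) {
    # Step 2: back off by one word, passing the ORIGINAL input along unchanged.
    return(next_word(paste(words[-1], collapse = " "), original = original))
  }
  # Step 3: nothing matched; record the full original input, not the suffix.
  unknown <<- c(unknown, tolower(original))
  "Word not in dictionary. We added this to our database!"
}

next_word("she was")            # "she was great"
next_word("hello great world")  # message; unknown now holds "hello great world"
```

Because original is only supplied by the recursive calls themselves, callers keep using the one-argument form next_word(input).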
Thanks :)
Update (reproducible example):
    library(dplyr)
    library(stringr)
    training <- data.frame(Word = c("hello", "she was great", "this is", "long time ago in"),
                           Frequency = c(4, 3, 10, 1),
                           N_gram = c(1, 3, 2, 4),
                           Prop = c(4/18, 3/18, 10/18, 1/18),
                           Word_to_Predict = c(NA, "great", "is", "in"))
    NextWordPrediction("she was")    ## returns "she was" & "great"
    NextWordPrediction("hours ago")  ## returns "hours ago" & "in"
    NextWordPrediction("words not in data")
    ## returns "Word not in dictionary. We added this to our database!" after trying
    ## "not in data", "in data" and "data", and adds "words not in data" to the dataset
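[Discussion]:
- Concatenate (paste) the dropped word onto the return value of the recursive call. That means you need to store the first word, recurse on the remaining words, and then paste word1 back on before returning.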
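The comment's suggestion can be sketched as follows, assuming a hypothetical named vector known in place of the training data frame (phrase -> next word). The function stores the first word, recurses on the rest, and pastes that word back onto a successful result, so the caller gets the full phrase back rather than only the matched suffix. NA signals "no suffix matched", which leaves the "add to database" decision to the outermost caller, where the original input is still available.

```r
# Hypothetical lookup table: phrase -> predicted next word.
known <- c("she was" = "great", "this is" = "good")

predict_keep_prefix <- function(input, lookup) {
  input <- trimws(input)
  if (input %in% names(lookup)) {
    # Found: the phrase plus its predicted next word.
    return(paste(input, lookup[[input]]))
  }
  words <- strsplit(input, "\\s+")[[1]]
  if (length(words) <= 1) {
    # Nothing left to drop: tell the caller that no suffix matched.
    return(NA_character_)
  }
  word1 <- words[1]                               # store the first word
  rest <- paste(words[-1], collapse = " ")        # the remaining words
  result <- predict_keep_prefix(rest, lookup)     # recurse on the rest
  if (is.na(result)) return(NA_character_)
  paste(word1, result)                            # paste word1 back on
}

predict_keep_prefix("oh she was", known)  # "oh she was great"
```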