【Posted】:2023-04-02 11:38:01
【Question】:
String matching to estimate similarity
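The code in that question is exactly what I'm looking for, except that I can't figure out how to compare the strings in two columns of a data frame (the "correct" answer and the "given" answer) and then store the output of sim.per as a new column ("similarity") in the same data frame. For example, I tried: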
df$similarity <- sim.per(df$answer, df$given)
df$similarity <- mapply(sim.per, df$answer, df$given)
The second attempt also throws an error when a row is empty. Empty rows are acceptable in my dataset and should be scored as 0:
Error in str2[[1]] : subscript out of bounds
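The sim.per code is not reproduced here, but the error message shows that it indexes `str2[[1]]`. Assuming it obtains `str2` by splitting its input with `strsplit()` (an assumption, not confirmed by the post), two base R behaviours would explain both problems: `strsplit()` on a whole column returns one list element per row, so `[[1]]` keeps only the first row's words, and an empty string splits into `character(0)`, on which `[[1]]` fails with this exact error. The sketch below only demonstrates those base R behaviours:

```r
# Base R behaviour only; sim.per itself is not shown here.
answers <- c("Best way to waste money", "Roy travels to Africa")

# strsplit() on a vector returns a list with one element per string...
split_answers <- strsplit(answers, " ")
length(split_answers)    # 2
# ...so taking [[1]] keeps only the first string's words
split_answers[[1]]       # "Best" "way" "to" "waste" "money"

# An empty string splits into a zero-length character vector
empty_words <- strsplit("", " ")[[1]]
length(empty_words)      # 0

# Indexing that empty vector with [[1]] raises the error from the question
msg <- tryCatch(empty_words[[1]], error = function(e) conditionMessage(e))
msg                      # "subscript out of bounds"
```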
The expected output is:
  answer                   given                               similarity
1 Best way to waste money  Instrument to waste money and time  0.6
2 Roy travels to Africa    He is in Africa                     0.25
3 I go to work                                                 0
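The expected values are consistent with one simple metric: the number of words in `answer` that also occur in `given`, divided by the number of words in `answer` (3 of 5 for row 1, 1 of 4 for row 2). A minimal sketch of that metric follows. `word_overlap` is a hypothetical name, and the metric is an assumption inferred from the table, not the sim.per function from the linked question; splitting on whitespace and case-sensitive matching are assumptions too.

```r
# Hypothetical helper, NOT the linked question's sim.per: share of the words in
# `answer` that also appear in `given` (case-sensitive, split on whitespace).
word_overlap <- function(answer, given) {
  to_words <- function(x) {
    words <- strsplit(trimws(as.character(x)), "\\s+")[[1]]
    words[nzchar(words)]  # drop any empty tokens
  }
  answer_words <- to_words(answer)
  given_words <- to_words(given)
  # An empty string on either side scores 0 instead of raising an error
  if (length(answer_words) == 0 || length(given_words) == 0) return(0)
  sum(answer_words %in% given_words) / length(answer_words)
}

word_overlap("Best way to waste money", "Instrument to waste money and time")  # 0.6
word_overlap("Roy travels to Africa", "He is in Africa")                        # 0.25
word_overlap("I go to work", "")                                                # 0
```

The explicit length check is what turns an empty row into 0 rather than an indexing error.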
Any help would be greatly appreciated! Thanks!
A subset of the data:
df <- structure(list(trial = 1:10, answer = structure(c(9L, 2L, 4L, 7L, 1L, 5L, 3L, 6L, 8L, 10L), .Label = c("Best way to waste money", "He ran out of money, so he had to stop playing poker", "I go to work", "Lets all be unique together until we realise we are all the same", "Roy travels to Africa", "She borrowed the book from him many years ago and did not returned it yet", "She did her best to help him", "Students did not cheat on the test, for it was not the right thing to do", "The stranger officiates the meal", "We have a lot of rain in June"), class = "factor"), given = structure(c(10L, 3L, 6L, 8L, 4L, 2L, 1L, 7L, 9L, 5L), .Label = c("", "He is in Africa Roy", "He lost money because he had played poker", "Instrument to waste money and time", "It was raining in June", "People are unique until they try to fit in", "She borrowed the book from the library and forgot to return it", "She did her very best to help him out", "Students know not to cheat", "The guests ate the meal"), class = "factor")), class = "data.frame", row.names = c(NA, -10L))
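To fill the new column, the scoring function has to be called once per row. The sketch below does that with base R's `mapply()` and stores the result in `df$similarity`. It assumes the hypothetical `word_overlap` metric inferred from the expected output (redefined here so the block runs on its own) and uses three sample rows shaped like the question's `df`, whose `answer` and `given` columns are factors.

```r
# Hypothetical metric (an assumption, not sim.per): share of `answer` words found in `given`
word_overlap <- function(answer, given) {
  to_words <- function(x) {
    words <- strsplit(trimws(as.character(x)), "\\s+")[[1]]
    words[nzchar(words)]  # drop any empty tokens
  }
  answer_words <- to_words(answer)
  given_words <- to_words(given)
  if (length(answer_words) == 0 || length(given_words) == 0) return(0)
  sum(answer_words %in% given_words) / length(answer_words)
}

# Sample rows shaped like the question's df (factor columns, one empty answer)
df <- data.frame(
  answer = factor(c("Best way to waste money", "Roy travels to Africa", "I go to work")),
  given  = factor(c("Instrument to waste money and time", "He is in Africa", ""))
)

# mapply() calls word_overlap once per row; the factor columns are converted
# to character first so each call receives a plain string.
df$similarity <- mapply(word_overlap,
                        as.character(df$answer),
                        as.character(df$given),
                        USE.NAMES = FALSE)
df$similarity  # 0.6 0.25 0
```

With the question's full `df` loaded, the same `mapply()` line adds the column for all ten rows. Under this assumed metric, the "Roy travels to Africa" row of that data, whose given answer is "He is in Africa Roy", would score 0.5 rather than 0.25, because "Roy" matches as well.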
【Discussion】:
-
Could you provide a sample subset of the data you are using? Please use dput(sample_data) and then copy-paste the result from the console.
-
I've edited the post to include a sample of the data.
Tags: r string text-mining text-analysis