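【Posted】: 2021-01-04 19:46:16
【Question】:
I am writing a spelling-correction function. I scraped the spelling variants page from Wikipedia and converted it into a table. I now want to use that table (spellings) as a lookup table and replace the matching values in my document (skills.db). Note: the skills data frame below is only an example. Ignore the second column. In the real pipeline I will run the spelling correction earlier, during resume processing. The resumes are large, so I am sharing this sample instead.
I can do this with the for loop below, but I would like to know whether there is a better solution.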
spellings = structure(list(preferred_spellings = c("organisation", "acknowledgement",
"cypher", "anaesthesia", "analyse"), other_spellings = c(" organization",
" acknowledgment", " cipher", " anesthesia", " analyze")), row.names = c(NA,
5L), class = "data.frame")
skills.db = structure(list(skills = c("variance analysis static", "analyze kpi",
"financial analysis", "variance analysis", "organizational",
"analysis", "organize", "result analysis", "analytic", "datum analysis",
"analytics", "business analysis", "organized", "quantitative analysis",
"train need analysis", "analytic think", "analysis trial preparation",
"analyze statue", "google analytics", "service analysis", "organize individual",
"account analysis", "analyze department work", "pareto analysis train",
"organization", "ratio analysis", "statistical analysis", "project organization",
"organize client's file", "with good analytic", "nielsen analytics",
"datum analytics", "textual analytics", "social analytics", "business intelligence analytics",
"market analysis", "analyse", "analytic skill", "superb analytic",
"financial statement analysis", "credit analysis", "quick analysis",
"organizational development", "outstanding financial analytic",
"organization design", "organize conference", "business analytics",
"industry analysis", "fs analysis", "analyze", "cash flow analysis",
"investment analysis", "technical analysis bloomberg", "community organize",
"monthly financial analysis", "expense variance analysis", "stock analysis"
), level1 = c("variance analysis static", "analyze kpi", "financial analysis",
"variance analysis", "organizational", "analysis", "organize",
"result analysis", "analytic", "datum analysis", "analytics",
"business analysis", "organized", "quantitative analysis", "train need analysis",
"analytic think", "analysis trial preparation", "analyze statue",
"google analytics", "service analysis", "organize individual",
"account analysis", "analyze department work", "pareto analysis train",
"organization", "ratio analysis", "statistical analysis", "project organization",
"organize client's file", "with good analytic", "nielsen analytics",
"datum analytics", "textual analytics", "social analytics", "business intelligence analytics",
"market analysis", "analyse", "analytic skill", "superb analytic",
"financial statement analysis", "credit analysis", "quick analysis",
"organizational development", "outstanding financial analytic",
"organization design", "organize conference", "business analytics",
"industry analysis", "fs analysis", "analyze", "cash flow analysis",
"investment analysis", "technical analysis bloomberg", "community organize",
"monthly financial analysis", "expense variance analysis", "stock analysis"
)), row.names = c(49L, 65L, 77L, 82L, 155L, 190L, 215L, 244L,
246L, 260L, 287L, 300L, 311L, 323L, 349L, 356L, 378L, 386L, 447L,
607L, 622L, 664L, 686L, 766L, 824L, 832L, 895L, 922L, 928L, 949L,
1020L, 1054L, 1079L, 1080L, 1081L, 1088L, 1146L, 1158L, 1228L,
1248L, 1319L, 1366L, 1385L, 1440L, 1468L, 1475L, 1509L, 1554L,
1584L, 1606L, 1635L, 1658L, 1660L, 1696L, 1760L, 1762L, 1798L
), class = "data.frame")
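library(dplyr)
skills.db$TEST = skills.db$skills  # start from the original text so replacements accumulate
for(i in seq_len(nrow(spellings))){
  # apply each variant -> preferred replacement to the running result in TEST
  skills.db = skills.db %>% mutate(TEST = gsub(spellings$other_spellings[i], spellings$preferred_spellings[i], TEST))
}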
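One possible way to drop the explicit loop, sketched in base R under a few assumptions: the scraped leading space is removed with trimws(), and only whole words are replaced (so "organizational" is left alone). Whether whole-word matching is what the asker wants is not stated in the question. The helper name fix_spelling and the small sample vectors are hypothetical and only mirror the column names used above.

```r
# Small stand-ins for the question's objects (same column names)
spellings <- data.frame(
  preferred_spellings = c("organisation", "analyse"),
  other_spellings     = c(" organization", " analyze"),
  stringsAsFactors    = FALSE
)
skills <- c("analyze kpi", "project organization", "organizational", "analysis")

# One whole-word pattern per variant; trimws() drops the scraped leading space
patterns <- paste0("\\b", trimws(spellings$other_spellings), "\\b")

# Hypothetical helper: fold over the lookup table, applying one
# replacement per step to the running result
fix_spelling <- function(x, patterns, replacements) {
  Reduce(
    function(acc, i) gsub(patterns[i], replacements[i], acc, perl = TRUE),
    seq_along(patterns),
    x
  )
}

corrected <- fix_spelling(skills, patterns, spellings$preferred_spellings)
print(corrected)
```

On the real data this could be called as fix_spelling(skills.db$skills, patterns, spellings$preferred_spellings) once patterns is built from the full lookup table. If substring replacement is preferred instead, dropping the "\\b" anchors should give that behaviour.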
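【Comments】:
-
I might start with names(spellings)[1] <- "preferred_spellings" ;-)
-
@r2evans Good catch. That is exactly why I need this feature :D
-
Also, do you really intend to insert a new space in front of every replaced word?
-
@r2evens No. I will run trims() on the spellings to remove the extra whitespace.
-
A bit OCD, @marc_s? :-)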
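The trimming step mentioned in the comments might look like the sketch below. It assumes that base R's trimws() is the function meant by "trims()", and it uses a two-row stand-in for the spellings table rather than the full scraped data.

```r
# Two-row stand-in for the scraped lookup table (same column names)
spellings <- data.frame(
  preferred_spellings = c("organisation", "cypher"),
  other_spellings     = c(" organization", " cipher"),
  stringsAsFactors    = FALSE
)

# Strip leading/trailing whitespace from every character column
is_chr <- vapply(spellings, is.character, logical(1))
spellings[is_chr] <- lapply(spellings[is_chr], trimws)

print(spellings$other_spellings)
```

Trimming the table once, before any replacement runs, keeps the patterns from only matching variants that happen to follow a space.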