Calculate readability scores for several files with R: answers

【Question title】: Calculate readability scores for several files with R
【Posted】: 2016-11-29 14:23:07
【Question description】:

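I want to use the koRpus package to calculate readability scores for several .txt files in R-3.3.2 (R-Studio 3.4 for Win) and save the results to Excel, SQLite3, or a text file. So far I can only calculate the scores for a single file and print them to the console. I tried to improve the code with a loop over the directory, but it does not work properly.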

library(koRpus)
library(tm)

#Loop through files
path = "D://Reports"
out.file<-""
file.names <- dir(path, pattern =".txt")
for(i in 1:length(file.names)){
  file <- read.table(file.names[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
  out.file <- rbind(out.file, file)
}

#Only one file
report <- tokenize(txt =file , format = "file", lang = "en")

#SMOG-Index
results_smog <- SMOG(report)
summary(results_smog)

#Flesch/Kincaid-Index
results_fleshkin <- flesch.kincaid(report)
summary(results_fleshkin)

#FOG-Index
results_fog<- FOG(report)
summary(results_fog)

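【Comments on the question】:

Can you clarify: are these reports really semicolon-delimited tables with a header row (as your read.table call suggests), or are they just plain text documents that you are trying to read?
Also, do you intend to run the koRpus calls on all the files concatenated together, as if they were one big file (so you get a single set of koRpus results), or do you want a separate set of koRpus results for each file?
@K. A. Buhr My directory contains simple plain text documents. I want the results for each file separately, so that I can later combine them in a single Excel sheet.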

Tags: r loops readability korpus

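【Solution 1】:

I ran into the same problem. I was searching Stack Overflow for a solution and came across your post. After some trial and error I came up with the following code, which works well for me. I strip out all the extra information. To find the index values of the scores I wanted, I first ran it on a single file and pulled out the summary of the readability wrapper. It gives you a table with a bunch of different values. Match the column to the row and you get the specific number you are looking for. There are many different options.

The files in the path directory should be standalone text files.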
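The lookup described above can be sketched without koRpus installed. This is only an illustration: the data frame below is a mock stand-in for the table that summary() returns on a readability object, the column names `index` and `raw` are assumptions about its layout, the values are placeholders, and `get_score` is a hypothetical helper. Print the real table once for one file and adapt the labels to what you see there.

```r
# Mock stand-in for the summary table of a readability object.
# Column names and values are placeholders, not real koRpus output.
smry <- data.frame(
  index = c("Flesch", "FOG", "SMOG"),
  raw   = c("11.1", "22.2", "33.3"),
  stringsAsFactors = FALSE
)

# Print the table with its row numbers to see which row holds which score.
print(smry)

# Hypothetical helper: look a score up by its label instead of a
# hard-coded row number, and fail loudly if the label is missing or ambiguous.
get_score <- function(tbl, label) {
  hit <- tbl$raw[tbl$index == label]
  if (length(hit) != 1) stop("expected exactly one row labelled ", label)
  as.numeric(hit)
}

smog <- get_score(smry, "SMOG")
```

Looking rows up by label rather than by position keeps the extraction working if the order or number of rows in the table ever changes.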

#Path
path="C:\\Users\\Philipp\\SkyDrive\\Documents\\Thesiswork\\ReadStats\\"

#list text files (the anchored pattern matches only names that end in .txt)
ll.files <- list.files(path = path, pattern = "\\.txt$", full.names = TRUE)
length(ll.files)

#set vectors
SMOG.score.vec=rep(0.,length(ll.files))
FleshKincaid.score.vec=rep(0.,length(ll.files))
FOG.score.vec=rep(0.,length(ll.files))

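#loop through each file (seq_along is safe even if no files were found)
for (i in seq_along(ll.files)) {
  #tokenize (recent koRpus versions may also need the koRpus.lang.en package loaded)
  tagged.text <- koRpus::tokenize(ll.files[i], lang="en")
  #hyphenate the words for the indices that require syllable counts
  hyph.txt.en <- koRpus::hyphen(tagged.text)
  #Readability wrapper
  readbl.txt <- koRpus::readability(tagged.text, hyphen=hyph.txt.en, index="all")
  #Compute the summary table once per file
  readbl.summary <- summary(readbl.txt)
  #Pull scores, convert to numeric, and update the vectors
  #(row positions 36, 11 and 22 were found by inspecting the summary of one file; check them for your koRpus version)
  SMOG.score.vec[i] <- as.numeric(readbl.summary$raw[36]) #SMOG Score
  FleshKincaid.score.vec[i] <- as.numeric(readbl.summary$raw[11]) #Flesch Reading Ease Score
  FOG.score.vec[i] <- as.numeric(readbl.summary$raw[22]) #FOG score
  #progress message every 10 files
  if (i %% 10 == 0) cat("finished", i, "\n")
}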

#if you want a single combined csv
df <- cbind(FOG.score.vec, FleshKincaid.score.vec, SMOG.score.vec)
colnames(df) <- c("FOG", "Flesch Kincaid", "SMOG")
#write.csv always writes the column names and ignores col.names (with a warning), so it is not passed
write.csv(df, file=paste0(path,"Combo.csv"), row.names=FALSE)

#if you want separate csvs: name each column through a data frame instead of col.names
write.csv(data.frame(SMOG=SMOG.score.vec), file=paste0(path,"SMOG.csv"), row.names=FALSE)
write.csv(data.frame(FOG=FOG.score.vec), file=paste0(path,"FOG.csv"), row.names=FALSE)
write.csv(data.frame("Flesch Kincaid"=FleshKincaid.score.vec, check.names=FALSE), file=paste0(path,"FK.csv"), row.names=FALSE)
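Since the question asks for per-file results that can later be merged into one sheet, it may help to keep the file name next to each score. The following is a minimal sketch of that idea, not part of the original answer: `score_files` and `demo_scorer` are hypothetical helpers, and `demo_scorer` only counts lines and words so that the example runs without koRpus. In real use the scorer would wrap the tokenize, hyphen and readability calls from the loop above and return the three scores as a named vector.

```r
# Apply a scoring function to each file and collect one row per file.
# `score_fun` is a hypothetical function that takes a file path and returns
# a named numeric vector with the same names for every file.
score_files <- function(files, score_fun) {
  rows <- lapply(files, function(f) {
    scores <- score_fun(f)
    data.frame(file = basename(f), as.list(scores), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Placeholder scorer (an assumption for this demo): counts lines and words.
demo_scorer <- function(f) {
  txt <- readLines(f, warn = FALSE)
  words <- unlist(strsplit(txt, "[[:space:]]+"))
  c(lines = length(txt), words = sum(nzchar(words)))
}

# Two small temporary files stand in for the reports directory.
demo_dir <- file.path(tempdir(), "readability_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("One short line.", "A second line here."), file.path(demo_dir, "a.txt"))
writeLines("Just one line.", file.path(demo_dir, "b.txt"))

# Score every .txt file and write a single table with a file-name column.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
result <- score_files(files, demo_scorer)
write.csv(result, file.path(demo_dir, "scores.csv"), row.names = FALSE)
```

The resulting CSV has one row per input file, which Excel can open directly and which makes it easy to trace each score back to its source document.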

【Discussion】: