1. zoo. The zoo package has a multi-way merge that can do this compactly. lapply converts each component of myList to a zoo object, and then we simply merge them all:
# optionally add nice names to the list
names(myList) <- paste("t", seq_along(myList), sep = "")
library(zoo)
fz <- function(x) with(as.data.frame(x, stringsAsFactors = FALSE), zoo(Freq, Var1))
out <- do.call(merge, lapply(myList, fz))
The above returns a multivariate zoo series in which the "times" are "a", "ago", etc. If a data frame result is wanted, it is just a matter of as.data.frame(out).
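As a minimal sketch of that conversion, the snippet below runs the same fz and merge steps on two small made-up tables (the toy words and the names t1 and t2 are assumptions for illustration, not the sample data used in the benchmark) and then turns the result into a data frame. Words missing from a table come out as NA, which can be replaced by 0 if counts are wanted.

```r
library(zoo)

# Hypothetical toy input: two small word-frequency tables.
# table() is called on an expression (not a named variable) so that
# as.data.frame() labels the word column "Var1", as fz expects.
myList <- list(t1 = table(c("a", "b", "b")),
               t2 = table(c("b", "c")))

# Same helper as above: one zoo series per table, indexed by word
fz <- function(x) with(as.data.frame(x, stringsAsFactors = FALSE), zoo(Freq, Var1))
out <- do.call(merge, lapply(myList, fz))

# Convert to a data frame; the words become the row names
df <- as.data.frame(out)

# Optional: treat "word not present" as a count of zero
df[is.na(df)] <- 0
```

The NA-to-zero step is optional and only makes sense if a missing word should count as zero occurrences.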
2. Reduce. Here is a second solution. It uses Reduce, which is in the core of R.
merge1 <- function(x, y) merge(x, y, by = 1, all = TRUE)
out <- Reduce(merge1, lapply(myList, as.data.frame, stringsAsFactors = FALSE))
# optionally add nice names
colnames(out)[-1] <- paste("t", seq_along(myList), sep = "")
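To see what the Reduce approach produces, here is a small sketch on three made-up tables (the toy words are an assumption for illustration). Each step is a full outer join on the first column, so words missing from a table show up as NA in that table's column.

```r
# Hypothetical toy input: three small word-frequency tables
myList <- list(table(c("a", "b", "b")),
               table(c("b", "c")),
               table(c("a", "c", "c")))

# Full outer join on the first column (the word)
merge1 <- function(x, y) merge(x, y, by = 1, all = TRUE)

# Fold the pairwise join over the whole list, left to right
out <- Reduce(merge1, lapply(myList, as.data.frame, stringsAsFactors = FALSE))

# Replace the Freq.x / Freq.y / Freq column names with t1, t2, t3
colnames(out)[-1] <- paste("t", seq_along(myList), sep = "")

print(out)
```

With longer lists, merge may warn about duplicated intermediate column names such as Freq.x; the final renaming step overwrites them in any case.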
3. xtabs. This one adds names to the list, then extracts the frequencies, names, and groups as one long vector each, and puts them back together with xtabs:
names(myList) <- paste("t", seq_along(myList))
xtabs(Freq ~ Names + Group, data.frame(
Freq = unlist(lapply(myList, unname)),
Names = unlist(lapply(myList, names)),
Group = rep(names(myList), sapply(myList, length))
))
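The same steps on two made-up tables, as a sketch, make the intermediate long format visible (the toy words and the variable names long and wide are assumptions for illustration; sep = "" is used here so the groups are named t1 and t2 rather than "t 1" and "t 2"). Unlike the two merge-based solutions, xtabs fills combinations that never occur with 0 rather than NA.

```r
# Hypothetical toy input: two small word-frequency tables
myList <- list(table(c("a", "b", "b")), table(c("b", "c")))
names(myList) <- paste("t", seq_along(myList), sep = "")

# Long format: one row per (word, table) pair
long <- data.frame(
  Freq  = unlist(lapply(myList, unname)),             # the counts
  Names = unlist(lapply(myList, names)),              # the words
  Group = rep(names(myList), sapply(myList, length))  # which table each row came from
)

# Cross-tabulate back to wide format: words in rows, tables in columns
wide <- xtabs(Freq ~ Names + Group, long)

print(wide)
```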
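Benchmark
Benchmarking some of the solutions with the rbenchmark package gives the following, which indicates that the zoo solution is the fastest on the sample data and is arguably the simplest as well.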
> t1<-table(strsplit(tolower("this is a test in the event of a real word file you would see many more words here"), "\\W"))
> t2<-table(strsplit(tolower("Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal"), "\\W"))
> t3<-table(strsplit(tolower("Ask not what your country can do for you - ask what you can do for your country"), "\\W"))
> myList <- list(t1, t2, t3)
>
> library(rbenchmark)
> library(zoo)
> names(myList) <- paste("t", seq_along(myList), sep = "")
>
> benchmark(xtabs = {
+ names(myList) <- paste("t", seq_along(myList))
+ xtabs(Freq ~ Names + Group, data.frame(
+ Freq = unlist(lapply(myList, unname)),
+ Names = unlist(lapply(myList, names)),
+ Group = rep(names(myList), sapply(myList, length))
+ ))
+ },
+ zoo = {
+ fz <- function(x) with(as.data.frame(x, stringsAsFactors=FALSE), zoo(Freq, Var1))
+ do.call(merge, lapply(myList, fz))
+ },
+ Reduce = {
+ merge1 <- function(x, y) merge(x, y, by = 1, all = TRUE)
+ Reduce(merge1, lapply(myList, as.data.frame, stringsAsFactors = FALSE))
+ },
+ reshape = {
+ freqs.list <- mapply(data.frame,Words=seq_along(myList),myList,SIMPLIFY=FALSE,MoreArgs=list(stringsAsFactors=FALSE))
+ freqs.df <- do.call(rbind,freqs.list)
+ reshape(freqs.df,timevar="Words",idvar="Var1",direction="wide")
+ }, replications = 10, order = "relative", columns = c("test", "replications", "relative"))
     test replications relative
2     zoo           10 1.000000
4 reshape           10 1.090909
1   xtabs           10 1.272727
3  Reduce           10 1.272727
ADDED: second solution.
ADDED: third solution.
ADDED: benchmark.