【Posted】: 2017-03-22 12:00:29
【Question】:
The following list "ls" contains three data frames:
unigrams = data.frame(freq = c(3, 3, 5, 4, 3, 41),
term = c("a-list", "a-p", "aaa", "aam", "aamir", "aaron"))
bigrams = data.frame(freq = c(13, 1, 1, 2, 1, 4),
term = c("a a", "a abode", "a about", "a absolutely", "a accessory", "a acre"))
trigrams = data.frame(freq = c(1, 1, 1, 1, 1, 1),
term = c("a a card", "a a divorce", "a a dreamer", "a a great", "a a guy", "a a hand"))
ls = list(unigrams, bigrams, trigrams)
This gives us:
[[1]]
  freq   term
1    3 a-list
2    3    a-p
3    5    aaa
4    4    aam
5    3  aamir
6   41  aaron

[[2]]
  freq         term
1   13          a a
2    1      a abode
3    1      a about
4    2 a absolutely
5    1  a accessory
6    4       a acre

[[3]]
  freq        term
1    1    a a card
2    1 a a divorce
3    1 a a dreamer
4    1   a a great
5    1     a a guy
6    1    a a hand
I want to split the "term" column of each data frame into its individual words, creating the columns "word1", "word2" and "word3" as needed. Like this:
  freq  word1
1    3 a-list
2    3    a-p
3    5    aaa
4    4    aam
5    3  aamir
6   41  aaron

  freq word1      word2
1   13     a          a
2    1     a      abode
3    1     a      about
4    2     a absolutely
5    1     a  accessory
6    4     a       acre

  freq word1 word2   word3
1    1     a     a    card
2    1     a     a divorce
3    1     a     a dreamer
4    1     a     a   great
5    1     a     a     guy
6    1     a     a    hand
My attempt:
new_ls = list()
for (i in length(ls)) {
x = ls[[i]]
# Split each word in column "term":
x[,paste("word", 1:i, sep = "")] = as.character(lapply(strsplit(as.character(x$term), split=" "), "[", i))
x = subset(x, select = -term)
new_ls[[i]] = x
}
Unfortunately, this snippet stores a result only in the last element of the list, and that result is wrong:
[[1]]
NULL

[[2]]
NULL

[[3]]
  freq   word1   word2   word3
1    1    card    card    card
2    1 divorce divorce divorce
3    1 dreamer dreamer dreamer
4    1   great   great   great
5    1     guy     guy     guy
6    1    hand    hand    hand
What am I doing wrong?
【Comments】:
- new_ls[[ as.character(i) ]] <- x
- What is x$term? Please use a reproducible example. Also, I think you need as.character(sapply(strsplit(..
- Working on it. Give me a minute.
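Two things in the loop from the question appear to explain the output. First, `for (i in length(ls))` iterates over the single value 3, not over 1, 2, 3, so only `new_ls[[3]]` is ever assigned and the first two elements stay NULL. Second, `"[", i` extracts only the i-th word of each term (the third word when i is 3), and that one vector is then recycled across all of the new columns, which gives "card card card". The following is a minimal sketch of one possible correction, not a definitive answer. It assumes, as in the example data, that the i-th data frame holds terms of exactly i words separated by single spaces; the names `parts` and `j` are introduced here for illustration only.

```r
# Example data from the question; stringsAsFactors = FALSE keeps "term" as character.
unigrams = data.frame(freq = c(3, 3, 5, 4, 3, 41),
                      term = c("a-list", "a-p", "aaa", "aam", "aamir", "aaron"),
                      stringsAsFactors = FALSE)
bigrams = data.frame(freq = c(13, 1, 1, 2, 1, 4),
                     term = c("a a", "a abode", "a about", "a absolutely",
                              "a accessory", "a acre"),
                     stringsAsFactors = FALSE)
trigrams = data.frame(freq = c(1, 1, 1, 1, 1, 1),
                      term = c("a a card", "a a divorce", "a a dreamer",
                               "a a great", "a a guy", "a a hand"),
                      stringsAsFactors = FALSE)
ls = list(unigrams, bigrams, trigrams)

new_ls = vector("list", length(ls))
for (i in seq_along(ls)) {   # seq_along(ls) is 1, 2, 3; length(ls) is just 3
  x = ls[[i]]
  # One character vector of words per row (assumption: single-space separators).
  parts = strsplit(as.character(x$term), split = " ", fixed = TRUE)
  for (j in seq_len(i)) {
    # Take the j-th word of every row and store it in its own column.
    x[[paste0("word", j)]] = vapply(parts, function(p) p[j], character(1))
  }
  x$term = NULL              # drop the original column
  new_ls[[i]] = x
}
```

With this sketch, `new_ls[[2]]` should have the columns freq, word1 and word2, and `new_ls[[3]]` the columns freq, word1, word2 and word3. If a term has fewer words than expected, the missing positions come out as NA rather than raising an error.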