[Question title]: Meaning of ddply error: 'names' attribute [9] must be the same length as the vector [1]
[Posted]: 2012-12-18 15:39:37
[Question description]:

I'm working through Machine Learning for Hackers and I'm stuck on this line:

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

It produces the following error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

Here is the traceback():

> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   }(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

The priority.train object is a data frame. Here is some more information about it:

> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date"       "From.EMail" "Subject"    "Message"    "Path"      
> sapply(priority.train, mode)
       Date  From.EMail     Subject     Message        Path 
     "list" "character" "character" "character" "character" 
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt" 

$From.EMail
[1] "character"

$Subject
[1] "character"

$Message
[1] "character"

$Path
[1] "character"

> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame':   1250 obs. of  5 variables:
 $ Date      : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
 $ From.EMail: chr  "removed@removed.ca" "removed@removed.net" "removed@removed.ca" "removed@removed.net" ...
 $ Subject   : chr  "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
 $ Message   : chr  "    \n Hello,\n   \n         I just installed redhat 7.2 and I think I have everything \nworking properly.  Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file.  Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file.  Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n>  I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
 $ Path      : chr  "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform

I would post a sample, but the content is rather long and I don't think it is relevant here.

The same error also occurs here:

> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

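Does anyone know what is going on here? The error seems to come from some object other than priority.train: the object in the error has a names attribute with 9 elements, while priority.train has only 5 columns.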

Any help would be greatly appreciated. Thanks!

Problem solved

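Thanks to @user1317221_G's tip about using the dput function, I found the problem. It is the Date field, which at this point is a list with 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To work around it, I simply converted the dates to a character vector, ran ddply, and then converted the dates back: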

> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)
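To see where a list-shaped Date column comes from, here is a minimal base-R sketch (the two timestamps are copied from the question's str() output; the UTC time zone is an assumption). Stripping the class from a POSIXlt value exposes a list of component vectors, which matches the 9 fields the asker found with dput; some R versions also carry zone/gmtoff components, so the exact count may differ. A POSIXct value, by contrast, is a single double vector.

```r
# Two timestamps copied from the question's str() output (UTC assumed).
stamps <- c("2002-01-31 22:44:14", "2002-02-01 00:53:41")

lt <- as.POSIXlt(stamps, tz = "UTC")  # "broken-down" time
ct <- as.POSIXct(stamps, tz = "UTC")  # seconds since 1970-01-01

# Stripping the class exposes the underlying storage:
# a POSIXlt value is a list of parallel component vectors (sec, min, hour, ...).
lt_parts <- unclass(lt)
print(is.list(lt_parts))
print(names(lt_parts))

# A POSIXct value is one double vector with one element per timestamp.
ct_parts <- unclass(ct)
print(is.double(ct_parts))
print(length(ct_parts))
```

This is why converting the column (to character, as above, or to POSIXct) makes the problem disappear: the column then has the same "one element per row" shape as every other column.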

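[Question discussion]:

May I suggest str(priority.train) instead of your additional information?
@sebastian-c Sure! I'll edit the question now.
"What does this error mean in R?" is probably the least useful question title you could pick. Please give it more thought next time.
Use POSIXct dates in data.frames, not POSIXlt.
I got the same error on a date field. @hadley's comment solved my problem. No surprise there.

Tags: r plyr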

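[Solution 1]:

You may have already seen this, but it didn't help. I think there may be no answer yet because people can't reproduce your error.

dput, or the smaller dput(head()), might help here. But here is an alternative using base R: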

x <- data.frame(A=c("a","b","c","a"),B=c("e","d","d","d"))

ddply(x,.(A),summarise, Freq = length(B))
  A Freq
1 a    2
2 b    1
3 c    1

 tapply(x$B,x$A,length)
a b c 
2 1 1

Does tapply work for you?

x2 <- data.frame(A=c("removed@removed.ca", "removed@removed.net"),
                 B=c("please help a newbie compile mplayer :-)", 
                     "re: please help a newbie compile mplayer :-)"))

tapply(x2$B,x2$A,length)
removed@removed.ca removed@removed.net 
              1                   1 

ddply(x2,.(A),summarise, Freq = length(B))
                    A Freq
1  removed@removed.ca    1
2 removed@removed.net    1

You could also try something even simpler:

table(x2$A)

 removed@removed.ca removed@removed.net 
              1                   1

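[Discussion]:

The example you wrote works fine. I removed all but the first two rows of the DF and set all the values to NA. Then I ran the dput function you mentioned and, surprise! The Date field is a list with 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). Converting the date field to a character vector solved the problem. Thanks!!

[Solution 2]:

I had a very similar problem, though I'm not sure it is the same one. I got the following error: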

Error in attributes(out) <- attributes(col) : 
  'names' attribute [20388] must be the same length as the vector [128]

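I don't have any variables in list mode, so Mota's solution doesn't apply to my case. I fixed the problem by removing plyr 1.8 and manually installing plyr 1.7, after which the error went away. I also tried reinstalling plyr 1.8 and the problem came back.

HTH.

[Discussion]:

I saw the same error as well and fixed it this way.

[Solution 3]:

I solved this by converting the format from POSIXlt to POSIXct, as Hadley suggested above; the fix itself is a single line (the as.POSIXct call):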

    # original conversion from a datetime string; class(mydata$datetime) is then "POSIXlt" "POSIXt"
    mydata$datetime <- strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S")
    # convert to POSIXct for use in data frames / ddply
    mydata$datetime <- as.POSIXct(mydata$datetime)
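If a data frame has several date-time columns, the same conversion can be applied to all of them at once. The sketch below uses a hypothetical helper, lt_to_ct (not part of plyr or base R), that converts every POSIXlt column to POSIXct and leaves the other columns untouched. In the example the POSIXlt column is added with $<-, as in the snippet above, because data.frame() itself would already convert POSIXlt input to POSIXct.

```r
# Hypothetical helper (not from any package): convert every POSIXlt
# column of a data frame to POSIXct, leaving all other columns alone.
lt_to_ct <- function(df) {
  for (nm in names(df)) {
    if (inherits(df[[nm]], "POSIXlt")) {
      df[[nm]] <- as.POSIXct(df[[nm]])
    }
  }
  df
}

# Small example data; the datetime column is added with $<- so it is POSIXlt.
mydata <- data.frame(id = 1:2)
mydata$datetime <- strptime(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"),
                            "%Y-%m-%d %H:%M:%S", tz = "UTC")

mydata <- lt_to_ct(mydata)
print(class(mydata$datetime))
```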

[Discussion]:

That one line 'mydata$datetime

[Solution 4]:

I also ran into a similar problem with ddply, with the following code and error:

    test <- ddply(test, "catColumn", function(df) df[1:min(nrow(df), 3),])
    Error: 'names' attribute [11] must be the same length as the vector [2]

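The data frame test has many categorical (factor) variables.

Converting the categorical variables to character variables as follows made the ddply command work: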

    test <- data.frame(lapply(test, as.character), stringsAsFactors=FALSE)
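The one-liner above turns every column into character, including numeric and date-time columns. A narrower variant, sketched below with a hypothetical helper factors_to_character and invented example data, converts only the factor columns and keeps the types of everything else.

```r
# Hypothetical helper: convert only the factor columns to character,
# leaving numeric, logical and date-time columns as they are.
factors_to_character <- function(df) {
  for (nm in names(df)) {
    if (is.factor(df[[nm]])) {
      df[[nm]] <- as.character(df[[nm]])
    }
  }
  df
}

# Invented example data with one factor column and one numeric column.
test <- data.frame(catColumn = factor(c("x", "y", "x")),
                   value = c(1.5, 2, 3))
test <- factors_to_character(test)
str(test)
```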

[Discussion]:

[Solution 5]:

I had the same problem using ddply and fixed it with doBy:

library(doBy)
bylength <- function(x) { length(x) }
# summaryBy() takes the formula; bylength is only the summary function
newdt <- summaryBy(X ~ From.EMail + To.EMail, data = dt, FUN = bylength)
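For comparison, base R's aggregate() with a formula produces a similar per-group count as a data frame without any extra package. This is a swapped-in alternative, not what this answer used, and the toy data below is invented (only the column names follow the question). With the formula interface, only the columns named in the formula are used for the computation.

```r
# Toy data with the question's column names (values invented).
dt <- data.frame(
  From.EMail = c("removed@removed.ca", "removed@removed.net", "removed@removed.ca"),
  Subject    = c("please help", "re: please help", "re: please help")
)

# Count rows per sender; the result is a data frame, like ddply's output.
counts <- aggregate(Subject ~ From.EMail, data = dt, FUN = length)
print(counts)
```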

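[Discussion]:

[Solution 6]:

Once you know that a date column is what's interfering, you can also simply leave that column out when running the command instead of converting it.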

So

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

can become

from.weight <- ddply(priority.train[,c(1:7,9:10)], .(From.EMail), summarise, Freq = length(Subject))

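if, for example, the POSIXlt date happens to be in column 8 of the data frame. The odd thing about this error is that it may have nothing to do with the columns you are grouping by or the output you are looking for.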
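Selecting columns by position breaks as soon as the column order changes. The sketch below drops POSIXlt columns by class instead, using a hypothetical helper drop_posixlt and a two-row sample built from the question's str() output. The Date column is added with $<- because data.frame() would convert POSIXlt input to POSIXct.

```r
# Hypothetical helper: drop every POSIXlt column, whatever its position.
drop_posixlt <- function(df) {
  is_lt <- vapply(df, function(col) inherits(col, "POSIXlt"), logical(1))
  df[!is_lt]
}

# Two-row sample based on the question's str() output.
priority.sample <- data.frame(
  From.EMail = c("removed@removed.ca", "removed@removed.net"),
  Subject    = c("please help a newbie compile mplayer :-)",
                 "re: please help a newbie compile mplayer :-)")
)
priority.sample$Date <- as.POSIXlt(c("2002-01-31 22:44:14", "2002-02-01 00:53:41"),
                                   tz = "UTC")

slim <- drop_posixlt(priority.sample)
print(names(slim))
```

The result (for example slim) can then be passed to ddply in place of the full data frame; selecting the needed columns by name, such as priority.sample[c("From.EMail", "Subject")], works just as well when you know which columns the summary uses.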

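[Discussion]:

[Solution 7]:

I faced the same problem. I solved it by keeping only the data that ddply needs and using as.character to convert the grouping variable and all the required text variables to character.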

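It worked.

[Discussion]:

[Solution 8]:

I can't test this without the data, but try using dplyr instead of plyr. Something like this should return the expected output. You have to coerce the result back to a data frame, because dplyr returns a tibble.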

library(dplyr)

from.weight <- priority.train %>%
               group_by(From.EMail) %>%
               summarise(Freq = length(Subject)) %>%
               ungroup() %>%
               as.data.frame()

[Discussion]: