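【Posted】: 2018-06-08 20:14:31
【Question】:
First off, I don't have much experience with R. I have a large long-format data frame, illustrated by df below, with 3 columns: Group, ID, and dat. I want to remove the outliers within each "group-id" (or, more precisely, replace them with the mean).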
Group = c("1","1","2","2","3","3","1","1","2","2","3","3","1","1","2","2","3","3","1","1","2","2","3","3")
ID = c("Eb","Eb","Eb","Eb","Eb","Eb","Sd","Sd","Sd","Sd","Sd","Sd","Re","Re","Re","Re","Re","Re","Tf","Tf","Tf","Tf","Tf","Tf")
dat = c(2,3,4,5,6,7,8,9,1010,11,12,13,1,2,3,-10000,5,6,4,3,2,7,6666,5)
df = data.frame(Group,ID,dat)
My basic approach (which doesn't work) is below (I have tried many iterations of this code):
library(outliers)
library(plyr)
# Function to remove outliers
RmOurliFUN = function(x){
rm.outlier(x$dat, fill = TRUE)
}
# splitting data based on first Group, and then ID to apply the outlier removal
GroupSplit = function(x){ddply(x,"ID",RmOurliFUN)}
df2 = ddply(df1, "Group", GroupSplit)
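I get various error messages, but usually that the argument is not numeric or logical. I'm fairly sure I'm not referencing the dat column correctly inside the nested function calls.
How can I perform an operation like this? I'm open to any suggestions.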
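For what it's worth, here is a minimal base-R sketch of per-group replacement that avoids the nested ddply calls. It is not the rm.outlier approach from the code above: it swaps in a 1.5 x IQR fence rule, which is an assumption about what counts as an outlier, and replace_outliers and the dat_clean column are hypothetical names made up for illustration. It groups by ID only, because in the sample data each Group/ID combination holds just two values, too few for this rule to flag anything.

```r
# Sample data from the question
Group <- c("1","1","2","2","3","3","1","1","2","2","3","3",
           "1","1","2","2","3","3","1","1","2","2","3","3")
ID <- c("Eb","Eb","Eb","Eb","Eb","Eb","Sd","Sd","Sd","Sd","Sd","Sd",
        "Re","Re","Re","Re","Re","Re","Tf","Tf","Tf","Tf","Tf","Tf")
dat <- c(2,3,4,5,6,7,8,9,1010,11,12,13,1,2,3,-10000,5,6,4,3,2,7,6666,5)
df <- data.frame(Group, ID, dat)

# Hypothetical helper (not from any package): values outside the
# 1.5 * IQR fences are replaced by the mean of the remaining values.
replace_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  is_out <- !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
  x[is_out] <- mean(x[!is_out], na.rm = TRUE)
  x
}

# ave() applies the helper within each ID and returns a vector
# in the original row order, so it can be assigned straight back.
df$dat_clean <- ave(df$dat, df$ID, FUN = replace_outliers)

print(df[df$dat != df$dat_clean, ])
```

To group by both columns on real data, pass both to ave(): ave(df$dat, df$Group, df$ID, FUN = replace_outliers). On the sample data that call leaves every value unchanged, since each pair sits inside its own fences.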
【Comments】:
-
What is class(df1$dat)? It sounds like you need to convert it to numeric. -
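Agree with Esther - if Group is categorical, then it makes sense to keep it as a factor or character class, but it looks like you are trying to detect numeric outliers. 2 is a number and "2" is a string, so your dat column is probably a factor or a character. Use df$dat = as.numeric(as.character(df$dat)) to convert it to numeric, then try again. -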
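Sorry, I think I made a poor example dataset. My actual data is numeric, but when I changed this dataset to numeric (as I have now done in the example above) it still doesn't work. Using as.character() and as.numeric() on x$dat doesn't solve the problem either...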
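As a side note on the conversion suggested in the comments, here is a small sketch of why as.character() goes in the middle when a column has ended up as a factor. The vector f is a made-up example, not the asker's data.

```r
# Hypothetical factor column whose labels happen to be numbers
f <- factor(c("2", "1010", "11"))

class(f)                             # "factor"

# Converting directly returns the internal level codes, because the
# levels are sorted as strings: "1010", "11", "2"
codes <- as.numeric(f)               # 3 1 2

# Going through character first recovers the values themselves
values <- as.numeric(as.character(f))  # 2 1010 11

print(codes)
print(values)
```

Calling mean() on a factor or character vector gives the "argument is not numeric or logical" warning, so checking class() on the column first is a quick way to rule this cause in or out.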