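Posted: 2018-09-13 17:15:21
Question:
I am trying to compute the hourly mean of a measurement (about 20 readings per hour) over a large dataset (4 months of data). Before averaging, I need to remove the outliers within each hour, defined as readings more than 2 SD away from that hour's mean.
Part of the data: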
structure(list(YEAR = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L), MONTH = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), DAY = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), HOUR = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), MINUTE = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), SECOND = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Tmp = c(25.6984, 25.6967, 25.6962, 25.6962,
25.6955, 25.6949, 25.6959, 25.6944, 25.6954, 25.6954, 25.6958,
25.6958, 25.6962, 25.6967, 25.6982, 25.6976, 25.6978, 25.6977,
25.6975, 25.6979, 25.5552, 25.5577, 25.5579, 25.5573, 25.746,
25.7248, 25.7164, 25.7249, 25.7379, 25.752, 25.7502, 25.7678,
25.7805, 25.7871, 25.7863, 25.7856, 25.7948, 25.7939, 25.7953,
25.7969, 25.7982, 25.7981, 25.7972, 25.7978, 25.644, 25.6451,
25.6455, 25.6456, 25.6451, 25.6454)), row.names = c(NA, 50L), class = "data.frame")
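A minimal base-R sketch of the filter-then-average steps described in the question. It assumes the data frame has the YEAR, MONTH, DAY, HOUR and Tmp columns shown in the `dput` output. The helper name `hourly_mean_trimmed`, its arguments and the toy data are hypothetical. Readings more than `k` sample SDs (R's `sd()`, n - 1 denominator) from their own hour's mean are dropped. Hours with a single reading have no SD and are kept as-is.

```r
# Hypothetical helper: hourly means after dropping readings that lie more
# than k sample SDs from the mean of their own hour.
hourly_mean_trimmed <- function(df, value = "Tmp", k = 2) {
  keys <- c("YEAR", "MONTH", "DAY", "HOUR")
  x <- df[[value]]

  # One group per calendar hour, built from the separate date/time columns
  grp <- interaction(df[keys], drop = TRUE)

  # Per-row mean and SD of that row's hour; ave() keeps the original row order
  mu <- ave(x, grp, FUN = function(v) mean(v, na.rm = TRUE))
  s  <- ave(x, grp, FUN = function(v) sd(v, na.rm = TRUE))

  # Keep non-missing readings within k SDs; a one-reading hour has sd = NA
  keep <- !is.na(x) & (is.na(s) | abs(x - mu) <= k * s)
  kept <- df[keep, , drop = FALSE]

  # Mean of the remaining readings for each hour
  aggregate(kept[value], by = kept[keys], FUN = mean)
}

# Toy data (made up): 20 readings in hour 0 with one obvious outlier,
# 4 readings in hour 1
toy <- data.frame(
  YEAR = 2018L, MONTH = 1L, DAY = 1L,
  HOUR = rep(c(0L, 1L), times = c(20, 4)),
  Tmp  = c(rep(10, 19), 100, 1, 2, 3, 4)
)
res <- hourly_mean_trimmed(toy)
print(res)
# Hour 0: the 100 is dropped, so the mean is 10; hour 1 keeps all four (2.5)
```

`ave()` returns a vector aligned with the original rows, so each reading can be compared with its own hour's mean and SD before `aggregate()` collapses the hours. Small hours need care. A single outlier inflates the SD of its own group, and with n readings no point can lie more than (n - 1)/sqrt(n) sample SDs from the mean. That bound is 1.5 for n = 4 (hour 1 in the posted data has four readings), so a 2-SD rule can never remove anything from an hour with five or fewer readings.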
Comments:
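- Sounds like the logic is clear, but we need some data to help you.
- I've added an image of part of the data. As you can see, I have separate columns for month, day and minute.
- Not very helpful, since I'd have to type the data in myself. :) Could you use `dput()` and post the output here? See what the first 3 rows of mtcars look like with `dput(mtcars[1:3,])`.
- Better? This is only part of the data; I have many more columns...
- The part of the data you posted doesn't vary enough, so I manually changed one value to create an outlier. Hope my answer helps...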
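The last comment checks the approach by planting an outlier by hand. Here is a minimal sketch of the same check on the 20 hour-0 `Tmp` readings copied from the posted data. The replacement value 26.5 and its position are arbitrary, hypothetical stand-ins for the manually changed reading. The rule is the 2-SD filter from the question, applied to a single hour.

```r
# Hour-0 Tmp readings copied from the dput output in the question
tmp <- c(25.6984, 25.6967, 25.6962, 25.6962, 25.6955, 25.6949, 25.6959,
         25.6944, 25.6954, 25.6954, 25.6958, 25.6958, 25.6962, 25.6967,
         25.6982, 25.6976, 25.6978, 25.6977, 25.6975, 25.6979)

# Hypothetical manual change: turn one reading into an obvious outlier
tmp[10] <- 26.5

# Keep readings no more than 2 sample SDs from this hour's mean
keep <- abs(tmp - mean(tmp)) <= 2 * sd(tmp)

print(tmp[!keep])        # the planted 26.5 is the only reading flagged
print(mean(tmp[keep]))   # hourly mean of the remaining 19 readings
```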
Tags: r aggregate average outliers