Savitzky-Golay filtering for large data set: answers

【Question title】: Savitzky-Golay filtering for large data set
【Posted】: 2017-01-29 21:22:09
【Question description】:

I want to apply a Savitzky-Golay filter (from the prospectr package) to a set of samples collected for different areas of interest. Here is a sample of the data.

 > head(file,10)
   subject eye sample_num area sample_value
         1   L          1    1    -7.813280
         1   L          2    1    -7.816787
         1   L          3    1    -7.826342
         1   L          4    1    -7.799060
         1   L          5    1    -7.817019
         1   L          6    1    -7.845589
         1   L          7    1    -7.881824
         1   L          8    1    -7.969951
         1   L          9    1    -8.022991
         1   L         10    1    -8.118056


> dput(head(file))
 structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L), eye = structure(c(1L, 
 1L, 1L, 1L, 1L, 1L), .Label = c("L", "R"), class = "factor"), 
     sample_num = 1:6, area = c(1L, 1L, 1L, 1L, 1L, 1L), sample_value = c(-7.81328047761194, 
-7.81678696801706, -7.82634248187633, -7.79906019616205, 
-7.81701949680171, -7.84558887846482)), .Names = c("subject", 
 "eye", "sample_num", "area", "sample_value"), row.names = c(NA, 
 6L), class = "data.frame")

The values in sample_value are eye positions recorded for the left and right eye, sampled once per millisecond.

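What I want to do is apply the filter to the sample data for each area. I tried using ddply from the plyr package to split the file into subsets by subject, eye, and area and apply the filter to each subset (I want to keep the original sample values and store the filtered values in a new column). The code is below.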

newfile <- ddply(file, .(file$subject, file$eye, file$area), 
           function(x){
               x$sg_filtered <- savitzkyGolay(x$sample_value, 1,1,3)
               return(x)})

However, I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "sg", value = c(-0.00653100213219515,  : 
  replacement has 1838 rows, data has 1840

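Presumably this is because the filtered column has no corresponding values for the first and last sample_value in each area. Is there a way to adjust the code so that I get NA in those positions and both columns keep the same length? I would really appreciate any help with this. Thanks!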

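【Comments on the question】:

Could you provide a reproducible sample using dput(head(file))? This is from the prospectr package, right?
Hi, thanks for your comment. I edited the question as you suggested and used dput(head(file)).

Tags: r filtering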

【Solution 1】:

If you want to pad the returned vector with NAs, you can use c():

set.seed(123)
x <- rnorm(100)
w <- 3 # must be odd number
out <- c(rep(NA, (w-1)/2), savitzkyGolay(x, 1, 1, w = w), rep(NA, (w-1)/2))
length(out)
# [1] 100
head(out)
# [1]         NA  1.0595920  0.1503429 -0.7147103  0.8222783  0.1658142
tail(out)
# [1]  0.01382324  0.41334027  1.06643511 -1.21151668 -1.27951576          NA
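
To connect this back to the grouped data in the question, the same padding can be wrapped in a small helper and called inside ddply. The following is only a sketch under a few assumptions: sg_padded is a hypothetical helper name, the toy data frame stands in for the real file, rows are assumed to be already ordered by sample_num within each group, and groups shorter than the window are simply filled with NA. The grouping variables are written as bare column names, .(subject, eye, area), rather than file$subject and so on.

```r
library(plyr)
library(prospectr)

# Toy stand-in for the real data (assumption: same column names as in the question).
set.seed(1)
file <- data.frame(
  subject      = 1L,
  eye          = rep(c("L", "R"), each = 20),
  sample_num   = rep(1:10, times = 4),
  area         = rep(rep(1:2, each = 10), times = 2),
  sample_value = rnorm(40)
)

# Hypothetical helper: run savitzkyGolay() and pad both ends with NA so the
# result has the same length as the input vector.
# m, p, w are passed through unchanged; the defaults mirror the question (1, 1, 3).
sg_padded <- function(v, m = 1, p = 1, w = 3) {
  half <- (w - 1) / 2
  # Assumption: a group shorter than the window cannot be filtered, so return all NA.
  if (length(v) < w) return(rep(NA_real_, length(v)))
  c(rep(NA_real_, half),
    as.numeric(savitzkyGolay(v, m = m, p = p, w = w)),
    rep(NA_real_, half))
}

# Split by subject, eye and area; keep the original values and add the padded column.
newfile <- ddply(file, .(subject, eye, area), function(x) {
  x$sg_filtered <- sg_padded(x$sample_value)
  x
})
```

With w = 3 each group loses one value at each end, so every subject/eye/area block should start and end with a single NA in sg_filtered while the row count stays unchanged.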

【Comments】: