In R, assigning the previous non-NA value

【Question title】: In r, assigning the previous non na value
【Posted】: 2014-09-02 18:37:21
【Question】:

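I need to fill NA values forward, that is, replace each NA with the most recent non-NA value before it. Here is an example, but the last line does not fill forward. I get an error saying that the number of items to replace differs from the number of replacement values. What am I doing wrong?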

# Test time accumulation and assignment
foo_df <- NULL
nTimes = 10000
nEvents = 70
nUnits = 300
usageTimes = seq(0.5, 3, .5)
events = c("File Event", paste("Event ",seq(1,nEvents)))
randDates <- function(N, st="2014/01/01", et="2014/07/31") {
     st <- as.POSIXct(as.Date(st))
     et <- as.POSIXct(as.Date(et))
     dt <- as.numeric(difftime(et,st,unit="sec"))
     ev <- sort(runif(N, 0, dt))
     rt <- st + ev
}
probEvent = rep(1, length(events))
probEvent[1] = 5
# Generate fake data with events, units, and event times
foo_df = data.frame(eventName = sample(events, nTimes, replace=T, probEvent),
   unit = sample(seq(1,nUnits),nTimes,replace=T),
   event_time= randDates(nTimes),
   usageTime = NA, cumTime=NA)
# Order by time, and set the first nUnits events to File Event for each unit
foo_df = foo_df[with(foo_df, order(event_time)), ]
foo_df[1:nUnits ]$eventName = "File Event"
foo_df[1:nUnits ]$unit = seq(1,nUnits)
# Add random usage times to File Events
nFile = length(foo_df$eventName[foo_df$eventName == "File Event"])
foo_df$usageTime[foo_df$eventName == "File Event"] = sample(usageTimes, nFile, replace=T)
# Order by unit / event time
foo_df = foo_df[with(foo_df, order(unit,event_time)), ]

# accumulate the event time for file events
entire_file_rows = foo_df$eventName=="File Event"
temp_df = data.frame(cum_ft=0, event_time=foo_df$event_time[entire_file_rows],
      unit=foo_df$unit[entire_file_rows], usageTime=foo_df$usageTime[entire_file_rows])
temp_df$cumTime <- ave(temp_df$usageTime, temp_df$unit, FUN=cumsum) 
foo_df$cumTime[entire_file_rows] = temp_df$cumTime

# This is where I'm stuck
# Want to assign the cumulative time to the other events (non File Event)

library(zoo) ; 
# foo_df[foo_df$eventType != "File Event"]$"cumTime" <- NA 
foo_df$cumTime <- na.locf(foo_df$cumTime)

I get the error message: "$<-.data.frame(*tmp*, "cumTime", value = c(2.5, 2.5, 4, 4, : replacement has 9993 rows, data has 10000"

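I can see two problems. First, the NAs come first, so na.locf has nothing to carry forward into them. Second, the LOCF should be grouped by unit.

But why do the NAs come first? The data is sorted by event_time, and then the first nUnits records are assigned unit numbers 1 to nUnits and the eventName "File Event". After the later sort by unit and event time, how can any record have a time earlier than that unit's "File Event" record?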

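This process should accumulate usage time into cumTime per unit, with the records sorted by unit and then by event_time. Before carrying cumTime over from the "File Event" rows to the other events, I plotted cumTime against event_time, grouped by unit and event type, and the plot looked fine: cumTime was increasing. After carrying cumTime over to the other events, however, the same plot is wrong: cumTime has spikes and decreasing values, which is impossible.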
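As a complement to plotting, the "cumTime must never decrease within a unit" expectation can be checked directly. The following is only a minimal sketch on made-up data: the `toy` data frame and the `non_decreasing` name are hypothetical, and the column names simply mirror those in the question.

```r
# Illustrative data: unit 2 has a decreasing cumTime, which should be flagged
toy <- data.frame(
  unit       = c(1, 1, 1, 2, 2, 2),
  event_time = c(1, 2, 3, 1, 2, 3),
  cumTime    = c(0.5, 0.5, 2.0, 1.0, 3.0, 2.5)
)

# Sort by unit, then event time, before checking
toy <- toy[order(toy$unit, toy$event_time), ]

# TRUE for a unit when cumTime never decreases (NAs are ignored)
non_decreasing <- tapply(toy$cumTime, toy$unit,
                         function(v) all(diff(v[!is.na(v)]) >= 0))
non_decreasing
```

Units reported as FALSE are the ones worth inspecting row by row.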

【Comments】:

Tags: r zoo na

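【Solution 1】:

The problem is the leading NA values. By default, na.locf drops them, which leaves a vector on the right-hand side of the assignment that is shorter than the column.

You can do the assignment while keeping the leading NAs: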

foo_df$cumTime <- na.locf(foo_df$cumTime, na.rm=FALSE)

This overwrites every NA value except the leading ones.

You can then assign some other value to the leading NAs:

foo_df$cumTime[is.na(foo_df$cumTime)] <- 0
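To make the length mismatch concrete, here is a small sketch on a toy vector (the vector `x` and the names `dropped` and `kept` are illustrative, not taken from the question). It assumes the zoo package is installed.

```r
library(zoo)

# Toy vector with two leading NAs (illustrative data, not from the question)
x <- c(NA, NA, 2.5, NA, 4, NA)

dropped <- na.locf(x)                 # default na.rm = TRUE drops the leading NAs
kept    <- na.locf(x, na.rm = FALSE)  # same length as x, leading NAs preserved

length(dropped)  # 4: too short to assign back into a 6-element column
length(kept)     # 6

# Replace the remaining leading NAs with a chosen default, here 0
kept[is.na(kept)] <- 0
kept             # 0.0 0.0 2.5 2.5 4.0 4.0
```

Whether 0 is the right default depends on what a missing cumulative time means in your data.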

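【Comments】:

Thanks for the reply. Once I group by unit, which I overlooked, it would make sense to run na.locf in reverse if there are leading NAs. How do I group by unit?
Or try na.fill(na.locf(...whatever...), 0)
Shoot, I didn't ask the question well enough. Could you read the expanded question and then try revising your answer? In the "real" data, cumulative time decreases as event time advances, which is impossible. I think the error is in the ordering by unit / event time, and it may be the same error that causes the leading NAs in this example (which should not happen).
@Matthew Lundberg Since the data is sorted by eventTime, and the first nUnits records are assigned the units and "File Event", how can a unit have records with event times before its "File Event" once the data is sorted by unit?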
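On the "how do I group by unit" question raised in the comments, one possible approach (a sketch, not something proposed in this thread) is to wrap na.locf in base R's ave(), which applies a function within each group and returns the results in the original row order. The `toy` data frame and the `filled` column are hypothetical names for illustration.

```r
library(zoo)

# Toy data already ordered by unit, then event time (illustrative values)
toy <- data.frame(
  unit    = c(1, 1, 1, 2, 2, 2),
  cumTime = c(1.5, NA, 3.0, NA, 2.0, NA)
)

# ave() applies the function within each unit and returns a vector in the
# original row order; na.rm = FALSE keeps each group's length unchanged
toy$filled <- ave(toy$cumTime, toy$unit,
                  FUN = function(v) na.locf(v, na.rm = FALSE))

toy$filled  # 1.5 1.5 3.0 NA 2.0 2.0
```

Note that the leading NA of unit 2 stays NA instead of inheriting 3.0 from unit 1, which is the point of grouping. If filling backward is wanted for such rows, na.locf also accepts `fromLast = TRUE`.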

【Solution 2】:

The error is in the lines that assign "File Event" and the unit to the first nUnits records. The correct lines are

foo_df$eventName[1:nUnits ] = "File Event"
foo_df$unit[1:nUnits ] = seq(1,nUnits)

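With that fix, each unit's first row has a usage time, so there are no leading NAs, and the command na.locf(foo_df$cumTime) returns the correct number of records.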
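The difference between the original and corrected lines comes down to how a data frame is indexed. A short sketch on a made-up data frame (`df` and `res` are illustrative names) shows that a single index selects columns, while indexing the column vector selects rows:

```r
# Small illustrative data frame (values are made up for this sketch)
df <- data.frame(eventName = c("a", "b", "c", "d"),
                 unit      = c(9, 9, 9, 9),
                 stringsAsFactors = FALSE)

# A single index on a data frame selects COLUMNS, not rows
names(df[1:2])        # "eventName" "unit"

# Asking for more columns than exist raises an error; capture its message
res <- tryCatch(df[1:4], error = function(e) conditionMessage(e))
res

# Indexing the column vector selects ROWS, as in the corrected lines
df$eventName[1:2] <- "File Event"
df$unit[1:2]      <- seq(1, 2)
df
```

After the last two assignments, only the first two rows change; rows 3 and 4 keep their original values.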

【Comments】: