[Posted]: 2014-09-02 18:37:21
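[Question]:
I need to carry NA values forward, i.e. replace each NA with the most recent preceding non-NA value. Below is an example, but the last line fails to fill forward: I get an error saying that the number of values to replace differs from the number of replacement values. What am I doing wrong?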
# Test time accumulation and assignment
foo_df <- NULL
nTimes = 10000
nEvents = 70
nUnits = 300
usageTimes = seq(0.5, 3, .5)
events = c("File Event", paste("Event ",seq(1,nEvents)))
randDates <- function(N, st="2014/01/01", et="2014/07/31") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  dt <- as.numeric(difftime(et, st, units="secs"))
  ev <- sort(runif(N, 0, dt))
  st + ev  # return N sorted random timestamps between st and et
}
probEvent = rep(1, length(events))
probEvent[1] = 5
# Generate fake data with events, units, and event times
foo_df = data.frame(eventName = sample(events, nTimes, replace=T, probEvent),
                    unit = sample(seq(1,nUnits),nTimes,replace=T),
                    event_time= randDates(nTimes),
                    usageTime = NA, cumTime=NA)
# Order by time, and set the first nUnits events to File Event for each unit
foo_df = foo_df[with(foo_df, order(event_time)), ]
foo_df[1:nUnits, ]$eventName = "File Event"
foo_df[1:nUnits, ]$unit = seq(1,nUnits)
# Add random usage times to File Events
nFile = length(foo_df$eventName[foo_df$eventName == "File Event"])
foo_df$usageTime[foo_df$eventName == "File Event"] = sample(usageTimes, nFile, replace=T)
# Order by unit / event time
foo_df = foo_df[with(foo_df, order(unit,event_time)), ]
# accumulate the event time for file events
entire_file_rows = foo_df$eventName=="File Event"
temp_df = data.frame(cum_ft=0, event_time=foo_df$event_time[entire_file_rows],
                     unit=foo_df$unit[entire_file_rows], usageTime=foo_df$usageTime[entire_file_rows])
temp_df$cumTime <- ave(temp_df$usageTime, temp_df$unit, FUN=cumsum)
foo_df$cumTime[entire_file_rows] = temp_df$cumTime
# This is where I'm stuck
# Want to assign the cumulative time to the other events (non File Event)
library(zoo)
# foo_df[foo_df$eventType != "File Event"]$"cumTime" <- NA
foo_df$cumTime <- na.locf(foo_df$cumTime)
I get the error message: "$<-.data.frame(*tmp*, "cumTime", value = c(2.5, 2.5, 4, 4, :
replacement has 9993 rows, data has 10000"
I can see two problems: first, the NAs come first, so na.locf has no earlier value to carry into them; second, the locf should be grouped by unit.
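The two problems can be reproduced on a tiny example. This is only a sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the data above): by default `na.locf` drops leading NAs, which shortens the result, while `na.rm = FALSE` keeps the length and `ave()` restricts the fill to each unit.

```r
library(zoo)

# Hypothetical toy data: both units start with an NA
toy <- data.frame(unit    = c(1, 1, 1, 2, 2, 2),
                  cumTime = c(NA, 1.5, NA, NA, 2.0, NA))

# Default na.rm = TRUE drops the leading NA, so the result is
# shorter than the column it is meant to replace
short <- na.locf(toy$cumTime)
length(short)

# na.rm = FALSE keeps the original length; ave() applies the fill
# separately within each unit, so values never cross unit boundaries
toy$filled <- ave(toy$cumTime, toy$unit,
                  FUN = function(x) na.locf(x, na.rm = FALSE))
toy$filled
```

With this approach, rows that precede a unit's first non-NA value stay NA instead of inheriting a value from the previous unit.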
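But why do the NAs come first? The data are sorted by event_time, and then the first nUnits records are assigned the unit numbers 1 to nUnits and the eventName "File Event". After the later sort by unit and event time, how can there be records with times earlier than a unit's "File Event" record?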
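One way to test that assumption directly is to list the units whose earliest record is not a "File Event". The sketch below uses a hypothetical `toy` data frame with the same column names as `foo_df`; it does not explain why such rows exist, it only locates them.

```r
# Hypothetical toy data: unit 2's earliest row is not a "File Event"
toy <- data.frame(
  unit       = c(1, 1, 2, 2),
  event_time = as.POSIXct(c("2014-01-01 00:00:00", "2014-01-02 00:00:00",
                            "2014-01-01 12:00:00", "2014-01-03 00:00:00"),
                          tz = "UTC"),
  eventName  = c("File Event", "Event 1", "Event 2", "File Event"),
  stringsAsFactors = FALSE
)

# Sort by unit and time, then keep the first row of each unit
toy <- toy[order(toy$unit, toy$event_time), ]
first_rows <- toy[!duplicated(toy$unit), ]

# Units whose earliest record is something other than a "File Event"
bad_units <- first_rows$unit[first_rows$eventName != "File Event"]
bad_units
```

Running the same three steps on `foo_df` would show whether any unit really starts with a non-"File Event" row.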
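The process should accumulate usage time into cumTime per unit, with the records sorted by unit and then by event_time. Before carrying cumTime over from the "File Event" records to the other events, I plotted cumTime against event_time, grouped by unit and event type; the plot looked right, with cumTime increasing. After carrying cumTime over to the other events, however, the same plot (cumTime grouped by unit/event type against event_time) is wrong: cumTime shows spikes and decreasing values, which should be impossible.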
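The decreases seen in the plot can also be located numerically. The following is a minimal sketch on hypothetical values, assuming the rows are already sorted by unit and event time: it flags every row whose cumTime is lower than the previous row of the same unit. In the made-up data, the first row of unit 2 carries the last value of unit 1, which is what an ungrouped forward fill would produce.

```r
# Hypothetical values: unit 2's first row holds 3.5, carried over from unit 1
toy <- data.frame(unit    = c(1, 1, 1, 2, 2, 2),
                  cumTime = c(1.0, 3.5, 3.5, 3.5, 0.5, 2.0))

# Within each unit, compute the change from the previous row
# (0 for the unit's first row) and flag negative changes
decreases <- ave(toy$cumTime, toy$unit,
                 FUN = function(x) c(0, diff(x))) < 0

# Rows where cumTime went down within a unit
toy[decreases, ]
```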
[Discussion]: