[Question title]: Calculating difference between subsequent days without a for loop
[Posted]: 2017-02-22 11:59:34
[Question]:

I have the following data frame:

time <- c("2004-01-01 01:30:00","2004-01-01 04:30:00","2004-01-01 07:30:00",
          "2004-01-01 10:30:00","2004-01-01 13:30:00","2004-01-01 16:30:00",
          "2004-01-01 19:30:00","2004-01-01 22:30:00","2004-01-02 01:30:00",
          "2004-01-02 04:30:00","2004-01-02 07:30:00","2004-01-02 10:30:00",
          "2004-01-02 13:30:00","2004-01-02 16:30:00","2004-01-02 19:30:00",
          "2004-01-02 22:30:00","2004-01-03 01:30:00","2004-01-03 04:30:00",
          "2004-01-03 07:30:00","2004-01-03 10:30:00")
d <- c(0.00, 0.00,152808.30, 739872.84, 82641.22, 83031.04, 83031.04, 82641.22, 0.00, 
       0.00, 267024.71,1247414.7, 151638.85, 151249.03, 151249.03, 152028.67, 0.00, 0.00, 
       296650.81,1355783.85)
dat <- data.frame(time = time, dat = d)

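It shows accumulated solar radiation (accumulated within each day) from a forecast model over 3 days.

To convert the solar radiation from J/m2 to W/m2, I need to calculate the difference between consecutive forecast times within each day and then divide by 10800 (the forecast time step in seconds, i.e. 3 hours). Here is my attempt: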

itime <- as.numeric(as.Date(dat$time))
utime <- unique(itime)
l <- list()
for(i in 1:length(utime)){
  idx <- itime == utime[i]
  dat2 <- dat[idx,]
  dat3 <- dat2[1,2]/10800
  for(ii in 2:nrow(dat2)){
    dat3[ii] <- (abs(dat2[ii,2] - dat2[ii-1,2]))/10800
  }
  df <- data.frame(dateTime = dat2$time,
                   dd = dat3)
  l[[i]] <- df
}
df1 <- do.call(rbind.data.frame, l)
df1[,1] <- as.POSIXct(df1[,1])

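This works as expected. However, the real data I intend to use this code on spans more than 100 days, so running a loop is not optimal.

Is there an alternative to the loop?

I tried: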

dat2 <- c(dat[1,2]/10800,rev(abs(diff(rev(dat[,2])))/10800))
df2 <- data.frame(time = as.POSIXct(dat[,1]), dd = dat2)

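It gives almost the same answer as the loop, but it also computes differences between time steps that fall on different dates, instead of isolating the calculation to each individual date.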

plot(df1, type = 'l')
lines(df2, col = 'red')

As you can see, there is a mismatch in the early morning hours.
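
The mismatch is confined to rows whose date differs from the date of the previous row, because only there does a plain diff() subtract a value from the previous day. The following is a minimal sketch on four made-up timestamps (the vector `time` and the name `crosses` are illustrative, not part of the code above) that flags those rows:

```r
# Four illustrative timestamps spanning a day boundary
time <- c("2004-01-01 19:30:00", "2004-01-01 22:30:00",
          "2004-01-02 01:30:00", "2004-01-02 04:30:00")

# Day number of each row
day <- as.numeric(as.Date(time))

# TRUE where the previous row belongs to a different day
crosses <- c(FALSE, diff(day) != 0)

# Rows for which a plain diff() would mix two days
affected <- time[crosses]
```

Here `affected` contains only the 01:30 step of the second day, which is where the two curves in the plot separate.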

Can anyone suggest another approach?

[Comments]:

    Tags: r


    [Solution 1]:

    For your list l, you can get the same result with:

    dat <- data.frame(
    time = c("2004-01-01 01:30:00","2004-01-01 04:30:00","2004-01-01 07:30:00",
              "2004-01-01 10:30:00","2004-01-01 13:30:00","2004-01-01 16:30:00",
              "2004-01-01 19:30:00","2004-01-01 22:30:00","2004-01-02 01:30:00",
              "2004-01-02 04:30:00","2004-01-02 07:30:00","2004-01-02 10:30:00",
              "2004-01-02 13:30:00","2004-01-02 16:30:00","2004-01-02 19:30:00",
              "2004-01-02 22:30:00","2004-01-03 01:30:00","2004-01-03 04:30:00",
              "2004-01-03 07:30:00","2004-01-03 10:30:00"),
    dat = c(0.00, 0.00,152808.30, 739872.84, 82641.22, 83031.04, 83031.04, 82641.22, 0.00, 
           0.00, 267024.71,1247414.7, 151638.85, 151249.03, 151249.03, 152028.67, 0.00, 0.00, 
           296650.81,1355783.85)
    )
    
    dat$itime <- as.numeric(as.Date(dat$time))
    utime <- unique(dat$itime)
    
    daydat <- function(u) { 
      dat2 <- dat[dat$itime==u,]
      data.frame(dateTime = dat2$time, dd = c(dat2$dat[1], abs(diff(dat2$dat)))/10800)
    }
    l <- lapply(utime, daydat)
    

    Here is a version with split():

    dat$itime <- as.numeric(as.Date(dat$time))
    
    daydat <- function(d) data.frame(dateTime = d$time, dd = c(d$dat[1], abs(diff(d$dat)))/10800)
    
    L <- split(dat, dat$itime)
    l <- lapply(L, daydat)
    

    Or without creating dat$itime:

    daydat <- function(d) data.frame(dateTime = d$time, dd = c(d$dat[1], abs(diff(d$dat)))/10800)
    l <- lapply(split(dat, as.Date(dat$time)), FUN=daydat)
    

    Or using by():

    l2 <- unclass(by(dat, as.Date(dat$time), FUN=daydat))
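
The variants above return a list with one data frame per day. If a single data frame like df1 in the question is wanted, the pieces can be bound back together. This is a sketch on a shortened, made-up data set (four rows over two days, values chosen only for illustration) so that it runs on its own; daydat is the function defined above.

```r
# Shortened example data (hypothetical values, same layout as dat)
dat <- data.frame(
  time = c("2004-01-01 01:30:00", "2004-01-01 04:30:00",
           "2004-01-02 01:30:00", "2004-01-02 04:30:00"),
  dat  = c(0, 21600, 10800, 0)
)

# Per-day differences, as in the answer
daydat <- function(d) data.frame(dateTime = d$time, dd = c(d$dat[1], abs(diff(d$dat)))/10800)
l <- lapply(split(dat, as.Date(dat$time)), FUN = daydat)

# Bind the per-day pieces into one data frame and convert the time column
df1 <- do.call(rbind, l)
rownames(df1) <- NULL
df1$dateTime <- as.POSIXct(df1$dateTime)
```

The rows come out in date order because split() sorts the groups by date.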
    

    If you want the result in the original data frame, you can use ave():

    dat$dd <- ave(dat$dat, as.Date(dat$time), FUN=function(x) c(x[1], abs(diff(x)))/10800)
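
A further possibility, not part of the original answer, is to skip the grouping altogether: take the difference over the whole column and then overwrite the first row of each day with its own value. The sketch below assumes the rows are sorted by time; the data are made up and the names `first` and `dd` are illustrative.

```r
# Made-up example data spanning a day boundary
dat <- data.frame(
  time = c("2004-01-01 19:30:00", "2004-01-01 22:30:00",
           "2004-01-02 01:30:00", "2004-01-02 04:30:00"),
  dat  = c(21600, 10800, 0, 16200)
)

day <- as.numeric(as.Date(dat$time))

# TRUE for the first row of each day (assumes rows are sorted by time)
first <- c(TRUE, diff(day) != 0)

# Difference over the whole column, then reset the first row of each day
dd <- c(dat$dat[1], abs(diff(dat$dat)))
dd[first] <- dat$dat[first]
dat$dd <- dd / 10800

# Cross-check against the ave() version
check <- ave(dat$dat, as.Date(dat$time),
             FUN = function(x) c(x[1], abs(diff(x)))/10800)
```

Both `dat$dd` and `check` should hold the same values; the difference is only that this version calls no function per day.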
    

    [Comments]:

    • The first approach works, although abs needs to be added before diff in the function.
    [Solution 2]:

    You can use lag() together with group_by() from dplyr:

    library(dplyr)
    df <- dat %>%
        mutate(date = as.Date(time)) %>%
        group_by(date) %>%
        mutate(before.dat = lag(dat, order_by=date)) %>%
        mutate(diff = abs(dat - before.dat)/10800) %>%
        select(time, date, dat, before.dat, diff)
    df
    #Source: local data frame [20 x 5]
    #Groups: date [3]
    #                time        date        dat before.dat          diff
    #                <fctr>     <date>      <dbl>      <dbl>         <dbl>
    #1  2004-01-01 01:30:00 2004-01-01       0.00         NA            NA
    #2  2004-01-01 04:30:00 2004-01-01       0.00       0.00    0.00000000
    #3  2004-01-01 07:30:00 2004-01-01  152808.30       0.00   14.14891667
    #4  2004-01-01 10:30:00 2004-01-01  739872.84  152808.30   54.35782778
    #5  2004-01-01 13:30:00 2004-01-01   82641.22  739872.84   60.85477963
    #6  2004-01-01 16:30:00 2004-01-01   83031.04   82641.22    0.03609444
    #7  2004-01-01 19:30:00 2004-01-01   83031.04   83031.04    0.00000000
    #8  2004-01-01 22:30:00 2004-01-01   82641.22   83031.04    0.03609444
    #9  2004-01-02 01:30:00 2004-01-02       0.00         NA            NA
    #10 2004-01-02 04:30:00 2004-01-02       0.00       0.00    0.00000000
    #11 2004-01-02 07:30:00 2004-01-02  267024.71       0.00   24.72451019
    #12 2004-01-02 10:30:00 2004-01-02 1247414.70  267024.71   90.77685093
    #13 2004-01-02 13:30:00 2004-01-02  151638.85 1247414.70  101.46072685
    #14 2004-01-02 16:30:00 2004-01-02  151249.03  151638.85    0.03609444
    #15 2004-01-02 19:30:00 2004-01-02  151249.03  151249.03    0.00000000
    #16 2004-01-02 22:30:00 2004-01-02  152028.67  151249.03    0.07218889
    #17 2004-01-03 01:30:00 2004-01-03       0.00         NA            NA
    #18 2004-01-03 04:30:00 2004-01-03       0.00       0.00    0.00000000
    #19 2004-01-03 07:30:00 2004-01-03  296650.81       0.00   27.46766759
    #20 2004-01-03 10:30:00 2004-01-03 1355783.85  296650.81   98.06787407
    

    Simplified code based on GGamba's comment:

    dat %>%
        mutate(time = as.Date(time)) %>%
        group_by(time) %>%
        mutate(diff = abs(dat - lag(dat)) / 10800)  # abs() as in the longer version above
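
Note that lag() leaves the first row of each day as NA, whereas the loop in the question uses the first value of the day divided by 10800 for that row. A hedged sketch of one way to reproduce the loop's behaviour is to fall back to the value itself with coalesce(). The data below are made up, and the column names `day` and `dd` are illustrative; a separate `day` column is used so that the original time column is kept.

```r
library(dplyr)

# Made-up example data spanning a day boundary
dat <- data.frame(
  time = c("2004-01-01 19:30:00", "2004-01-01 22:30:00",
           "2004-01-02 01:30:00", "2004-01-02 04:30:00"),
  dat  = c(21600, 10800, 5400, 16200),
  stringsAsFactors = FALSE
)

res <- dat %>%
  mutate(day = as.Date(time)) %>%   # keep the original time column
  group_by(day) %>%
  # first row of each day: lag() is NA, so fall back to the value itself
  mutate(dd = coalesce(abs(dat - lag(dat)), dat) / 10800) %>%
  ungroup()
```

With this fallback, `res$dd` has no NA values and the first row of each day equals that row's value divided by 10800.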
    

    [Comments]:

    • df %>% mutate(time = as.Date(time)) %>% group_by(time) %>% mutate(dd = (dat-lag(dat)) / 10800) makes it simpler
    • @GGamba Yes, I agree. But I kept the extra steps and columns to show what happens at each step.