【Question title】: Fill in values based on previous day in R
【Posted】: 2021-12-13 19:24:10
【Question】:

I have a dataset that looks like this:

Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0

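I want to add rows for 10-31 and 11-01 that contain the values of the previous trading day (10-30). How can this be done easily in R? I have a feeling that library(tidyr) would fit the bill here?

The expected output is: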

Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-10-31,1283,1283,1283,1283,1283,0
2020-11-01,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-07,1196,1196,1196,1196,1196,0
2020-11-08,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0

dput output, as requested:

structure(list(Date = c("2020-10-28", "2020-10-29", "2020-10-30", 
"2020-11-02", "2020-11-03", "2020-11-04", "2020-11-05", "2020-11-06", 
"2020-11-09", "2020-11-10"), Open = c(1384L, 1297L, 1283L, 1284L, 
1263L, 1224L, 1194L, 1196L, 1207L, 1200L), High = c(1384L, 1297L, 
1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L), Low = c(1384L, 
1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
), Close = c(1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 
1196L, 1207L, 1200L), Adjusted_close = c(1384L, 1297L, 1283L, 
1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L), Volume = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 10L), class = "data.frame")

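【Comments】:

Please provide a reproducible example of your dataset by copying the output of dput(head(my_dataset, 10)). Please also provide an example of the desired output.
Two days are missing every weekend, not counting weekdays on which there was no trading. I filled in both Saturday and Sunday with values equal to Friday's.
Got it! Sorry, I initially misread your dput() example input as an illustration of the output, so my comment was mistaken and I have deleted it to avoid confusion. In any case, my solution is below!
Did my solution work for you?

Tags: r dataframe dataset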

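【Solution 1】:

1) Use read.zoo to convert to a zoo class series z (this also converts Date to Date class), then merge it with a zero-width zoo object that has all the dates between the start and end of z. Use na.locf to fill in the missing values and finally use fortify.zoo to convert back to a data frame. If a zoo object is acceptable as the result, omit the fortify.zoo part.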

library(zoo)

z <- read.zoo(dat)
out1 <- merge(z, zoo(, seq(start(z), end(z), "day"))) |> 
  na.locf() |>
  fortify.zoo(name = "Date")

# check - target is defined in Note at the end
identical(out1, transform(target, Date = as.Date(Date)))
## [1] TRUE
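
To see what the merge and na.locf steps do individually, here is a minimal sketch on a toy two-point series. The names z_small, all_days, padded and filled, and the values 10 and 20, are made up for illustration and are not part of the answer above.

```r
library(zoo)

# Toy series (hypothetical values): a Friday and the following Monday only.
z_small <- zoo(c(10L, 20L), as.Date(c("2020-10-30", "2020-11-02")))

# Zero-width zoo object: it has no data, only the full daily index.
all_days <- zoo(, seq(start(z_small), end(z_small), by = "day"))

# Merging adds the two missing weekend dates, with NA as their value.
padded <- merge(z_small, all_days)

# na.locf ("last observation carried forward") replaces each NA with the
# most recent non-NA value before it.
filled <- na.locf(padded)
print(filled)
```

The weekend dates should end up carrying Friday's value, which is the behaviour the question asks for.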

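2) In this alternative we use the pipeline below. Rather than using merge.zoo as above, we convert to ts class and back in order to expand the dates.

Convert dat to zoo class, which also converts the index to Date class.
Then convert that to ts class. Since that class only supports regularly spaced series, the conversion fills in the values corresponding to the missing dates with NA.
na.locf then fills in those NAs.
Use fortify.zoo to convert the result back to a data frame.
Since the ts class does not support a Date index, the Date column holds plain numbers at this point, so convert it back to Date class.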

library(zoo)

out2 <- dat |> 
  read.zoo() |>
  as.ts() |>
  na.locf() |>
  fortify.zoo(name = "Date") |>
  transform(Date = as.Date(Date))

# check - target is defined in Note at the end    
identical(out2, transform(target, Date = as.Date(Date)))
## [1] TRUE
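
The NA padding done by the ts conversion can be checked in isolation. The following is a small sketch on a made-up three-point series (z_small, ts_small, n_missing and start_day are illustrative names, not from the answer); it assumes, as the answer does, that zoo treats a daily series with weekend gaps as having an underlying frequency of 1.

```r
library(zoo)

# Toy daily series (hypothetical values) with a weekend gap after 2020-10-30.
z_small <- zoo(c(10, 11, 12),
               as.Date(c("2020-10-29", "2020-10-30", "2020-11-02")))

# as.ts() places the series on a regular grid, so the skipped days become NA.
ts_small <- as.ts(z_small)
print(ts_small)

# Count the padded values.
n_missing <- sum(is.na(ts_small))

# The ts time index is numeric (days since 1970-01-01), not Date,
# so it has to be converted back explicitly.
start_day <- as.Date(as.numeric(time(ts_small))[1], origin = "1970-01-01")
```

This is why the pipeline above ends with transform(Date = as.Date(Date)).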

Note

The input dat and the output target are assumed to be the following, in reproducible form:

Lines <- "Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0"
dat <- read.csv(text = Lines, strip.white = TRUE)

Lines2 <- "Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-10-31,1283,1283,1283,1283,1283,0
2020-11-01,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-07,1196,1196,1196,1196,1196,0
2020-11-08,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0"
target <- read.csv(text = Lines2, strip.white = TRUE)

【Comments】:

【Solution 2】:

Solution

Here is a solution within the tidyverse, which leverages the tidyr::fill() function to fill in values from the preceding rows:

library(tidyverse)


# ...
# Code to generate 'my_data'.
# ...


my_data %>%
  # Ensure 'Date' column is proper datatype.
  mutate(Date = as.Date(Date)) %>%
  # Link to full range of dates, with blank rows for missing dates.
  right_join(
    # A temporary dataset with the full range of 'Date's.
    tibble(Date = seq(from = min(.$Date), to = max(.$Date), by = "days")),
    by = "Date"
  ) %>%
  # Sort for filling: earlier above later.
  arrange(Date) %>%
  # Fill blank rows with values above.
  fill(everything(), .direction = "down")

Result

Given my_data like the data.frame reproduced here:

my_data <- structure(
  list(
    Date = c(
      "2020-10-28", "2020-10-29", "2020-10-30", "2020-11-02", "2020-11-03",
      "2020-11-04", "2020-11-05", "2020-11-06", "2020-11-09", "2020-11-10"
    ),
    Open = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    High = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Low = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Close = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Adjusted_close = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Volume = c(
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    )
  ),
  row.names = c(NA, 10L),
  class = "data.frame"
)

this solution should produce a data.frame like this:

         Date Open High  Low Close Adjusted_close Volume
1  2020-10-28 1384 1384 1384  1384           1384      0
2  2020-10-29 1297 1297 1297  1297           1297      0
3  2020-10-30 1283 1283 1283  1283           1283      0
4  2020-10-31 1283 1283 1283  1283           1283      0
5  2020-11-01 1283 1283 1283  1283           1283      0
6  2020-11-02 1284 1284 1284  1284           1284      0
7  2020-11-03 1263 1263 1263  1263           1263      0
8  2020-11-04 1224 1224 1224  1224           1224      0
9  2020-11-05 1194 1194 1194  1194           1194      0
10 2020-11-06 1196 1196 1196  1196           1196      0
11 2020-11-07 1196 1196 1196  1196           1196      0
12 2020-11-08 1196 1196 1196  1196           1196      0
13 2020-11-09 1207 1207 1207  1207           1207      0
14 2020-11-10 1200 1200 1200  1200           1200      0
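
The join-then-fill mechanics can be followed on a two-row toy input. This is only a sketch: toy, all_dates, joined and filled are illustrative names, and the single Close column stands in for the full set of price columns.

```r
library(dplyr)
library(tidyr)

# Toy input: a Friday and the following Monday.
toy <- data.frame(
  Date  = as.Date(c("2020-10-30", "2020-11-02")),
  Close = c(1283L, 1284L)
)

# Every calendar day between the first and last observed date.
all_dates <- data.frame(Date = seq(min(toy$Date), max(toy$Date), by = "day"))

# After the join, the weekend dates exist but their Close is NA.
joined <- toy %>%
  right_join(all_dates, by = "Date") %>%
  arrange(Date)

# fill() copies the last non-missing value downward into the NA rows.
filled <- joined %>%
  fill(everything(), .direction = "down")
print(filled)
```

Sorting before fill() matters, because fill() works on row order rather than on the dates themselves.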

【Comments】:

【Solution 3】:

First, the Date column must be of class Date:

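library(dplyr)
library(tidyr)

# df is the data frame from the question.
df$Date <- as.Date(df$Date)

df %>% 
  # Join against the full daily sequence so missing dates appear as NA rows.
  full_join(data.frame(Date = seq(min(df$Date), max(df$Date), by = "days")), by = "Date") %>% 
  arrange(Date) %>% 
  fill(everything())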

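We then join with a data frame that contains only the full sequence of dates spanned by the data, sort the result, and use the fill function to fill in the new rows.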
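
As a possible variation that none of the answers uses, tidyr::complete() can stand in for the explicit join, since it expands a column to a supplied set of values and adds NA rows for the missing combinations. The sketch below runs on a made-up three-row excerpt; df_small and out are illustrative names.

```r
library(dplyr)
library(tidyr)

# Three rows taken from the question's data (Date and Close only).
df_small <- data.frame(
  Date  = as.Date(c("2020-11-05", "2020-11-06", "2020-11-09")),
  Close = c(1194L, 1196L, 1207L)
)

out <- df_small %>%
  # Add one row per calendar day in the observed range; new rows are NA.
  complete(Date = seq(min(Date), max(Date), by = "day")) %>%
  # Make sure earlier dates sit above later ones before filling.
  arrange(Date) %>%
  # Carry the last observed values forward into the added rows.
  fill(everything(), .direction = "down")
print(out)
```

Whether this reads better than the join is a matter of taste; both approaches rely on the same fill() step.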

【Comments】: