Question title: Retrieving value based on matched dates
Posted: 2021-09-17 20:34:45
Question:

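I have two data frames. The first contains events with their corresponding start and end times. The second contains per-minute prices for different IDs. See below: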

Event                       starttime             endtime
Change in Nonfarm Payrolls  2020-03-06 08:15:00   2020-03-06 09:00:00
Change in Nonfarm Payrolls  2020-02-07 08:15:00   2020-02-07 09:00:00
Change in Nonfarm Payrolls  2020-01-10 08:15:00   2020-01-10 09:00:00
Change in Nonfarm Payrolls  2020-01-10 08:15:00   2020-01-10 09:00:00

Price    date_time             ID
24813    2020-03-06 08:14:00   DJ
24763    2020-03-06 08:15:00   DJ
24750    2020-03-06 08:16:00   DJ
24725    2020-03-06 08:17:00   DJ

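I want to take the price and ID from the second dataset (at both the start time and the end time) and add them to the first dataset. I tried using ifelse like this, but it doesn't work: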

df1$startprice <- ifelse(df1$starttime == df2$date_time, df2$Price, "no")
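A likely reason the ifelse attempt fails: == compares two vectors position by position (recycling the shorter one) instead of searching one inside the other. With the question's data, df1$starttime == df2$date_time pairs 5 event times with 48 price times, so R warns that the lengths are not multiples of each other, ifelse returns 48 values (as character, because of the "no"), and assigning those to the 5-row df1 fails. The minimal base-R sketch below uses small hypothetical stand-ins for df1 and df2 (times in UTC) to contrast that pairwise comparison with a lookup via match().

```r
# Hypothetical stand-ins for df1 and df2 (times in UTC so the example is reproducible)
df1 <- data.frame(
  starttime = as.POSIXct(c("2020-03-06 08:15:00", "2020-02-07 08:15:00"), tz = "UTC")
)
df2 <- data.frame(
  Price = c(24813, 24763, 24750, 24725),
  date_time = as.POSIXct(c("2020-03-06 08:14:00", "2020-03-06 08:15:00",
                           "2020-03-06 08:16:00", "2020-03-06 08:17:00"), tz = "UTC")
)

# `==` pairs element i with element i (recycling df1$starttime to length 4),
# so 2020-03-06 08:15 is never compared with the df2 row that holds it
pairwise <- df1$starttime == df2$date_time
print(pairwise)

# match() looks each start time up anywhere in df2$date_time (NA if absent);
# comparing plain numeric seconds sidesteps any date-time class handling
idx <- match(as.numeric(df1$starttime), as.numeric(df2$date_time))
df1$startprice <- df2$Price[idx]
print(df1)
```

Because match() returns the first matching row, this shortcut only makes sense while df2 holds a single ID; with several IDs per minute, a join on both ID and time (as in the answers) is the safer choice.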

Can someone help me?

Data to reproduce (for the first event, including its start and end times):

df1 <- structure(list(Event = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("Change in Nonfarm Payrolls"), class = "factor"), 
                    starttime = structure(c(1583478900, 1581059700, 1578640500, 1578640500, 1581059700), class = c("POSIXct", "POSIXt"), tzone = ""), 
                    endtime = structure(c(1583481600, 1581062400, 1578643200, 1578643200, 1581062400), class = c("POSIXct","POSIXt"), tzone = "")), row.names = c(NA, 5L), class = "data.frame")
df2 <- structure(list(Price = c(24813, 24763, 24750, 24725,
                                24746, 24735, 24755, 24735, 24735, 24744, 24762, 24763, 24773,
                                24773, 24778, 24832, 24856, 24845, 24842, 24902, 24934, 24854,
                                24888, 24914, 24922, 24875, 24896, 24853, 24834, 24845, 24886,
                                24872, 24844, 24846, 24860, 24812, 24791, 24767, 24765, 24756,
                                24745, 24791, 24800, 24789, 24787, 24887, 24876, 24911),
                      date_time = structure(c(1583478840,
                          1583478900, 1583478960, 1583479020, 1583479080, 1583479140, 1583479200,
                          1583479260, 1583479320, 1583479380, 1583479440, 1583479500, 1583479560,
                          1583479620, 1583479680, 1583479740, 1583479800, 1583479860, 1583479920,
                          1583479980, 1583480040, 1583480100, 1583480160, 1583480220, 1583480280,
                          1583480340, 1583480400, 1583480460, 1583480520, 1583480580, 1583480640,
                          1583480700, 1583480760, 1583480820, 1583480880, 1583480940, 1583481000,
                          1583481060, 1583481120, 1583481180, 1583481240, 1583481300, 1583481360,
                          1583481420, 1583481480, 1583481540, 1583481600, 1583481660),
                          class = c("POSIXct", "POSIXt"), tzone = ""),
                      ID = rep("DJ", 48)),
                 row.names = 62835:62882, class = "data.frame")

Thanks in advance! Kind regards, Jürgen

Question comments:

    Tags: r date if-statement matching economics


    Solution 1:

    You can use merge twice: first on starttime, then on endtime.

    merge(df1, transform(df2, start_time_price = Price)[-1], 
          by.x = 'starttime', by.y = 'date_time') |>
      merge(transform(df2, end_time_price = Price)[-1], 
            by.x = c('ID', 'endtime'), by.y = c('ID', 'date_time'))
    

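    If you want to keep all rows of df1 in the final output, add all.x = TRUE to the merge calls. The native pipe operator (|>) was introduced in R 4.1; if you are using an older version of R, nest the calls instead: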

    merge(merge(df1, transform(df2, start_time_price = Price)[-1],
                by.x = 'starttime', by.y = 'date_time'),
          transform(df2, end_time_price = Price)[-1],
          by.x = c('ID', 'endtime'), by.y = c('ID', 'date_time'))
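To see what all.x = TRUE changes, here is a minimal sketch of the same two-step merge on hypothetical toy data (the names events and prices, and the UTC times, are assumptions for illustration). Event "B" has no prices at all: with all.x = TRUE it is kept with NA prices, whereas the default inner join would drop it.

```r
# Hypothetical toy data: event "A" has prices at both times, event "B" has none
events <- data.frame(
  Event = c("A", "B"),
  starttime = as.POSIXct(c("2020-03-06 08:15:00", "2020-02-07 08:15:00"), tz = "UTC"),
  endtime   = as.POSIXct(c("2020-03-06 09:00:00", "2020-02-07 09:00:00"), tz = "UTC")
)
prices <- data.frame(
  Price = c(24763, 24876),
  date_time = as.POSIXct(c("2020-03-06 08:15:00", "2020-03-06 09:00:00"), tz = "UTC"),
  ID = "DJ"
)

# Same two-step merge as in the answer, with all.x = TRUE so unmatched events survive
res <- merge(events, transform(prices, start_time_price = Price)[-1],
             by.x = "starttime", by.y = "date_time", all.x = TRUE)
res <- merge(res, transform(prices, end_time_price = Price)[-1],
             by.x = c("ID", "endtime"), by.y = c("ID", "date_time"), all.x = TRUE)

# merge() sorts rows by the key columns, so reorder by event for display
res <- res[order(res$Event), c("Event", "starttime", "endtime", "ID",
                               "start_time_price", "end_time_price")]
print(res)
```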
    

    Comments:

      Solution 2:

      I assume you are trying to add Price and ID from the second dataset to the first one by matching the second dataset's date_time to the first dataset's starttime.

      In that case, you can use dplyr's left_join:

      library(dplyr)
      df1 %>% left_join(df2, by = c('starttime' = 'date_time'))
      

      Output:

                             Event           starttime             endtime Price   ID
      1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00 24763   DJ
      2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00    NA <NA>
      3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00    NA <NA>
      4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00    NA <NA>
      5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00    NA <NA>
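The times printed here (15:15) differ from those in the question (08:15) even though the data are identical. The dput() data stores tzone = "", which means the times are displayed in the current session's time zone. The sketch below formats one of the question's start times in three zones; Europe/Berlin and Asia/Shanghai are only illustrative guesses at the asker's and answerer's settings, inferred from the printed offsets, and assume the system's time-zone database includes them.

```r
# One of the question's start times, stored with tzone = "" exactly as in the dput() data
x <- structure(1583478900, class = c("POSIXct", "POSIXt"), tzone = "")

# The stored instant never changes; only the clock time shown depends on the zone
fmt <- "%Y-%m-%d %H:%M:%S"
utc      <- format(x, fmt, tz = "UTC")            # 07:15
berlin   <- format(x, fmt, tz = "Europe/Berlin")  # 08:15 (CET, UTC+1, before the March DST switch)
shanghai <- format(x, fmt, tz = "Asia/Shanghai")  # 15:15 (UTC+8)
print(c(utc = utc, berlin = berlin, shanghai = shanghai))
```

Matching and joining compare the stored instants, so the display zone does not affect which rows match.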
      

      Update:
      You want the Price at both starttime and endtime.

      You can pipe another left_join onto the previous code, this time joining on df1's endtime instead of starttime:

      combinedPrice <- df1 %>%
        left_join(df2, by = c('starttime' = 'date_time')) %>%
        left_join(df2, by = c('endtime' = 'date_time'))
      

      Output of combinedPrice:

                             Event           starttime             endtime Price.x ID.x Price.y ID.y
      1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00   24763   DJ   24876   DJ
      2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00      NA <NA>      NA <NA>
      3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00      NA <NA>      NA <NA>
      4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00      NA <NA>      NA <NA>
      5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00      NA <NA>      NA <NA>
      

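      The start and end prices are named Price.x and Price.y, respectively. The joins also leave us with two ID columns. We can rename the price columns and drop one of the ID columns like this: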

      combinedPrice %>% rename('PriceStart' = Price.x, 'PriceEnd' = Price.y, 'ID' = ID.y) %>% select(-ID.x)
      

      Output:

                             Event           starttime             endtime PriceStart PriceEnd   ID
      1 Change in Nonfarm Payrolls 2020-03-06 15:15:00 2020-03-06 16:00:00      24763    24876   DJ
      2 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00         NA       NA <NA>
      3 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00         NA       NA <NA>
      4 Change in Nonfarm Payrolls 2020-01-10 15:15:00 2020-01-10 16:00:00         NA       NA <NA>
      5 Change in Nonfarm Payrolls 2020-02-07 15:15:00 2020-02-07 16:00:00         NA       NA <NA>
      

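      Comments:

      • Thanks a lot! Do you also know how to get both the price at the start time and the price at the end time, so that I end up with two separate price columns?
      • I'm now thinking of copying df2, renaming its Price variable to Price2, and repeating this formula with endtime instead of starttime. That would probably work, but it's a bit cumbersome...
      • I've updated the post to answer your question :)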