【Question title】: Conditional Matching and Extraction involving 2 data tables
【Posted】: 2019-08-12 16:32:01
【Question】:

I have 2 data tables; their dput() output is as follows:

dput(x)
structure(list(site = c("A", "B", "C"), date = c("2018-05-06 00:00:05", 
"2018-05-06 12:00:00", "2018-05-06 17:00:00")), .Names = c("site", 
"date"), row.names = c(NA, -3L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000002570788>)


dput(y)
structure(list(sites = c("A", "A", "B"), vol = c(30, 40, 20), 
    date = structure(c(1525611600, 1525625640, 1525564805), class = c("POSIXct", 
    "POSIXt"), tzone = ""), pn = c("sp90", "sp70", "sp98")), .Names = c("sites", 
"vol", "date", "pn"), class = c("data.table", "data.frame"), row.names = c(NA, 
-3L), .internal.selfref = <pointer: 0x0000000002570788>)

The resulting data table should be:

   site                date vol   pn
1:    A 2018-05-06 00:00:05  30 sp90
2:    A 2018-05-06 12:00:00  40 sp70
3:    B 2018-05-06 17:00:00  20 sp98

I need to first check whether the sites match, then check whether x$date is less than y$date, and pull vol and pn into x.

Any ideas?

Thanks.

【Comments】:

    Tags: r dplyr plyr tidyr


    【Solution 1】:

    You can do it like this:

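    library(data.table)

    # Convert both date columns to POSIXct so they can be compared
    setDT(x)[, date := as.POSIXct(date)]
    setDT(y)[, date := as.POSIXct(date)] # already POSIXct in the sample data, so unchanged there

    x[, c("vol", "pn", "site") := # assign the join result to these columns by reference
        x[y, # join: one result row per row of y
          .(vol, pn, site), # columns to return
          on = .(site = sites, # equi-join condition on site
                 date < date # non-equi condition: x's date earlier than y's date
          ),
          mult = "last"]] # if several rows of x match, keep only the last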
    

    Output:

    > x
       site                date vol   pn
    1:    A 2018-05-06 00:00:05  30 sp90
    2:    A 2018-05-06 12:00:00  40 sp70
    3:    B 2018-05-06 17:00:00  20 sp98
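To see what the join inside the assignment returns on its own, the sketch below runs only the `x[y, on = ...]` step on fresh copies of the sample data. It assumes the timestamps are meant as UTC (the code above leaves the time zone at the session default, so the matches there may differ by locale). The names `x2`, `y2`, `joined`, `x_date`, and `y_date` are introduced only for this illustration.

```r
library(data.table)

# Sample data from the question, rebuilt with an explicit time zone.
# Assumption: UTC; the original relies on the session's default time zone.
x2 <- data.table(
  site = c("A", "B", "C"),
  date = as.POSIXct(c("2018-05-06 00:00:05", "2018-05-06 12:00:00",
                      "2018-05-06 17:00:00"), tz = "UTC")
)
y2 <- data.table(
  sites = c("A", "A", "B"),
  vol   = c(30, 40, 20),
  date  = as.POSIXct(c(1525611600, 1525625640, 1525564805),
                     origin = "1970-01-01", tz = "UTC"),
  pn    = c("sp90", "sp70", "sp98")
)

# For each row of y2, look up rows of x2 with the same site and an earlier date.
# The x. and i. prefixes pick a column from x2 and y2 respectively.
joined <- x2[y2,
             .(site = i.sites, x_date = x.date, y_date = i.date, vol, pn),
             on = .(site = sites, date < date)]
print(joined)
```

Under the UTC assumption, both site A rows of `y2` find the single site A row of `x2`, while the site B row of `y2` finds no earlier row in `x2` and gets `NA` in `x_date`. Unmatched rows of `y2` are still returned by default, so `vol` and `pn` come back for all three rows either way.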
    

    Edit:

    The dataset you provided in the question:

    x = structure(list(site = c("A", "B", "C"), 
                       date = c("2018-05-06 00:00:05", "2018-05-06 12:00:00", "2018-05-06 17:00:00")),
                      .Names = c("site","date"), row.names = c(NA, -3L), class = c("data.table", "data.frame"))
    
    
    y= structure(list(sites = c("A", "A", "B"),
                      vol = c(30, 40, 20), 
                      date = structure(c(1525611600, 1525625640, 1525564805),
                      class = c("POSIXct", "POSIXt"), tzone = ""),
                      pn = c("sp90", "sp70", "sp98")),
                     .Names = c("sites", "vol", "date", "pn"),
                      class = c("data.table", "data.frame"),
                      row.names = c(NA,-3L))
    

    【Comments】:

    • I tried this, but for some reason it gives an error saying that "site" was not found.
    • I used the same data you provided, and it works for me.
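The thread does not establish what caused that error. As a first step in narrowing it down, the sketch below checks the preconditions the join relies on: both inputs are data.tables, the expected columns exist under the expected names, and the installed data.table supports non-equi joins (added in version 1.9.8). `check_join_inputs` is a hypothetical helper written for this illustration, not part of data.table, and the one-row tables are made-up test inputs.

```r
library(data.table)

# Hypothetical helper: returns a character vector describing unmet
# preconditions for the join in the answer (empty if none are found).
check_join_inputs <- function(x, y) {
  problems <- character(0)
  if (packageVersion("data.table") < "1.9.8")
    problems <- c(problems, "data.table is older than 1.9.8 (no non-equi joins)")
  if (!is.data.table(x) || !is.data.table(y))
    problems <- c(problems, "x and y must both be data.tables (see setDT())")
  missing_x <- setdiff(c("site", "date"), names(x))
  missing_y <- setdiff(c("sites", "vol", "date", "pn"), names(y))
  if (length(missing_x))
    problems <- c(problems, paste("x lacks:", paste(missing_x, collapse = ", ")))
  if (length(missing_y))
    problems <- c(problems, paste("y lacks:", paste(missing_y, collapse = ", ")))
  problems
}

# Made-up inputs: `ok` has the expected column names, `bad` names its key "sites".
ok  <- data.table(site = "A", date = "2018-05-06 00:00:05")
bad <- data.table(sites = "A", date = "2018-05-06 00:00:05")
y1  <- data.table(sites = "A", vol = 30, date = "2018-05-06 13:00:00", pn = "sp90")

res_ok  <- check_join_inputs(ok, y1)
res_bad <- check_join_inputs(bad, y1)
print(res_ok)
print(res_bad)
```

If the helper reports nothing on the real data, the column names and package version are probably not the cause, and the next thing to compare would be the exact objects passed as `x` and `y`.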