【Question title】: Flagging a row in a data set referring to conditions in another data set
【Posted】: 2017-11-22 21:55:57
【Question description】:

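My scenario is as follows:

I have two sales campaigns, A and B. If A is successful, it leads to a new campaign B.

A and B each have their own data set, and the two data sets share the same structure. However, nothing is stored that indicates which A succeeded and led to which B. That is what I want to flag.

The rule I am trying to apply is: A$datedone = B$datecreated AND A$organization = B$organization

In other words: the date A was completed equals the date B was created, AND the organization on A equals the organization on B.

How can I check these conditions across the two data sets and then store TRUE/FALSE in a new variable for each record in data set A?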

【Comments】:

Tags: r conditional


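【Answer 1】:

It is hard to give a solid answer without knowing how each data frame is structured, but you can use dplyr's left_join function to get the result you want. Something like this may help you get what you need.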

library(dplyr)

# Create test dataframes
A <- data.frame(datedone = c("1/1/2017", "2/2/2017","3/3/2017", "4/4/2017"),
                organization = c("org1","org1","org2","org3"),
                someotherdata = c("d1","d2","d3","d4"),
                stringsAsFactors = FALSE)


B <- data.frame(datecreated = c("1/1/2017", "2/4/2017","3/3/2017", "4/4/2017"),
                organization = c("org1","org1","org2","org3"),
                someotherdata = c("d1","d2","d3","d4"),
                stringsAsFactors = FALSE)

# Add column to each dataframe to act as an identifier
A$fromdf <- "A"
B$fromdf <- "B"

# Left join dataframe B onto dataframe A
AB <- A %>%
      left_join(B, by = c("datedone" = "datecreated", "organization" = "organization")) %>%
      # Create a new column to show if there was a match, based on whether a value is present in fromdf.y
      mutate(ABmatch = !is.na(fromdf.y)) %>%
      # Select whichever columns are needed from the dataframe
      select(datedone, organization, someotherdata = someotherdata.x, ABmatch)
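
One caveat with the join above: if B contains several rows with the same date and organization, left_join repeats the matching A row once per match, so AB can end up with more rows than A. Below is a minimal base R sketch that sidesteps this by testing key membership instead of joining. It assumes the same column names as the test dataframes; the duplicated row in B is invented here purely to illustrate the point, and the "|" separator is assumed not to occur in either column.

```r
# Illustrative data (not from the question); B has two rows for org1 on 1/1/2017
A <- data.frame(datedone = c("1/1/2017", "2/2/2017", "3/3/2017"),
                organization = c("org1", "org1", "org2"),
                stringsAsFactors = FALSE)

B <- data.frame(datecreated = c("1/1/2017", "1/1/2017", "3/3/2017"),
                organization = c("org1", "org1", "org2"),
                stringsAsFactors = FALSE)

# Build one composite key per row from date and organization.
# Assumption: "|" never appears in the date or organization values.
key_A <- paste(A$datedone, A$organization, sep = "|")
key_B <- paste(B$datecreated, B$organization, sep = "|")

# TRUE where at least one B row has the same date and organization
A$ABmatch <- key_A %in% key_B
```

Because `%in%` only asks whether a key exists in B, A keeps exactly one row per record no matter how many B rows match. If the dates are stored as text in different formats in the two data sets, they would need to be converted to a common format first for the comparison to be meaningful.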

【Comments】:
