【Question Title】: Incorporating Dplyr Join and Set Operations into a Custom Function
【Posted】: 2016-11-28 04:12:42
【Question】:

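I have the two simple data frames below. I want to use dplyr and the tidyverse to find the categories in "Task2" of the second data frame (Df2) that do not appear in "Task" of the first data frame (Df). I would like to use dplyr's "setdiff" function for this. I also want to keep the corresponding times from the "Time" column of the second data frame (Df2).

So the final result should contain two rows: one for Client "Chris" with "Iron shirt" and a total time of 30, and one for Client "Eric" with "Buy groceries" and a time of 8.

I would also like to drop the Date column.

One approach I had in mind is to use dplyr's "setdiff" function to isolate the two rows (I realize the Task and Task2 column names would have to be changed so that they match), and then bring the total times back in with a join function.

Finally, I would like this to be a custom function, because I will have to repeat this task. I want something like "Differences(Df1,Df2)", so that I can pass in two data frames and get the result.

I hope this is not asking too much! I am new to custom functions, especially ones that involve dplyr and pipes.

I hope someone can help!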

CaseWorker<-c("John","John","Kim")

Client<-c("Chris","Chris","Eric")

Task<-c("Feed cat","Make dinner","Do homework")

Date<-c("10/27/2016","09/22/2016","10/11/2016")

Df<-data.frame(CaseWorker,Client,Date,Task)

The second data frame...

CaseWorker<-c("John","John","John","John","John","John","John","John","John",
          "John","Kim","Kim","Kim")

Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric","Eric")

Date<-c("11/10/2016","10/10/2016","11/13/2016","09/18/2016","11/11/2016","09/19/2016","08/08/2016","10/10/2016","08/05/2016","11/12/2016","09/09/2016","11/11/2016","09/10/2016")

Task2<-c("Feed cat","Feed cat","Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Iron shirt","Iron shirt","Do homework",
"Do homework","Buy groceries")

Time<-c(20,34,11,10,5,6,55,30,20,10,12,10,8)

Df2<-data.frame(CaseWorker,Client,Date,Task2,Time)

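【Comments】:

    Tags: r dplyr


    【Solution 1】:

    We can use anti_join, which keeps the rows of Df2 whose Task2 value has no match in the Task column of Df: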

    library(dplyr)
    anti_join(Df2, Df, by = c("Task2"="Task")) %>%
             group_by(CaseWorker,Client, Task2) %>% 
             summarise(Time = sum(Time))
    #    CaseWorker Client         Task2  Time
    #        <fctr> <fctr>        <fctr> <dbl>
    #1       John  Chris    Iron shirt    30
    #2        Kim   Eric Buy groceries     8
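
For comparison, the setdiff route described in the question might look roughly like the sketch below. It is only an illustration under a few assumptions: the data frames are rebuilt compactly (without Date, which is dropped anyway), `new_tasks` and `result` are names made up here, a `filter()` stands in for the re-join step, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Compact rebuild of the question's data (Date omitted, it is dropped anyway)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Task       = c("Feed cat", "Make dinner", "Do homework")
)
Df2 <- data.frame(
  CaseWorker = rep(c("John", "Kim"), times = c(10, 3)),
  Client     = rep(c("Chris", "Eric"), times = c(10, 3)),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3),
                 rep("Iron shirt", 2), rep("Do homework", 2),
                 "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8)
)

# Tasks that occur in Df2 but never in Df, as a plain character vector
new_tasks <- setdiff(as.character(Df2$Task2), as.character(Df$Task))

# Keep only those tasks, then total the time per worker, client and task
result <- Df2 %>%
  filter(Task2 %in% new_tasks) %>%
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time), .groups = "drop")

print(result)
```

This takes two steps where anti_join needs one, which is the main reason the join version above is shorter.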
    

    If we need to turn this into a function:

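    # dat1: data frame with Task2 and Time columns (here Df2)
    # dat2: data frame with a Task column (here Df)
    DiffGoals <- function(dat1, dat2) {
      anti_join(dat1, dat2, by = c("Task2" = "Task")) %>%
        group_by(CaseWorker, Client, Task2) %>%
        summarise(Time = sum(Time))
    }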
    
    DiffGoals(Df2, Df)
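
If the task has to be repeated on data frames with other column names, one possible generalisation is sketched below. This is not part of the original answer: `task_differences` and all of its argument names are hypothetical, the argument order follows the `Differences(Df1, Df2)` shape asked for in the question, and `across()`, `all_of()` and `.groups` assume dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical generalisation of DiffGoals(); every argument name is illustrative.
# reference: data frame holding the known tasks (column named by ref_task)
# logged:    data frame holding tasks and times (columns log_task, time_col)
task_differences <- function(reference, logged,
                             ref_task = "Task", log_task = "Task2",
                             time_col = "Time",
                             groups = c("CaseWorker", "Client")) {
  logged %>%
    # keep rows of `logged` whose task never appears in `reference`;
    # setNames() builds the named vector c(<log_task> = <ref_task>)
    anti_join(reference, by = setNames(ref_task, log_task)) %>%
    group_by(across(all_of(c(groups, log_task)))) %>%
    # total the time column, keeping its original name
    summarise(!!time_col := sum(.data[[time_col]]), .groups = "drop")
}

# Compact rebuild of the question's data (Date omitted, it is dropped anyway)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Task       = c("Feed cat", "Make dinner", "Do homework")
)
Df2 <- data.frame(
  CaseWorker = rep(c("John", "Kim"), times = c(10, 3)),
  Client     = rep(c("Chris", "Eric"), times = c(10, 3)),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3),
                 rep("Iron shirt", 2), rep("Do homework", 2),
                 "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8)
)

diffs <- task_differences(Df, Df2)
print(diffs)
```

Passing the column names as strings keeps the call site simple; the trade-off is the slightly heavier `across(all_of(...))` and `.data[[...]]` syntax inside the function.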
    

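    【Discussion】:

    • Thanks! This solution is much simpler than I expected, and it works great. Since it is only three lines, I don't think a custom function is strictly necessary, but out of curiosity I would still like to know how to write one. Perhaps called "DiffGoals(Df1,Df2)"? Is that easy to do?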