【Question Title】: R Increasing Variable Based on Previous Occurrences
【Posted】: 2017-10-12 22:30:18
【Question】:

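I have a data frame of restaurant inspections sorted by date. For each observation, I would like to add two extra variables that record how many inspections that restaurant has had in total and how many of them it has failed. I would like to avoid a for loop, but I don't know how else to do it. Essentially, I currently have a data frame consisting of the first three columns of the data frame below, and I want to add the last two columns.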

Initial data frame

    Restaurant_ID    Date         Result
    1                01/02/2011   Pass 
    2                02/05/2011   Pass
    3                04/07/2011   Fail
    1                09/05/2011   Fail
    2                03/13/2012   Pass
    1                08/25/2012   Fail
    2                09/25/2012   Pass
    3                01/05/2013   Pass

Desired output 1

    Restaurant_ID    Date         Result   total_inspect  failed_inspect
    1                01/02/2011   Pass     1              0
    2                02/05/2011   Pass     1              0
    3                04/07/2011   Fail     1              1
    1                09/05/2011   Fail     2              1
    2                03/13/2012   Pass     2              0
    1                08/25/2012   Fail     3              2
    2                09/25/2012   Pass     3              0
    3                01/05/2013   Pass     2              1

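Edit: I realized that I actually want the last two columns to reflect the total and failed inspection counts prior to the current observation. So what I really want is

Desired output 2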

    Restaurant_ID    Date         Result   past_inspect  past_failed_inspect
    1                01/02/2011   Pass     0              0
    2                02/05/2011   Pass     0              0
    3                04/07/2011   Fail     0              0
    1                09/05/2011   Fail     1              0
    2                03/13/2012   Pass     1              0
    1                08/25/2012   Fail     2              1
    2                09/25/2012   Pass     2              0
    3                01/05/2013   Pass     1              1

【Question Discussion】:

    Tags: r for-loop dataframe


    【Solution 1】:

    This solution uses functions from the tidyverse and lubridate packages.

    # Create the example data frame
    dt1 <- read.csv(text = "Restaurant_ID,Date,Result
    1,01/02/2011,Pass
    2,02/05/2011,Pass
    3,04/07/2011,Fail
    1,09/05/2011,Fail
    2,03/13/2012,Pass
    1,08/25/2012,Fail
    2,09/25/2012,Pass
    3,01/05/2013,Pass",
                   stringsAsFactors = FALSE)
    
    # Load packages
    library(tidyverse)
    library(lubridate)
    
    dt2 <- dt1 %>%
      # Convert the Date column to Date class
      mutate(Date = mdy(Date)) %>%
      # Sort the data frame based on Restaurant_ID and Date
      arrange(Restaurant_ID, Date) %>%
      # group the data by each restaurant ID
      group_by(Restaurant_ID) %>%
      # Create a column showing total_inspect
      mutate(total_inspect = 1:n()) %>%
      # Create a column showing fail_result, fail is 1, pass is 0
      mutate(fail_result = ifelse(Result == "Fail", 1, 0)) %>%
      # Calculate the cumulative sum of fail_result
      mutate(failed_inspect = cumsum(fail_result)) %>%
      # Remove fail_result
      select(-fail_result) %>%
      # Sort the data frame by Date
      arrange(Date)
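
The pipeline above can also be written more compactly. The following is a minimal sketch, not the answerer's code: it assumes only dplyr is loaded, parses the dates with base R's `as.Date()` instead of lubridate, and uses the hypothetical names `inspections` and `running_counts`. It relies on `row_number()` counting rows within each group and on `cumsum()` treating `TRUE` as 1.

```r
library(dplyr)

# Example data from the question, with Date parsed by base R
inspections <- data.frame(
  Restaurant_ID = c(1, 2, 3, 1, 2, 1, 2, 3),
  Date = as.Date(c("01/02/2011", "02/05/2011", "04/07/2011", "09/05/2011",
                   "03/13/2012", "08/25/2012", "09/25/2012", "01/05/2013"),
                 format = "%m/%d/%Y"),
  Result = c("Pass", "Pass", "Fail", "Fail", "Pass", "Fail", "Pass", "Pass"),
  stringsAsFactors = FALSE
)

running_counts <- inspections %>%
  # Keep rows in chronological order so the running counts follow time
  arrange(Date) %>%
  group_by(Restaurant_ID) %>%
  mutate(
    # Position of the row within its restaurant: 1, 2, 3, ...
    total_inspect  = row_number(),
    # Running number of failures, counting the current row
    failed_inspect = cumsum(Result == "Fail")
  ) %>%
  ungroup()

print(running_counts)
```

Because the rows are sorted by Date before grouping, no second sort is needed at the end, and `ungroup()` returns an ordinary (ungrouped) tibble for later steps.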
    

    Edit: calculating the past inspection and failure counts

    dt3 <- dt2 %>%
      mutate(past_inspect = ifelse(total_inspect == 0, 0, total_inspect - 1)) %>%
      mutate(past_failed_inspect = ifelse(Result == "Fail" & failed_inspect != 0, 
                                          failed_inspect - 1,
                                          failed_inspect)) %>%
      select(-total_inspect, -failed_inspect)
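
The past counts can also be computed directly, without going through the running totals first. The sketch below is an alternative in base R, assuming no extra packages; the names `inspections` and `is_fail` are made up for illustration. The idea is that the number of earlier failures equals the running failure count minus the current row's own failure indicator, which is why subtracting a constant 1 from every row does not work.

```r
# Example data from the question, with Date parsed by base R
inspections <- data.frame(
  Restaurant_ID = c(1, 2, 3, 1, 2, 1, 2, 3),
  Date = as.Date(c("01/02/2011", "02/05/2011", "04/07/2011", "09/05/2011",
                   "03/13/2012", "08/25/2012", "09/25/2012", "01/05/2013"),
                 format = "%m/%d/%Y"),
  Result = c("Pass", "Pass", "Fail", "Fail", "Pass", "Fail", "Pass", "Pass"),
  stringsAsFactors = FALSE
)

# Make sure rows are in chronological order before counting
inspections <- inspections[order(inspections$Date), ]

# 1 for a failed inspection, 0 otherwise
is_fail <- as.integer(inspections$Result == "Fail")

# ave() applies FUN within each Restaurant_ID group and returns the
# results in the original row order
inspections$past_inspect <- ave(
  is_fail, inspections$Restaurant_ID,
  FUN = function(x) seq_along(x) - 1L   # inspections before this one
)
inspections$past_failed_inspect <- ave(
  is_fail, inspections$Restaurant_ID,
  FUN = function(x) cumsum(x) - x       # failures before this one
)

print(inspections)
```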
    

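    【Discussion】:

    • Thank you, this is great! I didn't know about the tidyverse package before, so I appreciate you pointing me in that direction. I was hoping you could help me with the edit I just added. I want the total and failed inspection counts to reflect the inspections before the current observation. At first I thought I could just subtract 1 from both columns, but that doesn't work for past_failed_inspect: as the last row shows, failed_inspect and past_failed_inspect are the same for restaurant 3.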
    • @person10559 Please see my update. dt3 is your Desired output 2.
    • This was very helpful, thank you so much!