【Question title】: How to get each customer's entry and exit times for the shop?
【Posted】: 2020-06-07 07:58:08
【Question description】:

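I have a dataset in which each customer is captured by sensors at different times. A customer entering the shop is captured by sensor_id 1, and a customer can also enter via sensor_id 2. However, a customer can only exit via sensor_id 3. The dataset looks like this: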

library(data.table)
library(lubridate)
DT1 <- data.table(
  customer_id=c(1,1,1,2,2,2,1,1),
  sensor_id=c(1,2,3,1,2,3,2,3),
  in_time=c("2017-01-01 00:00:05","2017-01-01 00:06:35","2017-01-01 00:23:44","2017-01-02 22:00:20","2017-01-02 22:01:09","2017-01-02 22:28:02","2017-01-03 22:23:01","2017-01-03 22:50:52")
  )

DT1[,in_time:=ymd_hms(in_time)]

From this, I want to get the following data table:

result <- data.table(
  customer_id=c(1,2,1),
  entry_sensor_id=c(1,1,2),
  entry_time=c("2017-01-01 00:00:05","2017-01-02 22:00:20","2017-01-03 22:23:01"),
  exit_sensor_id=c(3,3,3),
  exit_time=c("2017-01-01 00:23:44","2017-01-02 22:28:02","2017-01-03 22:50:52")
)

So I tried the following:

DT1[, spotted_group := rleid(cumsum(
  difftime(in_time, shift(in_time, fill = first(in_time)), units = "mins") > 120
)), by = customer_id]

DT1Stretch <- DT1[DT1[order(in_time), .I[c(1L, .N)], by = list(customer_id, spotted_group)]$V1]

DT1Stretch[, c(.SD[1, ], .SD[2, ]), by = c("customer_id", "spotted_group")]

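However, this approach does not work if a customer returns to the shop within 2 hours, because spotted_group is assigned based on a 120-minute gap between readings, which is not ideal.

I am not sure what the right approach to this problem is. Any help is appreciated.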

In short, I need to label the group of readings that belongs to each customer's stay in the shop.
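One way to label such groups without a time threshold is to start a new group on the row that follows each exit reading. The following is a minimal sketch of that idea, not the asker's method. The column name `visit_id` is hypothetical, and the sketch assumes every visit ends with a sensor_id 3 reading and that readings can be sorted by `in_time` within each customer.

```r
library(data.table)

DT1 <- data.table(
  customer_id = c(1, 1, 1, 2, 2, 2, 1, 1),
  sensor_id   = c(1, 2, 3, 1, 2, 3, 2, 3),
  in_time     = as.POSIXct(c(
    "2017-01-01 00:00:05", "2017-01-01 00:06:35", "2017-01-01 00:23:44",
    "2017-01-02 22:00:20", "2017-01-02 22:01:09", "2017-01-02 22:28:02",
    "2017-01-03 22:23:01", "2017-01-03 22:50:52"
  ), tz = "UTC")
)

# Sort so that each customer's readings are in chronological order
setorder(DT1, customer_id, in_time)

# A new visit begins on the row after an exit reading (sensor_id 3);
# visit_id is a hypothetical column name used only for this sketch
DT1[, visit_id := cumsum(shift(sensor_id == 3, fill = FALSE)) + 1L, by = customer_id]

# First reading of a visit is the entry, last reading is the exit
visits <- DT1[, .(
  entry_sensor_id = first(sensor_id),
  entry_time      = first(in_time),
  exit_sensor_id  = last(sensor_id),
  exit_time       = last(in_time)
), by = .(customer_id, visit_id)]

print(visits)
```

Because the grouping depends only on the exit sensor, a customer who returns within 2 hours still gets a separate visit. A trailing visit with no exit reading would be reported with its last entry reading as the "exit", so such rows would need to be filtered separately.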

【Question comments】:

    Tags: r dplyr data.table


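    【Solution 1】:

    Here is another option: do a rolling join from each entry reading to the next exit reading, then apply unique by exit sensor and exit time: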

    unique(
      DT1[sensor_id==3L][DT1[sensor_id!=3L], on=.(customer_id, in_time), roll=-Inf,
        .(customer_id, entry_sensor_id=i.sensor_id, entry_time=i.in_time,
          exit_sensor_id=3L, exit_time=x.in_time)],
      by=c("customer_id", "exit_sensor_id", "exit_time"))
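To see why the `unique` step is needed, it can help to look at the rolling join on its own. The sketch below splits the same logic into steps; the intermediate names `exits`, `entries`, `joined`, and `visits` are introduced here for illustration only. It assumes the entry readings are in chronological order within each customer, so that the first row kept per exit is the earliest entry.

```r
library(data.table)

DT1 <- data.table(
  customer_id = c(1, 1, 1, 2, 2, 2, 1, 1),
  sensor_id   = c(1, 2, 3, 1, 2, 3, 2, 3),
  in_time     = as.POSIXct(c(
    "2017-01-01 00:00:05", "2017-01-01 00:06:35", "2017-01-01 00:23:44",
    "2017-01-02 22:00:20", "2017-01-02 22:01:09", "2017-01-02 22:28:02",
    "2017-01-03 22:23:01", "2017-01-03 22:50:52"
  ), tz = "UTC")
)

exits   <- DT1[sensor_id == 3]
entries <- DT1[sensor_id != 3]

# roll = -Inf: for each entry reading (i), pick the next exit reading (x)
# of the same customer. This yields one row per entry reading.
joined <- exits[entries, on = .(customer_id, in_time), roll = -Inf,
  .(customer_id,
    entry_sensor_id = i.sensor_id, entry_time = i.in_time,
    exit_sensor_id  = x.sensor_id, exit_time  = x.in_time)]

# Several entry readings can share one exit; keep the first per exit
visits <- unique(joined, by = c("customer_id", "exit_time"))

print(joined)
print(visits)
```

On the sample data, `joined` has one row for each of the five entry readings, and `unique` collapses them to one row per exit.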
    

    【Comments】:

      【Solution 2】:

      Does this answer your question?

      library(data.table)
      library(lubridate)
      
      DT <- data.table(
        customer_id=c(1,1,1,2,2,2,1,1),
        sensor_id=c(1,2,3,1,2,3,2,3),
        in_time=c("2017-01-01 00:00:05","2017-01-01 00:06:35","2017-01-01 00:23:44","2017-01-02 22:00:20","2017-01-02 22:01:09","2017-01-02 22:28:02","2017-01-03 22:23:01","2017-01-03 22:50:52")
      )
      DT[,in_time:=lubridate::ymd_hms(in_time)]
      
      # Sensors 1 and 2 both mean the customer is entering
      DT[, customer_in := sensor_id %in% c(1, 2)]
      
      # Aggregate sensors 1 & 2, find first entry time
      inout <- DT[order(customer_id, in_time)][, .(in_time = min(in_time)), by = .(customer_id, rleid(customer_in), customer_in)]
      
      # Separate entry & exit
      entry <- inout[customer_in == TRUE]
      exit <- inout[customer_in == FALSE]
      
      # Join results
      entry[exit, .(customer_id, in_time = x.in_time, out_time = i.in_time), roll = Inf, on = .(customer_id, in_time)]
      
         customer_id             in_time            out_time
      1:           1 2017-01-01 00:00:05 2017-01-01 00:23:44
      2:           2 2017-01-02 22:00:20 2017-01-02 22:28:02
      3:           1 2017-01-03 22:23:01 2017-01-03 22:50:52
      

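      【Comments】:

      • This works for a small dataset. With a larger dataset, however, I do not get sequential rleid values per customer in the inout table, and it does not pick min(in_time); instead it picks all entries (via sensor_id 1, 2), not only min(in_time). Any idea why?
      • Edit: it works when I order by customer_id, but it fails when I order by in_time instead of customer_id.
      • The proposed logic assumes that in_time is increasing. The customer ID should not have an effect, because it is part of by.
      • However, to be safe, I updated the code to order by customer_id, in_time. Let me know if everything now works on the larger dataset.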
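The ordering issue raised in these comments can be reproduced on a small made-up table. The data below is hypothetical, chosen so that one customer's exit reading falls between two entry readings of another customer. Since `rleid` is computed over the whole table before grouping, that interleaving splits a single visit's entry readings into two runs when rows are ordered by time alone.

```r
library(data.table)

# Hypothetical data: customer 2 exits between customer 1's two entry readings
DT <- data.table(
  customer_id = c(2, 1, 2, 1, 1),
  sensor_id   = c(1, 1, 3, 2, 3),
  in_time     = as.POSIXct(c(
    "2017-01-01 09:50:00", "2017-01-01 10:00:00", "2017-01-01 10:02:00",
    "2017-01-01 10:05:00", "2017-01-01 10:20:00"
  ), tz = "UTC")
)
DT[, customer_in := sensor_id %in% c(1, 2)]

# Ordered by time only: runs are broken up by other customers' readings
by_time <- DT[order(in_time)][
  , .(in_time = min(in_time)),
  by = .(customer_id, run = rleid(customer_in), customer_in)]

# Ordered by customer, then time: each customer's readings are contiguous
by_cust <- DT[order(customer_id, in_time)][
  , .(in_time = min(in_time)),
  by = .(customer_id, run = rleid(customer_in), customer_in)]

print(by_time[customer_id == 1 & customer_in == TRUE])  # two entry rows
print(by_cust[customer_id == 1 & customer_in == TRUE])  # one entry row
```

This is consistent with the commenter's observation that the result is only correct when rows are ordered by customer_id first.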