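So for the example case shown in the test data, which covers only a single day, GMB's solution works fine.
Once the data spans many days (which may or may not have overlapping store visits; let's pretend you cannot be in the store overnight),
that can be fixed by also grouping on the date: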
select t.hour::date, t.customer_id, min(t.hour) min_hour, max(t.hour) max_hour
from mytable t
group by 1,2
But multiple entries per day either need the data to be tagged, for example:
with mytable as (
select * from values
('2019-04-01 09:00:00','x','in')
,('2019-04-01 15:00:00','x','out')
,('2019-04-02 12:00:00','x','in')
,('2019-04-02 14:00:00','x','out')
v(hour, customer_id, state)
)
or need the direction to be inferred:
with mytable as (
select * from values ('2019-04-01 09:00:00','x','in'),('2019-04-01 15:00:00','x','out')
,('2019-04-02 12:00:00','x','in'),('2019-04-02 14:00:00','x','out')
v(hour, customer_id, state)
)
select hour::date as day
,hour
,customer_id
,state
,BITAND(row_number() over(partition by day, customer_id order by hour), 1) = 1 AS in_dir
from mytable
order by 3,1,2;
giving:
DAY         HOUR                 CUSTOMER_ID  STATE  IN_DIR
2019-04-01  2019-04-01 09:00:00  x            in     TRUE
2019-04-01  2019-04-01 15:00:00  x            out    FALSE
2019-04-02  2019-04-02 12:00:00  x            in     TRUE
2019-04-02  2019-04-02 14:00:00  x            out    FALSE
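For anyone who wants to check the parity rule outside Snowflake, here is a minimal Python sketch of the same inference. It is not part of the query, and the function name `infer_direction` and the tuple layout are assumptions made for illustration: events are numbered per customer and day in time order, and odd positions are treated as entries.

```python
from collections import defaultdict
from datetime import datetime


def infer_direction(events):
    """events: iterable of (timestamp, customer_id) pairs.

    Returns (timestamp, customer_id, in_dir) tuples, where in_dir is True
    for the 1st, 3rd, 5th... event of each customer on each day.
    """
    groups = defaultdict(list)
    for ts, customer in events:
        groups[(customer, ts.date())].append(ts)

    result = []
    for (customer, _day), stamps in sorted(groups.items()):
        for position, ts in enumerate(sorted(stamps), start=1):
            # Same test as BITAND(row_number(), 1) = 1 in the SQL.
            result.append((ts, customer, (position & 1) == 1))
    return result


events = [
    (datetime(2019, 4, 1, 9), "x"),
    (datetime(2019, 4, 1, 15), "x"),
    (datetime(2019, 4, 2, 12), "x"),
    (datetime(2019, 4, 2, 14), "x"),
]
rows = infer_direction(events)
```

Note that this only holds if every entry has a matching exit on the same day; a single missing row flips the direction of everything after it.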
Now this can be used with LEAD and QUALIFY to get real ranges that can handle multiple entries per day:
select customer_id
,day
,hour
,lead(hour) over (partition by customer_id, day order by hour) as exit_time
from infer_direction
qualify in_dir = true
It works by fetching the next time for every row of each day/customer, and then (via QUALIFY) keeping only the 'in' rows.
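The same pairing step can be sketched in Python to make the order of operations visible: the "next time" is computed for every row first, and the filter on the direction flag is applied afterwards. This is an illustration only; `visit_ranges` and the row layout `(customer_id, day, hour, in_dir)` are hypothetical names, not anything Snowflake provides.

```python
from datetime import date, time


def visit_ranges(rows):
    """rows: (customer_id, day, hour, in_dir) tuples, in any order.

    Returns (customer_id, day, entry_hour, exit_hour) for each 'in' row.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    ranges = []
    for i, (customer, day, hour, in_dir) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # LEAD: the next hour in the same (customer, day) partition, else None.
        exit_time = nxt[2] if nxt and nxt[:2] == (customer, day) else None
        # QUALIFY in_dir = true: filter after the window value is computed.
        if in_dir:
            ranges.append((customer, day, hour, exit_time))
    return ranges


rows = [
    ("x", date(2019, 4, 1), time(9), True),
    ("x", date(2019, 4, 1), time(12), False),
    ("x", date(2019, 4, 2), time(13), True),
    ("x", date(2019, 4, 2), time(14), False),
    ("x", date(2019, 4, 2), time(9), True),
    ("x", date(2019, 4, 2), time(10), False),
]
ranges = visit_ranges(rows)
```

Filtering before computing the next time would be wrong: with only 'in' rows left, the "next" row would be the following entry rather than the matching exit.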
Then we can join that to the hours of the day:
select dateadd('hour', row_number() over(order by null) - 1, '00:00:00'::time) as hour
from table (generator(rowcount => 24))
So, all woven together:
with mytable as (
select hour::timestamp as hour, customer_id, state
from values
('2019-04-01 09:00:00','x','in')
,('2019-04-01 12:00:00','x','out')
,('2019-04-02 13:00:00','x','in')
,('2019-04-02 14:00:00','x','out')
,('2019-04-02 09:00:00','x','in')
,('2019-04-02 10:00:00','x','out')
v(hour, customer_id, state)
), infer_direction AS (
select hour::date as day
,hour::time as hour
,customer_id
,state
,BITAND(row_number() over(partition by day, customer_id order by hour), 1) = 1 AS in_dir
from mytable
), visit_ranges as (
select customer_id
,day
,hour
,lead(hour) over (partition by customer_id, day order by hour) as exit_time
from infer_direction
qualify in_dir = true
), time_of_day AS (
select dateadd('hour', row_number() over(order by null) - 1, '00:00:00'::time) as hour
from table (generator(rowcount => 24))
)
select t.customer_id
,t.day
,h.hour
from visit_ranges as t
join time_of_day h on h.hour between t.hour and t.exit_time
order by 1,2,3;
we get:
CUSTOMER_ID  DAY         HOUR
x            2019-04-01  09:00:00
x            2019-04-01  10:00:00
x            2019-04-01  11:00:00
x            2019-04-01  12:00:00
x            2019-04-02  09:00:00
x            2019-04-02  10:00:00
x            2019-04-02  13:00:00
x            2019-04-02  14:00:00
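One detail of the final join worth spelling out is that BETWEEN is inclusive at both ends, so the exit hour itself is counted as an hour in the store. A small Python sketch of that expansion, with `hours_in_store` as a hypothetical helper name:

```python
from datetime import time


def hours_in_store(entry, exit_time):
    """Whole hours h of the day with entry <= h <= exit_time (both inclusive)."""
    if exit_time is None:
        # An entry without an exit matches nothing, as in SQL, where
        # BETWEEN against a NULL bound never evaluates to true.
        return []
    return [time(h) for h in range(24) if entry <= time(h) <= exit_time]


first_visit = hours_in_store(time(9), time(12))
```

If the exit hour should not count, the join condition would need to be `h.hour >= t.hour and h.hour < t.exit_time` instead of BETWEEN.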