【Title】:Normalize transactions data from time and status columns to minutes per status value
【Posted】:2017-10-15 06:12:20
【Question】:

I have a table of user status changes, for example:

insert_time     status
1/1/2017 0:00   AVAILABLE
1/1/2017 0:15   BUSY
1/1/2017 0:30   NOT AVAILABLE
1/1/2017 1:30   AVAILABLE
1/1/2017 3:10   BUSY
1/1/2017 5:00   NOT AVAILABLE

For example: this user was available from 00:00 to 00:15, busy from 00:15 to 00:30, and so on.

To analyze the data, I need to transform it into this structure:

day       hour  available minutes   not available minutes   busy minutes
1/1/2017     0                 15                      30             15
1/1/2017     1                 30                      30              0
1/1/2017     2                 60                       0              0
1/1/2017     3                 10                       0             50
1/1/2017     4                  0                       0             60

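The result must include rows for hours in which the status did not change.

I don't think this is a simple PIVOT query, because a single row has to be split across several columns, and hours with no data must be included.

How can I do this in an Oracle SQL query?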

Tags: sql oracle normalization


【Solution 1】:

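One way to solve this kind of query has two parts: generate the categories, then aggregate into the generated categories.

For the data you provided, the first step is to bucket the data by hour. Your data has no events in the 02:00 or 04:00 hours, so for those hours to appear in the final result they have to be generated.

The second part is to aggregate into the hourly buckets with a PIVOT, as Jorge Campos mentioned in the comments.

Here is an example.

First, create a test table: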

CREATE TABLE INSERT_TIME_STATUS(
  INSERT_TIME TIMESTAMP,
  STATUS VARCHAR2(128)
);

And add the test data:

INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:00:00', 'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:15:00', 'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 00:30:00', 'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 01:30:00', 'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 03:10:00', 'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (TIMESTAMP '2017-01-01 05:00:00', 'NOT AVAILABLE');

Then create the query. It uses subquery factoring to lay out the two-step nature of the process.

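Here the CALENDAR subfactor generates every day in the date range of the data. Cross-joined with HOUR_OF_DAY, it yields every hour of each day, whether or not any record falls within that hour.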
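
The hour grid can be tried in isolation. The following is a minimal sketch, assuming a single hard-coded day (2017-01-01 is only an example value); it folds HOUR_OF_DAY and CALENDAR into one row generator rather than reproducing them exactly:

```sql
-- Sketch: one row per hour of a single, hard-coded example day.
-- CONNECT BY LEVEL <= 24 on DUAL produces LEVEL values 1..24.
SELECT LEVEL - 1                                                            AS THE_HOUR,
       TIMESTAMP '2017-01-01 00:00:00' + NUMTODSINTERVAL(LEVEL - 1, 'HOUR') AS HOUR_START
FROM DUAL
CONNECT BY LEVEL <= 24
ORDER BY THE_HOUR;
```

This returns 24 rows, hours 0 through 23, each with the timestamp at which that hour begins.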

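The HOUR_CALENDAR subfactor looks up the status that is in effect at the start of each hour. ALL_HOUR_STATUS then combines those hour-start rows with the original status records, so a status that spans into another hour is split into pieces and every piece fits within a one-hour span.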
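
To see how the hour-start status is found, the lookup can be run for one boundary on its own. This is a small sketch against the six-row test table above; the 02:00 boundary is just an example value:

```sql
-- Sketch: which status is in effect at 02:00?
-- KEEP (DENSE_RANK LAST ORDER BY INSERT_TIME) restricts the aggregate to the
-- row(s) with the latest INSERT_TIME among those at or before the boundary.
SELECT MAX(STATUS) KEEP (DENSE_RANK LAST ORDER BY INSERT_TIME) AS STATUS_AT_0200
FROM INSERT_TIME_STATUS
WHERE INSERT_TIME <= TIMESTAMP '2017-01-01 02:00:00';
```

With the test data, the latest record at or before 02:00 is the 01:30 one, so this yields AVAILABLE. For a boundary earlier than the first record the result is NULL.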

The DURATION_IN_STATUS subfactor computes the number of minutes each status was active within each hour.
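
The minute arithmetic relies on the fact that subtracting two TIMESTAMP values gives an INTERVAL DAY TO SECOND. A minimal sketch with two literal example timestamps:

```sql
-- Sketch: minutes between 03:10 and 04:00, computed from the interval parts.
SELECT EXTRACT(HOUR FROM D) * 60 + EXTRACT(MINUTE FROM D) AS MINUTES
FROM (SELECT TIMESTAMP '2017-01-01 04:00:00'
             - TIMESTAMP '2017-01-01 03:10:00' AS D
      FROM DUAL);
```

This returns 50. Seconds are ignored, and the DAY part of the interval is not counted, which is fine here because every piece is at most one hour long.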

The final query uses PIVOT to sum the amount of time each STATUS was active within each hour.
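
The PIVOT step can also be tried on its own. In this sketch the DURATIONS rows are typed in by hand (the minute values are those of hours 0 and 1 in the expected result); DURATIONS and MINUTES are illustrative names, not part of the query below:

```sql
-- Sketch: turn one row per (hour, status) into one row per hour.
WITH DURATIONS AS (
  SELECT 0 AS THE_HOUR, 'AVAILABLE' AS THE_STATUS, 15 AS MINUTES FROM DUAL UNION ALL
  SELECT 0, 'BUSY',          15 FROM DUAL UNION ALL
  SELECT 0, 'NOT AVAILABLE', 30 FROM DUAL UNION ALL
  SELECT 1, 'NOT AVAILABLE', 30 FROM DUAL UNION ALL
  SELECT 1, 'AVAILABLE',     30 FROM DUAL
)
SELECT THE_HOUR,
       COALESCE(AVAILABLE, 0)     AS AVAILABLE,      -- a status with no rows pivots to NULL
       COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
       COALESCE(BUSY, 0)          AS BUSY
FROM DURATIONS
PIVOT (SUM(MINUTES)
  FOR THE_STATUS
  IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
)
ORDER BY THE_HOUR;
```

Hour 1 has no BUSY row, so BUSY is NULL after the pivot; the COALESCE turns it into 0.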

WITH HOUR_OF_DAY AS (SELECT LEVEL - 1 AS THE_HOUR
                     FROM DUAL
                     CONNECT BY LEVEL < 25),
    CALENDAR AS (SELECT DAY_START
                 FROM (
                   SELECT (TIMESTAMP '2017-01-01 00:00:00' + NUMTODSINTERVAL(DATE_INCREMENT.OFFSET, 'DAY')) AS DAY_START
                   FROM (SELECT LEVEL - 1 AS OFFSET
                         FROM DUAL
                         CONNECT BY LEVEL < 9999) DATE_INCREMENT)
                 WHERE DAY_START BETWEEN (SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                          FROM INSERT_TIME_STATUS)
                 AND (SELECT MAX(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                      FROM INSERT_TIME_STATUS)),
    HOUR_CALENDAR AS (
     SELECT
       TO_CHAR(CALENDAR.DAY_START, 'MM/DD/YYYY')                                               AS THE_DAY,
       HOUR_OF_DAY.THE_HOUR,
       CALENDAR.DAY_START + NUMTODSINTERVAL(HOUR_OF_DAY.THE_HOUR, 'HOUR')                      AS HOUR_START,
       (SELECT MAX(INSERT_TIME_STATUS.STATUS)
       KEEP (DENSE_RANK LAST
         ORDER BY INSERT_TIME_STATUS.INSERT_TIME ASC)
        FROM INSERT_TIME_STATUS
        WHERE INSERT_TIME_STATUS.INSERT_TIME <= DAY_START + NUMTODSINTERVAL(THE_HOUR, 'HOUR')) AS HOUR_START_STATUS
     FROM CALENDAR
       CROSS JOIN HOUR_OF_DAY),
    ALL_HOUR_STATUS AS (
    SELECT
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      HOUR_CALENDAR.HOUR_START        AS THE_TIME,
      HOUR_CALENDAR.HOUR_START_STATUS AS THE_STATUS
    FROM HOUR_CALENDAR
    UNION ALL
    SELECT
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      INSERT_TIME_STATUS.INSERT_TIME AS THE_TIME,
      INSERT_TIME_STATUS.STATUS      AS THE_STATUS
    FROM HOUR_CALENDAR
      INNER JOIN INSERT_TIME_STATUS
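        -- match only events strictly inside this hour bucket; comparing the hour number
        -- alone would also match same-hour events on later days
        ON HOUR_CALENDAR.HOUR_START < INSERT_TIME_STATUS.INSERT_TIME
           AND INSERT_TIME_STATUS.INSERT_TIME < HOUR_CALENDAR.HOUR_START + NUMTODSINTERVAL(1, 'HOUR')),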
    DURATION_IN_STATUS AS (
     SELECT
       ALL_HOUR_STATUS.THE_DAY,
       ALL_HOUR_STATUS.THE_HOUR,
       ALL_HOUR_STATUS.THE_STATUS,
       (EXTRACT(HOUR FROM
                (COALESCE(LEAD(THE_TIME)
                          OVER (
                            PARTITION BY NULL
                            ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME)) * 60)
       +
       EXTRACT(MINUTE FROM
               (COALESCE(LEAD(THE_TIME)
                         OVER (
                           PARTITION BY NULL
                           ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME))
         AS DURATION_IN_STATUS
     FROM ALL_HOUR_STATUS)
SELECT
  THE_DAY,
  THE_HOUR,
  COALESCE(AVAILABLE, 0)     AS AVAILABLE,
  COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
  COALESCE(BUSY, 0)          AS BUSY
FROM DURATION_IN_STATUS
PIVOT (SUM(DURATION_IN_STATUS)
  FOR THE_STATUS
  IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
)
ORDER BY THE_DAY ASC, THE_HOUR ASC;

Result:

THE_DAY     THE_HOUR  AVAILABLE  NOT_AVAILABLE  BUSY  
01/01/2017  0         15         30             15    
01/01/2017  1         30         30             0     
01/01/2017  2         60         0              0     
01/01/2017  3         10         0              50    
01/01/2017  4         0          0              60    
01/01/2017  5         0          60             0     
01/01/2017  6         0          60             0     
01/01/2017  7         0          60             0     
01/01/2017  8         0          60             0     
01/01/2017  9         0          60             0     
01/01/2017  10        0          60             0     
01/01/2017  11        0          60             0     
01/01/2017  12        0          60             0     
01/01/2017  13        0          60             0     
01/01/2017  14        0          60             0     
01/01/2017  15        0          60             0     
01/01/2017  16        0          60             0     
01/01/2017  17        0          60             0     
01/01/2017  18        0          60             0     
01/01/2017  19        0          60             0     
01/01/2017  20        0          60             0     
01/01/2017  21        0          60             0     
01/01/2017  22        0          60             0     
01/01/2017  23        0          60             0     


24 rows selected. 

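This example query generates records for the whole day, so the last status, NOT AVAILABLE, is carried through to the end of the day. You can adjust this behavior if you would rather stop at the last assigned status.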
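
One possible adjustment, sketched here as an assumption rather than as part of the query above, is to generate hour buckets only from the hour of the first record up to the last record. BOUNDS, FIRST_HOUR and LAST_EVENT are illustrative names:

```sql
-- Sketch: hour buckets that stop at the last recorded status change.
WITH BOUNDS AS (
  SELECT TRUNC(MIN(INSERT_TIME), 'HH24') AS FIRST_HOUR,  -- start of the first event's hour
         MAX(INSERT_TIME)                AS LAST_EVENT
  FROM INSERT_TIME_STATUS
)
SELECT FIRST_HOUR + NUMTODSINTERVAL(LEVEL - 1, 'HOUR') AS HOUR_START
FROM BOUNDS
-- keep adding hours while the bucket starts before the last event
CONNECT BY FIRST_HOUR + NUMTODSINTERVAL(LEVEL - 1, 'HOUR') < LAST_EVENT
ORDER BY HOUR_START;
```

With the six-row test data this gives five buckets, 00:00 through 04:00, which matches the hours in the result requested in the question. If the last record did not fall exactly on an hour boundary, its hour would still be generated as a full bucket, so the duration calculation would need a matching cutoff as well.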

Edit: in response to your update, to evaluate these times per channel_id and user_id, here is another example:

First create the test table:

CREATE TABLE INSERT_TIME_STATUS(
  USER_ID NUMBER,
  CHANNEL_ID NUMBER,
  INSERT_TIME TIMESTAMP,
  STATUS VARCHAR2(128)
);

And load it (here user_id 1111 is on channels 3 and 4, while user_id 2222 is only on channel 3):

INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (1111,4,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:00','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:15','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 0:30','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 1:30','MM/DD/YYYY HH24:MI'),'AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 3:10','MM/DD/YYYY HH24:MI'),'BUSY');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');
INSERT INTO INSERT_TIME_STATUS VALUES (2222,3,TO_TIMESTAMP('1/1/2017 5:00','MM/DD/YYYY HH24:MI'),'NOT AVAILABLE');

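Then update the query to produce data per user_id and per channel_id. In this example, every hour is reported for each channel a user participates in. User 1111 gets counts for every hour of the day on channels 3 and 4, while user 2222 gets counts for every hour of the day on channel 3 only (if that user had records on another channel, that channel would be included as well).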
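
The set of (user, channel) pairs that drives the extra cross join can be checked separately. A small sketch against the test table just loaded:

```sql
-- Sketch: each distinct pair gets its own set of 24 hourly buckets per day.
SELECT DISTINCT USER_ID, CHANNEL_ID
FROM INSERT_TIME_STATUS
ORDER BY USER_ID, CHANNEL_ID;
```

For the test data this returns three pairs: (1111, 3), (1111, 4) and (2222, 3), so a single day produces 3 x 24 = 72 result rows.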

WITH HOUR_OF_DAY AS (SELECT LEVEL - 1 AS THE_HOUR
                     FROM DUAL
                     CONNECT BY LEVEL < 25),
    CALENDAR AS (SELECT DAY_START
                 FROM (
                   SELECT ((SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                            FROM INSERT_TIME_STATUS) + NUMTODSINTERVAL(DATE_INCREMENT.OFFSET, 'DAY')) AS DAY_START
                   FROM (SELECT LEVEL - 1 AS OFFSET
                         FROM DUAL
                         CONNECT BY LEVEL < 9999) DATE_INCREMENT)
                 WHERE DAY_START BETWEEN (SELECT MIN(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                                          FROM INSERT_TIME_STATUS)
                 AND (SELECT MAX(TRUNC(INSERT_TIME_STATUS.INSERT_TIME))
                      FROM INSERT_TIME_STATUS)),
    USER_CHANNEL_HOUR_CALENDAR AS (
     SELECT
       USER_ID,
       CHANNEL_ID,
       CALENDAR.DAY_START,
       TO_CHAR(CALENDAR.DAY_START, 'MM/DD/YYYY')                                               AS THE_DAY,
       HOUR_OF_DAY.THE_HOUR,
       CALENDAR.DAY_START + NUMTODSINTERVAL(HOUR_OF_DAY.THE_HOUR, 'HOUR')                      AS HOUR_START
     FROM CALENDAR
       CROSS JOIN HOUR_OF_DAY
       -- one set of hour buckets per (USER_ID, CHANNEL_ID) pair present in the data
       CROSS JOIN (SELECT UNIQUE USER_ID, CHANNEL_ID FROM INSERT_TIME_STATUS)
  ),
    HOUR_CALENDAR AS (
     SELECT USER_ID,
       CHANNEL_ID,
       THE_DAY,
       THE_HOUR,
       DAY_START,
       HOUR_START,
       (SELECT MAX(INSERT_TIME_STATUS.STATUS)
       KEEP (DENSE_RANK LAST
         ORDER BY INSERT_TIME_STATUS.INSERT_TIME ASC)
        FROM INSERT_TIME_STATUS
        WHERE INSERT_TIME_STATUS.INSERT_TIME <= DAY_START + NUMTODSINTERVAL(THE_HOUR, 'HOUR')
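              -- qualify the outer columns: an unqualified USER_ID or CHANNEL_ID here would
              -- resolve to INSERT_TIME_STATUS itself and the condition would always be true
              AND INSERT_TIME_STATUS.USER_ID = USER_CHANNEL_HOUR_CALENDAR.USER_ID
              AND INSERT_TIME_STATUS.CHANNEL_ID = USER_CHANNEL_HOUR_CALENDAR.CHANNEL_ID) AS HOUR_START_STATUS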
     FROM USER_CHANNEL_HOUR_CALENDAR),
    ALL_HOUR_STATUS AS (
    SELECT
      HOUR_CALENDAR.USER_ID,
      HOUR_CALENDAR.CHANNEL_ID,
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      HOUR_CALENDAR.HOUR_START        AS THE_TIME,
      HOUR_CALENDAR.HOUR_START_STATUS AS THE_STATUS
    FROM HOUR_CALENDAR
    UNION ALL
    SELECT
      INSERT_TIME_STATUS.USER_ID,
      INSERT_TIME_STATUS.CHANNEL_ID,
      HOUR_CALENDAR.THE_DAY,
      HOUR_CALENDAR.THE_HOUR,
      INSERT_TIME_STATUS.INSERT_TIME AS THE_TIME,
      INSERT_TIME_STATUS.STATUS      AS THE_STATUS
    FROM HOUR_CALENDAR
      INNER JOIN INSERT_TIME_STATUS
        ON HOUR_CALENDAR.HOUR_START < INSERT_TIME_STATUS.INSERT_TIME
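           -- upper bound of the hour bucket; comparing the hour number alone would also
           -- match same-hour events on later days
           AND INSERT_TIME_STATUS.INSERT_TIME < HOUR_CALENDAR.HOUR_START + NUMTODSINTERVAL(1, 'HOUR')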
           AND HOUR_CALENDAR.USER_ID = INSERT_TIME_STATUS.USER_ID
           AND HOUR_CALENDAR.CHANNEL_ID = INSERT_TIME_STATUS.CHANNEL_ID),
    DURATION_IN_STATUS AS (
     SELECT
       ALL_HOUR_STATUS.USER_ID,
       ALL_HOUR_STATUS.CHANNEL_ID,
       ALL_HOUR_STATUS.THE_DAY,
       ALL_HOUR_STATUS.THE_HOUR,
       ALL_HOUR_STATUS.THE_STATUS,
       (EXTRACT(HOUR FROM
                (COALESCE(LEAD(THE_TIME)
                          OVER (
                            PARTITION BY USER_ID, CHANNEL_ID
                            ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME)) * 60)
       +
       EXTRACT(MINUTE FROM
               (COALESCE(LEAD(THE_TIME)
                         OVER (
                           PARTITION BY USER_ID, CHANNEL_ID
                           ORDER BY THE_TIME ASC ), TO_TIMESTAMP(THE_DAY, 'MM/DD/YYYY') + NUMTODSINTERVAL(THE_HOUR + 1, 'HOUR')) - THE_TIME))
         AS DURATION_IN_STATUS
     FROM ALL_HOUR_STATUS)
SELECT
  USER_ID,
  CHANNEL_ID,
  THE_DAY,
  THE_HOUR,
  COALESCE(AVAILABLE, 0)     AS AVAILABLE,
  COALESCE(NOT_AVAILABLE, 0) AS NOT_AVAILABLE,
  COALESCE(BUSY, 0)          AS BUSY
FROM DURATION_IN_STATUS
PIVOT (SUM(DURATION_IN_STATUS)
  FOR THE_STATUS
  IN ('AVAILABLE' AS AVAILABLE, 'NOT AVAILABLE' AS NOT_AVAILABLE, 'BUSY' AS BUSY)
)
  -- You can additionally filter the result
  -- WHERE CHANNEL_ID IN (3,4)
  -- WHERE USER_ID = 12345
  -- WHERE TO_DATE(THE_DAY, 'MM/DD/YYYY') > DATE '2017-01-01'
  -- etc.
ORDER BY USER_ID ASC, CHANNEL_ID ASC, THE_DAY ASC, THE_HOUR ASC;

Then test it:

USER_ID  CHANNEL_ID  THE_DAY     THE_HOUR  AVAILABLE  NOT_AVAILABLE  BUSY  
1111     3           01/01/2017  0         15         30             15    
1111     3           01/01/2017  1         30         30             0     
1111     3           01/01/2017  2         60         0              0     
1111     3           01/01/2017  3         10         0              50    
1111     3           01/01/2017  4         0          0              60    
1111     3           01/01/2017  5         0          60             0     
1111     3           01/01/2017  6         0          60             0  
...
1111     3           01/01/2017  23        0          60             0     
1111     4           01/01/2017  0         15         30             15    
1111     4           01/01/2017  1         30         30             0     
1111     4           01/01/2017  2         60         0              0     
1111     4           01/01/2017  3         10         0              50    
1111     4           01/01/2017  4         0          0              60    
1111     4           01/01/2017  5         0          60             0     
1111     4           01/01/2017  6         0          60             0
...
1111     4           01/01/2017  23        0          60             0     
2222     3           01/01/2017  0         15         30             15    
2222     3           01/01/2017  1         30         30             0     
2222     3           01/01/2017  2         60         0              0     
2222     3           01/01/2017  3         10         0              50    
2222     3           01/01/2017  4         0          0              60    
2222     3           01/01/2017  5         0          60             0     
2222     3           01/01/2017  6         0          60             0 

【Comments】:

  • This is a great solution. I tried to add two more columns to this table, but I got stuck on the syntax. The data looks the same, but per user_id and channel_id, e.g.:
      user_id  channel_id  insert_time    status
      1111     3           1/1/2017 0:00  AVAILABLE
      1111     3           1/1/2017 0:15  BUSY
      1111     3           1/1/2017 0:30  NOT AVAILABLE
      1111     3           1/1/2017 1:30  AVAILABLE
      1111     4           1/1/2017 0:00  AVAILABLE
      1111     4           1/1/2017 0:15  BUSY
      1111     4           1/1/2017 0:30  NOT AVAILABLE
      1111     4           1/1/2017 1:30  AVAILABLE
      2222     3           1/1/2017 0:00  AVAILABLE
      2222     3           1/1/2017 0:15  BUSY
      2222     3           1/1/2017 0:30  NOT AVAILABLE
      2222     3           1/1/2017 1:30  AVAILABLE
  • I would like the aggregated result table to look like this:
      user_id  channel_id  day       hour  available minutes  not available minutes  busy minutes
      1111     3           1/1/2017  0     15                 30                     15
      1111     3           1/1/2017  1     30                 30                     0
      1111     3           1/1/2017  2     60                 0                      0
      1111     3           1/1/2017  3     10                 0                      50
      1111     4           1/1/2017  0     15                 30                     15
      1111     4           1/1/2017  1     30                 30                     0
      1111     4           1/1/2017  2     60                 0                      0
      1111     4           1/1/2017  3     10                 0                      50
      2222     3           1/1/2017  0     15                 30                     15
      2222     3           1/1/2017  1     30                 30                     0
      2222     3           1/1/2017  2     60                 0                      0
      2222     3           1/1/2017  3     10                 0                      50
    I tried to add it, but I got confused. Can you help me add this data? Thanks, E
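  • Thanks @user4810167. I don't see user_id and channel_id in the data in your post. Without seeing where that data comes from, I can't tell how to evaluate them. If the current question has been answered, it might help to accept the answer here and ask a new question that focuses on the problems that come up when the user_id and channel_id columns are added to the table. Thanks
  • I opened a new question and would appreciate it if you could take a look: stackoverflow.com/questions/43996771/… Thanks!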