【Question title】: Postgres group results with window function lag returns 0 rows
【Posted】: 2018-06-12 03:36:15
【Question】:

I'm trying to run a query in which I want to skip the first and the last rows of the result. To do that, I gave window functions a try, with the following query:

SELECT lag(timestamp_min)    OVER (ORDER BY timestamp_min) AS timestamp_min,
       lag(type)             OVER (ORDER BY timestamp_min) AS type,
       lag(sum_first_medium) OVER (ORDER BY timestamp_min)
FROM (SELECT to_timestamp(
                floor(
                   (extract('epoch' FROM TIMESTAMP) / 300)
                ) * 300
             ) AS timestamp_min,
             type,
             floor(sum(medium[1])) AS sum_first_medium
      FROM default_dataset
      WHERE type = 'ap_clients.wlan0'
        AND timestamp > current_timestamp - INTERVAL '85 minutes'
        AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
      GROUP BY timestamp_min, type) lagme
OFFSET 2;

The problem is that this query returns nothing:

ws_controller_hist=> SELECT lag(timestamp_min) OVER (ORDER BY timestamp_min) AS timestamp_min, lag(type) OVER (ORDER BY timestamp_min) AS type, lag(sum_first_medium) OVER (ORDER BY timestamp_min) FROM (SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) AS timestamp_min, type, floor(sum(medium[1])) AS sum_first_medium FROM default_dataset WHERE type = 'ap_clients.wlan0' AND timestamp > current_timestamp - INTERVAL '85 minutes' AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638' GROUP BY timestamp_min, type) lagme OFFSET 2;
 timestamp_min | type | lag
---------------+------+-----
(0 rows)
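Window functions such as lag() compute a value per row but never add or remove rows, so the outer query returns exactly as many rows as the inner grouped query, minus the two skipped by OFFSET 2. The sketch below checks that, using an in-memory SQLite database as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; SQLite 3.25 or newer is needed for window functions). The table `buckets` and the helper `lagged_rows` are hypothetical, and an outer ORDER BY is added because OFFSET without ORDER BY does not guarantee which rows are skipped.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumes SQLite >= 3.25 for lag()).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buckets (timestamp_min INTEGER, sum_first_medium INTEGER)")


def lagged_rows(n_rows):
    """Load n_rows hypothetical 5-minute buckets, then apply lag() + OFFSET 2."""
    conn.execute("DELETE FROM buckets")
    conn.executemany(
        "INSERT INTO buckets VALUES (?, ?)",
        [(i * 300, 40 + i) for i in range(n_rows)],
    )
    return conn.execute(
        """
        SELECT lag(timestamp_min)    OVER (ORDER BY timestamp_min),
               lag(sum_first_medium) OVER (ORDER BY timestamp_min)
        FROM buckets
        ORDER BY timestamp_min   -- make the OFFSET deterministic
        LIMIT -1 OFFSET 2        -- SQLite needs LIMIT before OFFSET; -1 means no limit
        """
    ).fetchall()


for n in (0, 2, 3, 5):
    print(n, lagged_rows(n))
```

With five buckets the surviving lagged values come from buckets 1 to 3, so the first and last buckets are dropped, which is the goal; with two or fewer input rows the result is empty. An empty result therefore suggests the inner grouped query itself returned at most two rows, which can be checked by running that subquery on its own.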

But I do have data of type 'ap_clients.wlan0':

ws_controller_hist=> select * from default_dataset where type ='ap_clients.wlan0' order by timestamp desc limit 3;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organization_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+--------------------------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
(3 rows)

I need a query that retrieves the sum of all medium values over the last hour, grouped into 5-minute buckets.
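The 5-minute grouping in these queries rests on one piece of arithmetic: take the Unix epoch, integer-divide it by 300 seconds, multiply back, and turn the result into a timestamp. Below is a minimal sketch of that bucketing, run in SQLite rather than PostgreSQL, so `strftime('%s', ts)` and `datetime(..., 'unixepoch')` replace `extract(epoch FROM timestamp)` and `to_timestamp(...)`. The `samples` table, its `ts` and `medium_first` columns (a scalar stand-in for `medium[1]`, since SQLite has no arrays) and the values are hypothetical, and the one-hour filter is left out so the output does not depend on the clock.

```python
import sqlite3

# SQLite stand-in for PostgreSQL; medium_first is a hypothetical scalar column
# playing the role of medium[1].
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts TEXT, medium_first INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [
        ("2018-01-02 10:01:08", 3),
        ("2018-01-02 10:02:30", 4),
        ("2018-01-02 10:06:08", 5),
        ("2018-01-02 10:11:08", 6),
    ],
)

# Same idea as to_timestamp(floor(extract(epoch FROM ts) / 300) * 300):
# integer-divide the epoch by 300 s, multiply back, convert to a timestamp.
BUCKET_SQL = """
SELECT datetime(CAST(strftime('%s', ts) AS INTEGER) / 300 * 300, 'unixepoch') AS timestamp_min,
       SUM(medium_first) AS sum_first_medium
FROM samples
GROUP BY timestamp_min
ORDER BY timestamp_min
"""

buckets = conn.execute(BUCKET_SQL).fetchall()
for row in buckets:
    print(row)
```

Because epoch values here are non-negative, integer division behaves like floor(), so 10:01:08 and 10:02:30 both land in the 10:00:00 bucket.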

My first approach was to skip the first record with OFFSET 1 and, to skip the last one, to add a restriction on my id field, ordering by timestamp DESC:

ws_controller_hist=>  
SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) 
AS timestamp_min,
       TYPE,
       floor(sum(medium[1]))
FROM default_dataset
WHERE TYPE LIKE 'ap_clients.wlan0'
  AND TIMESTAMP > CURRENT_TIMESTAMP - interval '85 minutes'
  AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
  AND id NOT IN
    (SELECT id
     FROM default_dataset
     ORDER BY TIMESTAMP DESC
     LIMIT 1)
GROUP BY timestamp_min,
         TYPE
ORDER BY timestamp_min ASC
OFFSET 1;

     timestamp_min      |       type       | floor
------------------------+------------------+-------
 2017-12-19 14:20:00+00 | ap_clients.wlan0 |    38
 2017-12-19 14:25:00+00 | ap_clients.wlan0 |    37
 2017-12-19 14:30:00+00 | ap_clients.wlan0 |    39
 2017-12-19 14:35:00+00 | ap_clients.wlan0 |    42
 2017-12-19 14:40:00+00 | ap_clients.wlan0 |    43
 2017-12-19 14:45:00+00 | ap_clients.wlan0 |    44
 2017-12-19 14:50:00+00 | ap_clients.wlan0 |    45
 2017-12-19 14:55:00+00 | ap_clients.wlan0 |    45
 2017-12-19 15:00:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:05:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:10:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:15:00+00 | ap_clients.wlan0 |    52
 2017-12-19 15:20:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:25:00+00 | ap_clients.wlan0 |    53
 2017-12-19 15:30:00+00 | ap_clients.wlan0 |    49
 2017-12-19 15:35:00+00 | ap_clients.wlan0 |    39
 2017-12-19 15:40:00+00 | ap_clients.wlan0 |    16

But this last query does not skip the last record: I get the same rows as without the subquery "and id not in (select id from default_dataset order by timestamp desc limit 1)".
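One plausible reason, offered as an assumption rather than a confirmed diagnosis: the subquery `SELECT id FROM default_dataset ORDER BY timestamp DESC LIMIT 1` carries none of the outer filters, so it picks the newest row of the whole table, which need not be an 'ap_clients.wlan0' row at all. Even when scoped, it removes a single raw row, not a whole 5-minute bucket. The sketch below shows the first effect with SQLite standing in for PostgreSQL; the table contents, the `ts` column name and the helper `ids` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE default_dataset (id TEXT, ts TEXT, type TEXT, medium_first INTEGER)")
conn.executemany(
    "INSERT INTO default_dataset VALUES (?, ?, ?, ?)",
    [
        ("a", "2018-01-02 10:01:08", "ap_clients.wlan0", 1),
        ("b", "2018-01-02 10:06:08", "ap_clients.wlan0", 2),
        ("c", "2018-01-02 10:07:00", "other.metric", 9),  # hypothetical newest row, other type
    ],
)

BASE = "SELECT id FROM default_dataset WHERE type = 'ap_clients.wlan0'"


def ids(extra=""):
    """Run the base query plus an optional extra condition, ordered by time."""
    return [r[0] for r in conn.execute(BASE + extra + " ORDER BY ts")]


unfiltered = ids()
# Unscoped subquery: the newest row overall is 'c', which the outer filter excludes anyway.
unscoped = ids(" AND id NOT IN (SELECT id FROM default_dataset ORDER BY ts DESC LIMIT 1)")
# Scoped subquery: the newest 'ap_clients.wlan0' row is 'b', so it really is removed.
scoped = ids(
    " AND id NOT IN (SELECT id FROM default_dataset"
    " WHERE type = 'ap_clients.wlan0' ORDER BY ts DESC LIMIT 1)"
)
print(unfiltered, unscoped, scoped)
```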

If I run a query to look at the rows of type 'ap_clients.wlan0':

ws_controller_hist=> select * from default_dataset where organization_id='ce4b69af-bdce-4f1b-ba71-dd03544205d5' and type ='ap_clients.wlan0' order by timestamp desc limit 5;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organization_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+--------------------------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 e67baf28-6d5b-43a5-85e2-fcf2d04a0b2e | 2018-01-02 10:06:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}
 c7ce16ce-9cda-423f-b32b-f4d6dce859e6 | 2018-01-02 10:01:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b-ba71-dd03544205d5 | {time,clients}

What can I do?

【Question comments】:

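  • Will you, as in your example, have data for every possible 5-minute slot in your chosen time window? If so, why not just select the slots you want, rather than selecting an extra slot at the start and another at the end and then trying to remove them?
  • I have edited my question and added a new query for type ap_clients.wlan0.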
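The first comment's suggestion can be sketched as computing the slot boundaries up front: end the window at the start of the current, still-incomplete 5-minute slot and start it a whole number of slots earlier, so no partial bucket enters the result and nothing has to be trimmed. A minimal sketch, assuming epoch-second timestamps in a hypothetical `samples` table, SQLite in place of PostgreSQL, a hypothetical helper `slot_bounds`, and a fixed "now" so the output is reproducible:

```python
import sqlite3

SLOT = 300  # slot length in seconds (5 minutes)


def slot_bounds(now_epoch, n_slots):
    """Return (start, end) covering the n_slots complete slots before the current one."""
    end = now_epoch // SLOT * SLOT  # start of the current, still-filling slot
    start = end - n_slots * SLOT
    return start, end


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (epoch INTEGER, medium_first INTEGER)")
base = 1514887200  # 2018-01-02 10:00:00 UTC
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(base + 68, 1), (base + 368, 2), (base + 668, 3), (base + 968, 4), (base + 1268, 5)],
)

now = base + 1268  # pretend the query runs at 10:21:08
start, end = slot_bounds(now, 3)  # 12 slots would cover a full hour
rows = conn.execute(
    """
    SELECT datetime(epoch / 300 * 300, 'unixepoch') AS timestamp_min,
           SUM(medium_first) AS sum_first_medium
    FROM samples
    WHERE epoch >= ? AND epoch < ?   -- half-open window: whole slots only
    GROUP BY timestamp_min
    ORDER BY timestamp_min
    """,
    (start, end),
).fetchall()
print(rows)
```

The 10:01:08 row falls before the window and the 10:21:08 row belongs to the unfinished current slot, so only the three complete slots in between are returned. In PostgreSQL the end bound could be computed in SQL as to_timestamp(floor(extract(epoch FROM now()) / 300) * 300), with the start bound being that value minus interval '60 minutes' for twelve slots.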

Tags: sql postgresql postgresql-9.3


【Solution 1】:

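A simple solution is to use the lag and lead window functions with an argument that cannot be NULL. That way lag returns NULL for the first row and lead returns NULL for the last row, so you can simply keep the rows where both are NOT NULL: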

SELECT
    t2.timestamp_min,
    t2.type,
    t2.sum_first_medium
FROM (
    SELECT
        t1.*,
        lead(1) OVER(ORDER BY t1.timestamp_min) AS lead,
        lag(1) OVER(ORDER BY t1.timestamp_min) AS lag
    FROM (
        SELECT
            to_timestamp(
              floor(
                (extract('epoch' FROM TIMESTAMP) / 300)
              ) * 300
            ) AS timestamp_min,
            type,
            floor(sum(medium[1])) AS sum_first_medium
        FROM default_dataset
        WHERE
            type = 'ap_clients.wlan0'
            AND timestamp > current_timestamp - INTERVAL '85 minutes'
            AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
        GROUP BY timestamp_min, type
    ) t1
) t2
WHERE
    t2.lag IS NOT NULL -- Only first row will return NULL, skip it
    AND t2.lead IS NOT NULL -- Only last row will return NULL, skip it
ORDER BY t2.timestamp_min

Note that I used lead(1) and lag(1) only because 1 is a non-NULL expression; you can use any non-NULL expression, even a column (as long as it is guaranteed to be NOT NULL).
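A self-contained check of the lead(1)/lag(1) filter, using SQLite (3.25 or newer) in place of PostgreSQL and a hypothetical pre-aggregated table `t1` holding four of the bucket sums shown in the question; the grouping step is omitted and the marker aliases are renamed to `prev_marker`/`next_marker` for readability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (timestamp_min TEXT, sum_first_medium INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        ("2017-12-19 14:20:00", 38),
        ("2017-12-19 14:25:00", 37),
        ("2017-12-19 14:30:00", 39),
        ("2017-12-19 14:35:00", 42),
    ],
)

trimmed = conn.execute(
    """
    SELECT timestamp_min, sum_first_medium
    FROM (
        SELECT t1.*,
               lead(1) OVER (ORDER BY timestamp_min) AS next_marker,  -- NULL only on the last row
               lag(1)  OVER (ORDER BY timestamp_min) AS prev_marker   -- NULL only on the first row
        FROM t1
    ) t2
    WHERE prev_marker IS NOT NULL
      AND next_marker IS NOT NULL
    ORDER BY timestamp_min
    """
).fetchall()
print(trimmed)
```

Both window calls share the same ORDER BY, so the rows need to be sorted only once for them.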

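Another possible solution is to apply two row_number() calls, one with ORDER BY timestamp_min ASC and another with ORDER BY timestamp_min DESC, and then keep the rows where both are <> 1. But that requires two sorts of the data set (one for ASC, one for DESC), whereas the lag/lead solution needs only one (though it may be harder to understand).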
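The row_number() alternative can be sketched the same way, again with SQLite (3.25 or newer) standing in for PostgreSQL and the same hypothetical table `t1`; the column names `rn_asc` and `rn_desc` are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (timestamp_min TEXT, sum_first_medium INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        ("2017-12-19 14:20:00", 38),
        ("2017-12-19 14:25:00", 37),
        ("2017-12-19 14:30:00", 39),
        ("2017-12-19 14:35:00", 42),
    ],
)

trimmed = conn.execute(
    """
    SELECT timestamp_min, sum_first_medium
    FROM (
        SELECT t1.*,
               row_number() OVER (ORDER BY timestamp_min ASC)  AS rn_asc,   -- 1 = first row
               row_number() OVER (ORDER BY timestamp_min DESC) AS rn_desc   -- 1 = last row
        FROM t1
    ) t2
    WHERE rn_asc <> 1
      AND rn_desc <> 1
    ORDER BY timestamp_min
    """
).fetchall()
print(trimmed)
```

It returns the same rows as the lead/lag version, at the cost of the second ordering mentioned above.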

【Comments】:
