【Question title】: Redshift SQL - Count Sequences of Repeating Values Within Groups
【Posted】: 2021-10-14 17:11:37
【Question】:

I have a table that looks like this:

| id |      date_start     |    gap_7_days   |
| -- | ------------------- | --------------- |
|  1 | 2021-06-10 00:00:00 |        0        |
|  1 | 2021-06-13 00:00:00 |        0        |
|  1 | 2021-06-19 00:00:00 |        0        |
|  1 | 2021-06-27 00:00:00 |        0        |
|  2 | 2021-07-04 00:00:00 |        1        |
|  2 | 2021-07-11 00:00:00 |        1        |
|  2 | 2021-07-18 00:00:00 |        1        |
|  2 | 2021-07-25 00:00:00 |        1        |
|  2 | 2021-08-01 00:00:00 |        1        |
|  2 | 2021-08-08 00:00:00 |        1        |
|  2 | 2021-08-09 00:00:00 |        0        |
|  2 | 2021-08-16 00:00:00 |        1        |
|  2 | 2021-08-23 00:00:00 |        1        |
|  2 | 2021-08-30 00:00:00 |        1        |
|  2 | 2021-08-31 00:00:00 |        0        |
|  2 | 2021-09-01 00:00:00 |        0        |
|  2 | 2021-08-08 00:00:00 |        1        |
|  2 | 2021-08-15 00:00:00 |        1        |
|  2 | 2021-08-22 00:00:00 |        1        |
|  2 | 2021-08-23 00:00:00 |        1        |

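For each ID, I check whether consecutive date_start values are 7 days apart and set gap_7_days to 1 or 0 accordingly.

I want to do the following (using Redshift SQL only):

  1. Get the length of each run of consecutive 1s in gap_7_days for each ID

Expected output: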

| id |      date_start     |    gap_7_days   | sequence_length |
| -- | ------------------- | --------------- | --------------- |
|  1 | 2021-06-10 00:00:00 |        0        |                 |
|  1 | 2021-06-13 00:00:00 |        0        |                 |
|  1 | 2021-06-19 00:00:00 |        0        |                 |
|  1 | 2021-06-27 00:00:00 |        0        |                 |
|  2 | 2021-07-04 00:00:00 |        1        |        6        |
|  2 | 2021-07-11 00:00:00 |        1        |        6        |
|  2 | 2021-07-18 00:00:00 |        1        |        6        |
|  2 | 2021-07-25 00:00:00 |        1        |        6        |
|  2 | 2021-08-01 00:00:00 |        1        |        6        |
|  2 | 2021-08-08 00:00:00 |        1        |        6        |
|  2 | 2021-08-09 00:00:00 |        0        |                 |
|  2 | 2021-08-16 00:00:00 |        1        |        3        |
|  2 | 2021-08-23 00:00:00 |        1        |        3        |
|  2 | 2021-08-30 00:00:00 |        1        |        3        |
|  2 | 2021-08-31 00:00:00 |        0        |                 |
|  2 | 2021-09-01 00:00:00 |        0        |                 |
|  2 | 2021-08-08 00:00:00 |        1        |        4        |
|  2 | 2021-08-15 00:00:00 |        1        |        4        |
|  2 | 2021-08-22 00:00:00 |        1        |        4        |
|  2 | 2021-08-23 00:00:00 |        1        |        4        |

  2. Get the number of sequences for each ID

Expected output:

| id |    num_sequences    |
| -- | ------------------- |
|  1 |          0          |
|  2 |          3          |

How can I do this?

【Question discussion】:

    Tags: sql count amazon-redshift


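    【Solution 1】:

    If you only want the number of sequences, just look at the previous value. When the current value is 1 and the previous value is NULL or 0, a new sequence starts.

    So: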

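    -- Flag each row that starts a new run of 1s, then add up the flags per id.
    select id,
           sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
    from (select t.*,
                 -- previous row's flag within the same id; NULL on the first row
                 lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
          from t
         ) t
    group by id;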
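
To see the start-of-sequence rule at work, the query can be tried on a tiny table. This is only a sketch: the table name `t_demo` and its six rows are hypothetical and are not the data from the question.

```sql
-- Hypothetical demo data (not from the question): id 1 has no 1s, id 2 has two runs of 1s.
create temp table t_demo (id int, date_start timestamp, gap_7_days int);

insert into t_demo values
    (1, '2021-06-10', 0),
    (1, '2021-06-13', 0),
    (2, '2021-07-04', 1),   -- starts run 1 (no previous row)
    (2, '2021-07-11', 1),
    (2, '2021-07-12', 0),
    (2, '2021-07-19', 1);   -- starts run 2 (previous value is 0)

-- Same logic as the query above, pointed at the demo table.
select id,
       sum( (gap_7_days = 1 and coalesce(prev_gap_7_days, 0) = 0)::int ) as num_sequences
from (select d.*,
             lag(gap_7_days) over (partition by id order by date_start) as prev_gap_7_days
      from t_demo d
     ) d
group by id
order by id;
-- id 1 -> 0, id 2 -> 2
```

The `coalesce(..., 0)` is what lets the first row of an id count as a start, since `lag()` returns NULL there.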
    

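    If you actually want the lengths of the sequences, as in the intermediate result, then ask a new question. That information is not needed to answer this one.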
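
For readers who do need the per-row lengths from the first part of the question, one possible approach extends the same `lag()` idea: a running sum of the start flags numbers each run, and a windowed count gives its size. This is a sketch, not part of the original answer. It assumes the table is named `t` as above and that ordering by `date_start` within each `id` gives the intended row order. The sample rows in the question are not strictly ordered by `date_start` and contain repeated dates, so real data may need an extra tiebreaker column. The names `flagged`, `grouped`, `is_start` and `run_id` are made up for illustration.

```sql
with flagged as (
    select t.*,
           -- 1 on the first row of each run of 1s, otherwise 0
           case when gap_7_days = 1
                 and coalesce(lag(gap_7_days) over (partition by id order by date_start), 0) = 0
                then 1 else 0
           end as is_start
    from t
),
grouped as (
    select flagged.*,
           -- running count of run starts; all rows of one run of 1s share the same run_id
           -- (Redshift requires an explicit frame when an aggregate window has ORDER BY)
           sum(is_start) over (partition by id order by date_start
                               rows between unbounded preceding and current row) as run_id
    from flagged
)
select id,
       date_start,
       gap_7_days,
       -- rows with gap_7_days = 0 get NULL; including gap_7_days in the partition
       -- keeps the 0 rows that follow a run from being counted with it
       case when gap_7_days = 1
            then count(*) over (partition by id, gap_7_days, run_id)
       end as sequence_length
from grouped
order by id, date_start;
```

Taking `max(run_id)` per `id` from the `grouped` step would also give the number of sequences, which matches what the shorter query above computes.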

    【Discussion】:
