【Question title】: View to get the minimum date with a complicated condition
【Posted】: 2021-05-02 02:28:44
【Question】:

I have a table like this in SQL Server:

+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
DateFrom: date not null -- unique for each EmployeeID
Completed: bit not null
EmployeeID: bigint not null
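  • Each row is a sub-period, defined by its start date, and is either completed or not.
  • Each employee can have several sub-periods.
  • A period is an ordered list of sub-periods that runs until the last one is completed.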
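
As a minimal sketch, the table described above could be declared as follows. The table name TableDates is the one used in the attempted query in this question; the constraint name PK_TableDates is a hypothetical choice, and using (EmployeeID, DateFrom) as the key is an assumption based on DateFrom being unique for each EmployeeID.

```sql
-- Sketch only: PK_TableDates is a hypothetical constraint name.
CREATE TABLE TableDates (
    DateFrom   date   NOT NULL,
    Completed  bit    NOT NULL,
    EmployeeID bigint NOT NULL,
    -- DateFrom is unique per employee, so the pair can serve as the key
    CONSTRAINT PK_TableDates PRIMARY KEY (EmployeeID, DateFrom)
);
```

A key in this order would probably also help any query that partitions by EmployeeID and orders by DateFrom, although that has not been measured here.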

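I want to create a view that returns, for each EmployeeID, the start date of the last period, as follows:

  1. If no row has Completed = true, take the minimum DateFrom. [The employee has a single period that is not yet completed]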
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   false   |     1      |
|2021-01-09|   false   |     1      |
|2021-01-10|   false   |     1      |
|2021-01-07|   false   |     2      |
|2021-01-15|   false   |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-01 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
  2. Otherwise, return the minimum DateFrom after the last row where Completed is true. [The last period is not yet completed]
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   true    |     1      |
|2021-01-09|   false   |     1      |
|2021-01-10|   false   |     1      |
|2021-01-07|   true    |     2      |
|2021-01-15|   false   |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-15 for EmployeeID = 2
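  3. If the row with the maximum DateFrom has Completed = true, return the minimum DateFrom that lies before that last completed row and after the completed row preceding it, if one exists. [The last period is completed and has several sub-periods]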
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   true    |     1      |
|2021-01-09|   false   |     1      |
|2021-01-10|   true    |     1      |
|2021-01-07|   false   |     2      |
|2021-01-15|   true    |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-09 for EmployeeID = 1
2021-01-07 for EmployeeID = 2
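  4. If the row with the maximum DateFrom has Completed = true and there is no other row, or the row before it also has Completed = true, return the maximum DateFrom. [The last period is completed with a single sub-period]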
+----------+-----------+------------+
| DateFrom | Completed | EmployeeID |
+----------+-----------+------------+
|2021-01-01|   false   |     1      |
|2021-01-05|   false   |     1      |
|2021-01-09|   true    |     1      |
|2021-01-10|   true    |     1      |
|2021-01-07|   true    |     2      |
+----------+-----------+------------+

Expected Result:
2021-01-10 for EmployeeID = 1
2021-01-07 for EmployeeID = 2

I am looking for the most optimized solution.

I tried this, but I get a NULL value in the third example:

WITH T AS (
    SELECT EmployeeID
        , MAX(CASE WHEN Completed = 0 THEN NULL ELSE DateFrom END) MaxDateFrom 
    FROM TableDates
    GROUP BY EmployeeID
)
SELECT TableDates.EmployeeID, MIN(TableDates.DateFrom) DateFrom
FROM T
LEFT JOIN TableDates ON T.EmployeeID = TableDates.EmployeeID
    AND (T.MaxDateFrom IS NULL OR TableDates.DateFrom > T.MaxDateFrom)
GROUP BY TableDates.EmployeeID

【Comments】:

    Tags: sql sql-server tsql view minimum


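    【Solution 1】:

    I think you just need conditional aggregation, with a fair amount of logic. Assuming you have a row for every day, I think this does what you want: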

    select employeeid,
           (case when -- case 4: every row is completed
                      -- (completed is a bit column, which min()/max() reject, hence the cast)
                      min(cast(completed as int)) = 1
                 then max(datefrom)
                 when -- case 1: no row is completed
                      max(cast(completed as int)) = 0
                 then min(datefrom)
                 when -- case 3: the latest row is completed
                      max(datefrom) = max(case when completed = 'true' then datefrom end)
                 then min(case when completed_seqnum = 1 then datefrom end)
                 -- case 2: the day after the last completed row (assumes a row for every day)
                 else dateadd(day, 1, max(case when completed = 'true' then datefrom end))
            end)
    from (select t.*,
                 -- running count of completed rows, newest first
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by employeeid;
    

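    Requiring a row for every day is really just a convenience: it lets the code add one day to the date of a particular "true" row to get the next date. The same thing can be done with lead() in a subquery.

    Note: this does not handle every condition (at least not with a non-NULL date). For example, it returns NULL when the data ends with a run of "true" values.

    If that is a problem: this version of your question has already been answered. Ask a new question with appropriate sample data and desired results. I also think you could explain the problem you are trying to solve and simplify the explanation.

    EDIT:

    If dates are missing, you can use: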

    select employeeid,
           (case when -- case 4: every row is completed
                      -- (completed is a bit column, which min()/max() reject, hence the cast)
                      min(cast(completed as int)) = 1
                 then max(datefrom)
                 when -- case 1: no row is completed
                      max(cast(completed as int)) = 0
                 then min(datefrom)
                 when -- case 3: the latest row is completed
                      max(datefrom) = max(case when completed = 'true' then datefrom end)
                 then min(case when completed_seqnum = 1 then datefrom end)
                 -- case 2: the row that follows the last completed row
                 else max(case when completed = 'true' then next_datefrom end)
            end)
    from (select t.*,
                 -- date of the next row for the same employee
                 lead(datefrom) over (partition by employeeid order by datefrom) as next_datefrom,
                 -- running count of completed rows, newest first
                 sum(case when completed = 'true' then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
          from t
         ) t
    group by employeeid;
    

    【Comments】:

    • Thanks a lot! Just note that we do not have a row for every day.
    • @FaresDellel Just use lead() in the subquery and use the next date instead of dateadd(). Your sample data does have a row for every day.
    • We can optimize the solution like this: select EmployeeID, (case when min(completed_seqnum) = 0 then min(case when completed_seqnum = 0 then DateFrom end) else min(case when completed_seqnum = 1 then DateFrom end) end) FinalResult from (select t.*, sum(case when completed = 'true' then 1 else 0 end) over (partition by EmployeeID order by DateFrom desc) as completed_seqnum from t ) t group by EmployeeID;
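
The one-line query in the last comment is easier to follow when it is laid out and wrapped in a view, which is what the question asked for. The following is only a sketch: the view name vLastPeriodStart is hypothetical, and TableDates is assumed to be the table from the question (the comment calls it t). It relies on DateFrom being unique per employee, as the question states.

```sql
-- Sketch only: vLastPeriodStart is a hypothetical name; TableDates is assumed
-- to be the table from the question.
CREATE VIEW vLastPeriodStart
AS
SELECT EmployeeID,
       CASE WHEN MIN(completed_seqnum) = 0
            -- the newest rows are not completed: take the earliest of them
            THEN MIN(CASE WHEN completed_seqnum = 0 THEN DateFrom END)
            -- the newest row is completed: take the earliest row of that last period
            ELSE MIN(CASE WHEN completed_seqnum = 1 THEN DateFrom END)
       END AS DateFrom
FROM (SELECT t.EmployeeID, t.DateFrom,
             -- number of completed rows at or after this row, per employee
             SUM(CASE WHEN t.Completed = 1 THEN 1 ELSE 0 END)
                 OVER (PARTITION BY t.EmployeeID ORDER BY t.DateFrom DESC) AS completed_seqnum
      FROM TableDates AS t
     ) AS s
GROUP BY EmployeeID;
```

Traced by hand against the four examples in the question, this returns the expected date for every employee. Rows with completed_seqnum = 0 come after the last completed row, and rows with completed_seqnum = 1 run from just after the previous completed row up to and including the last one.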
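    【Solution 2】:

    Here is a working query. It may be more complicated than necessary, but I will leave the simplification to you.

    It handles the following cases, all partitioned by EmployeeId as requested:

    1. No row has Completed = 1: detected with sum(Completed) over(), and first_value(DateFrom) is then used.

    2. The last row has Completed = 1 and the row before it has Completed = 0: detected with last_value(Completed) and lag(Completed), and max(case when Completed = 0 then DateFrom else null end) is then used.

    3. The tricky case: a row with Completed = 1 exists but is not the last row. Here, find the DateFrom of the most recent row with Completed = 1, then find min(DateFrom) over all rows newer than that row, up to the preceding Completed = 1.

    4. If the last row has Completed = 1 and the second-to-last row also has Completed = 1, use the DateFrom of the last row. The coalesce covers this when every other option is NULL.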

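    -- Table variable for the sample data; it must be declared in the same batch as the query below
    declare @Test table (EmployeeId bigint not null, DateFrom date not null, Completed bit not null);

    insert into @Test (EmployeeId, DateFrom, Completed)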
    values
    -- Scenario 1
    (1, '2021-01-01', 0),
    (1, '2021-01-02', 0),
    (1, '2021-01-03', 0),
    -- Scenario 2
    (2, '2021-01-01', 0),
    (2, '2021-01-02', 1),
    (2, '2021-01-03', 0),
    (2, '2021-01-04', 0),
    -- Scenario 3
    (3, '2021-01-01', 0),
    (3, '2021-01-02', 1),
    (3, '2021-01-03', 0),
    (3, '2021-01-04', 1),
    -- Special case, single row
    (4, '2021-01-01', 1),
    -- Scenario 4
    (5, '2021-01-01', 0),
    (5, '2021-01-02', 0),
    (5, '2021-01-03', 1);
    
    with cte as (
      select *
        -- First value of DateFrom over all rows (not the default)
        , first_value (DateFrom) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) FirstDateFrom
        -- Last value of Completed over all rows (not the default)
        , last_value (Completed) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) LastCompleted
        -- Find the Date of the last row with Completed = 1
        , max (case when Completed = 1 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) LastCompletedNew
        -- Regular row number
        , row_number() over (partition by EmployeeId order by DateFrom desc) RowNumber
        -- Total number of rows with Completed = 1
        , sum(convert(int,Completed)) over (partition by EmployeeId) SumOfCompleted
        -- Max value of DateFrom where Completed = 0
        , max(case when Completed = 0 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) MaxDateFrom
        -- Check the lagged complete to see if the last 2 rows are completed = 1
        , lag(Completed) over (partition by EmployeeId order by DateFrom asc) LaggedComplete
        -- Borrowed from Gordon to check which rows are prior to the last Completed = 1 and before the preceding Completed = 1
        , sum(case when completed = 1 then 1 else 0 end) over (partition by employeeid order by datefrom desc) as completed_seqnum
      from @Test
    )
    select
      EmployeeId
      -- Use the only DateFrom if there is only one
      , coalesce(case
        -- Scenario 1
        when SumOfCompleted = 0 then FirstDateFrom
        when LastCompleted = 1 then
          case
          -- Scenario 4
          when coalesce(LaggedComplete,0) = 1 then DateFrom
          -- Scenario 3
          else Scenario3
          end
        -- Scenario 2
        else ActualResult
        end, DateFrom) FinalResult
      --, * -- Uncomment for working
    from (
      select *
        -- Find the lowest DateFrom which is greater than the DateFrom of the last row where Completed = 1
        , min(case when DateFrom > LastCompletedNew then DateFrom else null end) over (partition by EmployeeId) ActualResult
        -- Find the min DateFrom over the rows between the last Completed=1 and the Completed=1 before it (if it exists)
        , min(case when completed_seqnum = 1 then DateFrom else null end) over (partition by EmployeeId order by DateFrom asc rows between unbounded preceding and unbounded following) Scenario3
      from cte
    ) x
    -- Because we have calculated the same result for every row we just take the first
    where RowNumber = 1
    order by x.EmployeeId asc, x.DateFrom asc;
    

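    Note: this assumes there is only one row per date.

    【Comments】:

    • Thanks a lot! Just one thing, if you can help: I want to do a GROUP BY.
    • @FaresDellel I guess you forgot to include EmployeeId in your sample data? See the edit.
    • I mentioned in the first post that I would do a GROUP BY, and I have since edited it with a complete example.
    • @FaresDellel But your sample data did not contain it, and it should.
    • @FaresDellel Your sample data should include such conditions.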
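
If it is unclear whether the data meets that assumption, a quick check along these lines may help. This is a sketch that assumes the data lives in the question's TableDates table and that "one row per date" means one row per employee and date.

```sql
-- Sketch only: TableDates stands in for whichever table holds the real data.
-- Any row returned here breaks the one-row-per-date assumption.
SELECT EmployeeID, DateFrom, COUNT(*) AS RowsForDate
FROM TableDates
GROUP BY EmployeeID, DateFrom
HAVING COUNT(*) > 1;
```

An empty result means the assumption holds for the current data.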