【Question title】: Aggregate function with analytic function over partition on Teradata: "Selected non-aggregate values must be part of the associated group"
【Posted】: 2019-01-04 09:29:40
【Question description】:

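I have to count the number of status changes, but only when the time difference between one status and the next is less than 30 minutes. In my database I have the current time, and for the previous time column I use OVER (PARTITION BY ...). This is my query, but I get the error: "Selected non-aggregate values must be part of the associated group". Can someone help?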

select col1, col2,
    MAX(creation_dt_utc) OVER(PARTITION BY col1,col2,col3 ORDER BY creation_dt ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS previous_creation_dt,
    (creation_dt - prev_creation_dt) DAY(4) TO SECOND(6) as time_difference,
    EXTRACT(DAY FROM time_difference) * 24*60 + EXTRACT(HOUR FROM time_difference) * 60 + EXTRACT(MINUTE FROM time_difference) AS Total_Minutes
    SUM(
        CASE WHEN status_previous='Test1'
                and status_current='Test2' THEN 1

            ELSE    

                CASE WHEN status_previous='Test3'
                    and status_current='Test2' THEN 1                           
            ELSE
                CASE WHEN status_previous='Test4'
                    and status_current='Test2' THEN 1                       
                ELSE 0
                END
            END
        END
    ) AS "Total_Change"
from myTable
qualify Total_Minutes<30
where EXTRACT(YEAR from year_column)='2017';

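【Comments】:

  • Is a GROUP BY col1, col2 missing at the end? Your SQL has more problems than that, though. Try posting sample input and the expected output for it.
  • Yes, sorry. I added GROUP BY col1, col2, but I still get the same error: "Selected non-aggregate values must be part of the associated group".
  • What is your expected result based on the current Select without the SUM?
  • My expected result is the total number of changes where the time difference between one status and the next is less than 30 minutes.

Tags: sql teradata partition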


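【Solution 1】:

Analytic (OLAP) functions are processed after aggregation (the logical order is FROM, WHERE, GROUP BY, HAVING, OLAP, QUALIFY, ORDER BY), so you cannot apply an aggregate to the result of an OVER. Instead, nest the OVER in a derived table or a common table expression: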

SELECT
   Sum(
       CASE WHEN (status_previous='Test1' AND status_current='Test2')
              OR (status_previous='Test3' AND status_current='Test2')
              OR (status_previous='Test4' AND status_current='Test2')
            THEN 1                       
            ELSE 0
       END) AS "Total_Change"
FROM
 (
   SELECT col1, col2,
       Max(creation_dt_utc)
       Over(PARTITION BY col1,col2,col3
            ORDER BY creation_dt
            ROWS BETWEEN 1 Preceding AND 1 Preceding) AS prev_creation_dt, -- alias must match the name used in the next expression

       (creation_dt - prev_creation_dt) DAY(4) TO SECOND(6) AS time_difference,

       Extract(DAY From time_difference) * 24*60 + Extract(HOUR From time_difference) * 60 + Extract(MINUTE From time_difference) AS Total_Minutes

   FROM myTable
   WHERE Extract(YEAR From year_column)=2017 -- the result of EXTRACT is an INTEGER, not a string
   QUALIFY Total_Minutes<30
 ) AS dt

But since you only need a count, you can move the CASE conditions into the QUALIFY:

SELECT Count(*) AS "Total_Change"
FROM
 (
   SELECT col1, col2,
       Max(creation_dt_utc)
       Over(PARTITION BY col1,col2,col3
            ORDER BY creation_dt
            ROWS BETWEEN 1 Preceding AND 1 Preceding) AS prev_creation_dt, -- alias must match the name used in the next expression

       (creation_dt - prev_creation_dt) DAY(4) TO SECOND(6) AS time_difference,

       Extract(DAY From time_difference) * 24*60 + Extract(HOUR From time_difference) * 60 + Extract(MINUTE From time_difference) AS Total_Minutes

   FROM myTable
   WHERE Extract(YEAR From year_column)=2017 -- the result of EXTRACT is an INTEGER, not a string
   QUALIFY Total_Minutes<30
       AND (   (status_previous='Test1' AND status_current='Test2')
            OR (status_previous='Test3' AND status_current='Test2')
            OR (status_previous='Test4' AND status_current='Test2')
           )
 ) AS dt
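
The common table expression form mentioned at the top is not shown above. A rough sketch of it, assuming the same table and column names as in the question and that your Teradata release accepts QUALIFY inside a WITH query; the CTE name status_changes is made up for illustration:

```sql
-- Sketch only: CTE form of the derived-table query.
-- "status_changes" is a hypothetical name; myTable and its columns come from the question.
WITH status_changes AS
 (
   SELECT col1, col2,
       Max(creation_dt_utc)
       Over(PARTITION BY col1,col2,col3
            ORDER BY creation_dt
            ROWS BETWEEN 1 Preceding AND 1 Preceding) AS prev_creation_dt,

       (creation_dt - prev_creation_dt) DAY(4) TO SECOND(6) AS time_difference,

       Extract(DAY From time_difference) * 24*60 + Extract(HOUR From time_difference) * 60 + Extract(MINUTE From time_difference) AS Total_Minutes

   FROM myTable
   WHERE Extract(YEAR From year_column)=2017
   -- keep only qualifying status changes that happened within 30 minutes
   QUALIFY Total_Minutes<30
       AND (   (status_previous='Test1' AND status_current='Test2')
            OR (status_previous='Test3' AND status_current='Test2')
            OR (status_previous='Test4' AND status_current='Test2')
           )
 )
SELECT Count(*) AS "Total_Change"
FROM status_changes;
```

The first row of each partition has no preceding row, so its Total_Minutes is NULL and the QUALIFY drops it.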

Edit:

The CASE logic can be simplified further to:

CASE WHEN status_current='Test2' and status_previous IN ('Test1','Test3','Test4')
     THEN 1                       
     ELSE 0
END

or maybe

CASE WHEN status_current='Test2' and status_previous <>'Test2'
     THEN 1                       
     ELSE 0
END

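【Comments】:

  • Thank you very much for sharing this edited example :)
【Solution 2】:

I think the QUALIFY should come after the WHERE clause.

For the previous value, I think LAG is a better fit than MAX.

And those nested CASE expressions can be written as a single CASE, because once a WHEN condition is satisfied, the WHEN conditions after it are not checked.

Since a plain SUM is used, there should also be a GROUP BY.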

SELECT col1, col2,
 COUNT(*) AS Total,
 SUM(TimeDiffMinutes) AS Total_Minutes,
 SUM(CASE WHEN StatusChanged = 1 THEN TimeDiffMinutes ELSE 0 END) AS Total_Minutes_Change,
 COUNT(CASE WHEN StatusChanged = 1 THEN 1 END) AS Total_Change
FROM
(
  SELECT col1, col2, col3, creation_dt,
  (CASE 
   WHEN status_previous='Test1' and status_current='Test2' THEN 1
   WHEN status_previous='Test3' and status_current='Test2' THEN 1   
   WHEN status_previous='Test4' and status_current='Test2' THEN 1
   ELSE 0
   END) AS StatusChanged,
  LAG(creation_dt) OVER (PARTITION BY col1, col2, col3 ORDER BY creation_dt) AS prev_creation_dt,
  (creation_dt - prev_creation_dt) DAY(4) TO SECOND(6) AS time_difference,
  EXTRACT(DAY FROM time_difference)*(24*60) + EXTRACT(HOUR FROM time_difference)*60 + EXTRACT(MINUTE FROM time_difference) AS TimeDiffMinutes
  FROM myTable  
  WHERE EXTRACT(YEAR from year_column) = 2017
  QUALIFY (creation_dt - prev_creation_dt) day(4) to second(6) < interval '30' minute
) q
GROUP BY col1, col2
ORDER BY col1, col2
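
If LAG is not available (it was only added in Teradata 16.10), the previous timestamp in the inner query can probably be obtained with a windowed MAX like the one in the question. A sketch of just that expression, assuming the same table and columns:

```sql
-- Possible replacement for the LAG(...) line on releases without LAG.
-- A one-row window that ends at the preceding row returns that row's creation_dt,
-- and NULL for the first row of each partition (the same result LAG gives).
SELECT col1, col2, col3, creation_dt,
  MAX(creation_dt) OVER (PARTITION BY col1, col2, col3
                         ORDER BY creation_dt
                         ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_creation_dt
FROM myTable;
```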

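【Comments】:

  • Logically, QUALIFY is processed after WHERE/GROUP BY/HAVING, but Teradata's parser is (too) flexible and doesn't care about the order of keywords after SELECT :-) And LAG was added only recently, in TD16.10, so the OP might be running an older version...
  • But it should still be placed after it ;) because I think conditions on window functions are generally evaluated after the WHERE and HAVING conditions. Well, I don't know his version. Oh well, if his version doesn't have LAG, he still has MAX.
  • Sure, it's totally confusing when you see a Select with the keywords in the wrong order :-)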