【Question Title】: Snowflake: calculating rolling average on time series with date gap
【Posted】: 2020-12-18 11:00:56
【Question】:

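Snowflake lets us compute a rolling average from the current value and the two preceding values. But what if the time series has gaps?

For example, in the sample below I want to compute a three-day moving average. For 7/30, the query below uses the 7/25 row when computing the 3-day moving average, even though 7/25 is five days earlier. Is there a way to avoid this?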

CREATE OR REPLACE TABLE STOCK_PRICE(TRADE_DATE DATE, SYMBOL STRING, CLOSE_PRICE float);
INSERT INTO STOCK_PRICE VALUES
('2020-07-25', 'AAPL', '800.0'),
('2020-07-25', 'AXP', '90.0'),
('2020-07-30', 'AAPL', '1010.0'),
('2020-07-30', 'AXP', '112.0'),
('2020-07-31', 'AAPL', '1025.0'),
('2020-07-31', 'AXP', '105.0'),
('2020-08-03', 'AAPL', '978.0'),
('2020-08-03', 'AXP', '110.0'),
('2020-08-04', 'AAPL', '970.0'),
('2020-08-04', 'AXP', '115.0'),
('2020-08-05', 'AAPL', '990.0'),
('2020-08-05', 'AXP', '120.0'),
('2020-08-06', 'AAPL', '995.0'),
('2020-08-06', 'AXP', '125.0'),
('2020-08-07', 'AAPL', '990.0'),
('2020-08-07', 'AXP', '122.0'),
('2020-08-10', 'AAPL', '998.0'),
('2020-08-10', 'AXP', '124.0');

SELECT TRADE_DATE, SYMBOL, CLOSE_PRICE,
       AVG(CLOSE_PRICE) OVER (PARTITION BY SYMBOL ORDER BY TRADE_DATE ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MV_AVG_5DAY
FROM STOCK_PRICE;
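
To make the problem visible, here is a minimal diagnostic sketch (not part of the original question) that reports, for each row, the earliest date inside its row-based frame and how many calendar days the frame spans. The names FRAMES, FRAME_START and FRAME_SPAN_DAYS are made up for illustration.

```sql
-- Diagnostic sketch: how far back does each ROWS-based frame really reach?
WITH FRAMES AS (
    SELECT TRADE_DATE,
           SYMBOL,
           CLOSE_PRICE,
           -- Earliest date among the current row and the two preceding rows
           MIN(TRADE_DATE) OVER (PARTITION BY SYMBOL
                                 ORDER BY TRADE_DATE
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS FRAME_START
      FROM STOCK_PRICE
)
SELECT TRADE_DATE,
       SYMBOL,
       CLOSE_PRICE,
       FRAME_START,
       -- Calendar days covered by the frame, counting both endpoints
       DATEDIFF(DAY, FRAME_START, TRADE_DATE) + 1 AS FRAME_SPAN_DAYS
  FROM FRAMES
 ORDER BY SYMBOL, TRADE_DATE;
```

On the sample data, the AAPL row for 2020-07-30 should show a FRAME_START of 2020-07-25, a span of 6 days rather than 3. Its row-based average is therefore (800 + 1010) / 2 = 905, whereas a true 3-calendar-day window would contain only the 1010 value.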

【Comments】:

    Tags: windows snowflake-cloud-data-platform rolling-computation


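    【Solution 1】:

    The following appears to work with your small dataset. It is based on two ideas:

    1. Fill in the "missing" date/symbol records in your data
    2. The AVG function ignores records with NULL values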
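
    The second idea can be checked in isolation. The sketch below is a minimal illustration, not part of the original answer: it builds three inline rows that mimic what AAPL looks like for 7/28 to 7/30 once the gaps are filled (two NULL days followed by 1010). The alias names T, PRICE and AVG_PRICE are made up for the example.

    ```sql
    -- AVG skips NULL inputs instead of treating them as zero
    SELECT AVG(PRICE) AS AVG_PRICE
      FROM (
            SELECT 1010.0 AS PRICE   -- 2020-07-30, a real trading day
            UNION ALL
            SELECT NULL              -- 2020-07-29, filled-in gap
            UNION ALL
            SELECT NULL              -- 2020-07-28, filled-in gap
           ) AS T;
    ```

    Because the two NULL rows are ignored, the average is taken over the single non-NULL value. That is why a row-based frame over the gap-filled data behaves like a calendar-based window.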

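    The basic approach is as follows:

    1. Create a dataset containing every date between the minimum and maximum TRADE_DATE values in the table

    2. Create a dataset of the unique symbol values in the table

    3. Cross join these 2 datasets to get every date/symbol combination

    4. Left join that to your table to get a dataset with no date/symbol gaps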

    5. Run a variant of your query against this new dataset

      -- Set MIN/MAX dates
      set min_date = (select min(trade_date) from stock_price);
      set max_date = (select max(trade_date) from stock_price);

      -- Set parameter to be used as generator "constant", including the start day
      set num_days = (select datediff(day, $min_date, $max_date + 1));

      -- Create a list of all dates between the min/max dates in the original table
      with date_list as (
        select dateadd(day, 1 - row_number() over (order by null), $max_date) as date
        from table(generator(rowcount => ($num_days)))
      ),
      -- Get a unique list of symbols
      symbol_list as (
        select distinct symbol
        from stock_price
      ),
      -- Create a data set containing every combination of date/symbol
      all_dates_symbols as (
        select date, symbol
        from date_list, symbol_list -- Cartesian product
      ),
      -- Get the stock price for all dates/symbols.
      -- This will be null for any combinations not in the original table
      stock_price_all_dates as (
        select t1.date as trade_date,
               t1.symbol as symbol,
               t2.close_price as close_price
        from all_dates_symbols t1
        left outer join stock_price t2
          on t1.date = t2.trade_date
         and t1.symbol = t2.symbol
      ),
      -- Calculate the average over the preceding x days.
      -- Nulls for any date/symbol are ignored by AVG
      mv_5day as (
        select t1.trade_date,
               t1.symbol,
               t1.close_price,
               avg(t1.close_price) over (partition by t1.symbol order by t1.trade_date
                                         rows between 2 preceding and current row) as mv_avg_5day
        from stock_price_all_dates t1
      )
      -- Join back to original table to exclude all records that don't exist in that table
      select t1.trade_date, t1.symbol, t1.close_price, t1.mv_avg_5day
      from mv_5day t1
      inner join stock_price t2
        on t1.trade_date = t2.trade_date
       and t1.symbol = t2.symbol;

      -- Clean up the previously set variables
      unset num_days;
      unset min_date;
      unset max_date;

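    【Comments】:

      【Solution 2】:

      What is the expected output for days whose window holds fewer than 3 days of data? For example, should the first row produce an average of 800 (based on that row alone)? If so, something like this might work (although it could become unwieldy if you want a window much larger than 3 days):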

      WITH CTE_STOCK_PRICE AS (
        SELECT TRADE_DATE
              ,SYMBOL
              ,CLOSE_PRICE
              ,CLOSE_PRICE AS CLOSE_PRICE_FOR_AVG
          FROM STOCK_PRICE
        UNION ALL
        SELECT TRADE_DATE
              ,SYMBOL
              ,CLOSE_PRICE
              ,LAG(CLOSE_PRICE,1) OVER (PARTITION BY SYMBOL ORDER BY TRADE_DATE) AS CLOSE_PRICE_FOR_AVG
          FROM STOCK_PRICE
        QUALIFY LAG(TRADE_DATE,1) OVER (PARTITION BY SYMBOL ORDER BY TRADE_DATE) = DATEADD(DAY, -1, TRADE_DATE)
        UNION ALL
        SELECT TRADE_DATE
              ,SYMBOL
              ,CLOSE_PRICE
              ,LAG(CLOSE_PRICE,2) OVER (PARTITION BY SYMBOL ORDER BY TRADE_DATE) AS CLOSE_PRICE_FOR_AVG
          FROM STOCK_PRICE
        QUALIFY LAG(TRADE_DATE,2) OVER (PARTITION BY SYMBOL ORDER BY TRADE_DATE) = DATEADD(DAY, -2, TRADE_DATE)
      )
      SELECT TRADE_DATE
            ,SYMBOL
            ,ANY_VALUE(CLOSE_PRICE) AS CLOSE_PRICE
            ,AVG(CLOSE_PRICE_FOR_AVG) AS MV_AVG_3DAY
        FROM CTE_STOCK_PRICE
       GROUP BY TRADE_DATE
               ,SYMBOL
       ORDER BY TRADE_DATE
               ,SYMBOL
      ;
      

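      Also note that you asked for a 3-day window, but your computed column is named MV_AVG_5DAY; I adjusted the column name to match the 3-day specification.

      【Comments】: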
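
      If the window needs to grow beyond 3 days, one alternative that avoids adding a UNION ALL branch per day is a self-join on a date range. This is a different technique from the query above, shown only as a sketch under the assumption that each symbol has at most one row per TRADE_DATE; the aliases CUR and PREV are made up for illustration.

      ```sql
      -- Self-join sketch: average every row of the same symbol whose date
      -- falls within the current date and the two calendar days before it
      SELECT CUR.TRADE_DATE
            ,CUR.SYMBOL
            ,CUR.CLOSE_PRICE
            ,AVG(PREV.CLOSE_PRICE) AS MV_AVG_3DAY
        FROM STOCK_PRICE CUR
        JOIN STOCK_PRICE PREV
          ON PREV.SYMBOL = CUR.SYMBOL
         AND PREV.TRADE_DATE BETWEEN DATEADD(DAY, -2, CUR.TRADE_DATE) AND CUR.TRADE_DATE
       GROUP BY CUR.TRADE_DATE
               ,CUR.SYMBOL
               ,CUR.CLOSE_PRICE
       ORDER BY CUR.TRADE_DATE
               ,CUR.SYMBOL
      ;
      ```

      Changing the window size only requires changing the -2 offset. Because the join matches on the date range rather than on row position, it also picks up a row that is exactly two days back when the day in between is missing. The trade-off is that a range join can be expensive on large tables.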
