【Question title】: PostgreSQL: fill NULL values in timeseries query with previous value
【Posted】: 2018-04-14 20:18:15
【Question】:

I have a database with time-related information. I want a list with a value for every minute, like this:

12:00:00  3
12:01:00  4
12:02:00  5
12:03:00  5
12:04:00  5
12:05:00  3

But when there is no data for some minutes, I get a result like this:

12:00:00  3
12:01:00  4
12:02:00  5
12:03:00  NULL
12:04:00  NULL
12:05:00  3

I want to fill the NULL values with the previous NOT NULL value.

This query creates a time series with one row per minute, then joins it to the data in my database.

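I have read a bit about using window functions to fill NULL values with the previous NOT NULL value, but I can't figure out how to apply that to this query. Can someone point me in the right direction?

I tried this solution, but the NULL values remain: PostgreSQL use value from previous row if missing

This is my query: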

SELECT
    date,
    close
FROM generate_series(
  '2017-11-01 09:00'::timestamp,
  '2017-11-01 23:59'::timestamp,
  '1 minute') AS date
LEFT OUTER JOIN
 (SELECT
    date_trunc('minute', market_summary."timestamp") as day,
    LAST(current, timestamp) AS close
    FROM market_summary
  WHERE created_at >= '2017-11-01 09:00'
    AND created_at < '2017-11-01 23:59'
    GROUP BY day
 ) results
ON (date = results.day)
ORDER BY date

【Question discussion】:

    Tags: sql postgresql


    【Solution 1】:

    I find the following approach simpler:

    Create the sample data given in the question:

    WITH example (date,close) AS 
    (VALUES 
        ('12:00:00',3),
        ('12:01:00',4),
        ('12:02:00',5),
        ('12:03:00',NULL),
        ('12:04:00',NULL), 
        ('12:05:00',3)
    ) 
    SELECT * INTO temporary table market_summary FROM example;
    

    Query that fills each NULL value with the previous non-NULL value:

    select 
        date, 
        close, 
        first_value(close) over (partition by grp_close order by date) as corrected_close
    from (
          select date, close,
                 sum(case when close is not null then 1 end) over (order by date) as grp_close
          from   market_summary
    ) t
    order by date
    

    Returns:

    date      | close | corrected_close
    -----------------------------------
    12:00:00  | 3     | 3
    12:01:00  | 4     | 4
    12:02:00  | 5     | 5
    12:03:00  | NULL  | 5
    12:04:00  | NULL  | 5
    12:05:00  | 3     | 3
    
    • close: the original value
    • corrected_close: the filled-in value
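The trick works because the running count of non-NULL values only increases on rows that have a value: every NULL row gets the same group number as the last non-NULL row before it, and that row is the only non-NULL member of its group. The minimal sketch below, on hypothetical inline data, exposes that group column. It swaps in count(close) for the sum(case ...) expression (it serves the same purpose, since count() ignores NULLs) and max() for first_value(), so the result does not depend on row order inside each partition.

```sql
-- Hypothetical sample data mirroring the example above.
WITH sample (date, close) AS (
    VALUES ('12:00:00'::time, 3),
           ('12:01:00'::time, 4),
           ('12:02:00'::time, 5),
           ('12:03:00'::time, NULL),
           ('12:04:00'::time, NULL),
           ('12:05:00'::time, 3)
),
grouped AS (
    SELECT date,
           close,
           -- count() skips NULLs, so the running count stays flat across a NULL run
           count(close) OVER (ORDER BY date) AS grp_close
    FROM sample
)
SELECT date,
       close,
       grp_close,
       -- each group holds exactly one non-NULL close, so max() returns it
       -- regardless of the row order inside the partition
       max(close) OVER (PARTITION BY grp_close) AS corrected_close
FROM grouped
ORDER BY date;
```

Rows before the first non-NULL value land in group 0, which has no non-NULL close, so they stay NULL.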

    【Discussion】:

    • I found that you may also need to order by date inside the first_value clause, otherwise NULLs can still show up: first_value(close) over (partition by grp_close order by date) as corrected_close
    【Solution 2】:

    Here is one approach:
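Applied to the query in the question, the same grouping trick might look roughly like the sketch below. Assumptions: market_summary has "timestamp", current and created_at columns, as the question's query suggests; the CREATE TEMP TABLE, its rows and the view name filled_minutes are hypothetical and only there so the snippet runs (drop the first two statements when running against the real table). LAST(current, timestamp) is not a built-in PostgreSQL aggregate (it comes from an extension such as TimescaleDB), so the sketch swaps in (array_agg(... ORDER BY "timestamp" DESC))[1] to pick the latest value in each minute. It also widens the created_at upper bound to midnight so readings from the 23:59 minute are kept.

```sql
-- Hypothetical stand-in for the question's table, only so the sketch runs.
CREATE TEMP TABLE market_summary (
    "timestamp" timestamp,
    "current"   numeric,
    created_at  timestamp
);

INSERT INTO market_summary VALUES
    ('2017-11-01 09:00:10', 3, '2017-11-01 09:00:10'),
    ('2017-11-01 09:01:20', 4, '2017-11-01 09:01:20'),
    ('2017-11-01 09:02:05', 5, '2017-11-01 09:02:05'),
    ('2017-11-01 09:05:30', 3, '2017-11-01 09:05:30');

-- filled_minutes is a hypothetical view name.
CREATE TEMP VIEW filled_minutes AS
WITH per_minute AS (
    -- latest value within each minute (vanilla stand-in for LAST(current, timestamp))
    SELECT date_trunc('minute', "timestamp") AS day,
           (array_agg("current" ORDER BY "timestamp" DESC))[1] AS close
    FROM market_summary
    WHERE created_at >= '2017-11-01 09:00'
      AND created_at <  '2017-11-02 00:00'  -- keeps data from the 23:59 minute
    GROUP BY 1
),
minutes AS (
    -- one row per minute; close is NULL where that minute has no data
    SELECT g.date, pm.close
    FROM generate_series('2017-11-01 09:00'::timestamp,
                         '2017-11-01 23:59'::timestamp,
                         '1 minute') AS g(date)
    LEFT JOIN per_minute pm ON pm.day = g.date
),
grouped AS (
    -- running count of non-NULL closes; it does not change across a run of NULLs
    SELECT date, close, count(close) OVER (ORDER BY date) AS grp
    FROM minutes
)
SELECT date,
       close,
       -- each group contains exactly one non-NULL close: the value to carry forward
       max(close) OVER (PARTITION BY grp) AS filled_close
FROM grouped;

SELECT * FROM filled_minutes ORDER BY date;
```

Minutes before the first reading of the day would stay NULL, because their group contains no non-NULL close.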

    select ms.*, ms_prev.close as lag_close,
           coalesce(ms.close, ms_prev.close) as filled_close
    from (select ms.*,
                 max(date) filter (where close is not null) over (order by date rows between unbounded preceding and 1 preceding) as dprev
          from market_summary ms
         ) ms left join
         market_summary ms_prev
         on ms_prev.date = ms.dprev
    order by ms.date;
    

    However, if you never have more than a few NULLs in a row (at most three with the query below), it may be simpler to use:

    select ms.*,
           coalesce(ms.close,
                    lag(ms.close, 1) over (order by date),
                    lag(ms.close, 2) over (order by date),
                    lag(ms.close, 3) over (order by date)
                   ) as filled_close
    from market_summary ms
    order by date;
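The bounded lag() version only looks a fixed number of rows back, so longer gaps are only partly filled. A small sketch on hypothetical inline data with four NULLs in a row (keeping the row's own close as the first coalesce argument) makes the limit visible: the fourth NULL, at 12:04, has no non-NULL value within three rows and stays NULL.

```sql
-- Hypothetical data: a run of four consecutive NULLs.
WITH ms (date, close) AS (
    VALUES ('12:00'::time, 3),
           ('12:01'::time, NULL),
           ('12:02'::time, NULL),
           ('12:03'::time, NULL),
           ('12:04'::time, NULL),
           ('12:05'::time, 7)
)
SELECT date,
       close,
       -- only three rows of look-back, so 12:04 finds nothing and stays NULL
       coalesce(close,
                lag(close, 1) OVER (ORDER BY date),
                lag(close, 2) OVER (ORDER BY date),
                lag(close, 3) OVER (ORDER BY date)) AS filled_close
FROM ms
ORDER BY date;
```

For gaps of unknown length, the self-join approach above or the grouping approach in Solution 1 does not have this limit.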
    

    【Discussion】:

      【Solution 3】:

      I found a solution on this page: http://www.postgresql-archive.org/lag-until-you-get-something-OVER-window-function-td5824644.html

      CREATE OR REPLACE FUNCTION GapFillInternal( 
          s anyelement, 
          v anyelement) RETURNS anyelement AS 
      $$ 
      BEGIN 
        RETURN COALESCE(v,s); 
      END; 
      $$ LANGUAGE PLPGSQL IMMUTABLE; 
      
      CREATE AGGREGATE GapFill(anyelement) ( 
        SFUNC=GapFillInternal, 
        STYPE=anyelement 
      ); 
      
      postgres=# select id, natural_key, gapfill(somebody) OVER (ORDER BY 
      natural_key, id) from lag_test; 
       id │ natural_key │ gapfill 
      ────┼─────────────┼───────── 
        1 │           1 │ 
        2 │           1 │ Kirk 
        3 │           1 │ Kirk 
        4 │           2 │ Roybal 
        5 │           2 │ Roybal 
        6 │           2 │ Roybal 
      (6 rows) 
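As a sketch of how the GapFill aggregate could plug into a per-minute series like the question's, the block below repeats the two definitions from above (adding DROP AGGREGATE IF EXISTS so it can be rerun) and joins some hypothetical per-minute closes onto generate_series. With ORDER BY in the OVER clause, the default frame runs from the first row to the current row, so the aggregate state ends up holding the most recent non-NULL close.

```sql
CREATE OR REPLACE FUNCTION GapFillInternal(s anyelement, v anyelement)
RETURNS anyelement AS
$$
BEGIN
  -- take the new value if present, otherwise carry the previous state forward
  RETURN COALESCE(v, s);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

DROP AGGREGATE IF EXISTS GapFill(anyelement);
CREATE AGGREGATE GapFill(anyelement) (
  SFUNC = GapFillInternal,
  STYPE = anyelement
);

-- Hypothetical per-minute closes with a two-minute gap (09:03, 09:04).
WITH closes (minute_ts, close) AS (
    VALUES ('2017-11-01 09:00'::timestamp, 3),
           ('2017-11-01 09:01'::timestamp, 4),
           ('2017-11-01 09:02'::timestamp, 5),
           ('2017-11-01 09:05'::timestamp, 3)
)
SELECT g.date,
       c.close,
       GapFill(c.close) OVER (ORDER BY g.date) AS filled_close
FROM generate_series('2017-11-01 09:00'::timestamp,
                     '2017-11-01 09:05'::timestamp,
                     '1 minute') AS g(date)
LEFT JOIN closes c ON c.minute_ts = g.date
ORDER BY g.date;
```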
      

      【Discussion】:

        【Solution 4】:

        Here is how to do this in vanilla Postgres with a couple of custom functions.

        Schema (PostgreSQL v12)

        CREATE TABLE test (ts timestamp, email varchar, title varchar);
        insert into test values
        ('2017-01-01', 'me@me.com', 'Old title'),
        ('2017-01-02', 'me@me.com', null),
        ('2017-01-03', 'me@me.com', 'New Title'),
        ('2017-01-04', 'me@me.com', null),
        ('2017-01-05', 'me@me.com', null),
        ('2017-01-06', 'me@me.com', 'Newer Title'),
        ('2017-01-07', 'me@me.com', null),
        ('2017-01-08', 'me@me.com', null);
        
         -- COALESCE is special SQL syntax rather than an ordinary function,
         -- so it cannot be used directly as an aggregate's state function.
         -- So we wrap it in our own function.
         CREATE FUNCTION f_coalesce(a anyelement, b anyelement) RETURNS anyelement AS '
            SELECT COALESCE(a,b);
         ' LANGUAGE SQL PARALLEL SAFE;
         -- Aggregate coalesce that keeps the first non-null value it sees
        CREATE AGGREGATE agg_coalesce (anyelement)
        (
            sfunc = f_coalesce,
            stype = anyelement
        );
        

        Query #1

        SELECT
            ts,
            email,
        
            array_agg(title) FILTER (WHERE title is not null ) OVER ( 
                order by ts desc ROWS BETWEEN current row and unbounded following 
            ) as title_array,
            (array_agg(title) FILTER (WHERE title is not null ) OVER ( 
                order by ts desc ROWS BETWEEN current row and unbounded following )
            )[1] as title,
            COALESCE(
                agg_coalesce(title) OVER ( 
                    order by ts desc ROWS BETWEEN current row and unbounded following 
                ),
                (select title from test 
                    where title is not null 
                    and ts < '2017-01-02'
                    order by ts desc limit 1 )
            )as title_locf 
        from test
        where ts >= '2017-01-02'
        order by ts desc;
        

        Gist:

        https://gist.github.com/DanielJoyce/cc9f80d4326b7cb40d07af2ffb069b74
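The test table above has an email column, but the windows in Query #1 run across all rows, so with several users a NULL title could be filled from another user's row. A sketch that adds PARTITION BY email to the array_agg variant from Query #1 (which needs no custom functions); the second user and all rows are hypothetical inline data:

```sql
-- Hypothetical data: two users whose title histories must not mix.
WITH test_rows (ts, email, title) AS (
    VALUES ('2017-01-01'::timestamp, 'me@me.com',   'Old title'),
           ('2017-01-02'::timestamp, 'me@me.com',   NULL),
           ('2017-01-01'::timestamp, 'you@you.com', 'Other title'),
           ('2017-01-02'::timestamp, 'you@you.com', NULL),
           ('2017-01-03'::timestamp, 'me@me.com',   NULL)
)
SELECT ts,
       email,
       -- newest-first frame from the current row back in time, per user;
       -- element [1] is the latest non-NULL title at or before ts
       (array_agg(title) FILTER (WHERE title IS NOT NULL) OVER (
            PARTITION BY email
            ORDER BY ts DESC
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
       ))[1] AS title_locf
FROM test_rows
ORDER BY email, ts;
```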

        【Discussion】: