【Question title】: How to use time_bucket_gapfill when rows can have null values?
【Posted】: 2020-01-19 18:46:59
【Question】:

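I have a time-series table in which measurements are recorded into "wide" rows. A row may contain all of the measurements or only some of them; the remaining columns are then set to NULL.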

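I want to use time_bucket_gapfill() to "clean up" this table and make sure that every row in the output has data in all columns, even where the underlying dataset has NULLs in some of them.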

This is how I set up the table with some data (the schema comes from the getting started guide):

CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  location    TEXT              NOT NULL,
  temperature DOUBLE PRECISION  NULL,
  humidity    DOUBLE PRECISION  NULL
);
SELECT create_hypertable('conditions', 'time');
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:14-07', 'office', 70.0, 50.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:15-07', 'office', 71.0, null);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:16-07', 'office', 72.0, 48.0);
-- gap at 2019-07-10 05:02:17-07
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:18-07', 'office', 72.0, 48.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:18.8-07', 'office', 72.1, NULL);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:19.2-07', 'office', NULL, 46.0);
INSERT INTO conditions(time, location, temperature, humidity)
  VALUES ('2019-07-10 05:02:20-07', 'office', 73.0, 45.0);

This is how I query it:

SELECT
    time_bucket_gapfill('1000ms', time,
      start => '2019-07-10 05:02:13',
      finish => '2019-07-10 05:02:21'
    ) as ival,
    count(*) as samplesUsed,
    interpolate(avg(temperature)) as lineartemperature,
    interpolate(avg(humidity)) as linearhumidity
 FROM conditions
 GROUP BY ival
 ORDER BY ival;

The output is:

          ival          | samplesused | lineartemperature | linearhumidity 
------------------------+-------------+-------------------+----------------
 2019-07-10 05:02:13-07 |             |                   |               
 2019-07-10 05:02:14-07 |           1 |                70 |             50
 2019-07-10 05:02:15-07 |           1 |                71 |               
 2019-07-10 05:02:16-07 |           1 |                72 |             48
 2019-07-10 05:02:17-07 |             |            72.025 |             48
 2019-07-10 05:02:18-07 |           2 |             72.05 |             48
 2019-07-10 05:02:19-07 |           1 |                   |             46
 2019-07-10 05:02:20-07 |           1 |                73 |             45
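
  • I understand why the first row is empty: there is no data in the dataset for that time.
  • At 5:02:17, where the dataset has no row at all, the interpolation works as expected.
  • At 5:02:15 and 5:02:19, however, the underlying rows are "partial", and the database does not interpolate humidity and temperature, respectively, from the values in the preceding and following rows.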

How can I write the query so that it returns interpolated values for all measurement columns?

【Comments】:

    Tags: sql postgresql time-series timescaledb


    【Answer 1】:

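    TimescaleDB does not treat NULL as a missing value: a bucket whose aggregate is NULL still counts as present, so interpolate() leaves it alone. I had to rewrite the query so that rows with NULL values are excluded, which means running one time_bucket_gapfill query per column and joining the results together.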
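To make the distinction between "NULL" and "missing" concrete, here is a minimal Python sketch of the behaviour seen in the question. It is a toy model and not TimescaleDB's implementation: `gapfill_interpolate` is a hypothetical helper, buckets are whole seconds after 05:02:00, and returning None when a neighbouring bucket is itself NULL is an assumption made only for this sketch.

```python
from typing import Dict, List, Optional


def gapfill_interpolate(
    buckets: Dict[int, Optional[float]], start: int, finish: int
) -> List[Optional[float]]:
    """Toy model: fill only buckets that are absent from `buckets`.

    A bucket that is present with a None (NULL) aggregate is returned
    unchanged, which mirrors the output shown in the question.
    """
    out: List[Optional[float]] = []
    for t in range(start, finish):
        if t in buckets:
            # Present bucket: keep its value, even if that value is None.
            out.append(buckets[t])
            continue
        prev = max((k for k in buckets if k < t), default=None)
        nxt = min((k for k in buckets if k > t), default=None)
        if prev is None or nxt is None:
            # No neighbour on one side: nothing to interpolate from.
            out.append(None)
            continue
        v0, v1 = buckets[prev], buckets[nxt]
        if v0 is None or v1 is None:
            # Assumption of this sketch: a NULL neighbour yields None.
            out.append(None)
            continue
        out.append(v0 + (v1 - v0) * (t - prev) / (nxt - prev))
    return out


# avg(humidity) per one-second bucket, taken from the question's data.
# Bucket 15 exists but its only row has humidity NULL; 13 and 17 have no rows.
humidity = {14: 50.0, 15: None, 16: 48.0, 18: 48.0, 19: 46.0, 20: 45.0}

raw_result = gapfill_interpolate(humidity, 13, 21)

# Equivalent of "WHERE humidity IS NOT NULL": bucket 15 becomes a real gap.
filtered = {k: v for k, v in humidity.items() if v is not None}
filtered_result = gapfill_interpolate(filtered, 13, 21)

print(raw_result)
print(filtered_result)
```

In this model the unfiltered input keeps the None at bucket 15, while the filtered input turns that bucket into a gap that is interpolated from its neighbours at 14 and 16.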

    The following query works and does what I need:

    -- Gap-fill each column in its own subquery, skipping NULLs,
    -- then join the two results on the bucket timestamp.
    SELECT
        condh.ival, humidity, temperature
    FROM
    (
        SELECT
            time_bucket_gapfill('1000ms', time,
              start => '2019-07-10 05:02:13',
              finish => '2019-07-10 05:02:21'
            ) AS ival,
            count(*) AS samplesUsed,
            interpolate(avg(humidity)) AS humidity
        FROM conditions
        WHERE humidity IS NOT NULL
        GROUP BY ival
    ) condh
    INNER JOIN
    (
        SELECT
            time_bucket_gapfill('1000ms', time,
              start => '2019-07-10 05:02:13',
              finish => '2019-07-10 05:02:21'
            ) AS ival,
            count(*) AS samplesUsed,
            interpolate(avg(temperature)) AS temperature
        FROM conditions
        WHERE temperature IS NOT NULL
        GROUP BY ival
    ) condt
    ON (condt.ival = condh.ival)
    ORDER BY ival;
    

    Output:

              ival          | humidity | temperature 
    ------------------------+----------+-------------
     2019-07-10 05:02:13-07 |          |            
     2019-07-10 05:02:14-07 |       50 |          70
     2019-07-10 05:02:15-07 |       49 |          71
     2019-07-10 05:02:16-07 |       48 |          72
     2019-07-10 05:02:17-07 |       48 |      72.025
     2019-07-10 05:02:18-07 |       48 |       72.05
     2019-07-10 05:02:19-07 |       46 |      72.525
     2019-07-10 05:02:20-07 |       45 |          73
    (8 rows)
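As a sanity check on the filled-in cells, the values can be reproduced by hand with plain linear interpolation between the neighbouring buckets. The sketch below only redoes the arithmetic; `lerp` is a hypothetical helper name and bucket times are written as seconds after 05:02:00.

```python
def lerp(t: float, t0: float, v0: float, t1: float, v1: float) -> float:
    """Linear interpolation of the value at time t between (t0, v0) and (t1, v1)."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Bucket 05:02:18 holds two non-NULL temperature rows (72.0 and 72.1).
temp_18 = (72.0 + 72.1) / 2

# 05:02:17 has no rows at all; it is filled from buckets 16 and 18.
temp_17 = lerp(17, 16, 72.0, 18, temp_18)

# 05:02:19 has only a NULL temperature, so it is a gap once NULLs are filtered out.
temp_19 = lerp(19, 18, temp_18, 20, 73.0)

# 05:02:15 has only a NULL humidity; it is filled from buckets 14 and 16.
humidity_15 = lerp(15, 14, 50.0, 16, 48.0)

print(round(temp_17, 3), round(temp_19, 3), humidity_15)
```

The rounded results agree with the 72.025, 72.525 and 49 shown in the output table.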
    

    I got some help with this on the TimescaleDB Slack; thanks to gayathri.

    【Comments】:
