【Question title】: Delete rows with specific values in one column and a missing value in another column of the same or consecutive row
【Posted】: 2014-02-24 19:35:47
【Question】:

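My dataset contains daily time series for different companies, and I use PostgreSQL. The dataset has an indicator variable that takes the values 1 and -1 and is 0 most of the time. If the indicator is not 0 and the company has a missing value in another column on that day (the indicator day) or on the following day, the company should be excluded from the dataset entirely.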

Consider the following example data:

date             company   indicator   value
2012-01-02       A         0           2
2012-01-02       B         0           9
2012-01-02       C         0           1
2012-01-02       D         0           3
2012-01-03       A         1           NULL
2012-01-03       B         0           NULL
2012-01-03       C        -1           1
2012-01-03       D         0           2
2012-01-04       A         0           1
2012-01-04       B         0           1
2012-01-04       C         0           NULL
2012-01-04       D         1           4
2012-01-05       A         0           4
2012-01-05       B         0           2
2012-01-05       C         0           1
2012-01-05       D         0           7

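So A must be excluded because it has a missing value on an indicator day, and C because it has a missing value on the day after an indicator day.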
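For anyone who wants to reproduce the example, the sample data could be loaded roughly as follows. This is only a sketch: the column types (`date`, `text`, `integer`) are assumptions, since the question does not state them.

```sql
-- Assumed schema; the actual column types are not given in the question.
CREATE TABLE mytable (
    date      date,
    company   text,
    indicator integer,
    value     integer
);

INSERT INTO mytable (date, company, indicator, value) VALUES
    ('2012-01-02', 'A',  0, 2),
    ('2012-01-02', 'B',  0, 9),
    ('2012-01-02', 'C',  0, 1),
    ('2012-01-02', 'D',  0, 3),
    ('2012-01-03', 'A',  1, NULL),
    ('2012-01-03', 'B',  0, NULL),
    ('2012-01-03', 'C', -1, 1),
    ('2012-01-03', 'D',  0, 2),
    ('2012-01-04', 'A',  0, 1),
    ('2012-01-04', 'B',  0, 1),
    ('2012-01-04', 'C',  0, NULL),
    ('2012-01-04', 'D',  1, 4),
    ('2012-01-05', 'A',  0, 4),
    ('2012-01-05', 'B',  0, 2),
    ('2012-01-05', 'C',  0, 1),
    ('2012-01-05', 'D',  0, 7);
```

With this data, B and D should remain after the cleanup: B has a missing value but no nonzero indicator next to it, and D has a nonzero indicator but no missing value on that day or the next.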

I tried the following:

    CREATE TABLE to_delete
    AS SELECT * FROM mytable
    WHERE company IN(
                   SELECT company 
                   FROM mytable 
                   WHERE date BETWEEN (SELECT date FROM mytable WHERE indicator != 0)
                          AND (SELECT date+1 FROM mytable WHERE indicator != 0) 
                   AND indicator != 0)
    AND date BETWEEN (SELECT date FROM mytable WHERE indicator != 0)
                 AND (SELECT date+1 FROM mytable WHERE indicator != 0) 

    DELETE FROM mytable WHERE company in (SELECT DISTINCT company FROM to_delete);

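This approach works if the example dataset contains only one nonzero indicator value. If there is more than one, PostgreSQL returns an error saying that my subquery returned more than one row.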
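The error can be reproduced in isolation. A subquery used as a single operand of a comparison or of `BETWEEN` has to return at most one row, and with the sample data the indicator subquery returns three. A minimal sketch, assuming the table and column names used in the attempt above:

```sql
-- Returns three rows with the sample data:
-- 2012-01-03 / A, 2012-01-03 / C, 2012-01-04 / D
SELECT date, company
FROM   mytable
WHERE  indicator != 0
ORDER  BY date, company;

-- The same subquery used as a scalar value fails with:
-- ERROR:  more than one row returned by a subquery used as an expression
SELECT *
FROM   mytable
WHERE  date >= (SELECT date FROM mytable WHERE indicator != 0);
```

Note also that the subqueries in the attempt are not correlated with the outer row's company, so even with a single indicator row the date window would apply to every company.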

I am really struggling with this problem. Do you know a solution, or perhaps a completely different way to achieve the desired result?

【Comments】:

    Tags: sql postgresql gaps-and-islands


    【Solution 1】:

    I would simplify this considerably with an EXISTS semi-join.

    Delete only the offending rows

    SELECT * FROM tbl t
    -- DELETE FROM tbl t   -- swap in for the SELECT once the preview looks right
    WHERE  t.indicator <> 0
    AND    EXISTS (
       SELECT 1
       FROM   tbl t1
       WHERE  t1.day IN (t.day, t.day + 1)  -- indicator day or the day after
       AND    t1.company = t.company
       AND    t1.value IS NULL
       );
    

    -> SQLfiddle

    I use the column name day instead of date, because I never use basic type names as identifiers.

    day + 1 works when day is of data type date (as it should be).
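As a quick illustration of that remark, here is how PostgreSQL treats the addition for different types. The literal dates are just examples taken from the question's sample data.

```sql
-- date + integer adds whole days and yields a date
SELECT DATE '2012-01-03' + 1;                            -- 2012-01-04

-- timestamp + integer is not defined and raises "operator does not exist";
-- with a timestamp column, add an interval instead
SELECT TIMESTAMP '2012-01-03 00:00' + INTERVAL '1 day';  -- 2012-01-04 00:00:00
```

So if the column were a timestamp rather than a date, the `day + 1` in the queries would have to become something like `day + interval '1 day'`, or the column would need a cast to date.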

    Delete entire companies

    "The company should be excluded from the dataset entirely."

    DELETE FROM tbl t
    USING (
        SELECT DISTINCT t0.company
        FROM   tbl t0               -- separate alias, so it is not confused with the DELETE target t
        WHERE  t0.indicator <> 0
        AND    EXISTS (
            SELECT 1
            FROM   tbl t1
            WHERE  t1.day IN (t0.day, t0.day + 1)  -- indicator day or the day after
            AND    t1.company = t0.company
            AND    t1.value IS NULL
            )
        ) del
    WHERE t.company = del.company;
    

    -> SQLfiddle
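Before running a delete like this against real data, it may be worth trying it inside a transaction and rolling back. A sketch, assuming a table `tbl` with columns `day`, `company`, `indicator` and `value` that holds the sample rows from the question (the alias `t0` is only for readability):

```sql
BEGIN;

DELETE FROM tbl t
USING (
    SELECT DISTINCT t0.company
    FROM   tbl t0
    WHERE  t0.indicator <> 0
    AND    EXISTS (
        SELECT 1
        FROM   tbl t1
        WHERE  t1.day IN (t0.day, t0.day + 1)
        AND    t1.company = t0.company
        AND    t1.value IS NULL
        )
    ) del
WHERE t.company = del.company;

-- Inspect what is left; with the sample data only B and D should remain
SELECT DISTINCT company FROM tbl ORDER BY company;

ROLLBACK;  -- change to COMMIT once the result looks right
```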

    【Comments】:

      【Solution 2】:
      DELETE FROM test WHERE company IN (
      WITH 
          for_check AS (SELECT date, company FROM test WHERE indicator != 0)
      SELECT test.company 
      FROM test 
      INNER JOIN for_check fc 
          ON test.date IN (fc.date, fc.date + 1) 
              AND fc.company = test.company 
      WHERE test.value IS NULL
      )
      

      【Comments】:

      • Ervin's variant looks cleaner, provided the data set is small and the number of subselects doesn't matter.