【Question title】: Filling nulls with average between neighbor values with restriction on another column
【Posted】: 2019-03-30 18:21:44
【Question description】:

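I have a table with columns named "id", "time", and "value". When "value" is null, I want it to be the average of the nearest neighbors by "time" for the same id.

My problem is exactly the one described in "select nearest neighbours", but the answers there do not explain how to find the nearest neighbors with a restriction on another column (the id must be the same).

Example: "value" is missing in the second row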

id       | time  | value
-------------------------
11111    | 1     | 5.0
11111    | 10    | 
22222    | 7     | 32.6
33333    | 11    | 15.88
11111    | 15    | 20.0

I want it to be:

id       | time  | value
-------------------------
11111    | 1     | 5.0
11111    | 10    | 12.5*
22222    | 7     | 32.6
33333    | 11    | 15.88
11111    | 15    | 20.0

because (20.0 + 5.0) / 2 = 12.5

How can I get this in MySQL?

【Comments on the question】:

    Tags: mysql sql partition-by


    【Solution 1】:

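    Assuming that time defines the order and is unique within an id (you need a column that defines the order and one that makes rows unique for this to work), one way is to use subqueries that fetch the value of the first record with a smaller time and the first record with a larger time, using ORDER BY and LIMIT.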

    SELECT t1.id,
           t1.time,
           coalesce(t1.value,
                    ((SELECT t2.value
                             FROM elbat t2
                             WHERE t2.id = t1.id
                                   AND t2.time < t1.time
                             ORDER BY t2.time DESC
                             LIMIT 1)
                     +
                     (SELECT t2.value
                             FROM elbat t2
                             WHERE t2.id = t1.id
                                   AND t2.time > t1.time
                             ORDER BY t2.time ASC
                             LIMIT 1)
                    )
                    /
                    2) AS value
    FROM elbat t1;
    


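    But this only fills gaps that are one row wide: if a neighboring row is itself null, the sum is null and the gap stays unfilled. If larger gaps are possible, you have to define what the next non-null neighbor of those rows is.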
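
    As a sketch of one possible definition (an assumption, not something the answer specifies): treat the closest row of the same id whose value is not null as the neighbor. That only requires one extra predicate in each subquery of the query above.

    ```sql
    -- Sketch under an assumed definition: the neighbor is the nearest row
    -- of the same id that has a non-null value.
    SELECT t1.id,
           t1.time,
           COALESCE(t1.value,
                    ((SELECT t2.value
                      FROM elbat t2
                      WHERE t2.id = t1.id
                        AND t2.time < t1.time
                        AND t2.value IS NOT NULL   -- skip neighbors that are null themselves
                      ORDER BY t2.time DESC
                      LIMIT 1)
                     +
                     (SELECT t2.value
                      FROM elbat t2
                      WHERE t2.id = t1.id
                        AND t2.time > t1.time
                        AND t2.value IS NOT NULL
                      ORDER BY t2.time ASC
                      LIMIT 1)
                    ) / 2) AS value
    FROM elbat t1;
    ```

    With this definition every row inside a multi-row gap receives the same average, since no weighting by time is applied. A null in the first or last row of an id still stays null, because one of the two subqueries returns nothing.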

    【Comments】:

      【Solution 2】:

      Just join the table to itself, but watch out for rows that have no NEXT_VALUE.

      SELECT cur.ID_,
             cur.TIME_,
             CASE
                WHEN cur.VALUE_ IS NULL THEN (prv.PREV_VALUE + nxt.NEXT_VALUE) / 2
                ELSE cur.VALUE_
             END AS REAL_VALUE
      FROM (SELECT ROW_NUMBER() OVER (PARTITION BY ID_ ORDER BY TIME_ DESC) AS NOW_ROW_NUM,
                   ID_,
                   TIME_,
                   VALUE_
            FROM TESTTABLE) cur   -- MySQL requires an alias on every derived table
         -- Row numbers run from latest to earliest, so row n + 1 is the previous row in time.
         LEFT JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY ID_ ORDER BY TIME_ DESC) - 1 AS LAST_ROW_NUM,
                           ID_ AS LAST_ID,
                           VALUE_ AS PREV_VALUE   -- renamed: LAST_VALUE is a reserved word in MySQL 8.0
                    FROM TESTTABLE) prv
            ON cur.ID_ = prv.LAST_ID AND cur.NOW_ROW_NUM = prv.LAST_ROW_NUM
         -- Row n - 1 is the next row in time.
         LEFT JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY ID_ ORDER BY TIME_ DESC) + 1 AS NEXT_ROW_NUM,
                           ID_ AS NEXT_ID,
                           VALUE_ AS NEXT_VALUE
                    FROM TESTTABLE) nxt
            ON cur.ID_ = nxt.NEXT_ID AND cur.NOW_ROW_NUM = nxt.NEXT_ROW_NUM;
      

      【Comments】:

        【Solution 3】:

        Just use lead() and lag(). The simplest answer is:

        select t.*,
               (case when value is null
                     then ( lag(value) over (partition by id order by time) + lead(value) over (partition by id order by time) ) / 2
                     else value
                end) as new_value
        from t;
        

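        This does not work when the null is in the first or last row of an id, because one of the two neighbors is missing. You can use this instead: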

        select t.*,
               (case when value is null
                     then avg(value) over (partition by id order by time rows between 1 preceding and 1 following)
                     else value
                end) as new_value
        from t;
        

        This computes the average from whatever data is available in the preceding and following rows.
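
        To try this against the sample data from the question, a setup along these lines could be used. The table name t is taken from the answer; the column types are assumptions, and the window functions require MySQL 8.0 or later.

        ```sql
        -- Assumed schema for the question's sample data.
        CREATE TABLE t (
            id    INT NOT NULL,
            time  INT NOT NULL,
            value DOUBLE NULL
        );

        INSERT INTO t (id, time, value) VALUES
            (11111,  1,  5.0),
            (11111, 10,  NULL),
            (22222,  7,  32.6),
            (33333, 11,  15.88),
            (11111, 15,  20.0);

        -- AVG ignores nulls, so the null current row does not affect the result;
        -- only the non-null values among the previous and next rows are averaged.
        SELECT t.*,
               COALESCE(value,
                        AVG(value) OVER (PARTITION BY id ORDER BY time
                                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)) AS new_value
        FROM t
        ORDER BY id, time;
        ```

        For the row with id 11111 and time 10, new_value is (5.0 + 20.0) / 2 = 12.5. If the null were in the first or last row of an id, new_value would simply be the value of the single neighbor that exists.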

        【Comments】:
