【Question title】: Selecting records for deletion based on relationship with previous and next records
【Posted】: 2018-01-01 03:20:54
【Question】:

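I have a SQL Server 2014 table with millions of GPS coordinates, each recorded at a particular time. However, the interval between registrations is not fixed and varies from 1 second to a couple of hours. I only want to keep one measurement every 4 minutes, so the other records have to be deleted.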

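I tried a WHILE loop in T-SQL that walks through every record. Inside the loop is a SELECT statement with a double CROSS APPLY that returns a record only if it lies between 2 other records that are no more than 4 minutes apart. However, this strategy turned out to be too slow.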

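Can this be done with a set-based solution? Or is there a way to speed up this query? (The test query below only prints; it does not delete anything yet.)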

SELECT * INTO #myTemp FROM gps ORDER BY TimePoint asc 

declare @Id Uniqueidentifier
declare @d1 varchar(19)
declare @d2 varchar(19)
declare @d3 varchar(19)


While EXISTS (select * from #myTemp )
BEGIN
    select top 1 @Id = ID FROM #myTemp order by TimePoint asc

    SELECT 
        @d1 = convert(varchar(19), a.justbefore, 121), 
        @d2 = convert(varchar(19), b.TimePoint, 121),
        @d3 = convert(varchar(19), c.justafter, 121)
    FROM Gps B CROSS APPLY 
        (
            SELECT  top 1 TimePoint as justbefore
            FROM Gps
            WHERE    (B.TimePoint > TimePoint ) AND (B.Id = @Id )
            ORDER by TimePoint desc 
        ) A 
        CROSS APPLY (
            SELECT  top 1 TimePoint as justafter
            FROM Gps
            WHERE   (Datediff(n,A.justbefore,TimePoint ) between -4 AND 0) 
                    AND (B.TimePoint < TimePoint )
            ORDER by TimePoint asc
        ) C

    print 'ID=' + Cast(@id as varchar(50)) 
                + ' / d1=' + @d1 + ' / d2=' + @d2 + ' / d3=' + @d3                   

    DELETE #myTemp where Id = @id   
END

--

 Sample data:
    Id     TimePoint            Lat      Lon
    1      20170725 13:05:27    12,256   24,123
    2      20170725 13:10:27    12,254   24,120
    3      20170725 13:10:29    12,253   24,125  
    4      20170725 13:11:55    12,259   24,127
    5      20170725 13:11:59    12,255   24,123
    6      20170725 13:14:28    12,254   24,126
    7      20170725 13:16:52    12,259   24,121
    8      20170725 13:20:53    12,257   24,125

In this case records 3, 4 and 5 should be deleted. Record 7 should stay, because the interval between 7 and 8 is more than 4 minutes.

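【Comments】:

  • Can you post some sample data and the expected results?
  • I agree that sample data and expected results would make this easier to solve. That said, I suggest searching for "Gaps and Islands"; there are plenty of examples. The trick is that you need to group the records into 4-minute increments and be able to identify the first record in each group.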

Tags: sql-server tsql sql-server-2014


【Solution 1】:

Looking at the numbers... it looks like 1 and 2 stay (5 minutes apart)... 3, 4 and 5 should go... 6 stays (4 minutes from 2)... 7 should go (only 2 minutes from 6) and 8 stays (6 minutes from 6)...

If this is correct, the following will do what you're looking for...


IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL 
DROP TABLE #TestData;

CREATE TABLE #TestData (
    Id INT NOT NULL PRIMARY KEY CLUSTERED,
    TimePoint DATETIME2(0) NOT NULL,
    Lat DECIMAL(9,3),
    Lon DECIMAL(9,3)
    );  

INSERT #TestData (Id, TimePoint, Lat, Lon) VALUES
    (1, '20170725 13:05:27', 12.256, 24.123),
    (2, '20170725 13:10:27', 12.254, 24.120),
    (3, '20170725 13:10:29', 12.253, 24.125),  
    (4, '20170725 13:11:55', 12.259, 24.127),
    (5, '20170725 13:11:59', 12.255, 24.123),
    (6, '20170725 13:14:28', 12.254, 24.126),
    (7, '20170725 13:16:52', 12.259, 24.121),
    (8, '20170725 13:20:53', 12.257, 24.125);

--  SELECT * FROM #TestData td;

--================================================================================

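WITH 
    -- Step 1: minutes since the previous row (NULL for the first row).
    -- Note: DATEDIFF(mi, ...) counts minute boundaries crossed, not elapsed 60-second spans.
    cte_AddLag AS (
        SELECT 
            td.Id, td.TimePoint, td.Lat, td.Lon,
            MinFromPrev = DATEDIFF(mi, LAG(td.TimePoint, 1) OVER (ORDER BY td.TimePoint), td.TimePoint)
        FROM
            #TestData td
        ),
    -- Step 2: running total of those minutes, integer-divided by 4 to form 4-minute groups.
    cte_TimeGroup AS (
        SELECT 
            *,
            TimeGroup = ISNULL(SUM(al.MinFromPrev) OVER (ORDER BY al.TimePoint ROWS UNBOUNDED PRECEDING) / 4, 0)
        FROM
            cte_AddLag al
        )
-- Step 3: keep the earliest row of each group (every row numbered 1 ties for TOP 1).
SELECT TOP 1 WITH TIES 
    tg.Id, 
    tg.TimePoint, 
    tg.Lat, 
    tg.Lon
FROM
    cte_TimeGroup tg
ORDER BY 
    ROW_NUMBER() OVER (PARTITION BY tg.TimeGroup ORDER BY tg.TimePoint);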

Results...

Id          TimePoint                   Lat                                     Lon
----------- --------------------------- --------------------------------------- ---------------------------------------
1           2017-07-25 13:05:27         12.256                                  24.123
2           2017-07-25 13:10:27         12.254                                  24.120
6           2017-07-25 13:14:28         12.254                                  24.126
8           2017-07-25 13:20:53         12.257                                  24.125

HTH, Jason
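To see why the query above drops row 7, it can help to look at the intermediate columns. The following is only an illustrative sketch, not part of the original answer: it inlines the same eight sample rows through a `VALUES` list, and the names `SampleData`, `AddLag` and `RunningMin` are made up for this example.

```sql
-- Illustrative sketch: expose the intermediate values behind the TimeGroup column.
-- SampleData, AddLag and RunningMin are hypothetical names used only here.
WITH SampleData (Id, TimePoint) AS (
    SELECT v.Id, CAST(v.TimePoint AS DATETIME2(0))
    FROM (VALUES
        (1, '20170725 13:05:27'),
        (2, '20170725 13:10:27'),
        (3, '20170725 13:10:29'),
        (4, '20170725 13:11:55'),
        (5, '20170725 13:11:59'),
        (6, '20170725 13:14:28'),
        (7, '20170725 13:16:52'),
        (8, '20170725 13:20:53')
    ) v (Id, TimePoint)
),
AddLag AS (
    SELECT
        sd.Id,
        sd.TimePoint,
        -- Minute boundaries crossed since the previous row (NULL for the first row)
        MinFromPrev = DATEDIFF(mi, LAG(sd.TimePoint) OVER (ORDER BY sd.TimePoint), sd.TimePoint)
    FROM SampleData sd
)
SELECT
    al.Id,
    al.TimePoint,
    al.MinFromPrev,
    -- Running total of minutes since the first row
    RunningMin = SUM(al.MinFromPrev) OVER (ORDER BY al.TimePoint ROWS UNBOUNDED PRECEDING),
    -- Integer division by 4 assigns each row to a 4-minute bucket
    TimeGroup  = ISNULL(SUM(al.MinFromPrev) OVER (ORDER BY al.TimePoint ROWS UNBOUNDED PRECEDING) / 4, 0)
FROM AddLag al
ORDER BY al.TimePoint;
```

On the sample rows, MinFromPrev comes out as NULL, 5, 0, 1, 0, 3, 2, 4, which gives TimeGroup values of 0, 1, 1, 1, 1, 2, 2, 3. Rows 6 and 7 share group 2, so only row 6, the earlier of the two, survives. Because the buckets are measured from the first row rather than from the last kept row, this grouping does not try to keep the gaps between surviving rows close to 4 minutes.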

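【Comments】:

  • In theory point 7 should stay, because the goal is to keep the intervals as close to 4 minutes as possible. With point 7 deleted, the interval between points 6 and 8 becomes more than 6 minutes.
  • There simply isn't going to be an efficient way to do that.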
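Regarding the first comment: the rule described in the question (a record is removable only when the records just before and just after it are no more than 4 minutes apart) can be sketched as a single set-based pass with `LAG` and `LEAD`. This is a minimal sketch under stated assumptions, not a tested solution for the real table: `#GpsSample` is a hypothetical stand-in for `gps`, `Id` is an `INT` as in the sample data, and the gap is measured in seconds (240) rather than with `DATEDIFF(n, ...)` as in the question.

```sql
-- Illustrative sketch; #GpsSample is a hypothetical stand-in for the real gps table.
IF OBJECT_ID('tempdb..#GpsSample', 'U') IS NOT NULL
    DROP TABLE #GpsSample;

CREATE TABLE #GpsSample (
    Id INT NOT NULL PRIMARY KEY,
    TimePoint DATETIME2(0) NOT NULL
);

INSERT #GpsSample (Id, TimePoint) VALUES
    (1, '20170725 13:05:27'),
    (2, '20170725 13:10:27'),
    (3, '20170725 13:10:29'),
    (4, '20170725 13:11:55'),
    (5, '20170725 13:11:59'),
    (6, '20170725 13:14:28'),
    (7, '20170725 13:16:52'),
    (8, '20170725 13:20:53');

WITH Neighbours AS (
    SELECT
        g.Id,
        g.TimePoint,
        -- Timestamps of the rows immediately before and after, in time order
        PrevTime = LAG(g.TimePoint)  OVER (ORDER BY g.TimePoint),
        NextTime = LEAD(g.TimePoint) OVER (ORDER BY g.TimePoint)
    FROM #GpsSample g
)
-- Candidates for deletion: rows whose two neighbours are at most 4 minutes apart.
-- The first and last rows have a NULL neighbour, so the comparison is not true
-- and they are never listed.
SELECT n.Id, n.TimePoint
FROM Neighbours n
WHERE DATEDIFF(SECOND, n.PrevTime, n.NextTime) <= 240
ORDER BY n.TimePoint;
```

On the sample rows this lists Ids 3, 4 and 5 and leaves 7 alone, since its neighbours 6 and 8 are 6 minutes 25 seconds apart. One caveat: every row is judged against its original neighbours, so in a long run of closely spaced readings a single pass would flag all interior rows and could leave a gap much wider than 4 minutes. Whether that is acceptable depends on the data, so it seems worth checking the SELECT output before turning it into a DELETE.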