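【Question title】: Filtering bad data in SQL Server 2005
【Posted】: 2011-03-07 22:25:40
【Question description】:

Using SQL Server 2005.

I have a table with the following columns:

id name date value

I want to select all rows from the table except those that belong to a run of four consecutive zeros when ordered by date. How can I do that? Below is an example of what I mean.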

id      name     date         value
1       a        1/1/2010     5
2       a        1/2/2010     3
3       a        1/3/2010     5
4       a        1/4/2010     0
5       a        1/7/2010     0
6       a        1/8/2010     0
7       a        1/9/2010     2
8       a        1/10/2010    3
9       a        1/11/2010    0
10      a        1/15/2010    0
11      a        1/16/2010    0
12      a        1/17/2010    0
13      a        1/20/2010    4
14      a        1/21/2010    4

I want the query result to include all rows except ids 9-12.

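【Question discussion】:

  • Interesting. Just wondering whether this is really needed... is it a business rule or just a learning exercise?
  • It's a business requirement. We need to eliminate bad data points from aggregate calculations. And all that jazz.

Tags: sql sql-server sql-server-2005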


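【Solution 1】:

This assumes you order the rows by id, but you can simply change ORDER BY id to another column and it will still work.

Using the T-SQL CTE from this Kodyaz Development Resources site, I was able to put together the following code. I got it working so that it removes rows with two consecutive zeros rather than four, because I tested it against my own code and only changed the table/column names.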

WITH CTE as (
  SELECT
    RN = ROW_NUMBER() OVER (ORDER BY id),
    *
  FROM tablename
)
SELECT
  [Current Row].*
FROM CTE [Current Row]
LEFT JOIN CTE [Previous Row] ON
  [Previous Row].RN = [Current Row].RN - 1
LEFT JOIN CTE [Next Row] ON
  [Next Row].RN = [Current Row].RN + 1
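WHERE
  -- drop the row when its value is zero and the next row's value is zero
  -- (ISNULL treats a missing neighbor, i.e. the first or last row, as non-zero,
  --  so an unmatched LEFT JOIN does not filter the row out by accident)
  NOT ([Current Row].value = 0 AND ISNULL([Next Row].value, 1) = 0) AND
  -- drop the row when its value is zero and the previous row's value is zero
  NOT (ISNULL([Previous Row].value, 1) = 0 AND [Current Row].value = 0)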

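To make this work for your case, all you have to do is put every possible combination into the WHERE clause: for example, this row and the next three rows all equal 0, or this row, the previous row, and the next two rows all equal 0, and so on. That also means joining the CTE at every offset from RN - 3 to RN + 3.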
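One way to cover all of those combinations at once, offered as a sketch rather than the answerer's exact code: instead of writing out each position of the current row by hand, look for any window of four consecutive zero rows that contains it. The table variable `@t` and the CTE name `Zero` are made up for illustration; the sample rows are the ones from the question, and rows are ordered by date here.

```sql
-- Hypothetical table variable holding the sample rows from the question.
DECLARE @t TABLE (id int, name varchar(10), [date] datetime, value int);

INSERT INTO @t (id, name, [date], value)
SELECT  1, 'a', '20100101', 5 UNION ALL
SELECT  2, 'a', '20100102', 3 UNION ALL
SELECT  3, 'a', '20100103', 5 UNION ALL
SELECT  4, 'a', '20100104', 0 UNION ALL
SELECT  5, 'a', '20100107', 0 UNION ALL
SELECT  6, 'a', '20100108', 0 UNION ALL
SELECT  7, 'a', '20100109', 2 UNION ALL
SELECT  8, 'a', '20100110', 3 UNION ALL
SELECT  9, 'a', '20100111', 0 UNION ALL
SELECT 10, 'a', '20100115', 0 UNION ALL
SELECT 11, 'a', '20100116', 0 UNION ALL
SELECT 12, 'a', '20100117', 0 UNION ALL
SELECT 13, 'a', '20100120', 4 UNION ALL
SELECT 14, 'a', '20100121', 4;

WITH CTE AS (
  SELECT
    RN = ROW_NUMBER() OVER (ORDER BY [date]),
    id, name, [date], value
  FROM @t
),
Zero AS (
  -- Only the positions whose value is zero.
  SELECT RN FROM CTE WHERE value = 0
)
SELECT c.id, c.name, c.[date], c.value
FROM CTE c
WHERE NOT EXISTS (
  -- z0 is the first row of a window of four consecutive zeros;
  -- the current row is dropped if it falls anywhere inside that window.
  SELECT 1
  FROM Zero z0
  JOIN Zero z1 ON z1.RN = z0.RN + 1
  JOIN Zero z2 ON z2.RN = z0.RN + 2
  JOIN Zero z3 ON z3.RN = z0.RN + 3
  WHERE c.RN BETWEEN z0.RN AND z0.RN + 3
)
ORDER BY c.id;
-- With the sample data this returns ids 1-8, 13 and 14.
```

Because any run of five or more zeros contains overlapping windows of four, longer runs are removed as well.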

【Discussion】:

  • Good idea using ROW_NUMBER to guarantee that the next row can be found with a simple +1.
【Solution 2】:

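You didn't mention how name is involved, so I'm assuming you want this done per name. I'm further assuming that by "consecutive" you mean in date order, not id order. Finally, I'm also assuming that you want to exclude runs of five zeros, six zeros, and so on as well.

There may be a simpler way, but this should work: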

;WITH Transitions_To_CTE AS
(
    SELECT
        T1.id,
        T1.name,
        T1.date,
        T1.value
    FROM
        My_Table T1
    LEFT OUTER JOIN My_Table T2 ON
        T2.name = T1.name AND
        T2.date < T1.date AND
        T2.value <> 0
    LEFT OUTER JOIN My_Table T3 ON
        T3.name = T1.name AND
        T3.date > COALESCE(T2.date, '1900-01-01') AND
        T3.date < T1.date
    WHERE
        T1.value = 0 AND
        T3.id IS NULL
),
Transitions_From_CTE AS
(
    SELECT
        T1.id,
        T1.name,
        T1.date,
        T1.value
    FROM
        My_Table T1
    LEFT OUTER JOIN My_Table T2 ON
        T2.name = T1.name AND
        T2.date > T1.date AND
        T2.value <> 0
    LEFT OUTER JOIN My_Table T3 ON
        T3.name = T1.name AND
        T3.date < COALESCE(T2.date, '9999-12-31') AND
        T3.date > T1.date
    WHERE
        T1.value = 0 AND
        T3.id IS NULL
),
Range_Exclusions AS
(
    SELECT
        S.name,
        S.date AS start_date,
        E.date AS end_date
    FROM
        Transitions_To_CTE S
    INNER JOIN Transitions_From_CTE E ON
        E.name = S.name AND
        E.date >= S.date    -- >= so that a lone zero pairs with itself as its own end
    LEFT OUTER JOIN Transitions_From_CTE E2 ON
        E2.name = S.name AND
        E2.date >= S.date AND
        E2.date < E.date
    WHERE
        E2.id IS NULL AND
        (SELECT COUNT(*) FROM dbo.My_Table T WHERE T.name = S.name AND T.date BETWEEN S.date AND E.date) >= 4
)
SELECT
    T.id,
    T.name,
    T.date,
    T.value
FROM
    dbo.My_Table T
WHERE
    NOT EXISTS (SELECT * FROM Range_Exclusions RE WHERE RE.name = T.name AND T.date BETWEEN RE.start_date AND RE.end_date)
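For comparison, a shorter sketch under the same three assumptions (per name, date order, runs of four or more). It is not the method used above: it swaps in the ROW_NUMBER difference trick, where a row's position within its name minus its position among rows of the same kind (zero or non-zero) stays constant across a run. The table variable `@My_Table`, its sample rows, and the names `Flagged`, `Runs`, `Counted`, `is_zero`, `grp`, and `run_length` are made up for illustration.

```sql
-- Hypothetical sample data: four consecutive zeros for 'a', only two for 'b'.
DECLARE @My_Table TABLE (id int, name varchar(10), [date] datetime, value int);

INSERT INTO @My_Table (id, name, [date], value)
SELECT 1, 'a', '20100101', 0 UNION ALL
SELECT 2, 'a', '20100102', 0 UNION ALL
SELECT 3, 'b', '20100102', 0 UNION ALL
SELECT 4, 'a', '20100105', 0 UNION ALL
SELECT 5, 'b', '20100106', 0 UNION ALL
SELECT 6, 'a', '20100107', 0 UNION ALL
SELECT 7, 'a', '20100108', 7 UNION ALL
SELECT 8, 'b', '20100109', 1;

WITH Flagged AS (
    -- Mark zero rows; a NULL value counts as non-zero here.
    SELECT id, name, [date], value,
           CASE WHEN value = 0 THEN 1 ELSE 0 END AS is_zero
    FROM @My_Table
),
Runs AS (
    -- The difference of the two row numbers is constant within one run.
    SELECT id, name, [date], value, is_zero,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY [date])
         - ROW_NUMBER() OVER (PARTITION BY name, is_zero ORDER BY [date]) AS grp
    FROM Flagged
),
Counted AS (
    -- Attach the length of its run to every row.
    SELECT id, name, [date], value, is_zero,
           COUNT(*) OVER (PARTITION BY name, is_zero, grp) AS run_length
    FROM Runs
)
SELECT id, name, [date], value
FROM Counted
WHERE is_zero = 0 OR run_length < 4
ORDER BY id;
-- With these rows, ids 1, 2, 4 and 6 (the four zeros for 'a') are dropped;
-- ids 3, 5, 7 and 8 remain.
```

Partitioning the count by `is_zero` as well as `grp` keeps a zero run from being merged with a non-zero run that happens to share the same difference.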

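【Discussion】:

  • Thanks. +1 for your answer, and for almost beating me to it.
【Solution 3】:

Here is my attempt: use a recursive CTE to count the consecutive zeros, then build a sequence of ids for every run where level >= 4, and finally exclude those ids with a NOT IN clause.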

with trend --work out number of consecutive zeros using level
as
(Select 1 as level, id, value, id as startid
    from IdsAndValues
    Union All
    Select [Level]+1, P.ID, p.value, t.startid
    From IdsAndValues as p
        Inner Join trend as t on p.id = t.id+1
    Where t.value =0 and p.value=0
)
,IDs --create sequence of ids using startid and  id, this allows us to do the not in
as
(   
    Select  startid as ExcludeID ,id 
    from trend as t--
    Where level>=4
    Union All
    Select ExcludeID +1, id
    From ids 
    where ExcludeID <id 
)

Select *
from IdsAndValues
Where id Not in
    (Select ExcludeID from IDs)
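The recursion above joins on p.id = t.id+1, so it appears to rely on the ids having no gaps. A sketch of one way around that, assuming a date column is available to order by: number the rows with ROW_NUMBER first and recurse on that number instead. The table variable `@IdsAndValues`, its sample rows with gapped ids, and the `Numbered` CTE are made up for illustration, and the final filter uses NOT EXISTS on the run boundaries in place of the second recursive CTE.

```sql
-- Hypothetical sample data whose ids are not contiguous.
DECLARE @IdsAndValues TABLE (id int, [date] datetime, value int);

INSERT INTO @IdsAndValues (id, [date], value)
SELECT 10, '20100101', 5 UNION ALL
SELECT 20, '20100102', 0 UNION ALL
SELECT 30, '20100103', 0 UNION ALL
SELECT 45, '20100104', 0 UNION ALL
SELECT 50, '20100105', 0 UNION ALL
SELECT 70, '20100106', 2 UNION ALL
SELECT 80, '20100107', 0;

WITH Numbered AS (
    -- Gap-free sequence in date order.
    SELECT rn = ROW_NUMBER() OVER (ORDER BY [date]), id, value
    FROM @IdsAndValues
),
trend AS (
    -- Anchor: every zero row starts a candidate run of length 1.
    SELECT 1 AS level, rn, rn AS startrn
    FROM Numbered
    WHERE value = 0
    UNION ALL
    -- Extend the run while the next row (by rn) is also zero.
    SELECT t.level + 1, p.rn, t.startrn
    FROM Numbered AS p
    INNER JOIN trend AS t ON p.rn = t.rn + 1
    WHERE p.value = 0
)
SELECT n.id, n.value
FROM Numbered AS n
WHERE NOT EXISTS (
    -- Any run that reached level 4 spans startrn..rn; drop rows inside it.
    SELECT 1
    FROM trend AS t
    WHERE t.level >= 4 AND n.rn BETWEEN t.startrn AND t.rn
)
ORDER BY n.rn;
-- With these rows, ids 20, 30, 45 and 50 are dropped; 10, 70 and 80 remain.
```

The recursion depth grows with the length of the zero run, so a run of more than about a hundred rows would hit SQL Server's default recursion limit of 100 unless an OPTION (MAXRECURSION ...) hint is added.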

【Discussion】:
