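【Question Title】: Eliminating duplicate rows with precedence
【Posted】: 2014-07-31 12:49:06
【Question Description】:

I am working on a stored procedure that combines history with projections. I have a bit column (PHP) that specifies whether the projection has precedence over history. I also have a column that specifies whether the data came from the history table or the projection table. The output of my stored procedure looks like this: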

CaseId     Year     Projection    PHP    Gas   Oil
  1        2004         0          1    
  1        2005         0          1    
  1        2005         1          1    
  1        2006         1          1    
  1        2007         1          1    
  1        2008         1          1    
  1        2009         1          1    
  2        2003         0          0    
  2        2004         0          0    
  2        2005         0          0    
  2        2005         1          0    
  2        2006         1          0    
  2        2007         1          0    
  2        2008         1          0    
  2        2006         1          0    

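In this example, I need to remove the second row: for CaseId 1 the projection has precedence, so the overlapping history year should be dropped. Likewise, the fourth row for CaseId 2 should be removed, because there history has precedence.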

CaseId     Year     Projection    PHP    Gas   Oil  
  1        2004         0          1    
  1        2005         1          1    
  1        2006         1          1    
  1        2007         1          1    
  1        2008         1          1    
  1        2009         1          1    
  2        2003         0          0    
  2        2004         0          0    
  2        2005         0          0    
  2        2006         1          0    
  2        2007         1          0    
  2        2008         1          0    
  2        2006         1          0

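I need to flag duplicate years within a CaseId, then compare the Projection and PHP columns and remove the rows where they do not match.

Here is the query I am working with: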

SELECT      rcl.ReportRunCaseId AS CaseId, 
            year(rce.EcoDate) as Year,
            1 as Projection,
            cpq.ProjectionHasPrecedence as PHP,
            rce.GrossOil as Oil,                
            rce.GrossGas as Gas    
  from  phdreports.PhdRpt.ReportCaseList_28 rcl 
         inner join phdreports.PhdRpt.RptCaseEco_28 rce on
            rce.ReportRunCaseId = rcl.ReportRunCaseId
         inner join dbo.caseQualifier cq on 
            cq.CorpScenarioId = 1 and 
            cq.CaseCaseId = rcl.ReportRunCaseId and 
            cq.CorpQualifierTypeId = 1
         inner join dbo.caseProjectionQualifier cpq on 
            cpq.CaseCaseId = rcl.ReportRunCaseId and 
            cpq.CorpQualifierId = cq.QualifierHasData 
where rcl.ReportRunCaseId <=2
group by year(rce.EcoDate), rcl.ReportRunCaseId, cpq.ProjectionHasPrecedence, rce.GrossGas, rce.GrossOil

union all

select      rmp.ReportRunCaseId AS CaseId, 
            year(rmp.EcoDate) as Year,
            0 as Projection,
            cpq.ProjectionHasPrecedence as PHP,
            rmp.GrossOil as Oil,
            rmp.GrossGas as Gas              
from PhdReports.PhdRpt.RptMonthlyProduction_50 rmp
        inner join dbo.caseQualifier cq on 
          cq.CorpScenarioId = 1 and 
          cq.CaseCaseId = rmp.ReportRunCaseId and 
          cq.CorpQualifierTypeId = 1
        inner join dbo.caseProjectionQualifier cpq on 
          cpq.CaseCaseId = rmp.ReportRunCaseId and 
          cpq.CorpQualifierId = cq.QualifierHasData 
where rmp.ReportRunCaseId <= 2
group by year(rmp.EcoDate), rmp.ReportRunCaseId, cpq.ProjectionHasPrecedence, rmp.GrossGas, rmp.GrossOil 

How can I eliminate the duplicate years where Projection and PHP do not match?

【Question Discussion】:

    Tags: sql sql-server sql-server-2012 duplicates


    【Solution 1】:

    The ROW_NUMBER() function should help you here:

    WITH Data AS
    (   SELECT      rcl.ReportRunCaseId AS CaseId, 
                    year(rce.EcoDate) as Year,
                    1 as Projection,
                    cpq.ProjectionHasPrecedence as PHP,
                    rce.GrossOil as Oil,                
                    rce.GrossGas as Gas    
          from  phdreports.PhdRpt.ReportCaseList_28 rcl 
                 inner join phdreports.PhdRpt.RptCaseEco_28 rce on
                    rce.ReportRunCaseId = rcl.ReportRunCaseId
                 inner join dbo.caseQualifier cq on 
                    cq.CorpScenarioId = 1 and 
                    cq.CaseCaseId = rcl.ReportRunCaseId and 
                    cq.CorpQualifierTypeId = 1
                 inner join dbo.caseProjectionQualifier cpq on 
                    cpq.CaseCaseId = rcl.ReportRunCaseId and 
                    cpq.CorpQualifierId = cq.QualifierHasData 
        where rcl.ReportRunCaseId <=2
        group by year(rce.EcoDate), rcl.ReportRunCaseId, cpq.ProjectionHasPrecedence, rce.GrossGas, rce.GrossOil
    
        union all
    
        select      rmp.ReportRunCaseId AS CaseId, 
                    year(rmp.EcoDate) as Year,
                    0 as Projection,
                    cpq.ProjectionHasPrecedence as PHP,
                    rmp.GrossOil as Oil,
                    rmp.GrossGas as Gas              
        from PhdReports.PhdRpt.RptMonthlyProduction_50 rmp
                inner join dbo.caseQualifier cq on 
                  cq.CorpScenarioId = 1 and 
                  cq.CaseCaseId = rmp.ReportRunCaseId and 
                  cq.CorpQualifierTypeId = 1
                inner join dbo.caseProjectionQualifier cpq on 
                  cpq.CaseCaseId = rmp.ReportRunCaseId and 
                  cpq.CorpQualifierId = cq.QualifierHasData 
        where rmp.ReportRunCaseId <= 2
        group by year(rmp.EcoDate), rmp.ReportRunCaseId, cpq.ProjectionHasPrecedence, rmp.GrossGas, rmp.GrossOil
    ), Data2 AS
    (   SELECT  *, 
                RowNum = ROW_NUMBER() OVER(PARTITION BY CaseId, Year 
                                            ORDER BY CASE WHEN PHP = Projection THEN 0 ELSE 1 END, PHP DESC, Projection DESC)
        FROM    Data
    )
    SELECT  CaseId, Year, Projection, PHP, Oil, Gas
    FROM    Data2
    WHERE   RowNum = 1;
    

    Focus only on the last part, since the first part is just your query wrapped in a common table expression:

    RowNum = ROW_NUMBER() OVER(PARTITION BY CaseId, Year 
                                ORDER BY CASE WHEN PHP = Projection THEN 0 ELSE 1 END, PHP DESC, Projection DESC)
    

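    Here we number the rows within each (CaseId, Year) tuple, ordered so that rows where PHP equals Projection come first. The final WHERE clause then restricts the result to the first row of each tuple: if a matching row exists it is taken, and if there is none the non-matching row is used.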
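To make the ranking concrete, here is a minimal sketch that runs the same ROW_NUMBER() pattern over the sample rows from the question. It assumes SQLite 3.25 or later (via Python's built-in sqlite3 module) as a stand-in for SQL Server, and the table name Data and the helper dedupe are hypothetical. The Gas and Oil columns are left out because the question shows no values for them.

```python
import sqlite3

# Sample rows from the question: (CaseId, Year, Projection, PHP).
ROWS = [
    (1, 2004, 0, 1), (1, 2005, 0, 1), (1, 2005, 1, 1), (1, 2006, 1, 1),
    (1, 2007, 1, 1), (1, 2008, 1, 1), (1, 2009, 1, 1),
    (2, 2003, 0, 0), (2, 2004, 0, 0), (2, 2005, 0, 0), (2, 2005, 1, 0),
    (2, 2006, 1, 0), (2, 2007, 1, 0), (2, 2008, 1, 0), (2, 2006, 1, 0),
]

# Rows where PHP = Projection sort first within each (CaseId, Year) group.
QUERY = """
WITH Ranked AS (
    SELECT CaseId, Year, Projection, PHP,
           ROW_NUMBER() OVER (
               PARTITION BY CaseId, Year
               ORDER BY CASE WHEN PHP = Projection THEN 0 ELSE 1 END
           ) AS RowNum
    FROM Data
)
SELECT CaseId, Year, Projection, PHP
FROM Ranked
WHERE RowNum = 1
ORDER BY CaseId, Year;
"""


def dedupe(rows):
    """Load rows into an in-memory table and keep one row per (CaseId, Year)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Data (CaseId INT, Year INT, Projection INT, PHP INT)"
    )
    conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = dedupe(ROWS)
for row in result:
    print(row)
```

Note that this keeps exactly one row per (CaseId, Year), so the two identical (2, 2006) projection rows in the sample collapse into one.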

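    You may need to add more criteria to the ORDER BY to make the result deterministic. For example, if two rows share the same CaseId/Year and both have PHP and Projection equal to 1, the extra criteria ensure the same row is chosen every time.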
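A small sketch of such a tie-breaker, again assuming SQLite 3.25+ through Python's sqlite3 module rather than SQL Server. The Oil values and the choice of `Oil DESC` as the extra ordering key are purely illustrative assumptions; any column that distinguishes the tied rows would serve.

```python
import sqlite3

# Oil DESC is a hypothetical tie-breaker: among tied rows, keep the largest Oil.
QUERY = """
SELECT CaseId, Year, Projection, PHP, Oil
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY CaseId, Year
               ORDER BY CASE WHEN PHP = Projection THEN 0 ELSE 1 END,
                        Oil DESC
           ) AS RowNum
    FROM Data
) AS Ranked
WHERE RowNum = 1;
"""


def pick(rows):
    """Return the single surviving row per (CaseId, Year) for the given input."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Data "
        "(CaseId INT, Year INT, Projection INT, PHP INT, Oil REAL)"
    )
    conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


# Two rows that tie on the PHP/Projection comparison (made-up Oil values).
a = (1, 2006, 1, 1, 120.0)
b = (1, 2006, 1, 1, 95.0)

first = pick([a, b])
second = pick([b, a])  # same rows, inserted in the opposite order
print(first, second)
```

With the extra key, the row chosen no longer depends on the order in which the rows happen to be stored or scanned.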

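    【Discussion】:

    • I tried this approach, and it returns the projection row when PHP is set to 0.
    • Thanks. I never knew you could use CASE in an ORDER BY.
    【Solution 2】:

    I don't see what your query has to do with the question. So let me assume you have a query: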

    select CaseId, Year, Projection, PHP, Gas, Oil 
    from t
    

    With that, you can do what you want using row_number():

    select CaseId, Year, Projection, PHP, Gas, Oil
    from (select CaseId, Year, Projection, PHP, Gas, Oil,
                 row_number() over (partition by CaseId, Year
                                    order by Projection + PHP desc
                                   ) as seqnum
          from t
         ) t
    where seqnum = 1;
    

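    This prioritizes rows by the number of flags that are set. In the example for CaseId = 2, two rows contain the same values, and this will return one of them. If you want to choose between them, you need another column to specify the priority.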
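As a quick check of how the ordering expression affects which row survives, the sketch below compares `Projection + PHP DESC` with a CASE expression that tests whether the two flags match. It assumes SQLite 3.25+ via Python's sqlite3 module as a stand-in for SQL Server; the table `t` mirrors the assumed query above, and the helper first_rows is hypothetical. Only the overlapping year 2005 from the question's sample is loaded.

```python
import sqlite3

# Overlapping year from the question: (CaseId, Year, Projection, PHP).
ROWS = [
    (1, 2005, 0, 1), (1, 2005, 1, 1),  # PHP = 1: projection should win
    (2, 2005, 0, 0), (2, 2005, 1, 0),  # PHP = 0: history should win
]

TEMPLATE = """
SELECT CaseId, Year, Projection, PHP
FROM (SELECT CaseId, Year, Projection, PHP,
             row_number() OVER (PARTITION BY CaseId, Year
                                ORDER BY {order_by}) AS seqnum
      FROM t) AS ranked
WHERE seqnum = 1
ORDER BY CaseId, Year;
"""


def first_rows(order_by):
    """Return the first-ranked row per (CaseId, Year) for an ORDER BY expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (CaseId INT, Year INT, Projection INT, PHP INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    out = conn.execute(TEMPLATE.format(order_by=order_by)).fetchall()
    conn.close()
    return out


by_sum = first_rows("Projection + PHP DESC")
by_match = first_rows("CASE WHEN Projection = PHP THEN 0 ELSE 1 END")
print("sum ordering:  ", by_sum)
print("match ordering:", by_match)
```

On this data the two orderings agree when PHP is 1, but when PHP is 0 the sum ordering ranks the projection row (sum 1) ahead of the history row (sum 0), while the match ordering keeps the history row.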

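    【Discussion】:

    • PHP is the precedence I should use. It is either 1 or 0. If it is 1, I should use the projection row; otherwise I should use the history row.
    • @RolandP . . . As far as I can tell, this is the same logic. If PHP is 1, the row that contains the projection has a sum of 2, so it comes first. If the values are (1, 0) and (0, 1), an arbitrary row is chosen, because the question does not specify which one. In all cases, only one row is selected.
    • I solved this by combining your code with GarethD's. I changed the ordering from Projection + PHP to CASE WHEN Projection = PHP THEN 0 ELSE 1 END, and it works. Thanks for the example.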