【Question Title】: SQL Server: Only last entry in GROUP BY
【Posted】: 2010-10-01 03:16:32
【Question】:

I have the following table in SQL Server 2005:

id | business_key | result
1 | 1 | 0
2 | 1 | 1
3 | 2 | 1
4 | 3 | 1
5 | 4 | 1
6 | 4 | 0

Now I want to group by business_key and, for each group, return the complete row with the highest id. So my expected result is:

business_key | result
1 | 1
2 | 1
3 | 1
4 | 0

I bet there is a way to achieve this, I just can't see it at the moment.

【Comments】:

    Tags: sql sql-server group-by


    【Solution 1】:

    Another solution, which may give you better performance (test both ways and check the execution plans):

    SELECT
         T1.id,
         T1.business_key,
         T1.result
    FROM
         dbo.My_Table T1
    LEFT OUTER JOIN dbo.My_Table T2 ON
         T2.business_key = T1.business_key AND
         T2.id > T1.id
    WHERE
         T2.id IS NULL
    

    This query assumes that id is unique (at least within any given business_key) and that it is declared NOT NULL.
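
The same anti-join idea can also be written with NOT EXISTS. The sketch below is not part of the original answer; it reuses the dbo.My_Table name from the query above and relies on the same assumption that id is unique and NOT NULL within a business_key. Which form is faster depends on the data and indexes, so compare the execution plans.

```sql
-- Sketch only: keep a row when no other row with the same
-- business_key has a larger id (same assumption as above: id is
-- unique and NOT NULL within each business_key).
SELECT
     T1.id,
     T1.business_key,
     T1.result
FROM
     dbo.My_Table T1
WHERE NOT EXISTS (
     SELECT 1
     FROM dbo.My_Table T2
     WHERE T2.business_key = T1.business_key
       AND T2.id > T1.id
);
```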

    【Comments】:

    【Solution 2】:
    select
      drv.business_key,
      mytable.result
    from mytable
      inner join
      (
        select 
          business_key, 
          max(id) as max_id
        from mytable
        group by
          business_key
      ) as drv on
        mytable.id = drv.max_id
    

    【Comments】:

      【Solution 3】:

      Try this:

      select  business_key, 
              result
      from    myTable
      where   id in 
              (select max(id)
              from    myTable
              group by business_key)
      

      Edit: I created the table to test my code. I am including it below in case anyone else wants to test it.

      SET ANSI_NULLS ON
      GO
      SET QUOTED_IDENTIFIER ON
      GO
      CREATE TABLE [dbo].[myTable](
          [id] [int] NOT NULL,
          [business_key] [int] NOT NULL,
          [result] [int] NOT NULL
      ) ON [PRIMARY]
      go
      
      insert into myTable values(1,1,0);
      insert into myTable values(2,1,1);
      insert into myTable values(3,2,1);
      insert into myTable values(4,3,1);
      insert into myTable values(5,4,1);
      insert into myTable values(6,4,0);
      
      select  * from mytable
      

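      【Comments】:

      • That query will show only one row of data, not four. You need to make the subquery correlated - you need aliases for the two instances of MyTable (call them "first" and "second"); add WHERE first.id = second.id
      • Jonathan - you are right, there was a typo in the where clause. It should be 'in' rather than '='. The aliases are not required, though. Thanks for pointing out my mistake.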
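
For reference, the correlated form that the first comment describes might look like the sketch below. The aliases t_first and t_second are illustrative stand-ins for the "first" and "second" aliases mentioned in the comment, and correlating the subquery on business_key is an assumption about what the commenter intended.

```sql
-- Sketch only: for each outer row, compare its id with the largest
-- id found among rows sharing the same business_key.
SELECT  t_first.business_key,
        t_first.result
FROM    myTable AS t_first
WHERE   t_first.id =
        (SELECT MAX(t_second.id)
         FROM   myTable AS t_second
         WHERE  t_second.business_key = t_first.business_key);
```

Against the six sample rows inserted above, this returns one row per business_key: (1, 1), (2, 1), (3, 1) and (4, 0), in no guaranteed order.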
      【Solution 4】:
      select business_key, 
             result
          from 
          (select id, 
              business_key, 
              result, 
              max(id) over (partition by business_key) as max_id
          from mytable) x
      where id = max_id
      

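      【Comments】:

      • This should be the accepted answer, because this query is much more efficient than the accepted one.
      • That is not the case. I ran both queries in a single batch on MSSQL 2012 R2, and the generated execution plan showed that the subquery version took 68% of the elapsed time. The partition step alone accounted for 77% of the whole second query.
      【Solution 5】:

      This is an older post, but it is relevant to what I am working on at the moment (2013). With a larger data set (typical of most databases), the performance of the various queries (look at the execution plans) says a lot. First we create a "tally table" to generate numbers, then use an arbitrary formula to create data for "MyTable":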

      CREATE TABLE #myTable(
          [id] [int] NOT NULL,
          [business_key] [int] NOT NULL,
          [result] [int] NOT NULL,
          PRIMARY KEY (Id)
      ) ON [PRIMARY];
      
      ; WITH
          -- Tally table generator; the trailing comment gives each CTE's row count
      t1 AS (SELECT 1 N UNION ALL SELECT 1 N),    -- 2
      t2 AS (SELECT 1 N FROM t1 x, t1 y),         -- 4
      t3 AS (SELECT 1 N FROM t2 x, t2 y),         -- 16
      t4 AS (SELECT 1 N FROM t3 x, t3 y),         -- 256
      t5 AS (SELECT 1 N FROM t4 x, t4 y),         -- 65,536
      Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
                FROM t5 x, t5 y)                  -- up to 4,294,967,296; filtered below
      
      INSERT INTO #MyTable 
      SELECT N, CAST(N/RAND(N/8) AS bigINT)/5 , N%2
      FROM Tally
      WHERE N < 500000
      

      Next we run three different kinds of query to compare performance (if you are using SQL Server Management Studio, turn on "Include Actual Execution Plan"):

      SET STATISTICS IO ON
      SET STATISTICS TIME ON
      ----- Try #1 
      select  'T1' AS Qry, id, business_key, 
              result
      from    #myTable
      where   id in 
              (select max(id)
              from    #myTable
              group by business_key)
      
      ---- Try #2 
      select 'T2' AS Qry, id, business_key, 
             result
          from 
          (select id, 
              business_key, 
              result, 
              max(id) over (partition by business_key) as max_id
          from #mytable) x
      where id = max_id
      
      ---- Try #3 
      ;with cteRowNumber as (
          select id, 
              business_key, 
              result,
                 row_number() over(partition by business_key order by id desc) as RowNum
              from #mytable
      )
      
      SELECT 'T3' AS Qry, id, business_key, 
             result
      FROM cteRowNumber
      WHERE RowNum = 1
      

      Cleanup:

      IF OBJECT_ID(N'TempDB..#myTable',N'U') IS NOT NULL 
          DROP TABLE #myTable;
      SET STATISTICS IO OFF
      SET STATISTICS TIME OFF
      

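      Looking at the execution plans, you will find that "Try 1" has the best "query cost" and the lowest CPU time, but "Try 3" has the fewest reads and its CPU time is not bad either. I would recommend the CTE approach to keep the number of reads down.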
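
Applied to the original question, the CTE approach might look like the sketch below. It assumes the myTable definition and sample rows from Solution 3; the ORDER BY on the outer query is an addition for readable output and is not part of the benchmarked queries.

```sql
-- Sketch only: number the rows of each business_key from the highest
-- id downward, then keep the first row of every group.
;WITH cteRowNumber AS (
    SELECT business_key,
           result,
           ROW_NUMBER() OVER (PARTITION BY business_key
                              ORDER BY id DESC) AS RowNum
    FROM   myTable
)
SELECT business_key,
       result
FROM   cteRowNumber
WHERE  RowNum = 1
ORDER BY business_key;
```

With the six sample rows this yields (1, 1), (2, 1), (3, 1) and (4, 0), which matches the expected result in the question.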

      【Comments】:
