【Question title】: Remove duplicates from SQL query result in SQLite
【Posted】: 2020-12-25 17:25:02
【Question】:

I wrote an SQL query and got the result below, but I want to remove the duplicate year ranges from the result, based on the min(sale) of each product.

    SELECT petroleum_product AS Product,
           CAST(year / 5 * 5 AS CHAR) || '-' ||
               CAST(year / 5 * 5 + 4 AS CHAR) AS Year,
           MIN(sale) AS MIN,
           MAX(sale) AS Max,
           AVG(sale) AS AVG
    FROM REPORT
    GROUP BY Product, Year
    ORDER BY 2;

This query gives me the following result:

('Aviation Turbine Fuel', '2000-2004', 63131, 63131, 63131.0)  
('Aviation Turbine Fuel', '2000-2004', 47453, 47453, 47453.0)
('Aviation Turbine Fuel', '2000-2004', 52839, 52839, 52839.0)
('Aviation Turbine Fuel', '2000-2004', 64041, 64041, 64041.0)
('Aviation Turbine Fuel', '2000-2004', 66825, 66825, 66825.0)
('Diesel', '2000-2004', 326060, 326060, 326060.0)
('Diesel', '2000-2004', 286233, 286233, 286233.0)
('Diesel', '2000-2004', 299973, 299973, 299973.0)
('Diesel', '2000-2004', 299730, 299730, 299730.0)
('Diesel', '2000-2004', 315368, 315368, 315368.0)
('Aviation Turbine Fuel', '2010-2014', 101314, 101314, 101314.0)
('Aviation Turbine Fuel', '2010-2014', 109808, 109808, 109808.0)
('Aviation Turbine Fuel', '2010-2014', 115786, 115786, 115786.0)
('Aviation Turbine Fuel', '2010-2014', 123527, 123527, 123527.0)
('Aviation Turbine Fuel', '2010-2014', 139404, 139404, 139404.0)
('Diesel', '2010-2014', 655128, 655128, 655128.0)
('Diesel', '2010-2014', 648513, 648513, 648513.0)
('Diesel', '2010-2014', 716747, 716747, 716747.0)
('Diesel', '2010-2014', 811100, 811100, 811100.0)
('Diesel', '2010-2014', 901393, 901393, 901393.0)
('Aviation Turbine Fuel', '2005-2009', 64335, 64335, 64335.0)
('Aviation Turbine Fuel', '2005-2009', 63778, 63778, 63778.0)
('Aviation Turbine Fuel', '2005-2009', 68938, 68938, 68938.0)
('Aviation Turbine Fuel', '2005-2009', 68935, 68935, 68935.0)
('Aviation Turbine Fuel', '2005-2009', 82631, 82631, 82631.0)
('Diesel', '2005-2009', 294329, 294329, 294329.0)
('Diesel', '2005-2009', 306687, 306687, 306687.0)
('Diesel', '2005-2009', 302706, 302706, 302706.0)
('Diesel', '2005-2009', 446468, 446468, 446468.0)
('Diesel', '2005-2009', 612505, 612505, 612505.0)

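With my query, every product gets 5 rows in each year range. I want the final result to have a single row per product and year range, where min(sale) is the smallest of those 5 values, max(sale) is the largest of those 5, and avg(sale) is the average of those 5. The same applies to the products in the other year ranges.

The result of the query should look like this: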

('Aviation Turbine Fuel', '2000-2004', 47453, 66825, 58857.8)
('Diesel', '2000-2004', 286233, 326060, 305472.8)
('Aviation Turbine Fuel', '2005-2009', 63778, 82631, 69723.4)
('Diesel', '2005-2009', 294329, 612505, 392539)
('Aviation Turbine Fuel', '2010-2014', 101314, 139404, 117967.8)
('Diesel', '2010-2014', 648513, 901393, 746576.2)

【Comments】:

    Tags: sql sqlite subquery greatest-n-per-group window-functions


    【Solution 1】:

    You can do this with two levels of aggregation:

    SELECT Product, 
           Year,
           MIN(min_sale) AS MIN,
           MAX(max_sale) AS MAX,
           AVG(avg_sale) AS AVG
    FROM (
      SELECT petroleum_product AS Product,
             (year / 5 * 5) || '-' || (year / 5 * 5 + 4) AS Year,       
             MIN(sale) AS min_sale,      
             MAX(sale) AS max_sale,
             AVG(sale) AS avg_sale
      FROM REPORT
      GROUP BY Product, Year
    )
    GROUP BY Product, Year
    

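    There is no need to cast the integer values to CHAR before concatenating them: the || operator converts its operands to text.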
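
The two-level aggregation can be tried end to end with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the question does not say which year each sale belongs to, so the `diesel_sales` list and its mapping to consecutive years from 2000 are hypothetical, and only the Diesel figures for 2000-2009 are loaded.

```python
import sqlite3

# Hypothetical sample data: the question does not give the year of each sale,
# so consecutive years starting at 2000 are assumed here.
diesel_sales = [326060, 286233, 299973, 299730, 315368,
                294329, 306687, 302706, 446468, 612505]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REPORT (petroleum_product TEXT, year INTEGER, sale INTEGER)")
conn.executemany(
    "INSERT INTO REPORT VALUES ('Diesel', ?, ?)",
    [(2000 + i, sale) for i, sale in enumerate(diesel_sales)],
)

query = """
SELECT Product, Year, MIN(min_sale), MAX(max_sale), AVG(avg_sale)
FROM (
  SELECT petroleum_product AS Product,
         (year / 5 * 5) || '-' || (year / 5 * 5 + 4) AS Year,  -- no CAST needed
         MIN(sale) AS min_sale,
         MAX(sale) AS max_sale,
         AVG(sale) AS avg_sale
  FROM REPORT
  GROUP BY Product, Year
)
GROUP BY Product, Year
ORDER BY Year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One caveat on the design: `AVG(avg_sale)` is an average of averages. It matches the overall average only when every inner group holds the same number of rows, which is the case in the question's data (each inner row has min = max = avg, so one sale per group).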

    【Comments】:

      【Solution 2】:

      If I understand correctly, you can use window functions:

      select t.*
      from (
          select 
              petroleum_product as product,
              cast(year / 5 * 5 as char) || '-' || cast(year / 5 * 5 + 4 as char) as year,                       
              min(sale) as min_sale,
              max(sale) as max_sale,
              avg(sale) as avg_sale,
              row_number() over(partition by product order by min(sale)) rn
          from report
          group by product, year
      ) t
      where rn = 1
      order by 2;
      

      Starting from your existing query, this returns, for each product, the row with the smallest min(sale).
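
To see what "one row per product" means in practice, here is a small sketch of the `row_number()` idea. It is a variant, not the answer's exact query: the ranges are aggregated in a derived table first and numbered in an outer query, and the aliases `year_range` and `g` are my own. The sample data and the assignment of sales to consecutive years from 2000 are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: years are assumed to run consecutively from 2000.
sales = {
    "Aviation Turbine Fuel": [63131, 47453, 52839, 64041, 66825,
                              64335, 63778, 68938, 68935, 82631],
    "Diesel": [326060, 286233, 299973, 299730, 315368,
               294329, 306687, 302706, 446468, 612505],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (petroleum_product TEXT, year INTEGER, sale INTEGER)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?)",
    [(p, 2000 + i, s) for p, values in sales.items() for i, s in enumerate(values)],
)

query = """
SELECT product, year_range, min_sale, max_sale
FROM (
  SELECT g.*,
         ROW_NUMBER() OVER (PARTITION BY product ORDER BY min_sale) AS rn
  FROM (
    -- one row per product and 5-year range
    SELECT petroleum_product AS product,
           (year / 5 * 5) || '-' || (year / 5 * 5 + 4) AS year_range,
           MIN(sale) AS min_sale,
           MAX(sale) AS max_sale
    FROM report
    GROUP BY petroleum_product, year / 5 * 5
  ) AS g
)
WHERE rn = 1   -- keeps only the range with the smallest minimum per product
ORDER BY product
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the partition is by product alone, the filter `rn = 1` leaves a single year range for each product, so the other ranges are dropped from the result.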

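      【Comments】:

      • This code returns only one year range per product, even though the product has other year ranges. Diesel, for example, has '2000-2004', '2005-2009', '2010-2014', and so on, and the same goes for the other products. What I want is for Diesel, Petrol, ... to have a single row for each year range, with the values based on the minimum.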
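
The five rows per range in the question are consistent with SQLite resolving `Year` in `GROUP BY` to the table column `year` rather than to the result alias of the same name. The sketch below compares the original grouping with a single-level query that groups by the range expression itself. The alias `year_range`, the sample rows, and their assignment to consecutive years from 2000 are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: years are assumed to run consecutively from 2000.
sales = {
    "Aviation Turbine Fuel": [63131, 47453, 52839, 64041, 66825,
                              64335, 63778, 68938, 68935, 82631],
    "Diesel": [326060, 286233, 299973, 299730, 315368,
               294329, 306687, 302706, 446468, 612505],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REPORT (petroleum_product TEXT, year INTEGER, sale INTEGER)")
conn.executemany(
    "INSERT INTO REPORT VALUES (?, ?, ?)",
    [(p, 2000 + i, s) for p, values in sales.items() for i, s in enumerate(values)],
)

# Grouping as in the question: the alias Year shares its name with the column year.
collided = """
SELECT petroleum_product AS Product,
       (year / 5 * 5) || '-' || (year / 5 * 5 + 4) AS Year,
       MIN(sale), MAX(sale), AVG(sale)
FROM REPORT
GROUP BY Product, Year
"""

# Grouping by the range expression, with an alias that matches no column.
fixed = """
SELECT petroleum_product AS product,
       (year / 5 * 5) || '-' || (year / 5 * 5 + 4) AS year_range,
       MIN(sale), MAX(sale), AVG(sale)
FROM REPORT
GROUP BY petroleum_product, year / 5 * 5
ORDER BY year_range, product
"""

collided_rows = conn.execute(collided).fetchall()
fixed_rows = conn.execute(fixed).fetchall()
print(len(collided_rows), "rows with the colliding alias")
print(len(fixed_rows), "rows when grouping by the range expression")
for row in fixed_rows:
    print(row)
```

Grouping by the expression avoids the second aggregation level, and `AVG(sale)` is then computed directly over the underlying rows instead of over per-year averages.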