Selecting the highest value from a GROUP BY query

【Question title】: Selecting the highest value from a GROUP BY query
【Posted】: 2012-06-21 15:27:16
【Question】:

I have a query:

select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total from
dq.data 
group by points,location order by location,total desc

which produces this data:

FRANCE  |0|2|0|0|0|0|1  110.0    
FRANCE  |0|2|1|0|1|2|1  100.0    
FRANCE  |0|2|0|0|0|1|1  100.0    
FRANCE  |0|2|1|0|0|1|1  100.0    
FRANCE  |0|2|0|1|1|2|1  100.0    
FRANCE  |0|2|0|0|1|1|1  100.0
GERMANY |1|0|2|2|2|1|0  120.0    
GERMANY |1|0|2|2|2|0|0  110.0    
GERMANY |1|0|2|2|2|2|0  110.0    
GERMANY |1|0|2|2|2|0|2  110.0    
GERMANY |1|0|2|2|2|1|1  110.0

I want to get the highest total, and the points associated with it, for each location.

I should end up with:

FRANCE  |0|2|0|0|0|0|1  110.0
GERMANY |1|0|2|2|2|1|0  120.0

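I believe I need a subquery with MAX(total), but I can't get it to work. In the subquery I want to select points without grouping by it, which apparently isn't allowed.

How can I do this?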

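【Question comments】:

Tags: sql google-bigquery

【Solution 1】:

Your intuition is correct. You can do this by computing the maximum total per location and then joining it back to the aggregated data: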

select t.*
from (select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
      from dq.data 
      group by points,location
     ) t join
     (select location, max(total) as maxtotal
      from (select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
            from dq.data 
            group by points,location
           ) t
      group by location
     ) tsum
     on t.location = tsum.location and t.total = tsum.maxtotal

Note that if several rows tie for the top total in a location, this version returns all of them.
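
To make the join-back and the tie behaviour concrete, here is a minimal sketch that runs the same pattern against SQLite through Python's sqlite3 module. SQLite is only a stand-in here and BigQuery may behave differently. The table name `agg` is an assumption: it holds rows that are already aggregated, taken from the sample output in the question. The ITALY rows are hypothetical and exist only to show a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table: one row per (location, points) with its precomputed total.
conn.execute("CREATE TABLE agg (location TEXT, points TEXT, total REAL)")
conn.executemany(
    "INSERT INTO agg VALUES (?, ?, ?)",
    [
        ("FRANCE", "|0|2|0|0|0|0|1", 110.0),
        ("FRANCE", "|0|2|1|0|1|2|1", 100.0),
        ("GERMANY", "|1|0|2|2|2|1|0", 120.0),
        ("GERMANY", "|1|0|2|2|2|0|0", 110.0),
        ("ITALY", "|0|0|0|0|0|0|1", 90.0),  # hypothetical tie
        ("ITALY", "|0|0|0|0|0|0|2", 90.0),  # hypothetical tie
    ],
)

# Compute the maximum total per location, then join it back to pick the
# rows that carry that maximum.
JOIN_BACK = """
SELECT t.location, t.points, t.total
FROM agg AS t
JOIN (SELECT location, MAX(total) AS maxtotal
      FROM agg
      GROUP BY location) AS tsum
  ON t.location = tsum.location AND t.total = tsum.maxtotal
ORDER BY t.location, t.points
"""

rows = conn.execute(JOIN_BACK).fetchall()
for row in rows:
    print(row)
```

FRANCE and GERMANY each come back once, while both ITALY rows are returned because they share the maximum.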

I'm not very familiar with google-bigquery. If it supports the "with" statement, you can simplify the query as follows:

with t as (select substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
           from dq.data 
           group by points,location
          )
select t.*
from t join
     (select location, max(total) as maxtotal
      from t
      group by location
     ) tsum
     on t.location = tsum.location and t.total = tsum.maxtotal
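
As a rough check of the "with" form, the sketch below runs it end to end on SQLite via Python's sqlite3 module, which is a stand-in rather than BigQuery. The raw rows, the `match-` name prefix, and the single-letter points values are all hypothetical. CASE WHEN replaces if(), since if() is not assumed to be available in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, points TEXT, p1 INTEGER, r1 INTEGER)")
# Hypothetical raw rows: the location starts at character 7 of name.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("match-FRANCE", "A", 1, 1),   # p1 = r1, counts +10
        ("match-FRANCE", "A", 2, 2),   # +10
        ("match-FRANCE", "B", 1, 1),   # +10
        ("match-FRANCE", "B", 1, 2),   # p1 <> r1, counts -10
        ("match-GERMANY", "C", 0, 0),  # +10
        ("match-GERMANY", "D", 0, 1),  # -10
    ],
)

# The aggregation is written once in the CTE and referenced twice.
WITH_QUERY = """
WITH t AS (
    SELECT substr(name, 7, 50) AS location, points,
           SUM(CASE WHEN p1 = r1 THEN 10 ELSE -10 END) AS total
    FROM data
    GROUP BY points, location
)
SELECT t.location, t.points, t.total
FROM t
JOIN (SELECT location, MAX(total) AS maxtotal
      FROM t
      GROUP BY location) AS tsum
  ON t.location = tsum.location AND t.total = tsum.maxtotal
ORDER BY t.location
"""

rows = conn.execute(WITH_QUERY).fetchall()
for row in rows:
    print(row)
```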

If it supports window functions (such as row_number()), you can eliminate the explicit join altogether.
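
For engines that do have window functions, the idea might look like the sketch below. It again uses SQLite through Python's sqlite3 module and assumes SQLite 3.25 or newer, where window functions are available. The table `agg` and the ITALY tie rows are hypothetical. Unlike the join version, row_number() keeps exactly one row per location, so the ORDER BY inside OVER needs a tie-breaker (here points) to make the choice deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table: one row per (location, points) with its precomputed total.
conn.execute("CREATE TABLE agg (location TEXT, points TEXT, total REAL)")
conn.executemany(
    "INSERT INTO agg VALUES (?, ?, ?)",
    [
        ("FRANCE", "|0|2|0|0|0|0|1", 110.0),
        ("FRANCE", "|0|2|1|0|1|2|1", 100.0),
        ("GERMANY", "|1|0|2|2|2|1|0", 120.0),
        ("GERMANY", "|1|0|2|2|2|0|0", 110.0),
        ("ITALY", "|0|0|0|0|0|0|1", 90.0),  # hypothetical tie
        ("ITALY", "|0|0|0|0|0|0|2", 90.0),  # hypothetical tie
    ],
)

# Rank rows within each location by total, highest first, and keep rank 1.
WINDOWED = """
SELECT location, points, total
FROM (SELECT location, points, total,
             ROW_NUMBER() OVER (PARTITION BY location
                                ORDER BY total DESC, points) AS rn
      FROM agg) AS ranked
WHERE rn = 1
ORDER BY location
"""

top_rows = conn.execute(WINDOWED).fetchall()
for row in top_rows:
    print(row)
```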

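【Comments】:

Thanks Gordon, I'll give it a try. It doesn't support row_number(), and it doesn't let you select * either (or t.*, I think). I assume I can hard-code the field names?
Hard-coding the fields is the right way to go. I only used "*" in the answer because it's faster to type. In general, though, you want to name the fields explicitly.
That's fine - I'll give it a try and report back, and if all goes well I'll accept. Thanks again.
I get this message, unfortunately without a line number: BAD_QUERY (invalid mix of aggregate and non-aggregate fields in the SELECT clause)
Oops, I left out the group by in the subquery. Add "group by location".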

【Solution 2】:

I recently ran into a similar problem and solved it along these lines:

SELECT substr(name,7,50) as location, points,sum(if (p1=r1,10,-10))as total
FROM ( 
   SELECT * FROM dq.data ORDER BY location,sum(if (p1=r1,10,-10)) desc 
) tmp
GROUP BY points,location;

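I'm not sure it will work as-is, since my database is MySQL, but it's a nice, intuitive solution. Order the subquery the way you want the rows to come out of the aggregation.

【Comments】:

Standard SQL does not support "order by" in subqueries, so this will not work in most databases.
I get: BAD_QUERY (location appears in ORDER BY but is not a named column in SELECT), and then, when declaring it explicitly: BAD_QUERY (expression SUM(IF([p1] = [r1] , 10, - 10), DESC) is invalid in ORDER BY)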
SELECT substr(name,7,50) as location, points,sum(if (p1=r1,10,-10)) as total FROM ( SELECT substr(name,7,50) as location, points,sum(if (p1=r1,10,-10)) as total FROM dq.data ORDER BY location,total desc ) tmp GROUP BY points,location;
I can't find any definitive information on whether google bigquery supports order by in subqueries?