【Question title】: Returning column duplicates in PostgreSQL
【Posted】: 2014-03-12 02:03:07
【Question】:

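I'm learning PostgreSQL and am struggling with this problem.

I have set up an olympics table that I query and return results from. I'm querying for countries and the number of gold medals they won, like so: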

SELECT country, golds 
FROM (SELECT distinct country, sum(gold_medals) as golds
      FROM olympics where year >= 2000 group by country
     ) foo
WHERE (golds < 10) 
ORDER BY golds desc limit 10;

This correctly returns:

   country   | golds 
-------------+-------
 Turkey      |     9
 Bulgaria    |     8
 Azerbaijan  |     6
 Estonia     |     6
 Georgia     |     6
 North Korea |     6
 Thailand    |     6
 Nigeria     |     6
 Uzbekistan  |     5
 Lithuania   |     5

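I need to return only the countries that won the same number of gold medals in this period (i.e. Lithuania and Uzbekistan with 5 golds each, and all the countries with 6 golds).

How would I go about this?

【Comments】:

Tags: sql postgresql aggregate common-table-expression exists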


【Solution 1】:

@ErwinBrandstetter's solution works, but for completeness I'll also add the array_agg version, which collects the countries into a single cell as an array of strings:

WITH golds as (
    select
        sum(gold_medals) golds,
        country
    from olympics
    where year >= 2000
    group by country
    )

select 
    golds, 
    array_length(array_agg(country),1) n_countries, 
    array_agg(country) countries
from golds
group by golds
having array_length(array_agg(country),1) > 1
order by golds asc

-- golds , n_countries , countries
--   5   ,     2       , '{lithuania,uzbekistan}'
--   6   ,     6       , '{thailand,"north korea",azerbaijan,nigeria,estonia,georgia}'

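【Comments】:

  • Nice addition. I have some suggestions that don't fit in a comment. I added them to my answer, you may be interested.
【Solution 2】:

Run another aggregate, this time GROUP BY the number of golds. Then JOIN: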

WITH cte AS (
   SELECT country, sum(gold_medals) AS golds
   FROM   olympics
   WHERE  year >= 2000
   GROUP  BY country
   HAVING sum(gold_medals) < 10   -- ? No more allowed?
   )
SELECT c.*
FROM   cte c
JOIN  (
    SELECT golds
    FROM   cte
    GROUP  BY golds
    HAVING count(*) > 1
    ) ties USING (golds)
ORDER  BY golds DESC, country
LIMIT  10;
  • Removed the useless DISTINCT from the original query. GROUP BY already does the job.

  • I'm using a CTE to simplify the job.
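
To try the queries in this thread, a small test table can be set up roughly as follows. Only the table name and the column names country, year and gold_medals come from the question. The column types and every row are invented for illustration.

```sql
-- Hypothetical test data: column types and all rows are assumptions.
CREATE TEMP TABLE olympics (
   country     text
 , year        int
 , gold_medals int
);

INSERT INTO olympics (country, year, gold_medals) VALUES
   ('Lithuania' , 2000, 2), ('Lithuania' , 2004, 3)  -- 5 since 2000
 , ('Uzbekistan', 2004, 2), ('Uzbekistan', 2008, 3)  -- 5 since 2000, ties with Lithuania
 , ('Turkey'    , 2000, 3), ('Turkey'    , 2004, 6)  -- 9 since 2000, no tie
 , ('Bulgaria'  , 1996, 3), ('Bulgaria'  , 2000, 8); -- 8 since 2000, the 1996 row is filtered out

-- Per-country totals, to check the data before looking for ties.
SELECT country, sum(gold_medals) AS golds
FROM   olympics
WHERE  year >= 2000
GROUP  BY country
ORDER  BY golds DESC, country;
```

With these rows, a query that detects ties should return only Lithuania and Uzbekistan, with 5 golds each.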

Alternative with an EXISTS semi-join:

WITH cte AS (
   SELECT country, sum(gold_medals) AS golds
   FROM   olympics
   WHERE  year >= 2000
   GROUP  BY country
   HAVING sum(gold_medals) < 10   -- ? No more allowed?
   )
SELECT c.*
FROM   cte c
WHERE  EXISTS (
   SELECT 1
   FROM   cte
   WHERE  country <> c.country -- exclude "self"
   AND    golds  = c.golds
   )
ORDER  BY golds DESC, country
LIMIT  10;

Or, to list the countries with fewer than 10 gold medals in a single row per number of golds, but only where 2 or more countries share that number:

SELECT golds, array_agg(country) AS country_list
FROM  (
   SELECT country, sum(gold_medals) AS golds
   FROM   olympics
   WHERE  year >= 2000
   GROUP  BY country
   HAVING sum(gold_medals) < 10   -- ? No more allowed?
   ) sub
GROUP   BY golds
HAVING  count(*) > 1
ORDER   BY 1  DESC;
-- LIMIT   10;    -- not needed, there cannot be more than 10 in this case.

The last one is basically a simplified version of @Nisan.H's answer.
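
One detail the array_agg query above leaves open is the order of the countries inside country_list, which is not guaranteed. If a stable order matters, array_agg accepts an ORDER BY inside the call. A minimal, self-contained sketch follows. The inline VALUES list and its totals are invented for illustration and stand in for the aggregated olympics data.

```sql
-- Hypothetical per-country totals; these rows are assumptions, not real data.
WITH totals (country, golds) AS (
   VALUES ('Uzbekistan', 5), ('Lithuania', 5)
        , ('Georgia'   , 6), ('Estonia'  , 6)
        , ('Turkey'    , 9)
   )
SELECT golds, array_agg(country ORDER BY country) AS country_list
FROM   totals
GROUP  BY golds
HAVING count(*) > 1            -- keep only gold counts shared by 2+ countries
ORDER  BY golds DESC;
-- golds | country_list
--     6 | {Estonia,Georgia}
--     5 | {Lithuania,Uzbekistan}
```

Turkey drops out because no other country in the sample shares its total.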

【Comments】:

    【Solution 3】:

    Try grouping by country and golds:

    SELECT country, golds 
    FROM (SELECT distinct country, sum(gold_medals) as golds FROM 
    olympics where year >= 2000 group by country, golds) foo
    WHERE (golds < 10) ORDER BY golds desc limit 10;
    

    【Comments】:

    • This has nothing to do with detecting duplicates.