【Question title】:Getting single row from JOIN given an additional condition
【Posted】:2020-08-09 09:21:40
【Question】:

I'm selecting on a year (hard-coded as 1981 below), and I want one row per qualifying band. The main problem is finding the oldest living member of each band:

SELECT b.id_band,
    COUNT(DISTINCT a.id_album),
    COUNT(DISTINCT s.id_song),
    COUNT(DISTINCT m.id_musician),
    (SELECT name FROM MUSICIAN WHERE year_death IS NULL ORDER BY(birth)LIMIT 1)
FROM BAND b
    LEFT JOIN ALBUM a ON(b.id_band  = a.id_band)
    LEFT JOIN SONG  s ON(a.id_album = s.id_album)
    JOIN MEMBER m ON(b.id_band= m.id_band)
    JOIN MUSICIAN mu ON(m.id_musician = mu.id_musician)

  /*LEFT JOIN(SELECT name FROM MUSICIAN WHERE year_death IS NULL
              ORDER BY(birth) LIMIT 1) AS alive FROM mu*/ -- ??

WHERE b.year_formed = 1981
GROUP BY b.id_band;

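I want to get the oldest living member from mu for each band. Instead, I just get the oldest living musician overall from the relation MUSICIAN, repeated for every band.

Here is a screenshot showing the output of my current query: (image not preserved)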

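【Discussion】:

  • If you want the oldest member of each band, you can use a window function.
  • Does your design allow the same song (id_song) to appear multiple times on one album, or on multiple albums of the same band? If not, this can be faster ... Also, are there albums without songs, bands without albums, or bands without members? And please (always) declare your Postgres version.

Tags: sql postgresql left-join distinct greatest-n-per-group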


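【Solution 1】:

Well, I think you can keep the structure you have, but you need JOINs in the subquery, and the subquery has to be correlated with the outer band (mem.id_band = b.id_band).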

SELECT b.id_band,
       COUNT(DISTINCT a.id_album),
       COUNT(DISTINCT s.id_song),
       COUNT(DISTINCT mem.id_musician),
       (SELECT m.name
        FROM MUSICIAN m JOIN
             MEMBER mem
             ON mem.id_musician = m.id_musician
        WHERE m.year_death IS NULL AND mem.id_band = b.id_band
        ORDER BY m.birth
        LIMIT 1
       ) as oldest_member
FROM BAND b LEFT JOIN
     ALBUM a
     ON b.id_band  = a.id_band LEFT JOIN
     SONG s
     ON a.id_album = s.id_album LEFT JOIN
     MEMBER mem
     ON mem.id_band = b.id_band
WHERE b.year_formed = 1981       
GROUP BY b.id_band

【Discussion】:

    【Solution 2】:

    The query below gives you the oldest living member of each band. If needed, you can filter by year_formed = 1981.

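    SELECT
        c.id_band,
        c.total_albums,
        c.total_songs,
        c.total_musicians,
        r.name AS oldest_member
    FROM
    (
        -- per-band counts; add WHERE b.year_formed = 1981 here if needed
        SELECT b.id_band,
            COUNT(DISTINCT a.id_album) AS total_albums,
            COUNT(DISTINCT s.id_song) AS total_songs,
            COUNT(DISTINCT m.id_musician) AS total_musicians
        FROM BAND b
            LEFT JOIN ALBUM a ON (b.id_band = a.id_band)
            LEFT JOIN SONG s ON (a.id_album = s.id_album)
            JOIN MEMBER m ON (b.id_band = m.id_band)
        GROUP BY b.id_band
    ) c
    JOIN
    (
        -- rank the living members of each band by birth, oldest first
        SELECT m.id_band, mu.name,
            dense_rank() OVER (PARTITION BY m.id_band ORDER BY mu.birth) AS rnk
        FROM MEMBER m
            JOIN MUSICIAN mu ON (m.id_musician = mu.id_musician)
        WHERE mu.year_death IS NULL
    ) r ON r.id_band = c.id_band
    WHERE r.rnk = 1;  -- dense_rank() keeps ties: equally old members each yield a row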
    

    【Discussion】:

      【Solution 3】:

      You can reference a table from outside the nested SELECT (a correlated subquery), like this:

      SELECT b.id_band,
      COUNT(DISTINCT a.id_album),
      COUNT(DISTINCT s.id_song),
      COUNT(DISTINCT m.id_musician),
      (SELECT mu2.name
       FROM MUSICIAN mu2
       JOIN MEMBER m2 ON (m2.id_musician = mu2.id_musician)  -- MUSICIAN has no id_band; MEMBER links musicians to bands
       WHERE mu2.year_death IS NULL
         AND m2.id_band = b.id_band  -- the reference to the outer query's BAND row
       ORDER BY mu2.birth
       LIMIT 1) AS oldest_living_member
      FROM BAND b
      LEFT JOIN ALBUM a ON (b.id_band = a.id_band)
      LEFT JOIN SONG s ON (a.id_album = s.id_album)
      JOIN MEMBER m ON (b.id_band = m.id_band)
      JOIN MUSICIAN mu ON (m.id_musician = mu.id_musician)
      WHERE b.year_formed = 1981
      GROUP BY b.id_band;
      

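      【Discussion】:

      • Great! Could you edit your answer to account for the fact that MUSICIAN does not contain id_band, but the table MEMBER associates each id_musician with an id_band?
      【Solution 4】:

      For queries where you want to find the "max person by age", you can use ROW_NUMBER() partitioned by band: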

      SELECT b.id_band,
          COUNT(DISTINCT a.id_album),
          COUNT(DISTINCT s.id_song),
          COUNT(DISTINCT m.id_musician),
          oldest_living_members.name AS oldest_living_member
      FROM 
          band b
          LEFT JOIN album a ON (b.id_band = a.id_band)
          LEFT JOIN song s ON (a.id_album = s.id_album)
          LEFT JOIN member m ON (b.id_band = m.id_band)  -- needed for the member count above
          LEFT JOIN 
          (
            SELECT
               m.id_band,
               mu.*,
               ROW_NUMBER() OVER (PARTITION BY m.id_band ORDER BY mu.birth ASC) AS rown  -- birth: column name taken from the question
             FROM
               member m
               JOIN musician mu ON (m.id_musician = mu.id_musician)
             WHERE mu.year_death IS NULL
           ) oldest_living_members 
           ON 
               b.id_band = oldest_living_members.id_band AND
               oldest_living_members.rown = 1
      WHERE b.year_formed = 1981
      GROUP BY b.id_band, oldest_living_members.name  -- selected non-aggregated columns must be grouped
      

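      If you run just the subquery, you'll see how it works: musicians are joined to members to get the band ID, which forms the partition. The row number counts up from 1 in order of birth date (I'm not sure what your birth-date column is called; adjust the name as needed), so the oldest person (earliest birth date) gets 1. Every time the band ID changes, numbering restarts at 1 with the oldest person of that band. Then, when we join it in, we pick only the 1s.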
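To see the numbering in isolation, here is a minimal sketch of just the subquery, run against made-up rows. The sample data, and the assumption that birth and year_death hold integer years, are mine; only the table and column names come from the question.

```sql
-- Hypothetical sample data standing in for the real MEMBER and MUSICIAN tables.
WITH member (id_band, id_musician) AS (
    VALUES (1, 10), (1, 11), (1, 12), (2, 20), (2, 21)
),
musician (id_musician, name, birth, year_death) AS (
    VALUES (10, 'Ann',  1950, NULL::int),
           (11, 'Bob',  1945, 1999),      -- deceased, filtered out below
           (12, 'Cleo', 1948, NULL),
           (20, 'Dan',  1960, NULL),
           (21, 'Eve',  1955, NULL)
)
SELECT m.id_band,
       mu.name,
       mu.birth,
       -- numbering restarts at 1 for every id_band, earliest birth first
       ROW_NUMBER() OVER (PARTITION BY m.id_band ORDER BY mu.birth) AS rown
FROM   member m
JOIN   musician mu ON m.id_musician = mu.id_musician
WHERE  mu.year_death IS NULL
ORDER  BY m.id_band, rown;

-- id_band | name | birth | rown
--       1 | Cleo |  1948 |    1
--       1 | Ann  |  1950 |    2
--       2 | Eve  |  1955 |    1
--       2 | Dan  |  1960 |    2
```

Keeping only rown = 1 leaves Cleo for band 1 and Eve for band 2, which is what the join condition in the full query does.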

      【Discussion】:

        【Solution 5】:

        I think this should be faster (while also solving your problem):

        SELECT b.id_band, a.*, m.*
        FROM   band b
        LEFT   JOIN LATERAL (
           SELECT count(*) AS ct_albums, sum(ct_songs) AS ct_songs
           FROM  (
              SELECT id_album, count(s.id_song) AS ct_songs  -- count(*) would report 1 song for an album without songs
              FROM   album a
              LEFT   JOIN song s USING (id_album)
              WHERE  a.id_band = b.id_band
              GROUP  BY 1
              ) ab
           ) a ON true
        LEFT   JOIN LATERAL (
           SELECT count(*) OVER () AS ct_musicians
                , name AS senior_member  -- any other columns you need?
           FROM   member   m
           JOIN   musician mu USING (id_musician)
           WHERE  m.id_band  = b.id_band
           ORDER  BY year_death IS NOT NULL  -- sorts the living first
                   , birth
                   , name  -- as tiebreaker (my optional addition)
           LIMIT  1
           ) m ON true
        WHERE  b.year_formed = 1981;
        

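        The LATERAL subquery m solves the problem of getting the senior band member without adding cost to the base query. It works because the window function count(*) OVER () is computed before ORDER BY and LIMIT are applied. Since bands naturally have only a few members, this should be the fastest possible way.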
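The evaluation order can be checked on its own with a small sketch. The three musicians below are made-up rows for a single band, with birth and year_death assumed to be integer years; the SELECT mirrors the LATERAL subquery m.

```sql
-- Hypothetical rows for one band; Bob is deceased.
WITH musician (name, birth, year_death) AS (
    VALUES ('Ann',  1950, NULL::int),
           ('Bob',  1945, 1999),
           ('Cleo', 1948, NULL)
)
SELECT count(*) OVER () AS ct_musicians   -- computed over all 3 rows, before LIMIT
     , name AS senior_member
FROM   musician
ORDER  BY year_death IS NOT NULL          -- false sorts first, so the living come first
        , birth
        , name
LIMIT  1;

-- ct_musicians | senior_member
--            3 | Cleo
```

The single returned row still carries the count of all members, living or not, so no second pass over the table is needed.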

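        The other optimization, for counting albums and songs, builds on the assumption that the same id_song is never included in multiple albums of the same band. Otherwise, those songs are counted multiple times. (That is easy to solve and unrelated to the task of getting the senior band member.)

        The point is to eliminate the need for DISTINCT at the top level, which only arises because joining several N-side tables multiplies the rows (I like to call that a "proxy cross join"). That can produce a lot of rows in the derived table for no good reason.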
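The row multiplication is easy to reproduce. The following sketch uses made-up rows for a single band (2 albums, 3 songs, 3 members); the data is hypothetical, the join shape follows the question's query.

```sql
-- Hypothetical data for band 1: 2 albums, 3 songs, 3 members.
WITH album  (id_album, id_band)    AS (VALUES (1, 1), (2, 1)),
     song   (id_song, id_album)    AS (VALUES (1, 1), (2, 1), (3, 2)),
     member (id_band, id_musician) AS (VALUES (1, 10), (1, 11), (1, 12))
SELECT count(*)                      AS joined_rows,   -- every song row paired with every member
       count(DISTINCT a.id_album)    AS ct_albums,
       count(DISTINCT s.id_song)     AS ct_songs,
       count(DISTINCT m.id_musician) AS ct_musicians
FROM   album a
LEFT   JOIN song   s ON s.id_album = a.id_album
JOIN   member m ON m.id_band  = a.id_band;

-- joined_rows | ct_albums | ct_songs | ct_musicians
--           9 |         2 |        3 |            3
```

Nine intermediate rows are built and de-duplicated three times to report counts of 2, 3 and 3. Aggregating each N-side in its own subquery avoids building those rows in the first place.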

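        In addition, it is much more convenient to retrieve extra columns (such as more columns for the senior band member) than with some of the other query styles.

        【Discussion】: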
