Left Join without duplicate rows from left table: Answers

【Question Title】: Left Join without duplicate rows from left table
【Posted】: 2014-05-11 06:26:17
【Question】:

Please see the tables and query below:

tbl_Contents

Content_Id  Content_Title    Content_Text
10002   New case Study   New case Study
10003   New case Study   New case Study
10004   New case Study   New case Study
10005   New case Study   New case Study
10006   New case Study   New case Study
10007   New case Study   New case Study
10008   New case Study   New case Study
10009   New case Study   New case Study
10010   SEO News Title   SEO News Text
10011   SEO News Title   SEO News Text
10012   Publish Contents SEO News Text

tbl_Media

Media_Id    Media_Title  Content_Id
1000    New case Study   10012
1001    SEO News Title   10010
1002    SEO News Title   10011
1003    Publish Contents 10012

Query

SELECT
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id
ORDER BY C.Content_DatePublished ASC

Result

10002   New case Study  2014-03-31 13:39:29.280 NULL
10003   New case Study  2014-03-31 14:23:06.727 NULL
10004   New case Study  2014-03-31 14:25:53.143 NULL
10005   New case Study  2014-03-31 14:26:06.993 NULL
10006   New case Study  2014-03-31 14:30:18.153 NULL
10007   New case Study  2014-03-31 14:30:42.513 NULL
10008   New case Study  2014-03-31 14:31:56.830 NULL
10009   New case Study  2014-03-31 14:35:18.040 NULL
10010   SEO News Title  2014-03-31 15:22:15.983 1001
10011   SEO News Title  2014-03-31 15:22:30.333 1002
10012   Publish         2014-03-31 15:25:11.753 1000
10012   Publish         2014-03-31 15:25:11.753 1003

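10012 appears twice!

My query returns duplicate rows from tbl_Contents (the left table in the join).

Some rows in tbl_Contents have more than one associated row in tbl_Media. I need every row from tbl_Contents, including those with no match in tbl_Media (NULL values), but without duplicate records.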

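【Question Discussion】:

This is working as designed: the rows are not duplicates, each one has a different Media_Id. In your example, which row would you keep?
Your tbl_Media has no Content_Id.
I agree there is no problem here. These are two different rows.
Please check the updated/corrected tbl_Media; it now has a Content_Id column, thanks.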
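
As the comments point out, the "duplicates" come from content rows that match more than one media row. A minimal sketch for listing those rows, assuming only the table and column names shown in the question (the Media_Count alias is made up for illustration):

```sql
-- List every Content_Id that has more than one row in tbl_Media.
-- Each of these produces several output rows in a plain LEFT JOIN.
SELECT
    M.Content_Id,
    COUNT(*) AS Media_Count   -- illustrative alias, not from the question
FROM tbl_Media M
GROUP BY M.Content_Id
HAVING COUNT(*) > 1;
```

With the sample data in the question this should return a single row for Content_Id 10012, which has two media rows (1000 and 1003).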

Tags: sql join duplicates

【Solution 1】:

Try OUTER APPLY:

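SELECT 
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
FROM 
    tbl_Contents C
    OUTER APPLY
    (
        -- Without an ORDER BY, TOP 1 returns an arbitrary matching row.
        -- Assumption: keep the highest Media_Id for each content row.
        SELECT TOP 1 *
        FROM tbl_Media M 
        WHERE M.Content_Id = C.Content_Id 
        ORDER BY M.Media_Id DESC
    ) M
ORDER BY 
    C.Content_DatePublished ASC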

Alternatively, you can GROUP BY the results:

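SELECT 
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    -- Aggregate Media_Id so that each content row collapses to one result row;
    -- grouping by M.Media_Id itself would keep both rows for 10012.
    MAX(M.Media_Id) AS Media_Id
FROM 
    tbl_Contents C
    LEFT OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
GROUP BY
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished
ORDER BY
    C.Content_DatePublished ASC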

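OUTER APPLY selects a single matching row (or none) for each row of the left table.

GROUP BY performs the whole join, then collapses the final result rows on the listed columns.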
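
A third variant, not part of the original answer, is to number the matches for each content row and keep only the first. This is a sketch only. It assumes a database with window functions (SQL Server 2005+, PostgreSQL and others) and that the highest Media_Id is the one to keep. The names Ranked and rn are made up for illustration.

```sql
-- Rank the media matches of each content row, then keep rank 1.
-- Content rows without any media still get rn = 1 with a NULL Media_Id,
-- so they survive the filter just as in the LEFT JOIN.
WITH Ranked AS (
    SELECT
        C.Content_ID,
        C.Content_Title,
        C.Content_DatePublished,
        M.Media_Id,
        ROW_NUMBER() OVER (
            PARTITION BY C.Content_ID
            ORDER BY M.Media_Id DESC   -- assumption: keep the highest Media_Id
        ) AS rn
    FROM tbl_Contents C
    LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id
)
SELECT
    Content_ID,
    Content_Title,
    Content_DatePublished,
    Media_Id
FROM Ranked
WHERE rn = 1
ORDER BY Content_DatePublished ASC;
```

Unlike the GROUP BY form, this keeps the whole chosen media row available, so other tbl_Media columns such as Media_Title could be returned from the same row as the Media_Id.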

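【Discussion】:

Using GROUP BY in this situation will cause rows to be double-counted if it is combined with aggregate functions such as SUM or COUNT.
Only if the aggregate function does not properly account for that case...
I think you forgot to add an aggregate function to the selection of M.Media_Id. That column will not collapse by itself. In this form the query should give an error, IMHO.
@vargen_ That is accurate. I edited the answer to reflect the correct aggregation.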

【Solution 2】:

You can do this in generic SQL with GROUP BY:

SELECT C.Content_ID, C.Content_Title, MAX(M.Media_Id) AS Media_Id
FROM tbl_Contents C LEFT JOIN
     tbl_Media M
     ON M.Content_Id = C.Content_Id 
GROUP BY C.Content_ID, C.Content_Title
ORDER BY MAX(C.Content_DatePublished) ASC;

Or use a correlated subquery:

SELECT C.Content_ID, C.Content_Title,
       (SELECT M.Media_Id
        FROM tbl_Media M
        WHERE M.Content_Id = C.Content_Id
        ORDER BY M.Media_Id DESC
        LIMIT 1
       ) AS Media_Id
FROM tbl_Contents C 
ORDER BY C.Content_DatePublished ASC;

Of course, the syntax for limit 1 varies between databases. It could be top, or rownum = 1, or fetch first 1 rows, or something similar.
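
To make those variants concrete, here is a sketch of the same correlated subquery in two other dialects. It reuses only the table and column names from the question. Which form applies depends on your database, so treat both as starting points rather than tested code.

```sql
-- SQL Server style: TOP 1 in place of LIMIT 1.
SELECT C.Content_ID, C.Content_Title,
       (SELECT TOP 1 M.Media_Id
        FROM tbl_Media M
        WHERE M.Content_Id = C.Content_Id
        ORDER BY M.Media_Id DESC
       ) AS Media_Id
FROM tbl_Contents C
ORDER BY C.Content_DatePublished ASC;

-- Standard SQL style (for example PostgreSQL, Oracle 12c+, Db2):
-- FETCH FIRST 1 ROW ONLY in place of LIMIT 1.
SELECT C.Content_ID, C.Content_Title,
       (SELECT M.Media_Id
        FROM tbl_Media M
        WHERE M.Content_Id = C.Content_Id
        ORDER BY M.Media_Id DESC
        FETCH FIRST 1 ROW ONLY
       ) AS Media_Id
FROM tbl_Contents C
ORDER BY C.Content_DatePublished ASC;
```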

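【Discussion】:

The tbl_Media table I see in the OP has no Content_Id column.
Please check the updated/corrected tbl_Media; it now has a Content_Id column, thanks.
Won't the correlated subquery perform badly? I have run into this kind of query on large result sets and it was terrible. An additional query is executed for every matched row.
@mdon88 . . . With an index on tbl_Media(Content_Id, Media_Id), performance should be fine.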
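
The index suggested in the last comment could be created roughly like this. The index name is hypothetical, and the exact options (clustered, included columns and so on) depend on the database and workload.

```sql
-- Composite index with the join key first, so each per-row lookup
-- on Content_Id can find the highest Media_Id from the index alone.
-- IX_tbl_Media_Content_Id_Media_Id is a made-up name for illustration.
CREATE INDEX IX_tbl_Media_Content_Id_Media_Id
    ON tbl_Media (Content_Id, Media_Id);
```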

【Solution 3】:

Using the DISTINCT keyword will remove duplicate rows.

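-- Note: DISTINCT only collapses rows that are identical in every selected column.
SELECT DISTINCT
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,  -- selected because some databases require ORDER BY columns in a DISTINCT select list
    M.Media_Id
FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
ORDER BY C.Content_DatePublished ASC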

【Discussion】:

Adding DISTINCT to a query is very inefficient (in PostgreSQL, with more than 5000 rows, I saw the query time increase tenfold).