【Question title】: Why is SQL doing an inner join where an outer join is needed
【Posted】: 2017-12-25 21:01:49
【Question】:

I have two tables that I want to combine with an SQL outer join and then fetch the result. The exact SQL query (the one with the problem) is:

SELECT
    LEFT(a.cusip, 6) AS cusip6, 
    a.date, a.prc, a.ret, a.vol, a.spread, a.shrout,
    b.epsf12, (b.seqq-b.pstkq) / b.cshoq AS bps
FROM
    crsp.msf a 
FULL JOIN 
    compa.fundq b ON (LEFT(a.cusip, 6) = LEFT(b.cusip, 6) 
                  AND a.date = b.datadate)
WHERE 
    (b.datadate BETWEEN '2010-01-01' and '2015-12-31') 
    AND (a.date BETWEEN '2010-01-01' and '2015-12-31') 
    AND (b.cshoq > 0)

This returns 670'293 rows.

But when I fetch the two datasets separately and outer-join them with R's merge(), I get 1'182'093 rows. The two separate queries I use are:

SELECT  
    LEFT(cusip, 6) AS cusip6, date, prc, ret, vol, spread, shrout 
FROM
    crsp.msf 
WHERE 
    date BETWEEN '2010-01-01' and '2015-12-31'

SELECT 
    LEFT(cusip, 6) AS cusip6, datadate AS date, epsf12, 
    (seqq-pstkq)/cshoq AS bps 
FROM
    compa.fundq 
WHERE 
    datadate BETWEEN '2010-01-01' and '2015-12-31' 
    AND cshoq > 0

I then merge them (outer join) using:

merge(x = data_1, y = data_2, by.x = c("cusip6", "date"), by.y = c("cusip6", "date"), all = TRUE)

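This returns 1'182'093 rows, which is correct. So although I explicitly ask for an outer join, my original (first) SQL query is in fact performing an inner join. The R merge() call below returns 670'293 rows, which confirms that the data fetched via SQL is indeed the result of an inner join.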

merge(x = data_1, y = data_2, by.x = c("cusip6", "date"), by.y = c("cusip6", "date"))

What am I doing wrong in my SQL query?

【Comments】:

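  • Which RDBMS is this for? Please add a tag to specify whether you are using mysql, postgresql, sql-server, oracle or db2, or something else entirely.
  • How should the columns from a be populated when there is no matching row? And how is that going to work with a WHERE clause that has explicit conditions on some columns from a?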

Tags: sql join merge


【Answer 1】:

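Because the WHERE clause is applied after the JOIN. At that point the unmatched rows contain NULL values (the result of the "failed" JOIN), and those rows then fail the WHERE clause, since a comparison against NULL such as BETWEEN or > 0 never evaluates to true.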

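If you need an OUTER JOIN together with filters, put the filters in the JOIN condition or in subqueries. For a FULL JOIN the subquery form is the safer choice: a filter in the ON clause only restricts which rows match, so rows that fail it are still returned, padded with NULLs.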

SELECT
    LEFT(a.cusip, 6) AS cusip6, 
    a.date, a.prc, a.ret, a.vol, a.spread, a.shrout,
    b.epsf12, (b.seqq-b.pstkq) / b.cshoq AS bps
FROM
    (SELECT * FROM crsp.msf WHERE date BETWEEN '2010-01-01' and '2015-12-31') a
FULL JOIN 
    (SELECT * FROM compa.fundq WHERE datadate BETWEEN '2010-01-01' and '2015-12-31' AND cshoq > 0) b
        ON  LEFT(a.cusip, 6) = LEFT(b.cusip, 6) 
        AND a.date = b.datadate
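
The effect described in this answer can be reproduced on a tiny dataset. The following is a minimal sketch, not the asker's setup: it uses Python's built-in sqlite3 module with two made-up tables `a` and `b` (hypothetical stand-ins for crsp.msf and compa.fundq, with invented rows), and a LEFT JOIN instead of a FULL JOIN because older SQLite builds do not support FULL JOIN. The mechanism is the same for any outer join: rows padded with NULL fail a WHERE condition on the padded side.

```python
import sqlite3

# In-memory database with two small, made-up tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (cusip6 TEXT, date TEXT, prc REAL);
    CREATE TABLE b (cusip6 TEXT, datadate TEXT, cshoq REAL);
    INSERT INTO a VALUES ('AAA', '2012-01-31', 10.0),
                         ('BBB', '2012-01-31', 20.0),
                         ('CCC', '2012-01-31', 30.0);
    INSERT INTO b VALUES ('AAA', '2012-01-31', 5.0),
                         ('BBB', '2012-01-31', 0.0),
                         ('DDD', '2012-01-31', 7.0);
""")

join_on = "ON a.cusip6 = b.cusip6 AND a.date = b.datadate"

# Filter in WHERE: applied after the join, so NULL-padded rows are dropped.
rows_where = conn.execute(f"""
    SELECT a.cusip6, b.cshoq
    FROM a LEFT JOIN b {join_on}
    WHERE b.cshoq > 0
    ORDER BY a.cusip6
""").fetchall()

# Filter in a subquery: applied before the join, so unmatched rows of a survive.
rows_sub = conn.execute(f"""
    SELECT a.cusip6, b.cshoq
    FROM a LEFT JOIN (SELECT * FROM b WHERE cshoq > 0) b {join_on}
    ORDER BY a.cusip6
""").fetchall()

# Plain inner join with the same WHERE filter, for comparison.
rows_inner = conn.execute(f"""
    SELECT a.cusip6, b.cshoq
    FROM a INNER JOIN b {join_on}
    WHERE b.cshoq > 0
    ORDER BY a.cusip6
""").fetchall()

print(len(rows_where), len(rows_sub), len(rows_inner))
print(rows_where == rows_inner)
```

With the filter in WHERE, the outer join returns exactly the inner-join rows; with the filter moved into the subquery, every row of `a` is kept and the unmatched ones carry NULL in the `b` columns. Note that in a FULL JOIN the rows that exist only in `b` would have NULL in the `a` key columns, so selecting something like COALESCE(a.date, b.datadate) may be needed to mirror what R's merge() returns.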
