distinct sql rows: answers

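【Question title】: distinct sql rows
【Posted】: 2015-04-11 05:27:01
【Question】:

I am trying to join two tables and get the address only for rows that exist in the first table, but I cannot get the counts to tally. I have the following query, which returns 55,059 records in total: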

SELECT 
      AccountID,
      AccountParameter1 
 FROM AccountBaseExtension

If I use the following SQL, I get duplicate rows, 110,118 in total:

SELECT 
      AccountID,
      AccountParameter1,
      AddressParameter1 
 FROM AccountBaseExtension AS A
INNER JOIN CustomerAddressBase AS B ON a.AccountID = b.ParentID

I tried making it distinct so that I retrieve only the postcode of each customer's address, but the query below returns 56,496 records:

SELECT 
      DISTINCT AccountID,
      AccountParameter1,
      AddressParameter1 
 FROM AccountBaseExtension AS A
INNER JOIN CustomerAddressBase AS B ON a.AccountID = b.ParentID

Can anyone tell me what I am doing wrong here?

【Question comments】:

Sorry, I should have mentioned when posting that this is SQL Server 2008.
Is b.ParentID unique in CustomerAddressBase?

Tags: sql sql-server-2008 join distinct

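【Solution 1】:

The reason is probably a 1-->many relationship between the AccountBaseExtension and CustomerAddressBase tables, i.e. there may be several rows with the same b.ParentID for a single a.AccountID. For example, consider the tables below.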

AccountBaseExtension

*---------------*-------------------*
|AccountID      |AccountParameter1  |               count=1
*---------------*-------------------*
|123            |xyzc               |
*---------------*-------------------*

CustomerAddressBase

*---------------*-------------------*
|ParentID       |AddressParameter1  |                count=3
*---------------*-------------------*
|123            |Addr1              |
*---------------*-------------------*
|123            |Addr2              |
*---------------*-------------------*
|123            |Addr2              |
*---------------*-------------------*

SELECT AccountID, AccountParameter1, AddressParameter1 FROM AccountBaseExtension AS A INNER JOIN CustomerAddressBase AS B ON a.AccountID = b.ParentID will result in

*---------------*-------------------*------------------*
|AccountID      |AccountParameter1  |AddressParameter1 |     count=3
*---------------*-------------------*------------------*
|123            |xyzc               |Addr1             |
*---------------*-------------------*------------------*
|123            |xyzc               |Addr2             | 
*---------------*-------------------*------------------*
|123            |xyzc               |Addr2             | 
*---------------*-------------------*------------------*

whereas DISTINCT outputs only 2 rows, because the two identical Addr2 rows collapse into one.
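
The example above can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database rather than SQL Server 2008 (an assumption made only so the snippet is self-contained), and the tables hold just the sample rows shown above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AccountBaseExtension (AccountID INTEGER, AccountParameter1 TEXT);
    CREATE TABLE CustomerAddressBase (ParentID INTEGER, AddressParameter1 TEXT);
    INSERT INTO AccountBaseExtension VALUES (123, 'xyzc');
    INSERT INTO CustomerAddressBase VALUES (123, 'Addr1'), (123, 'Addr2'), (123, 'Addr2');
""")

# Same join as in the question; {distinct} is filled with '' or 'DISTINCT'.
join_sql = """
    SELECT {distinct} a.AccountID, a.AccountParameter1, b.AddressParameter1
    FROM AccountBaseExtension AS a
    INNER JOIN CustomerAddressBase AS b ON a.AccountID = b.ParentID
"""

plain_rows = conn.execute(join_sql.format(distinct="")).fetchall()
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(len(plain_rows))     # one row per matching address row
print(len(distinct_rows))  # identical (AccountID, AccountParameter1, AddressParameter1) rows collapse
```

With these sample rows the plain join returns 3 rows and the DISTINCT version returns 2, matching the tables above.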

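【Comments】:

This does not explain the difference in result counts between the queries with and without 'Distinct'.
@TavoloPerUno Updated my answer. Try SELECT ParentID, AddressParameter1 FROM CustomerAddressBase (with and without DISTINCT) and see whether you have duplicate rows in CustomerAddressBase.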

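【Solution 2】:

Is AccountID part of a primary key or a unique constraint? Possible reasons:

The combination of AccountID, AccountParameter1 and AddressParameter1 is duplicated.

Test: include a unique column of AccountBaseExtension in the query and run it with and without the DISTINCT keyword.

AddressParameter1 is not unique in the address table.

Test: include a unique column of the address table and run the query with and without DISTINCT. For example, there could be records like these: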

AccountID                     AccountParameter1        AddressParameter1     ParentID

10223490                      Cisco                    West Tasman Dr        1
10223490                      Cisco                    West Tasman Dr        2
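
One way to see which of the two causes applies is to count, per account, how many address rows exist and how many distinct AddressParameter1 values they carry. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module instead of SQL Server, AddressID is a hypothetical key column, and the sample rows are invented. The SELECT itself is plain SQL and should also work on SQL Server 2008.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- AddressID is a hypothetical unique key; the sample rows are invented.
    CREATE TABLE CustomerAddressBase (
        AddressID INTEGER PRIMARY KEY, ParentID INTEGER, AddressParameter1 TEXT);
    INSERT INTO CustomerAddressBase VALUES
        (1, 10223490, 'West Tasman Dr'),
        (2, 10223490, 'West Tasman Dr'),
        (3, 555, 'AB1 2CD'),
        (4, 555, 'XY9 8ZW');
""")

# Per account: number of address rows vs. number of distinct address values.
per_account = conn.execute("""
    SELECT ParentID,
           COUNT(*) AS AddressRows,
           COUNT(DISTINCT AddressParameter1) AS DistinctValues
    FROM CustomerAddressBase
    GROUP BY ParentID
    ORDER BY ParentID
""").fetchall()

for parent_id, address_rows, distinct_values in per_account:
    print(parent_id, address_rows, distinct_values)
```

Accounts where DistinctValues is lower than AddressRows are the ones whose joined rows DISTINCT collapses. Accounts where DistinctValues is greater than 1 still produce more than one row after DISTINCT.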

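【Comments】:

AccountID is the primary key.
Can you try it with AccountId, AccountParameter1, AddressId? There may be multiple addresses in the address table with the same AddressParameter..

【Solution 3】:

The reason is a 1-->many relationship between the AccountBaseExtension and CustomerAddressBase tables, so the JOIN returns more records than the first table contains. There may be several AddressParameter1 values for the same AccountID, so if you include this column you get slightly more rows than when you leave it out. To get a count that tallies, you can use: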

SELECT DISTINCT
      AccountID,
      AccountParameter1 
 FROM AccountBaseExtension AS A
INNER JOIN CustomerAddressBase AS B ON a.AccountID = b.ParentID

Or you could even try the following:

SELECT DISTINCT AccountID, AccountParameter1 FROM 
    (
    SELECT 
          AccountID,
          AccountParameter1,
          AddressParameter1 
     FROM AccountBaseExtension AS A
    INNER JOIN CustomerAddressBase AS B 
    ON a.AccountID = b.ParentID
    )A
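
If the goal is one row per account together with a single address value, one possible alternative (not part of the queries above) is to aggregate the address column instead of dropping it. The sketch below assumes SQLite through Python's sqlite3 module in place of SQL Server, invented sample rows, and that picking MIN(AddressParameter1) is acceptable when an account has several different values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample rows for illustration only.
    CREATE TABLE AccountBaseExtension (AccountID INTEGER PRIMARY KEY, AccountParameter1 TEXT);
    CREATE TABLE CustomerAddressBase (ParentID INTEGER, AddressParameter1 TEXT);
    INSERT INTO AccountBaseExtension VALUES (123, 'xyzc'), (456, 'abcd');
    INSERT INTO CustomerAddressBase VALUES
        (123, 'Addr1'), (123, 'Addr2'), (123, 'Addr2'), (456, 'Addr9');
""")

# GROUP BY keeps exactly one row per account; MIN() picks one address value
# (an arbitrary but deterministic choice, assumed acceptable here).
one_per_account = conn.execute("""
    SELECT a.AccountID, a.AccountParameter1, MIN(b.AddressParameter1) AS AddressParameter1
    FROM AccountBaseExtension AS a
    INNER JOIN CustomerAddressBase AS b ON a.AccountID = b.ParentID
    GROUP BY a.AccountID, a.AccountParameter1
    ORDER BY a.AccountID
""").fetchall()

print(one_per_account)
```

Because of the INNER JOIN, accounts without any address row are still left out. A LEFT JOIN would keep them with a NULL address.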

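【Comments】:

Edited the second query. The number of records should be lower now. But I will never agree that the record count goes up if you remove the third column from the SELECT clause. If anything, it should go down. Try again, mate.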