[Question title]: Joining tables when data may or may not exist
[Posted]: 2012-11-10 02:09:33
[Question description]:

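Let me start by saying that I did not design this database; I am just trying to work with it.

I am trying to retrieve a set of failures for bicycles, where the most important determinant is whether any part in the bicycle has a particular attribute. The attribute is set in the parts table. A part belongs to an assembly, which references a larger (higher) assembly. The assembly may have specific bicycle types assigned to it; if not, we assume that all bicycle types are assigned to the assembly. A part may also have specific bicycles assigned to it, identified by serial number.

So, we can assume:

  1. A record in the failures table will always contain a serial number, a higher assembly, and a bicycle type.
  2. A part's assembly will always have a reference to a higher assembly.
  3. A part's assembly may or may not have references to bicycle types.
  4. A part may or may not have specific serial numbers.

When searching for failures of parts with a particular attribute: if the part references specific bicycles, we want to find only those bicycles' failures. If not, and the part's assembly references specific bicycle types, we want to find only failures of those types on the higher assembly that contains the part. Otherwise, we want to find all failures related to the higher assembly that contains the part.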
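
The precedence described above can be written out as a small predicate. This is only a sketch in Python: the function name failure_matches and its parameters are hypothetical, and it assumes that a serial-number match alone is enough for the first rule (the description does not say whether the higher assembly must match as well).

```python
from typing import Set


def failure_matches(
    f_serial: int,
    f_type: str,
    f_big_assembly: str,
    part_serials: Set[int],
    assembly_types: Set[str],
    part_big_assembly: str,
) -> bool:
    """Decide whether one failure should be reported for one 'awesome' part.

    part_serials: serial numbers in parts_bicycles for the part (may be empty).
    assembly_types: bicycle types assigned to the part's assembly (may be empty).
    part_big_assembly: the higher assembly referenced by the part's assembly.
    """
    # Rule 1: explicit serial numbers win; only those bicycles qualify.
    # (Assumption: the serial number alone decides the match.)
    if part_serials:
        return f_serial in part_serials
    # Rules 2 and 3 both require the failure to be on the part's higher assembly.
    if f_big_assembly != part_big_assembly:
        return False
    # Rule 2: explicit types restrict the match to those types.
    if assembly_types:
        return f_type in assembly_types
    # Rule 3: no types recorded means every bicycle type qualifies.
    return True
```

With the sample data further down, part 4 (serial numbers 1000005 and 1000006) would match failure 1000005 but not 1000007, and part 3 (type tandem on B1000) would match 1000003 but not 1000004.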

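My problem is that if I join on serial number, I only ever get parts that have serial numbers assigned, and if I join on bicycle type, I only get parts whose assemblies have types assigned. I am not sure whether I am attempting something unrealistic given the database design, or whether I am going about the joins incorrectly.

Here is the query as it currently stands.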

SELECT f_bicycle_type, f_serial_number, f_big_assembly
FROM ( 
    SELECT DISTINCT f.f_bicycle_type, f.f_serial_number, f.f_big_assembly, p_important_attr 
    from failures f 
    left outer join (    
        select distinct bt.bt_bicycle_type, b_serial_number, a_big_assembly, p_important_attr  
        from (          
            select distinct b.b_serial_number, a.a_big_assembly, p.p_assembly_id, p.p_important_attr
            from parts p
            join assemblies a on p.p_assembly_id = a.a_assembly_id
            left outer join parts_bicycles b on b.b_part_id = p.p_id  
            where p.p_important_attr = 'awesome'
        ) p_join_a_and_b 
        left outer join assembly_bicycle_types bt on bt.bt_assembly_id = p_join_a_and_b.p_assembly_id 
    ) p_join_a_and_b_join_bt 
    on f.f_big_assembly = p_join_a_and_b_join_bt.a_big_assembly 
    -- problem join clause - if an explicit type has not been assigned to the assembly, we want to include ALL types
    and f_bicycle_type = p_join_a_and_b_join_bt.bt_bicycle_type
    -- problem join clause - there may not be explicit serial numbers assigned to a given part
    and f_serial_number = b_serial_number
) z
WHERE p_important_attr = 'awesome';

Test-case SQL (for Oracle):

CREATE TABLE failures (
f_bicycle_type VARCHAR(20),
f_serial_number NUMBER(20),
f_big_assembly VARCHAR(5)); 

CREATE TABLE parts (
p_id NUMBER(20),
p_assembly_id NUMBER(20),
p_important_attr VARCHAR(20));

CREATE TABLE assemblies (
a_assembly_id NUMBER(20),
a_big_assembly VARCHAR(5)); 

CREATE TABLE parts_bicycles (
b_part_id NUMBER(20),
b_serial_number NUMBER(20));    

CREATE TABLE assembly_bicycle_types (
bt_assembly_id NUMBER(20),
bt_bicycle_type VARCHAR(20));

INSERT ALL
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('tandem', 1000001, 'A1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000002, 'A1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('tandem', 1000003, 'B1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('cruiser', 1000004, 'B1000')  
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000005, 'C1000')  
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('motocross', 1000006, 'C1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('cruiser', 1000007, 'C1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000008, 'D1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('bmx', 1000009, 'D1000')
INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('cruiser', 1000010, 'E1000')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (1, 1001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (2, 1001, 'ordinary')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (3, 2001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (4, 3001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (5, 4001, 'awesome')
INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (6, 5001, 'ordinary')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (1001, 'A1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (2001, 'B1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (3001, 'C1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (4001, 'D1000')
INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (5001, 'E1000')
INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (4, 1000005)
INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (4, 1000006)
INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (5, 1000008)
INTO assembly_bicycle_types (bt_assembly_id, bt_bicycle_type)
VALUES (02001, 'tandem')
INTO assembly_bicycle_types (bt_assembly_id, bt_bicycle_type)
VALUES (04001, 'bmx')
SELECT * FROM DUAL;

For MySQL:

CREATE TABLE failures (
f_bicycle_type VARCHAR(20),
f_serial_number INTEGER(20),
f_big_assembly VARCHAR(5));
CREATE TABLE parts(
p_id INTEGER( 20 ) ,
p_assembly_id INTEGER( 20 ) ,
p_important_attr VARCHAR( 20 )
);
CREATE TABLE assemblies(
a_assembly_id INTEGER( 20 ) ,
a_big_assembly VARCHAR( 5 )
);
CREATE TABLE parts_bicycles(
b_part_id INTEGER( 20 ) ,
b_serial_number INTEGER( 20 )
);
CREATE TABLE assembly_bicycle_types(
bt_assembly_id INTEGER( 20 ) ,
bt_bicycle_type VARCHAR( 20 )
);

INSERT INTO failures (f_bicycle_type, f_serial_number, f_big_assembly)
VALUES ('tandem', 1000001, 'A1000'),('bmx', 1000002, 'A1000'), ('tandem', 1000003, 'B1000'),    ('cruiser', 1000004, 'B1000') ,('bmx', 1000005, 'C1000'), ('motocross', 1000006, 'C1000')
,('cruiser', 1000007, 'C1000')
,('bmx', 1000008, 'D1000')
,('bmx', 1000009, 'D1000')
, ('cruiser', 1000010, 'E1000');
insert INTO parts (p_id, p_assembly_id, p_important_attr)
VALUES (1, 1001, 'awesome'), (2, 1001, 'ordinary'), (3, 2001, 'awesome'), (4, 3001, 'awesome'), (5, 4001, 'awesome'),(6, 5001, 'ordinary');
INSERT INTO assemblies (a_assembly_id, a_big_assembly)
VALUES (1001, 'A1000'), (2001, 'B1000'), (3001, 'C1000'), (4001, 'D1000'),(5001, 'E1000');
INSERT INTO parts_bicycles (b_part_id, b_serial_number)
VALUES (4, 1000005),(4, 1000006),(5, 1000008);
INSERT INTO assembly_bicycle_types (bt_assembly_id, bt_bicycle_type)
VALUES (02001, 'tandem'), (04001, 'bmx');

Sample data and expected results:

-- failures table
-- f_bicycle_type   || f_serial_number  || f_big_assembly
---------------------------------------------------------
  tandem               1000001             A1000
  bmx                  1000002             A1000
  tandem               1000003             B1000
  cruiser              1000004             B1000
  bmx                  1000005             C1000
  motocross            1000006             C1000
  cruiser              1000007             C1000
  bmx                  1000008             D1000
  bmx                  1000009             D1000
  cruiser              1000010             E1000

  -- parts table
  -- p_id   || p_assembly_id    || p_important_attr
  ------------------------------------------------
     1          1001                awesome
     2          1001                ordinary
     3          2001                awesome
     4          3001                awesome
     5          4001                awesome
     6          5001                ordinary

  -- assemblies table
  -- a_assembly_id  || a_big_assembly
  -----------------------------------
     1001              A1000
     2001              B1000
     3001              C1000
     4001              D1000
     5001              E1000

  -- parts_bicycles table
  -- b_part_id  || b_serial_number
  --------------------------------
     4              1000005
     4              1000006
     5              1000008

  -- assembly_bicycle_types table
  -- bt_assembly_id || bt_bicycle_type
  ------------------------------------
     02001             tandem
     04001             bmx

-- desired results from failures table
-- f_bicycle_type   || f_serial_number  || f_big_assembly
---------------------------------------------------------
  tandem               1000001             A1000
  bmx                  1000002             A1000
  tandem               1000003             B1000
  bmx                  1000005             C1000
  motocross            1000006             C1000
  bmx                  1000008             D1000

And the actual results, with the problem join clauses in place:

-- actual results from failures table
-- f_bicycle_type   || f_serial_number  || f_big_assembly
---------------------------------------------------------
  bmx                  1000008             D1000

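[Question comments]:

  • Can assemblies be recursive? If so, is "assigned to all bicycle types" decided when there is no "parent" assembly? Also, could we have your table layout, and perhaps some sample data and the desired results?
  • @Clockwork, not recursive. Unfortunately, the types are determined either by the absence of records in assembly_bicycle_types (meaning all possible bicycle types) or by the presence of records. I know this is not optimal, but the system architect will not let me change it. I may not be able to provide sample data and desired results until after the holiday.
  • I think your query would be easier to understand either without the nesting or with the nesting broken out into common table expressions (the WITH clause), so that the meaning of each block can at least be inferred from the CTE's name.
  • My English is poor, so I can't quite follow you; I'll just give you this fiddle, which is close to, but not exactly, what you asked for :) Note the "or ... is null" in the joins.
  • +1 for providing test data and the CREATE TABLE statements.

Tags: mysql sql oracle join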


[Solution 1]:

Here you go (PostgreSQL flavour):

WITH chosen_parts AS (
  SELECT * FROM parts LEFT JOIN parts_bicycles ON b_part_id = p_id 
    WHERE p_important_attr = 'awesome'
), chosen_assemblies AS (
  SELECT * FROM assemblies JOIN chosen_parts ON p_assembly_id = a_assembly_id 
    LEFT JOIN assembly_bicycle_types ON bt_assembly_id = a_assembly_id 
  WHERE b_serial_number IS NULL
)
SELECT failures.* FROM chosen_parts JOIN failures 
  ON f_serial_number = b_serial_number
UNION
SELECT failures.* FROM chosen_assemblies JOIN failures 
  ON f_big_assembly = a_big_assembly 
  WHERE bt_bicycle_type = f_bicycle_type
    OR bt_bicycle_type IS NULL;

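Feel free to wrap this in an outer SELECT DISTINCT * FROM (...) if you are worried about duplicates, although UNION (unlike UNION ALL) already removes duplicate rows from the combined result.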
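
One way to check the query above against the question's sample data is to load both into an in-memory SQLite database from Python. This is a sketch and not part of the original answer: SQLite stands in for the PostgreSQL, Oracle, or MySQL server, the column types are simplified to TEXT and INTEGER, and the names SETUP, QUERY, and matching_serials are made up for the example.

```python
import sqlite3

# Schema and rows copied from the question's test case (types simplified).
SETUP = """
CREATE TABLE failures (f_bicycle_type TEXT, f_serial_number INTEGER, f_big_assembly TEXT);
CREATE TABLE parts (p_id INTEGER, p_assembly_id INTEGER, p_important_attr TEXT);
CREATE TABLE assemblies (a_assembly_id INTEGER, a_big_assembly TEXT);
CREATE TABLE parts_bicycles (b_part_id INTEGER, b_serial_number INTEGER);
CREATE TABLE assembly_bicycle_types (bt_assembly_id INTEGER, bt_bicycle_type TEXT);
INSERT INTO failures VALUES
  ('tandem', 1000001, 'A1000'), ('bmx', 1000002, 'A1000'),
  ('tandem', 1000003, 'B1000'), ('cruiser', 1000004, 'B1000'),
  ('bmx', 1000005, 'C1000'), ('motocross', 1000006, 'C1000'),
  ('cruiser', 1000007, 'C1000'), ('bmx', 1000008, 'D1000'),
  ('bmx', 1000009, 'D1000'), ('cruiser', 1000010, 'E1000');
INSERT INTO parts VALUES
  (1, 1001, 'awesome'), (2, 1001, 'ordinary'), (3, 2001, 'awesome'),
  (4, 3001, 'awesome'), (5, 4001, 'awesome'), (6, 5001, 'ordinary');
INSERT INTO assemblies VALUES
  (1001, 'A1000'), (2001, 'B1000'), (3001, 'C1000'), (4001, 'D1000'), (5001, 'E1000');
INSERT INTO parts_bicycles VALUES (4, 1000005), (4, 1000006), (5, 1000008);
INSERT INTO assembly_bicycle_types VALUES (2001, 'tandem'), (4001, 'bmx');
"""

# The answer's query, unchanged apart from whitespace.
QUERY = """
WITH chosen_parts AS (
  SELECT * FROM parts LEFT JOIN parts_bicycles ON b_part_id = p_id
  WHERE p_important_attr = 'awesome'
), chosen_assemblies AS (
  SELECT * FROM assemblies JOIN chosen_parts ON p_assembly_id = a_assembly_id
    LEFT JOIN assembly_bicycle_types ON bt_assembly_id = a_assembly_id
  WHERE b_serial_number IS NULL
)
SELECT failures.* FROM chosen_parts JOIN failures
  ON f_serial_number = b_serial_number
UNION
SELECT failures.* FROM chosen_assemblies JOIN failures
  ON f_big_assembly = a_big_assembly
  WHERE bt_bicycle_type = f_bicycle_type
    OR bt_bicycle_type IS NULL
"""


def matching_serials() -> list:
    """Run QUERY on the sample data and return the matched serial numbers, sorted."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        rows = conn.execute(QUERY).fetchall()
    finally:
        conn.close()
    # Each row is (f_bicycle_type, f_serial_number, f_big_assembly).
    return sorted(row[1] for row in rows)


print(matching_serials())
```

On the sample data this should list the six serial numbers from the question's desired results (1000001, 1000002, 1000003, 1000005, 1000006, 1000008).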

[Discussion]:

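    [Solution 2]:

    The following query returns the result set you want. In essence, it first establishes the relationships between parts, assemblies, and bicycle types, and then performs a compound, prioritized join against failures to get the actual results.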

    SELECT DISTINCT f.f_bicycle_type, f.f_serial_number, f.f_big_assembly
    FROM  parts p
          LEFT JOIN parts_bicycles pb
             ON p.p_id = pb.b_part_id
          LEFT JOIN assemblies a
             ON p.p_assembly_id = a.a_assembly_id
          LEFT JOIN assembly_bicycle_types abt
             ON a.a_assembly_id = abt.bt_assembly_id
          LEFT JOIN failures f
             ON -- First priority is parts that map directly
                pb.b_serial_number = f.f_serial_number 
                -- Second priority is assemblies that map to type
                OR (pb.b_serial_number IS NULL 
                    AND abt.bt_bicycle_type = f.f_bicycle_type) 
                -- Third priority is assemblies that map directly
                OR (pb.b_serial_number IS NULL 
                    AND abt.bt_bicycle_type IS NULL 
                    AND a.a_big_assembly = f.f_big_assembly)
    WHERE  p.p_important_attr = 'awesome'
    ORDER BY f.f_serial_number  
    

    SQL Fiddle

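    I don't think this solves the problem of unassigned assemblies belonging to all bicycles, but it is not clear from your sample data how that is supposed to work.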
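
One detail worth checking in the join above is that the second-priority branch compares only the bicycle type and not the higher assembly. The sketch below, run in SQLite from Python, adds one hypothetical failure row (a tandem on E1000, serial 1000011, which is not in the question's data) and compares the ON clause as written with a variant that also requires a.a_big_assembly = f.f_big_assembly. It assumes, based on the question's description, that a type restriction is only meant to apply within the part's own higher assembly; SETUP, QUERY_TEMPLATE, and serials_for are names invented for the example.

```python
import sqlite3

# Question's sample data plus one hypothetical failure (serial 1000011).
SETUP = """
CREATE TABLE failures (f_bicycle_type TEXT, f_serial_number INTEGER, f_big_assembly TEXT);
CREATE TABLE parts (p_id INTEGER, p_assembly_id INTEGER, p_important_attr TEXT);
CREATE TABLE assemblies (a_assembly_id INTEGER, a_big_assembly TEXT);
CREATE TABLE parts_bicycles (b_part_id INTEGER, b_serial_number INTEGER);
CREATE TABLE assembly_bicycle_types (bt_assembly_id INTEGER, bt_bicycle_type TEXT);
INSERT INTO failures VALUES
  ('tandem', 1000001, 'A1000'), ('bmx', 1000002, 'A1000'),
  ('tandem', 1000003, 'B1000'), ('cruiser', 1000004, 'B1000'),
  ('bmx', 1000005, 'C1000'), ('motocross', 1000006, 'C1000'),
  ('cruiser', 1000007, 'C1000'), ('bmx', 1000008, 'D1000'),
  ('bmx', 1000009, 'D1000'), ('cruiser', 1000010, 'E1000'),
  ('tandem', 1000011, 'E1000');  -- hypothetical extra row
INSERT INTO parts VALUES
  (1, 1001, 'awesome'), (2, 1001, 'ordinary'), (3, 2001, 'awesome'),
  (4, 3001, 'awesome'), (5, 4001, 'awesome'), (6, 5001, 'ordinary');
INSERT INTO assemblies VALUES
  (1001, 'A1000'), (2001, 'B1000'), (3001, 'C1000'), (4001, 'D1000'), (5001, 'E1000');
INSERT INTO parts_bicycles VALUES (4, 1000005), (4, 1000006), (5, 1000008);
INSERT INTO assembly_bicycle_types VALUES (2001, 'tandem'), (4001, 'bmx');
"""

# The answer's query; {extra} is an optional additional condition
# appended to the second-priority branch.
QUERY_TEMPLATE = """
SELECT DISTINCT f.f_serial_number
FROM parts p
LEFT JOIN parts_bicycles pb ON p.p_id = pb.b_part_id
LEFT JOIN assemblies a ON p.p_assembly_id = a.a_assembly_id
LEFT JOIN assembly_bicycle_types abt ON a.a_assembly_id = abt.bt_assembly_id
LEFT JOIN failures f
  ON pb.b_serial_number = f.f_serial_number
  OR (pb.b_serial_number IS NULL
      AND abt.bt_bicycle_type = f.f_bicycle_type {extra})
  OR (pb.b_serial_number IS NULL
      AND abt.bt_bicycle_type IS NULL
      AND a.a_big_assembly = f.f_big_assembly)
WHERE p.p_important_attr = 'awesome'
ORDER BY f.f_serial_number
"""


def serials_for(extra: str) -> list:
    """Run the query with the given extra second-branch condition."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        rows = conn.execute(QUERY_TEMPLATE.format(extra=extra)).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


as_written = serials_for("")
with_assembly = serials_for("AND a.a_big_assembly = f.f_big_assembly")
print(as_written)
print(with_assembly)
```

With the extra row, the query as written also picks up 1000011 (part 3's assembly is typed tandem, and the branch matches any tandem failure), while the variant does not. On the question's unmodified data the two versions agree, because the only cross-assembly match, 1000001, is already returned through part 1.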

    [Discussion]:

      [Solution 3]:

      This query joins parts, parts_bicycles, assemblies, and assembly_bicycle_types. Let's save it as a view:

      create view j_parts as
      select p_important_attr, b_serial_number, a_big_assembly, bt_bicycle_type
      from
        parts left join parts_bicycles
          on parts.p_id = parts_bicycles.b_part_id
        left join assemblies
          on parts.p_assembly_id=assemblies.a_assembly_id
        left join assembly_bicycle_types
          on assemblies.a_assembly_id =assembly_bicycle_types.bt_assembly_id
      

      This (I think!) is the query that gives you the desired results:

      SELECT failures.*
      FROM
        failures inner join j_parts
        on f_serial_number=b_serial_number
           and p_important_attr = 'awesome'
      UNION
      SELECT failures.*
      FROM
        failures inner join j_parts
        on f_big_assembly=a_big_assembly
           and b_serial_number is null
           and j_parts.bt_bicycle_type=f_bicycle_type
           and p_important_attr = 'awesome'
      UNION
      SELECT failures.*
      FROM
        failures inner join j_parts
        on f_big_assembly=a_big_assembly
           and b_serial_number is null
           and j_parts.bt_bicycle_type is null
           and p_important_attr = 'awesome'
      

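      Edit: I wrote it this way because it is easier to read and maintain. The query can then be optimized. Here it is with all the conditions in a single SELECT: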

      SELECT failures.*
      FROM
        failures inner join
        (parts left join parts_bicycles
         on parts.p_id = parts_bicycles.b_part_id
         left join assemblies
         on parts.p_assembly_id=assemblies.a_assembly_id
         left join assembly_bicycle_types
         on assemblies.a_assembly_id =assembly_bicycle_types.bt_assembly_id)
        -- the outer parentheses keep the attribute filter applied to both branches
        on p_important_attr = 'awesome'
           and (f_serial_number=b_serial_number
                or (f_big_assembly=a_big_assembly
                    and b_serial_number is null
                    and (bt_bicycle_type=f_bicycle_type
                         or bt_bicycle_type is null)))
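
When an ON clause mixes OR and AND, as in the single-SELECT version above, it helps to remember that AND binds more tightly than OR in SQL, so a filter such as p_important_attr = 'awesome' only applies to every branch if the OR group is parenthesized. A minimal check of that precedence, using SQLite from Python purely as a convenient evaluator (the variable names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a OR b AND c is parsed as a OR (b AND c): the trailing AND
# restricts only the last OR branch.
unparenthesized = conn.execute("SELECT 1 OR 0 AND 0").fetchone()[0]

# Explicit parentheses make the AND apply to the whole OR group.
parenthesized = conn.execute("SELECT (1 OR 0) AND 0").fetchone()[0]

conn.close()
print(unparenthesized, parenthesized)  # prints "1 0"
```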
      

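      [Discussion]:

      • There may still be some problems to fix... but with this structure it shouldn't be too hard...
      • select * is a poor practice, especially in a view where you have joins and the data will be repeated. Please don't use it again or suggest SQL antipatterns. I do like your idea of using unions, though.
      • @HLGEM you are right... the view now selects only the fields we need in the union queries... thanks for the suggestion!
      • @fthiella, I like your idea of using unions too. Unfortunately, removing the "and p_important_attr = 'awesome'" clause from all the branches of your first query does not produce the expected result: in that case all rows in the failures table should be returned, but #4, #7, and #9 are missing. Looking at query #2 now.
      • @earachefl the second query is the same as the first, but I replaced the unions with or... The union query can be optimized a little (not every select needs all the joins), and the second one should be faster, but apart from that... I don't know why 4, 7 and 9 should be returned... I'm still thinking about whether I can fix that...
      [Solution 4]:

      The query can be modified as follows: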

      SELECT f_bicycle_type, f_serial_number, f_big_assembly
      FROM ( 
      SELECT DISTINCT f.f_bicycle_type, f.f_serial_number, f.f_big_assembly, p_important_attr 
      from failures f 
      left outer join (    
          select distinct bt.bt_bicycle_type, b_serial_number, a_big_assembly, p_important_attr  
          from (          
              select distinct b.b_serial_number, a.a_big_assembly, p.p_assembly_id, p.p_important_attr
              from parts p
              join assemblies a on p.p_assembly_id = a.a_assembly_id
              left outer join parts_bicycles b on b.b_part_id = p.p_id  
              where p.p_important_attr = 'awesome'
          ) p_join_a_and_b 
          left join assembly_bicycle_types bt on bt.bt_assembly_id = p_join_a_and_b.p_assembly_id 
      ) p_join_a_and_b_join_bt 
      on f.f_big_assembly = p_join_a_and_b_join_bt.a_big_assembly 
      -- problem join clause - if an explicit type has not been assigned to the assembly, we want to include ALL types
      and (f_bicycle_type = p_join_a_and_b_join_bt.bt_bicycle_type or p_join_a_and_b_join_bt.bt_bicycle_type is null)
      -- problem join clause - there may not be explicit serial numbers assigned to a given part
      and (f_serial_number = b_serial_number or b_serial_number is null)
      ) z
      WHERE p_important_attr = 'awesome';
      

      [Discussion]:
