【问题标题】:Left outer join acting like inner join左外连接的作用类似于内连接
【发布时间】:2013-09-28 16:13:42
【问题描述】:

总结

我的目标是找到曾经被分配到某项任务的每个用户,然后生成特定日期范围内的一些统计数据,并将这些统计数据与原始用户集相关联。当特定用户不存在统计信息时,我希望在输出中为该用户提供一行,但为统计信息提供 NULL 值。


我有一个复杂的 SQL 查询,看起来像这样(底部的实际查询):

SELECT
  user_name, changeday, project_name
  sum(hour_delta) AS hours,
FROM ( … ) tasked_users
LEFT OUTER JOIN ( … ) a
ON tasked_users.id=a.assignee_id
WHERE
  (changeday IS NULL) OR (changeday >= … AND changeday <= …)
GROUP BY user_name, changeday, a.project_name
ORDER BY user_name, changeday, a.project_name;

我的愿望是找到大量用户并将他们与a 表中的数据进行匹配;当存在在a 中没有任何匹配条目的用户时,我想要空值或0 小时。

不幸的是,此查询仅返回存在于“a”中的用户的行。例如,一组特定的日期返回:

{:user_name=>"Gavin", :hours=>0.0, :changeday=>2013-09-08, :project_name=>"Foo"}
{:user_name=>"Steve", :hours=>1.0, :changeday=>2013-09-08, :project_name=>"Bar"}

虽然不同的日期范围会导致找到不同的用户。 tasked_users 子查询的内容有 14 个不同的用户 ID/名称对。我需要 所有 在结果中表示出来。


查询示例

如果它有所作为,或者如果您有其他有用的提示来改进查询,这里是完整的查询。

SELECT
  user_name,
  sum(hour_delta) AS hours,
  changeday,
  project_name
FROM (
  SELECT DISTINCT
    users.id,
    users.name AS user_name
  FROM users
  INNER JOIN tasks AS tasks1
  ON users.id=tasks1.assignee_id
) tasked_users
LEFT OUTER JOIN
(
  SELECT
    (
      coalesce(cast(nullif(new_value,'') AS float),0) -
      coalesce(cast(nullif(old_value,'') AS float),0)
    ) AS hour_delta,
    task_id,
    tasks2.assignee_id AS assigned_log,
    fixin_id,
    projects.name AS project_name,
    date_trunc('day',task_log_entries.created_on) AS changeday
  FROM task_log_entries
  INNER JOIN tasks AS tasks2
  ON task_id=tasks2.id
  INNER JOIN fixins
  ON fixins.id=tasks2.fixin_id
  INNER JOIN projects
  ON projects.id=fixins.project_id
  WHERE field_id=18
) a
ON tasked_users.id=a.assigned_log
WHERE
  (changeday IS NULL)
  OR
  (changeday >= '2013-09-08' AND changeday <= '2013-09-08')
GROUP BY user_name, changeday, a.project_name
ORDER BY user_name, changeday, a.project_name;

解释输出

这是EXPLAIN 的查询结果,以防万一(我不知道如何阅读并得出我需要的内容):

GroupAggregate  (cost=1116.40..1116.99 rows=13 width=144)"}
  ->  Sort  (cost=1116.40..1116.43 rows=13 width=144)"}
        Sort Key: users.name, (date_trunc('day'::text, task_log_entries.created_on)), projects.name"}
        ->  Hash Left Join  (cost=1024.32..1116.16 rows=13 width=144)"}
              Hash Cond: (users.id = tasks2.assignee_id)"}
              Filter: ((date_trunc('day'::text, task_log_entries.created_on) IS NULL) OR ((date_trunc('day'::text, task_log_entries.created_on) >= '2013-09-08 00:00:00'::timestamp without time zone) AND (date_trunc('day'::text, task_log_entries.created_on) <= '2013-09-08 00:00:00'::timestamp without time zone)))"}
              ->  HashAggregate  (cost=44.07..45.46 rows=139 width=12)"}
                    ->  Hash Join  (cost=5.13..40.09 rows=795 width=12)"}
                          Hash Cond: (tasks1.assignee_id = users.id)"}
                          ->  Seq Scan on tasks tasks1  (cost=0.00..24.01 rows=801 width=4)"}
                          ->  Hash  (cost=3.39..3.39 rows=139 width=12)"}
                                ->  Seq Scan on users  (cost=0.00..3.39 rows=139 width=12)"}
              ->  Hash  (cost=963.51..963.51 rows=1339 width=30)"}
                    ->  Hash Join  (cost=729.23..963.51 rows=1339 width=30)"}
                          Hash Cond: (fixins.project_id = projects.id)"}
                          ->  Hash Join  (cost=727.91..943.79 rows=1339 width=24)"}
                                Hash Cond: (task_log_entries.task_id = tasks2.id)"}
                                ->  Seq Scan on task_log_entries  (cost=0.00..197.46 rows=1339 width=20)"}
                                      Filter: (field_id = 18)"}
                                ->  Hash  (cost=717.90..717.90 rows=801 width=12)"}
                                      ->  Hash Join  (cost=676.87..717.90 rows=801 width=12)"}
                                            Hash Cond: (tasks2.fixin_id = fixins.id)"}
                                            ->  Seq Scan on tasks tasks2  (cost=0.00..24.01 rows=801 width=12)"}
                                            ->  Hash  (cost=589.72..589.72 rows=6972 width=8)"}
                                                  ->  Seq Scan on fixins  (cost=0.00..589.72 rows=6972 width=8)"}
                          ->  Hash  (cost=1.14..1.14 rows=14 width=14)"}
                                ->  Seq Scan on projects  (cost=0.00..1.14 rows=14 width=14)"}

表定义

这里是所有涉及的表的描述。我没有修剪它们以删除任何“不相关”的列,因此您可以确定是否存在任何模棱两可的列名冲突。

app=> \d task_log_entries
                                     Table "public.task_log_entries"
   Column   |            Type             |                           Modifiers
------------+-----------------------------+---------------------------------------------------------------
 id         | integer                     | not null default nextval('task_log_entries_id_seq'::regclass)
 task_id    | integer                     | not null
 user_id    | integer                     |
 field_id   | integer                     | not null
 created_on | timestamp without time zone | not null default now()
 new_value  | text                        |
 old_value  | text                        |
Indexes:
    "task_log_entries_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "task_log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
    "task_log_entries_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
    "task_log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL


app=> \d tasks
                                        Table "public.tasks"
     Column     |            Type             |                      Modifiers
----------------+-----------------------------+-----------------------------------------------------
 id             | integer                     | not null default nextval('fixins_id_seq'::regclass)
 fixin_id       | integer                     | not null
 created_on     | timestamp without time zone | not null default now()
 updated_on     | timestamp without time zone | not null default now()
 name           | character varying(200)      | not null
 description    | text                        |
 blocked_by     | character varying(200)      |
 estimate       | double precision            |
 actual         | double precision            |
 remaining      | double precision            |
 relative_order | integer                     |
 status_id      | integer                     | not null
 assignee_id    | integer                     |
Indexes:
    "tasks_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "tasks_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
    "tasks_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
    "tasks_status_id_fkey" FOREIGN KEY (status_id) REFERENCES task_statuses(id)
Referenced by:
    TABLE "task_comments" CONSTRAINT "task_comments_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
    TABLE "task_log_entries" CONSTRAINT "task_log_entries_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
    TABLE "users_tasks_notifications" CONSTRAINT "users_tasks_notifications_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE


app=> \d fixins
                                       Table "public.fixins"
     Column     |            Type             |                      Modifiers
----------------+-----------------------------+-----------------------------------------------------
 id             | integer                     | not null default nextval('fixins_id_seq'::regclass)
 project_id     | integer                     | not null
 created_on     | timestamp without time zone | not null default now()
 updated_on     | timestamp without time zone | not null default now()
 name           | character varying(200)      | not null
 description    | text                        | not null
 status_id      | integer                     | not null
 reporter_id    | integer                     |
 assignee_id    | integer                     |
 priority_id    | integer                     | not null
 severity_id    | integer                     | not null
 likelihood_id  | integer                     | not null
 maturity       | integer                     | not null default 0
 version        | character varying(100)      |
 iteration_id   | integer                     |
 relative_order | integer                     |
 kind           | character varying(16)       | not null default 'Bug'::character varying
 specs          | character varying(50)       |
 estimate       | double precision            |
 blocked_by     | character varying(200)      |
 plan_estimate  | double precision            |
 actual         | double precision            |
 remaining      | double precision            |
 promise_date   | date                        |
Indexes:
    "fixins_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fixins_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
    "fixins_iteration_id_fkey" FOREIGN KEY (iteration_id) REFERENCES iterations(id) ON DELETE SET NULL
    "fixins_likelihood_id_fkey" FOREIGN KEY (likelihood_id) REFERENCES likelihoods(id)
    "fixins_priority_id_fkey" FOREIGN KEY (priority_id) REFERENCES priorities(id)
    "fixins_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
    "fixins_reporter_id_fkey" FOREIGN KEY (reporter_id) REFERENCES users(id) ON DELETE SET NULL
    "fixins_severity_id_fkey" FOREIGN KEY (severity_id) REFERENCES severities(id)
    "fixins_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
Referenced by:
    TABLE "bug_snapshots" CONSTRAINT "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
    TABLE "comments" CONSTRAINT "comments_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "customers_fixins" CONSTRAINT "customers_fixins_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id)
    TABLE "fixins_tags" CONSTRAINT "fixins_tags_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "log_entries" CONSTRAINT "log_entries_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "relationships" CONSTRAINT "relationships_fixin1_id_fkey" FOREIGN KEY (fixin1_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "relationships" CONSTRAINT "relationships_fixin2_id_fkey" FOREIGN KEY (fixin2_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "tasks" CONSTRAINT "tasks_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "users_notifications" CONSTRAINT "users_notifications_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
    TABLE "votes" CONSTRAINT "votes_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id)


app=> \d projects
                                     Table "public.projects"
     Column     |          Type           |                       Modifiers
----------------+-------------------------+-------------------------------------------------------
 id             | integer                 | not null default nextval('projects_id_seq'::regclass)
 name           | character varying(50)   | not null
 link_name      | character varying(50)   | not null
 pain_threshold | integer                 | not null
 wiki_server    | character varying(100)  |
 wiki_wiki      | character varying(100)  |
 wiki_pattern   | character varying(1000) |
 active         | boolean                 | not null default true
Indexes:
    "projects_pkey" PRIMARY KEY, btree (id)
    "projects_link_name_key" UNIQUE, btree (link_name)
Referenced by:
    TABLE "fixins" CONSTRAINT "fixins_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
    TABLE "iterations" CONSTRAINT "iterations_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
    TABLE "project_preferences" CONSTRAINT "project_preferences_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
    TABLE "projects_users_notifications" CONSTRAINT "projects_users_notifications_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
    TABLE "releases" CONSTRAINT "releases_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE


app=> \d users
                                 Table "public.users"
  Column  |         Type          |                     Modifiers
----------+-----------------------+----------------------------------------------------
 id       | integer               | not null default nextval('users_id_seq'::regclass)
 name     | character varying(50) | not null
 email    | character varying(50) |
 active   | boolean               | not null default true
 passhash | character varying(40) |
 salt     | character varying(4)  |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "comments" CONSTRAINT "comments_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "fixins" CONSTRAINT "fixins_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "fixins" CONSTRAINT "fixins_reporter_id_fkey" FOREIGN KEY (reporter_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "log_entries" CONSTRAINT "log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "project_preferences" CONSTRAINT "project_preferences_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    TABLE "projects_users_notifications" CONSTRAINT "projects_users_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    TABLE "task_comments" CONSTRAINT "task_comments_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "task_log_entries" CONSTRAINT "task_log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "tasks" CONSTRAINT "tasks_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
    TABLE "users_notifications" CONSTRAINT "users_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    TABLE "users_tasks_notifications" CONSTRAINT "users_tasks_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE


app=> \d log_fields
          Table "public.log_fields"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 id     | integer                | not null
 name   | character varying(200) | not null
Indexes:
    "log_fields_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "log_entries" CONSTRAINT "log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
    TABLE "task_log_entries" CONSTRAINT "task_log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)

【问题讨论】:

  • 进行一些健全性检查(我在这里拉扯稻草)。如果您删除 GROUPORDER by 子句会发生什么情况(当然,也删除聚合 SUM)?
  • @slugonamission 好问题!删除这些子句,sum 仅显示与两个用户相关的行(针对该日期范围)。
  • 使用正确的别名完全限定您的字段名称。这将使它具有可读性。 (并可能使问题浮出水面)
  • 请在WHERE field_id=18 中添加表格限定条件以使其明确。或者更好的是,查询中的 all 列名 - 甚至更多,因为您没有提供表定义。
  • "unambiguous" 在我的评论中是针对我们的 SO,而不是 Postgres - 它知道该列属于哪个表。

标签: sql postgresql left-join aggregate-functions date-range


【解决方案1】:

它不起作用,因为第一个查询是与任务内部连接的。同一张表用于执行外连接(通过子查询但仍然如此),但第一个查询(任务用户)首先没有相关记录(缺少匹配)。

尝试使用

....
FROM (
  SELECT DISTINCT
    users.id,
    users.name AS user_name
  FROM users    
) tasked_users
...

【讨论】:

  • 这看起来像问题,+1。当我从子选择中删除 INNER JOIN 时,它按预期工作。那么,我如何获得我想要的最终结果,我不选择所有用户,而只选择过去已完成任务的用户?
  • 但是如果从第一个子查询中删除 INNER JOIN 可以解决问题,那么剩下的问题是什么?
  • 问题是结果包括所有用户(大约数百个)的行,而不仅仅是过去被赋予任务的所有用户的行。
【解决方案2】:

查询大概可以简化为:

SELECT u.name AS user_name
     , p.name AS project_name
     , tl.created_on::date AS changeday
     , coalesce(sum(nullif(new_value, '')::numeric), 0)
     - coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
FROM   users             u
LEFT   JOIN (
        tasks            t 
   JOIN fixins           f  ON  f.id = t.fixin_id
   JOIN projects         p  ON  p.id = f.project_id
   JOIN task_log_entries tl ON  tl.task_id = t.id
                           AND  tl.field_id = 18
                           AND (tl.created_on IS NULL OR
                                tl.created_on >= '2013-09-08' AND
                                tl.created_on <  '2013-09-09') -- upper border!
       ) ON t.assignee_id = u.id
WHERE  EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;

这将返回所有曾经有过任何任务的用户。
加上数据每个项目和日期,其中数据存在于task_log_entries 中的指定日期范围内。

要点

  • aggregate function sum() 忽略 NULL 值。 COALESCE() 每行 不再需要,只要您将计算重新转换为两个和的差:

     ,coalesce(sum(nullif(new_value, '')::numeric), 0) -
      coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
    

    但是,如果选择的所有列可能有NULL或空字符串,将总和包装到COALESCE一次。
    我正在使用numeric 而不是float,这是一种更安全的替代方案,可以最大限度地减少舍入错误。

  • 您尝试从userstasks 的连接中获取不同的值是徒劳的,因为您再次连接到task。展平整个查询,使其更简单、更快捷。

  • 这些positional references 只是一个符号方便:

    GROUP BY 1, 2, 3
    ORDER BY 1, 2, 3
    

    ... 执行与原始查询相同的操作。

  • 要从 timestamp 获取 date,您可以简单地转换为 date

    tl.created_on::date AS changeday
    

    但最好使用WHERE 子句或JOIN 条件中的原始值进行测试(如果可能,这里也有可能),因此 Postgres 可以在列上使用普通索引(如果可用):

     AND (tl.created_on IS NULL OR
          tl.created_on >= '2013-09-08' AND
          tl.created_on <  '2013-09-09')  -- next day as excluded upper border
    

    请注意,日期文字00:00当天at your current time zone处转换为timestamp。您需要选择 next 天并将其 排除 作为上边框。或者提供更明确的时间戳文字,例如 '2013-09-22 0:0 +2':: timestamptz。更多关于排除上边框:

  • 对于every user who has ever been assigned to a task 需求,添加WHERE 子句:

    WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
    
  • 最重要的是LEFT [OUTER] JOIN 保留连接左侧的所有行。在 right 表上添加WHERE 子句可以使这种效果无效。相反,将过滤器表达式移动到JOIN 子句。更多解释在这里:

  • 括号 可用于强制连接表的顺序。简单查询很少需要,但在这种情况下非常有用。我使用该功能加入taskfixinsprojectstask_log_entries,然后将其全部左连接到users - 没有子查询。

  • Table aliases 让编写复杂查询变得更容易。

【讨论】:

  • +1 这些是很棒的提示——谢谢!——但它们并没有给出我想要的结果。我不希望我的两个任务加入是徒劳的,我想这就是问题所在。我想找到“曾经分配过任务的所有用户”,然后将其与“特定时间范围内的所有任务日志条目”一起加入(然后我将其与其他一些表一起加入,以便找到与任务,最终是任务所属项目的名称)。我认为这可以通过单个查询来实现;我错了吗?我将使用所涉及的表的架构来编辑答案。
  • 没有荣耀。通过您更新的查询,我为每个 user_name 得到一行(耶!),但对于所有其他列(哎呀)只有 NULL 值。
  • 不;仍然为除user_name 之外的所有列提供空值。 (FWIW 每个任务都有一个fixin_id,每个修复程序都有一个project_id。)
  • 啊;出于某种原因,我需要将第二个日期修改为您的未来一天(时间戳被强制转换为日期的方式存在某种栅栏问题?)。看起来这正在使用该调整。我会在进一步测试后接受答案。
  • @Phrogz:我添加了更多提示,尤其是关于日期范围的上边界。
猜你喜欢
  • 2015-01-11
  • 1970-01-01
  • 2018-04-16
  • 1970-01-01
  • 1970-01-01
  • 2010-09-16
  • 1970-01-01
  • 2011-10-01
  • 1970-01-01
相关资源
最近更新 更多