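[Posted]: 2018-07-11 18:47:49
[Question]:
The query shown in my example runs very slowly. I have close to 4 million records in the my_task table.
Is there any kind of performance improvement we can make here?
Take the following table as an example.
Here I have used plain numbers for start_dt and end_dt instead of timestamp values.
As an additional note, a row with an empty end_dt is an active record that a worker is currently handling.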
T_ID |start_dt |end_dt |code |p_id
-----|---------|-------|-----------|---
1 |8 |4 |INPROGRESS |110
1 |4 | |ASSIGNED |110
4 |10 |4 |INPROGRESS |110
4 |4 | |ASSIGNED |110
5 |4 |4 |INPROGRESS |110
6 |12 |12 |INPROGRESS |110
6 |8 |8 |ASSIGNED |110
6 |8 | |DONE |110
2 |12 |12 |INPROGRESS |210
2 |8 |8 |ASSIGNED |210
2 |8 | |DONE |210
3 |12 |12 |INPROGRESS |111
The output looks like this:
P_ID |avg_bgn_diff |assigned |in_progress |completed | comp_diff
-----|-------------|---------|------------|----------|----------
110 | 4 | 2 | 1 | 1 | 10
210 | null | 0 | 0 | 1 | 8
111 | null | 0 | 1 | 0 | null
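Explanation of the output: I have masked the original query with made-up names, so some table references may be broken. I apologize in advance for that.
- The MY_TASK table has a unique T_ID
- The MY_PEOPLE table is the employee table
- The MY_TASK_REF table contains the details of who has which task
- A task has statuses, because every status change creates a record in the task table. The statuses are, for example, ASSIGNED, INPROGRESS and DONE
- Any row where END_DT is absent represents the active record
- The first output field, avg_bgn_diff: we just want the average time of all active (END_DT is null) 'ASSIGNED' tasks
- The output fields assigned | in_progress | completed show how many active tasks each employee has in each category
- comp_diff: the average completion time per employee. An employee starts working when the record moves to INPROGRESS. We average over the tasks that reached status DONE today, using the start date of the INPROGRESS record and the start date of the DONE record.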
I have the following query:
WITH a AS (
    SELECT
        t1.t_id AS t_id,
        t1.start_dt AS start_dt,
        t1.end_dt AS end_dt,
        t1.code AS code,
        t2.p_id AS p_id
    FROM
        my_task t2
        INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
        INNER JOIN my_people p1 ON t2.p_id = p1.p_id
    WHERE
        -- ignore DONE tasks
        t1.t_id NOT IN (
            SELECT t.t_id
            FROM my_task t
            WHERE t.code = 'DONE' AND trunc(t.execution_dt) < trunc(current_timestamp)
        )
        and p1.department_id = '1234'
    ORDER BY p_id DESC
) SELECT
    d.p_id,
    d.avg_bgn_diff
    ,e.assigned
    ,e.in_progress
    ,e.completed
    ,g.comp_diff
FROM
    -- find average time for persons for diff ASSIGNMENT
    (
        SELECT c.p_id,AVG(c.bgn_diff) AS avg_bgn_diff
        FROM(
            SELECT b.p_id,timestampdiff(4,current_timestamp - a.start_dt) AS bgn_diff
            FROM ( SELECT p_id,t_id,start_dt FROM a WHERE end_dt IS NULL ) b
            LEFT OUTER JOIN ( SELECT p_id, t_id,start_dt FROM a WHERE
                code = 'ASSIGNED' AND end_dt IS NULL ) x ON x.p_id = b.p_id
        ) c GROUP BY C.p_id
    ) d
    -- find count of each codes person has
    INNER JOIN (
        SELECT
            p_id,
            SUM( CASE WHEN code = 'ASSIGNED' THEN 1 ELSE 0 END ) AS assigned,
            SUM( CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END ) AS in_progress,
            SUM( CASE WHEN code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
                THEN 1 ELSE 0 END ) AS completed
        FROM
            a where end_dt IS NULL
        GROUP BY p_id
    ) e on D.p_id=E.p_id
    -- find total avg diff of the time the entire task took to complete.
    LEFT OUTER JOIN (
        SELECT F.p_id,AVG(f.bgn_diff) AS comp_diff
        FROM
        (
            SELECT a.p_id, timestampdiff(4,b.start_dt - a.start_dt) AS bgn_diff
            FROM (
                SELECT p_id, t_id, start_dt FROM a WHERE code = 'INPROGRESS'
            ) a
            INNER JOIN (
                SELECT p_id, t_id, start_dt FROM a
                WHERE code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
            ) b ON a.t_id = b.t_id
        ) f GROUP BY F.p_id
    ) g ON D.p_id=G.p_id
WITH ur;
Can we write this in a different way to improve performance?
Note: indexes exist on all the necessary columns.
Thanks in advance.
[Comments]:
-
For starters, try replacing the NOT IN with a left join
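A minimal sketch of what that rewrite could look like for the body of the CTE a, not a tested replacement. It assumes t_id is never NULL (NOT IN and an anti-join behave differently when the subquery can return NULL) and that execution_dt is a TIMESTAMP. The alias done_before is made up for illustration.

```sql
-- Sketch only: same CTE body as in the question, with NOT IN replaced by a
-- LEFT JOIN anti-join. Assumes t_id is never NULL; done_before is a
-- hypothetical alias.
SELECT
    t1.t_id,
    t1.start_dt,
    t1.end_dt,
    t1.code,
    t2.p_id
FROM my_task t2
INNER JOIN my_task_ref t1 ON t1.t_id = t2.t_id
INNER JOIN my_people p1 ON t2.p_id = p1.p_id
LEFT OUTER JOIN (
    -- DISTINCT keeps the join from multiplying rows if a task has several
    -- matching DONE records
    SELECT DISTINCT t.t_id
    FROM my_task t
    WHERE t.code = 'DONE'
      -- same condition as trunc(t.execution_dt) < trunc(current_timestamp),
      -- assuming execution_dt is a TIMESTAMP, but the column is left bare
      AND t.execution_dt < trunc(current_timestamp)
) done_before ON done_before.t_id = t1.t_id
WHERE done_before.t_id IS NULL      -- keep only tasks with no earlier DONE record
  AND p1.department_id = '1234'
```

Whether this is actually faster depends on the access plan Db2 chooses, so it is worth comparing the plans of both forms. NOT EXISTS is another way to express the same anti-join.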
-
@DanielMarcus I had that idea too. Are there any other changes?
-
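Why are there so many nested queries? I think a lot of them could also be rewritten with left joins. For example, at the end you select from table 'a' three times; you should be able to do a single select and use conditional logic where needed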
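A rough sketch of the conditional-logic idea, under assumptions: it covers only avg_bgn_diff and the three counts (derived tables d and e in the question), reads the CTE a once, and assumes start_dt is a TIMESTAMP. The duration is wrapped in CHAR because TIMESTAMPDIFF is documented as taking a character representation of a timestamp duration. comp_diff pairs two rows of the same task, so it is left out here and could still be left-joined on p_id as in the question.

```sql
-- Sketch only: goes after the "WITH a AS ( ... )" definition from the
-- question and replaces derived tables d and e with one pass over a.
SELECT
    p_id,
    -- CASE without ELSE yields NULL and AVG ignores NULLs, so only active
    -- ASSIGNED rows contribute to the average (in minutes, code 4)
    AVG( CASE WHEN code = 'ASSIGNED'
              THEN timestampdiff(4, CHAR(current_timestamp - start_dt)) END ) AS avg_bgn_diff,
    SUM( CASE WHEN code = 'ASSIGNED' THEN 1 ELSE 0 END ) AS assigned,
    SUM( CASE WHEN code = 'INPROGRESS' THEN 1 ELSE 0 END ) AS in_progress,
    SUM( CASE WHEN code = 'DONE' AND trunc(start_dt) = trunc(current_timestamp)
              THEN 1 ELSE 0 END ) AS completed
FROM a
WHERE end_dt IS NULL          -- active records only, as in d and e
GROUP BY p_id
```

Like the original join of d and e, this returns one row per p_id that has at least one active record, with a NULL avg_bgn_diff when that person has no active ASSIGNED task.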
-
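@DanielMarcus I thought about it for a while, but I couldn't come up with a complete left-join query. Could you demonstrate by rewriting the "find average time for persons for diff ASSIGNMENT" part as an example?
-
Please explain how your dataset matches your query results - I see what look like a lot of inconsistencies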
Tags: sql performance db2