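【Posted】: 2014-03-25 21:53:25
【Question】:
I have a query that takes a very long time, and I wanted to post it here in the hope that I have missed something. Here is the query (it basically says "give me all funds that have at least one position"):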
SELECT org_name.legacy_id,
       org_name.`name`,
       org_desc.description,
       org_name.instrument_style_code,
       org_name.investment_orientation,
       org_name.is_active,
       org_name.organization_id,
       mgr_org.eng_name AS manager_name,
       mgrs.manager_org_id AS manager_organization_id,
       mgrs.manager_legacy_id AS manager_legacy_id
FROM ownership_organization_names org_name
INNER JOIN (SELECT fund.legacy_id
            FROM ownership_organization_names fund
            INNER JOIN ownership_ownerships own
                ON fund.legacy_id = own.legacy_id
            LEFT JOIN ownership_unconsolidated_holding_positions pos
                ON own.ownership_id = pos.ownership_id
            GROUP BY fund.legacy_id
            HAVING COUNT(pos.holding_position_id) > 0) funds_with_positions
    ON funds_with_positions.legacy_id = org_name.legacy_id
LEFT JOIN ownership_organization_descriptions org_desc
    ON org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs
    ON org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org
    ON mgr_org.id = mgrs.manager_org_id
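The inner query takes 42 seconds of duration and 320 seconds of fetch time (that doesn't sound right!) and returns 135,683 rows.
The whole query takes 372 seconds of duration and 2 seconds of fetch time (that definitely doesn't sound right).
Here is the EXPLAIN output for the query (350 seconds duration); apologies for the formatting (or lack thereof):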
id  select_type  table       type    possible_keys     key               key_len  ref                                   rows    Extra
1   PRIMARY      <derived2>  ALL                                                                                        135683
1   PRIMARY      org_name    ref     PRIMARY           PRIMARY           8        funds_with_positions.legacy_id        22303
1   PRIMARY      org_desc    eq_ref  PRIMARY           PRIMARY           8        funds_with_positions.legacy_id        1
1   PRIMARY      mgrs        ref     PRIMARY           PRIMARY           8        people_directory.org_name.legacy_id   665
1   PRIMARY      mgr_org     eq_ref  PRIMARY           PRIMARY           8        people_directory.mgrs.manager_org_id  1
2   DERIVED      fund        index   PRIMARY           PRIMARY           16                                             46728   Using index
2   DERIVED      own         ref     legacy_id_idx     legacy_id_idx     9        people_directory.fund.legacy_id       15      Using where
2   DERIVED      pos         ref     ownership_id_idx  ownership_id_idx  9        people_directory.own.ownership_id     3
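I have indexed every join column, and I got a huge performance gain by moving the subquery into an INNER JOIN instead of the WHERE clause.
I also tried creating an indexed temporary table and joining to it, but populating it takes about 360 seconds. The outer joins on top of it then become trivial (about 1 second), which tells me the inner query is very poorly optimized, but I'm not sure what else I can do to optimize it.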
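A minimal sketch of that temporary-table approach, for readers who want to reproduce the comparison. The table name tmp_funds_with_positions and the alias fwp are hypothetical, and BIGINT is only a guess based on the key length of 8 shown for legacy_id in the EXPLAIN output; the actual DDL is not given in the question.

```sql
-- Hypothetical table name; BIGINT is an assumption about the legacy_id type.
CREATE TEMPORARY TABLE tmp_funds_with_positions (
    legacy_id BIGINT NOT NULL,
    PRIMARY KEY (legacy_id)
);

-- Populate it with the same inner query used in the derived table.
-- GROUP BY guarantees one row per legacy_id, so the primary key holds.
INSERT INTO tmp_funds_with_positions (legacy_id)
SELECT fund.legacy_id
FROM ownership_organization_names fund
INNER JOIN ownership_ownerships own
    ON fund.legacy_id = own.legacy_id
LEFT JOIN ownership_unconsolidated_holding_positions pos
    ON own.ownership_id = pos.ownership_id
GROUP BY fund.legacy_id
HAVING COUNT(pos.holding_position_id) > 0;

-- The outer query then joins to the indexed temporary table
-- (select list shortened here for brevity).
SELECT org_name.legacy_id,
       org_name.`name`
FROM ownership_organization_names org_name
INNER JOIN tmp_funds_with_positions fwp
    ON fwp.legacy_id = org_name.legacy_id;
```

Splitting the work this way makes it easy to time the population step and the outer joins separately, which is how the 360-second versus 1-second split above shows up.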
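I also come from a Microsoft SQL background, but I am assuming all the other principles are the same. I have seen various threads discussing changing the database storage engine and tuning buffer sizes, but I want to see whether I have exhausted every possibility for optimizing the query itself before resorting to those measures.
UPDATE: In the end, the biggest performance gain came from the fact that my inner query had an unnecessary join in it; removing it brought the time down from about 360 seconds to about 70 seconds. However, trying some other, logically equivalent optimization techniques produced a few interesting quirks.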
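The update does not say which join was unnecessary. One plausible reading, given that the later attempt selects own.legacy_id directly, is that the join to ownership_organization_names inside the derived table was redundant, since the outer query already joins on legacy_id. Under that assumption, the trimmed inner query might look like this sketch:

```sql
-- Assumption: the redundant join was the one to ownership_organization_names (alias fund).
-- ownership_ownerships already carries legacy_id, so it can be grouped directly.
SELECT own.legacy_id
FROM ownership_ownerships own
LEFT JOIN ownership_unconsolidated_holding_positions pos
    ON own.ownership_id = pos.ownership_id
GROUP BY own.legacy_id
HAVING COUNT(pos.holding_position_id) > 0;
```

Any legacy_id values that do not exist in ownership_organization_names are still filtered out by the INNER JOIN in the outer query, so the final result should be unchanged.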
As suggested, I tried:
SELECT org_name.legacy_id,
       org_name.`name`,
       org_desc.description,
       org_name.instrument_style_code,
       org_name.investment_orientation,
       org_name.is_active,
       org_name.organization_id,
       mgr_org.eng_name AS manager_name,
       mgrs.manager_org_id AS manager_organization_id,
       mgrs.manager_legacy_id AS manager_legacy_id
FROM ownership_organization_names org_name
INNER JOIN (SELECT own.legacy_id
            FROM ownership_ownerships own
            WHERE EXISTS (SELECT 1
                          FROM ownership_unconsolidated_holding_positions pos
                          WHERE own.ownership_id = pos.ownership_id)
           ) funds_with_positions
    ON funds_with_positions.legacy_id = org_name.legacy_id
LEFT JOIN ownership_organization_descriptions org_desc
    ON org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs
    ON org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org
    ON mgr_org.id = mgrs.manager_org_id
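MySQL Workbench reported a query duration of 242.422 seconds, the fetch portion timed out, and the client returned the error "Error Code: 2008 MySQL client ran out of memory".
Moving the WHERE EXISTS-style subquery into the WHERE clause did eventually return, but it took 0.234 seconds of duration and 157.781 seconds of fetch time. I suspect that is not accurate at all.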
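The exact text of that last variant is not shown, so the following is only a guess at its shape: the derived table is dropped and the existence check becomes a correlated subquery in the outer WHERE clause. Note that the derived-table version above returns one row per qualifying ownership, so a fund can be repeated in the join, whereas EXISTS only filters rows of org_name.

```sql
-- Sketch only: a reconstruction of "EXISTS in the WHERE clause", not the query actually run.
SELECT org_name.legacy_id,
       org_name.`name`,
       org_desc.description,
       org_name.instrument_style_code,
       org_name.investment_orientation,
       org_name.is_active,
       org_name.organization_id,
       mgr_org.eng_name AS manager_name,
       mgrs.manager_org_id AS manager_organization_id,
       mgrs.manager_legacy_id AS manager_legacy_id
FROM ownership_organization_names org_name
LEFT JOIN ownership_organization_descriptions org_desc
    ON org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs
    ON org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org
    ON mgr_org.id = mgrs.manager_org_id
-- Keep a fund only if at least one of its ownerships has a position.
WHERE EXISTS (SELECT 1
              FROM ownership_ownerships own
              INNER JOIN ownership_unconsolidated_holding_positions pos
                  ON pos.ownership_id = own.ownership_id
              WHERE own.legacy_id = org_name.legacy_id);
```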
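I am curious about the thinking behind this optimization approach of moving the derived table into the WHERE clause as a subquery. Wouldn't INNER JOINing on it earlier, as a derived table, reduce the row set earlier in the query rather than later in the WHERE clause?
Granted, I admit I am unfamiliar with the WHERE EXISTS operator, or at least I never think to use it often. What are its implications for performance and memory usage compared with the subquery/derived-table approach I originally had?
【Comments】:
Tags: mysql sql query-optimization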