Optimizing Sqlite query: grouping in subqueries

【Question title】: Optimizing Sqlite query: grouping in subqueries
【Posted】: 2012-04-12 17:07:26
【Question description】:

I have a very simple Sqlite schema that records daily counts per user action, plus various latency percentiles for each user action per day:

create table user_actions (
  id integer primary key,
  name text not null
);

create table action_date_count (
  action_id integer not null
    references user_actions(id) on delete restrict on update restrict,
  date integer not null,
  count integer not null,
  unique (action_id, date) on conflict fail
);

create table latency_percentiles (
  action_id integer not null
    references user_actions(id) on delete restrict on update restrict,
  date integer not null,
  percentile integer not null,
  value real not null,
  unique (action_id, date, percentile) on conflict fail
);

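All dates here are stored as the Unix timestamp of midnight for each day (I can change that if it helps).

Now here is the query I am struggling with: show the actions for the last week, sorted by average count in descending order, including the average latency percentiles at the 50%, 90%, and 95% levels. I came up with a giant query for which explain query plan reports 17 steps, and it is slow. Can anyone improve it?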

select ua.id, ua.name, ac.avg_count, al50.avg_lat_50, al90.avg_lat_90, al95.avg_lat_95
  from
    user_actions as ua,
    (
      select adc.action_id as action_id, avg(adc.count) as avg_count
      from
        action_date_count as adc,
        (select max(date) as max_date from action_date_count) as md
      where
        julianday(md.max_date, 'unixepoch', 'localtime') - julianday(adc.date, 'unixepoch', 'localtime') between 1 and 7
      group by action_id
    ) as ac,
    (
      select lp.action_id as action_id, avg(lp.value) as avg_lat_50
      from
        latency_percentiles as lp,
        (select max(date) as max_date from action_date_count) as md
      where
        lp.percentile = 50 and
        julianday(md.max_date, 'unixepoch', 'localtime') - julianday(lp.date, 'unixepoch', 'localtime') between 1 and 7
      group by action_id
    ) as al50,
    (
      select lp.action_id as action_id, avg(lp.value) as avg_lat_90
      from
        latency_percentiles as lp,
        (select max(date) as max_date from action_date_count) as md
      where
        lp.percentile = 90 and
        julianday(md.max_date, 'unixepoch', 'localtime') - julianday(lp.date, 'unixepoch', 'localtime') between 1 and 7
      group by action_id
    ) as al90,
    (
      select lp.action_id as action_id, avg(lp.value) as avg_lat_95
      from
        latency_percentiles as lp,
        (select max(date) as max_date from action_date_count) as md
      where
        lp.percentile = 95 and
        julianday(md.max_date, 'unixepoch', 'localtime') - julianday(lp.date, 'unixepoch', 'localtime') between 1 and 7
      group by action_id
    ) as al95
  where ua.id = ac.action_id and ua.id = al50.action_id and ua.id = al90.action_id and ua.id = al95.action_id
  order by ac.avg_count desc;

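【Comments】:

Tags: sql performance sqlite

【Solution 1】:

I assume you have indexed the date column on both the action_date_count and latency_percentiles tables.

The problem, then, is that sqlite cannot use the date index for the query you provided, because the date column only appears inside a julianday() call. You can fix this by reworking the date comparison.

Instead of: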

julianday(md.max_date, 'unixepoch', 'localtime') - julianday(lp.date, 'unixepoch', 'localtime') between 1 and 7

Do this:

lp.date between md.max_date - 7 * 24 * 3600 and md.max_date
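One way to check this claim is to compare the plans SQLite reports for the two predicates. The following is a minimal sketch using Python's standard sqlite3 module; the index names adc_date and lp_date, the sample rows, and the plan() helper are assumptions made for illustration, and the foreign key clauses are left out for brevity. The exact plan text varies between SQLite versions, so the sketch prints it rather than relying on a particular wording.

```python
import sqlite3

DAY = 24 * 3600

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table action_date_count (
  action_id integer not null,
  date integer not null,
  count integer not null,
  unique (action_id, date) on conflict fail
);
create table latency_percentiles (
  action_id integer not null,
  date integer not null,
  percentile integer not null,
  value real not null,
  unique (action_id, date, percentile) on conflict fail
);
-- Assumed index names; the answer only says the date columns are indexed.
create index adc_date on action_date_count (date);
create index lp_date on latency_percentiles (date);
""")

# Ten consecutive days for one action; day n has count n and a p50 latency of n.
for n in range(10):
    conn.execute("insert into action_date_count values (1, ?, ?)", (n * DAY, n))
    conn.execute("insert into latency_percentiles values (1, ?, 50, ?)",
                 (n * DAY, float(n)))

QUERY = """
select lp.action_id, avg(lp.value) as avg_lat_50
from latency_percentiles as lp,
     (select max(date) as max_date from action_date_count) as md
where lp.percentile = 50 and {predicate}
group by lp.action_id
"""

# Original predicate: lp.date is hidden inside julianday().
SLOW = ("julianday(md.max_date, 'unixepoch', 'localtime') - "
        "julianday(lp.date, 'unixepoch', 'localtime') between 1 and 7")
# Rewritten predicate from the answer: a plain range on the bare column.
FAST = "lp.date between md.max_date - 7 * 24 * 3600 and md.max_date"


def plan(predicate):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute(
        "explain query plan " + QUERY.format(predicate=predicate)).fetchall()
    return [row[-1] for row in rows]


slow_plan = plan(SLOW)
fast_plan = plan(FAST)
print("julianday predicate:", slow_plan)
print("range predicate:    ", fast_plan)

# The range predicate keeps days 2..9 (eight rows, latest day included).
result = conn.execute(QUERY.format(predicate=FAST)).fetchall()
print(result)
```

Note that the two predicates are not strictly equivalent: the original keeps rows that are 1 to 7 days older than the latest date, while the rewritten range also includes the latest date itself. If that matters, and assuming every stored day is exactly 24 hours apart, an upper bound of md.max_date - 24 * 3600 should reproduce the original window.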

You may also get good results by creating a covering index on latency_percentiles (date, percentile, value). YMMV.
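A covering index could be tried along these lines. This is only a sketch: the index name lp_cover and the sample data are assumptions, and action_id is appended to the column list from the answer because the subqueries in the question also select and group by action_id, so an index on (date, percentile, value) alone would still need a lookup in the table for that column.

```python
import sqlite3

DAY = 24 * 3600

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table action_date_count (
  action_id integer not null,
  date integer not null,
  count integer not null,
  unique (action_id, date) on conflict fail
);
create table latency_percentiles (
  action_id integer not null,
  date integer not null,
  percentile integer not null,
  value real not null,
  unique (action_id, date, percentile) on conflict fail
);
create index adc_date on action_date_count (date);
-- Column order (date, percentile, value) follows the answer; action_id is
-- appended as an assumption so the index also holds the grouping column.
create index lp_cover
  on latency_percentiles (date, percentile, value, action_id);
""")

# Two actions, ten days, three percentiles; values are arbitrary but distinct.
for action in (1, 2):
    for n in range(10):
        conn.execute("insert into action_date_count values (?, ?, ?)",
                     (action, n * DAY, n))
        for p in (50, 90, 95):
            conn.execute(
                "insert into latency_percentiles values (?, ?, ?, ?)",
                (action, n * DAY, p, float(p * 1000 + action * 100 + n)))

QUERY = """
select lp.action_id, avg(lp.value) as avg_lat_90
from latency_percentiles as lp,
     (select max(date) as max_date from action_date_count) as md
where lp.percentile = 90
  and lp.date between md.max_date - 7 * 24 * 3600 and md.max_date
group by lp.action_id
order by lp.action_id
"""

# Last column of each EXPLAIN QUERY PLAN row is the readable description.
details = [row[-1] for row in conn.execute("explain query plan " + QUERY)]
print(details)

rows = conn.execute(QUERY).fetchall()
print(rows)
```

If the planner picks the index, the plan line for lp should mention a covering index, meaning the subquery is answered from the index without touching the table rows. Whether this is faster on real data depends on table size and data distribution, so it is worth measuring.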

【Comments】: