【问题标题】:Improve DATETIME grouping queries performance提高 DATETIME 分组查询性能
【发布时间】:2013-10-01 10:26:22
【问题描述】:

这是我第一次尝试在 StackOverflow 上寻求帮助 :)

我有下表:

CREATE TABLE `tinfinite_visits` (
  `visit_id` int(255) NOT NULL AUTO_INCREMENT,
  `identity_id` int(255) NOT NULL,
  `ip` varchar(39) NOT NULL,
  `loggedin` enum('0','1') NOT NULL DEFAULT '0',
  `url` longtext NOT NULL,
  `realurl` longtext NOT NULL,
  `referrer` longtext NOT NULL,
  `method` enum('GET','POST','HEAD','OPTIONS','PUT','DELETE','TRACE','CONNECT','PATCH') NOT NULL,
  `client` longtext NOT NULL,
  `referring` longtext NOT NULL,
  `timestart` datetime NOT NULL,
  `timeend` datetime NOT NULL,
  PRIMARY KEY (`visit_id`),
  KEY `timestart` (`timestart`),
  KEY `identity_id` (`identity_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

在某些时候,我需要从该表中获取一些数据来生成折线图。我目前使用 5 个不同的查询,针对 5 个不同的时间间隔(总计、年、月、周、日)进行此操作。

该表目前有大约 200.000 行,但会变得更大(甚至数千万条记录)。

虽然我的查询非常适合此目的,但我正在努力寻找一种更好的性能方式。

因此,我非常感谢任何关于如何提高查询性能的提示/建议,如果可能的话,最好甚至将所有 5 个查询合并为 1 个。

我目前使用的查询、它们的解释以及它们的执行时间(大约 200.000 行)如下:

日查询:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(CONVERT_TZ(timestart, '-5:00', '+3:00'), '%l%p') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
  GROUP BY unit
  ORDER BY `timestart` ASC

解释:

+----+-------------+----------------------+-------+---------------+-----------+---------+-----+-------+------------------------------+
| id | select_type | table                | type  | possible_keys | key       | key_len | ref | rows  | Extra                        |
+----+-------------+----------------------+-------+---------------+-----------+---------+-----+-------+------------------------------+
| 1  | SIMPLE      | tinfinite_visits     | range | timestart     | timestart | 8       |     | 11113 | Using where; Using temporary |
+----+-------------+----------------------+-------+---------------+-----------+---------+-----+-------+------------------------------+

时间:0.011280059814453

周查询:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(CONVERT_TZ(timestart, '-5:00', '+3:00'), '%a') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 7 DAY)
  GROUP BY unit
  ORDER BY `timestart` ASC

解释:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

时间:0.13543295860291

月份查询:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(CONVERT_TZ(timestart, '-5:00', '+3:00'), '%d') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 28 DAY)
  GROUP BY unit
  ORDER BY `timestart` ASC

解释:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

时间:0.21460795402527

年份查询:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(`timestart`, '%b') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
  GROUP BY unit
  ORDER BY `timestart` ASC

解释:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

时间:0.50977802276611

整体查询:

SELECT COUNT(DISTINCT(`identity_id`)) AS visits, DATE_FORMAT(`timestart`, '%b') AS unit
  FROM tinfinite_visits
  WHERE `timestart` >= DATE_SUB(NOW(), INTERVAL 100 YEAR)
  GROUP BY unit
  ORDER BY `timestart` ASC

解释:

+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| id | select_type | table            | type | possible_keys | key | key_len | ref | rows   | Extra                                        |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+
| 1  | SIMPLE      | tinfinite_visits | ALL  | timestart     |     |         |     | 205897 | Using where; Using temporary; Using filesort |
+----+-------------+------------------+------+---------------+-----+---------+-----+--------+----------------------------------------------+

时间:0.52196192741394

非常非常感谢!

【问题讨论】:

  • 过早的优化总会带来麻烦。你认为会有多少记录?现在有什么显示EXPLAIN
  • @AlmaDoMundo 请查看更新后的问题。谢谢。
  • 解释会随着时间而改变。现在它看起来是最佳的。它使用 where 进行过滤,然后存储分组值,然后对这些值进行排序。

标签: mysql sql myisam


【解决方案1】:

Salut Emilian,解释有时会随着行数而变化。

但是,由于您在分组和位置中使用计算列,您可以选择实现日期维度表。

您几乎可以找到任何数据库的日期维度表代码:

google.com/search?q=date+dimension+table

为什么?见下文 Time and date dimension in data warehouse

【讨论】:

    猜你喜欢
    • 2013-08-03
    • 2015-07-11
    • 1970-01-01
    • 2021-08-03
    • 2013-02-17
    • 2021-12-15
    • 2014-04-09
    • 2013-01-16
    相关资源
    最近更新 更多