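【Question title】: MySQL query slows way down when inserting into file
【Posted】: 2014-10-28 08:36:30
【Question】:

This is my first question, since every question I've ever had has already been answered here. Please forgive the poor formatting.

The query by itself runs in 1 ms, which is great. It produces about 600,000 results out of roughly 3 million entries, while the database is receiving about 10 inserts per second. I know that isn't much for a database, so I don't think load is the problem. I have other large queries that write to a file just fine. Specifically, once "SELECT * INTO OUTFILE" is added, this one takes about 11 hours to run. That is far too long for a query, and I can't figure out why.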

Table: container_table

-Primary Key: containerID(bigint), mapID(int), cavityID(int)

-Index: timestamp(datetime)

Table: cont_meas_table

-Primary Key: containerID(bigint), box(int), probe(int), inspectionID(int), measurementID(int)

Table: cavity_map

-Primary Key: mapID(int), gob(char), section(int), cavity(int)

Query:

(SELECT  'containerID','timestamp','mapID','lineID','fp','fpSequence','pocket','cavityID', 'location','inspResult',
     'otgMinThickMeasValuePrb2_1','otgMaxThickMeasValuePrb2_1','RatioPrb2_1','otgOORMeasValuePrb2_1',
     'otgMinThickMeasValuePrb2_2','otgMaxThickMeasValuePrb2_2','RatioPrb2_2','otgOORMeasValuePrb2_2',
     'otgMinThickMeasValuePrb2_3','otgMaxThickMeasValuePrb2_3','RatioPrb2_3')
UNION
(SELECT * INTO OUTFILE 'testcsv.csv'
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   LINES TERMINATED BY '\n'
 FROM
(SELECT          containerID, timestamp, groupmeas.mapID, lineID, fp, fpSequence, pocket,     cavityID, CONCAT(MIN(section), MIN(gob)) AS location,
             inspResult, otgMinThickMeasValuePrb2_1, otgMaxThickMeasValuePrb2_1, 
             (COALESCE(otgMaxThickMeasValuePrb2_1/NULLIF(CAST(otgMinThickMeasValuePrb2_1 AS DECIMAL(10,5)), 0), 0)) AS RatioPrb2_1,
             otgOORMeasValuePrb2_1, otgMinThickMeasValuePrb2_2, otgMaxThickMeasValuePrb2_2,
             (COALESCE(otgMaxThickMeasValuePrb2_2/NULLIF(CAST(otgMinThickMeasValuePrb2_2 AS DECIMAL(10,5)), 0), 0)) AS RatioPrb2_2,
             otgOORMeasValuePrb2_2, otgMinThickMeasValuePrb2_3, otgMaxThickMeasValuePrb2_3,
             (COALESCE(otgMaxThickMeasValuePrb2_3/NULLIF(CAST(otgMinThickMeasValuePrb2_3 AS DECIMAL(10,5)), 0), 0)) AS RatioPrb2_3
FROM 
(SELECT   dbad.container_table.containerID, dbad.container_table.timestamp, dbad.container_table.mapID, dbad.container_table.lineID, dbad.container_table.fp, 
      dbad.container_table.fpSequence, dbad.container_table.pocket, dbad.container_table.cavityID, dbad.container_table.inspResult, 
      CASE WHEN aggMeas.otgMinThickMeasValuePrb2_1 IS NULL
         THEN - 1 ELSE aggMeas.otgMinThickMeasValuePrb2_1 END AS otgMinThickMeasValuePrb2_1, 
      CASE WHEN aggMeas.otgMaxThickMeasValuePrb2_1 IS NULL 
         THEN - 1 ELSE aggMeas.otgMaxThickMeasValuePrb2_1 END AS otgMaxThickMeasValuePrb2_1, 
      CASE WHEN aggMeas.otgOORMeasValuePrb2_1 IS NULL 
         THEN - 1 ELSE aggMeas.otgOORMeasValuePrb2_1 END AS otgOORMeasValuePrb2_1, 
      CASE WHEN aggMeas.otgMinThickMeasValuePrb2_2 IS NULL 
         THEN - 1 ELSE aggMeas.otgMinThickMeasValuePrb2_2 END AS otgMinThickMeasValuePrb2_2, 
      CASE WHEN aggMeas.otgMaxThickMeasValuePrb2_2 IS NULL 
         THEN - 1 ELSE aggMeas.otgMaxThickMeasValuePrb2_2 END AS otgMaxThickMeasValuePrb2_2, 
      CASE WHEN aggMeas.otgOORMeasValuePrb2_2 IS NULL 
         THEN - 1 ELSE aggMeas.otgOORMeasValuePrb2_2 END AS otgOORMeasValuePrb2_2, 
      CASE WHEN aggMeas.otgMinThickMeasValuePrb2_3 IS NULL 
         THEN - 1 ELSE aggMeas.otgMinThickMeasValuePrb2_3 END AS otgMinThickMeasValuePrb2_3, 
      CASE WHEN aggMeas.otgMaxThickMeasValuePrb2_3 IS NULL 
         THEN - 1 ELSE aggMeas.otgMaxThickMeasValuePrb2_3 END AS otgMaxThickMeasValuePrb2_3, 
      CASE WHEN aggMeas.otgOORMeasValuePrb2_3 IS NULL 
         THEN - 1 ELSE aggMeas.otgOORMeasValuePrb2_3 END AS otgOORMeasValuePrb2_3
 FROM   dbad.container_table 
      LEFT OUTER JOIN
      (SELECT     containerID, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 0) AND (meas.probe = 0) THEN meas.value END), - 1) AS otgMinThickMeasValuePrb2_1, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 1) AND (meas.probe = 0) THEN meas.value END), - 1) AS otgMaxThickMeasValuePrb2_1, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 2) AND (meas.probe = 0) THEN meas.value END), - 1) AS otgOORMeasValuePrb2_1, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 0) AND (meas.probe = 1) THEN meas.value END), - 1) AS otgMinThickMeasValuePrb2_2, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 1) AND (meas.probe = 1) THEN meas.value END), - 1) AS otgMaxThickMeasValuePrb2_2, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 2) AND (meas.probe = 1) THEN meas.value END), - 1) AS otgOORMeasValuePrb2_2, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 0) AND (meas.probe = 2) THEN meas.value END), - 1) AS otgMinThickMeasValuePrb2_3, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 1) AND (meas.probe = 2) THEN meas.value END), - 1) AS otgMaxThickMeasValuePrb2_3, 
       COALESCE(MIN(CASE WHEN (meas.inspectionID = 1) AND (meas.measurementID = 2) AND (meas.probe = 2) THEN meas.value END), - 1) AS otgOORMeasValuePrb2_3
       FROM  (SELECT  containerID, inspectionID, measurementID, probe, value, threshold, calibration FROM  dbad.cont_meas_table AS a) AS meas
       GROUP BY containerID) AS aggMeas 
    ON dbad.container_table.containerID = aggMeas.containerID) AS groupmeas
INNER JOIN
dbad.cavity_map
  ON groupmeas.mapID=dbad.cavity_map.mapID  AND
  groupmeas.cavityID=dbad.cavity_map.cavity
  WHERE timestamp LIKE '2014-08-29%'
    AND otgMinThickMeasValuePrb2_1 BETWEEN 1 AND 499
    AND otgMinThickMeasValuePrb2_2 BETWEEN 1 AND 499
    AND otgMinThickMeasValuePrb2_3 BETWEEN 1 AND 499
    AND otgMaxThickMeasValuePrb2_1 BETWEEN 1 AND 499
    AND otgMaxThickMeasValuePrb2_2 BETWEEN 1 AND 499
    AND otgMaxThickMeasValuePrb2_3 BETWEEN 1 AND 499
GROUP BY containerID) AS outside)

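I've already gotten rid of every COUNT(DISTINCT) and removed the leading '%' from WHERE timestamp LIKE '2014-08-29%' so that the index on timestamp can be used. Unfortunately, that didn't help.

Edit: After adding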

WHERE timestamp >= '2014-08-29' AND timestamp < '2014-08-29' + INTERVAL 1 DAY

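the query actually takes longer. I know that shouldn't happen, so I must be doing something wrong in this query.

【Comments on the question】:

  • You're selecting from derived tables/subqueries about 4 levels deep; of course performance will be terrible.
  • Good grief, that's a complicated query. To troubleshoot its performance you'll probably need to break it into parts.
  • I assume the file being written is on the same drive/location as the database engine? (It isn't on a different physical machine from the one hosting the database, right?) Don't forget the happy little trees.
  • Set the file I/O question aside and fix the query performance first.
  • @sebas Didn't he say the query itself runs in 1 ms? I guess I assumed the 600,000 rows were produced in 1 ms...

Tags: mysql sql performance optimization query-optimization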


【Solution 1】:

One thing jumps out and hits me in the face:

WHERE timestamp LIKE '2014-08-29%'  /* slow! */

This defeats the use of an index on the timestamp column, because it implicitly casts timestamp to a string.

Try this instead:

WHERE timestamp >= '2014-08-29'
  AND timestamp <  '2014-08-29' + INTERVAL 1 DAY

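This lets the query use an index range scan on timestamp, which may help a lot. It works because it converts the constant date to the same data type as timestamp, rather than the other way around.

The point of an index is to avoid a so-called full table scan, in which the MySQL server has to walk through every row of the table looking for matching data. Omitting the WHERE clause also makes the server look at every row.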
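
Whether the rewritten predicate really uses the index can be checked by comparing the two plans with EXPLAIN. The following is a minimal sketch, assuming the index on container_table.timestamp described in the question; the column list is trimmed for illustration and is not the asker's full query.

```sql
-- Plan for the LIKE predicate: the datetime column is compared as a string,
-- so a range scan on the `timestamp` index is not expected here.
EXPLAIN
SELECT containerID, mapID, cavityID
FROM dbad.container_table
WHERE `timestamp` LIKE '2014-08-29%';

-- Plan for the half-open range: the constant is converted to a datetime,
-- so the optimizer is free to consider a range scan on the index.
EXPLAIN
SELECT containerID, mapID, cavityID
FROM dbad.container_table
WHERE `timestamp` >= '2014-08-29'
  AND `timestamp` <  '2014-08-29' + INTERVAL 1 DAY;
```

In the EXPLAIN output, the type and key columns are the ones to compare: type = range with the timestamp index listed under key indicates a range scan, while type = ALL (or index) means every row or every index entry is read. Note that the optimizer may still prefer a full scan when the range matches a large share of the table.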

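【Comments】:

  • I upvoted, but I ran this query when the database was brand new (… the WHERE timestamp clause) and it ran for 6 hours. Still, I really appreciate this and will change my query accordingly.
【Solution 2】:

To make sure your database is configured properly for this kind of workload, run the open-source tool mysqltuner and review its recommendations.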

From your description, it sounds like you may want different values for tmp_table_size and max_heap_table_size in my.cnf.

You can find the tool here: https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
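
If running an external tool is not an option, the two settings mentioned above can also be inspected from a MySQL session. This is a rough sketch; the 64 MB figure is an illustrative assumption, not a recommendation derived from the question.

```sql
-- Current limits for in-memory internal temporary tables, in bytes.
-- The effective limit is the smaller of the two values.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Counters since server start: internal temporary tables created,
-- and internal temporary tables created on disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';

-- Raise both limits for the current session only (illustrative value).
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
```

If Created_tmp_disk_tables grows noticeably while the slow query runs, that suggests its derived tables or GROUP BY steps are spilling to disk, which is the situation these two settings influence.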

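【Comments】:

  • I've heard of it, but company policy forbids me from using it.
  • "Company policy forbids me from using it." Good. Tuning scripts are a terrible idea. Just focus on the query.
  • Some of the comments offer good advice. Because INTO OUTFILE cannot use the query cache, the query without INTO OUTFILE may be slower than you think. SELECT SQL_NO_CACHE ... (without the outfile) will confirm this. EXPLAIN SELECT ... will show you the query plan, which is invaluable for performance troubleshooting.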
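
The cache check suggested in the last comment could look like the sketch below. It uses a trimmed column list on container_table as a stand-in; in practice the full query from the question (minus INTO OUTFILE) would be substituted. SQL_NO_CACHE only matters on servers that still have the query cache, which was removed in MySQL 8.0.

```sql
-- Time the SELECT on its own, bypassing the query cache, with no INTO OUTFILE.
-- If this takes far longer than 1 ms, the earlier timing was likely a cache hit
-- and the file write is not the real bottleneck.
SELECT SQL_NO_CACHE containerID, `timestamp`, mapID, cavityID, inspResult
FROM dbad.container_table
WHERE `timestamp` >= '2014-08-29'
  AND `timestamp` <  '2014-08-29' + INTERVAL 1 DAY;
```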
【Solution 3】:

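You can, and should, optimize your query. Replace

LIKE '2014-08-29%'

with

>= '2014-08-29' AND < '2014-08-30'

In some cases it is faster to stage the data in a temporary table and use a JOIN than to nest multiple subqueries, so you could also try creating a temporary table.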
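
A sketch of the temporary-table approach, under stated assumptions: the table name tmp_day_containers is hypothetical, only one of the nine pivoted measurement columns is shown, and an inner join is used because the question's BETWEEN 1 AND 499 filters discard containers without measurements anyway.

```sql
-- Step 1: stage only the containers for the day of interest.
CREATE TEMPORARY TABLE tmp_day_containers AS
SELECT containerID, `timestamp`, mapID, lineID, fp, fpSequence,
       pocket, cavityID, inspResult
FROM dbad.container_table
WHERE `timestamp` >= '2014-08-29'
  AND `timestamp` <  '2014-08-30';

-- Step 2: index the join column so the measurement lookup is cheap.
ALTER TABLE tmp_day_containers ADD INDEX (containerID);

-- Step 3: aggregate measurements only for the staged containers,
-- instead of grouping the whole cont_meas_table first.
SELECT c.containerID,
       MIN(CASE WHEN m.measurementID = 0 AND m.probe = 0
                THEN m.value END) AS otgMinThickMeasValuePrb2_1
FROM tmp_day_containers AS c
INNER JOIN dbad.cont_meas_table AS m
        ON m.containerID = c.containerID
WHERE m.inspectionID = 1
GROUP BY c.containerID;
```

The design point is that the original query aggregates every row of cont_meas_table before the date filter is applied; staging the day's containers first lets the join reach the measurement table through the leading containerID column of its primary key.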
