【Title】: UPDATE works OK (but really, really slow) despite ORA-904 in subquery
【Posted】: 2016-11-09 22:09:19
【Question】:

I have an UPDATE statement with a subquery in its WHERE clause that finds duplicates. Run on its own, the subquery raises an error, but inside the UPDATE statement no error is raised and the DML works (although slowly).

Here is the table setup:

CREATE TABLE RAW_table
(
  ERROR_LEVEL      NUMBER(3),
  RAW_DATA_ROW_ID  INTEGER,
  ATTRIBUTE_1      VARCHAR2(4000 BYTE)
)
;

INSERT INTO RAW_table VALUES (0,    2,  '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0,    2,  '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0,    2,  '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0,    3,  '509NTVS9Q863');
INSERT INTO RAW_table VALUES (0,    3,  '509NTVS9Q863');
INSERT INTO RAW_table VALUES (0,    3,  '509NTVS9Q863');

COMMIT;

The query that fails is:

SELECT UPPER(ATTRIBUTE_1), rid
  FROM ( SELECT UPPER(ATTRIBUTE_1)
              , ROWID AS rid
              , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
           FROM RAW_table
       )
 WHERE RN > 1;

When run, it gives ORA-00904: "ATTRIBUTE_1": invalid identifier

However, the following DML, which uses the query above in its WHERE clause (starting at line 4 of the statement), works fine:

set timing on

UPDATE RAW_table
   SET ERROR_LEVEL   = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID) 
       IN (SELECT UPPER (ATTRIBUTE_1), rid
           FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table
                )
           WHERE RN > 1
          )
;

4 rows updated.
Elapsed: 00:00:00.36

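Why? Why? Why?

I expected the UPDATE to fail with ORA-00904: "ATTRIBUTE_1": invalid identifier as well. Why doesn't it fail?

The real problem, however, is not that the UPDATE works at all, but that it runs really slowly.

When I correct the subquery so that it no longer triggers ORA-00904: "ATTRIBUTE_1": invalid identifier: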

UPDATE RAW_table
   SET ERROR_LEVEL   = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID) 
        IN (SELECT checked_column, rid
           FROM (SELECT UPPER (ATTRIBUTE_1) AS checked_column, ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table
                )
           WHERE RN > 1
          )
;

the query speeds up almost 400 times on a test data set of about 11,000 rows:

SELECT COUNT(*) FROM RAW_table;

  COUNT(*)
----------
     11004
1 row selected.

Corrected query:

1005 rows updated.
Elapsed: 00:00:00.28

Query with ORA-904:

1005 rows updated.
Elapsed: 00:01:48.40

I did not have the patience to wait for the test on about 71,000 rows to finish:

SELECT COUNT(*) FROM RAW_table;
  COUNT(*)
----------
     71475
1 row selected.

Corrected query:
11004 rows updated.
Elapsed: 00:00:00.60

Query with ORA-904:

Cancelled after 30 minutes...

Explain plan for the query with ORA-904:

UPDATE STATEMENT  ALL_ROWS     Cost: **2 544 985 615**  Bytes: 8 464 752  Cardinality: 4 176  
     7 UPDATE RAW_TABLE 
          6 FILTER  
               1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  
               5 VIEW  Cost: 30 486  Bytes: 2 087 850  Cardinality: 83 514  
                    4 WINDOW SORT  Cost: 30 486  Bytes: 169 282 878  Cardinality: 83 514  
                         3 FILTER  
                              2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  

Explain plan for the corrected query:

UPDATE STATEMENT  ALL_ROWS     Cost: **36 637**  Bytes: 3 374 235  Cardinality: 835  
     7 UPDATE RAW_TABLE 
          6 HASH JOIN RIGHT SEMI  Cost: 36 637  Bytes: 3 374 235  Cardinality: 835  
               4 VIEW VIEW SYS.VW_NSO_1 Cost: 30 486  Bytes: 168 197 196  Cardinality: 83 514  
                    3 VIEW  Cost: 30 486  Bytes: 169 282 878  Cardinality: 83 514  
                         2 WINDOW SORT  Cost: 30 486  Bytes: 169 282 878  Cardinality: 83 514  
                              1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  
               5 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  

After analyzing the table, the plans keep the same shape, although the cost figures change. Explain plan for the query with ORA-904:

UPDATE STATEMENT  ALL_ROWS     Cost: **29 381 690**  Bytes: 38  Cardinality: 2
     7 UPDATE RAW_TABLE
          6 FILTER
               1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475
               5 VIEW  Cost: 427  Bytes: 1 786 875  Cardinality: 71 475
                    4 WINDOW SORT  Cost: 427  Bytes: 1 358 025  Cardinality: 71 475
                         3 FILTER
                              2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475

Explain plan for the corrected query:

UPDATE STATEMENT  ALL_ROWS     Cost: **3 123**  Bytes: 1 453 595  Cardinality: 715
     7 UPDATE RAW_TABLE
          6 HASH JOIN SEMI  Cost: 3 123  Bytes: 1 453 595  Cardinality: 715
               5 VIEW VIEW SYS.VW_NSO_1 Cost: 427  Bytes: 143 950 650  Cardinality: 71 475
                    4 VIEW  Cost: 427  Bytes: 144 879 825  Cardinality: 71 475
                         3 WINDOW SORT  Cost: 427  Bytes: 1 358 025  Cardinality: 71 475
                              2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475
               1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475

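The explain plan costs say it all, but why is there such a big difference?

I have just started the 71,000-row test again, after computing statistics on the table, but it has already been running for several minutes...

This is all on Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit.

【Question comments】:

    Tags: sql oracle oracle12c


    【Answer 1】:

    Your SELECT fails because the subquery has no column named ATTRIBUTE_1. You need to give the expression a name: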

    SELECT UPPER(ATTRIBUTE_1), rid
      FROM ( SELECT UPPER(ATTRIBUTE_1) as ATTRIBUTE_1, 
                    ROWID AS rid,
                    ROW_NUMBER() OVER (PARTITION BY UPPER(ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
             FROM RAW_table
           )
     WHERE RN > 1;
    

    The UPDATE does not produce an error because it takes the value from the outer query:

    UPDATE RAW_table
    -------^
    |   SET ERROR_LEVEL   = 4
    | WHERE (UPPER (ATTRIBUTE_1), ROWID) IN 
    |         (SELECT UPPER (ATTRIBUTE_1), rid
    ------------------------------^  This is interpreted as RAW_table.ATTRIBUTE_1 (the table being updated)
               FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid,
                            ROW_NUMBER() OVER (PARTITION BY UPPER(ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                     FROM RAW_table
                    )
               WHERE RN > 1
              )
    

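    This correlation is probably not what you intended, and it is one reason I recommend always qualifying column names (that is, including the table alias).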
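As a minimal sketch of that advice, assuming the hypothetical aliases dest, rt and v (none of them appear in the question), qualifying every column makes the faulty reference explicit. With the qualifier in place the name can only be looked up in the inline view, so the same mistake is expected to raise ORA-00904 inside the UPDATE as well, instead of silently turning into a correlated reference:

```sql
-- Hypothetical aliases: dest = table being updated, rt = base table, v = inline view.
-- v exposes only an unnamed UPPER(...) expression, rid and RN, so
-- v.ATTRIBUTE_1 cannot be resolved and the statement is expected to fail
-- with ORA-00904 rather than fall back to dest.ATTRIBUTE_1.
UPDATE RAW_table dest
   SET dest.ERROR_LEVEL = 4
 WHERE (UPPER(dest.ATTRIBUTE_1), dest.ROWID)
       IN (SELECT UPPER(v.ATTRIBUTE_1), v.rid
             FROM (SELECT UPPER(rt.ATTRIBUTE_1)
                        , rt.ROWID AS rid
                        , ROW_NUMBER() OVER (PARTITION BY UPPER(rt.ATTRIBUTE_1)
                                                 ORDER BY rt.RAW_DATA_ROW_ID) AS RN
                     FROM RAW_table rt
                  ) v
            WHERE v.RN > 1
          );
```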

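    【Comments】:

    • I think you meant to annotate the erroneous query (the one without the checked_column alias) and put the ASCII art on the next line. Anyway, many thanks, because I finally get it!
    【Answer 2】:

    This is why aliases are very, very useful.

    In the query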

    UPDATE RAW_table
       SET ERROR_LEVEL   = 4
     WHERE (UPPER (ATTRIBUTE_1), ROWID) 
           IN (SELECT UPPER (ATTRIBUTE_1), rid
               FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
                         , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) 
                                                   ORDER BY RAW_DATA_ROW_ID) AS RN
                      FROM RAW_table
                    )
               WHERE RN > 1
              )
    

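    SELECT UPPER (ATTRIBUTE_1) is valid because it can be resolved as a reference to the table you are updating, rather than to the inline view in the FROM clause. Written with aliases, that query is equivalent to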

    UPDATE RAW_table dest
       SET dest.ERROR_LEVEL   = 4
     WHERE (UPPER (dest.ATTRIBUTE_1), ROWID) 
           IN (SELECT UPPER (dest.ATTRIBUTE_1), src.rid
               FROM (SELECT UPPER (rt.ATTRIBUTE_1), rt.ROWID AS rid
                         , ROW_NUMBER() OVER ( PARTITION BY UPPER (rt.ATTRIBUTE_1) 
                                                   ORDER BY rt.RAW_DATA_ROW_ID) AS RN
                      FROM RAW_table rt
                    ) src
               WHERE src.RN > 1
              )
    

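    Of course, had you written it that way, it would be obvious that you are referencing dest.attribute_1 rather than src.attribute_1. This (among many other reasons) is why qualifying columns with aliases is a good idea: it makes clear which object you intend to reference, and it raises an error when the intended reference is invalid instead of silently resolving to something you did not intend.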
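For comparison, here is a sketch of the same statement with the intended references spelled out. It reuses the dest, src and rt aliases from the query above and the checked_column alias from the question's corrected query. Nothing inside the subquery refers to dest, so it is no longer correlated:

```sql
-- Every column is qualified, so each reference resolves where it was meant to.
UPDATE RAW_table dest
   SET dest.ERROR_LEVEL = 4
 WHERE (UPPER(dest.ATTRIBUTE_1), dest.ROWID)
       IN (SELECT src.checked_column, src.rid
             FROM (SELECT UPPER(rt.ATTRIBUTE_1) AS checked_column
                        , rt.ROWID AS rid
                        , ROW_NUMBER() OVER (PARTITION BY UPPER(rt.ATTRIBUTE_1)
                                                 ORDER BY rt.RAW_DATA_ROW_ID) AS RN
                     FROM RAW_table rt
                  ) src
            WHERE src.RN > 1
          );
```

Because the subquery no longer depends on the row being updated, the optimizer can evaluate it once and join against the result. That is consistent with the HASH JOIN SEMI plan in the question, as opposed to the FILTER plan of the accidentally correlated version.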

    【Comments】:

    • Ugh... it is obvious once you show it. And it explains the plan for this correlated query... By the way, I think you meant WHERE src.RN > 1 rather than WHERE src.rid > 1
    【Answer 3】:
    SELECT UPPER(ATTRIBUTE_1), rid
      FROM ( SELECT UPPER(ATTRIBUTE_1) ATTRIBUTE_1
                  , ROWID AS rid
                  , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
               FROM RAW_table
           )
     WHERE RN > 1
    

    【Comments】:

      【Answer 4】:

      Maybe these versions are faster (at least they are more compact):

      UPDATE RAW_table
      SET ERROR_LEVEL = 4
      WHERE ROWID <>ALL (SELECT MIN(ROWID) FROM RAW_table GROUP BY UPPER(ATTRIBUTE_1));
      
      
      UPDATE RAW_table
      SET ERROR_LEVEL = 4
      WHERE ROWID <>ALL (SELECT FIRST_VALUE(ROWID) OVER (PARTITION BY UPPER(ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) FROM RAW_table);
      

      Note that <>ALL is equivalent to NOT IN; using <>ALL is just my personal preference.
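To illustrate the equivalence, the first statement could be written with NOT IN as sketched below. This is an untested rewrite, not a drop-in replacement for the question's query. MIN(ROWID) keeps the row with the smallest ROWID in each group, whereas the ROW_NUMBER() version keeps the row with the lowest RAW_DATA_ROW_ID. The two may therefore flag different rows when RAW_DATA_ROW_ID varies within a group.

```sql
-- Same predicate as the <>ALL form: flag every row that is not the
-- "keeper" (smallest ROWID) of its UPPER(ATTRIBUTE_1) group.
-- ATTRIBUTE_1 here resolves to the inner RAW_table, so the subquery is uncorrelated.
UPDATE RAW_table
   SET ERROR_LEVEL = 4
 WHERE ROWID NOT IN (SELECT MIN(ROWID)
                       FROM RAW_table
                      GROUP BY UPPER(ATTRIBUTE_1));
```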

      【Comments】:
