【Question title】: I am trying to improve the performance of an Oracle SQL query that finds the differences between two tables
【Posted】: 2016-01-13 21:38:18
【Question description】:

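I have two Oracle tables and I am doing a UNION between them to find the differences in the data stored in the two tables. When I run the query in SQL Developer it is too slow, and when I use the same query in Informatica its throughput is also low.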

Table 1: W_SALES_INVOICE_LINE_FS EBS (NET_AMT, INVOICED_QTY, CREATED_ON_DT, CHANGED_ON_DT, INTEGRATION_ID, 'EBS' AS SOURCE_NAME)

Table 2: W_SALES_INVOICE_LINE_F DWH (NET_AMT, INVOICED_QTY, CREATED_ON_DT, CHANGED_ON_DT, INTEGRATION_ID, 'DWH' AS SOURCE_NAME)

Here is the query in question:

SELECT EBS.NET_AMT,
       NVL(EBS.INVOICED_QTY,
           CASE NVL(EBS.NET_AMT, 0) WHEN 0 THEN EBS.INVOICED_QTY
                                    ELSE -1 END) INVOICED_QTY,
       EBS.CREATED_ON_DT,
       EBS.CHANGED_ON_DT,
       EBS.INTEGRATION_ID,
       'EBS' AS SOURCE_NAME
  FROM W_SALES_INVOICE_LINE_FS EBS
 WHERE NOT EXISTS (SELECT INTEGRATION_ID
                     FROM W_SALES_INVOICE_LINE_F DWH
                    WHERE EBS.INTEGRATION_ID = DWH.INTEGRATION_ID)
UNION
SELECT DWH.NET_AMT,
       DWH.INVOICED_QTY,
       DWH.CREATED_ON_DT,
       DWH.CHANGED_ON_DT,
       DWH.INTEGRATION_ID,
       'DWH' AS SOURCE_NAME
  FROM W_SALES_INVOICE_LINE_F DWH
 WHERE DWH.IS_POS = 'N'
   AND NOT EXISTS (SELECT INTEGRATION_ID
                     FROM W_SALES_INVOICE_LINE_FS EBS
                    WHERE EBS.INTEGRATION_ID = DWH.INTEGRATION_ID);

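Please let me know if you want to see the explain plan. Can someone tell me how to improve the performance, or let me know whether the problem lies somewhere other than the query above?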

【Question discussion】:

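  • Union and Not Exists can be performance killers. Are you sure you need UNION here, or could you use UNION ALL instead?
  • Consider using UNION ALL instead of UNION to avoid an unnecessary sort. The results of the two parts of the query are always distinct, because the last column, SOURCE_NAME, is either 'EBS' or 'DWH', but the database does not know this and has to sort both results to perform the union.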
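The comment's reasoning can be checked on a toy data set. This is a minimal sketch assuming two hypothetical scratch tables, demo_ebs and demo_dwh, that stand in for W_SALES_INVOICE_LINE_FS and W_SALES_INVOICE_LINE_F (only INTEGRATION_ID, NET_AMT and IS_POS are modelled). Every row from the first branch carries 'EBS' and every row from the second carries 'DWH', so no row can occur in both branches and UNION ALL returns the same rows as UNION without the duplicate-elimination step. One caveat: UNION also removes duplicates within a single branch, so the two are only interchangeable if a branch cannot return two identical rows on its own (for example, when INTEGRATION_ID is unique in each table).

```sql
-- Hypothetical scratch tables; run in a throwaway schema, not against the real tables.
CREATE TABLE demo_ebs (
  integration_id VARCHAR(30),
  net_amt        NUMERIC(12, 2)
);
CREATE TABLE demo_dwh (
  integration_id VARCHAR(30),
  net_amt        NUMERIC(12, 2),
  is_pos         CHAR(1)
);

INSERT INTO demo_ebs VALUES ('A', 10);
INSERT INTO demo_ebs VALUES ('B', 20);
INSERT INTO demo_dwh VALUES ('B', 20, 'N');
INSERT INTO demo_dwh VALUES ('D', 40, 'N');
INSERT INTO demo_dwh VALUES ('E', 50, 'Y');

-- Same shape as the question's query, but with UNION ALL:
-- the literal SOURCE_NAME already makes the two branches disjoint.
CREATE VIEW demo_diff AS
SELECT e.integration_id, e.net_amt, 'EBS' AS source_name
  FROM demo_ebs e
 WHERE NOT EXISTS (SELECT 1 FROM demo_dwh d
                    WHERE d.integration_id = e.integration_id)
UNION ALL
SELECT d.integration_id, d.net_amt, 'DWH' AS source_name
  FROM demo_dwh d
 WHERE d.is_pos = 'N'
   AND NOT EXISTS (SELECT 1 FROM demo_ebs e
                    WHERE e.integration_id = d.integration_id);
```

SELECT * FROM demo_diff returns A tagged 'EBS' and D tagged 'DWH'; E is skipped because its IS_POS is 'Y'.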

Tags: sql oracle informatica


【Solution 1】:

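NOT EXISTS and NOT IN are often performance bottlenecks. One performance trick for this is to use a LEFT OUTER JOIN with a condition stating that the second table's join column IS NULL, i.e. there is no matching row. So try: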

SELECT EBS.NET_AMT,
       NVL(EBS.INVOICED_QTY,
           CASE NVL(EBS.NET_AMT, 0) WHEN 0 THEN EBS.INVOICED_QTY
                                    ELSE -1 END) INVOICED_QTY,
       EBS.CREATED_ON_DT,
       EBS.CHANGED_ON_DT,
       EBS.INTEGRATION_ID,
       'EBS' AS SOURCE_NAME
  FROM W_SALES_INVOICE_LINE_FS EBS
  LEFT OUTER JOIN W_SALES_INVOICE_LINE_F DWH
    ON EBS.INTEGRATION_ID = DWH.INTEGRATION_ID
 WHERE DWH.INTEGRATION_ID IS NULL
UNION
SELECT DWH.NET_AMT,
       DWH.INVOICED_QTY,
       DWH.CREATED_ON_DT,
       DWH.CHANGED_ON_DT,
       DWH.INTEGRATION_ID,
       'DWH' AS SOURCE_NAME
  FROM W_SALES_INVOICE_LINE_F DWH
  LEFT OUTER JOIN W_SALES_INVOICE_LINE_FS EBS
    ON EBS.INTEGRATION_ID = DWH.INTEGRATION_ID
 WHERE EBS.INTEGRATION_ID IS NULL
   AND DWH.IS_POS = 'N';
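To check that the LEFT OUTER JOIN ... IS NULL form returns the same rows as the NOT EXISTS form, here is a minimal sketch on hypothetical scratch tables demo_ebs and demo_dwh (stand-ins for the real tables). An EBS row with no match gets NULL in every DWH column after the outer join, and since INTEGRATION_ID is the join key it cannot be NULL on a matched row, so the IS NULL test keeps exactly the unmatched rows. Recent Oracle versions can often transform NOT EXISTS into an anti-join by themselves, so it is worth comparing the two execution plans before assuming the rewrite is faster.

```sql
-- Hypothetical scratch tables, not the real W_SALES_INVOICE_LINE_* tables.
CREATE TABLE demo_ebs (integration_id VARCHAR(30), net_amt NUMERIC(12, 2));
CREATE TABLE demo_dwh (integration_id VARCHAR(30), net_amt NUMERIC(12, 2));

INSERT INTO demo_ebs VALUES ('A', 10);
INSERT INTO demo_ebs VALUES ('B', 20);
INSERT INTO demo_ebs VALUES ('C', 30);
INSERT INTO demo_dwh VALUES ('B', 20);
INSERT INTO demo_dwh VALUES ('C', 30);
INSERT INTO demo_dwh VALUES ('D', 40);

-- Anti-join written with NOT EXISTS, as in the question.
CREATE VIEW demo_ebs_only_not_exists AS
SELECT e.integration_id, e.net_amt
  FROM demo_ebs e
 WHERE NOT EXISTS (SELECT 1 FROM demo_dwh d
                    WHERE d.integration_id = e.integration_id);

-- The same anti-join written as LEFT OUTER JOIN ... IS NULL, as in this answer.
-- Unmatched EBS rows get NULL in every DWH column, so the IS NULL test keeps only them.
CREATE VIEW demo_ebs_only_left_join AS
SELECT e.integration_id, e.net_amt
  FROM demo_ebs e
  LEFT OUTER JOIN demo_dwh d
    ON d.integration_id = e.integration_id
 WHERE d.integration_id IS NULL;
```

Both views return the single row 'A'.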

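【Discussion】:

    【Solution 2】:

    You are not doing a JOIN, you are doing a UNION. However, you are running correlated subqueries, and these can drag down overall performance. You can try changing NOT EXISTS to NOT IN, which may let Oracle take advantage of an index on INTEGRATION_ID (if one exists). Be aware that the two forms are only equivalent when INTEGRATION_ID is never NULL: if the subquery returns even one NULL, NOT IN returns no rows at all.

    Try the following: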

    SELECT EBS.NET_AMT,
           NVL(EBS.INVOICED_QTY,
               CASE NVL(EBS.NET_AMT, 0) WHEN 0 THEN EBS.INVOICED_QTY
                                        ELSE -1 END) INVOICED_QTY,
           EBS.CREATED_ON_DT,
           EBS.CHANGED_ON_DT,
           EBS.INTEGRATION_ID,
           'EBS' AS SOURCE_NAME
      FROM W_SALES_INVOICE_LINE_FS EBS
     WHERE EBS.INTEGRATION_ID NOT IN (
             SELECT INTEGRATION_ID
               FROM W_SALES_INVOICE_LINE_F
              WHERE INTEGRATION_ID IS NOT NULL  -- a single NULL here would make NOT IN return no rows
           )
    UNION ALL
    SELECT DWH.NET_AMT,
           DWH.INVOICED_QTY,
           DWH.CREATED_ON_DT,
           DWH.CHANGED_ON_DT,
           DWH.INTEGRATION_ID,
           'DWH' AS SOURCE_NAME
      FROM W_SALES_INVOICE_LINE_F DWH
     WHERE DWH.IS_POS = 'N'
       AND DWH.INTEGRATION_ID NOT IN (
             SELECT INTEGRATION_ID
               FROM W_SALES_INVOICE_LINE_FS
              WHERE INTEGRATION_ID IS NOT NULL  -- same NULL guard as above
           );
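The NULL caveat with NOT IN is easy to miss. This minimal sketch uses hypothetical scratch tables demo_ebs and demo_dwh, where demo_dwh deliberately contains one row with a NULL INTEGRATION_ID. For 'A', the predicate 'A' NOT IN ('B', NULL) evaluates to UNKNOWN rather than TRUE, so the plain NOT IN query returns nothing, while NOT EXISTS still returns 'A'. Filtering NULLs out of the subquery (or having INTEGRATION_ID declared NOT NULL in both tables) makes the two forms agree.

```sql
-- Hypothetical scratch tables; demo_dwh contains a NULL key on purpose.
CREATE TABLE demo_ebs (integration_id VARCHAR(30));
CREATE TABLE demo_dwh (integration_id VARCHAR(30));

INSERT INTO demo_ebs VALUES ('A');
INSERT INTO demo_ebs VALUES ('B');
INSERT INTO demo_dwh VALUES ('B');
INSERT INTO demo_dwh VALUES (NULL);

-- Plain NOT IN: 'A' NOT IN ('B', NULL) is UNKNOWN and 'B' NOT IN (...) is FALSE,
-- so this view returns no rows at all.
CREATE VIEW demo_not_in AS
SELECT e.integration_id
  FROM demo_ebs e
 WHERE e.integration_id NOT IN (SELECT d.integration_id FROM demo_dwh d);

-- NOT EXISTS is not affected by the NULL row and returns 'A'.
CREATE VIEW demo_not_exists AS
SELECT e.integration_id
  FROM demo_ebs e
 WHERE NOT EXISTS (SELECT 1 FROM demo_dwh d
                    WHERE d.integration_id = e.integration_id);

-- NULL-safe NOT IN: exclude NULLs from the subquery; this also returns 'A'.
CREATE VIEW demo_not_in_safe AS
SELECT e.integration_id
  FROM demo_ebs e
 WHERE e.integration_id NOT IN (SELECT d.integration_id
                                  FROM demo_dwh d
                                 WHERE d.integration_id IS NOT NULL);
```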
    

    Also, as others mentioned in the comments, UNION ALL may be more appropriate.

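    You could also try a LEFT OUTER JOIN, which expresses the same thing more explicitly and can make use of indexes if you have them. I can't reach my Oracle instance from where I am to try an explain plan, but in practice the optimizer may well produce similar plans for the query above and the one below.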

    SELECT EBS.NET_AMT, 
        Nvl(EBS.INVOICED_QTY,
            CASE Nvl(EBS.NET_AMT, 0) WHEN 0 
            THEN EBS.INVOICED_QTY
            ELSE -1 END
        ) AS INVOICED_QTY,
        EBS.CREATED_ON_DT,
        EBS.CHANGED_ON_DT, 
        EBS.INTEGRATION_ID,
        'EBS' AS SOURCE_NAME
    FROM W_SALES_INVOICE_LINE_FS EBS
    LEFT OUTER JOIN W_SALES_INVOICE_LINE_F DWH
    ON DWH.INTEGRATION_ID = EBS.INTEGRATION_ID
    WHERE DWH.INTEGRATION_ID IS NULL
    UNION ALL
    SELECT DWH.NET_AMT,
        DWH.INVOICED_QTY, 
        DWH.CREATED_ON_DT,
        DWH.CHANGED_ON_DT, 
        DWH.INTEGRATION_ID,
        'DWH' AS SOURCE_NAME
    FROM W_SALES_INVOICE_LINE_F DWH
    LEFT OUTER JOIN W_SALES_INVOICE_LINE_FS EBS
    ON EBS.INTEGRATION_ID = DWH.INTEGRATION_ID
    WHERE EBS.INTEGRATION_ID IS NULL
    AND DWH.IS_POS = 'N'
    ;
    

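    Can you briefly describe the tables in your question? Roughly how many records are in each table? Do you have indexes? Are any of the fields calculated or derived? When you run an explain plan on these queries or on your original query, where does it show the bottleneck?

    【Discussion】:

    • I have changed the question according to your corrections. I am testing the query on my database and it immediately returns the default number of rows, i.e. 50. I am trying to get a count of all rows, but it is still running and taking a lot of time, which means its performance is still poor? Please correct me if I am wrong.
    • Do you have an index on the INTEGRATION_ID column of your tables?
    • Yes, I created a BITMAP INDEX on both INTEGRATION_ID columns.
    • If INTEGRATION_ID is unique, I would use CREATE UNIQUE INDEX ...; or, if it has very high cardinality (where the ratio of distinct values to record count matters more than the raw cardinality), I would use CREATE INDEX ... rather than CREATE BITMAP INDEX ...
    • What I mean is that a BITMAP INDEX may not be suitable in your case. Is INTEGRATION_ID a unique column? What is the ratio of distinct INTEGRATION_ID values to the total number of records in the table?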
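Both questions in this comment thread can be answered with plain SQL. A minimal sketch, again on hypothetical scratch tables demo_ebs and demo_dwh rather than the real ones: the first view compares the number of distinct INTEGRATION_ID values with the row count (equal means the column is unique); the CREATE UNIQUE INDEX statements show the B-tree alternative to a bitmap index (the index names are made up, and on Oracle you would replace the existing bitmap indexes, whose names are not given in the thread); the last view wraps the comparison in COUNT(*). SQL Developer displays the first fetched batch of rows as soon as it arrives, so a fast first page says little about total run time; counting forces every row to be produced, which is closer to what Informatica has to do.

```sql
-- Hypothetical scratch tables standing in for the EBS and DWH tables.
CREATE TABLE demo_ebs (integration_id VARCHAR(30) NOT NULL, net_amt NUMERIC(12, 2));
CREATE TABLE demo_dwh (integration_id VARCHAR(30) NOT NULL, net_amt NUMERIC(12, 2),
                       is_pos CHAR(1));

INSERT INTO demo_ebs VALUES ('A', 10);
INSERT INTO demo_ebs VALUES ('B', 20);
INSERT INTO demo_dwh VALUES ('B', 20, 'N');
INSERT INTO demo_dwh VALUES ('D', 40, 'N');

-- 1) Uniqueness check: distinct_ids = total_rows means INTEGRATION_ID is unique.
CREATE VIEW demo_id_stats AS
SELECT COUNT(DISTINCT integration_id) AS distinct_ids,
       COUNT(*)                       AS total_rows
  FROM demo_dwh;

-- 2) For a unique key, a B-tree unique index instead of a bitmap index.
--    Use plain CREATE INDEX if the check above shows duplicates (CREATE UNIQUE INDEX would fail).
CREATE UNIQUE INDEX demo_ebs_id_ux ON demo_ebs (integration_id);
CREATE UNIQUE INDEX demo_dwh_id_ux ON demo_dwh (integration_id);

-- 3) Count the whole difference so that every row has to be produced,
--    instead of judging speed by the first page shown in SQL Developer.
CREATE VIEW demo_diff_count AS
SELECT COUNT(*) AS diff_rows
  FROM (SELECT e.integration_id
          FROM demo_ebs e
         WHERE NOT EXISTS (SELECT 1 FROM demo_dwh d
                            WHERE d.integration_id = e.integration_id)
        UNION ALL
        SELECT d.integration_id
          FROM demo_dwh d
         WHERE d.is_pos = 'N'
           AND NOT EXISTS (SELECT 1 FROM demo_ebs e
                            WHERE e.integration_id = d.integration_id)) diff_src;
```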

    【Solution 3】:

    You are hand-writing a full outer join; Oracle can do this for you in comparison tasks like this one (I would guess it may run faster).

    select
        ebs.net_amt ebs_net_amt,
        dwh.net_amt dwh_net_amt,
        nvl(ebs.invoiced_qty, case nvl(ebs.net_amt, 0) when 0 then ebs.invoiced_qty else -1 end) invoiced_qty_ebs,
        dwh.invoiced_qty invoiced_qty_dwh,
        ebs.created_on_dt ebs_created_on_dt,
        dwh.created_on_dt dwh_created_on_dt,
        ebs.changed_on_dt ebs_changed_on_dt,
        dwh.changed_on_dt dwh_changed_on_dt,
        nvl(ebs.integration_id, dwh.integration_id) integration_id,
        case
            when ebs.integration_id is not null and dwh.integration_id is not null then 'EBS and DWH'
            when ebs.integration_id is not null then 'EBS'
            else 'DWH'
        end source_name
    from
        w_sales_invoice_line_fs ebs
        full outer join
        (select * from w_sales_invoice_line_f dwh where dwh.is_pos = 'N') dwh
        on (ebs.integration_id = dwh.integration_id)
    where
        ebs.integration_id is null or dwh.integration_id is null -- restrict to records missing on one side
    ;
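A minimal sketch of the full outer join on hypothetical scratch tables demo_ebs and demo_dwh (COALESCE is used in place of NVL so the same text also runs outside Oracle; for two arguments they return the same value). It creates two views. The first filters IS_POS = 'N' before the join, as the answer does; with that placement an EBS row whose only DWH counterpart has IS_POS = 'Y' (row C here) is reported as missing from DWH, whereas the question's NOT EXISTS checks all DWH rows. The second view is a variant, not taken from the answer, that joins the whole DWH table and applies the IS_POS filter only to DWH-only rows, which reproduces the question's result.

```sql
-- Hypothetical scratch tables, not the real W_SALES_INVOICE_LINE_* tables.
CREATE TABLE demo_ebs (integration_id VARCHAR(30), net_amt NUMERIC(12, 2));
CREATE TABLE demo_dwh (integration_id VARCHAR(30), net_amt NUMERIC(12, 2),
                       is_pos CHAR(1));

INSERT INTO demo_ebs VALUES ('A', 10);
INSERT INTO demo_ebs VALUES ('B', 20);
INSERT INTO demo_ebs VALUES ('C', 30);
INSERT INTO demo_dwh VALUES ('B', 20, 'N');
INSERT INTO demo_dwh VALUES ('C', 30, 'Y');
INSERT INTO demo_dwh VALUES ('D', 40, 'N');

-- As in the answer: IS_POS is filtered before the join.
-- Returns A ('EBS'), C ('EBS') and D ('DWH').
CREATE VIEW demo_full_diff_prefiltered AS
SELECT COALESCE(e.integration_id, d.integration_id) AS integration_id,
       e.net_amt AS ebs_net_amt,
       d.net_amt AS dwh_net_amt,
       CASE
         WHEN e.integration_id IS NOT NULL AND d.integration_id IS NOT NULL THEN 'EBS and DWH'
         WHEN e.integration_id IS NOT NULL THEN 'EBS'
         ELSE 'DWH'
       END AS source_name
  FROM demo_ebs e
  FULL OUTER JOIN (SELECT * FROM demo_dwh WHERE is_pos = 'N') d
    ON e.integration_id = d.integration_id
 WHERE e.integration_id IS NULL OR d.integration_id IS NULL;

-- Variant matching the question: join all DWH rows, apply IS_POS only to DWH-only rows.
-- Returns A ('EBS') and D ('DWH').
CREATE VIEW demo_full_diff AS
SELECT COALESCE(e.integration_id, d.integration_id) AS integration_id,
       e.net_amt AS ebs_net_amt,
       d.net_amt AS dwh_net_amt,
       CASE WHEN e.integration_id IS NOT NULL THEN 'EBS' ELSE 'DWH' END AS source_name
  FROM demo_ebs e
  FULL OUTER JOIN demo_dwh d
    ON e.integration_id = d.integration_id
 WHERE d.integration_id IS NULL
    OR (e.integration_id IS NULL AND d.is_pos = 'N');
```

The single pass over each table is the point of this form; which placement of the IS_POS filter is right depends on whether DWH rows with IS_POS = 'Y' should count as a match for an EBS row.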
    

    【Discussion】:
