【Question title】: Migrating a MERGE statement from Teradata to MySQL
【Posted】: 2020-09-05 15:59:24
【Question】:

Tables: schema.INFA_TASK_RUN_STG and schema.INFA_TASK_RUN

Primary index of schema.INFA_TASK_RUN_STG: SUBJECT_AREA
Primary index of schema.INFA_TASK_RUN: SUBJECT_ID, WORKFLOW_ID, WORKFLOW_RUN_ID, WORKLET_RUN_ID, INSTANCE_ID, TASK_ID, START_TIME

The MERGE statement in Teradata:

MERGE INTO schema.INFA_TASK_RUN USING schema.INFA_TASK_RUN_STG src
ON
        INFA_TASK_RUN_RAW.SUBJECT_ID = src.SUBJECT_ID
AND     INFA_TASK_RUN_RAW.WORKFLOW_ID = src.WORKFLOW_ID
AND     INFA_TASK_RUN_RAW.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID
AND     INFA_TASK_RUN_RAW.WORKLET_RUN_ID = src.WORKLET_RUN_ID
AND     INFA_TASK_RUN_RAW.INSTANCE_ID = src.INSTANCE_ID
AND     INFA_TASK_RUN_RAW.TASK_ID = src.TASK_ID
AND     INFA_TASK_RUN_RAW.START_TIME = src.START_TIME
WHEN MATCHED THEN UPDATE SET
        END_TIME = src.END_TIME
,       RUN_ERR_CODE = src.RUN_ERR_CODE
,       RUN_ERR_MSG = src.RUN_ERR_MSG
,       RUN_STATUS_CODE = src.RUN_STATUS_CODE
WHEN NOT MATCHED THEN INSERT(
                SUBJECT_AREA
        ,       WORKFLOW_NAME
        ,       VERSION_NUMBER
        ,       SUBJECT_ID
        ,       WORKFLOW_ID
        ,       WORKFLOW_RUN_ID
        ,       WORKLET_RUN_ID
        ,       CHILD_RUN_ID
        ,       INSTANCE_ID
        ,       INSTANCE_NAME
        ,       TASK_ID
        ,       TASK_TYPE_NAME
        ,       TASK_TYPE
        ,       START_TIME
        ,       END_TIME
        ,       RUN_ERR_CODE
        ,       RUN_ERR_MSG
        ,       RUN_STATUS_CODE
        ,       TASK_NAME
        ,       TASK_VERSION_NUMBER
        ,       SERVER_ID
        ,       SERVER_NAME
        )VALUES(
                src.SUBJECT_AREA
        ,       src.WORKFLOW_NAME
        ,       src.VERSION_NUMBER
        ,       src.SUBJECT_ID
        ,       src.WORKFLOW_ID
        ,       src.WORKFLOW_RUN_ID
        ,       src.WORKLET_RUN_ID
        ,       src.CHILD_RUN_ID
        ,       src.INSTANCE_ID
        ,       src.INSTANCE_NAME
        ,       src.TASK_ID
        ,       src.TASK_TYPE_NAME
        ,       src.TASK_TYPE
        ,       src.START_TIME
        ,       src.END_TIME
        ,       src.RUN_ERR_CODE
        ,       src.RUN_ERR_MSG
        ,       src.RUN_STATUS_CODE
        ,       src.TASK_NAME
        ,       src.TASK_VERSION_NUMBER
        ,       src.SERVER_ID
        ,       src.SERVER_NAME
        );

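As far as I know, MySQL does not support the MERGE statement. I am trying separate UPDATE and INSERT statements instead, but they do not seem quite right.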

UPDATE schema.INFA_TASK_RUN tgt INNER JOIN schema.INFA_TASK_RUN_STG src
ON
       tgt.SUBJECT_ID = src.SUBJECT_ID
AND     tgt.WORKFLOW_ID = src.WORKFLOW_ID
AND     tgt.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID
AND     tgt.WORKLET_RUN_ID = src.WORKLET_RUN_ID
AND     tgt.INSTANCE_ID = src.INSTANCE_ID
AND     tgt.TASK_ID = src.TASK_ID
AND     tgt.START_TIME = src.START_TIME
 SET
        tgt.END_TIME = src.END_TIME
,       tgt.RUN_ERR_CODE = src.RUN_ERR_CODE
,       tgt.RUN_ERR_MSG = src.RUN_ERR_MSG
,       tgt.RUN_STATUS_CODE = src.RUN_STATUS_CODE;

insert into schema.INFA_TASK_RUN (SUBJECT_AREA         ,       WORKFLOW_NAME         ,       VERSION_NUMBER         ,       SUBJECT_ID         ,       WORKFLOW_ID         ,       WORKFLOW_RUN_ID         ,       WORKLET_RUN_ID         ,       CHILD_RUN_ID         ,       INSTANCE_ID         ,       INSTANCE_NAME         ,       TASK_ID         ,       TASK_TYPE_NAME         ,       TASK_TYPE         ,       START_TIME         ,       END_TIME         ,       RUN_ERR_CODE         ,       RUN_ERR_MSG         ,       RUN_STATUS_CODE         ,       TASK_NAME         ,       TASK_VERSION_NUMBER         ,       SERVER_ID         ,       SERVER_NAME)
    select src.SUBJECT_AREA         ,       src.WORKFLOW_NAME         ,       src.VERSION_NUMBER         ,       src.SUBJECT_ID         ,       src.WORKFLOW_ID         ,       src.WORKFLOW_RUN_ID         ,       src.WORKLET_RUN_ID         ,       src.CHILD_RUN_ID         ,       src.INSTANCE_ID         ,       src.INSTANCE_NAME         ,       src.TASK_ID         ,       src.TASK_TYPE_NAME         ,       src.TASK_TYPE         ,       src.START_TIME         ,       src.END_TIME         ,       src.RUN_ERR_CODE         ,       src.RUN_ERR_MSG         ,       src.RUN_STATUS_CODE         ,       src.TASK_NAME         ,       src.TASK_VERSION_NUMBER         ,       src.SERVER_ID         ,       src.SERVER_NAME
    from schema.INFA_TASK_RUN_STG as src
        left outer join schema.INFA_TASK_RUN as tgt  ON
       tgt.SUBJECT_ID != src.SUBJECT_ID
AND     tgt.WORKFLOW_ID != src.WORKFLOW_ID
AND     tgt.WORKFLOW_RUN_ID != src.WORKFLOW_RUN_ID
AND     tgt.WORKLET_RUN_ID != src.WORKLET_RUN_ID
AND     tgt.INSTANCE_ID != src.INSTANCE_ID
AND     tgt.TASK_ID != src.TASK_ID
AND     tgt.START_TIME != src.START_TIME

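【Comments】:

    Tags: mysql sql merge teradata teradata-sql-assistant


    【Answer 1】:

    I believe what you are looking for is something like this (untested; it only works if the target table has a primary key or unique index on the merge columns):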

    INSERT INTO schema.INFA_TASK_RUN (
         SUBJECT_AREA
        ,WORKFLOW_NAME
        ,VERSION_NUMBER
        ,SUBJECT_ID
        ,WORKFLOW_ID
        ,WORKFLOW_RUN_ID
        ,WORKLET_RUN_ID
        ,CHILD_RUN_ID
        ,INSTANCE_ID
        ,INSTANCE_NAME
        ,TASK_ID
        ,TASK_TYPE_NAME
        ,TASK_TYPE
        ,START_TIME
        ,END_TIME
        ,RUN_ERR_CODE
        ,RUN_ERR_MSG
        ,RUN_STATUS_CODE
        ,TASK_NAME
        ,TASK_VERSION_NUMBER
        ,SERVER_ID
        ,SERVER_NAME
        )
    SELECT
         SUBJECT_AREA
        ,WORKFLOW_NAME
        ,VERSION_NUMBER
        ,SUBJECT_ID
        ,WORKFLOW_ID
        ,WORKFLOW_RUN_ID
        ,WORKLET_RUN_ID
        ,CHILD_RUN_ID
        ,INSTANCE_ID
        ,INSTANCE_NAME
        ,TASK_ID
        ,TASK_TYPE_NAME
        ,TASK_TYPE
        ,START_TIME
        ,END_TIME
        ,RUN_ERR_CODE
        ,RUN_ERR_MSG
        ,RUN_STATUS_CODE
        ,TASK_NAME
        ,TASK_VERSION_NUMBER
        ,SERVER_ID
        ,SERVER_NAME
    FROM schema.INFA_TASK_RUN_STG src
    ON DUPLICATE KEY UPDATE
         END_TIME = src.END_TIME
        ,RUN_ERR_CODE = src.RUN_ERR_CODE
        ,RUN_ERR_MSG = src.RUN_ERR_MSG
        ,RUN_STATUS_CODE = src.RUN_STATUS_CODE;
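
The statement above only takes the UPDATE branch when the target table has a PRIMARY KEY or UNIQUE index covering the merge columns. A Teradata primary index controls row distribution and does not have to be unique, so it does not carry over to MySQL as a key by itself. Below is a minimal sketch of adding such a key. It assumes that the seven columns from the MERGE's ON clause identify a row, that none of them is NULL (a UNIQUE index treats NULLs as distinct), and that the table holds no duplicates on them yet. The index name uq_infa_task_run is made up for illustration, and `schema` stands in for the real database name, as in the question.

```sql
-- Check first: any row returned here would make the ALTER TABLE below fail.
SELECT SUBJECT_ID, WORKFLOW_ID, WORKFLOW_RUN_ID, WORKLET_RUN_ID,
       INSTANCE_ID, TASK_ID, START_TIME, COUNT(*) AS copies
FROM schema.INFA_TASK_RUN
GROUP BY SUBJECT_ID, WORKFLOW_ID, WORKFLOW_RUN_ID, WORKLET_RUN_ID,
         INSTANCE_ID, TASK_ID, START_TIME
HAVING COUNT(*) > 1;

-- Hypothetical unique key on the columns used in the MERGE's ON clause.
ALTER TABLE schema.INFA_TASK_RUN
    ADD UNIQUE KEY uq_infa_task_run (
        SUBJECT_ID, WORKFLOW_ID, WORKFLOW_RUN_ID, WORKLET_RUN_ID,
        INSTANCE_ID, TASK_ID, START_TIME
    );
```

Without a key like this, INSERT ... ON DUPLICATE KEY UPDATE never detects a duplicate and every staging row is inserted again on each run.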
    

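    Edited 2020-05-21 to show separate UPDATE and INSERT statements, based on the comments:

    The INSERT ... ON DUPLICATE KEY UPDATE statement will probably be faster.

    Following the comments, I tested the original UPDATE and INSERT statements from the question.

    Note that your UPDATE statement works fine. The only issue is that every matched row is updated, even when nothing has changed.

    You can add a condition to the join, such as tgt.END_TIME != src.END_TIME, to make sure only changed records are updated.

    The original UPDATE query from your question: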

    UPDATE schema.INFA_TASK_RUN tgt INNER JOIN schema.INFA_TASK_RUN_STG src
    ON
           tgt.SUBJECT_ID = src.SUBJECT_ID
    AND     tgt.WORKFLOW_ID = src.WORKFLOW_ID
    AND     tgt.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID
    AND     tgt.WORKLET_RUN_ID = src.WORKLET_RUN_ID
    AND     tgt.INSTANCE_ID = src.INSTANCE_ID
    AND     tgt.TASK_ID = src.TASK_ID
    AND     tgt.START_TIME = src.START_TIME
     SET
            tgt.END_TIME = src.END_TIME
    ,       tgt.RUN_ERR_CODE = src.RUN_ERR_CODE
    ,       tgt.RUN_ERR_MSG = src.RUN_ERR_MSG
    ,       tgt.RUN_STATUS_CODE = src.RUN_STATUS_CODE;
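
The "only changed records" condition mentioned above can be sketched as follows. This is an untested variation on the query, not part of the original answer. It uses MySQL's NULL-safe equality operator `<=>` rather than `!=`, on the assumption that the four updated columns may contain NULL (a plain `!=` comparison yields NULL when either side is NULL, so such rows would be skipped).

```sql
UPDATE schema.INFA_TASK_RUN tgt
INNER JOIN schema.INFA_TASK_RUN_STG src
        ON tgt.SUBJECT_ID      = src.SUBJECT_ID
       AND tgt.WORKFLOW_ID     = src.WORKFLOW_ID
       AND tgt.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID
       AND tgt.WORKLET_RUN_ID  = src.WORKLET_RUN_ID
       AND tgt.INSTANCE_ID     = src.INSTANCE_ID
       AND tgt.TASK_ID         = src.TASK_ID
       AND tgt.START_TIME      = src.START_TIME
SET
        tgt.END_TIME        = src.END_TIME
,       tgt.RUN_ERR_CODE    = src.RUN_ERR_CODE
,       tgt.RUN_ERR_MSG     = src.RUN_ERR_MSG
,       tgt.RUN_STATUS_CODE = src.RUN_STATUS_CODE
-- Keep only rows where at least one of the four columns differs.
-- a <=> b is true when both sides are equal or both are NULL.
WHERE NOT (tgt.END_TIME        <=> src.END_TIME)
   OR NOT (tgt.RUN_ERR_CODE    <=> src.RUN_ERR_CODE)
   OR NOT (tgt.RUN_ERR_MSG     <=> src.RUN_ERR_MSG)
   OR NOT (tgt.RUN_STATUS_CODE <=> src.RUN_STATUS_CODE);
```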
    

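    The updated INSERT:

    The INSERT statement had to be changed. The JOIN now matches rows where the columns are equal, and the WHERE clause keeps only the source rows that have no match in the target table, by checking that a target column is NULL: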

    INSERT INTO schema.INFA_TASK_RUN (
         SUBJECT_AREA
        ,WORKFLOW_NAME
        ,VERSION_NUMBER
        ,SUBJECT_ID
        ,WORKFLOW_ID
        ,WORKFLOW_RUN_ID
        ,WORKLET_RUN_ID
        ,CHILD_RUN_ID
        ,INSTANCE_ID
        ,INSTANCE_NAME
        ,TASK_ID
        ,TASK_TYPE_NAME
        ,TASK_TYPE
        ,START_TIME
        ,END_TIME
        ,RUN_ERR_CODE
        ,RUN_ERR_MSG
        ,RUN_STATUS_CODE
        ,TASK_NAME
        ,TASK_VERSION_NUMBER
        ,SERVER_ID
        ,SERVER_NAME
        )
        select src.SUBJECT_AREA
        ,src.WORKFLOW_NAME
        ,src.VERSION_NUMBER
        ,src.SUBJECT_ID
        ,src.WORKFLOW_ID
        ,src.WORKFLOW_RUN_ID
        ,src.WORKLET_RUN_ID
        ,src.CHILD_RUN_ID
        ,src.INSTANCE_ID
        ,src.INSTANCE_NAME
        ,src.TASK_ID
        ,src.TASK_TYPE_NAME
        ,src.TASK_TYPE
        ,src.START_TIME
        ,src.END_TIME
        ,src.RUN_ERR_CODE
        ,src.RUN_ERR_MSG
        ,src.RUN_STATUS_CODE
        ,src.TASK_NAME
        ,src.TASK_VERSION_NUMBER
        ,src.SERVER_ID
        ,src.SERVER_NAME
        FROM schema.INFA_TASK_RUN as tgt
            RIGHT JOIN schema.INFA_TASK_RUN_STG as src  ON
           tgt.SUBJECT_ID = src.SUBJECT_ID
    AND     tgt.WORKFLOW_ID = src.WORKFLOW_ID
    AND     tgt.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID
    AND     tgt.WORKLET_RUN_ID = src.WORKLET_RUN_ID
    AND     tgt.INSTANCE_ID = src.INSTANCE_ID
    AND     tgt.TASK_ID = src.TASK_ID
    AND     tgt.START_TIME = src.START_TIME
    WHERE tgt.SUBJECT_ID IS NULL;
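
The same "insert only what is missing" step can also be written with NOT EXISTS, which some readers find easier to follow than RIGHT JOIN ... IS NULL. This is an untested sketch of an equivalent formulation, not part of the original answer; `schema` is again the placeholder database name from the question.

```sql
INSERT INTO schema.INFA_TASK_RUN (
    SUBJECT_AREA, WORKFLOW_NAME, VERSION_NUMBER, SUBJECT_ID, WORKFLOW_ID,
    WORKFLOW_RUN_ID, WORKLET_RUN_ID, CHILD_RUN_ID, INSTANCE_ID, INSTANCE_NAME,
    TASK_ID, TASK_TYPE_NAME, TASK_TYPE, START_TIME, END_TIME,
    RUN_ERR_CODE, RUN_ERR_MSG, RUN_STATUS_CODE, TASK_NAME, TASK_VERSION_NUMBER,
    SERVER_ID, SERVER_NAME
)
SELECT
    src.SUBJECT_AREA, src.WORKFLOW_NAME, src.VERSION_NUMBER, src.SUBJECT_ID,
    src.WORKFLOW_ID, src.WORKFLOW_RUN_ID, src.WORKLET_RUN_ID, src.CHILD_RUN_ID,
    src.INSTANCE_ID, src.INSTANCE_NAME, src.TASK_ID, src.TASK_TYPE_NAME,
    src.TASK_TYPE, src.START_TIME, src.END_TIME, src.RUN_ERR_CODE,
    src.RUN_ERR_MSG, src.RUN_STATUS_CODE, src.TASK_NAME,
    src.TASK_VERSION_NUMBER, src.SERVER_ID, src.SERVER_NAME
FROM schema.INFA_TASK_RUN_STG AS src
-- Keep a staging row only if no target row matches on all seven key columns.
WHERE NOT EXISTS (
    SELECT 1
    FROM schema.INFA_TASK_RUN AS tgt
    WHERE tgt.SUBJECT_ID      = src.SUBJECT_ID
      AND tgt.WORKFLOW_ID     = src.WORKFLOW_ID
      AND tgt.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID
      AND tgt.WORKLET_RUN_ID  = src.WORKLET_RUN_ID
      AND tgt.INSTANCE_ID     = src.INSTANCE_ID
      AND tgt.TASK_ID         = src.TASK_ID
      AND tgt.START_TIME      = src.START_TIME
);
```

Running the UPDATE first and this INSERT second, inside one transaction (START TRANSACTION ... COMMIT, assuming a transactional engine such as InnoDB), keeps the pair closer to the all-or-nothing behavior of the single MERGE. Neither form prevents duplicates that already exist within the staging table itself.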
    

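    【Comments】:

    • Could you explain why END_TIME, RUN_ERR_CODE, RUN_ERR_MSG and RUN_STATUS_CODE are included in ON DUPLICATE KEY UPDATE? Is that the right combination?
    • ON DUPLICATE KEY UPDATE plays the same role as WHEN MATCHED THEN UPDATE SET, so I only used the same columns as in your initial example. When the primary key in table INFA_TASK_RUN matches, only these 4 columns are updated; otherwise a new record is inserted. I hope that explains it.
    • But I do not have a primary key on either table. So the first time I tried to load it, every record was inserted twice: 443 records should have been inserted, but 886 were loaded.
    • We are merging on these conditions: INFA_TASK_RUN_RAW.SUBJECT_ID = src.SUBJECT_ID AND INFA_TASK_RUN_RAW.WORKFLOW_ID = src.WORKFLOW_ID AND INFA_TASK_RUN_RAW.WORKFLOW_RUN_ID = src.WORKFLOW_RUN_ID AND INFA_TASK_RUN_RAW.WORKLET_RUN_ID = src.WORKLET_RUN_ID AND INFA_TASK_RUN_RAW.INSTANCE_ID = src.INSTANCE_ID AND INFA_TASK_RUN_RAW.TASK_ID = src.TASK_ID AND INFA_TASK_RUN_RAW.START_TIME = src.START_TIME
    • In your original post you stated: "Primary index of schema.INFA_TASK_RUN: SUBJECT_ID, WORKFLOW_ID, WORKFLOW_RUN_ID, WORKLET_RUN_ID, INSTANCE_ID, TASK_ID, START_TIME". MySQL's INSERT ... ON DUPLICATE KEY UPDATE only works when the table has a primary key (or a unique index), and it is the fastest option you will get. The other option is to run separate SQL statements: one that inserts the rows not found in the target table, and one that updates the rows that are found.
    【Answer 2】:

    The correct answer, restated to avoid confusion: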

     INSERT INTO schema.INFA_TASK_RUN (
             SUBJECT_AREA
            ,WORKFLOW_NAME
            ,VERSION_NUMBER
            ,SUBJECT_ID
            ,WORKFLOW_ID
            ,WORKFLOW_RUN_ID
            ,WORKLET_RUN_ID
            ,CHILD_RUN_ID
            ,INSTANCE_ID
            ,INSTANCE_NAME
            ,TASK_ID
            ,TASK_TYPE_NAME
            ,TASK_TYPE
            ,START_TIME
            ,END_TIME
            ,RUN_ERR_CODE
            ,RUN_ERR_MSG
            ,RUN_STATUS_CODE
            ,TASK_NAME
            ,TASK_VERSION_NUMBER
            ,SERVER_ID
            ,SERVER_NAME
            )
        SELECT
             SUBJECT_AREA
            ,WORKFLOW_NAME
            ,VERSION_NUMBER
            ,SUBJECT_ID
            ,WORKFLOW_ID
            ,WORKFLOW_RUN_ID
            ,WORKLET_RUN_ID
            ,CHILD_RUN_ID
            ,INSTANCE_ID
            ,INSTANCE_NAME
            ,TASK_ID
            ,TASK_TYPE_NAME
            ,TASK_TYPE
            ,START_TIME
            ,END_TIME
            ,RUN_ERR_CODE
            ,RUN_ERR_MSG
            ,RUN_STATUS_CODE
            ,TASK_NAME
            ,TASK_VERSION_NUMBER
            ,SERVER_ID
            ,SERVER_NAME
        FROM schema.INFA_TASK_RUN_STG src
        ON DUPLICATE KEY UPDATE
             END_TIME = src.END_TIME
            ,RUN_ERR_CODE = src.RUN_ERR_CODE
            ,RUN_ERR_MSG = src.RUN_ERR_MSG
            ,RUN_STATUS_CODE = src.RUN_STATUS_CODE;
    
