[Question title]: Use T-SQL Merge to update existing records and insert records that don't exist but avoid duplicates
[Posted]: 2015-10-27 20:31:51
[Question description]:

I have two tables, t1 and t2, with identical structure.
Table t1 has roughly 100+ more records than t2.

Here is a small sample of t1.

| pid   | tid    | amt         | paymentdt  | paymentmnth   | startdate                 | enddate                   | updtby
| 670   | 1      | 690.00      | 2015-07-07 | 2015-07-07    | 2015-10-26 14:36:27.000   | 2015-10-26 15:42:42.000   | NULL
| 670   | 11     | 855.00      | 2015-07-07 | 2015-07-07    | 2015-10-26 14:36:27.000   | NULL                      | NULL
| 670   | 13     | 129.00      | 2015-07-29 | 2015-07-29    | 2015-10-26 14:36:27.000   | NULL                      | NULL
| 670   | 2      | 855.00      | 2015-09-01 | 2015-09-01    | 2015-10-26 15:42:42.000   | NULL                      | NULL
| Z41   | 1      | 62.35       | 2015-05-08 | 2015-05-08    | 2015-10-26 10:15:24.000   | 2015-10-26 13:08:05.000   | NULL
| Z41   | 11     | 800.00      | 2015-05-08 | 2015-05-08    | 2015-10-26 10:15:24.000   | NULL                      | NULL
| Z41   | 2      | 298.00      | 2015-06-01 | 2015-06-01    | 2015-10-26 13:08:05.000   | 2015-10-26 14:36:27.000   | NULL
| Z41   | 3      | 298.00      | 2015-07-01 | 2015-07-01    | 2015-10-26 14:36:27.000   | 2015-10-26 15:15:45.000   | NULL
| Z41   | 4      | 298.00      | 2015-08-01 | 2015-08-01    | 2015-10-26 15:15:45.000   | 2015-10-26 15:42:42.000   | NULL
| Z41   | 5      | 238.00      | 2015-09-01 | 2015-09-01    | 2015-10-26 15:42:42.000   | NULL                      | NULL

And here is a small sample of t2.

| pid   | tid    | amt         | paymentdt   | paymentmnt   | startdate                 | enddate                   | updtby
| 670   | 1      | 690.00      | 2015-07-07  | 2015-07-07   | 2015-10-02 16:10:50.000   | 2015-10-02 16:35:50.000   | NULL  
| 670   | 11     | 855.00      | 2015-07-07  | 2015-07-07   | 2015-10-02 16:10:50.000   | NULL                      | NULL  
| 670   | 13     | 129.00      | 2015-07-29  | 2015-07-29   | 2015-10-02 16:10:50.000   | NULL                      | NULL  
| 670   | 2      | 855.00      | 2015-09-01  | 2015-09-01   | 2015-10-02 16:35:50.000   | NULL                      | NULL  
| Z41   | 1      | 298.00      | 2015-07-01  | 2015-07-01   | 2015-10-02 16:10:50.000   | 2015-10-02 16:23:26.000   | NULL  
| Z41   | 11     | 800.00      | 2015-05-08  | 2015-05-08   | 2015-10-02 16:10:50.000   | NULL                      | NULL  
| Z41   | 2      | 298.00      | 2015-08-01  | 2015-08-01   | 2015-10-02 16:23:26.000   | 2015-10-02 16:35:50.000   | NULL  
| Z41   | 3      | 238.00      | 2015-09-01  | 2015-09-01   | 2015-10-02 16:35:50.000   | NULL                      | NULL  
| 173   | 1      | 785.00      | 2015-07-01  | 2015-07-01   | 2015-10-02 16:16:30.000   | 2015-10-02 16:27:36.000   | NULL  
| 173   | 11     | 465.00      | 2015-05-01  | 2015-05-01   | 2015-10-02 16:16:30.000   | NULL                      | NULL  

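Comparing t1 and t2 shows that t1 has more rows for pid Z41: its tid values are 1, 2, 3, 4, 5 and 11, whereas t2 only has 1, 2, 3 and 11.

However, the startdate values are completely different between t1 and t2, and that causes trouble. Below is the merge I tried, but it basically just inserts into t2 every row whose startdate differs from t1.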

MERGE INTO t2 AS tgt
USING t1 AS src
    ON tgt.pid = src.pid AND
       tgt.tid = src.tid AND
       tgt.paymentdt = src.paymentdt AND
       tgt.paymentmnt = src.paymentmnt AND
       tgt.startdate = src.startdate
WHEN MATCHED THEN
    UPDATE SET
        tgt.amt = src.amt,
        tgt.paymentdt = src.paymentdt,
        tgt.updtby = 'MERGEDUPDATE'
WHEN NOT MATCHED THEN
    INSERT (pid, tid, amt, paymentdt, paymentmnt, startdate, enddate, updtby)
    VALUES (src.pid, src.tid, src.amt, src.paymentdt, src.paymentmnt, src.startdate, src.enddate, 'MERGEDINSERT');

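With this merge I am left with duplicate pid and tid combinations, where the updtby column shows 'MERGEDINSERT'. But I want to avoid duplicates.

How do I write this merge correctly so that it produces no duplicates, inserts the rows that exist in t1 but not in t2, and updates the amt, paymentdt, and paymentmnth values while keeping the start date?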

[Question discussion]:

  • Can you remove the dates and amounts from the ON clause? What is the desired output for the data you have shown?

Tags: tsql merge sql-update sql-insert sql-merge


[Solution 1]:

From the way you describe it, your merge criteria should be based only on pid and tid. Try this:

MERGE INTO t2 AS tgt
USING t1 AS src
    ON tgt.pid = src.pid AND
       tgt.tid = src.tid 

WHEN MATCHED THEN
    UPDATE SET
        tgt.amt = src.amt,
        tgt.paymentdt = src.paymentdt,
        tgt.paymentmnt = src.paymentmnt,
        tgt.updtby = 'MERGEDUPDATE'
WHEN NOT MATCHED THEN
    INSERT (pid, tid, amt, paymentdt, paymentmnt, startdate, enddate, updtby)
    VALUES (src.pid, src.tid, src.amt, src.paymentdt, src.paymentmnt, src.startdate, src.enddate, 'MERGEDINSERT');

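[Discussion]:

  • Leaving only pid and tid in the ON clause produces the following: The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
  • @gh0st That means you are getting multiple matches for a pid/tid combination. What is the desired output? Could you use MAX() on the dates?
  • Basically, I want to update any record in t2 with the matching record from t1, and if the pid + tid combination does not exist, I want to insert it.
  • But your error says that some pid + tid records exist more than once, so how do you know which values to use in the update?
  • Take the t2 record pid = Z41, tid = 3 as an example. I want to update that record to match what is in t1, so that startdate and enddate are set to the corresponding values in t1. Then for the records (pid = Z41, tid = 4) and (pid = Z41, tid = 5) I want to insert those records into t2.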
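
The error quoted in this discussion means that at least one (pid, tid) pair occurs more than once in the source. A quick way to see which pairs are affected is a grouped count like the sketch below. It assumes only the table and column names used in the question (t1, pid, tid); the alias row_count is made up for illustration.

```sql
-- List every (pid, tid) pair that occurs more than once in the source table.
-- In a MERGE keyed on (pid, tid), such a pair raises the
-- "same row more than once" error when it also exists in t2,
-- and is inserted once per source row when it does not.
SELECT pid, tid, COUNT(*) AS row_count
FROM t1
GROUP BY pid, tid
HAVING COUNT(*) > 1
ORDER BY pid, tid;
```

If this returns no rows, the source is already unique on (pid, tid) and the merge from Solution 1 should not hit the error.
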
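[Solution 2]:

If your MERGE raises an error because it gets multiple records/matches, you can use an aggregate in a subquery to restrict the source table, for example: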

;WITH t1cte AS (
    -- keep only the row(s) with the latest paymentdt for each pid/tid pair
    SELECT a.pid, a.tid, a.amt, a.paymentdt, a.paymentmnt, a.startdate, a.enddate
    FROM t1 AS a
    INNER JOIN (SELECT pid, tid, MAX(paymentdt) AS maxdt
                FROM t1
                GROUP BY pid, tid) AS b
        ON a.pid = b.pid AND a.tid = b.tid AND a.paymentdt = b.maxdt
)

MERGE INTO t2 AS tgt
USING t1cte AS src
    ON tgt.pid = src.pid AND
       tgt.tid = src.tid AND
       tgt.paymentdt = src.paymentdt AND
       tgt.paymentmnt = src.paymentmnt AND
       tgt.startdate = src.startdate
WHEN MATCHED THEN
    UPDATE SET
        tgt.amt = src.amt,
        tgt.paymentdt = src.paymentdt,
        tgt.updtby = 'MERGEDUPDATE'
WHEN NOT MATCHED THEN
    INSERT (pid, tid, amt, paymentdt, paymentmnt, startdate, enddate, updtby)
    VALUES (src.pid, src.tid, src.amt, src.paymentdt, src.paymentmnt, src.startdate, src.enddate, 'MERGEDINSERT');

[Discussion]:

  • Column 'dbo.t1.pid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
  • Change the select inside the inner join() to select pid, tid, max(paymentdt) as maxdt from t1 group by pid, tid
  • But since nothing matches on tgt.startdate = src.startdate, duplicates are still inserted.
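
The two answers can be combined: reduce the source to one row per (pid, tid) and match on those two columns only, so a differing startdate no longer forces an insert. The following is a minimal sketch, not a tested solution. It assumes the column names from the question's own MERGE (paymentmnt, updtby), assumes that the row with the latest paymentdt (ties broken by the latest startdate) is the one to keep, and leaves the target's startdate untouched on update, as the question asks. The names ranked, rn and merge_action are made up for illustration.

```sql
BEGIN TRANSACTION;

WITH ranked AS (
    -- Number the source rows within each (pid, tid) pair, newest first.
    -- ROW_NUMBER() yields exactly one rn = 1 row per pair, even on ties.
    SELECT pid, tid, amt, paymentdt, paymentmnt, startdate, enddate,
           ROW_NUMBER() OVER (PARTITION BY pid, tid
                              ORDER BY paymentdt DESC, startdate DESC) AS rn
    FROM t1
)
MERGE INTO t2 AS tgt
USING (SELECT pid, tid, amt, paymentdt, paymentmnt, startdate, enddate
       FROM ranked
       WHERE rn = 1) AS src
    ON tgt.pid = src.pid AND
       tgt.tid = src.tid
WHEN MATCHED THEN
    -- startdate and enddate are deliberately not updated here
    UPDATE SET
        tgt.amt = src.amt,
        tgt.paymentdt = src.paymentdt,
        tgt.paymentmnt = src.paymentmnt,
        tgt.updtby = 'MERGEDUPDATE'
WHEN NOT MATCHED BY TARGET THEN
    INSERT (pid, tid, amt, paymentdt, paymentmnt, startdate, enddate, updtby)
    VALUES (src.pid, src.tid, src.amt, src.paymentdt, src.paymentmnt,
            src.startdate, src.enddate, 'MERGEDINSERT')
-- Report what the statement did to each affected row
OUTPUT $action AS merge_action, inserted.pid, inserted.tid;

-- Dry run: inspect the OUTPUT rows, then swap in COMMIT once they look right
ROLLBACK TRANSACTION;
```

If startdate and enddate should instead be overwritten with the t1 values, as one of the comments above asks, adding tgt.startdate = src.startdate and tgt.enddate = src.enddate to the UPDATE SET list would do that. Rows already duplicated in t2 by an earlier attempt (those marked 'MERGEDINSERT') are not removed by this statement and would need to be cleaned up separately.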