【Question title】: Updating duplicate records on the basis of date, so no two dates are the same
【Posted】: 2018-12-18 17:54:18
【Question description】:
Hkey | Observation dt | Retriment_dt | Name  | Code | Masterkey
-----+----------------+--------------+-------+------+----------
23   | 10/8/2018      | 01/01/3030   | Sam   | XYZ  | 99
23   | 10/8/2018      | 01/01/3030   | Sam   | XYZ  | 98
23   | 10/8/2018      | 01/01/3030   | Sam   | XYZ  | 97
21   | 11/8/2018      | 01/01/3030   | JOHN  | TGI  | 65
21   | 11/8/2018      | 01/01/3030   | JOHN  | TGI  | 64
21   | 11/8/2018      | 01/01/3030   | JOHN  | TGI  | 63
30   | 11/8/2018      | 01/01/3030   | Chris | MNY  | 70

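Say I have this table, with over a million rows in total. I want to update the rows that are duplicated on Observation dt and retirement dt. I don't want to set all the observation dates to the same date; I want each of them to differ by one day. I have entered the desired values manually below. How can I do this in SQL, SSIS or any programming language? This is an MSSQL database table. I am new to SQL, so any help is appreciated. Thanks!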

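The combination of HKey and Observation_dt is the primary key, and applying the constraint throws an error, so I am trying to retire all the duplicate records by changing retirement_dt and observation_dt at the same time. The retirement dt would be today's date, and observation_dt can be the original date minus one day, going back one more day for each additional duplicate.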

What the table should look like after the code runs:

Hkey | Observation dt | Retriment_dt | Name  | Code | Masterkey
-----+----------------+--------------+-------+------+----------
23   | 10/8/2018      | 01/01/3030   | Sam   | XYZ  | 99
23   | 10/7/2018      | 12/17/2018   | Sam   | XYZ  | 98
23   | 10/6/2018      | 12/17/2018   | Sam   | XYZ  | 97
21   | 11/8/2018      | 01/01/3030   | JOHN  | TGI  | 65
21   | 11/7/2018      | 12/17/2018   | JOHN  | TGI  | 64
21   | 11/6/2018      | 12/17/2018   | JOHN  | TGI  | 63
30   | 11/8/2018      | 01/01/3030   | Chris | MNY  | 70

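【Comments】:

  • You mention in the question that the table is in a MySQL database, but you have tagged the question with SQL Server, SSIS and TSQL. Which is it?
  • My apologies, Martin, I meant MSSQL.

Tags: sql sql-server database tsql ssis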


【Solution 1】:

You can use the following solution:

IF OBJECT_ID('tempdb..#YourTable') IS NOT NULL
    DROP TABLE #YourTable

SELECT
    V.Hkey,
    [Observation dt] = CONVERT(DATE, V.[Observation dt]),
    [Retriment_dt] = CONVERT(DATE, V.[Retriment_dt])
INTO
    #YourTable
FROM
    (VALUES
    (23,'2018-08-10','3030-01-01'),
    (23,'2018-08-10','3030-01-01'),
    (23,'2018-08-10','3030-01-01'),
    (21,'2018-08-10','3030-01-01'),
    (21,'2018-08-10','3030-01-01'),
    (21,'2018-08-10','3030-01-01'),
    (30,'2018-08-10','3030-01-01')) V(Hkey, [Observation dt], [Retriment_dt])

;WITH DuplicateRecords AS
(
    SELECT
        T.HKey,
        T.[Observation dt]
    FROM
        #YourTable T
    GROUP BY
        T.HKey,
        T.[Observation dt]
    HAVING
        COUNT(1) > 1
),
RowNumber AS
(
    SELECT
        T.Hkey,
        T.[Observation dt],
        T.[Retriment_dt],
        RowNumberByHkey = ROW_NUMBER() OVER (PARTITION BY T.Hkey ORDER BY T.[Observation dt], T.[Retriment_dt])
    FROM
        #YourTable AS T
        INNER JOIN DuplicateRecords AS D ON
            T.Hkey = D.Hkey AND
            T.[Observation dt] = D.[Observation dt]
),
UpdatedValues AS
(
    SELECT
        R.Hkey,
        R.[Observation dt],
        R.[Retriment_dt],
        NewObservationDT = DATEADD(
            DAY,
            -1 * (R.RowNumberByHkey - 1),
            R.[Observation dt]),
        NewRetirementDT = GETDATE(),
        R.RowNumberByHkey
    FROM
        RowNumber AS R
),
RecordsToUpdate AS
(
    -- Need a row number to be able to update correctly, since the record is duplicated (need an ID to join)
    SELECT
        T.Hkey,
        T.[Observation dt],
        T.[Retriment_dt],
        RowNumberByHkey = ROW_NUMBER() OVER (PARTITION BY T.Hkey ORDER BY T.[Observation dt], T.[Retriment_dt])
    FROM
        #YourTable AS T
)
UPDATE T SET
    [Observation dt] = R.NewObservationDT,
    [Retriment_dt] = R.NewRetirementDT
FROM
    RecordsToUpdate AS T
    INNER JOIN UpdatedValues AS R ON
        T.HKey = R.HKey AND
        T.[Observation dt] = R.[Observation dt] AND
        T.RowNumberByHkey = R.RowNumberByHkey




SELECT 
    * 
FROM 
    #YourTable AS T 
ORDER BY 
    T.Hkey, 
    T.[Observation dt] DESC

Result:

Hkey    Observation dt  Retriment_dt
21      2018-08-10      2018-12-18
21      2018-08-09      2018-12-18
21      2018-08-08      2018-12-18
23      2018-08-10      2018-12-18
23      2018-08-09      2018-12-18
23      2018-08-08      2018-12-18
30      2018-08-10      3030-01-01

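This is a little tricky because each duplicate record has to be updated with a different value, so you need to generate some kind of unique ID (I used a row number) to be able to match the rows.

The distinct dates are generated by applying DATEADD with a row number that is partitioned by HKey. This produces dates that are 1 day apart.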
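
The same idea can be written more compactly, because SQL Server allows an UPDATE to target a CTE that reads from a single table. The sketch below is not the code from this answer. It is a minimal variant under a few assumptions: the real table is called dbo.YourTable (a placeholder name), it has the Masterkey column shown in the question, and the row with the highest Masterkey in each duplicate group is the one that keeps its dates. Unlike the result above, the kept row retains its original Retriment_dt.

```sql
-- Sketch only: dbo.YourTable is a placeholder, and ordering by Masterkey
-- is an assumption about which duplicate should keep its original dates.
;WITH Ranked AS
(
    SELECT
        T.[Observation dt],
        T.[Retriment_dt],
        -- 1 for the row to keep, 2, 3, ... for the rows to shift
        rn = ROW_NUMBER() OVER (
            PARTITION BY T.Hkey, T.[Observation dt]
            ORDER BY T.Masterkey DESC)
    FROM
        dbo.YourTable AS T
)
UPDATE Ranked SET
    -- rn = 2 moves back 1 day, rn = 3 moves back 2 days, and so on
    [Observation dt] = DATEADD(DAY, -(rn - 1), [Observation dt]),
    [Retriment_dt] = CONVERT(DATE, GETDATE())
WHERE
    rn > 1;
```

If an Hkey already has a row dated a day or two before the duplicated date, the shifted rows can collide with it, so it is worth checking for duplicates again after the update.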

【Comments】:

    【Solution 2】:

    My colleague did it in a similar way, but thanks for the reply. I have posted the code that was used.

    SELECT [healthplanentryhistory_avi_hkey]
        ,[effective_date]
        ,[expiration_date]
        ,[healthplanentryhistoryid]
        ,[hospitalmasterid]
        ,[plancode]
        ,[plangeneration]
        ,[code]
        ,[pawvalue]
        ,[quantitycoveredbyplan]
        ,[healthplanentrymasterid]
        ,[healthplanentryid]
        ,[healthplanid]
        ,[lastupdate]
        ,[origpawvalue]
        ,[active_ind]
        ,[hash_diff]
        ,[source_sys_id]
        ,[create_date]
        ,[update_date]
        ,cnt
        ,Rank
    INTO ##tmphph
    FROM (
        SELECT *
            ,COUNT(*) OVER (PARTITION BY [healthplanentryhistory_avi_hkey]) AS cnt
            ,RANK() OVER (
                PARTITION BY [healthplanentryhistory_avi_hkey] ORDER BY healthplanentryhistoryid DESC
                ) AS Rank
        FROM [atf_healthplanentryhistory_avi]
        ) AS t
    WHERE t.cnt > 1
        AND t.rank > 1
    ORDER BY healthplanentryhistoryid DESC;
    
    ---SELECT * FROM ##tmphph where healthplanentryhistory_avi_hkey = 0x039E7D809F8138B703FC9991E9D8F655
    MERGE INTO [atf_healthplanentryhistory_avi] atf
    USING ##tmphph TEMP
        ON atf.healthplanentryhistory_avi_hkey = TEMP.[healthplanentryhistory_avi_hkey]
            AND atf.effective_date = TEMP.effective_date
            AND atf.healthplanentryhistoryid = TEMP.healthplanentryhistoryid
            AND TEMP.rank > 1
    WHEN MATCHED
        THEN
            UPDATE
            SET atf.effective_date = getdate() - TEMP.rank /* Sets effective_date to the current date and time minus rank days */
                ,expiration_date = getdate() - TEMP.rank
                ,active_ind = 0;
    
    DROP TABLE ##tmphph
    

    【Comments】:

      【Solution 3】:

      Using a temporary table:

      Create Table #tbl
      (
      hkey Int,
      Observation Date,
      Retriment Date
      )
      Insert Into #tbl Values
      (23,'2018-10-08','3030-01-01'),
      (23,'2018-10-08','3030-01-01'),
      (23,'2018-10-08','3030-01-01'),
      (21,'2018-11-08','3030-01-01'),
      (21,'2018-11-08','3030-01-01'),
      (21,'2018-11-08','3030-01-01'),
      (30,'2018-11-08','3030-01-01')
      
      
      Select Row_Number() OVER(Order By (Select Null)) As raworder,*  Into #temp From #tbl
      
      Select hkey,
              DateAdd(Day,-Row_Number() Over (Partition By hkey Order By hkey)+1 , Observation) As newDT,  
              Case When (Row_Number() Over (Partition By hkey Order By hkey) = 1) Then Retriment Else Convert(Date,GetDate()) End As Retriment
          From #temp
         Order By raworder
      

      Result:

      hkey    newDT       Retriment
      23      2018-10-08  3030-01-01
      23      2018-10-07  2018-12-18
      23      2018-10-06  2018-12-18
      21      2018-11-08  3030-01-01
      21      2018-11-07  2018-12-18
      21      2018-11-06  2018-12-18
      30      2018-11-08  3030-01-01
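
      Note that the query above only selects the shifted values and leaves #tbl unchanged. Once the real table has been updated, a guard like the following could add the key only when no duplicates remain. This is a sketch rather than part of the answer: dbo.YourTable and the constraint name PK_YourTable are placeholders, the column names are taken from the question, and both key columns are assumed to be declared NOT NULL, which a primary key requires.

```sql
-- Sketch only: dbo.YourTable and PK_YourTable are placeholder names.
IF NOT EXISTS
(
    -- Any row returned here is a key pair that still occurs more than once
    SELECT 1
    FROM dbo.YourTable AS T
    GROUP BY T.Hkey, T.[Observation dt]
    HAVING COUNT(*) > 1
)
BEGIN
    -- Both columns must be NOT NULL for this to succeed
    ALTER TABLE dbo.YourTable
        ADD CONSTRAINT PK_YourTable PRIMARY KEY (Hkey, [Observation dt]);
END
ELSE
BEGIN
    RAISERROR('Duplicate (Hkey, Observation dt) pairs remain; key not added.', 16, 1);
END
```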
      

      【Comments】:
