【Question title】: Merge employee history records if there is no change between the rows based on start date
【Posted】: 2021-11-17 09:51:50
【Question description】:

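I am trying to merge employee history records, taking the minimum start date and the maximum end date, when nothing has changed in any of the other dimension columns (employee, department, job, position status).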

Input: (image not preserved)

Output: (image not preserved)

Script to create the table and populate the data:

create table EmployeeHistory (EmployeeHistoryID INT,
                              EmployeeID INT,
                              DepartmentID  INT,
                              JobID INT,
                              PositionStatusID  INT,
                              StartDate DATE,
                              EndDate DATE)

insert into EmployeeHistory values (123, 362880, 450, 243, 1, '2019-05-28', '2020-05-03')
insert into EmployeeHistory values (124, 362880, 450, 243, 2, '2020-05-04', '2020-08-20')
insert into EmployeeHistory values (125, 362880, 450, 243, 1, '2020-08-21', '2020-08-31')
insert into EmployeeHistory values (126, 362880, 450, 243, 1, '2020-09-01',  '2021-09-23')
insert into EmployeeHistory values (127, 362881, 450, 243, 1, '2019-07-01', '2019-07-31')
insert into EmployeeHistory values (128, 362881, 450, 243, 1, '2019-08-01',  '2021-09-23')

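When I use analytic functions or GROUP BY, rows 1, 3 and 4 are merged, but I only want to merge rows 3 and 4, which are consecutive and identical in all the other columns. Row 1 has the same values as rows 3 and 4, but in this case it must not be merged with them, so that the history is preserved.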

The sample code I am using:

select distinct *
  from (select MAX(EmployeeHistoryID) OVER (PARTITION BY EmployeeID, DepartmentID, JobID, PositionStatusID)  AS EmployeeHistoryID,
               EmployeeID,
               DepartmentID,
               JobID,
               PositionStatusID,
               MIN(StartDate) OVER (PARTITION BY EmployeeID, DepartmentID, JobID, PositionStatusID)  AS StartDate,
               MAX(EndDate) OVER (PARTITION BY EmployeeID, DepartmentID, JobID, PositionStatusID)  AS EndDate
          from EmployeeHistory) m

【Question discussion】:

    Tags: sql sql-server tsql grouping aggregate-functions


    【Solution 1】:

    This is a gaps-and-islands problem (a class of problems about combining adjacent rows that carry similar information).

    In your data, each employee's records "tile" together perfectly, with no gaps: the start date of each row is the end date of the previous row plus one day.

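    This lets you solve the problem with window functions alone, and avoiding aggregation is often a performance win. The idea is to find the first row of each run, keep that row, and compute its end date. A row starts a new run when its previous row with the same attributes (prev_StartDate_same) either does not exist or is not the employee's immediately preceding row (prev_StartDate). The end date is a bit trickier: it is the day before the next kept row's start date or, for the employee's last kept row, the employee's maximum end date: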

    select eh.EmployeeHistoryID, eh.EmployeeID, eh.DepartmentID, eh.JobID, eh.PositionStatusID, eh.StartDate,
           lead(dateadd(day, -1, StartDate), 1, max_EndDate) over (partition by EmployeeId order by StartDate) as EndDate
    from (select eh.*,
                 lag(StartDate) over (partition by EmployeeID order by StartDate) as prev_StartDate,
                 lag(StartDate) over (partition by EmployeeID, DepartmentID, JobID, PositionStatusID order by StartDate) as prev_StartDate_same,
                 max(EndDate) over (partition by EmployeeId) as max_EndDate
          from EmployeeHistory eh
         ) eh
    where prev_StartDate_same is null or prev_StartDate_same <> prev_StartDate
    order by EmployeeHistoryID;
    

    Here is a dbfiddle.

    【Discussion】:

      【Solution 2】:

      If I understand correctly, this is easy to do with GROUP BY. See whether this gives what you expect:

      SELECT Max(employeehistoryid) AS EmployeeHistoryID,
             employeeid,
             departmentid,
             jobid,
             positionstatusid,
             Min(startdate)         AS StartDate,
             Max(enddate)           AS EndDate
      FROM   employeehistory
      GROUP  BY employeeid,
                departmentid,
                jobid,
                positionstatusid 
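A plain GROUP BY over the four attribute columns puts rows 1, 3 and 4 into one group, which is what the question says it must avoid. A minimal sketch of one way to keep the GROUP BY, assuming the question's EmployeeHistory table: add an island key computed with the ROW_NUMBER-difference technique, a standard gaps-and-islands trick swapped in here (it is not part of the answer above). The leading CTE, which reuses the table name and shadows the real table, only reproduces the six sample rows (dates as ISO-format strings, which sort correctly as text) so the query runs on its own. To run it against the real table, drop that CTE and keep `with` in front of `numbered`. The `grp` and `numbered` names are illustrative.

```sql
-- Sketch: GROUP BY that merges only *consecutive* rows with identical attributes.
-- Sample data from the question's insert script (shadows the real table; drop
-- this CTE to query EmployeeHistory directly).
with EmployeeHistory as (
    select 123 as EmployeeHistoryID, 362880 as EmployeeID, 450 as DepartmentID,
           243 as JobID, 1 as PositionStatusID,
           '2019-05-28' as StartDate, '2020-05-03' as EndDate
    union all select 124, 362880, 450, 243, 2, '2020-05-04', '2020-08-20'
    union all select 125, 362880, 450, 243, 1, '2020-08-21', '2020-08-31'
    union all select 126, 362880, 450, 243, 1, '2020-09-01', '2021-09-23'
    union all select 127, 362881, 450, 243, 1, '2019-07-01', '2019-07-31'
    union all select 128, 362881, 450, 243, 1, '2019-08-01', '2021-09-23'
),
numbered as (
    select eh.*,
           -- position of the row in the employee's whole history ...
           row_number() over (partition by EmployeeID order by StartDate)
           -- ... minus its position among rows with the same attributes.
           -- The difference stays constant across a run of consecutive
           -- identical rows and grows whenever another row interrupts the run.
         - row_number() over (partition by EmployeeID, DepartmentID, JobID, PositionStatusID
                              order by StartDate) as grp
    from EmployeeHistory eh
)
select max(EmployeeHistoryID) as EmployeeHistoryID,
       EmployeeID, DepartmentID, JobID, PositionStatusID,
       min(StartDate) as StartDate,
       max(EndDate)   as EndDate
from numbered
group by EmployeeID, DepartmentID, JobID, PositionStatusID, grp
order by EmployeeID, min(StartDate);
```

On the sample rows this keeps row 1 separate and merges only rows 3 and 4 (and rows 5 and 6 for the second employee):

    123, 362880, 450, 243, 1, 2019-05-28, 2020-05-03
    124, 362880, 450, 243, 2, 2020-05-04, 2020-08-20
    126, 362880, 450, 243, 1, 2020-08-21, 2021-09-23
    128, 362881, 450, 243, 1, 2019-07-01, 2021-09-23

Because each run's end date is `max(EndDate)`, this variant does not depend on the rows tiling without gaps. Like the question's query, it reports `MAX(EmployeeHistoryID)` for each run.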
      

      【Discussion】:
