【Question title】: Partition By over Two Columns in Row_Number function
【Posted】: 2017-09-15 15:15:30
【Question】:

I am trying to rank records using the following query:

SELECT 
ROW_NUMBER() over (partition by 
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate 
order by TW.EMPL_ID,TW.Effective_Bdate) RN,
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from 
TT_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'

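However, the resulting Row_Number computed column appears to partition by the first column only. Ideally, I want Row_Number to have the same value wherever the data in the partition by columns is the same.

Any clue as to where I might be going wrong?

Using rank or dense rank is not an option, because I want to identify all such rows, across multiple employees, where EMPL_ID, HR_DEPT_ID and Transfer_StartDate are the same (RN=1).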

Sample data:
RN  AON_EMPL_ID   HR_DEPT_ID    Transfer_Startdate  Effective_BDate
1   0100690       69895             01/01/2017       2017-01-01
2   0100690       69895             01/01/2017       2017-01-03
3   0100690       69895             01/01/2017       2017-01-04

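【Comments】:

  • I think sample data and the desired results would help you explain what you want.
  • Sample data as ddl+dml, along with the desired results, would help...
  • Your Order by determines the numbering within each partition. Try Order by TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,TW.RN, or remove the Order by.
  • Even without the order by, the RN number still differs for the last row. It should be 1, because the first three columns used in Partition By have the same data.
  • @Sharktooth What you are describing is RANK or DENSE_RANK, not ROW_NUMBER. Replace the latter with one of the former, depending on how you want to display the second-place data.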
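
The difference raised in the last comment can be seen on the three sample rows from the question. The following is a minimal sketch, assuming the rows are supplied inline through a values list; the CTE name sample_rows and the output column names are illustrative and do not come from the original query.

```sql
-- Sketch only: sample_rows and the rn_* aliases are illustrative names.
with sample_rows as (
    select *
    from (values
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-01' as date)),
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-03' as date)),
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-04' as date))
    ) as v (EMPL_ID, HR_DEPT_ID, Transfer_Startdate, Effective_BDate)
)
select
    -- Always numbers the rows of a partition 1, 2, 3, ...
    row_number() over (
        partition by EMPL_ID, HR_DEPT_ID, Transfer_Startdate
        order by Effective_BDate
    ) as rn_row_number,
    -- Ordering only by a partition column makes every row a peer,
    -- so rank() assigns the same value to all of them.
    rank() over (
        partition by EMPL_ID, HR_DEPT_ID, Transfer_Startdate
        order by EMPL_ID
    ) as rn_rank_all_ties,
    EMPL_ID, HR_DEPT_ID, Transfer_Startdate, Effective_BDate
from sample_rows
order by Effective_BDate;
```

All three rows share the same EMPL_ID, HR_DEPT_ID and Transfer_Startdate, so they fall into a single partition. ROW_NUMBER numbers them 1, 2, 3, which is the output shown in the question, while RANK ordered by a partition column gives every row the value 1.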

Tags: sql sql-server window-functions


【Solution 1】:

Extending the sample data to:

create table t (
    aon_empl_id varchar(16)
  , hr_dept_id varchar(16)
  , Transfer_Startdate date
  , Effective_bdate date
);
insert into t values 
 ('0100690','69895','01/01/2017','2017-01-01')
,('0100690','69895','01/01/2017','2017-01-03')
,('0100690','69895','01/01/2017','2017-01-04')
,('0200700','69895','01/01/2016','2016-01-01')
,('0200700','69895','01/01/2016','2016-01-03')
,('0200700','69896','01/01/2017','2017-01-04')
,('0200700','69896','01/01/2017','2017-01-04');

Using top with ties:

select top 1 with ties
    aon_empl_id
  , hr_dept_id
  , Transfer_Startdate = convert(char(10),Transfer_Startdate,120)
  , Effective_bdate    = convert(char(10),Effective_bdate,120)
from t
order by row_number() over (
      partition by aon_empl_id, hr_dept_id, Transfer_Startdate 
      order by Effective_bdate
      )

rextester demo: http://rextester.com/KOIZ42069

Returns:

+-------------+------------+--------------------+-----------------+
| aon_empl_id | hr_dept_id | Transfer_Startdate | Effective_bdate |
+-------------+------------+--------------------+-----------------+
|     0100690 |      69895 | 2017-01-01         | 2017-01-01      |
|     0200700 |      69895 | 2016-01-01         | 2016-01-01      |
|     0200700 |      69896 | 2017-01-01         | 2017-01-04      |
+-------------+------------+--------------------+-----------------+
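
To see why top 1 with ties returns one row per group, it can help to display the value that the order by expression produces for each row. This is a minimal sketch, assuming the extended sample data is supplied inline through a values list rather than read from table t; the CTE name sample_rows is illustrative.

```sql
-- Sketch only: sample_rows is an illustrative stand-in for table t.
with sample_rows as (
    select *
    from (values
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-01' as date)),
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-03' as date)),
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-04' as date)),
        ('0200700', '69895', cast('2016-01-01' as date), cast('2016-01-01' as date)),
        ('0200700', '69895', cast('2016-01-01' as date), cast('2016-01-03' as date)),
        ('0200700', '69896', cast('2017-01-01' as date), cast('2017-01-04' as date)),
        ('0200700', '69896', cast('2017-01-01' as date), cast('2017-01-04' as date))
    ) as v (aon_empl_id, hr_dept_id, Transfer_Startdate, Effective_bdate)
)
select
    -- The same expression the top 1 with ties query sorts by.
    rn = row_number() over (
        partition by aon_empl_id, hr_dept_id, Transfer_Startdate
        order by Effective_bdate
    )
  , aon_empl_id
  , hr_dept_id
  , Transfer_Startdate
  , Effective_bdate
from sample_rows
order by aon_empl_id, hr_dept_id, Transfer_Startdate, Effective_bdate;
```

Each of the three groups contains exactly one row with rn = 1. top 1 picks the first row when sorting by that value, and with ties adds every other row that shares it, which is why three rows come back. In the last group the two rows are identical, so it does not matter which of them receives rn = 1.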

An alternative using a common table expression with row_number():

;with cte as (
select
    rn = row_number() over (
      partition by aon_empl_id, hr_dept_id, Transfer_Startdate 
      order by Effective_bdate
    )
  , aon_empl_id
  , hr_dept_id
  , Transfer_Startdate = convert(char(10),Transfer_Startdate,120)
  , Effective_bdate    = convert(char(10),Effective_bdate,120)
from t tw
)

select *
from cte
where rn = 1

Returns:

+----+-------------+------------+--------------------+-----------------+
| rn | aon_empl_id | hr_dept_id | Transfer_Startdate | Effective_bdate |
+----+-------------+------------+--------------------+-----------------+
|  1 |     0100690 |      69895 | 2017-01-01         | 2017-01-01      |
|  1 |     0200700 |      69895 | 2016-01-01         | 2016-01-01      |
|  1 |     0200700 |      69896 | 2017-01-01         | 2017-01-04      |
+----+-------------+------------+--------------------+-----------------+

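【Comments】:

  • This works for the one employee I used as the sample record. For many employees the ranking repeats: for the row set above, for example, all rows would be 1, but for the next employee they would be 2. I want RN = 1 for all employee records where empl_id, hr_dept_id and transfer_startdate are the same.
  • @Sharktooth You have not explained when you want it to be something other than 1, or which columns belong in the partition, e.g. (partition by aon_empl_id order by aon_empl_id, hr_dept_id, Transfer_Startdate)
  • The partition by would have the following columns: aon_empl_id, hr_dept_id, Transfer_Startdate. If these columns have different unique values across more than one row, RN should increment by 1; otherwise it should stay at 1.
  • @Sharktooth That sounds like what you are already doing. But I took another shot at it and updated my answer.
【Solution 2】: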
SELECT 
RANK() over (partition by   --or DENSE_RANK()
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate 
order by TW.EMPL_ID,TW.Effective_Bdate) RN,
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from 
TT_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'

Update:

SELECT 
RANK() over (partition by   --or DENSE_RANK()
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate 
order by TW.EMPL_ID) RN,
TW.EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from 
TT_EMPLOYEE_WORKDAY TW
where TW.HR_DOMAIN_CODE = 'SGP'
Order by RN,TW.Effective_Bdate

【Comments】:

    【Solution 3】:

    This code seems to work correctly:

    SELECT 
    dense_rank() over (partition by AON_EMPL_ID 
    order by AON_EMPL_ID,HR_DEPT_ID,Transfer_StartDate) RN,
    TW.AON_EMPL_ID,TW.HR_DEPT_ID,TW.Transfer_Startdate,Effective_BDate from 
    TT_AON_EMPLOYEE_WORKDAY TW
    where TW.HR_DOMAIN_CODE = 'SGP'
    

    Apparently, I only needed to partition by AON_EMPL_ID, and everything else should go in the Order By clause.
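
The behaviour described here can be checked against the extended sample data from Solution 1. The following is a minimal sketch, assuming the rows are supplied inline through a values list instead of TT_AON_EMPLOYEE_WORKDAY; the CTE name sample_rows is illustrative, and the HR_DOMAIN_CODE filter is left out because the sample data has no such column.

```sql
-- Sketch only: sample_rows is an illustrative stand-in for the real table.
with sample_rows as (
    select *
    from (values
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-01' as date)),
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-03' as date)),
        ('0100690', '69895', cast('2017-01-01' as date), cast('2017-01-04' as date)),
        ('0200700', '69895', cast('2016-01-01' as date), cast('2016-01-01' as date)),
        ('0200700', '69895', cast('2016-01-01' as date), cast('2016-01-03' as date)),
        ('0200700', '69896', cast('2017-01-01' as date), cast('2017-01-04' as date)),
        ('0200700', '69896', cast('2017-01-01' as date), cast('2017-01-04' as date))
    ) as v (AON_EMPL_ID, HR_DEPT_ID, Transfer_Startdate, Effective_BDate)
)
select
    -- Numbering restarts per employee; rows that share HR_DEPT_ID and
    -- Transfer_Startdate are ties and receive the same dense_rank value.
    dense_rank() over (
        partition by AON_EMPL_ID
        order by HR_DEPT_ID, Transfer_Startdate
    ) as RN,
    AON_EMPL_ID, HR_DEPT_ID, Transfer_Startdate, Effective_BDate
from sample_rows
order by AON_EMPL_ID, RN, Effective_BDate;
```

With this data, the three rows of employee 0100690 all get RN = 1, and employee 0200700 gets RN = 1 for the two rows in department 69895 and RN = 2 for the two rows in department 69896. Listing AON_EMPL_ID again in the order by, as the query above does, is harmless but redundant, because that column is constant within each partition.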

    【Comments】:
