【Question title】: T-SQL: Partitioning on multiple columns
【Posted】: 2022-01-18 23:25:43
【Question】:

I am having trouble partitioning with the following SQL statement:

Declare @Total int;
Declare @MaxBlockSize int = 3;
Declare @testGrpPct float = .25;

DECLARE @students TABLE(
    id bigint NOT NULL
    ,TimeZone nvarchar(50)
)

Insert  into @students (id, TimeZone)
values  (154058701677130000,'Central Standard Time')
        ,(157089441549513000,'Central Standard Time')
        ,(152873971640300000,'Central Standard Time')
        ,(153611923609744000,'Mountain Standard Time')
        ,(157091006083626000,'Mountain Standard Time')
        ,(157087925333783000,'Mountain Standard Time')
        ,(153610132054733000,'Central Standard Time')
        ,(154060631031804000,'Central Standard Time')
        ,(157088513769468000,'Central Standard Time')
        ,(153615959083840000,'Central Standard Time')
        ,(152813428061631000,'Central Standard Time')
        ,(156948713062134000,'Central Standard Time')
        ,(153609396063433000,'Central Standard Time')
        ,(157092455047885000,'Central Standard Time')
        ,(153505362979714000,'Central Standard Time')
        ,(152814176216413000,'Central Standard Time')
        ,(157094637059044000,'Mountain Standard Time')
        ,(157089221575046000,'Mountain Standard Time')
        ,(152806972331521000,'Mountain Standard Time')
        ,(157087495031747000,'Mountain Standard Time')
        ,(157092954337834000,'Mountain Standard Time')
        ,(157094331126510000,'Mountain Standard Time')
        ,(152873684187870000,'Mountain Standard Time')
        ,(157090267743515000,'Mountain Standard Time')
        ,(157093842020332000,'Mountain Standard Time')
        ,(157088933174703000,'Mountain Standard Time')

Set @Total = (
    Select  Count(*)
    FROM    @students
)

Select  WinningGroup
        ,CEILING(((ROW_NUMBER() over (partition by WinningGroup order by WinningGroup, timezone))-1)/@MaxBlockSize) BlockNbr
        ,id
        ,TimeZone
from    (
            --Determines who is in test vs winning groups
            SELECT  case when (ROW_NUMBER() OVER (ORDER BY Newid())) <= @testGrpPct * @Total then 0 else 1 end as WinningGroup
                    ,id
                    ,TimeZone
            FROM    @students
        ) A
ORDER   by WinningGroup
        ,CEILING(((ROW_NUMBER() over (partition by WinningGroup order by WinningGroup, timezone))-1)/@MaxBlockSize)

The desired result should look like this:

WinningGroup BlockNbr id TimeZone
0 0 152813428061631000 Central Standard Time
0 0 152813428061631000 Central Standard Time
0 0 153610132054733000 Central Standard Time
0 1 157087925333783000 Mountain Standard Time
0 1 157094331126510000 Mountain Standard Time
0 1 157094637059044000 Mountain Standard Time
0 2 152873684187870000 Mountain Standard Time
1 0 156948713062134000 Central Standard Time
1 0 154058701677130000 Central Standard Time
1 0 152814176216413000 Central Standard Time
1 1 154060631031804000 Central Standard Time
1 1 153609396063433000 Central Standard Time
1 1 157088513769468000 Central Standard Time
1 2 157092455047885000 Central Standard Time
1 2 152873971640300000 Central Standard Time
1 2 153505362979714000 Central Standard Time
1 3 153615959083840000 Central Standard Time
1 3 157089441549513000 Central Standard Time
1 4 157090267743515000 Mountain Standard Time
1 4 157092954337834000 Mountain Standard Time
1 4 153611923609744000 Mountain Standard Time
1 5 157091006083626000 Mountain Standard Time
1 5 157089221575046000 Mountain Standard Time
1 5 157087495031747000 Mountain Standard Time
1 6 157093842020332000 Mountain Standard Time
1 6 157088933174703000 Mountain Standard Time
1 6 152806972331521000 Mountain Standard Time

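The goal is to assign BlockNbr based on wgroup (WinningGroup) and TimeZone. Each block can hold at most 3 students (that is what CEILING and @MaxBlockSize are for). However, each block may contain only one time zone and one wgroup. If you look at the table above, you will see that WinningGroup 1, BlockNbr 3 has only 2 records, because the next record is in a different time zone. Those students are placed in a separate block so that every block contains exactly one time zone.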

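【Comments】:

  • Please provide sample data and the desired results as a minimal reproducible example.
  • Have you tried NTILE(@MaxBlockSize) OVER(PARTITION BY wgroup, timezone ORDER BY wgroup, timezone)?
  • Sample data and desired results are always useful for SQL query questions.
  • Your code looks like it returns a row id based on partitioning by wgroup and timezone. You need to add the members and then filter for rows where the row id is less than or equal to 3. How do you define a block number?
  • NTILE(@MaxBlockSize) would split all the records into at most 3 blocks, rather than putting 3 records in each block. Either way, I still appreciate the suggestion!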
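
The difference between the two approaches discussed in the comments can be seen on a tiny data set. This is only a sketch: the table variable `@rows` and its seven ids are made up for illustration and are not part of the question's data.

```sql
DECLARE @MaxBlockSize int = 3;

-- Hypothetical sample: seven rows, ids 1..7
DECLARE @rows TABLE (id int NOT NULL);
INSERT INTO @rows (id) VALUES (1), (2), (3), (4), (5), (6), (7);

SELECT  id
        -- NTILE(n) fixes the NUMBER of buckets at n, so bucket size grows with the row count
        ,NTILE(@MaxBlockSize) OVER (ORDER BY id) AS NtileBucket
        -- Integer division of a zero-based row number fixes the SIZE of each block at n
        ,(ROW_NUMBER() OVER (ORDER BY id) - 1) / @MaxBlockSize AS BlockNbr
FROM    @rows
ORDER BY id;

-- NtileBucket: 1, 1, 1, 2, 2, 3, 3   (always 3 buckets)
-- BlockNbr:    0, 0, 0, 1, 1, 1, 2   (at most 3 rows per block)
```

With more rows, `NtileBucket` would still stop at 3 while `BlockNbr` keeps growing, which is why the row-number division matches the "at most 3 students per block" requirement.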

Tags: sql sql-server tsql window-functions


【Solution 1】:

Try grouping before ordering

    Select  WinningGroup
            -- Partition by both columns so the row count restarts for each time zone within a group.
            -- (ROW_NUMBER() - 1) / @MaxBlockSize is integer division; CEILING is kept only to mirror the question.
            ,CEILING(((ROW_NUMBER() over (partition by WinningGroup, TimeZone order by id))-1)/@MaxBlockSize) BlockNbr
            ,id
            ,TimeZone
    from    (
                --Determines who is in test vs winning groups
                SELECT  case when (ROW_NUMBER() OVER (ORDER BY Newid())) <= @testGrpPct * @Total then 0 else 1 end as WinningGroup
                        ,id
                        ,TimeZone
                FROM    @students
            ) A
    -- No GROUP BY: it would collapse each (WinningGroup, TimeZone) pair to one row, making every BlockNbr 0
    ORDER   by WinningGroup
            ,TimeZone
            ,BlockNbr
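
Partitioning by both WinningGroup and TimeZone restarts the count for every time zone, whereas the desired output in the question numbers blocks continuously within each WinningGroup (Central blocks first, then Mountain). One possible way to get that numbering, offered as a sketch rather than a confirmed solution from the thread, is to compute a per-time-zone chunk first and then rank the (TimeZone, Chunk) pairs with DENSE_RANK. The table variable `@assigned`, the `Chunk` column, and the small ids below are hypothetical; in the question, WinningGroup would come from the NEWID() subquery instead.

```sql
DECLARE @MaxBlockSize int = 3;

-- Hypothetical rows that already carry a WinningGroup value
DECLARE @assigned TABLE (
    WinningGroup int NOT NULL
    ,id bigint NOT NULL
    ,TimeZone nvarchar(50) NOT NULL
);

INSERT INTO @assigned (WinningGroup, id, TimeZone)
VALUES  (0, 101, N'Central Standard Time')
        ,(0, 102, N'Central Standard Time')
        ,(1, 201, N'Central Standard Time')
        ,(1, 202, N'Central Standard Time')
        ,(1, 203, N'Central Standard Time')
        ,(1, 204, N'Central Standard Time')
        ,(1, 205, N'Central Standard Time')
        ,(1, 301, N'Mountain Standard Time')
        ,(1, 302, N'Mountain Standard Time')
        ,(1, 303, N'Mountain Standard Time')
        ,(1, 304, N'Mountain Standard Time');

SELECT  WinningGroup
        -- Step 2: rank the distinct (TimeZone, Chunk) pairs inside each group, zero-based
        ,DENSE_RANK() OVER (PARTITION BY WinningGroup ORDER BY TimeZone, Chunk) - 1 AS BlockNbr
        ,id
        ,TimeZone
FROM    (
            -- Step 1: split each (WinningGroup, TimeZone) partition into chunks of @MaxBlockSize rows
            SELECT  WinningGroup
                    ,id
                    ,TimeZone
                    ,(ROW_NUMBER() OVER (PARTITION BY WinningGroup, TimeZone ORDER BY id) - 1) / @MaxBlockSize AS Chunk
            FROM    @assigned
        ) C
ORDER BY WinningGroup
        ,BlockNbr
        ,id;

-- Group 0: ids 101, 102 get BlockNbr 0
-- Group 1: Central ids 201-203 get 0, ids 204-205 get 1;
--          Mountain ids 301-303 get 2, id 304 gets 3
```

The derived table is needed because a window function cannot be nested directly inside another window function's ORDER BY clause. Every block holds at most @MaxBlockSize rows and never mixes time zones, since a new time zone always starts a new (TimeZone, Chunk) pair.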

【Discussion】:
