【问题标题】:Netezza Parition by with excluding specific records排除特定记录的 Netezza 分区
【发布时间】:2016-06-16 16:58:50
【问题描述】:

我在 Netezza 的 web_event 表中有一些数据,格式如下。

vstr_id  |  sessn_id  |  sessn_ts            | wbpg_nm 
V1       |  V1S1      |  02-02-2015 09:20:00 | /home/login
V1       |  V1S1      |  02-02-2015 09:22:00 | -1
V1       |  V1S1      |  02-02-2015 09:30:00 | /home/contacts
V1       |  V1S1      |  02-02-2015 09:32:00 | -1
V1       |  V1S1      |  02-02-2015 09:50:00 | /home/search
V1       |  V1S1      |  02-02-2015 09:55:00 | -1
V2       |  V2S1      |  02-02-2015 09:10:00 | /home
V2       |  V2S1      |  02-02-2015 09:15:00 | /home/apps
V2       |  V2S2      |  02-02-2015 09:20:00 | /home/news
V2       |  V2S2      |  02-02-2015 09:23:00 | /home/news/internal

这是我的源表。

我正在尝试使用该 web_event 表并创建另一个表,如下所示。

我希望像下面这样加载 sessn_durtn 表和 time_on_pg 表。

1) time_on_page 列:是当前页面和下一页加载之间的时间差,如果没有其他事件或页面加载,会话的最后一页可以有 0 秒。它可以用分钟或秒来表示。

Insert into time_on_pg (select VSTR_ID,
           SESSN_ID,
           sessn_ts,
           WBPG_NM,
           ????? as time_on_page
           from web_event)

vstr_id  |  sessn_id  |  sessn_ts            | wbpg_nm              | wanted_time_on_page   | currently_known_time_on_page
V1       |  V1S1      |  02-02-2015 09:20:00 | /home/login          |   10mins              |   2mins
V1       |  V1S1      |  02-02-2015 09:22:00 | -1                   |                       |   8mins
V1       |  V1S1      |  02-02-2015 09:30:00 | /home/contacts       |   20mins              |   2mins
V1       |  V1S1      |  02-02-2015 09:32:00 | -1                   |                       |   18mins
V1       |  V1S1      |  02-02-2015 09:50:00 | /home/search         |   5mins               |   5mins
V1       |  V1S1      |  02-02-2015 09:55:00 | -1                   |                       |   

V2       |  V2S1      |  02-02-2015 09:10:00 | /home                |   5mins               |   5mins
V2       |  V2S1      |  02-02-2015 09:15:00 | /home/apps           |                       |

V2       |  V2S2      |  02-02-2015 09:20:00 | /home/news           |   3mins               |   3mins
V2       |  V2S2      |  02-02-2015 09:23:00 | /home/news/internal  |                       |

我们如何在 Netezza 或任何 SQL 查询中做到这一点?

我有逻辑来计算 current_known_time_on_page 使用

SELECT vstr_id,
   sessn_id,
   sessn_ts,
   wbpg_nm,
   ???????? AS wanted_time_on_page,
   extract(epoch from (lag(event_ts) over (partition by vstr_id, sessn_id order by event_ts DESC) - event_ts)) AS currently_known_time_on_page
   from web_event;

wanted_time_on_page 和 current_known_time_on_page 的主要区别是在计算除最后一页之外的时间差时消除“-1”页。

【问题讨论】:

    标签: sql stored-procedures netezza


    【解决方案1】:

    我不知道您的数据集有多大以及您有多少可用 RAM。这个查询是在内存中完成的。您可以将每个单独的 CTE 转换为临时表以提高速度。

    WITH CTE_SessionOrder AS (
    SELECT
         sessn_id
        ,sessn_ts       
        ,wbpg_nm
        ,ROW_NUMBER() OVER(PARTITION BY sessn_id ORDER BY sessn_ts DESC) AS RowNum  -- This is sorted Desc to get last row
    FROM
        web_event
    )
    ,CTE_KeepLastRow AS (
    SELECT *
    FROM
        CTE_SessionOrder
    WHERE
        RowNum = 1
        AND wbpg_nm = '-1'
    )
    ,CTE_OtherRows AS (
    SELECT *
    FROM
        CTE_SessionOrder
    WHERE
        wbpg_nm != '-1'
    )
    ,CTE_FilteredData AS (
    SELECT sessn_id,sessn_ts,wbpg_nm FROM CTE_KeepLastRow
    UNION
    SELECT sessn_id,sessn_ts,wbpg_nm FROM CTE_OtherRows
    )
    ,CTE_FilterOrderedData AS (
    SELECT
         *
        ,ROW_NUMBER() OVER(PARTITION BY sessn_id ORDER BY sessn_ts) AS RowNum   -- Now Ordered Asc
    FROM
        CTE_FilteredData
    )
    ,CTE_FinalData AS (
    SELECT
         D1.sessn_id
        ,D1.sessn_ts       
        ,D1.wbpg_nm
        ,DATEDIFF(mi,D1.sessn_ts,D2.sessn_ts) time_on_page
    FROM
        CTE_FilterOrderedData D1
        LEFT JOIN CTE_FilterOrderedData D2
            ON  D1.sessn_id = D2.sessn_id
                AND D1.RowNum + 1 = D2.RowNum
    UNION
    SELECT
         sessn_id
        ,sessn_ts       
        ,wbpg_nm
        ,CAST(NULL AS INT) time_on_page
    FROM
        CTE_SessionOrder
    WHERE
        RowNum != 1
        AND wbpg_nm = '-1'
    )
    SELECT *
    FROM
        CTE_FinalData
    

    【讨论】:

    • Arleigh 您提供的结果集仅保留 1 '-1' 结果在您的答案中有 3 在他的结果集和起始表中
    • 没有意识到你也想要那个。更新了代码以包含它。谢谢。
    • 我试图弄清楚为什么疯狂的长帖子然后我意识到我没有拿起最后一个 -1 并且我在 OUTER APPLY 中错过了它,我将调整外部应用但我刚刚整理了一个 CTE,您可能想看看使用 2 行号 2 自引用并在 ROW_NUMBER 函数中调整 PARTITION BY 可以帮助您更快地获得所需的结果。
    • 您的解决方案是 MSSQL 方言吗? CTE 在 Netezza 中可以正常工作,但必须放弃 DATEDIFF 以支持 AGE() 或提取时代的数学。此外,这里没有任何东西会迫使它成为 Netezza 的内存。您的 SQL 对我来说非常清晰易读,这是一个加分项!
    • @ScottMcG 是的,它在 MSSQL 中,在 SQL Server 2012 中测试。我不熟悉 Netezza。我只是认为 Netezza 是创建数据集的源系统。我不知道它有自己的语法。我会记住这一点。
    【解决方案2】:

    我认为 event_ts 和 sessn_ts 一样 ????无论如何,这里有一个对你有用的查询,它使用OUTER APPLY 技术在表中查找(> sessn_ts) 之后不是网页-1 的结果,然后按升序获得最高结果。

    只需将表名更改为您的表即可。

    这是一个主要使用outer apply 的解决方案,但也使用公用表表达式 (cte) 来设置最后需要的'-1' 的时间。

    ;WITH cteMaxNeg1 AS (
        SELECT
           sessn_id
           ,MaxNeg1SessnTs = MAX(CASE WHEN we.webpg_nm = '-1' THEN we.sessn_ts ELSE NULL END)
           ,MaxPageSessnTs = MAX(CASE WHEN we.webpg_nm <> '-1' THEN we.sessn_ts ELSE NULL END)
        FROM
           @WebEvents we
        GROUP BY
           sessn_id
    )
    
    SELECT
        we.*
        ,currently_known_time_on_page = ISNULL(LAG(we.sessn_ts) over (partition by we.vstr_id, we.sessn_id order by we.sessn_ts DESC) - we.sessn_ts,CAST(0 AS DATETIME))
        ,WantedTimeOnPage = CASE
           WHEN we.sessn_ts = m.MaxPageSessnTs AND we.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,we.sessn_ts,m.MaxNeg1SessnTs)
           WHEN we.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,we.sessn_ts,o.sessn_ts)
           ELSE NULL
        END
    FROM
        @WebEvents we
        LEFT JOIN cteMaxNeg1 m
        ON we.sessn_id = m.sessn_id
        OUTER APPLY (
           SELECT TOP 1sessn_ts
           FROM
              @WebEvents i
           WHERE
              i.webpg_nm <> '-1'
              AND i.sessn_id = we.sessn_id
              AND i.sessn_ts > we.sessn_ts
    
           ORDER BY
              i.sessn_ts ASC
    
        ) o
    ORDER BY
        we.sessn_id
        ,we.sessn_ts
    

    这是一个仅使用 CTE 和窗口函数的解决方案

    ;WITH cte AS (
        SELECT
           *
           ,RowNum = ROW_NUMBER() OVER (PARTITION BY sessn_id, IIF(webpg_nm = '-1',0,1) ORDER BY sessn_ts)
           ,LastNeg1RowNum = ROW_NUMBER() OVER (PARTITION BY sessn_id, IIF(webpg_nm = '-1',0,1) ORDER BY sessn_ts DESC)
        FROM
           @WebEvents
    )
    
    SELECT
        c1.*
        ,WantedTimeOnPage = CASE
           WHEN c1.LastNeg1RowNum = 1 AND c1.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,c1.sessn_ts,c3.sessn_ts)
           WHEN c1.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,c1.sessn_ts,c2.sessn_ts)
           ELSE NULL
        END
    FROM
        cte c1
        LEFT JOIN cte c2
        ON c1.sessn_id = c2.sessn_id
        AND (c1.RowNum + 1) = c2.RowNum
        AND c2.webpg_nm <> '-1'
        LEFT JOIN cte c3
        ON c1.sessn_id = c3.sessn_id
        AND c3.LastNeg1RowNum = 1 
        AND c3.webpg_nm = '-1'
    ORDER BY
        c1.sessn_id
        ,c1.sessn_ts
    

    我从你那里得到的测试数据:

    DECLARE @WebEvents AS TABLE (vstr_id CHAR(2), sessn_id CHAR(5), sessn_ts DATETIME, webpg_nm VARCHAR(100))
    
    INSERT INTO @WebEvents (vstr_id, sessn_id, sessn_ts, webpg_nm)
    VALUES
    ('V1','V1S1','02-02-2015 09:20:00','/home/login')
    ,('V1','V1S1','02-02-2015 09:22:00','-1')
    ,('V1','V1S1','02-02-2015 09:30:00','/home/contacts')
    ,('V1','V1S1','02-02-2015 09:32:00','-1')
    ,('V1','V1S1','02-02-2015 09:50:00','/home/search')
    ,('V1','V1S1','02-02-2015 09:55:00','-1')
    ,('V2','V2S1','02-02-2015 09:10:00','/home')
    ,('V2','V2S1','02-02-2015 09:15:00','/home/apps')
    ,('V2','V2S2','02-02-2015 09:20:00','/home/news')
    ,('V2','V2S2','02-02-2015 09:23:00','/home/news/internal')
    

    【讨论】:

    • 感谢 matt 抽出宝贵时间提供帮助。
    猜你喜欢
    • 2013-05-31
    • 2020-06-02
    • 2016-12-23
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2014-05-08
    相关资源
    最近更新 更多