【Question Title】: Creating a trend line from data set SQL
【Posted】: 2012-04-06 17:23:18
【Question】:

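The code below returns the number of tickets resolved and the number of tickets opened per period (a period is YYYY, WW), going back a given number of days. For example, if @NoOfDays is 7:

resolved | opened | week | year | period
56       | 30     | 13   | 2012 | 2012, 13
237      | 222    | 14   | 2012 | 2012, 14

'resolved' and 'opened' are plotted as lines (y) against period (x). I would like to add another column, 'trend', that returns a number which, when plotted over the periods, forms a trend line (simple linear regression). I do want to use both sets of values as a single data source for the trend.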
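For reference, a simple linear regression over n points (x_i, y_i) is usually written as below. This is a sketch of the standard least-squares formulas, not something taken from the question; here x_i is assumed to be the sequence number of the period, and both the resolved and the opened count of each period contribute a point.

```latex
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \bar{y} - m\,\bar{x},
\qquad
\text{trend}_i = m\,x_i + b
```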

Here is my code:

SELECT a.resolved, b.opened, a.weekClosed AS week, a.yearClosed AS year,
    CAST(a.yearClosed as varchar(5)) + ', ' + CAST(a.weekClosed as varchar(5)) AS period
FROM 
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed
    FROM v_rpt_Service
    WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0))
    GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
LEFT OUTER JOIN
    (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) } AS yearEntered
    FROM v_rpt_Service AS v_rpt_Service_1
    WHERE (date_entered >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0))
    GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered
ORDER BY year, week

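Edit:

According to serc.carleton.edu/files/mathyouneed/best_fit_line_dividing.pdf, it looks like I want to split the data in half and calculate the average of each half. Then I need to find the line of best fit, and use the slope and y-intercept to compute the values returned in 'trend' using y = mx + b?

I know this is quite possible in SQL; however, the program I am plugging the SQL into limits what I can do.

The red and blue points are the numbers I return now (opened and resolved). To create the purple line, I need to return a value in 'trend' for each period. (The image is hypothetical.)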

【Question Comments】:

  • Is this for MS SQL Server, or for a different RDBMS?
  • MS SQL Server is correct.

Tags: sql visual-studio math linear-regression


【Solution 1】:

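I was intrigued by this question, and I find that the best way to understand a complex query is to reformat it using my own style and conventions. I applied them to your solution, with the results below. I have no idea whether this is of any value to you...

  • There were a few bits of code that I did not think were part of MS T-SQL syntax, such as the { fn xxx } wrappers and the WEEK(xxx) function.
  • This code compiles, but I cannot run it, as I do not have suitably configured data tables.
  • I made a number of coding changes that would take a lot of explaining, and I will skip most of that. If you want anything explained, add a comment.
  • I threw in a lot of whitespace. The difference between readable and unreadable code is often in the eye of the beholder, and you may well hate my conventions.
  • I was not sure what the final result set should be (that is, which columns to return).

Some further notes:

  • If no items were closed in a given week, this query will not pick up the items entered in that week.
  • Weeks may be incomplete, for example not all 7 days present (adjust @Interval to always cover whole weeks, but what about odd intervals?).
  • Multiplied the count values by 1.0 up front to convert them to a non-integer numeric type (this avoids casts and integer-math truncation).
  • Replaced the earlier formulas, where they were repeated inside the later formulas, with their symbols (things get a lot clearer at this point).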
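The integer-truncation note above can be illustrated with a short sketch. The literal values and the inline VALUES list are made up for illustration (the VALUES table constructor assumes SQL Server 2008 or later):

```sql
-- Both operands are integers, so T-SQL performs integer division.
-- Multiplying by 1.0 first promotes the value to a decimal type.
SELECT 7 / 2        AS int_division,      -- 3
       7 * 1.0 / 2  AS decimal_division;  -- 3.5, displayed with trailing zeros

-- The same effect with an aggregate: six hypothetical tickets.
SELECT COUNT(*) / 4        AS quarter_int,  -- 1
       COUNT(*) * 1.0 / 4  AS quarter_dec   -- 1.5, displayed with trailing zeros
FROM (VALUES (1), (2), (3), (4), (5), (6)) AS t(TicketNbr);
```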

This is what I came up with:

;WITH cte as (
select
   c.period
  ,resolved_half1
  ,resolved_half2
  ,opened_half1
  ,opened_half2
  ,row = row_number() over(order by c.yearClosed, c.weekClosed)
  ,y1 = ((SUM(resolved_half1) + SUM(opened_half1)) - (SUM(resolved_half2) + SUM(opened_half2))) / ((count(resolved_half1) + count(opened_half1)) / 2)
  ,y2 = ((SUM(resolved_half2) + SUM(opened_half2)) / (count(resolved_half2) + COUNT (opened_half2)))
  ,x1 = ((count(c.period)) / 4)
  ,x2 = (((count(c.period)) / 4) * 3)
 from (select
          a.yearclosed
         ,a.weekClosed
         ,a.resolved_half1
         ,b.yearEntered
         ,b.weekEntered
         ,b.opened_half1
         ,cast(a.yearClosed as varchar(5)) + ', ' + cast(a.weekClosed as varchar(5))  period 
        from (--  Number of items per week that closed within @Interval
              select
                 count(distinct TicketNbr) * 1.0  resolved_half1
                ,datepart(wk, date_closed)        weekClosed
                ,year(date_closed)                yearClosed
               from v_rpt_Service 
               where date_closed >= @FullInterval
               group by
                 datepart(wk, date_closed)
                ,year(date_closed) )  a
         left outer join (--  Number of items per week that were entered within @Interval
                          select 
                             count(distinct TicketNbr) * 1.0  opened_half1
                            ,datepart(wk, date_entered)       weekEntered
                            ,year(date_entered)               yearEntered
                           from v_rpt_Service
                           where date_entered >= @FullInterval
                           group by
                             datepart(wk, date_entered)
                            ,year(date_entered) )  b
          on a.weekClosed = b.weekEntered 
           and a.yearClosed = b.yearEntered)  c
  left outer join (select
                       d.yearclosed
                      ,d.weekClosed
                      ,d.resolved_half2
                      ,e.yearEntered
                      ,e.weekEntered
                      ,e.opened_half2
                      ,cast(yearClosed as varchar(5)) + ', ' + cast(weekClosed as varchar(5))  period 
                    from (select
                             count(distinct TicketNbr) * 1.0  resolved_half2
                            ,datepart(wk, date_closed)        weekClosed
                            ,year(date_closed)                yearClosed
                           from v_rpt_Service
                           where date_closed >= @HalfInterval
                           group by
                             datepart(wk, date_closed) 
                            ,year(date_closed) )  d 
                     left outer join (select
                                         count(distinct TicketNbr) * 1.0  opened_half2
                                        ,datepart(wk, date_entered)       weekEntered
                                        ,year(date_entered)               yearEntered
                                       from v_rpt_Service
                                       where date_entered >= @HalfInterval
                                       group by
                                           datepart(wk, date_entered) 
                                          ,year(date_entered) )  e
                      on d.weekClosed = e.weekEntered
                       and d.yearClosed = e.yearEntered )  f
   on c.period = f.period 
 group by
   c.period
  ,resolved_half1
  ,resolved_half2
  ,opened_half1
  ,opened_half2
  ,c.yearClosed
  ,c.weekClosed
)
SELECT
   row
  ,Period
  ,x1
  ,y1
  ,x2
  ,y2
  ,m = ((y1 - y2) / (x1 - x2))
  ,b = (y2 - (((y1 - y2) / (x1 - x2)) * x2))
  ,trend = ((((y1 - y2) / (x1 - x2)) * (row)) + (y2 - (((y1 - y2) / (x1 - x2)) * x2)))
 from cte
 order by row 
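The query above references @FullInterval and @HalfInterval without declaring them. A minimal sketch of how they might be set up, assuming they are meant to be the cutoff date from the question and the cutoff for half that many days; the variable @NoOfDays and the value 180 are assumptions, and inline initialization in DECLARE needs SQL Server 2008 or later:

```sql
-- Assumed parameter: how many days back the report should look.
DECLARE @NoOfDays int = 180;

-- Midnight @NoOfDays days ago (same expression as the question's WHERE clause).
DECLARE @FullInterval datetime =
    DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - @NoOfDays, 0);

-- Midnight half as many days ago (integer division, so odd values round down).
DECLARE @HalfInterval datetime =
    DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - @NoOfDays / 2, 0);

SELECT @FullInterval AS FullInterval, @HalfInterval AS HalfInterval;
```

These declarations would have to sit in the same batch as the query, before the ;WITH.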

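As an addendum, all of subquery 'c' could be replaced with the following, and 'f' with a slightly modified version. Whether performance is better or worse depends on table sizes, indexing, and other imponderables.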

select
   datepart(wk, date_closed)  weekClosed
  ,year(date_closed)          yearClosed
  ,count (distinct case
                  when date_closed >= @FullInterval then TicketNbr
                  else null
                end)          resolved_half1
  ,count (distinct case
                  when date_entered >= @FullInterval then TicketNbr
                  else null
                end)          opened_half1
 from v_rpt_Service 
 where date_closed >= @FullInterval
  or date_entered >= @FullInterval
 group by
   datepart(wk, date_closed)
  ,year(date_closed) 

【Comments】:

  • Ten years later, have an upvote for your effort in trying to help, adding whitespace, and so on.

【Solution 2】:

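I figured it out. I split the data into multiple derived tables and subqueries, essentially dividing the data in two. These are the formulas I used to get each value: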

*(each row is a week)*
y1 = average of data first half
y2 = average of data second half
x1 = 1/4 of number of weeks
x2 = 3/4 of number of weeks
m = (y1-y2)/(x1-x2)
b = y2 - (m * x2)
trend = (m * row_number) + b 
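The formulas above can be sketched on a tiny data set. This is a minimal illustration, not the query used in this answer: the weekly totals are made up, the names weekly, numbered, halves and fit are hypothetical, and the VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical weekly totals (resolved + opened), one row per week.
WITH weekly AS (
    SELECT wk, CAST(total AS float) AS total
    FROM (VALUES (1, 10), (2, 12), (3, 14), (4, 16)) AS v(wk, total)
),
numbered AS (
    SELECT wk, total,
           ROW_NUMBER() OVER (ORDER BY wk) AS row_num,
           COUNT(*) OVER ()                AS n
    FROM weekly
),
halves AS (
    SELECT AVG(CASE WHEN row_num * 2 <= n THEN total END) AS y1,  -- average of first half
           AVG(CASE WHEN row_num * 2 >  n THEN total END) AS y2,  -- average of second half
           MAX(n) / 4.0                                   AS x1,  -- 1/4 of number of weeks
           MAX(n) * 3 / 4.0                               AS x2   -- 3/4 of number of weeks
    FROM numbered
),
fit AS (
    SELECT (y1 - y2) / (x1 - x2) AS m, x2, y2
    FROM halves
)
SELECT nb.row_num,
       nb.total,
       f.m,
       f.y2 - f.m * f.x2                        AS b,
       f.m * nb.row_num + (f.y2 - f.m * f.x2)   AS trend
FROM numbered AS nb
CROSS JOIN fit AS f
ORDER BY nb.row_num;
```

With these values the sketch should give m = 2 and b = 9, so the line runs parallel to the data but one unit high. The averages of rows 1 to 2 and 3 to 4 really sit at rows 1.5 and 3.5 rather than at n/4 = 1 and 3n/4 = 3, a half-row offset worth keeping in mind when using x1 and x2 as defined above.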

Here is my (very dirty) SQL code:

SELECT  resolved_half1,resolved_half2,opened_half1,opened_half2, c.period,
((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as y1, 
((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as y2,
((COUNT(c.period) OVER ()) / 4) as x1,
(((COUNT(c.period) OVER ()) / 4) * 3) as x2,
((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) as m,
(CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float) - (((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) * (((COUNT(c.period) OVER ()) / 4) * 3))) as b,
((((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) * (ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed))) + (CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float) - (((CAST(((SUM (resolved_half1) OVER () + SUM(opened_half1) OVER ()) - (SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ())) / ((COUNT(resolved_half1) OVER () + COUNT(opened_half1) OVER ()) / 2) as float) - CAST(((SUM(resolved_half2) OVER () + SUM(opened_half2) OVER ()) / (COUNT(resolved_half2) OVER () + COUNT (opened_half2) OVER ())) as float)) / (CAST(((COUNT(c.period) OVER ()) / 4) as float) - CAST( (((COUNT(c.period) OVER ()) / 4) * 3) as float))) * (((COUNT(c.period) OVER ()) / 4) * 3)))) as trend,
ROW_NUMBER() OVER(ORDER BY c.yearClosed,c.weekClosed) as row

FROM
    (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period
     FROM  (SELECT        TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half1, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed
                          FROM            v_rpt_Service
      WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0))

      GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS a 
      LEFT OUTER JOIN
      (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half1, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered) } AS yearEntered
       FROM v_rpt_Service AS v_rpt_Service_1
       WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180), 0))
       GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS b ON a.weekClosed = b.weekEntered AND a.yearClosed = b.yearEntered) as c 
       LEFT OUTER JOIN
       (SELECT *, CAST(yearClosed as varchar(5)) + ', ' + CAST(weekClosed as varchar(5)) AS period 
       FROM  (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS resolved_half2, { fn WEEK(date_closed) } AS weekClosed, { fn YEAR(date_closed) } AS yearClosed
       FROM v_rpt_Service
       WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180 / 2), 0))
       GROUP BY { fn WEEK(date_closed) }, { fn YEAR(date_closed) }) AS d 
       LEFT OUTER JOIN
       (SELECT TOP (100) PERCENT COUNT(DISTINCT TicketNbr) AS opened_half2, { fn WEEK(date_entered) } AS weekEntered, { fn YEAR(date_entered)} AS yearEntered
       FROM v_rpt_Service AS v_rpt_Service_1
       WHERE (date_entered > = DateAdd(Day, DateDiff(Day, 0, GetDate()) - (180 / 2), 0))
       GROUP BY { fn WEEK(date_entered) }, { fn YEAR(date_entered) }) AS e ON d.weekClosed = e.weekEntered AND d.yearClosed = e.yearEntered
) as f ON c.yearClosed = f.yearClosed AND c.weekClosed = f.weekClosed AND c.weekEntered = f.weekEntered AND c.yearEntered = f.yearEntered AND c.period = f.period
GROUP BY c.period, resolved_half1,resolved_half2,opened_half1,opened_half2,c.yearClosed,c.weekClosed
ORDER BY row

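This code uses a hard-coded value of 180 days. I still need to be able to use a variable to choose the number of days (without getting divide-by-zero errors), and the code really needs cleaning up. If someone can do those two things for me (I am not the best at SQL), the bounty is theirs.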
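One likely source of the divide-by-zero error: COUNT(c.period) OVER () / 4 is integer division, so with fewer than four periods both x1 and x2 come out as 0 and x1 - x2 is 0. A minimal sketch of two possible guards, dividing by 4.0 and wrapping the denominator in NULLIF; the column n and its values are hypothetical stand-ins for the period count:

```sql
-- n stands in for COUNT(c.period) OVER () in the query above.
SELECT n,
       n / 4 - n / 4 * 3                   AS int_denominator,   -- 0 whenever n < 4
       n / 4.0 - n * 3 / 4.0               AS dec_denominator,   -- equals -n/2, never 0 for n >= 1
       1.0 / NULLIF(n / 4 - n / 4 * 3, 0)  AS guarded_division   -- NULL instead of an error
FROM (VALUES (2), (3), (8)) AS v(n);
```

With the denominators guarded this way, the literal 180 can presumably be swapped for @NoOfDays in the same DateAdd expression the question already uses.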


【Comments】:

【Solution 3】:

I believe this should do the trick. If it does not, post some actual sample data and I will see whether I can tweak it to fix it:

    DECLARE @noOfDays INT
    SET @noofdays = 180
    
    ;WITH tickets AS
    (
    -- Resolved tickets, bucketed by the week they were closed
    SELECT DISTINCT
    DATENAME(YEAR,date_closed) + RIGHT('000' + CAST(DATEPART(WEEK,date_closed) AS VARCHAR(5)),3) as Period
    ,TicketNbr
    ,1 as ticket_type --resolved
    FROM v_rpt_Service
    WHERE (date_closed >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
    UNION ALL
    -- Opened tickets, bucketed by the week they were entered
    SELECT DISTINCT
    DATENAME(YEAR,date_entered) + RIGHT('000' + CAST(DATEPART(WEEK,date_entered) AS VARCHAR(5)),3) as Period
    ,TicketNbr
    ,0 as ticket_type --opened
    FROM v_rpt_Service
    WHERE  (date_entered >= DateAdd(Day, DateDiff(Day, 0, GetDate()) - @NoOfDays, 0)) 
    )
    ,tickets2 AS
    (
    SELECT
    Period
    ,SUM(CASE WHEN ticket_type = 0 THEN 1 ELSE 0 END) as opened
    ,SUM(CASE WHEN ticket_type = 1 THEN 1 ELSE 0 END) as closed
    FROM tickets
    GROUP BY
    Period
    )
    ,tickets3 AS
    (
    SELECT
    Period
    ,row_number() OVER (ORDER BY period ASC) as row
    ,opened
    ,closed
    ,COUNT(period) OVER() * 1.0 as base --decimal, so the divisions by base below are not truncated
    ,SUM(opened) OVER () as [Sumopened]
    ,SUM(opened * opened) OVER () as [Sumopened^2]
    ,SUM(opened * closed) OVER () as [Sumopenedclosed]
    ,SUM(closed) OVER () as [Sumclosed]
    ,SUM(closed * closed) OVER () as [Sumclosed^2]
    ,SUM(opened * closed) OVER () * COUNT(period) OVER () AS [nSumopenedclosed]
    ,SUM(opened) OVER () * SUM(closed) OVER () AS [Sumopened*Sumclosed]
    ,SUM(opened * opened) OVER () * COUNT(period) OVER () AS [nSumopened^2]
    ,SUM(opened) OVER () * SUM(opened) OVER () as [Sumopened*Sumopened]
    FROM tickets2
    )
    --Formula for linear regression is Y = A + BX
    SELECT
    period
    ,opened
    ,closed
    ,((1.0 / base) * [Sumclosed]) - 
    ([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) *((1.0 / base) * [Sumopened]) 
    + row * ([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base))  
    AS trend_point
    ,((1.0 / base) * [Sumclosed]) - 
    ([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) *((1.0 / base) * [Sumopened]) AS A
    ,([Sumopenedclosed] - ([Sumopened*Sumclosed] / base)) / ([Sumopened^2] - ([Sumopened*Sumopened] / base)) as B
    from tickets3
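Note that the query above appears to fit closed against opened (its sums involve opened and closed, not row) and then evaluates that line at row, which may not be the trend over time the question asks for. A minimal sketch that instead regresses the values on the period's row number and pools opened and closed into one point set; the weekly counts are made up, the names weekly, numbered, points and fit are hypothetical, and the VALUES list assumes SQL Server 2008 or later:

```sql
-- Hypothetical weekly counts, standing in for the tickets2 result.
WITH weekly AS (
    SELECT period, opened, closed
    FROM (VALUES ('2012013', 1, 3),
                 ('2012014', 3, 5),
                 ('2012015', 5, 7)) AS v(period, opened, closed)
),
numbered AS (
    -- x is the sequence number of the period, as float to avoid integer math.
    SELECT period, opened, closed,
           CAST(ROW_NUMBER() OVER (ORDER BY period) AS float) AS x
    FROM weekly
),
points AS (
    -- Both series contribute one (x, y) point per period.
    SELECT x, CAST(opened AS float) AS y FROM numbered
    UNION ALL
    SELECT x, CAST(closed AS float) FROM numbered
),
fit AS (
    -- Least-squares slope; NULLIF returns NULL when all x are equal (one period).
    SELECT (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y))
           / NULLIF(COUNT(*) * SUM(x * x) - SUM(x) * SUM(x), 0) AS m,
           AVG(x) AS x_bar,
           AVG(y) AS y_bar
    FROM points
)
SELECT n.period,
       n.opened,
       n.closed,
       f.m * n.x + (f.y_bar - f.m * f.x_bar) AS trend   -- m * x + b, with b = y_bar - m * x_bar
FROM numbered AS n
CROSS JOIN fit AS f
ORDER BY n.x;
```

For these sample values the fitted line should be trend = 2 * x, that is 2, 4, 6, running midway between the two series.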
    

【Comments】:
