【问题标题】:Is there any way to optimize the recursive sql query?Is there any way to optimize the recursive sql query?
【发布时间】:2022-12-01 13:10:29
【问题描述】:

I need to fill missing date values in orders table. DDL:

create table orders(order_date date, order_value int)

insert into orders values('2022-11-01',100),('2022-11-04 ',200),('2022-11-08',300)

Expected output is as:

order_date | order_value
-----------------------
2022-11-01 | 100
2022-11-02 | 100
2022-11-03 | 100
2022-11-04 | 200
2022-11-05 | 200
2022-11-06 | 200
2022-11-07 | 200
2022-11-08 | 300

I have solved the problem in ms sql using recursive query listed below.

with cte as (
  select min(order_date) [min_date], MAX(order_date) [max_date]
  FROM orders
),  cte2 AS(
  SELECT min_date [date]
  FROM cte
  UNION ALL
  SELECT dateadd(day,1,date) [date]
  FROM cte2
  WHERE date < (SELECT max_date FROM cte)
 ), cte3 as(
  select date [order_date], order_value
  FROM cte2 
  LEFT JOIN orders on date = order_date
 )
 SELECT order_date,
        FIRST_VALUE(order_value) IGNORE NULLS
        OVER(ORDER BY order_date desc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) [order_value]
 FROM cte3

Is there any alternate approach to solve this problem or any way to optimize the recursive query?

Thanks in advance.

【问题讨论】:

  • You could switch to a Tally or a calendar table; both are more performant. As you are using the IGNORE NULLS syntax this means you are using SQL Server 2022, so you have access to the GENERATE_SERIES function too.

标签: sql-server tsql window-functions


【解决方案1】:

As mentioned in the cmets, as you are on SQL Server 2022 you can make use of GENERATE_SERIES here, which will be much more performant that a recursive Common Table Expression (rCTE). On prior versions, you can use an (inline) tally or a Calendar table.

WITH Dates AS(
    SELECT MIN(order_date) AS MinDate,
           MAX(order_date) AS MaxDate
    FROM dbo.orders),
DateSeries AS(
    SELECT DATEADD(DAY,GS.value,D.MinDate) AS Date
    FROM Dates D
         CROSS APPLY GENERATE_SERIES(0,DATEDIFF(DAY,MinDate,MaxDate),1) GS)
SELECT DS.Date,
       FIRST_VALUE(O.order_value) IGNORE NULLS OVER (ORDER BY DS.Date desc ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS [order_value]
FROM DateSeries DS
     LEFT JOIN dbo.orders O ON DS.Date = O.order_date
ORDER BY DS.Date;

db<>fiddle

【讨论】:

  • Many thanks for suggesting the use of GENERATE_SERIES. Could you please share the inline approach as well.
  • John shows one such method in their answer, @RishavGhosh .
【解决方案2】:

A calendar would be most useful, but another option is an ad-hoc numbers table

Example

Select Order_Date = coalesce(dateadd(DAY,N-1,Order_Date),order_date)
      ,Order_Value
 From ( Select *,NDays = datediff(DAY,order_date,lead(order_date,1) over (order by order_date)) From Orders ) A
 left Join ( Select Top 1000 N=Row_Number() Over (Order By (Select NULL)) From master..spt_values n1, master..spt_values n2 ) B
   on N<=NDays
 Order by Order_Date

Results

【讨论】:

    猜你喜欢
    • 2022-12-27
    • 2022-12-02
    • 2022-12-27
    • 2022-12-27
    • 2022-07-14
    • 2022-12-02
    • 1970-01-01
    • 2022-12-02
    • 2022-12-01
    相关资源
    最近更新 更多