How can I improve the native query for a table with 7 million rows? Answers

【Question Title】: How can I improve the native query for a table with 7 million rows?
【Posted】: 2020-04-09 15:48:33
【Question】:

I have the following view (table) in my database (SQL Server).

I want to retrieve two things from this table:

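1. The row with the latest booking date for each product number. For the example data this returns {0001, 2, 2019-06-06 10:39:58} and {0003, 2, 2019-06-07 12:39:58}.
2. If no step of a product number has a booking date, the row with step number = 1. For the example data this returns {0002, 1, NULL}.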
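The two rules above can be pinned down as a small reference function. This is a plain-Python sketch, not part of the question: `Row`, `expected_rows`, and the sample rows are hypothetical (the original table is not shown) and were chosen only so that the result matches the three objects listed above. If two steps share the latest booking date, this sketch keeps both.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    product_number: str
    step_number: int
    booking_date: Optional[str]  # ISO-8601 text, or None when not booked


def expected_rows(rows: List[Row]) -> List[Row]:
    """Per product: the row(s) with the latest booking date, or the
    step-1 row when no step of that product has a booking date."""
    by_product = {}
    for r in rows:
        by_product.setdefault(r.product_number, []).append(r)

    result = []
    for _, group in sorted(by_product.items()):
        dates = [r.booking_date for r in group if r.booking_date is not None]
        if dates:
            latest = max(dates)  # ISO-8601 strings sort chronologically
            result.extend(r for r in group if r.booking_date == latest)
        else:
            result.extend(r for r in group if r.step_number == 1)
    return result


# Hypothetical sample rows, consistent with the expected objects in the question.
rows = [
    Row("0001", 1, "2019-06-05 09:00:00"),
    Row("0001", 2, "2019-06-06 10:39:58"),
    Row("0002", 1, None),
    Row("0002", 2, None),
    Row("0003", 1, None),
    Row("0003", 2, "2019-06-07 12:39:58"),
]

for r in expected_rows(rows):
    print(tuple(r))
```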

The view has 7,000,000 rows. I have to do this with a native query.

The first query, which retrieves the products with the latest booking date:

SELECT DISTINCT *
FROM TABLE t
WHERE t.BOOKING_DATE = (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER)

The second query, which retrieves the step-1 row of each product that has no booking date on any step:

SELECT DISTINCT *
FROM TABLE t
WHERE (SELECT max(tbl.BOOKING_DATE) FROM TABLE tbl WHERE t.PRODUCT_NUMBER = tbl.PRODUCT_NUMBER) IS NULL AND t.STEP_NUMBER = 1
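To see what the two queries return, here is a sketch that runs them against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in for SQL Server here (an assumption; the SQL in both queries is plain enough to run on either), the placeholder `TABLE` is replaced by a hypothetical table `booking`, and the rows are made up to be consistent with the expected objects in the question. Each query evaluates the correlated `MAX` subquery for every outer row, which is likely a large part of the cost on 7 million rows unless an index on (product number, booking date) turns each lookup into a seek.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the view in the question.
conn.execute("CREATE TABLE booking (product_number TEXT, step_number INTEGER, booking_date TEXT)")
conn.executemany("INSERT INTO booking VALUES (?, ?, ?)", [
    ("0001", 1, "2019-06-05 09:00:00"),
    ("0001", 2, "2019-06-06 10:39:58"),
    ("0002", 1, None),
    ("0002", 2, None),
    ("0003", 1, None),
    ("0003", 2, "2019-06-07 12:39:58"),
])

# Query 1: rows whose booking date equals the product's maximum booking date.
LATEST = """
SELECT DISTINCT * FROM booking t
WHERE t.booking_date = (SELECT max(tbl.booking_date) FROM booking tbl
                        WHERE t.product_number = tbl.product_number)
ORDER BY t.product_number
"""

# Query 2: step-1 rows of products with no booking date at all
# (MAX over only NULLs is NULL).
UNBOOKED = """
SELECT DISTINCT * FROM booking t
WHERE (SELECT max(tbl.booking_date) FROM booking tbl
       WHERE t.product_number = tbl.product_number) IS NULL
  AND t.step_number = 1
ORDER BY t.product_number
"""

latest = conn.execute(LATEST).fetchall()
unbooked = conn.execute(UNBOOKED).fetchall()
print(latest)
print(unbooked)
```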

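I tried doing it in a single query, but it took too long. For now I use two queries to get this information, but I will need to improve this in the future. Do you have an alternative? I also cannot use stored procedures or functions inside SQL Server; I have to do this with a native query from Java.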

【Question Discussion】:

Tags: sql sql-server nativequery

【Solution 1】:

Try this:

Declare @p table(pumber int,step int,bookdate datetime)
insert into @p values 
(1,1,'2019-01-01'),(1,2,'2019-01-02'),(1,3,'2019-01-03')
,(2,1,null),(2,2,null),(2,3,null)
,(3,1,null),(3,2,null),(3,3,'2019-01-03')

;With CTE as
(
select pumber,max(bookdate)bookdate 
from @p p1 
where bookdate is not null
group by pumber
)

select p.* from @p p
where exists(select 1 from CTE c 
where p.pumber=c.pumber and p.bookdate=c.bookdate)
union all
select p1.* from @p p1
where p1.bookdate is null and step=1
and not exists(select 1 from CTE c 
where p1.pumber=c.pumber)
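Here is a sketch of the same CTE + UNION ALL approach, run on the answer's own sample rows in SQLite via Python's `sqlite3`. Assumptions: SQLite stands in for SQL Server, the table variable `@p` becomes an ordinary table `p`, and the leading `;` before `WITH` is dropped because it only terminates the previous T-SQL statement. The first branch keeps each product's latest-dated rows; the second keeps the step-1 row of products that never appear in the CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary table replacing the T-SQL table variable @p.
conn.execute("CREATE TABLE p (pumber INTEGER, step INTEGER, bookdate TEXT)")
conn.executemany("INSERT INTO p VALUES (?, ?, ?)", [
    (1, 1, "2019-01-01"), (1, 2, "2019-01-02"), (1, 3, "2019-01-03"),
    (2, 1, None), (2, 2, None), (2, 3, None),
    (3, 1, None), (3, 2, None), (3, 3, "2019-01-03"),
])

QUERY = """
WITH cte AS (
    -- latest non-NULL booking date per product
    SELECT pumber, max(bookdate) AS bookdate
    FROM p
    WHERE bookdate IS NOT NULL
    GROUP BY pumber
)
SELECT p.* FROM p
WHERE EXISTS (SELECT 1 FROM cte c
              WHERE p.pumber = c.pumber AND p.bookdate = c.bookdate)
UNION ALL
-- products with no booking date at all: take step 1
SELECT p1.* FROM p p1
WHERE p1.bookdate IS NULL AND p1.step = 1
  AND NOT EXISTS (SELECT 1 FROM cte c WHERE p1.pumber = c.pumber)
"""

rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```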

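If performance is the main concern, it matters little whether you use one query or two; what ultimately matters is how each query performs, which mostly comes down to indexing: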

Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
Go

If more than 90% of the rows have BookingDate IS NOT NULL (or BookingDate IS NULL), you can create a filtered index on that condition instead:

-- Alternative to the index above: it reuses the name ix_Product, so create one or the other
Create NonClustered index ix_Product on Product (ProductNumber,BookingDate,Stepnumber)
where BookingDate is not null
Go
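SQLite has an equivalent of filtered indexes, called partial indexes, with the same `WHERE` clause syntax. The sketch below uses Python's `sqlite3` as a stand-in for SQL Server (an assumption), creates both the full and the filtered index on a hypothetical empty `Product` table, and reads back which one is partial via `PRAGMA index_list`. The filtered index gets a different, hypothetical name, `ix_Product_booked`, because two indexes cannot share a name on the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; column names follow the answer's index definitions.
conn.execute("CREATE TABLE Product (ProductNumber TEXT, StepNumber INTEGER, BookingDate TEXT)")

# Full index over every row.
conn.execute("CREATE INDEX ix_Product ON Product (ProductNumber, BookingDate, StepNumber)")

# Partial (filtered) index: only rows that have a booking date are indexed.
conn.execute("""
    CREATE INDEX ix_Product_booked ON Product (ProductNumber, BookingDate, StepNumber)
    WHERE BookingDate IS NOT NULL
""")

# PRAGMA index_list yields (seq, name, unique, origin, partial) per index.
indexes = {row[1]: row[4] for row in conn.execute("PRAGMA index_list(Product)")}
print(indexes)
```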

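【Discussion】:

Adding the index helped me. Thanks.

【Solution 2】:

Try row_number() with the right ordering. SQL Server's ORDER BY treats NULL as the lowest possible value, so with BOOKING_DATE DESC the rows that have a booking date come first, and STEP_NUMBER breaks ties (which picks step 1 when every booking date of a product is NULL).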

SELECT TOP(1) WITH TIES *
FROM myTable t
ORDER BY row_number() over(partition by PRODUCT_NUMBER order by BOOKING_DATE DESC, STEP_NUMBER);
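A sketch of the ranking idea in SQLite via Python's `sqlite3`, using Solution 1's sample rows with the question's column names (assumptions for illustration). SQLite has no `TOP (1) WITH TIES`, so the sketch filters `rn = 1` in a derived table instead; that returns the same rows, because `row_number()` is 1 for exactly one row per product. Window functions need SQLite 3.25 or newer (an assumption about the Python build you run it on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (product_number INTEGER, step_number INTEGER, booking_date TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    (1, 1, "2019-01-01"), (1, 2, "2019-01-02"), (1, 3, "2019-01-03"),
    (2, 1, None), (2, 2, None), (2, 3, None),
    (3, 1, None), (3, 2, None), (3, 3, "2019-01-03"),
])

# In SQLite, as in SQL Server, NULLs sort lowest, so DESC puts them last.
QUERY = """
SELECT product_number, step_number, booking_date FROM (
    SELECT myTable.*,
           row_number() OVER (PARTITION BY product_number
                              ORDER BY booking_date DESC, step_number) AS rn
    FROM myTable
) ranked
WHERE rn = 1
ORDER BY product_number
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Unlike the first query in the question, this returns exactly one row per product even if two steps share the latest booking date; the lower step number wins.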

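Check the index SQL Server suggests (for example, the missing-index hint in the execution plan) to get good performance.

【Discussion】:

【Solution 3】:

Probably the most efficient approach is a correlated subquery: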

select t.*
from t
where t.step_number = (select top (1) t2.step_number
                       from t t2
                       where t2.product_number = t.product_number
                       order by t2.booking_date desc, t2.step_number
                      );

In particular, this can take advantage of an index on (product_number, booking_date desc, step_number).
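A sketch of the correlated subquery in SQLite via Python's `sqlite3`, together with the suggested index. SQLite spells `top (1)` as `LIMIT 1`; the table `t`, the index name `ix_t`, and the rows (Solution 1's sample data with the question's column names) are assumptions for illustration. Because the outer query compares only `step_number`, it assumes step numbers are unique within a product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (product_number INTEGER, step_number INTEGER, booking_date TEXT)")
# Index matching the subquery's filter and sort order.
conn.execute("CREATE INDEX ix_t ON t (product_number, booking_date DESC, step_number)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, "2019-01-01"), (1, 2, "2019-01-02"), (1, 3, "2019-01-03"),
    (2, 1, None), (2, 2, None), (2, 3, None),
    (3, 1, None), (3, 2, None), (3, 3, "2019-01-03"),
])

# For each row, keep it only if its step is the top-ranked step of its product.
QUERY = """
SELECT t.* FROM t
WHERE t.step_number = (SELECT t2.step_number
                       FROM t t2
                       WHERE t2.product_number = t.product_number
                       ORDER BY t2.booking_date DESC, t2.step_number
                       LIMIT 1)
ORDER BY t.product_number
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```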

【Discussion】: