[Posted]: 2014-03-30 19:14:53
[Question]:
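I wonder if anyone can help simplify this process, and improve its performance!
We have data about grants. 'Donors' give money to 'Recipients', and we want to show each donor's top 15 recipients for 3 periods: CurrentYear-20, CurrentYear-10 and CurrentYear. We publish an annual report that shows, for each donor, the percentage share of the World and GeoZone totals.
I have "inherited" this code, which was written by one of my predecessors. Before we switched to using a view, the execution time was about 15-30 minutes. It now runs in just under 4 hours (scheduled as a SQL Server Agent job)! Management is not happy. For various reasons the view has to stay; it currently holds just under 900,000 rows, with data going back to the 1950s. We currently run this report for 30 (large) donors, and more are added every year.
To help improve performance I have considered using CTEs and/or SUM() OVER (PARTITION BY ...), or a combination of these, but I am not sure how to go about it.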
Can anyone point me in the right direction?
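For illustration only, here is a minimal sketch of the SUM() OVER (PARTITION BY ...) idea for a single two-year period, covering all donors in one pass. It assumes dbo.vData2 exposes DonorID, RecipientID, [year] and Amount as used in the code further down, and it divides by each donor's total over all recipients, which is not necessarily the same as the GroupID = 160 world total in the original code.

```sql
-- Sketch only: column names are taken from the question's code,
-- the denominator (donor total over all recipients) is an assumption.
DECLARE @CurrentYear SMALLINT;
SET @CurrentYear = 2012;

WITH RecipientTotals AS
(
    -- one row per donor/recipient for the two-year window
    SELECT d.DonorID ,
           d.RecipientID ,
           SUM(d.Amount) AS RecipientAmount
    FROM   dbo.vData2 d
    WHERE  d.[year] IN ( @CurrentYear - 1, @CurrentYear )
    GROUP BY d.DonorID ,
           d.RecipientID
)
SELECT DonorID ,
       RecipientID ,
       RecipientAmount ,
       -- windowed SUM gives the donor total without a second scan;
       -- NULLIF avoids a divide-by-zero when a donor's total is 0
       100.0 * RecipientAmount
           / NULLIF(SUM(RecipientAmount) OVER ( PARTITION BY DonorID ), 0) AS SharePct
FROM   RecipientTotals;
```

The windowed SUM without ORDER BY is available on SQL Server 2008 R2, so this form should not depend on the planned move to 2012.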
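The process is as follows:
- Create a table (variable) to hold the top 15 recipients for the current donor
- Create a table (variable) to hold the list of donors
- Populate the donor table in the order the donors appear in the report
- Loop through the donor table and, for each donor:
  - Put this donor's DonorID into a temporary table
  - Loop 3 times (for CurrentYear-20, CurrentYear-10, CurrentYear)
    - Calculate the share of the total for each of the 18 regions/zones
    - Print the values for each section of the report
  - Get the next DonorID
As you can see from the above, the calculations run 54 times (18 x 3) for each donor!
Here is the code (simplified):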
-- @LatestYear is passed as a parameter, hardcoded here for simplicity
DECLARE @LatestYear SMALLINT ,
    @CurrentYear SMALLINT ,
    @DonorID SMALLINT ,
    @totalWorld NUMERIC(10, 2) ,
    @LoopCounter TINYINT ,
    @DonorName VARCHAR(100)
SELECT  @LatestYear = 2012
-- create a table to hold list of top 15 recipients for each donor and their 'share' of ODA.
DECLARE @Top15 TABLE
    (
      Country VARCHAR(100) ,
      Percentage REAL
    )
-- create a table to hold list of donors, ordered as they need to appear in the report.
DECLARE @PageOrder TABLE
    (
      DonorID SMALLINT ,
      DonorName VARCHAR(100) ,
      SortOrder SMALLINT IDENTITY(1, 1)
    )
-- create a table to store the "focus" donor.
DECLARE @CurrentDonor TABLE ( DonorID SMALLINT )
INSERT  INTO @PageOrder
        SELECT  DonorID ,
                DonorName
        FROM    dbo.LookupDonor
        ORDER BY DonorName;
-- cursor to loop through the donors in SortOrder
DECLARE DonorCursor CURSOR
FOR
    SELECT  DonorID ,
            DonorName
    FROM    @PageOrder
    ORDER BY DonorName;
OPEN DonorCursor
FETCH NEXT FROM DonorCursor INTO @DonorID, @DonorName
WHILE @@fetch_status = 0
    BEGIN
        INSERT  INTO pubOutput
                ( XMLText )
                SELECT  @DonorName;
        -- Populate the DonorID table
        INSERT  INTO @CurrentDonor
        VALUES  ( @DonorID )
        /* The following loop is invoked 3 times. The first time through, the year will be 20 years before the latest year,
           the second time through, 10 years before. The last time through the year will be the latest year.
        */
        SET @LoopCounter = 1
        WHILE @LoopCounter <= 3
            BEGIN
                SELECT  @CurrentYear = CASE @LoopCounter
                                         WHEN 1 THEN @LatestYear - 20
                                         WHEN 2 THEN @LatestYear - 10
                                         ELSE @LatestYear
                                       END
                -- calculate the world total for the current years (year,year-1) for all recipients
                SELECT  @totalWorld = SUM(Amount)
                FROM    dbo.vData2 d
                        INNER JOIN ( SELECT RecipientID
                                     FROM   dbo.RecipientGroup
                                     WHERE  GroupID = 160
                                   ) c ON d.RecipientID = c.RecipientID
                        INNER JOIN @CurrentDonor z ON d.DonorID = z.DonorID
                WHERE   d.year IN ( @CurrentYear - 1, @CurrentYear )
                -- calculate the GeoZones total for the current years (year,year-1)
                SELECT  @totalGeoZones = SUM(Amount)
                FROM    dbo.vDac2a d
                        INNER JOIN ( SELECT RecipientID
                                     FROM   dbo.GeoZones
                                     WHERE  GeoZoneID = 100
                                   ) x ON d.RecipientID = x.RecipientID
                        INNER JOIN @CurrentDonor z ON d.DonorCode = z.DonorCode
                WHERE   d.year IN ( @CurrentYear - 1, @CurrentYear )
                -- Find the top 15 recipients for the current donor
                INSERT  INTO @Top15
                        SELECT TOP 15
                                r.RecipientName ,
                                ( ISNULL(SUM(Amount), 0) / @totalWorld ) * 100
                        FROM    dbo.vData2 d
                                INNER JOIN dbo.LookupRecipient r ON r.RecipientID = d.RecipientID
                                INNER JOIN @CurrentDonor z ON d.DonorID = z.DonorID
                        WHERE   d.year IN ( @CurrentYear - 1, @CurrentYear )
                        GROUP BY r.RecipientName
                        ORDER BY 2 DESC
                -- Print the top 15 recipients and total
                INSERT  INTO pubOutput
                        (
                          XMLText
                        )
                        SELECT  country + @Separator + CAST(percentage AS VARCHAR)
                        FROM    @Top15
                        ORDER BY percentage DESC
                INSERT  INTO pubOutput
                        (
                          XMLText
                        )
                        SELECT  @Heading1 + @Separator + CAST(SUM(Percentage) AS VARCHAR)
                        FROM    @Top15
                -- Breakdown by Regions
                -- Region1
                IF @totalWorld IS NOT NULL
                    INSERT  INTO pubOutput
                            (
                              XMLText
                            )
                            SELECT  'Region1' + @Separator
                                    + CAST(( ISNULL(SUM(Amount), 0) / @totalWorld ) * 100 AS VARCHAR)
                            FROM    dbo.vData2 d
                                    INNER JOIN ( SELECT RecipientID
                                                 FROM   dbo.RecipientGroup
                                                 WHERE  RegionID = 1
                                               ) c ON d.RecipientID = c.RecipientID
                                    INNER JOIN @CurrentDonor z ON d.DonorID = z.DonorID
                            WHERE   d.year IN ( @CurrentYear - 1, @CurrentYear )
                ELSE -- force output of sub-total heading
                    INSERT  INTO pubOutput
                            (
                              XMLText
                            )
                            SELECT  @Heading2 + @Separator + '--'
                -- Region2-8
                /* similar syntax as Region1 above, for all Regions 2-8 */
                -- Total Regions
                INSERT  INTO pubOutput
                        (
                          XMLText
                        )
                        SELECT  @Heading2 + @Separator + CAST(@totalWorld AS VARCHAR)
                -- Breakdown by GeoZones 1-7
                -- GeoZone1
                INSERT  INTO pubOutput
                        (
                          XMLText
                        )
                        SELECT  'GeoZone1' + @Separator
                                + CAST(( ISNULL(SUM(Amount), 0) / @totalGeoZones ) * 100 AS VARCHAR)
                        FROM    dbo.vDac2a d
                                INNER JOIN ( SELECT RecipientID
                                             FROM   dbo.GeoZones
                                             WHERE  GeoZoneID = 1
                                           ) m ON d.RecipientID = m.RecipientID
                                INNER JOIN @CurrentDonor z ON d.DonorCode = z.DonorCode
                        WHERE   d.year IN ( @CurrentYear - 1, @CurrentYear )
                -- GeoZones 2-7
                /* similar syntax as GeoZone1 above for GeoZones 2-7 */
                -- Total GeoZones - currently hard-coded as 100, due to minor rounding errors
                INSERT  INTO pubOutput
                        (
                          XMLText
                        )
                        SELECT  @Heading3 + @Separator + '100'
                SET @LoopCounter = @LoopCounter + 1
            END -- year loop
        -- Get the next donor from the cursor
        FETCH NEXT FROM DonorCursor
        INTO @DonorID, @DonorName
    END
-- donorcursor
-- Cleanup
CLOSE DonorCursor
DEALLOCATE DonorCursor
Many thanks for any help you can provide.
[Comments]:
-
P.S. We are using SQL2008R2, but will be moving to SQL2012 soon.
-
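I wouldn't try to remove the cursor "just because"; I have had several projects where cursors actually improved performance and removed complexity (the SQL query optimizer has to be tamed!). In any case, I would start by replicating the database locally, then run the query in Management Studio with the "Include Actual Execution Plan" option turned on, to help identify which queries are the bottleneck. (My bet is on the vData2 view; you might consider creating a simplified derived view tuned just for this batch.)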
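As a rough companion to the execution-plan suggestion, the sketch below turns on SET STATISTICS IO and SET STATISTICS TIME around one of the loop's queries, so the Messages tab reports logical reads and CPU/elapsed time for it. The donor ID and year are hypothetical test values, and the query is a cut-down version of the world-total query from the question.

```sql
-- Sketch only: measure a single query from the loop in isolation.
SET STATISTICS IO ON;    -- logical/physical reads per table
SET STATISTICS TIME ON;  -- parse/compile and execution times

DECLARE @DonorID SMALLINT ,
    @CurrentYear SMALLINT;
SELECT  @DonorID = 1 ,        -- hypothetical donor
        @CurrentYear = 2012;  -- hypothetical year

SELECT  SUM(d.Amount)
FROM    dbo.vData2 d
WHERE   d.DonorID = @DonorID
        AND d.[year] IN ( @CurrentYear - 1, @CurrentYear );

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

If most of the reads are attributed to the tables underneath the view, that would support the guess that the view is where the time goes.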
-
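Sorry, I don't understand why you need to loop through the DonorIDs. You already have the top 15 donors; shouldn't all the looping just be a join to the top 15, followed by a GROUP BY DonorID, or am I missing something?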
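One way to read this suggestion is as a single set-based query instead of the donor and year loops. The following is only a sketch under assumptions: it uses the column names from the question's code, builds the three periods with a VALUES list, and ranks recipients per donor and period with ROW_NUMBER(). It groups by RecipientID rather than RecipientName, so results could differ from the original if two recipients share a name.

```sql
-- Sketch only: top 15 recipients per donor for all three periods at once.
DECLARE @LatestYear SMALLINT;
SET @LatestYear = 2012;

WITH Periods AS
(
    SELECT PeriodYear
    FROM   ( VALUES ( @LatestYear - 20 ), ( @LatestYear - 10 ), ( @LatestYear ) )
           AS p ( PeriodYear )
),
Totals AS
(
    -- each period covers PeriodYear and the year before it
    SELECT p.PeriodYear ,
           d.DonorID ,
           d.RecipientID ,
           SUM(d.Amount) AS Amount
    FROM   dbo.vData2 d
           INNER JOIN Periods p ON d.[year] IN ( p.PeriodYear - 1, p.PeriodYear )
    GROUP BY p.PeriodYear ,
           d.DonorID ,
           d.RecipientID
),
Ranked AS
(
    SELECT t.PeriodYear ,
           t.DonorID ,
           t.RecipientID ,
           t.Amount ,
           ROW_NUMBER() OVER ( PARTITION BY t.PeriodYear, t.DonorID
                               ORDER BY t.Amount DESC ) AS rn
    FROM   Totals t
)
SELECT r.PeriodYear ,
       r.DonorID ,
       lr.RecipientName ,
       r.Amount
FROM   Ranked r
       INNER JOIN dbo.LookupRecipient lr ON lr.RecipientID = r.RecipientID
WHERE  r.rn <= 15
ORDER BY r.DonorID ,
       r.PeriodYear ,
       r.rn;
```

Like TOP 15 in the original, ROW_NUMBER() breaks ties arbitrarily; the percentages and report formatting would still need to be layered on top of this result.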
-
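I would look at the view and the base tables. 54 loops is not many. What is taking the time inside the loop? You could index the view. You also have some static selects in your joins that you could run once and put into #temp tables, declaring a PK on each #temp since it helps the join.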
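A minimal sketch of the #temp idea, using the GroupID = 160 recipient list from the question as the "static" select. The INT data type for RecipientID and the test values for the variables are assumptions. DISTINCT is needed for the primary key; if dbo.RecipientGroup can contain the same RecipientID more than once for this group, the original join would count those amounts more than once, so totals could differ.

```sql
-- Sketch only: materialise the static recipient list once, before any loop.
CREATE TABLE #WorldRecipients
    (
      RecipientID INT NOT NULL PRIMARY KEY  -- data type is an assumption
    );

INSERT  INTO #WorldRecipients ( RecipientID )
        SELECT DISTINCT RecipientID
        FROM    dbo.RecipientGroup
        WHERE   GroupID = 160
                AND RecipientID IS NOT NULL;

DECLARE @DonorID SMALLINT ,
    @CurrentYear SMALLINT ,
    @totalWorld NUMERIC(10, 2);
SELECT  @DonorID = 1 ,        -- hypothetical donor
        @CurrentYear = 2012;  -- hypothetical year

-- Inside the loop, join to the keyed temp table instead of the derived table.
SELECT  @totalWorld = SUM(d.Amount)
FROM    dbo.vData2 d
        INNER JOIN #WorldRecipients w ON w.RecipientID = d.RecipientID
WHERE   d.DonorID = @DonorID
        AND d.[year] IN ( @CurrentYear - 1, @CurrentYear );

DROP TABLE #WorldRecipients;
```

The same pattern would apply to the per-region and per-GeoZone recipient lists, each loaded once before the cursor opens.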
-
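In cases like this, it is best to provide a script that creates the structures with some fake data (i.e. a link to a .SQL or .BAK file). I imagine plenty of people enjoy a challenge, but the groundwork of setting everything up and trying to reproduce the problem from scratch is very time-consuming. It might take you a couple of hours, but that win could save you dozens.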
Tags: sql-server tsql sql-server-2008-r2 cursor