【发布时间】:2016-05-28 11:43:30
【问题描述】:
我有一个关于实体框架查询执行性能的问题。
架构:
我有一个这样的表结构:
CREATE TABLE [dbo].[DataLogger]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[ProjectID] [bigint] NULL,
CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC )
)
CREATE TABLE [dbo].[DCDistributionBox]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[DataLoggerID] [bigint] NOT NULL,
CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC )
)
ALTER TABLE [dbo].[DCDistributionBox]
ADD CONSTRAINT [FK_DCDistributionBox_DataLogger]
FOREIGN KEY([DataLoggerID]) REFERENCES [dbo].[DataLogger] ([ID])
CREATE TABLE [dbo].[DCString]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[DCDistributionBoxID] [bigint] NOT NULL,
[CurrentMPP] [decimal](18, 2) NULL,
CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC )
)
ALTER TABLE [dbo].[DCString]
ADD CONSTRAINT [FK_DCString_DCDistributionBox]
FOREIGN KEY([DCDistributionBoxID]) REFERENCES [dbo].[DCDistributionBox] ([ID])
CREATE TABLE [dbo].[StringData]
(
[DCStringID] [bigint] NOT NULL,
[TimeStamp] [datetime] NOT NULL,
[DCCurrent] [decimal](18, 2) NULL,
CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC)
)
CREATE NONCLUSTERED INDEX [TimeStamp_DCCurrent-NonClusteredIndex]
ON [dbo].[StringData] ([DCStringID] ASC, [TimeStamp] ASC)
INCLUDE ([DCCurrent])
外键上的标准索引也存在(出于篇幅原因,我不想全部列出)。
[StringData] 表具有以下存储统计信息:
- 数据空间:26,901.86 MB
- 行数:131,827,749
- 已分区:true
- 分区数:62
用法:
我现在想对[StringData] 表中的数据进行分组并进行一些聚合。
我创建了一个实体框架查询(查询的详细信息可以在here找到):
var compareData = model.StringDatas
.AsNoTracking()
.Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= fromDate && p.TimeStamp < tillDate)
.Select(d => new
{
TimeStamp = d.TimeStamp,
DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
})
.GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
.Select(d => new
{
TimeStamp = d.Key,
DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
})
.ToList();
执行时间特别长!?
- 执行结果:92行
- 执行时间:~16000ms
尝试:
我现在查看了 Entity Framework 生成的 SQL 查询,看起来像这样:
DECLARE @p__linq__4 DATETIME = 0;
DECLARE @p__linq__3 DATETIME = 0;
DECLARE @p__linq__5 INT = 15;
DECLARE @p__linq__6 INT = 15;
DECLARE @p__linq__0 BIGINT = 20827;
DECLARE @p__linq__1 DATETIME = '06.02.2016 00:00:00';
DECLARE @p__linq__2 DATETIME = '07.02.2016 00:00:00';
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]
我将此 SQL 查询复制到同一台机器上的 SSMS 中,使用与实体框架相同的连接字符串连接。
结果是性能大大提高:
- 执行结果:92行
- 执行时间:517ms
我也做了一些循环运行测试,结果很奇怪。测试是这样的
for (int i = 0; i < 50; i++)
{
DateTime begin = DateTime.UtcNow;
[...query...]
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
结果非常不同,看起来很随机(?):
0th run: 00:00:11.0618580
1th run: 00:00:11.3339467
2th run: 00:00:10.0000676
3th run: 00:00:10.1508140
4th run: 00:00:09.2041939
5th run: 00:00:07.6710321
6th run: 00:00:10.3386312
7th run: 00:00:17.3422765
8th run: 00:00:13.8620557
9th run: 00:00:14.9041528
10th run: 00:00:12.7772906
11th run: 00:00:17.0170235
12th run: 00:00:14.7773750
问题:
为什么 Entity Framework 查询执行这么慢?结果行数非常低,原始 SQL 查询显示出非常快的性能。
更新 1:
我注意这不是 MetaContext 或模型创建延迟。之前在同一个模型实例上执行了一些其他查询,性能良好。
更新2(与@x0007me的回答有关):
感谢您的提示,但可以通过更改模型设置来消除这种情况:
modelContext.Configuration.UseDatabaseNullSemantics = true;
EF 生成的 SQL 现在是:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]
所以你可以看到你描述的问题现在已经解决了,但是执行时间没有改变。
另外,正如您在架构和原始执行时间中看到的那样,我使用了优化结构和高度优化的索引器。
Update 3(与@Vladimir Baranov 的回答有关):
我不明白为什么这与查询计划缓存有关。因为在 MSDN 中明确描述了 EF6 使用查询计划缓存。
一个简单的测试证明,巨大的执行时间差异与查询计划缓存无关(伪代码):
using(var modelContext = new ModelContext())
{
modelContext.Query(); //1th run activates caching
modelContext.Query(); //2th used cached plan
}
因此,两个查询都以相同的执行时间运行。
更新4(与@bubi的回答有关):
我尝试按照您的描述运行由 EF 生成的查询:
int result = model.Database.ExecuteSqlCommand(@"SELECT
1 AS [C1],
[GroupBy1].[K1] AS [C2],
[GroupBy1].[A1] AS [C3],
[GroupBy1].[A2] AS [C4],
[GroupBy1].[A3] AS [C5],
[GroupBy1].[A4] AS [C6]
FROM ( SELECT
[Project1].[K1] AS [K1],
MIN([Project1].[A1]) AS [A1],
MAX([Project1].[A2]) AS [A2],
AVG([Project1].[A3]) AS [A3],
STDEVP([Project1].[A4]) AS [A4]
FROM ( SELECT
DATEADD (minute, ((DATEDIFF (minute, 0, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, 0) AS [K1],
[Project1].[C1] AS [A1],
[Project1].[C1] AS [A2],
[Project1].[C1] AS [A3],
[Project1].[C1] AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp] AS [TimeStamp],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
) AS [Project1]
) AS [Project1]
GROUP BY [K1]
) AS [GroupBy1]",
new SqlParameter("p__linq__0", 20827),
new SqlParameter("p__linq__1", fromDate),
new SqlParameter("p__linq__2", tillDate),
new SqlParameter("p__linq__5", 15),
new SqlParameter("p__linq__6", 15));
- 执行结果:92
- 执行时间:~16000ms
与普通的 EF 查询一样长!?
更新5(与@vittore的回答有关):
我创建了一个跟踪调用树,也许它会有所帮助:
更新6(与@usr的回答有关):
我通过 SQL Server Profiler 创建了两个 showplan XML。
Update 7(与@VladimirBaranov 的cmets 相关):
我现在运行一些与您的 cmets 相关的测试用例。
首先,我通过使用一个新的计算列和一个匹配的 INDEXER 来消除订单操作的时间。这减少了与DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp] ) / 15* 15, 0) 相关的性能滞后。详细了解如何以及为什么可以找到here。
结果如下:
纯EntityFramework查询:
for (int i = 0; i < 3; i++)
{
DateTime begin = DateTime.UtcNow;
var result = model.StringDatas
.AsNoTracking()
.Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp15Minutes >= fromDate && p.TimeStamp15Minutes < tillDate)
.Select(d => new
{
TimeStamp = d.TimeStamp15Minutes,
DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
})
.GroupBy(d => d.TimeStamp)
.Select(d => new
{
TimeStamp = d.Key,
DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
})
.ToList();
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run pure EF: {1}", i, excecutionTimeSpan.ToString());
}
第 0 次运行纯 EF:00:00:12.6460624
第 1 次运行纯 EF:00:00:11.0258393
第 2 次运行纯 EF:00:00:08.4171044
我现在使用 EF 生成的 SQL 作为 SQL 查询:
for (int i = 0; i < 3; i++)
{
DateTime begin = DateTime.UtcNow;
int result = model.Database.ExecuteSqlCommand(@"SELECT
1 AS [C1],
[GroupBy1].[K1] AS [TimeStamp15Minutes],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3],
[GroupBy1].[A3] AS [C4],
[GroupBy1].[A4] AS [C5]
FROM ( SELECT
[Project1].[TimeStamp15Minutes] AS [K1],
MIN([Project1].[C1]) AS [A1],
MAX([Project1].[C1]) AS [A2],
AVG([Project1].[C1]) AS [A3],
STDEVP([Project1].[C1]) AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
) AS [Project1]
GROUP BY [Project1].[TimeStamp15Minutes]
) AS [GroupBy1];",
new SqlParameter("p__linq__0", 20827),
new SqlParameter("p__linq__1", fromDate),
new SqlParameter("p__linq__2", tillDate));
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
第 0 次运行:00:00:00.8381200
第一次运行:00:00:00.6920736
第二次运行:00:00:00.7081006
和OPTION(RECOMPILE):
for (int i = 0; i < 3; i++)
{
DateTime begin = DateTime.UtcNow;
int result = model.Database.ExecuteSqlCommand(@"SELECT
1 AS [C1],
[GroupBy1].[K1] AS [TimeStamp15Minutes],
[GroupBy1].[A1] AS [C2],
[GroupBy1].[A2] AS [C3],
[GroupBy1].[A3] AS [C4],
[GroupBy1].[A4] AS [C5]
FROM ( SELECT
[Project1].[TimeStamp15Minutes] AS [K1],
MIN([Project1].[C1]) AS [A1],
MAX([Project1].[C1]) AS [A2],
AVG([Project1].[C1]) AS [A3],
STDEVP([Project1].[C1]) AS [A4]
FROM ( SELECT
[Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes],
[Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
FROM [dbo].[StringData] AS [Extent1]
INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
) AS [Project1]
GROUP BY [Project1].[TimeStamp15Minutes]
) AS [GroupBy1]
OPTION(RECOMPILE);",
new SqlParameter("p__linq__0", 20827),
new SqlParameter("p__linq__1", fromDate),
new SqlParameter("p__linq__2", tillDate));
TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
}
第 0 次使用 RECOMPILE 运行:00:00:00.8260932
第一次运行 RECOMPILE:00:00:00.9139730
使用 RECOMPILE 进行第二次运行:00:00:01.0680665
在 SSMS 中执行相同的 SQL 查询(没有重新编译):
00:00:01.105
在 SSMS 中执行相同的 SQL 查询(使用 RECOMPILE):
00:00:00.902
我希望这是您需要的所有值。
【问题讨论】:
-
如果您在 EF 中运行查询,然后在 SSMS 中运行,则大部分数据将在第二次缓存中。确保在测量执行速度时考虑到这一点(例如,如果您在 SSMS 版本之后再次运行 EF 版本,是否会回到 16 秒?)
-
Slow in the Application, Fast in SSMS 将解释为什么您会看到这种性能差异。
-
@SteffenMangold,EF 运行的查询有自己的计划。您在 SSMS 中运行的“相同”查询将有自己的计划,这可能会有所不同。如果可能,这两个计划都将被 SQL Server 引擎缓存和重用。文章详细解释了如何处理它。我个人将
OPTION(RECOMPILE)用于此类查询,即具有Dynamic Search Conditions 的查询,例如:([Extent4].[ProjectID] = @p__linq__0) OR (@p__linq__0 IS NULL)。 -
@SteffenMangold,我说的是execution plan in the SQL Server engine。不知道EF执行计划是什么意思。
-
您真的阅读了链接 (sommarskog.se/query-plan-mysteries.html) 吗?那里给出了很多理由。不同的会话设置是此问题的主要原因。这样做:使用 SQL Profiler 和 XML Showplan 事件捕获慢版本和快版本的执行计划。将两个计划都作为文件发布在某处(一些文件托管商)。
标签: c# sql-server entity-framework sql-server-2008 entity-framework-6