【Question title】: EF6 aggregation on large data sets
【Posted】: 2017-05-23 15:43:08
【Question】:

There are two tables, Events and Octaves:

+---------+-------+
| EventId | Time  |
+---------+-------+

+----------+---------+-----------+-------+
| OctaveId | EventId | Frequency | Value |
+----------+---------+-----------+-------+

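On average each event has 10 octaves, and one event is recorded every 10 seconds; there are currently about 400,000 events and 4 million octaves. I want to filter the events within a given time range, aggregate them by hour, and return the average value for each group of octaves that share the same frequency. The EF6 LINQ code I am using is: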

_context.Events
      .Where(x => x.Time >= afterDate)
      .Where(x => x.Time <= beforeDate)
      .Select(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour, data = x.Data })
      .GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
      .Where(x => x.Any())
      .Select(x => new
      {
         Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
         Data = x.SelectMany(y => y.data).GroupBy(y => new { frequency = y.Frequency }).Select(y => new
         {
            frequency  = y.Key.frequency,
            value = Math.Round(y.Average(z => z.Value), 1),
         })

      })
        .OrderByDescending(m => m.Time)
        .Take(limit);

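This works, but only when the time span is very short (a few hours). If the span grows to several days, the query seems to run forever. Am I asking too much of SQL Server? Or is there a better way to run this query or to structure my data? If I remove the SelectMany(...).GroupBy(...), it is no longer insanely slow.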

The generated SQL query is:

SELECT 
    [Project5].[C1] AS [C1], 
    [Project5].[C2] AS [C2], 
    [Project5].[C3] AS [C3], 
    [Project5].[C4] AS [C4], 
    [Project5].[C5] AS [C5], 
    [Project5].[C6] AS [C6], 
    [Project5].[C8] AS [C7], 
    [Project5].[Frequency] AS [Frequency], 
    [Project5].[C7] AS [C8]
    FROM ( SELECT 
        [Limit1].[C1] AS [C1], 
        [Limit1].[C2] AS [C2], 
        [Limit1].[C3] AS [C3], 
        [Limit1].[C4] AS [C4], 
        [Limit1].[C5] AS [C5], 
        [Limit1].[C6] AS [C6], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS float) ELSE ROUND([GroupBy1].[A1], 1) END AS [C7], 
        [GroupBy1].[K1] AS [Frequency], 
        CASE WHEN ([GroupBy1].[K1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C8]
        FROM   (SELECT TOP (10000) [Project4].[C1] AS [C1], [Project4].[C2] AS [C2], [Project4].[C3] AS [C3], [Project4].[C4] AS [C4], [Project4].[C5] AS [C5], [Project4].[C6] AS [C6]
            FROM ( SELECT 
                [Project2].[C1] AS [C1], 
                [Project2].[C2] AS [C2], 
                [Project2].[C3] AS [C3], 
                [Project2].[C4] AS [C4], 
                1 AS [C5], 
                convert (datetime2,right('000' + convert(varchar(255), [Project2].[C1]), 4) + '-' + convert(varchar(255), [Project2].[C2]) + '-' + convert(varchar(255), [Project2].[C3]) + ' ' + convert(varchar(255), [Project2].[C4]) + ':' + convert(varchar(255), 0) + ':' + str(cast(0 as float(53)), 10, 7), 121) AS [C6]
                FROM ( SELECT 
                    [Distinct1].[C1] AS [C1], 
                    [Distinct1].[C2] AS [C2], 
                    [Distinct1].[C3] AS [C3], 
                    [Distinct1].[C4] AS [C4]
                    FROM ( SELECT DISTINCT 
                        DATEPART (year, [Extent1].[TimeEnd]) AS [C1], 
                        DATEPART (month, [Extent1].[TimeEnd]) AS [C2], 
                        DATEPART (day, [Extent1].[TimeEnd]) AS [C3], 
                        DATEPART (hour, [Extent1].[TimeEnd]) AS [C4]
                        FROM [dbo].[Events] AS [Extent1]
                        WHERE ([Extent1].[TimeEnd] >= @p__linq__1) AND ([Extent1].[TimeEnd] <= @p__linq__2)
                    )  AS [Distinct1]
                )  AS [Project2]
                WHERE  EXISTS (SELECT 
                    1 AS [C1]
                    FROM [dbo].[Events] AS [Extent2]
                    WHERE ([Extent2].[TimeEnd] >= @p__linq__1) AND ([Extent2].[TimeEnd] <= @p__linq__2) AND (([Project2].[C1] = (DATEPART (year, [Extent2].[TimeEnd]))) OR (([Project2].[C1] IS NULL) AND (DATEPART (year, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C2] = (DATEPART (month, [Extent2].[TimeEnd]))) OR (([Project2].[C2] IS NULL) AND (DATEPART (month, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C3] = (DATEPART (day, [Extent2].[TimeEnd]))) OR (([Project2].[C3] IS NULL) AND (DATEPART (day, [Extent2].[TimeEnd]) IS NULL))) AND (([Project2].[C4] = (DATEPART (hour, [Extent2].[TimeEnd]))) OR (([Project2].[C4] IS NULL) AND (DATEPART (hour, [Extent2].[TimeEnd]) IS NULL)))
                )
            )  AS [Project4]
            ORDER BY [Project4].[C6] DESC ) AS [Limit1]
        OUTER APPLY  (SELECT 
            [Extent4].[Frequency] AS [K1], 
            AVG([Extent4].[Value]) AS [A1]
            FROM  [dbo].[Events] AS [Extent3]
            INNER JOIN [dbo].[Octaves] AS [Extent4] ON [Extent3].[EventId] = [Extent4].[EventId]
            WHERE ([Extent3].[TimeEnd] >= @p__linq__1) AND ([Extent3].[TimeEnd] <= @p__linq__2) AND (([Limit1].[C1] = (DATEPART (year, [Extent3].[TimeEnd]))) OR (([Limit1].[C1] IS NULL) AND (DATEPART (year, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C2] = (DATEPART (month, [Extent3].[TimeEnd]))) OR (([Limit1].[C2] IS NULL) AND (DATEPART (month, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C3] = (DATEPART (day, [Extent3].[TimeEnd]))) OR (([Limit1].[C3] IS NULL) AND (DATEPART (day, [Extent3].[TimeEnd]) IS NULL))) AND (([Limit1].[C4] = (DATEPART (hour, [Extent3].[TimeEnd]))) OR (([Limit1].[C4] IS NULL) AND (DATEPART (hour, [Extent3].[TimeEnd]) IS NULL)))
            GROUP BY [Extent4].[Frequency] ) AS [GroupBy1]
    )  AS [Project5]
    ORDER BY [Project5].[C6] DESC, [Project5].[C1] ASC, [Project5].[C2] ASC, [Project5].[C3] ASC, [Project5].[C4] ASC, [Project5].[C8] ASC

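Update 1

I tried "flipping" the query by querying the octaves directly, and I got much better results. I first group them by date and frequency and compute the average, then group the results again by time. It is not elegant at all, but it is the first solution that actually works. If the grouping is done differently (for example, first by time, then by frequency, then averaging), it still does not work.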

 _context.Octaves
.Where(x => x.Event.Time >= afterDate)
.Where(x => x.Event.Time <= beforeDate)
.GroupBy(x => new { year = x.Event.Time.Year, month = x.Event.Time.Month, day = x.Event.Time.Day, hour = x.Event.Time.Hour, freq = x.Frequency })
.Select(x => new
{
  year = x.Key.year,
  month = x.Key.month,
  day = x.Key.day,
  hour = x.Key.hour,
  freq = x.Key.freq,
  value = Math.Round(x.Average(y => y.Value), 1)

})
.GroupBy(x => new { year = x.year, month = x.month, day = x.day, hour = x.hour })
.Select(x => new
{
  timeEnd = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
  data = x.Select(y=> new {freq = y.freq, value = y.value })

})
.OrderByDescending(m => m.timeEnd)
.Take(limit)

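【Comments】:

  • Are there appropriate indexes in place? Have you considered storing the hourly aggregated data in a separate table? Would that be an option?
  • There are nonclustered indexes on EventId, Octaves.EventId, Octaves.OctaveId and Octaves.Frequency. I did think about storing the aggregated data in another table, but hoped it would not be necessary. Thanks.
  • Try creating a computed column on your table that represents the date + hour, then index that column. Group by that column in the EF query and it should be a lot faster.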
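
The computed-column suggestion in the last comment could look roughly like the T-SQL sketch below. It is only an illustration under assumptions: the column name TimeEndHour, the index name, and the sample date values are hypothetical; the time column is taken to be TimeEnd (as in the generated SQL) with type datetime or datetime2; and the filter bounds are assumed to be aligned to whole hours, which is a simplification of the original range filter.

```sql
-- Hypothetical computed column: TimeEnd truncated to the start of its hour.
-- Built only from deterministic conversions so it can be persisted and indexed.
ALTER TABLE dbo.Events
    ADD TimeEndHour AS
        (DATEADD(hour, DATEPART(hour, TimeEnd),
                 CONVERT(datetime2(0), CONVERT(date, TimeEnd)))) PERSISTED;
GO  -- batch separator for SSMS/sqlcmd, not a T-SQL statement

-- Hypothetical index name; EventId is included to cover the join to Octaves.
CREATE NONCLUSTERED INDEX IX_Events_TimeEndHour
    ON dbo.Events (TimeEndHour) INCLUDE (EventId);
GO

-- Sample bounds (made-up values), aligned to whole hours.
DECLARE @afterHour  datetime2(0) = '2017-05-01T00:00:00';
DECLARE @beforeHour datetime2(0) = '2017-05-08T00:00:00';

-- One pass over the join: average Value per (hour, frequency).
SELECT e.TimeEndHour,
       o.Frequency,
       ROUND(AVG(o.Value), 1) AS Value
FROM dbo.Events AS e
INNER JOIN dbo.Octaves AS o
        ON o.EventId = e.EventId
WHERE e.TimeEndHour >= @afterHour
  AND e.TimeEndHour <= @beforeHour
GROUP BY e.TimeEndHour, o.Frequency
ORDER BY e.TimeEndHour DESC, o.Frequency;
```

Grouping on a single stored column avoids the four DATEPART expressions per row that appear in the generated SQL. Whether the optimizer actually uses the index depends on the data and the plan, so it is worth checking the execution plan before relying on it.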

Tags: c# linq entity-framework-6


【Solution 1】:

I am not certain this will help, and it may even turn out to be slower, but you might want to try it.

_context.Events.AsNoTracking()
    .Where(x => x.Time >= afterDate && x.Time <= beforeDate)
    // Group directly on the date parts of Time (Events has no year/month/day/hour columns).
    .GroupBy(x => new { year = x.Time.Year, month = x.Time.Month, day = x.Time.Day, hour = x.Time.Hour })
    .Select(x => new
    {
        Time = DbFunctions.CreateDateTime(x.Key.year, x.Key.month, x.Key.day, x.Key.hour, 0, 0),
        // Flatten the octaves of every event in this hour, then average per frequency.
        Data = x.SelectMany(e => e.Data)
                .GroupBy(o => o.Frequency)
                .Select(g => new
                {
                    frequency = g.Key,
                    value = Math.Round(g.Average(z => z.Value), 1)
                })
    })
    .OrderByDescending(m => m.Time)
    .Take(limit);
