EF Core Aggregation Queries
This presentation walks through the EF Core LINQ queries that power delivery stats, exception reason aggregation, and other analytics endpoints. Each query translates to efficient SQL with GROUP BY and aggregate functions.
Setting Up the Analytics Service
Create a dedicated service for analytics queries. This keeps the aggregation logic separate from the controller and makes it testable.
csharp1public interface IAnalyticsService2{3 Task<DeliveryStatsDto> GetDeliveryStatsAsync(DateTime from, DateTime to);4 Task<List<ExceptionReasonDto>> GetTopExceptionReasonsAsync(DateTime from, DateTime to);5 Task<List<ServiceBreakdownDto>> GetServiceBreakdownAsync(DateTime from, DateTime to);6 Task<List<PipelineStatusDto>> GetPipelineAsync();7}
csharp1public class AnalyticsService : IAnalyticsService2{3 private readonly ParcelTrackingDbContext _context;45 public AnalyticsService(ParcelTrackingDbContext context)6 {7 _context = context;8 }9}
Register the service in Program.cs:
csharp1builder.Services.AddScoped<IAnalyticsService, AnalyticsService>();
Delivery Stats Query
The delivery stats endpoint returns counts of parcels by status within a date range, plus average delivery time and on-time percentage. This requires multiple aggregations over the same filtered dataset.
Filtering by Date Range
Start by building the base query with the date filter:
csharp1public async Task<DeliveryStatsDto> GetDeliveryStatsAsync(DateTime from, DateTime to)2{3 var query = _context.Parcels4 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to);
This creates an IQueryable that has not executed yet. All subsequent operations build on this filtered set.
Counting by Status
Use conditional counting to get totals for each status category:
csharp1 var totalParcels = await query.CountAsync();23 var delivered = await query4 .CountAsync(p => p.Status == ParcelStatus.Delivered);56 var inTransit = await query7 .CountAsync(p => p.Status == ParcelStatus.InTransit8 || p.Status == ParcelStatus.OutForDelivery9 || p.Status == ParcelStatus.PickedUp);1011 var exceptions = await query12 .CountAsync(p => p.Status == ParcelStatus.Exception);
Each CountAsync call generates a separate SQL query. For small-to-medium datasets, this is perfectly fine. For very large datasets, you could combine these into a single GroupBy query.
Single-Query Approach with GroupBy
To get all status counts in one round trip:
csharp1 var statusCounts = await query2 .GroupBy(p => p.Status)3 .Select(g => new4 {5 Status = g.Key,6 Count = g.Count()7 })8 .ToListAsync();910 var delivered = statusCounts11 .FirstOrDefault(s => s.Status == ParcelStatus.Delivered)?.Count ?? 0;1213 var inTransit = statusCounts14 .Where(s => s.Status == ParcelStatus.InTransit15 || s.Status == ParcelStatus.OutForDelivery16 || s.Status == ParcelStatus.PickedUp)17 .Sum(s => s.Count);1819 var exceptions = statusCounts20 .FirstOrDefault(s => s.Status == ParcelStatus.Exception)?.Count ?? 0;
This executes a single SQL GROUP BY query. The in-memory LINQ after ToListAsync() operates on the small result set (at most 7 rows, one per status).
Average Delivery Time
Calculate the average time between parcel creation and actual delivery. Only delivered parcels have an ActualDeliveryDate:
csharp1 var averageDeliveryTimeHours = await query2 .Where(p => p.Status == ParcelStatus.Delivered3 && p.ActualDeliveryDate != null)4 .AverageAsync(p =>5 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));
EF.Functions.DateDiffHour translates to the database's date difference function. For PostgreSQL, EF Core translates this using native date arithmetic with EXTRACT(EPOCH FROM ...) expressions.
If no parcels have been delivered in the date range, AverageAsync throws an InvalidOperationException because the sequence is empty. Handle this:
csharp1 double averageDeliveryTimeHours = 0;23 var deliveredParcels = query4 .Where(p => p.Status == ParcelStatus.Delivered5 && p.ActualDeliveryDate != null);67 if (await deliveredParcels.AnyAsync())8 {9 averageDeliveryTimeHours = await deliveredParcels10 .AverageAsync(p =>11 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));12 }
On-Time Percentage
A parcel is on time if ActualDeliveryDate <= EstimatedDeliveryDate:
csharp1 double onTimePercentage = 0;23 if (delivered > 0)4 {5 var onTimeCount = await query6 .CountAsync(p => p.Status == ParcelStatus.Delivered7 && p.ActualDeliveryDate != null8 && p.EstimatedDeliveryDate != null9 && p.ActualDeliveryDate <= p.EstimatedDeliveryDate);1011 onTimePercentage = Math.Round((double)onTimeCount / delivered * 100, 1);12 }
Assembling the Result
Return the assembled DTO:
csharp1 return new DeliveryStatsDto2 {3 From = from,4 To = to,5 TotalParcels = totalParcels,6 Delivered = delivered,7 InTransit = inTransit,8 Exceptions = exceptions,9 AverageDeliveryTimeHours = Math.Round(averageDeliveryTimeHours, 1),10 OnTimePercentage = onTimePercentage11 };12}
Exception Reason Aggregation
This query groups tracking events by their DelayReason to find the most common causes of exceptions.
Why Query TrackingEvents?
Exception reasons live on the TrackingEvent entity, not on the Parcel. When a parcel enters exception status, a tracking event is created with the DelayReason field set. Grouping these events gives us the breakdown.
csharp1public async Task<List<ExceptionReasonDto>> GetTopExceptionReasonsAsync(2 DateTime from, DateTime to)3{4 var exceptionEvents = _context.TrackingEvents5 .Where(e => e.EventType == EventType.Exception6 && e.DelayReason != null7 && e.Timestamp >= from8 && e.Timestamp <= to);910 var totalExceptions = await exceptionEvents.CountAsync();1112 if (totalExceptions == 0)13 return new List<ExceptionReasonDto>();1415 var reasons = await exceptionEvents16 .GroupBy(e => e.DelayReason)17 .Select(g => new ExceptionReasonDto18 {19 Reason = g.Key!.ToString(),20 Count = g.Count(),21 Percentage = Math.Round((double)g.Count() / totalExceptions * 100, 1)22 })23 .OrderByDescending(r => r.Count)24 .ToListAsync();2526 return reasons;27}
Understanding the GroupBy Projection
The Select after GroupBy is crucial. Each group g has:
g.Key- the value being grouped by (theDelayReasonenum value)g.Count()- number of items in that group- Any other aggregate:
g.Average(...),g.Sum(...),g.Min(...),g.Max(...)
EF Core translates this to:
sql1SELECT DelayReason, COUNT(*) AS Count2FROM TrackingEvents3WHERE EventType = 'Exception'4 AND DelayReason IS NOT NULL5 AND Timestamp >= @from AND Timestamp <= @to6GROUP BY DelayReason7ORDER BY Count DESC
Calculating Percentage Within the Query
Notice the percentage calculation uses totalExceptions, which was computed in a separate query. You cannot reference an outer variable like this inside the EF Core Select for server-side evaluation. The approach above works because EF Core evaluates Math.Round((double)g.Count() / totalExceptions * 100, 1) client-side after fetching the grouped results.
An alternative is to compute the percentage after materialization:
csharp1 var grouped = await exceptionEvents2 .GroupBy(e => e.DelayReason)3 .Select(g => new4 {5 Reason = g.Key,6 Count = g.Count()7 })8 .OrderByDescending(g => g.Count)9 .ToListAsync();1011 var reasons = grouped.Select(g => new ExceptionReasonDto12 {13 Reason = g.Reason!.ToString(),14 Count = g.Count,15 Percentage = Math.Round((double)g.Count / totalExceptions * 100, 1)16 }).ToList();
This pattern is clearer: the database handles grouping and counting, then C# handles percentage formatting.
Service Type Breakdown
Group parcels by ServiceType and compute the count and average delivery time for each:
csharp1public async Task<List<ServiceBreakdownDto>> GetServiceBreakdownAsync(2 DateTime from, DateTime to)3{4 var breakdown = await _context.Parcels5 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to)6 .GroupBy(p => p.ServiceType)7 .Select(g => new8 {9 ServiceType = g.Key,10 Count = g.Count(),11 DeliveredCount = g.Count(p =>12 p.Status == ParcelStatus.Delivered13 && p.ActualDeliveryDate != null),14 TotalDeliveryHours = g15 .Where(p => p.Status == ParcelStatus.Delivered16 && p.ActualDeliveryDate != null)17 .Sum(p =>18 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value))19 })20 .ToListAsync();2122 return breakdown.Select(b => new ServiceBreakdownDto23 {24 ServiceType = b.ServiceType.ToString(),25 Count = b.Count,26 AverageDeliveryTimeHours = b.DeliveredCount > 027 ? Math.Round((double)b.TotalDeliveryHours / b.DeliveredCount, 1)28 : 029 }).ToList();30}
Why Not Use Average Directly?
You might expect to use g.Average(...) inside the GroupBy projection. The problem is that Average over a subset of the group (only delivered parcels) requires a Where inside the group, which not all database providers translate cleanly.
The workaround is to compute Sum and Count separately, then divide in C# after materialization. This is a common EF Core pattern for conditional aggregates.
Alternative: Separate Queries Per Service Type
If the group-level Sum and Where combination causes translation issues, you can query each service type independently:
csharp1var serviceTypes = await _context.Parcels2 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to)3 .Select(p => p.ServiceType)4 .Distinct()5 .ToListAsync();67var results = new List<ServiceBreakdownDto>();89foreach (var serviceType in serviceTypes)10{11 var parcels = _context.Parcels12 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to13 && p.ServiceType == serviceType);1415 var count = await parcels.CountAsync();1617 var delivered = parcels18 .Where(p => p.Status == ParcelStatus.Delivered19 && p.ActualDeliveryDate != null);2021 double avgHours = 0;22 if (await delivered.AnyAsync())23 {24 avgHours = await delivered.AverageAsync(p =>25 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));26 }2728 results.Add(new ServiceBreakdownDto29 {30 ServiceType = serviceType.ToString(),31 Count = count,32 AverageDeliveryTimeHours = Math.Round(avgHours, 1)33 });34}3536return results;
This trades one complex query for several simple queries. With only four service types, the overhead is negligible.
Handling Empty Results
Every aggregation query must handle the case where the filtered dataset is empty:
CountAsync()returns 0 for empty sets (safe)AverageAsync()throwsInvalidOperationExceptionon empty sequencesSumAsync()returns 0 for empty sets (safe)GroupBy(...).Select(...).ToListAsync()returns an empty list (safe)
Always check with AnyAsync() before calling AverageAsync(), or use a try-catch, or use the nullable overload:
csharp1// Safe approach: check first2if (await query.AnyAsync())3{4 var avg = await query.AverageAsync(p => p.SomeValue);5}67// Alternative: cast to nullable and use DefaultIfEmpty8var avg = await query9 .Select(p => (double?)p.SomeValue)10 .AverageAsync() ?? 0;
Performance Considerations
Database Indexes
Aggregation queries benefit from indexes on the columns used in WHERE and GROUP BY clauses. For the analytics endpoints, these columns are important:
csharp1// In your DbContext OnModelCreating2modelBuilder.Entity<Parcel>()3 .HasIndex(p => p.CreatedAt);45modelBuilder.Entity<Parcel>()6 .HasIndex(p => p.Status);78modelBuilder.Entity<Parcel>()9 .HasIndex(p => p.ServiceType);1011modelBuilder.Entity<TrackingEvent>()12 .HasIndex(e => new { e.EventType, e.Timestamp });
Avoiding N+1 Queries
The analytics queries in this topic do not use Include() because they do not need navigation properties. They project directly to DTOs with aggregates. This is one of the advantages of aggregation queries: they naturally avoid the N+1 problem because they work with scalar results, not entity graphs.
Let the Database Do the Work
A common mistake is to load all parcels into memory and aggregate in C#:
csharp1// BAD: loads all parcels into memory2var allParcels = await _context.Parcels.ToListAsync();3var delivered = allParcels.Count(p => p.Status == ParcelStatus.Delivered);
Instead, keep the aggregation in the LINQ query so EF Core translates it to SQL:
csharp1// GOOD: database performs the count2var delivered = await _context.Parcels3 .CountAsync(p => p.Status == ParcelStatus.Delivered);
The database engine is optimized for aggregation. It uses indexes, parallelism, and optimized algorithms. Loading rows into your application just to count them wastes memory, network bandwidth, and CPU time.
Key Takeaways
- Use
GroupBywithSelectto project aggregated results into DTOs g.Keyholds the grouping value;g.Count(),g.Sum(),g.Average()compute aggregates- Handle empty sequences before calling
AverageAsync()to avoid exceptions - Compute conditional averages using
Sum/Countand divide in C# after materialization - Add database indexes on columns used in
WHEREandGROUP BYfor performance - Keep aggregation in the LINQ query so the database does the work, not your application