18 minlesson

EF Core Aggregation Queries

EF Core Aggregation Queries

This presentation walks through the EF Core LINQ queries that power delivery stats, exception reason aggregation, and other analytics endpoints. Each query translates to efficient SQL with GROUP BY and aggregate functions.

Setting Up the Analytics Service

Create a dedicated service for analytics queries. This keeps the aggregation logic separate from the controller and makes it testable.

csharp
1public interface IAnalyticsService
2{
3 Task<DeliveryStatsDto> GetDeliveryStatsAsync(DateTime from, DateTime to);
4 Task<List<ExceptionReasonDto>> GetTopExceptionReasonsAsync(DateTime from, DateTime to);
5 Task<List<ServiceBreakdownDto>> GetServiceBreakdownAsync(DateTime from, DateTime to);
6 Task<List<PipelineStatusDto>> GetPipelineAsync();
7}
csharp
1public class AnalyticsService : IAnalyticsService
2{
3 private readonly ParcelTrackingDbContext _context;
4
5 public AnalyticsService(ParcelTrackingDbContext context)
6 {
7 _context = context;
8 }
9}

Register the service in Program.cs:

csharp
1builder.Services.AddScoped<IAnalyticsService, AnalyticsService>();

Delivery Stats Query

The delivery stats endpoint returns counts of parcels by status within a date range, plus average delivery time and on-time percentage. This requires multiple aggregations over the same filtered dataset.

Filtering by Date Range

Start by building the base query with the date filter:

csharp
1public async Task<DeliveryStatsDto> GetDeliveryStatsAsync(DateTime from, DateTime to)
2{
3 var query = _context.Parcels
4 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to);

This creates an IQueryable that has not executed yet. All subsequent operations build on this filtered set.

Counting by Status

Use conditional counting to get totals for each status category:

csharp
1 var totalParcels = await query.CountAsync();
2
3 var delivered = await query
4 .CountAsync(p => p.Status == ParcelStatus.Delivered);
5
6 var inTransit = await query
7 .CountAsync(p => p.Status == ParcelStatus.InTransit
8 || p.Status == ParcelStatus.OutForDelivery
9 || p.Status == ParcelStatus.PickedUp);
10
11 var exceptions = await query
12 .CountAsync(p => p.Status == ParcelStatus.Exception);

Each CountAsync call generates a separate SQL query. For small-to-medium datasets, this is perfectly fine. For very large datasets, you could combine these into a single GroupBy query.

Single-Query Approach with GroupBy

To get all status counts in one round trip:

csharp
1 var statusCounts = await query
2 .GroupBy(p => p.Status)
3 .Select(g => new
4 {
5 Status = g.Key,
6 Count = g.Count()
7 })
8 .ToListAsync();
9
10 var delivered = statusCounts
11 .FirstOrDefault(s => s.Status == ParcelStatus.Delivered)?.Count ?? 0;
12
13 var inTransit = statusCounts
14 .Where(s => s.Status == ParcelStatus.InTransit
15 || s.Status == ParcelStatus.OutForDelivery
16 || s.Status == ParcelStatus.PickedUp)
17 .Sum(s => s.Count);
18
19 var exceptions = statusCounts
20 .FirstOrDefault(s => s.Status == ParcelStatus.Exception)?.Count ?? 0;

This executes a single SQL GROUP BY query. The in-memory LINQ after ToListAsync() operates on the small result set (at most 7 rows, one per status).

Average Delivery Time

Calculate the average time between parcel creation and actual delivery. Only delivered parcels have an ActualDeliveryDate:

csharp
1 var averageDeliveryTimeHours = await query
2 .Where(p => p.Status == ParcelStatus.Delivered
3 && p.ActualDeliveryDate != null)
4 .AverageAsync(p =>
5 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));

EF.Functions.DateDiffHour translates to the database's date difference function. For PostgreSQL, EF Core translates this using native date arithmetic with EXTRACT(EPOCH FROM ...) expressions.

If no parcels have been delivered in the date range, AverageAsync throws an InvalidOperationException because the sequence is empty. Handle this:

csharp
1 double averageDeliveryTimeHours = 0;
2
3 var deliveredParcels = query
4 .Where(p => p.Status == ParcelStatus.Delivered
5 && p.ActualDeliveryDate != null);
6
7 if (await deliveredParcels.AnyAsync())
8 {
9 averageDeliveryTimeHours = await deliveredParcels
10 .AverageAsync(p =>
11 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));
12 }

On-Time Percentage

A parcel is on time if ActualDeliveryDate <= EstimatedDeliveryDate:

csharp
1 double onTimePercentage = 0;
2
3 if (delivered > 0)
4 {
5 var onTimeCount = await query
6 .CountAsync(p => p.Status == ParcelStatus.Delivered
7 && p.ActualDeliveryDate != null
8 && p.EstimatedDeliveryDate != null
9 && p.ActualDeliveryDate <= p.EstimatedDeliveryDate);
10
11 onTimePercentage = Math.Round((double)onTimeCount / delivered * 100, 1);
12 }

Assembling the Result

Return the assembled DTO:

csharp
1 return new DeliveryStatsDto
2 {
3 From = from,
4 To = to,
5 TotalParcels = totalParcels,
6 Delivered = delivered,
7 InTransit = inTransit,
8 Exceptions = exceptions,
9 AverageDeliveryTimeHours = Math.Round(averageDeliveryTimeHours, 1),
10 OnTimePercentage = onTimePercentage
11 };
12}

Exception Reason Aggregation

This query groups tracking events by their DelayReason to find the most common causes of exceptions.

Why Query TrackingEvents?

Exception reasons live on the TrackingEvent entity, not on the Parcel. When a parcel enters exception status, a tracking event is created with the DelayReason field set. Grouping these events gives us the breakdown.

csharp
1public async Task<List<ExceptionReasonDto>> GetTopExceptionReasonsAsync(
2 DateTime from, DateTime to)
3{
4 var exceptionEvents = _context.TrackingEvents
5 .Where(e => e.EventType == EventType.Exception
6 && e.DelayReason != null
7 && e.Timestamp >= from
8 && e.Timestamp <= to);
9
10 var totalExceptions = await exceptionEvents.CountAsync();
11
12 if (totalExceptions == 0)
13 return new List<ExceptionReasonDto>();
14
15 var reasons = await exceptionEvents
16 .GroupBy(e => e.DelayReason)
17 .Select(g => new ExceptionReasonDto
18 {
19 Reason = g.Key!.ToString(),
20 Count = g.Count(),
21 Percentage = Math.Round((double)g.Count() / totalExceptions * 100, 1)
22 })
23 .OrderByDescending(r => r.Count)
24 .ToListAsync();
25
26 return reasons;
27}

Understanding the GroupBy Projection

The Select after GroupBy is crucial. Each group g has:

  • g.Key - the value being grouped by (the DelayReason enum value)
  • g.Count() - number of items in that group
  • Any other aggregate: g.Average(...), g.Sum(...), g.Min(...), g.Max(...)

EF Core translates this to:

sql
1SELECT DelayReason, COUNT(*) AS Count
2FROM TrackingEvents
3WHERE EventType = 'Exception'
4 AND DelayReason IS NOT NULL
5 AND Timestamp >= @from AND Timestamp <= @to
6GROUP BY DelayReason
7ORDER BY Count DESC

Calculating Percentage Within the Query

Notice the percentage calculation uses totalExceptions, which was computed in a separate query. You cannot reference an outer variable like this inside the EF Core Select for server-side evaluation. The approach above works because EF Core evaluates Math.Round((double)g.Count() / totalExceptions * 100, 1) client-side after fetching the grouped results.

An alternative is to compute the percentage after materialization:

csharp
1 var grouped = await exceptionEvents
2 .GroupBy(e => e.DelayReason)
3 .Select(g => new
4 {
5 Reason = g.Key,
6 Count = g.Count()
7 })
8 .OrderByDescending(g => g.Count)
9 .ToListAsync();
10
11 var reasons = grouped.Select(g => new ExceptionReasonDto
12 {
13 Reason = g.Reason!.ToString(),
14 Count = g.Count,
15 Percentage = Math.Round((double)g.Count / totalExceptions * 100, 1)
16 }).ToList();

This pattern is clearer: the database handles grouping and counting, then C# handles percentage formatting.

Service Type Breakdown

Group parcels by ServiceType and compute the count and average delivery time for each:

csharp
1public async Task<List<ServiceBreakdownDto>> GetServiceBreakdownAsync(
2 DateTime from, DateTime to)
3{
4 var breakdown = await _context.Parcels
5 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to)
6 .GroupBy(p => p.ServiceType)
7 .Select(g => new
8 {
9 ServiceType = g.Key,
10 Count = g.Count(),
11 DeliveredCount = g.Count(p =>
12 p.Status == ParcelStatus.Delivered
13 && p.ActualDeliveryDate != null),
14 TotalDeliveryHours = g
15 .Where(p => p.Status == ParcelStatus.Delivered
16 && p.ActualDeliveryDate != null)
17 .Sum(p =>
18 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value))
19 })
20 .ToListAsync();
21
22 return breakdown.Select(b => new ServiceBreakdownDto
23 {
24 ServiceType = b.ServiceType.ToString(),
25 Count = b.Count,
26 AverageDeliveryTimeHours = b.DeliveredCount > 0
27 ? Math.Round((double)b.TotalDeliveryHours / b.DeliveredCount, 1)
28 : 0
29 }).ToList();
30}

Why Not Use Average Directly?

You might expect to use g.Average(...) inside the GroupBy projection. The problem is that Average over a subset of the group (only delivered parcels) requires a Where inside the group, which not all database providers translate cleanly.

The workaround is to compute Sum and Count separately, then divide in C# after materialization. This is a common EF Core pattern for conditional aggregates.

Alternative: Separate Queries Per Service Type

If the group-level Sum and Where combination causes translation issues, you can query each service type independently:

csharp
1var serviceTypes = await _context.Parcels
2 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to)
3 .Select(p => p.ServiceType)
4 .Distinct()
5 .ToListAsync();
6
7var results = new List<ServiceBreakdownDto>();
8
9foreach (var serviceType in serviceTypes)
10{
11 var parcels = _context.Parcels
12 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to
13 && p.ServiceType == serviceType);
14
15 var count = await parcels.CountAsync();
16
17 var delivered = parcels
18 .Where(p => p.Status == ParcelStatus.Delivered
19 && p.ActualDeliveryDate != null);
20
21 double avgHours = 0;
22 if (await delivered.AnyAsync())
23 {
24 avgHours = await delivered.AverageAsync(p =>
25 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));
26 }
27
28 results.Add(new ServiceBreakdownDto
29 {
30 ServiceType = serviceType.ToString(),
31 Count = count,
32 AverageDeliveryTimeHours = Math.Round(avgHours, 1)
33 });
34}
35
36return results;

This trades one complex query for several simple queries. With only four service types, the overhead is negligible.

Handling Empty Results

Every aggregation query must handle the case where the filtered dataset is empty:

  • CountAsync() returns 0 for empty sets (safe)
  • AverageAsync() throws InvalidOperationException on empty sequences
  • SumAsync() returns 0 for empty sets (safe)
  • GroupBy(...).Select(...).ToListAsync() returns an empty list (safe)

Always check with AnyAsync() before calling AverageAsync(), or use a try-catch, or use the nullable overload:

csharp
1// Safe approach: check first
2if (await query.AnyAsync())
3{
4 var avg = await query.AverageAsync(p => p.SomeValue);
5}
6
7// Alternative: cast to nullable and use DefaultIfEmpty
8var avg = await query
9 .Select(p => (double?)p.SomeValue)
10 .AverageAsync() ?? 0;

Performance Considerations

Database Indexes

Aggregation queries benefit from indexes on the columns used in WHERE and GROUP BY clauses. For the analytics endpoints, these columns are important:

csharp
1// In your DbContext OnModelCreating
2modelBuilder.Entity<Parcel>()
3 .HasIndex(p => p.CreatedAt);
4
5modelBuilder.Entity<Parcel>()
6 .HasIndex(p => p.Status);
7
8modelBuilder.Entity<Parcel>()
9 .HasIndex(p => p.ServiceType);
10
11modelBuilder.Entity<TrackingEvent>()
12 .HasIndex(e => new { e.EventType, e.Timestamp });

Avoiding N+1 Queries

The analytics queries in this topic do not use Include() because they do not need navigation properties. They project directly to DTOs with aggregates. This is one of the advantages of aggregation queries: they naturally avoid the N+1 problem because they work with scalar results, not entity graphs.

Let the Database Do the Work

A common mistake is to load all parcels into memory and aggregate in C#:

csharp
1// BAD: loads all parcels into memory
2var allParcels = await _context.Parcels.ToListAsync();
3var delivered = allParcels.Count(p => p.Status == ParcelStatus.Delivered);

Instead, keep the aggregation in the LINQ query so EF Core translates it to SQL:

csharp
1// GOOD: database performs the count
2var delivered = await _context.Parcels
3 .CountAsync(p => p.Status == ParcelStatus.Delivered);

The database engine is optimized for aggregation. It uses indexes, parallelism, and optimized algorithms. Loading rows into your application just to count them wastes memory, network bandwidth, and CPU time.

Key Takeaways

  1. Use GroupBy with Select to project aggregated results into DTOs
  2. g.Key holds the grouping value; g.Count(), g.Sum(), g.Average() compute aggregates
  3. Handle empty sequences before calling AverageAsync() to avoid exceptions
  4. Compute conditional averages using Sum / Count and divide in C# after materialization
  5. Add database indexes on columns used in WHERE and GROUP BY for performance
  6. Keep aggregation in the LINQ query so the database does the work, not your application