18 minlesson

EF Core Aggregation Queries

EF Core Aggregation Queries

This presentation walks through the EF Core LINQ queries that power delivery stats, exception reason aggregation, and other analytics endpoints. Each query translates to efficient SQL with GROUP BY and aggregate functions.

Setting Up the Analytics Service

Create a dedicated service for analytics queries. This keeps the aggregation logic separate from the controller and makes it testable.

csharp
1public interface IAnalyticsService
2{
3 Task<List<ParcelCountByStatusDto>> GetParcelCountByStatusAsync();
4 Task<DeliveryPerformanceDto> GetDeliveryPerformanceAsync(DateTimeOffset from, DateTimeOffset to);
5}
csharp
1public class AnalyticsService : IAnalyticsService
2{
3 private readonly ParcelTrackingDbContext _context;
4
5 public AnalyticsService(ParcelTrackingDbContext context)
6 {
7 _context = context;
8 }
9}

The DTOs for these methods are:

csharp
1public class ParcelCountByStatusDto
2{
3 public string Status { get; set; } = string.Empty;
4 public int Count { get; set; }
5}
6
7public class DeliveryPerformanceDto
8{
9 public DateTimeOffset From { get; set; }
10 public DateTimeOffset To { get; set; }
11 public int TotalParcels { get; set; }
12 public int Delivered { get; set; }
13 public int InTransit { get; set; }
14 public int Exceptions { get; set; }
15 public double AverageDeliveryTimeHours { get; set; }
16 public double OnTimePercentage { get; set; }
17}

Register the service in Program.cs:

csharp
1builder.Services.AddScoped<IAnalyticsService, AnalyticsService>();

Parcel Count By Status Query

The parcel count by status query returns the current count of parcels in each status. This uses a simple GroupBy aggregation.

csharp
1public async Task<List<ParcelCountByStatusDto>> GetParcelCountByStatusAsync()
2{
3 var counts = await _context.Parcels
4 .GroupBy(p => p.Status)
5 .Select(g => new ParcelCountByStatusDto
6 {
7 Status = g.Key.ToString(),
8 Count = g.Count()
9 })
10 .OrderByDescending(p => p.Count)
11 .ToListAsync();
12
13 return counts;
14}

This generates:

sql
1SELECT Status, COUNT(*) AS Count
2FROM Parcels
3GROUP BY Status
4ORDER BY Count DESC

Including All Statuses

To ensure all status values appear in the result (even with zero counts), fill in the gaps after the query:

csharp
1public async Task<List<ParcelCountByStatusDto>> GetParcelCountByStatusAsync()
2{
3 var counts = await _context.Parcels
4 .GroupBy(p => p.Status)
5 .Select(g => new
6 {
7 Status = g.Key,
8 Count = g.Count()
9 })
10 .ToListAsync();
11
12 var allStatuses = Enum.GetValues<ParcelStatus>();
13
14 return allStatuses.Select(status => new ParcelCountByStatusDto
15 {
16 Status = status.ToString(),
17 Count = counts.FirstOrDefault(c => c.Status == status)?.Count ?? 0
18 }).ToList();
19}

Delivery Performance Query

The delivery performance endpoint returns comprehensive statistics about parcels within a date range, including counts, average delivery time, and on-time percentage.

Filtering by Date Range

Start by building the base query with the date filter:

csharp
1public async Task<DeliveryPerformanceDto> GetDeliveryPerformanceAsync(
2 DateTimeOffset from, DateTimeOffset to)
3{
4 var query = _context.Parcels
5 .Where(p => p.CreatedAt >= from && p.CreatedAt <= to);

This creates an IQueryable that has not executed yet. All subsequent operations build on this filtered set.

Counting by Status

Use conditional counting to get totals for each status category:

csharp
1 var totalParcels = await query.CountAsync();
2
3 var delivered = await query
4 .CountAsync(p => p.Status == ParcelStatus.Delivered);
5
6 var inTransit = await query
7 .CountAsync(p => p.Status == ParcelStatus.InTransit
8 || p.Status == ParcelStatus.OutForDelivery
9 || p.Status == ParcelStatus.PickedUp);
10
11 var exceptions = await query
12 .CountAsync(p => p.Status == ParcelStatus.Exception);

Each CountAsync call generates a separate SQL query. For small-to-medium datasets, this is perfectly fine. For very large datasets, you could combine these into a single GroupBy query.

Single-Query Approach with GroupBy

To get all status counts in one round trip:

csharp
1 var statusCounts = await query
2 .GroupBy(p => p.Status)
3 .Select(g => new
4 {
5 Status = g.Key,
6 Count = g.Count()
7 })
8 .ToListAsync();
9
10 var delivered = statusCounts
11 .FirstOrDefault(s => s.Status == ParcelStatus.Delivered)?.Count ?? 0;
12
13 var inTransit = statusCounts
14 .Where(s => s.Status == ParcelStatus.InTransit
15 || s.Status == ParcelStatus.OutForDelivery
16 || s.Status == ParcelStatus.PickedUp)
17 .Sum(s => s.Count);
18
19 var exceptions = statusCounts
20 .FirstOrDefault(s => s.Status == ParcelStatus.Exception)?.Count ?? 0;

This executes a single SQL GROUP BY query. The in-memory LINQ after ToListAsync() operates on the small result set (at most 7 rows, one per status).

Average Delivery Time

Calculate the average time between parcel creation and actual delivery. Only delivered parcels have an ActualDeliveryDate:

csharp
1 var averageDeliveryTimeHours = await query
2 .Where(p => p.Status == ParcelStatus.Delivered
3 && p.ActualDeliveryDate != null)
4 .AverageAsync(p =>
5 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));

EF.Functions.DateDiffHour translates to the database's date difference function. The Npgsql provider translates this to PostgreSQL's native date arithmetic.

If no parcels have been delivered in the date range, AverageAsync throws an InvalidOperationException because the sequence is empty. Handle this:

csharp
1 double averageDeliveryTimeHours = 0;
2
3 var deliveredParcels = query
4 .Where(p => p.Status == ParcelStatus.Delivered
5 && p.ActualDeliveryDate != null);
6
7 if (await deliveredParcels.AnyAsync())
8 {
9 averageDeliveryTimeHours = await deliveredParcels
10 .AverageAsync(p =>
11 EF.Functions.DateDiffHour(p.CreatedAt, p.ActualDeliveryDate!.Value));
12 }

On-Time Percentage

A parcel is on time if ActualDeliveryDate <= EstimatedDeliveryDate:

csharp
1 double onTimePercentage = 0;
2
3 if (delivered > 0)
4 {
5 var onTimeCount = await query
6 .CountAsync(p => p.Status == ParcelStatus.Delivered
7 && p.ActualDeliveryDate != null
8 && p.EstimatedDeliveryDate != null
9 && p.ActualDeliveryDate <= p.EstimatedDeliveryDate);
10
11 onTimePercentage = Math.Round((double)onTimeCount / delivered * 100, 1);
12 }

Assembling the Result

Return the assembled DTO:

csharp
1 return new DeliveryPerformanceDto
2 {
3 From = from,
4 To = to,
5 TotalParcels = totalParcels,
6 Delivered = delivered,
7 InTransit = inTransit,
8 Exceptions = exceptions,
9 AverageDeliveryTimeHours = Math.Round(averageDeliveryTimeHours, 1),
10 OnTimePercentage = onTimePercentage
11 };
12}

Understanding GroupBy Projections

The Select after GroupBy is the key to aggregation queries. Each group g has:

  • g.Key - the value being grouped by (the enum or property value)
  • g.Count() - number of items in that group
  • Any other aggregate: g.Average(...), g.Sum(...), g.Min(...), g.Max(...)

EF Core translates these to efficient SQL with GROUP BY and aggregate functions.

Handling Empty Results

Every aggregation query must handle the case where the filtered dataset is empty:

  • CountAsync() returns 0 for empty sets (safe)
  • AverageAsync() throws InvalidOperationException on empty sequences
  • SumAsync() returns 0 for empty sets (safe)
  • GroupBy(...).Select(...).ToListAsync() returns an empty list (safe)

Always check with AnyAsync() before calling AverageAsync(), or use a try-catch, or use the nullable overload:

csharp
1// Safe approach: check first
2if (await query.AnyAsync())
3{
4 var avg = await query.AverageAsync(p => p.SomeValue);
5}
6
7// Alternative: cast to nullable and use DefaultIfEmpty
8var avg = await query
9 .Select(p => (double?)p.SomeValue)
10 .AverageAsync() ?? 0;

Performance Considerations

Database Indexes

Aggregation queries benefit from indexes on the columns used in WHERE and GROUP BY clauses. For the analytics endpoints, these columns are important:

csharp
1// In your DbContext OnModelCreating
2modelBuilder.Entity<Parcel>()
3 .HasIndex(p => p.CreatedAt);
4
5modelBuilder.Entity<Parcel>()
6 .HasIndex(p => p.Status);
7
8modelBuilder.Entity<Parcel>()
9 .HasIndex(p => p.ServiceType);
10
11modelBuilder.Entity<TrackingEvent>()
12 .HasIndex(e => new { e.EventType, e.Timestamp });

Avoiding N+1 Queries

The analytics queries in this topic do not use Include() because they do not need navigation properties. They project directly to DTOs with aggregates. This is one of the advantages of aggregation queries: they naturally avoid the N+1 problem because they work with scalar results, not entity graphs.

Let the Database Do the Work

A common mistake is to load all parcels into memory and aggregate in C#:

csharp
1// BAD: loads all parcels into memory
2var allParcels = await _context.Parcels.ToListAsync();
3var delivered = allParcels.Count(p => p.Status == ParcelStatus.Delivered);

Instead, keep the aggregation in the LINQ query so EF Core translates it to SQL:

csharp
1// GOOD: database performs the count
2var delivered = await _context.Parcels
3 .CountAsync(p => p.Status == ParcelStatus.Delivered);

The database engine is optimized for aggregation. It uses indexes, parallelism, and optimized algorithms. Loading rows into your application just to count them wastes memory, network bandwidth, and CPU time.

Key Takeaways

  1. Use GroupBy with Select to project aggregated results into DTOs
  2. g.Key holds the grouping value; g.Count(), g.Sum(), g.Average() compute aggregates
  3. Handle empty sequences before calling AverageAsync() to avoid exceptions
  4. Compute conditional averages using Sum / Count and divide in C# after materialization
  5. Add database indexes on columns used in WHERE and GROUP BY for performance
  6. Keep aggregation in the LINQ query so the database does the work, not your application
  7. The IAnalyticsService encapsulates complex aggregation logic and keeps controllers thin