Lessons Learned: Building Cloud-Native Microservices with ASP.NET Core 9 for Our Enterprise Client

A Real-World Journey from Requirements to Production

Hey fellow developers and architects! 👋

I wanted to share some insights from a recent project we completed for a major retail client who needed to modernize their monolithic e-commerce platform. They came to us with a classic story: "Our system is slow, hard to scale, and every deployment is a nail-biting experience." Sound familiar?

After six months of intense development, testing, and deployment, we successfully delivered a cloud-native microservices architecture using ASP.NET Core 9, .NET Aspire, and Kubernetes. Here are the key lessons we learned along the way – the good, the challenging, and the "wish we'd known this earlier" moments.

The Client Challenge: Why We Chose ASP.NET Core 9

Our client, a mid-size retail company, was struggling with their legacy .NET Framework application. Peak traffic during sales events would bring their system to its knees, and adding new features required coordinating deployments across multiple teams. They needed something modern, scalable, and maintainable.

After evaluating several options, we landed on ASP.NET Core 9 for several compelling reasons:

  • Team Familiarity: The client's existing .NET expertise meant easier team adoption
  • Container-First Approach: Perfect fit for their cloud-native aspirations
  • Microsoft Ecosystem: They were already invested in Azure
  • Support Roadmap: .NET 9's release cadence and support roadmap aligned with their strategic plans

But here's the first lesson: choose your stack based on team capabilities, not just technical superiority. We could have gone with Go or Node.js, but the learning curve would have extended our timeline by months.

Lesson #1: .NET Aspire is a Game-Changer (But Mind the Learning Curve)

When we first heard about .NET Aspire, I'll be honest – I was skeptical. "Another framework to learn?" But after our initial spike, it became clear this wasn't just hype.

What Aspire Got Right

Here's how we structured our Aspire host for the client's catalog and inventory services:

var builder = DistributedApplication.CreateBuilder(args);

// Infrastructure components
var postgres = builder.AddPostgres("postgres", port: 5432);
var catalogDb = postgres.AddDatabase("catalogdb");
var inventoryDb = postgres.AddDatabase("inventorydb");

var redis = builder.AddRedis("cache");

var serviceBus = builder.AddAzureServiceBus("messaging");

// Our microservices
var catalogApi = builder.AddProject<Projects.CatalogApi>("catalog-api")
    .WithReference(catalogDb)
    .WithReference(redis)
    .WithReference(serviceBus);

var inventoryApi = builder.AddProject<Projects.InventoryApi>("inventory-api")
    .WithReference(inventoryDb)
    .WithReference(redis)
    .WithReference(serviceBus);

var webApp = builder.AddProject<Projects.WebApp>("webapp")
    .WithReference(catalogApi)
    .WithReference(inventoryApi)
    .WithExternalHttpEndpoints();

builder.Build().Run();

The Magic: With just this configuration, we got:

  • Automatic service discovery
  • Centralized configuration management
  • Built-in observability with OpenTelemetry
  • A gorgeous dashboard for local development

The Reality Check: Our junior developers initially struggled with the "magic." When things worked, they worked beautifully. When they didn't, debugging required understanding several abstraction layers.

Lesson Learned: Invest time upfront training your team on Aspire's internals. The productivity gains are massive once everyone understands what's happening under the hood.
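
To make the "magic" concrete for the team, we walked through what a consuming service actually does with those references. Here's a minimal sketch of the catalog service's Program.cs; the exact integration method names depend on which Aspire client packages you add, so treat this as illustrative rather than the literal project code:

var builder = WebApplication.CreateBuilder(args);

// Wires up OpenTelemetry, default health checks, service discovery,
// and resilient HttpClient defaults (from the ServiceDefaults project)
builder.AddServiceDefaults();

// "catalogdb" and "cache" match the resource names declared in the app host;
// Aspire injects the connection strings into configuration at runtime.
builder.AddNpgsqlDbContext<CatalogDbContext>("catalogdb");
builder.AddRedisDistributedCache("cache");

var app = builder.Build();
app.MapDefaultEndpoints(); // health endpoints provided by the ServiceDefaults template
app.Run();

Once the team saw that a "reference" is ultimately just named configuration plus service discovery, the abstraction stopped feeling like magic.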

Lesson #2: Container Optimization Matters More Than You Think

One of our early mistakes was treating containerization as an afterthought. "We'll just add a Dockerfile at the end," we thought. Wrong approach.

The Problem We Hit

Our initial container images were over 1GB, and startup times were exceeding 30 seconds in production. For a retail client expecting sub-second response times, this was unacceptable.

The Solution: Multi-Stage Builds and Alpine Images

Here's the optimized Dockerfile we ended up with:

# Build stage
FROM mcr.microsoft.com/dotnet/sdk:9.0-alpine AS build
WORKDIR /src

# Copy project files and restore dependencies
COPY ["CatalogApi/CatalogApi.csproj", "CatalogApi/"]
COPY ["CatalogApi.ServiceDefaults/CatalogApi.ServiceDefaults.csproj", "CatalogApi.ServiceDefaults/"]
RUN dotnet restore "CatalogApi/CatalogApi.csproj"

# Copy source and build
COPY . .
WORKDIR "/src/CatalogApi"
RUN dotnet build "CatalogApi.csproj" -c Release -o /app/build

# Publish stage
FROM build AS publish
RUN dotnet publish "CatalogApi.csproj" -c Release -o /app/publish /p:UseAppHost=false

# Runtime stage
FROM mcr.microsoft.com/dotnet/aspnet:9.0-alpine AS final
RUN apk add --no-cache icu-libs
ENV DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=false
WORKDIR /app
COPY --from=publish /app/publish .

# Create non-root user
RUN adduser -D -s /bin/sh appuser
USER appuser

EXPOSE 8080
ENTRYPOINT ["dotnet", "CatalogApi.dll"]

Results: Image size dropped to 180MB, startup time reduced to under 5 seconds.

Lesson Learned: Container optimization isn't just about size – it's about security, startup time, and resource utilization. Start optimizing from day one, not as an afterthought.

Lesson #3: Service Communication Strategy is Critical

Initially, we went HTTP-everywhere. Every service talked to every other service via HTTP APIs. This seemed clean and RESTful, but we quickly ran into issues during peak loads.

The HTTP-Only Approach (Our First Attempt)

// This seemed fine initially...
public class OrderService
{
    private readonly HttpClient _inventoryClient;
    private readonly HttpClient _catalogClient;
    private readonly HttpClient _paymentClient;

    public async Task<OrderResult> CreateOrderAsync(CreateOrderRequest request)
    {
        // Sync HTTP call #1: check inventory
        var inventory = await _inventoryClient.GetAsync($"api/inventory/{request.ProductId}");

        // Sync HTTP call #2: get product details
        var product = await _catalogClient.GetAsync($"api/products/{request.ProductId}");

        // Sync HTTP call #3: process payment
        var payment = await _paymentClient.PostAsync(
            "api/payments", JsonContent.Create(new { request.ProductId, request.Quantity }));

        // Create the order record... every step above is a sequential, blocking dependency
        var order = await CreateOrderRecordAsync(request);
        return OrderResult.Success(order.Id);
    }
}

The Problem: During Black Friday testing, we discovered cascade failures. One slow service would cause timeouts across the entire chain.
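
Our first mitigation, before rethinking the communication style, was to put timeouts, retries, and circuit breakers on every outbound HttpClient. Here's a minimal sketch using the standard resilience handler from Microsoft.Extensions.Http.Resilience; the typed client name and thresholds are illustrative, not our production values:

// Program.cs of the order service
builder.Services.AddHttpClient<InventoryClient>(client =>
{
    client.BaseAddress = new Uri("https://inventory-api"); // logical name resolved via service discovery
    client.Timeout = TimeSpan.FromSeconds(10);
})
.AddStandardResilienceHandler(options =>
{
    // Fail fast instead of letting one slow dependency stall the whole order flow
    options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(2);
    options.Retry.MaxRetryAttempts = 3;
    options.CircuitBreaker.FailureRatio = 0.5;
});

That helped, but resilience policies only contain the damage. The real fix was changing which calls needed to be synchronous at all.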

The Hybrid Approach (Our Solution)

We implemented a hybrid communication strategy:

Synchronous (HTTP) for:

  • User-facing operations requiring immediate response
  • Operations needing strong consistency

Asynchronous (Message Bus) for:

  • Business events and notifications
  • Operations that can be eventually consistent

Here's how the refactored OrderService looked:

public class OrderService
{
    private readonly IInventoryService _inventoryService;
    private readonly IMessageBus _messageBus;

    public async Task<OrderResult> CreateOrderAsync(CreateOrderRequest request)
    {
        // Sync: Check inventory (needs immediate response)
        var inventoryStatus = await _inventoryService.CheckAvailabilityAsync(request.ProductId);
        
        if (!inventoryStatus.IsAvailable)
            return OrderResult.Failed("Product not available");

        // Create order first
        var order = await CreateOrderRecordAsync(request);

        // Async: Publish events for other services to handle
        await _messageBus.PublishAsync(new OrderCreatedEvent 
        { 
            OrderId = order.Id,
            ProductId = request.ProductId,
            Quantity = request.Quantity,
            CustomerId = request.CustomerId
        });

        return OrderResult.Success(order.Id);
    }
}
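
On the consuming side, the inventory service picks up OrderCreatedEvent and adjusts stock on its own schedule. Here's a simplified sketch against our in-house IMessageBus abstraction; the handler interface and DbContext names are illustrative (in production this sat on Azure Service Bus subscriptions):

// Inventory service: reacts to the event whenever it arrives (eventually consistent)
public class OrderCreatedHandler : IMessageHandler<OrderCreatedEvent> // hypothetical handler interface
{
    private readonly InventoryDbContext _db;
    private readonly ILogger<OrderCreatedHandler> _logger;

    public OrderCreatedHandler(InventoryDbContext db, ILogger<OrderCreatedHandler> logger)
    {
        _db = db;
        _logger = logger;
    }

    public async Task HandleAsync(OrderCreatedEvent message, CancellationToken cancellationToken)
    {
        var stock = await _db.StockItems.FindAsync(new object[] { message.ProductId }, cancellationToken);
        if (stock is null)
        {
            _logger.LogWarning("No stock record for product {ProductId}", message.ProductId);
            return;
        }

        // Reserve the ordered quantity; a compensating event handles downstream failures
        stock.Reserved += message.Quantity;
        await _db.SaveChangesAsync(cancellationToken);
    }
}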

Lesson Learned: Don't default to all-sync or all-async. Choose the right communication pattern based on business requirements, not technical preferences.

Lesson #4: Observability Can't Be Bolted On Later

We initially focused on getting features working, planning to "add monitoring later." This was a mistake that cost us significant debugging time during UAT.

What We Should Have Done From Day One

Aspire gives you observability out of the box, but you need to instrument your business logic properly:

public class ProductService
{
    private static readonly ActivitySource ActivitySource = new("CatalogApi.Products");
    private static readonly Meter Meter = new("CatalogApi.Products");
    private static readonly Counter<int> ProductsCreated =
        Meter.CreateCounter<int>("products_created_total");

    private readonly CatalogDbContext _context;
    private readonly ILogger<ProductService> _logger;

    public ProductService(CatalogDbContext context, ILogger<ProductService> logger)
    {
        _context = context;
        _logger = logger;
    }

    public async Task<Product> CreateProductAsync(CreateProductRequest request)
    {
        using var activity = ActivitySource.StartActivity("CreateProduct");
        activity?.SetTag("product.category", request.Category);
        activity?.SetTag("product.price", request.Price);

        try 
        {
            var product = new Product 
            {
                Name = request.Name,
                Category = request.Category,
                Price = request.Price
            };

            await _context.Products.AddAsync(product);
            await _context.SaveChangesAsync();

            ProductsCreated.Add(1, 
                new KeyValuePair<string, object?>("category", request.Category));

            _logger.LogInformation("Product created: {ProductId} in category {Category}", 
                product.Id, product.Category);

            activity?.SetStatus(ActivityStatusCode.Ok);
            return product;
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            _logger.LogError(ex, "Failed to create product");
            throw;
        }
    }
}
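
One gotcha: custom sources and meters only show up if you register their names with OpenTelemetry. Our ServiceDefaults project owned the OpenTelemetry setup, so we added the names there; a sketch of the relevant registration (the exact extension point depends on how your ServiceDefaults is laid out):

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("CatalogApi.Products"))   // must match the ActivitySource name
    .WithMetrics(metrics => metrics
        .AddMeter("CatalogApi.Products"));   // must match the Meter name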

The Aspire Dashboard Advantage: During development, we could see all our traces, metrics, and logs in one place. No more switching between multiple tools or guessing which service was causing issues.

Lesson Learned: Build observability into your code from the first commit. The Aspire dashboard makes local development debugging incredibly efficient, and the same instrumentation works in production.

Lesson #5: Kubernetes Deployment Complexity is Real

Moving from local Aspire development to Kubernetes production was our biggest challenge. The gap between "works on my machine with Aspire" and "works in Kubernetes" was larger than expected.

The Configuration Management Challenge

In Aspire, configuration "just works":

// This works magically in Aspire
builder.AddNpgsqlDbContext<CatalogDbContext>("catalogdb");

In Kubernetes, you need explicit configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: catalog-api-config
data:
  ConnectionStrings__CatalogDb: "Host=postgres-service;Port=5432;Database=catalogdb;Username=cataloguser"
  Redis__ConnectionString: "redis-service:6379"
  ServiceBus__ConnectionString: "Endpoint=sb://myservicebus.servicebus.windows.net/"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: catalog-api
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: catalog-api
        image: myregistry.azurecr.io/catalog-api:latest
        envFrom:
        - configMapRef:
            name: catalog-api-config
        - secretRef:
            name: catalog-api-secrets
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10

Health Checks Saved Our Deployment

We learned the hard way that proper health checks are essential:

// In Program.cs
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy())
    .AddNpgSql(connectionString, name: "database", tags: new[] { "ready" })
    .AddRedis(redisConnectionString, name: "redis", tags: new[] { "ready" })
    .AddCheck<InventoryServiceHealthCheck>("inventory-service", tags: new[] { "ready" });

app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    // Liveness: only verify the process itself is responsive
    Predicate = check => check.Name == "self"
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    // Readiness: dependency checks tagged "ready" gate whether we receive traffic
    Predicate = check => check.Tags.Contains("ready")
});
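
The InventoryServiceHealthCheck referenced above is just a small dependency probe. Here's roughly what ours looked like; the probe path is illustrative:

public class InventoryServiceHealthCheck : IHealthCheck
{
    private readonly HttpClient _httpClient;

    public InventoryServiceHealthCheck(HttpClient httpClient) => _httpClient = httpClient;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // Probe the downstream service's own liveness endpoint
            var response = await _httpClient.GetAsync("/health/live", cancellationToken);
            return response.IsSuccessStatusCode
                ? HealthCheckResult.Healthy("Inventory service reachable")
                : HealthCheckResult.Degraded($"Inventory service returned {(int)response.StatusCode}");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Inventory service unreachable", ex);
        }
    }
}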

Lesson Learned: Aspire generates Kubernetes manifests, but you'll still need to understand and customize them. Don't assume the generated manifests are production-ready without review.

Lesson #6: Performance Testing Early Prevents Late Surprises

We thought our microservices would naturally be more performant than the monolith. We were wrong.

The Latency Discovery

During load testing, we discovered that our microservices architecture had introduced significant latency:

  • Monolith: Average response time 200ms
  • Our Microservices: Average response time 800ms

The culprit? Network calls and serialization overhead.

Our Performance Recovery Strategy

  1. Caching Strategy: Implemented Redis caching at multiple levels
public class CachedProductService : IProductService
{
    private readonly IProductService _productService; // the inner, uncached service being decorated
    private readonly IDistributedCache _cache;
    private readonly TimeSpan _cacheDuration = TimeSpan.FromMinutes(15);

    public CachedProductService(IProductService productService, IDistributedCache cache)
    {
        _productService = productService;
        _cache = cache;
    }

    public async Task<Product> GetProductAsync(int productId)
    {
        var cacheKey = $"product:{productId}";
        var cachedProduct = await _cache.GetStringAsync(cacheKey);
        
        if (cachedProduct != null)
            return JsonSerializer.Deserialize<Product>(cachedProduct);

        var product = await _productService.GetProductAsync(productId);
        
        if (product != null)
        {
            var serializedProduct = JsonSerializer.Serialize(product);
            await _cache.SetStringAsync(cacheKey, serializedProduct, 
                new DistributedCacheEntryOptions 
                { 
                    AbsoluteExpirationRelativeToNow = _cacheDuration 
                });
        }

        return product;
    }
}
  2. Connection Pooling: Properly configured database access (pool sizing lives in the Npgsql connection string, e.g. "Maximum Pool Size=100", while retries sit on the EF Core provider)
builder.Services.AddDbContext<CatalogDbContext>(options =>
{
    options.UseNpgsql(connectionString, npgsql =>
    {
        // Transient-fault handling: retry up to 3 times with a capped delay
        npgsql.EnableRetryOnFailure(
            maxRetryCount: 3,
            maxRetryDelay: TimeSpan.FromSeconds(5),
            errorCodesToAdd: null);
    });

    options.EnableSensitiveDataLogging(false);
    options.EnableServiceProviderCaching();
    options.EnableDetailedErrors(false);
});
  3. Async All the Way: Eliminated blocking calls (see the sketch below)
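
For item 3, "async all the way" mostly meant hunting down sync-over-async calls that were starving the thread pool under load. A representative before/after; the method names are illustrative:

// Before: blocks a thread-pool thread while the HTTP call is in flight
public Product GetProduct(int id)
{
    return _catalogClient.GetProductAsync(id).GetAwaiter().GetResult();
}

// After: the thread is released back to the pool while awaiting
public async Task<Product> GetProductAsync(int id)
{
    return await _catalogClient.GetProductAsync(id);
}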

Final Results: Average response time came down to 250ms – within striking distance of the monolith's 200ms baseline, and dramatically better under peak traffic, where the monolith used to fall over.

Lesson Learned: Microservices don't automatically mean better performance. You need to be intentional about latency, caching, and resource utilization.

Lesson #7: The Deployment Pipeline is as Important as the Code

Setting up a reliable CI/CD pipeline took almost as much effort as writing the application code, but it was absolutely worth it.

Our Final Pipeline Strategy

# .github/workflows/deploy-production.yml
name: Production Deployment

on:
  push:
    branches: [main]
    tags: ['v*']

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Setup .NET
      uses: actions/setup-dotnet@v4
      with:
        dotnet-version: '9.0.x'
    
    - name: Run tests
      run: |
        dotnet restore
        dotnet test --configuration Release --logger trx --collect:"XPlat Code Coverage"
    
    - name: Upload test results
      uses: actions/upload-artifact@v4
      with:
        name: test-results
        path: '**/*.trx'

  security-scan:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Run security scan
      uses: securecodewarrior/github-action-add-sarif@v1
      with:
        sarif-file: 'security-scan-results.sarif'

  build-and-deploy:
    needs: [test, security-scan]
    runs-on: ubuntu-latest
    steps:
    - name: Build and push images
      run: |
        docker build -t $REGISTRY/catalog-api:$GITHUB_SHA .
        docker build -t $REGISTRY/inventory-api:$GITHUB_SHA .
        docker push $REGISTRY/catalog-api:$GITHUB_SHA
        docker push $REGISTRY/inventory-api:$GITHUB_SHA
    
    - name: Deploy to staging
      run: |
        kubectl set image deployment/catalog-api catalog-api=$REGISTRY/catalog-api:$GITHUB_SHA
        kubectl rollout status deployment/catalog-api
    
    - name: Run smoke tests
      run: |
        npm install
        npm run test:smoke
    
    - name: Deploy to production
      if: startsWith(github.ref, 'refs/tags/v')
      run: |
        kubectl set image deployment/catalog-api catalog-api=$REGISTRY/catalog-api:$GITHUB_SHA --namespace=production

Lesson Learned: Invest in your deployment pipeline early. Automated testing, security scanning, and staged deployments saved us from multiple production incidents.

What We'd Do Differently Next Time

1. Start with Aspire Service Defaults Earlier

We initially wrote a lot of boilerplate configuration code that Aspire's service defaults would have handled for us:

// Instead of writing all this manually...
builder.Services.AddOpenTelemetry()
    .WithMetrics(...)
    .WithTracing(...);
    
builder.Services.AddServiceDiscovery();
builder.Services.AddHealthChecks();

// Just use this from day one:
builder.AddServiceDefaults();

2. Plan Data Consistency Strategy Upfront

We underestimated the complexity of managing data consistency across microservices. Implementing the saga pattern later was painful.

3. Invest in Contract Testing

API contract changes between services caused integration issues. Tools like Pact would have caught these earlier.

4. Set Resource Limits from the Beginning

We experienced resource contention in Kubernetes because we didn't set proper CPU and memory limits initially.
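
Something like this on every container spec, tuned per service, would have saved us a lot of noisy-neighbour debugging (the numbers below are illustrative starting points, not our production values):

# Added to each container in the Deployment spec
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"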

The Bottom Line: Was It Worth It?

Absolutely. Six months later, our client is seeing:

  • 99.9% uptime (vs. 95% with the monolith)
  • 50% faster feature delivery (independent service deployments)
  • 3x better performance during peak traffic
  • Reduced operational costs (better resource utilization)

But more importantly, their development teams are happier and more productive.

Final Thoughts for Fellow Consultants

Building microservices with ASP.NET Core 9 and .NET Aspire isn't just about the technology – it's about the entire ecosystem:

  1. Team Readiness: Ensure your team understands distributed systems concepts
  2. Client Education: Help clients understand the complexity trade-offs
  3. Incremental Approach: Don't try to rebuild everything at once
  4. Observability First: Build monitoring and logging from day one
  5. Performance Testing: Load test early and often

.NET Aspire genuinely changes the game for building cloud-native applications, but it's not a silver bullet. Success still requires solid architecture decisions, proper testing, and a team that understands distributed systems.

Would I recommend ASP.NET Core 9 and Aspire for similar projects? Absolutely. The developer experience improvements and production-ready defaults make it a compelling choice for teams building modern, cloud-native applications.

What's your experience been with ASP.NET Core 9 or .NET Aspire? I'd love to hear about your lessons learned in the comments!


Want to discuss this project or need help with your own microservices architecture? Feel free to reach out – always happy to chat about .NET, architecture, and lessons learned in the trenches.