Building Production-Ready Spring Boot Microservices: Lessons from Scaling to Millions


Introduction

Over the past several years working in e-commerce, I've architected and scaled microservices handling millions of customer requests daily, including platforms processing over $100M in transactions. In this article, I'll share practical patterns for building Spring Boot microservices that can handle production scale.

This isn't theory. These approaches have held up under real production load.

1. Service Boundaries and Domain-Driven Design

The biggest mistake teams make is creating microservices that are too granular or tightly coupled.

Key principle: Each service should own a complete business capability.

// ❌ BAD: Services that are too granular
CustomerNameService
CustomerAddressService

// ✅ GOOD: Services that own complete domains
CustomerProfileService // owns all customer identity/profile data
SubscriptionService    // owns subscriptions, benefits, rewards

When building a subscription platform, a SubscriptionService should own the complete domain: enrollment, benefits tracking, payment integration, and conversions. This prevents the "distributed monolith" where you call 10 services for one operation.

2. Handling Distributed Transactions with Saga Pattern

At scale, distributed transactions become your enemy. Here's the orchestration approach:

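@Service
public class SubscriptionEnrollmentOrchestrator {

    private final CustomerClient customerClient;
    private final PaymentClient paymentClient;
    private final SubscriptionRepository subscriptionRepository;
    private final EventPublisher eventPublisher; // type name assumed; not shown in the original

    public SubscriptionEnrollmentOrchestrator(CustomerClient customerClient,
                                              PaymentClient paymentClient,
                                              SubscriptionRepository subscriptionRepository,
                                              EventPublisher eventPublisher) {
        this.customerClient = customerClient;
        this.paymentClient = paymentClient;
        this.subscriptionRepository = subscriptionRepository;
        this.eventPublisher = eventPublisher;
    }

    // Deliberately not @Transactional: a method-level transaction would hold a
    // database connection open across the remote calls, and a failure at commit
    // time would surface after this method returns, bypassing the refund below.
    // save() runs in its own transaction.
    public EnrollmentResult enrollMember(EnrollmentRequest request) {
        // Step 1: Validate customer (synchronous)
        CustomerProfile customer = customerClient.getCustomer(request.getCustomerId())
            .orElseThrow(() -> new CustomerNotFoundException());

        // Step 2: Process payment (synchronous with compensation)
        PaymentResult payment;
        try {
            payment = paymentClient.processPayment(request.getPaymentMethod());
        } catch (PaymentException e) {
            // Nothing to compensate yet: no earlier step had side effects
            throw new EnrollmentFailedException("Payment failed", e);
        }

        // Step 3: Create subscription (local transaction)
        Subscription subscription;
        try {
            subscription = createSubscription(customer, payment);
            subscriptionRepository.save(subscription);
        } catch (Exception e) {
            // Compensate payment
            paymentClient.refundPayment(payment.getTransactionId());
            throw new EnrollmentFailedException("Subscription creation failed", e);
        }

        // Step 4: Publish event for async operations (emails, analytics)
        eventPublisher.publish(new SubscriptionEnrolledEvent(subscription));

        return EnrollmentResult.success(subscription);
    }
}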

Key lessons:

  1. Make operations idempotent - Include transaction IDs so retries don't cause duplicates
  2. Compensate explicitly - If step N fails, undo steps 1 through N-1
  3. Separate critical path from eventual consistency - Payment is synchronous, emails are async
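
The first two lessons can be sketched in plain Java, independent of Spring. This is a minimal illustration under stated assumptions, not the orchestrator above: `SagaRunner`, `IdempotentExecutor`, and the step names are hypothetical, and the idempotency store is an in-memory map where a real system would use a durable one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: runs steps in order and, if one fails,
// runs the compensations of the already completed steps in reverse order.
final class SagaRunner {
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    static void run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step); // most recent step ends up on top
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run(); // undo N-1 ... 1
                }
                throw new IllegalStateException("Saga failed at step: " + step.name, e);
            }
        }
    }
}

// Hypothetical helper: remembers the result for each idempotency key so a
// retried request returns the first result instead of repeating the side effect.
final class IdempotentExecutor<T> {
    private final Map<String, T> results = new ConcurrentHashMap<>();

    T execute(String idempotencyKey, Supplier<T> operation) {
        return results.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

With steps "charge" and "reserve" completed and a third step failing, the runner calls the compensation for "reserve" first and then the one for "charge". Calling `execute` twice with the same key runs the operation once.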

3. Strategic Caching for High-Traffic Systems

We reduced database load by 85% through strategic caching:

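@Service
public class SubscriptionService {

    private final SubscriptionRepository repository;
    private final LoadingCache<String, Optional<Subscription>> cache;

    public SubscriptionService(SubscriptionRepository repository) {
        this.repository = repository; // kept as a field: updateSubscription needs it
        this.cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .recordStats()
            .build(CacheLoader.from(repository::findByCustomerId));
    }

    // Read path (method added for illustration): served from the cache,
    // loaded from the repository on a miss
    public Optional<Subscription> getByCustomerId(String customerId) {
        return cache.getUnchecked(customerId);
    }

    @Transactional
    public Subscription updateSubscription(String id, SubscriptionUpdate update) {
        Subscription updated = repository.findById(id)
            .map(s -> applyUpdate(s, update))
            .orElseThrow(() -> new SubscriptionNotFoundException(id));

        repository.save(updated);
        // Invalidate on write. This runs before the transaction commits, so a
        // concurrent read can briefly re-cache the old row; the TTL bounds that.
        cache.invalidate(updated.getCustomerId());

        return updated;
    }
}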

Caching layers:

  1. Application-level (Guava/Caffeine) - 5-minute TTL for hot data
  2. Redis distributed cache - 15-minute TTL for cross-instance consistency
  3. Database with proper indexing
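
The read path through these layers can be sketched as follows. This is a simplified illustration, not production code: `LayeredCache` is a hypothetical class, both tiers are plain in-memory maps (the shared tier stands in for Redis), and the clock is injected so the TTL behaviour is easy to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of a layered read: local cache -> shared cache -> database.
final class LayeredCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> local = new ConcurrentHashMap<>();
    private final Map<K, Entry<V>> shared = new ConcurrentHashMap<>(); // stand-in for Redis
    private final long localTtlMillis;
    private final long sharedTtlMillis;
    private final LongSupplier clock;      // injectable so tests can control time
    private final Function<K, V> database; // stand-in for the repository call

    LayeredCache(long localTtlMillis, long sharedTtlMillis,
                 LongSupplier clock, Function<K, V> database) {
        this.localTtlMillis = localTtlMillis;
        this.sharedTtlMillis = sharedTtlMillis;
        this.clock = clock;
        this.database = database;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> hit = fresh(local.get(key), now);
        if (hit != null) {
            return hit.value; // layer 1: no network hop
        }
        V value;
        hit = fresh(shared.get(key), now);
        if (hit != null) {
            value = hit.value; // layer 2: shared across instances
        } else {
            value = database.apply(key); // layer 3: source of truth
            shared.put(key, new Entry<>(value, now + sharedTtlMillis));
        }
        local.put(key, new Entry<>(value, now + localTtlMillis));
        return value;
    }

    // Call after a successful write so later reads reload from the database.
    void invalidate(K key) {
        local.remove(key);
        shared.remove(key);
    }

    private static <T> Entry<T> fresh(Entry<T> entry, long now) {
        return (entry != null && now < entry.expiresAtMillis) ? entry : null;
    }
}
```

Note that `invalidate` only clears the local map of the instance that performed the write. Other instances keep their local copy until it expires, which is one reason to give the local tier the shorter TTL.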

4. Circuit Breakers and Resilience

When calling other services, assume they will fail:

@Service
public class CustomerProfileClient {
    
    @CircuitBreaker(name = "customer-profile", fallbackMethod = "getCustomerFallback")
    @TimeLimiter(name = "customer-profile")
    public CompletableFuture<CustomerProfile> getCustomer(String customerId) {
        return CompletableFuture.supplyAsync(() -> 
            restTemplate.getForObject("/api/customers/" + customerId, CustomerProfile.class)
        );
    }
    
    private CompletableFuture<CustomerProfile> getCustomerFallback(String customerId, Exception ex) {
        log.warn("Customer service unavailable, using cached data");
        return CompletableFuture.completedFuture(
            customerCache.get(customerId)
                .orElseThrow(() -> new ServiceUnavailableException("No cached data", ex))
        );
    }
}

Configuration:

CircuitBreakerConfig.custom()
    .failureRateThreshold(50)          // Open if 50% fail
    .waitDurationInOpenState(Duration.ofSeconds(30))
    .slidingWindowSize(100)            // Consider last 100 calls
    .slowCallDurationThreshold(Duration.ofSeconds(2))
    .build();
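
To make those settings concrete, here is a deliberately simplified circuit breaker in plain Java. It is not how Resilience4j is implemented and it ignores slow-call tracking; `SimpleCircuitBreaker` is a hypothetical class that only mirrors three ideas from the configuration: a failure-rate threshold, a count-based sliding window, and a wait time in the open state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified, hypothetical circuit breaker for illustration only.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent, e.g. 50
    private final long openMillis;
    private final LongSupplier clock;          // injectable so tests can control time
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         long openMillis, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // wait time elapsed: allow a trial call
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast without touching the remote service
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private synchronized void record(boolean failure) {
        if (state == State.HALF_OPEN) {
            // The trial call decides: close on success, re-open on failure.
            window.clear();
            if (failure) {
                open();
            } else {
                state = State.CLOSED;
            }
            return;
        }
        window.addLast(failure);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long failures = window.stream().filter(f -> f).count();
        // Evaluate only once the window is full, so one early error cannot trip it.
        if (window.size() == windowSize
                && failures * 100.0 / windowSize >= failureRateThreshold) {
            open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

With a window of 4 and a 50% threshold, two failures among four calls open the breaker; every call then gets the fallback until the wait time passes and a trial call succeeds.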

5. Structured Logging and Distributed Tracing

You can't fix what you can't see:

@Slf4j
@RestController
public class SubscriptionController {
    
    @PostMapping("/subscriptions")
    public ResponseEntity<SubscriptionResponse> enrollMember(
            @RequestBody EnrollmentRequest request,
            @RequestHeader("X-Request-ID") String requestId) {
        
        MDC.put("requestId", requestId);
        MDC.put("customerId", request.getCustomerId());
        
        try {
            log.info("Starting enrollment", 
                kv("tier", request.getTier()),
                kv("paymentMethod", request.getPaymentMethod().getType()));
            
            Instant start = Instant.now();
            EnrollmentResult result = enrollmentService.enroll(request);
            
            log.info("Enrollment completed",
                kv("subscriptionId", result.getSubscriptionId()),
                kv("durationMs", Duration.between(start, Instant.now()).toMillis()));
            
            return ResponseEntity.ok(toResponse(result));
        } finally {
            MDC.clear();
        }
    }
}

Logging standards:

  1. Use structured key-value pairs, not string concatenation
  2. Track request IDs across service boundaries
  3. Include customer ID and operation type in every log
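
Tracking request IDs across service boundaries means the ID has to travel on outbound calls too. A minimal sketch using the JDK's `java.net.http` client rather than `RestTemplate`; `RequestIds` is a hypothetical helper, and generating a UUID when the header is missing is an assumption about the desired behaviour.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

// Hypothetical helper for carrying a request ID from an inbound call to outbound calls.
final class RequestIds {
    static final String HEADER = "X-Request-ID";

    // Reuse the caller's ID when present; otherwise start a new one.
    static String resolve(String incoming) {
        return (incoming == null || incoming.isBlank())
                ? UUID.randomUUID().toString()
                : incoming;
    }

    // Build an outbound GET request that carries the same ID to the next service.
    static HttpRequest outbound(URI uri, String requestId) {
        return HttpRequest.newBuilder(uri)
                .header(HEADER, requestId)
                .GET()
                .build();
    }
}
```

The downstream service can then put the same value into its own MDC, so one ID links the log lines of every service that handled the request.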

6. Database Query Optimization

Slow queries kill scalability:

@Entity
@Table(name = "subscriptions", indexes = {
    @Index(name = "idx_customer_id", columnList = "customer_id"),
    @Index(name = "idx_tier_status", columnList = "tier, status"),
    @Index(name = "idx_expires_at", columnList = "expires_at")
})
public class Subscription {
    // Use @BatchSize to prevent N+1 queries
    @OneToMany(mappedBy = "subscription", fetch = FetchType.LAZY)
    @BatchSize(size = 50)
    private List<Benefit> benefits;
}

@Repository
public interface SubscriptionRepository extends JpaRepository<Subscription, String> {
    
    // Custom query with fetch join to avoid N+1
    @Query("SELECT DISTINCT s FROM Subscription s LEFT JOIN FETCH s.benefits WHERE s.customerId = :customerId")
    Optional<Subscription> findByCustomerIdWithBenefits(@Param("customerId") String customerId);
}

Performance lessons:

  1. Index the columns you filter, join, and sort by (every index adds write cost, so drop ones no query uses)
  2. Use batch fetching or fetch joins to prevent N+1 queries
  3. Paginate large results
  4. Monitor and alert on queries >100ms
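
One way to paginate large results is keyset ("seek") pagination, which asks for rows after the last ID seen instead of using a growing offset. The sketch below uses an in-memory sorted map as a stand-in for an indexed table; `KeysetPager` is a hypothetical name, and the SQL in the comment is the rough analogue rather than a query from this codebase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Keyset pagination over a sorted map standing in for an indexed table.
// Rough SQL analogue: WHERE id > :lastSeenId ORDER BY id LIMIT :pageSize
final class KeysetPager {
    static List<Long> nextPage(NavigableMap<Long, String> table, long lastSeenId, int pageSize) {
        List<Long> ids = new ArrayList<>();
        // tailMap(..., false) starts strictly after the last ID already returned
        for (Long id : table.tailMap(lastSeenId, false).keySet()) {
            if (ids.size() == pageSize) {
                break;
            }
            ids.add(id);
        }
        return ids;
    }
}
```

The caller passes the last ID of the previous page as `lastSeenId` and stops when a page comes back empty.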

7. Connection Pool Tuning

spring:
  datasource:
    hikari:
      # Formula: (core_count * 2) + effective_spindle_count
      # For 4-core with SSD: (4 * 2) + 1 = 9, rounded up to 10
      maximum-pool-size: 10
      minimum-idle: 5
      connection-timeout: 3000         # ms
      validation-timeout: 1000         # ms
      leak-detection-threshold: 60000  # ms (1 minute)
      max-lifetime: 1800000  # 30 minutes
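
The sizing formula in the comment above is simple enough to write down as code. This is only the arithmetic; `PoolSizing` is a hypothetical helper, and the result is a starting point to validate under load rather than a final value.

```java
// Hypothetical helper: the pool sizing formula from the config comment.
final class PoolSizing {
    // connections = (core_count * 2) + effective_spindle_count
    static int recommendedPoolSize(int coreCount, int effectiveSpindleCount) {
        return coreCount * 2 + effectiveSpindleCount;
    }
}
```

For the 4-core, single-SSD case this gives 9, which the config rounds up to 10. On a running JVM, `Runtime.getRuntime().availableProcessors()` reports the core count the process can see.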

8. Zero-Downtime Deployments

# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: subscription-service
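spec:
  # selector, pod template labels, and container image omitted for brevity; a real manifest needs them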
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3        # Add 50% extra during update
      maxUnavailable: 1  # Keep 5/6 running
  template:
    spec:
      containers:
      - name: subscription-service
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
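
The two rolling-update settings translate into capacity bounds that are worth checking before a deploy. A small sketch of that arithmetic, with `RollingUpdateMath` as a hypothetical helper and absolute pod counts (not percentages) assumed:

```java
// Hypothetical helper: capacity bounds implied by a RollingUpdate strategy.
final class RollingUpdateMath {
    // Upper bound on pods that may exist at once during the rollout.
    static int maxPodsDuringUpdate(int replicas, int maxSurge) {
        return replicas + maxSurge;
    }

    // Lower bound on pods that remain available during the rollout.
    static int minAvailableDuringUpdate(int replicas, int maxUnavailable) {
        return replicas - maxUnavailable;
    }
}
```

For the manifest above (6 replicas, `maxSurge: 3`, `maxUnavailable: 1`), the cluster needs room for up to 9 pods, and at least 5 keep serving traffic throughout.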

9. Metrics That Matter

@Service
public class MetricsService {
    
    private final Counter enrollmentsCounter;
    private final Timer enrollmentDuration;
    
    public MetricsService(MeterRegistry registry) {
        this.enrollmentsCounter = Counter.builder("subscription.enrollments")
            .tag("service", "subscription")
            .register(registry);
            
        this.enrollmentDuration = Timer.builder("subscription.enrollment.duration")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);
    }
}

Monitor:

  • Business: Enrollments/hour, conversion rate, revenue
  • Technical: P50/P95/P99 latency, error rate, throughput
  • Infrastructure: CPU, memory, connection pool utilization
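
To see why P95 and P99 are tracked alongside P50, it helps to compute them by hand once. The sketch below uses the nearest-rank definition on raw samples; `Percentiles` is a hypothetical helper, and this is not how Micrometer computes its published percentiles.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentiles over raw latency samples.
final class Percentiles {
    // Returns the smallest sample such that at least pct% of samples are <= it.
    static long nearestRank(long[] samples, int pct) {
        if (samples.length == 0 || pct < 1 || pct > 100) {
            throw new IllegalArgumentException("need samples and 1 <= pct <= 100");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (pct * sorted.length + 99) / 100; // integer ceil(pct / 100 * n)
        return sorted[rank - 1];
    }
}
```

For latencies of 10, 11, 12, 13, and 900 ms, the P50 is 12 ms while the P99 is 900 ms: the median looks healthy and only the tail percentile exposes the slow request.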

10. Testing Strategy

// Unit tests - fast, isolated
@Test
void shouldCalculateCorrectExpirationDate() {
    EnrollmentService service = new EnrollmentService(
        mock(CustomerClient.class),
        mock(PaymentClient.class),
        mock(SubscriptionRepository.class)
    );
    
    Instant enrolledAt = Instant.parse("2024-01-01T00:00:00Z");
    Instant expected = Instant.parse("2025-01-01T00:00:00Z");
    
    assertEquals(expected, service.calculateExpiration(enrolledAt, SubscriptionTier.ANNUAL));
}

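// Integration tests - with real database
@SpringBootTest
@Testcontainers
class SubscriptionServiceIntegrationTest {

    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:15");

    // Point Spring's datasource at the container; without this the test
    // would run against whatever database the default config names
    @DynamicPropertySource
    static void datasourceProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }

    @Autowired
    private EnrollmentService service; // bean type assumed; the original does not show it

    @Test
    void shouldPersistAndRetrieveSubscription() {
        // request: an EnrollmentRequest fixture for a BASIC-tier enrollment (setup omitted)
        EnrollmentResult result = service.enroll(request);
        Optional<Subscription> retrieved = service.getSubscription(result.getSubscriptionId());

        assertTrue(retrieved.isPresent());
        assertEquals(SubscriptionTier.BASIC, retrieved.get().getTier());
    }
}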

Key Takeaways

After building and scaling microservices handling millions of requests:

  1. Design for failure - Circuit breakers, timeouts, and fallbacks are not optional
  2. Embrace eventual consistency - Not everything needs to be synchronous
  3. Observability is critical - You can't improve what you can't measure
  4. Cache aggressively - But invalidate intelligently
  5. Test at every level - Unit, integration, contract, and load tests
  6. Automate deployments - Including rollbacks and scaling
  7. Performance is a feature - Optimize from day one

What's Next?

In future posts, I'll dive deeper into:

  • Advanced observability with distributed tracing
  • Chaos engineering and resilience testing
  • Cost optimization for cloud deployments

Questions or want to discuss these patterns? Leave a comment below or connect with me on LinkedIn.


Tags: #spring-boot #microservices #java #system-design #aws #architecture #scalability #production-engineering
