Backend APIs often feel fast during development.
You upload a few files locally.
Everything responds instantly.
Latency looks great.
Memory looks stable.
CPU barely moves.
Then production traffic arrives.
Suddenly:
- uploads slow down
- downloads start hanging
- videos become painfully slow
- latency spikes
- users complain
This happens because local testing rarely reflects real-world load.
A file upload API is not just handling HTTP requests.
It is dealing with:
- disk I/O
- file buffering
- network transfer
- JVM memory pressure
- concurrent request handling
- authentication overhead
- metadata queries
- file streaming
Under enough load, something breaks first.
The real question is:
What breaks first in a Spring Boot file upload system?
I ran a benchmark to find out.
Quick Answer
Spring Boot handled the application layer surprisingly well under heavy load.
The actual bottleneck was not request handling.
It was large file delivery.
Small files stayed fast.
Large video responses caused latency to rise sharply.
The lesson:
Spring Boot is fine for API orchestration, but direct media delivery becomes a scaling problem much faster than most developers expect.
The Test Scenario
The benchmark used a Spring Boot file service with typical production-style endpoints:
- file upload
- file download
- metadata lookup
- file listing
- delete operations
The workload intentionally mixed request types and file types (PDF, PNG, MP4) because real systems rarely handle only one kind of request.
The goal was not to create a synthetic benchmark that looks impressive.
The goal was realism.
Traffic gradually increased in stages until the system started showing visible degradation.
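A staged ramp like this can be sketched as a list of evenly spaced RPM targets. This is a minimal illustration, not the benchmark's actual schedule: the 7,500 RPM peak comes from the results reported here, but the `RampPlan` class and the five-stage step count are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a staged load ramp: evenly spaced RPM targets up to a peak. */
class RampPlan {

    /** Returns `stages` evenly spaced targets, ending exactly at peakRpm. */
    static List<Integer> stages(int peakRpm, int stages) {
        if (peakRpm <= 0 || stages <= 0) {
            throw new IllegalArgumentException("peakRpm and stages must be positive");
        }
        List<Integer> targets = new ArrayList<>();
        for (int i = 1; i <= stages; i++) {
            // Widen to long so the multiplication cannot overflow int.
            targets.add((int) ((long) peakRpm * i / stages));
        }
        return targets;
    }

    public static void main(String[] args) {
        // 7,500 RPM is the reported peak; five stages is an assumed step count.
        for (int rpm : stages(7_500, 5)) {
            System.out.printf("target %d RPM (~%.1f requests/second)%n", rpm, rpm / 60.0);
        }
    }
}
```

Holding each stage long enough for latency to settle makes it easier to see which step first pushes P95 out of its healthy range.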
Metrics tracked included:
- request throughput
- P95 latency
- CPU usage
- memory usage
- endpoint behavior
- file-type response performance
Why File Upload APIs Fail Earlier Than Expected
A standard CRUD API is relatively cheap.
A file API is not.
For example:
A metadata lookup:
GET /api/files/metadata/123
might hit:
- authentication filter
- database query
- JSON serialization
That is manageable.
Now compare that with:
GET /api/files/download/huge-video.mp4
Now the system may need to:
- authenticate request
- validate permissions
- locate file on disk
- open file stream
- allocate buffers
- push large chunks over network
- keep thread occupied longer
- handle slow client reads
That changes everything.
The API layer is no longer the only concern.
Your infrastructure becomes part of request latency.
Benchmark Summary
Here’s the simplified outcome.
| Metric | Result |
| --- | --- |
| Total Requests | 134,000+ |
| Peak Load | 7,500 RPM |
| Peak P95 Latency | ~1.8 seconds |
| HTTP Failures | 0 |
| Average CPU | ~35% |
| Main Bottleneck | Large MP4 downloads |
At first glance, this looks strong.
Zero failures under that volume is respectable.
But raw survival is not the same as healthy performance.
Latency tells the real story.

Figure: P95 latency stayed stable until higher RPM levels, then degraded rapidly after ~6,300 RPM.
Latency Looked Fine... Until It Didn’t
At lower traffic levels, performance was stable.
The backend remained responsive.
P95 latency stayed in a healthy range.
Then the ramp increased.
Past a certain threshold, latency began climbing rapidly.
This pattern matters.
Gradual degradation usually means the system is saturating naturally.
Sudden collapse usually means architectural failure.
This benchmark showed degradation, not collapse.
That is actually a good sign.
Still, once latency crosses production tolerance, users do not care whether your system technically survived.
They only see slowness.

Figure: Throughput tracked closely with target RPM until upper load levels where the system started falling behind.
Throughput Was Stable
One interesting finding:
The backend kept serving requests consistently even under heavier load.
That suggests:
- request routing was stable
- controllers were fine
- business logic held up
- auth overhead was manageable
This is important because many developers blame Spring Boot too early.
The framework itself was not the issue here.
The architecture was.
The Real Bottleneck
This was the most useful finding.
The slowest operations were not:
- uploads
- metadata APIs
- listing APIs
- delete operations
The worst offender was large MP4 delivery.

Figure: MP4 downloads had dramatically higher P95 latency compared to PDF and PNG files.
Small files performed dramatically better.
Typical behavior looked like this:
| File Type | Approx. Performance |
| --- | --- |
| PDF | Extremely fast |
| PNG | Moderate |
| MP4 | Much slower |
This makes perfect sense.
Large media files create pressure across the entire stack.
Including:
Disk I/O
Reading large files repeatedly creates storage pressure.
Local disk works for development.
At scale, it becomes painful.
Network Throughput
A large file means longer response duration.
More active connections.
More bandwidth consumption.
Slow clients make this worse.
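The effect of file size and client speed on connection duration is simple arithmetic. The sketch below ignores protocol overhead, and the file sizes and link speeds are made-up illustrations, not values from the benchmark.

```java
/** Back-of-the-envelope transfer time: size in bytes over a link speed in bits per second. */
class TransferTime {

    /** Seconds needed to send sizeBytes at bitsPerSecond, ignoring protocol overhead. */
    static double seconds(long sizeBytes, long bitsPerSecond) {
        if (bitsPerSecond <= 0) {
            throw new IllegalArgumentException("bitsPerSecond must be positive");
        }
        return sizeBytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        long pdfBytes = 500_000L;        // hypothetical 500 KB document
        long videoBytes = 200_000_000L;  // hypothetical 200 MB video
        long fastLink = 100_000_000L;    // hypothetical 100 Mbit/s client
        long slowLink = 5_000_000L;      // hypothetical 5 Mbit/s client

        System.out.printf("PDF, fast client:   %.2f s%n", seconds(pdfBytes, fastLink));
        System.out.printf("PDF, slow client:   %.2f s%n", seconds(pdfBytes, slowLink));
        System.out.printf("video, fast client: %.2f s%n", seconds(videoBytes, fastLink));
        System.out.printf("video, slow client: %.2f s%n", seconds(videoBytes, slowLink));
    }
}
```

With these assumed numbers, the same video holds a connection open twenty times longer for the slow client than for the fast one, and the server cannot do anything about it.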
JVM Memory Pressure
Each in-flight stream holds its own read buffer, so buffering overhead grows with the number of concurrent downloads.
Even when efficient, sustained concurrency adds pressure.
Memory growth is expected.
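One way to keep that growth predictable is to copy through a small fixed buffer instead of loading a whole file into memory. The sketch below uses only the JDK. The `BoundedCopy` class and the 8 KiB buffer size are assumptions for illustration, and the in-memory streams in `main` are stand-ins for a file on disk and an HTTP response body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Copies a stream through one fixed-size buffer, so heap use per copy does not depend on file size. */
class BoundedCopy {

    static final int BUFFER_SIZE = 8 * 1024; // assumed buffer size, tune for your workload

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try {
            int read;
            // read() returns -1 at end of stream; otherwise the number of bytes placed in buffer.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-ins: a real service would read from disk and write to the HTTP response.
        byte[] payload = new byte[1_000_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println("copied " + copied + " bytes through a " + BUFFER_SIZE + "-byte buffer");
    }
}
```

With this approach the application-level cost is roughly one buffer per concurrent download. Container and socket buffers add their own overhead on top, which this sketch does not model.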
Thread Occupancy
With traditional thread-per-request handling, a worker thread stays busy until the whole response has been written.
A quick metadata request finishes fast.
A multi-second video stream does not.
That reduces effective throughput.
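Little's law gives a rough feel for this: average requests in flight equal the arrival rate multiplied by the average time each request takes. The sketch below is a worked estimate, not a measurement from the benchmark. The 7,500 RPM figure is the reported peak, the 50 ms duration is an assumed value for a quick metadata call, and 1.8 seconds is the reported peak P95 used here as a pessimistic stand-in for an average.

```java
/** Little's law estimate: requests in flight = arrival rate x time per request. */
class ThreadOccupancy {

    /** Average number of requests being served at once. */
    static double inFlight(double requestsPerMinute, double secondsPerRequest) {
        double requestsPerSecond = requestsPerMinute / 60.0;
        return requestsPerSecond * secondsPerRequest;
    }

    public static void main(String[] args) {
        double peakRpm = 7_500; // reported peak load

        // Assumed 50 ms for a quick metadata request.
        System.out.printf("50 ms requests:  ~%.2f in flight%n", inFlight(peakRpm, 0.05));

        // 1.8 s is the reported peak P95, used as a pessimistic per-request duration.
        System.out.printf("1.8 s requests:  ~%.2f in flight%n", inFlight(peakRpm, 1.8));
    }
}
```

In a thread-per-request server, each in-flight request pins one worker thread. If the service runs on embedded Tomcat with its default of 200 worker threads (an assumption, since the container and pool size are not stated here), the second case would already exceed the pool and requests would start queueing.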
CPU Was Not the Problem

Figure: CPU usage increased gradually with traffic but remained relatively controlled during the test.
A common assumption:
"High latency means CPU saturation."
Not here.
CPU stayed relatively controlled.
Average utilization hovered around 35%.
That means compute was not the main bottleneck.
This is useful because it changes optimization priorities.
If CPU is not the problem:
Do not waste time micro-optimizing controller code first.
Fix architecture first.
Memory Behavior

Figure: Memory usage increased under higher throughput but stayed within a manageable range.
Memory climbed during heavier traffic.
That is expected.
Large streaming workloads create pressure.
But the test did not suggest catastrophic memory behavior.
The bigger issue remained delivery mechanics, not memory collapse.
What This Actually Means for Production
This is where many teams make a bad decision.
They see a working local file upload implementation and assume:
"Great, we can scale this."
Maybe.
But probably not for media-heavy workloads.
A better architecture separates responsibilities.
Spring Boot should handle:
- authentication
- authorization
- business rules
- metadata
- upload coordination
- signed URL generation
Dedicated infrastructure should handle media delivery.
Examples:
- AWS S3
- Cloudflare R2
- other object storage services
- CDN edge delivery
- Nginx static serving
This dramatically reduces backend pressure.
Better Architecture Example
Instead of this:
Client
↓
Spring Boot
↓
Local file disk
↓
Spring Boot streams file
Prefer:
Client
↓
Spring Boot auth + metadata
↓
Signed URL
↓
S3 / R2 / CDN
That changes scaling behavior completely.
Now your application server is not acting like a media server.
Which is exactly what you want.
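To make the signed URL idea concrete, here is a minimal sketch of the general mechanism: the application signs a path plus an expiry time with a secret, and whatever serves the file checks the signature before responding. This is a hypothetical HMAC scheme for illustration only. It is not the presigning algorithm used by S3 or R2, which have their own formats and are normally used through their SDKs. The `SignedUrls` class, the query parameter names, and the example path are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical HMAC-signed URLs: path + expiry signed with a shared secret. */
class SignedUrls {

    private final byte[] secret;

    SignedUrls(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns "<path>?expires=<epochSeconds>&sig=<signature>". */
    String sign(String path, long expiresAtEpochSeconds) {
        return path + "?expires=" + expiresAtEpochSeconds
                + "&sig=" + hmac(path + "\n" + expiresAtEpochSeconds);
    }

    /** True only if the link has not expired and the signature matches path and expiry. */
    boolean verify(String path, long expiresAtEpochSeconds, String sig, long nowEpochSeconds) {
        if (nowEpochSeconds > expiresAtEpochSeconds) {
            return false; // link has expired
        }
        byte[] expected = hmac(path + "\n" + expiresAtEpochSeconds).getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual compares in constant time to avoid leaking match length.
        return MessageDigest.isEqual(expected, sig.getBytes(StandardCharsets.UTF_8));
    }

    private String hmac(String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    public static void main(String[] args) {
        SignedUrls signer = new SignedUrls("change-me"); // placeholder secret
        long expires = System.currentTimeMillis() / 1000 + 300; // valid for five minutes
        System.out.println(signer.sign("/media/huge-video.mp4", expires));
    }
}
```

The point of the design is that the application server only does a cheap signature computation per download. The bytes themselves are served by storage or the CDN, which can validate the link without calling back into the application.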
Common Mistakes Developers Make
1. Trusting Local Tests
A few successful uploads prove almost nothing.
Load changes everything.
2. Serving Large Files from the App Server
Works early.
Hurts later.
3. Blaming Spring Boot Too Quickly
The framework is often not the issue.
Bad delivery architecture usually is.
4. Ignoring P95 Latency
Average latency hides pain.
P95 exposes real user experience.
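A small worked example shows why. The sketch below computes the mean and a nearest-rank percentile over a set of latency samples. The sample values are invented for illustration and are not data from the benchmark, and the `LatencyStats` class is a hypothetical helper.

```java
import java.util.Arrays;

/** Mean versus nearest-rank percentile over latency samples in milliseconds. */
class LatencyStats {

    static double mean(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Invented samples: nine quick requests and one slow video download.
        long[] samples = {40, 40, 40, 40, 40, 40, 40, 40, 40, 1800};
        System.out.println("mean = " + mean(samples) + " ms");
        System.out.println("p50  = " + percentile(samples, 50) + " ms");
        System.out.println("p95  = " + percentile(samples, 95) + " ms");
    }
}
```

For these samples the mean is 216 ms, which describes no request that actually happened. The median is 40 ms and the P95 is 1,800 ms, which is what the unlucky user experienced.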
When Spring Boot Is Absolutely Fine
Spring Boot works well if your workload is:
- document uploads
- moderate internal tools
- admin dashboards
- metadata-heavy systems
- authenticated business APIs
If your workload becomes:
- video-heavy
- media streaming
- public downloads at scale
your architecture needs to evolve.
Practical Scaling Advice
If you already have a Spring Boot system that serves files from local disk, start here.
Move Storage
Shift from local disk to object storage.
Add CDN Delivery
Do not make your app server deliver every file itself; let a CDN serve media from the edge.
Use Signed URLs
Avoid routing every file request through application logic.
Benchmark Early
Do not wait until users discover bottlenecks.
Watch P95, Not Just Averages
This matters more than vanity metrics.
FAQ
How many uploads can Spring Boot handle?
There is no universal number.
It depends on:
- file size
- infrastructure
- storage strategy
- concurrency
- request mix
- delivery architecture
Small files and metadata workloads scale much better than heavy media streaming.
Should I serve files directly from Spring Boot?
For small or internal workloads, yes.
For larger public media systems, usually no.
Object storage plus CDN is the better long-term design.
Why do video downloads slow everything down?
Because they stress:
- disk reads
- bandwidth
- buffering
- connection duration
- thread occupancy
This creates systemic pressure much faster than lightweight API requests.
Final Takeaway
The benchmark revealed something useful.
Spring Boot did not fail.
The architecture reached its natural limit.
That distinction matters.
If your backend handles uploads, metadata, auth, and business logic:
Spring Boot is a strong fit.
If your backend is also acting as a high-volume media server:
you are creating your own bottleneck.
Design accordingly.
Originally inspired by real performance benchmarking work done while building a Spring Boot file infrastructure product at BuildBaseKit.