Spring Boot File Upload Performance: Why Your API Gets Slow Under Load (And How to Fix It)

By buildbasekit • May 20 • 6 min read

— Originally published at buildbasekit.com

Backend APIs often feel fast during development.

You upload a few files locally.

Everything responds instantly.

Latency looks great.

Memory looks stable.

CPU barely moves.

Then production traffic arrives.

Suddenly:

uploads slow down
downloads start hanging
videos become painfully slow
latency spikes
users complain

This happens because local testing rarely reflects real-world load.

A file upload API is not just handling HTTP requests.

It is dealing with:

disk I/O
file buffering
network transfer
JVM memory pressure
concurrent request handling
authentication overhead
metadata queries
file streaming

Under enough load, something breaks first.

The real question is:

What breaks first in a Spring Boot file upload system?

I ran a benchmark to find out.

Quick Answer

Spring Boot handled the application layer surprisingly well under heavy load.

The actual bottleneck was not request handling.

It was large file delivery.

Small files stayed fast.

Large video responses caused latency to rise sharply.

The lesson:

Spring Boot is fine for API orchestration, but direct media delivery becomes a scaling problem much faster than most developers expect.

The Test Scenario

The benchmark used a Spring Boot file service with typical production-style endpoints:

file upload
file download
metadata lookup
file listing
delete operations

The workload intentionally mixed different file behaviors because real systems rarely handle only one request type.

The goal was not to create a synthetic benchmark that looks impressive.

The goal was realism.

Traffic gradually increased in stages until the system started showing visible degradation.

Metrics tracked included:

request throughput
P95 latency
CPU usage
memory usage
endpoint behavior
file-type response performance

Why File Upload APIs Fail Earlier Than Expected

A standard CRUD API is relatively cheap.

A file API is not.

For example:

A metadata lookup:

GET /api/files/metadata/123

might hit:

authentication filter
database query
JSON serialization

That is manageable.

Now compare that with:

GET /api/files/download/huge-video.mp4

Now the system may need to:

authenticate request
validate permissions
locate file on disk
open file stream
allocate buffers
push large chunks over network
keep thread occupied longer
handle slow client reads

That changes everything.

The API layer is no longer the only concern.

Your infrastructure becomes part of request latency.
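
To make the "keep thread occupied longer" point concrete, here is a minimal JDK-only sketch. It is not the benchmark code. It assumes a thread-per-request model with a small fixed worker pool, and the class name `PoolStarvationDemo`, the latches, and the pool size are all illustrative. Slow downloads hold every worker, so a quick metadata-style request has to sit in the queue behind them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Shows a fast request waiting behind slow downloads on a small worker pool.
class PoolStarvationDemo {

    /** Returns true if the fast request was still queued while slow downloads held every worker. */
    static boolean fastRequestWasBlocked(int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch downloadsStarted = new CountDownLatch(workers);
        CountDownLatch slowClientsFinish = new CountDownLatch(1);
        try {
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    downloadsStarted.countDown();
                    slowClientsFinish.await(); // stands in for a long video stream to a slow client
                    return "download done";
                });
            }
            downloadsStarted.await(); // every worker is now busy with a download

            Future<String> fast = pool.submit(() -> "metadata"); // queued: no free worker
            boolean blocked = !fast.isDone();

            slowClientsFinish.countDown();   // downloads complete, workers free up
            fast.get(5, TimeUnit.SECONDS);   // the fast request can finally run
            return blocked;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fast request blocked: " + fastRequestWasBlocked(2));
    }
}
```

The fast request does no expensive work itself. Its latency comes entirely from waiting for a worker, which is why a handful of long downloads can raise P95 for unrelated endpoints.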

Benchmark Summary

Here’s the simplified outcome.

Metric	Result
Total Requests	134,000+
Peak Load	7,500 RPM
Peak P95 Latency	~1.8 seconds
HTTP Failures	0
Average CPU	~35%
Main Bottleneck	Large MP4 downloads

At first glance, this looks strong.

Zero failures under that volume is respectable.

But raw survival is not the same as healthy performance.

Latency tells the real story.

[Figure: Spring Boot file upload latency under a 7,500 RPM load test. P95 latency stayed stable until higher RPM levels, then degraded rapidly after ~6,300 RPM.]

Latency Looked Fine... Until It Didn’t

At lower traffic levels, performance was stable.

The backend remained responsive.

P95 latency stayed in a healthy range.

Then the ramp increased.

Past a certain threshold, latency began climbing rapidly.

This pattern matters.

Gradual degradation usually means the system is saturating naturally.

Sudden collapse usually means architectural failure.

This benchmark showed degradation, not collapse.

That is actually a good sign.

Still, once latency crosses production tolerance, users do not care whether your system technically survived.

They only see slowness.

[Figure: Spring Boot file upload throughput under increasing load. Throughput tracked closely with target RPM until the upper load levels, where the system started falling behind.]

Throughput Was Stable

One interesting finding:

The backend kept serving requests consistently even under heavier load.

That suggests:

request routing was stable
controllers were fine
business logic held up
auth overhead was manageable

This is important because many developers blame Spring Boot too early.

The framework itself was not the issue here.

The architecture was.

The Real Bottleneck

This was the most useful finding.

The slowest operations were not:

uploads
metadata APIs
listing APIs
delete operations

The worst offender was:

large MP4 delivery

[Figure: Spring Boot file upload latency comparison for PDF, PNG, and MP4 files. MP4 downloads had dramatically higher P95 latency than PDF and PNG files.]

Small files performed dramatically better.

Typical behavior looked like this:

File Type	Approx Performance
PDF	Extremely fast
PNG	Moderate
MP4	Much slower

This makes perfect sense.

Large media files create pressure across the entire stack.

Including:

Disk I/O

Reading large files repeatedly creates storage pressure.

Local disk works for development.

At scale, it becomes painful.

Network Throughput

A large file means longer response duration.

More active connections.

More bandwidth consumption.

Slow clients make this worse.

JVM Memory Pressure

Streaming often introduces buffering overhead.

Even when efficient, sustained concurrency adds pressure.

Memory growth is expected.
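
One way to keep that buffering overhead bounded is to copy each file through a fixed-size buffer instead of loading it into memory. The sketch below uses only the JDK so it stays self-contained. The class name `BoundedFileStreamer` and the 64 KB buffer size are assumptions for illustration, and the `ByteArrayOutputStream` in `main` is only a stand-in for a real HTTP response stream. In a Spring MVC application the same loop would typically write to the response output stream, for example through `StreamingResponseBody`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Copies a file to an output stream through a fixed-size buffer, so the
// heap used per request stays near BUFFER_SIZE regardless of file size.
class BoundedFileStreamer {

    static final int BUFFER_SIZE = 64 * 1024; // assumed chunk size, tune per workload

    /** Streams the file at 'source' into 'out' and returns the number of bytes copied. */
    static long stream(Path source, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try (InputStream in = Files.newInputStream(source)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("demo-video", ".bin");
        try {
            Files.write(temp, new byte[1_000_000]); // 1 MB stand-in for a media file
            // Test sink only: a real response stream would not retain the bytes.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            System.out.println("bytes copied: " + stream(temp, sink));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

With this pattern, memory per download is roughly one buffer. Total pressure still grows with the number of concurrent downloads, which matches the gradual memory climb described above.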

Thread Occupancy

Traditional thread-per-request handling keeps a worker thread busy for the full duration of each response.

A quick metadata request finishes fast.

A multi-second video stream does not.

That reduces effective throughput.
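
A rough way to size this effect is Little's Law: average requests in flight equals arrival rate times average duration. The sketch below applies it to the 7,500 RPM peak from the benchmark summary. The two durations are assumptions, not measurements: 20 ms is a hypothetical quick metadata request, and 1.8 s borrows the peak P95 figure as a pessimistic stand-in for a slow response. The benchmark workload was mixed, so neither case describes it exactly.

```java
// Little's Law sketch: average in-flight requests = arrival rate x average duration.
class ThreadOccupancy {

    /** Converts requests per minute to requests per second. */
    static double requestsPerSecond(double requestsPerMinute) {
        return requestsPerMinute / 60.0;
    }

    /** Average requests in flight; on a thread-per-request server, roughly the busy worker threads. */
    static double inFlight(double requestsPerMinute, double avgDurationSeconds) {
        return requestsPerSecond(requestsPerMinute) * avgDurationSeconds;
    }

    public static void main(String[] args) {
        double peakRpm = 7_500;      // peak load from the benchmark summary
        double fastSeconds = 0.020;  // assumed: a quick metadata request
        double slowSeconds = 1.8;    // assumed: a response as slow as the peak P95

        System.out.printf("fast requests in flight: %.1f%n", inFlight(peakRpm, fastSeconds));
        System.out.printf("slow requests in flight: %.1f%n", inFlight(peakRpm, slowSeconds));
    }
}
```

At 125 requests per second, 20 ms requests keep about 2.5 threads busy on average, while 1.8 s responses would need about 225. For scale, embedded Tomcat defaults to 200 worker threads unless `server.tomcat.threads.max` is changed, so long responses can exhaust the pool well before CPU becomes a concern.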

CPU Was Not the Problem

[Figure: Spring Boot load test CPU usage chart. CPU usage increased gradually with traffic but remained relatively controlled during the test.]

A common assumption:

"High latency means CPU saturation."

Not here.

CPU stayed relatively controlled.

Average utilization hovered around 35%.

That means compute was not the main bottleneck.

This is useful because it changes optimization priorities.

If CPU is not the problem:

Do not waste time micro-optimizing controller code first.

Fix architecture first.

Memory Behavior

[Figure: Spring Boot load test memory usage chart. Memory usage increased under higher throughput but stayed within a manageable range.]

Memory climbed during heavier traffic.

That is expected.

Large streaming workloads create pressure.

But the test did not suggest catastrophic memory behavior.

The bigger issue remained delivery mechanics, not memory collapse.

What This Actually Means for Production

This is where many teams make a bad decision.

They see a working local file upload implementation and assume:

"Great, we can scale this."

Maybe.

But probably not for media-heavy workloads.

A better architecture separates responsibilities.

Spring Boot should handle:

authentication
authorization
business rules
metadata
upload coordination
signed URL generation

Dedicated infrastructure should handle media delivery.

Examples:

AWS S3
Cloudflare R2
object storage
CDN edge delivery
Nginx static serving

This dramatically reduces backend pressure.

Better Architecture Example

Instead of this:

Client
   ↓
Spring Boot
   ↓
Local file disk
   ↓
Spring Boot streams file

Prefer:

Client
   ↓
Spring Boot auth + metadata
   ↓
Signed URL
   ↓
S3 / R2 / CDN

That changes scaling behavior completely.

Now your application server is not acting like a media server.

Which is exactly what you want.
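
The signed URL step can be sketched with the JDK alone. This is a hypothetical HMAC scheme meant only to show the idea: it is not the S3 or R2 presigning format, and real object stores ship their own presigning APIs that should be preferred. The class name `SignedUrlSketch`, the query parameter names, and the shared secret are all assumptions. The application signs a path plus an expiry, and the delivery tier checks the signature without calling back into Spring Boot.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative HMAC-signed URL scheme (hypothetical; not the S3 or R2 presigning format).
class SignedUrlSketch {

    private final byte[] secret;

    SignedUrlSketch(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Builds "<path>?expires=<epochSeconds>&sig=<hmac>" for the delivery tier to verify. */
    String sign(String path, long expiresEpochSeconds) {
        return path + "?expires=" + expiresEpochSeconds + "&sig=" + hmac(path, expiresEpochSeconds);
    }

    /** Returns true only if the link has not expired and the signature matches path and expiry. */
    boolean verify(String path, long expiresEpochSeconds, String signature, long nowEpochSeconds) {
        if (nowEpochSeconds > expiresEpochSeconds) {
            return false;
        }
        byte[] expected = hmac(path, expiresEpochSeconds).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual); // avoids early-exit comparison
    }

    private String hmac(String path, long expiresEpochSeconds) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal((path + "\n" + expiresEpochSeconds).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        SignedUrlSketch signer = new SignedUrlSketch("demo-secret"); // hypothetical shared secret
        long now = System.currentTimeMillis() / 1000;
        System.out.println(signer.sign("/media/huge-video.mp4", now + 300)); // valid for 5 minutes
    }
}
```

Spring Boot's share of a download shrinks to authentication, a permission check, and one HMAC computation. The bytes themselves never pass through the application server.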

Common Mistakes Developers Make

1. Treating Upload Success As Performance Proof

A few successful uploads prove almost nothing.

Load changes everything.

2. Serving Large Media Directly Forever

Works early.

Hurts later.

3. Blaming Spring Boot Too Quickly

The framework is often not the issue.

Bad delivery architecture usually is.

4. Ignoring P95 Latency

Average latency hides pain.

P95 exposes real user experience.
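
A small worked example shows the gap. The sample below is hypothetical, not benchmark data: 18 requests at 40 ms and 2 requests at 1,800 ms. The class name `LatencyStats` is illustrative, and the percentile uses the nearest-rank definition, which is one of several in common use.

```java
import java.util.Arrays;

// Compares the mean with nearest-rank percentiles on a small latency sample.
class LatencyStats {

    static double mean(double[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] samples = new double[20];       // hypothetical latencies in milliseconds
        Arrays.fill(samples, 0, 18, 40.0);       // 18 fast requests
        Arrays.fill(samples, 18, 20, 1800.0);    // 2 slow video responses

        System.out.printf("mean=%.0f ms, p50=%.0f ms, p95=%.0f ms%n",
                mean(samples), percentile(samples, 50), percentile(samples, 95));
    }
}
```

For this sample the mean is 216 ms and the median is 40 ms, while P95 is 1,800 ms. A dashboard showing only the average would suggest a healthy service even though one request in ten takes nearly two seconds.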

When Spring Boot Is Absolutely Fine

Spring Boot works well if your workload is:

document uploads
moderate internal tools
admin dashboards
metadata-heavy systems
authenticated business APIs

If your workload becomes:

video-heavy
media streaming
public downloads at scale

your architecture needs to evolve.

Practical Scaling Advice

If you already have a local file-based Spring Boot system, start with these steps.

Move Storage

Shift from local disk to object storage.

Add CDN Delivery

Never make your app server deliver everything forever.

Use Signed URLs

Avoid routing every file request through application logic.

Benchmark Early

Do not wait until users discover bottlenecks.

Watch P95, Not Just Averages

This matters more than vanity metrics.

How many uploads can Spring Boot handle?

There is no universal number.

It depends on:

file size
infrastructure
storage strategy
concurrency
request mix
delivery architecture

Small files and metadata workloads scale much better than heavy media streaming.

Should I serve files directly from Spring Boot?

For small or internal workloads, yes.

For larger public media systems, usually no.

Object storage plus CDN is the better long-term design.

Why do video downloads slow everything down?

Because they stress:

disk reads
bandwidth
buffering
connection duration
thread occupancy

This creates systemic pressure much faster than lightweight API requests.

Final Takeaway

The benchmark revealed something useful.

Spring Boot did not fail.

The architecture reached its natural limit.

That distinction matters.

If your backend handles uploads, metadata, auth, and business logic:

Spring Boot is a strong fit.

If your backend is also acting as a high-volume media server:

you are creating your own bottleneck.

Design accordingly.

Originally inspired by real performance benchmarking work done while building a Spring Boot file infrastructure product at BuildBaseKit.

2 Comments

SuMiTa · Answer 1 · 2026-05-21T23:59:27+0000

Good article. Curious if switching to reactive streams actually helped a lot in your testing or if infra changes mattered more?

buildbasekit • May 22

@sumita Great question.

Reactive streams can definitely help with thread efficiency when you have lots of long-lived requests, but in my testing the bigger gains usually come from architecture changes rather than switching programming models.

For example, if uploads are handled asynchronously, you can reduce response latency by acknowledging the request early and processing the file write in the background. And for downloads, signed URLs with object storage/CDN remove the application server from the heavy lifting entirely, which is usually the biggest win for large media delivery.

So I’d say reactive helps, but offloading storage/delivery tends to move the needle more.
