Visualizing Inference: Training a Medical Imaging U-Net on a 2014 CPU and a GTX 1060

1 day ago • 2 min read

While the mainstream tech world burns gigawatts of energy scaling massive models on H100 clusters, I decided to pull things down to earth and focus on low-level optimization for a real-world, high-impact domain: automated medical image segmentation.

The objective was to train a custom light U-Net architecture to segment brain tumors using the international BraTS (Brain Tumor Segmentation) dataset.
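For readers unfamiliar with the shape of such a network, here is a minimal sketch of a small U-Net in PyTorch. It is not the architecture from the repository: the class name `TinyUNet`, the two-level depth, the base width of 16, the use of 2D slices, and the 4 input channels (one per MRI modality commonly shipped with BraTS cases) are all assumptions made for illustration.

```python
import torch
from torch import nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    """Hypothetical two-level U-Net; NOT the architecture from the repository."""

    def __init__(self, in_channels: int = 4, num_classes: int = 1, base: int = 16):
        super().__init__()
        # Encoder: each level doubles the channel count and halves the resolution.
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        # Decoder: upsample, concatenate the matching encoder output, convolve.
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # raw logits with the input's height and width
```

With two pooling steps, the input height and width need to be divisible by 4. A batch shaped `(2, 4, 64, 64)` produces logits shaped `(2, 1, 64, 64)`.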

The Stack & Hardware Control

OS: Arch Linux
Runtime & Framework: Pure PyTorch
Package Management: uv (because legacy tools are too slow for proper systems engineering)
Hardware Constraints: A legacy 8-core AMD FX-8370E CPU (Vishera architecture, 95W TDP) paired with a consumer-grade GTX 1060 (6GB VRAM) completely devoid of modern Tensor Cores.

No AWS instances, no managed cloud notebooks. Just raw local hardware control and silicon optimization.

Pipeline & Thermal Optimization

Processing volumetric 3D medical data (stored as multi-channel HDF5 files) puts heavy stress on the system bus and motherboard VRMs. Under full load, the FX-8370E operated at the edge of its power envelope, drawing 89.63 W out of its 94.84 W power limit. A custom cooling setup kept core temperatures stable below 51°C.

To survive potential hardware instability over extended workloads without losing state, a lightweight polling routine handled checkpointing. Every 5–10 seconds the routine checked telemetry against its limits, wrote the model state to disk, and kept a rolling fallback to the best stable weights in case an anomaly was detected.
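The polling logic described above could look roughly like the sketch below. It is not taken from the repository: `CheckpointGuard`, the telemetry keys, the file names, and the 60°C threshold are hypothetical, and the telemetry reader and save function are injected so the example runs without any sensors or framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class CheckpointGuard:
    """Checks telemetry and keeps a fallback to the best known-good state.

    All names and thresholds are illustrative assumptions.
    """
    read_telemetry: Callable[[], dict]      # e.g. {"temp_c": 48.0, "power_w": 89.63}
    get_state: Callable[[], Any]            # returns a snapshot of the model state
    save_state: Callable[[Any, str], None]  # writes a snapshot to the given path
    max_temp_c: float = 60.0                # assumed limit, not from the post
    max_power_w: float = 94.84              # power limit quoted in the post
    best_loss: float = float("inf")
    best_state: Optional[Any] = None

    def is_anomalous(self, reading: dict) -> bool:
        return (reading["temp_c"] > self.max_temp_c
                or reading["power_w"] > self.max_power_w)

    def poll(self, current_loss: float) -> str:
        """Run one check; returns "ok" or "rollback"."""
        if self.is_anomalous(self.read_telemetry()):
            # Caller should restore self.best_state before continuing.
            return "rollback"
        state = self.get_state()
        self.save_state(state, "latest.ckpt")
        if current_loss < self.best_loss:
            # Remember the best healthy snapshot as the fallback.
            self.best_loss, self.best_state = current_loss, state
            self.save_state(state, "best.ckpt")
        return "ok"
```

In a training loop, `poll` would be called whenever at least 5 seconds have passed since the last check (for example by comparing `time.monotonic()` values), and a `"rollback"` result would trigger reloading `best_state`.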

Training Performance Analytics (10 Epochs)

Training time stabilized at 24 minutes and 21 seconds per epoch, at a steady throughput of 8.81 iterations per second.

Epoch 10/10: 100%|████████████| 12869/12869 [24:21<00:00, 8.81it/s, loss=0.00272]

[INFO] Epoch 10 | Train Loss: 0.0017 | Val Loss: 0.0017

[INFO] Saved state: ./checkpoints/unet_brats_epoch_10.pth
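The per-epoch time can be cross-checked from the two numbers in the progress bar. The helper name `epoch_time` is made up for this check, and the 10-epoch total covers training iterations only, since the log does not show how long validation took.

```python
def epoch_time(iterations: int, rate_it_per_s: float) -> tuple[float, str]:
    """Return the epoch duration in seconds and as an m:ss string."""
    seconds = iterations / rate_it_per_s
    minutes, rem = divmod(round(seconds), 60)
    return seconds, f"{minutes}:{rem:02d}"


seconds, clock = epoch_time(12869, 8.81)  # values from the progress bar
print(f"{seconds:.1f} s per epoch = {clock}")
print(f"10 epochs of training = {10 * seconds / 3600:.2f} h")
```

12869 iterations at 8.81 it/s comes to about 1460.7 seconds, which matches the 24:21 shown in the log, so ten epochs take a little over four hours of pure training time.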

The loss convergence curves demonstrated high stability, driven by a Dice Loss objective optimized for severe class imbalance (where the background voxels vastly outnumber the target tumor regions):

Epoch 1: Train Loss: 0.0457 | Val Loss: 0.0038
Epoch 10: Train Loss: 0.0017 | Val Loss: 0.0017
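For reference, a common way to write a soft Dice loss in PyTorch is sketched below. This is a generic binary formulation, not necessarily the exact loss used in the repository; the function name, the `(N, 1, H, W)` layout, and the smoothing term `eps` are assumptions. It shows why the objective suits imbalanced data: true-negative background voxels never appear in the formula, so they cannot dominate the loss.

```python
import torch


def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                   eps: float = 1.0) -> torch.Tensor:
    """Soft Dice loss for binary segmentation.

    logits: (N, 1, H, W) raw scores; target: (N, 1, H, W) with values in {0, 1}.
    """
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)  # reduce over everything except the batch dimension
    intersection = (probs * target).sum(dim=dims)
    denom = probs.sum(dim=dims) + target.sum(dim=dims)
    # eps keeps the ratio defined when both prediction and target are empty.
    dice = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice.mean()
```

A confident, correct prediction drives the loss toward 0, while a fully inverted prediction drives it toward 1.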

The near-identical final training and validation loss values indicate that the model did not overfit within these 10 epochs, which makes it a reasonable candidate for inference on unseen test data.

Ground Truth vs. U-Net Prediction: Pixel-Level Conformance

The visual validation of the inference pipeline highlights the precision achieved within just 10 training epochs.

When comparing the manual, hour-intensive annotations generated by expert radiologists (Ground Truth) against the immediate output of the AI pipeline (U-Net Prediction), the geometric alignment is striking.

The network tracked the complex, irregular boundaries of the tumor core closely. In the slices shown, it preserved sharp edge features and small satellite regions and produced no visible false positives in healthy brain tissue. The per-slice Dice coefficient on the most descriptive slices approaches ~0.95.
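A per-slice Dice coefficient like the one quoted above can be computed from binary masks as in the sketch below. The function name and the `(N, H, W)` layout are assumptions, and the choice to return NaN for slices where both masks are empty is one possible convention; it keeps tumor-free slices from inflating an average with trivial perfect scores.

```python
import torch


def dice_per_slice(pred: torch.Tensor, truth: torch.Tensor) -> torch.Tensor:
    """Dice coefficient for each slice of binary masks shaped (N, H, W).

    Slices where both masks are empty yield NaN so they can be excluded.
    """
    pred = pred.bool()
    truth = truth.bool()
    intersection = (pred & truth).sum(dim=(1, 2)).float()
    total = (pred.sum(dim=(1, 2)) + truth.sum(dim=(1, 2))).float()
    nan = torch.full_like(total, float("nan"))
    return torch.where(total > 0, 2.0 * intersection / total, nan)
```

Given scores `d`, the mean over slices that contain tumor or predicted tumor is `d[~torch.isnan(d)].mean()`. A score on hand-picked slices is not directly comparable to a dataset-wide average.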

Next Steps & Deployment Architecture

The saved model weights occupy only 23 MB. This small footprint removes the need for expensive server-side hardware during deployment. The next phase involves exporting the compute graph to ONNX and running it with ONNX Runtime and OpenVINO to implement real-time, low-latency CPU inference on a standard workstation inside a clinical environment.
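As a rough sanity check on that footprint, the checkpoint size can be converted into an approximate parameter count. This assumes the file holds float32 weights with negligible metadata overhead, which the post does not state; the helper name is made up for this example.

```python
def params_from_size(size_bytes: int, bytes_per_param: int = 4) -> int:
    """Approximate parameter count for a checkpoint of the given size."""
    return size_bytes // bytes_per_param


# "23 Megabytes" could mean decimal MB or binary MiB, so show both.
for label, size in (("23 MB", 23 * 10**6), ("23 MiB", 23 * 2**20)):
    millions = params_from_size(size) / 1e6
    print(f"{label}: ~{millions:.2f} M float32 parameters")
```

Under these assumptions the model has roughly 5.8 to 6.0 million parameters. Storing the same weights as float16 would roughly halve the file size.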

Longer-term plans involve interfacing this lightweight pipeline with real-time fNIRS/EEG data streams and deploying object detection layers for immediate anomaly localization.

The repository is open-source. True systems engineering relies on peer review and transparent codebases.

https://github.com/alexvoste/forgemed-ai

What are your thoughts on optimizing U-Net execution parameters for edge CPU architectures? Let's discuss performance tuning down in the comments.

Alex Voste
