The Lustre Collective: Open-Source Veterans Launch Independent Filesystem Support Company

Tom Smith • posted Feb 5 • 3 min read

Three longtime Lustre developers have formed The Lustre Collective (TLC), an independent company focused on advancing the parallel filesystem that powers some of the world's largest supercomputers and AI systems.

The company launched at Supercomputing in November 2025. The founders include Andreas Dilger, who has led Lustre development since 1999, along with Peter Jones and Colin Faber.

Why Lustre Still Matters

Lustre runs on 8 of the top 10 HPC systems and over 60% of the top 100 systems in 2025. It handles storage for exascale supercomputers like Frontier at Oak Ridge National Laboratory and El Capitan at Lawrence Livermore National Laboratory.

The file system also powers major AI infrastructure. Examples include NVIDIA's EOS AI DGX SuperPOD and xAI's Colossus AI supercomputer.

All major public clouds now offer Lustre as a first-party service. That includes AWS, Azure, Google Cloud, and Oracle.

Technical Capabilities

Current Lustre deployments can deliver serious performance numbers:

10TB/s+ read/write throughput
100M+ IOPS
700PB+ capacity with 100B+ files
Support for 20,000+ clients and 100,000+ GPUs
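
To put those headline figures in perspective, the sketch below divides the aggregate numbers evenly across clients and GPUs. It assumes, purely for illustration, that a single deployment reaches all of the quoted figures at once and that load is spread evenly; neither is claimed by the source, and the helper name `fair_share` is made up for this example.

```python
# Headline figures quoted above, treated as if one deployment hit them all
# at the same time. That is an assumption for illustration only.
AGGREGATE_THROUGHPUT_BYTES_PER_S = 10 * 10**12  # 10 TB/s, decimal units
AGGREGATE_IOPS = 100 * 10**6                    # 100M IOPS
CLIENTS = 20_000
GPUS = 100_000


def fair_share(total: float, consumers: int) -> float:
    """Return the even split of an aggregate resource across consumers."""
    if consumers <= 0:
        raise ValueError("consumers must be positive")
    return total / consumers


per_client_mb_s = fair_share(AGGREGATE_THROUGHPUT_BYTES_PER_S, CLIENTS) / 10**6
per_gpu_mb_s = fair_share(AGGREGATE_THROUGHPUT_BYTES_PER_S, GPUS) / 10**6
per_client_iops = fair_share(AGGREGATE_IOPS, CLIENTS)

if __name__ == "__main__":
    print(f"Per-client throughput: {per_client_mb_s:.0f} MB/s")
    print(f"Per-GPU throughput:    {per_gpu_mb_s:.0f} MB/s")
    print(f"Per-client IOPS:       {per_client_iops:.0f}")
```

Under these assumptions each client gets roughly 500 MB/s and each GPU roughly 100 MB/s, which is why aggregate bandwidth matters so much once GPU counts reach six figures.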

The architecture parallelizes both data and metadata access at scale. The system is POSIX-compatible and works with different storage types, including TLC/QLC NVMe and HDD.
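
One way to picture the parallel data path is file striping, where a file's bytes are spread round-robin across several storage objects so that many servers can serve one file at once. The sketch below is a simplified model of a plain striped layout, not Lustre's actual code; the names `StripeLocation` and `locate` are hypothetical, and more advanced layouts are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLocation:
    stripe_index: int   # which of the file's objects (0 .. stripe_count - 1)
    object_offset: int  # byte offset inside that object


def locate(file_offset: int, stripe_size: int, stripe_count: int) -> StripeLocation:
    """Map a file offset to (object, offset) under round-robin striping."""
    if file_offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe size and count must be > 0")
    chunk = file_offset // stripe_size     # index of the stripe_size-sized chunk
    stripe_index = chunk % stripe_count    # chunks rotate across the objects
    full_rounds = chunk // stripe_count    # complete passes over all objects
    object_offset = full_rounds * stripe_size + file_offset % stripe_size
    return StripeLocation(stripe_index, object_offset)


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Illustrative layout: 1 MiB stripes across 4 objects.
    for offset in (0, MIB, 4 * MIB + 10):
        print(offset, locate(offset, stripe_size=MIB, stripe_count=4))
```

With this layout, consecutive 1 MiB chunks land on different objects, so a large sequential read can pull from all four in parallel.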

Lustre supports direct client access without needing tiering. You can use client NVMe for local caching. The file system can be re-exported via NFS, SMB, or S3 protocols.

Security features include AES256/fscrypt encryption, Kerberos authentication, and subdirectory isolation with nodemap controls.

What's Coming in 2026

Version 2.18 is in progress with several major features:

Erasure-Coded Files - Being developed by DDN, TLC, and Oak Ridge National Laboratory. This reduces storage overhead while maintaining data protection.

Trash Can/Undelete - A joint effort between DDN and TLC that lets users recover accidentally deleted files.

Fault-Tolerant MGS - Improves reliability of the Management Service, a critical component in Lustre deployments.

Client-Side Data Compression - DDN is building this to reduce storage requirements and network bandwidth.

Large Folio Client IO Optimization - HPE's contribution to improve I/O performance on modern Linux kernels.

GPU Peer-to-Peer RDMA - AWS is adding direct GPU-to-GPU data transfers without CPU involvement.
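
The storage savings behind the erasure coding and compression items above come down to simple ratios. The sketch below is a back-of-the-envelope calculation; the 8+2 layout, the 2:1 compression ratio, and the 1,000 TB dataset are assumed values chosen for illustration, not parameters confirmed for Lustre 2.18, and the function names are hypothetical.

```python
def raw_bytes_per_user_byte(data_units: int, parity_units: int) -> float:
    """Raw capacity consumed per byte of user data in a k+p redundancy layout."""
    if data_units <= 0 or parity_units < 0:
        raise ValueError("need data_units > 0 and parity_units >= 0")
    return (data_units + parity_units) / data_units


def compressed_bytes(logical_bytes: float, ratio: float) -> float:
    """Bytes stored (and sent over the network) after compressing at ratio:1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_bytes / ratio


# Hypothetical layouts, chosen only to show the shape of the trade-off.
MIRROR_OVERHEAD = raw_bytes_per_user_byte(1, 1)  # two full copies
EC_8_2_OVERHEAD = raw_bytes_per_user_byte(8, 2)  # 8 data + 2 parity units

if __name__ == "__main__":
    dataset_tb = 1_000  # TB of user data, an illustrative figure
    print(f"Mirrored raw capacity:     {dataset_tb * MIRROR_OVERHEAD:.0f} TB")
    print(f"8+2 EC raw capacity:       {dataset_tb * EC_8_2_OVERHEAD:.0f} TB")
    print(f"Stored at 2:1 compression: {compressed_bytes(dataset_tb, 2.0):.0f} TB")
```

In this example mirroring needs 2,000 TB of raw capacity while the 8+2 layout needs 1,250 TB, and the assumed 2:1 compression halves both the bytes stored and the bytes moved on each read.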

Version 2.15.8 remains the long-term support release and will continue receiving updates through 2026.

Future Development Priorities

TLC is working with partners to identify critical roadmap projects. The company plans to announce an updated long-term roadmap at LUG2026 in April.

Areas being discussed include:

Accelerated recovery mechanisms
Metadata redundancy
Metadata writeback cache

The company will expand its team in 2026 to focus on six strategic areas: improved availability, expanded filesystem resiliency, better serviceability and ease of use, enhanced multi-tenancy, easier Quality of Service management, and modernized tooling and monitoring.

The Business Model

TLC offers tailored subscription services for different customer needs. Services include expert consulting, production support contracts, custom feature development, performance tuning and optimization, training and knowledge transfer, and deployment and migration services.

The company positions itself as an independent, vendor-neutral partner. This matters in an open-source ecosystem where multiple companies contribute to the codebase.

The Lustre 2.17 commits chart shows contributions from Oracle, DDN, AWS, AEON, Oak Ridge, Whamcloud, HPE, NVIDIA, Google, Microsoft, and others. TLC aims to work across this entire community rather than representing any single vendor's interests.

Why This Matters

Lustre has been around for 25+ years. That longevity comes with real-world testing and feedback that newer filesystems lack.

As Steve Crusan and Brock Johnson from Hudson River Trading noted in a recent presentation, "Lustre endures—25+ years of real-world testing, feedback, and expertise. Most competitors haven't been battle-tested at scale."

The formation of TLC gives organizations running Lustre deployments an independent option for support and development. For developers working on HPC or AI infrastructure, it means continued investment in a proven technology that scales to exascale workloads.

2 Comments

Marco Marelli • Feb 5 (2026-02-06T03:43:59+0000)

Didn't realize Lustre was still behind so many top HPC systems. Interesting note, Tom. Curious whether the client-side compression will noticeably cut training storage costs for AI clusters.

Tom Smith • Feb 6

@Marco Marelli Thanks! Yeah, Lustre's market share in HPC is stronger than most people realize. Eight of the top 10 systems is solid dominance.

On compression: it should help with AI training costs. The benefit is twofold - you're storing less data (obvious win), but you're also reducing network bandwidth since training involves reading datasets repeatedly. DDN is building this for the 2.18 release. Most implementations I've seen deliver 2-3x compression ratios on training data, though it varies by data type.

The real issue is keeping GPUs fed with data. When 100,000 GPUs are waiting on storage, the compute costs dwarf the storage savings. But every bit helps when you're operating at that scale.
