Three longtime Lustre developers have formed The Lustre Collective (TLC), an independent company focused on advancing the parallel filesystem that powers some of the world's largest supercomputers and AI systems.
The company launched at the SC25 supercomputing conference in November 2025. The founders are Andreas Dilger, who has led Lustre development since 1999, along with Peter Jones and Colin Faber.
Why Lustre Still Matters
Lustre runs on 8 of the top 10 HPC systems and over 60% of the top 100 systems in 2025. It handles storage for exascale supercomputers like Frontier at Oak Ridge National Laboratory and El Capitan at Lawrence Livermore National Laboratory.
The file system also powers major AI infrastructure. Examples include NVIDIA's EOS AI DGX SuperPOD and xAI's Colossus AI supercomputer.
All major public clouds now offer Lustre as a first-party service, including AWS, Azure, Google Cloud, and Oracle.
Technical Capabilities
Current Lustre deployments can deliver serious performance numbers:
- 10TB/s+ read/write throughput
- 100M+ IOPS
- 700PB+ capacity with 100B+ files
- Support for 20,000+ clients and 100,000+ GPUs
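To put the aggregate figures in perspective, a quick back-of-envelope calculation (purely illustrative, using the numbers from the list above) shows what they imply per client:

```shell
# Illustrative arithmetic only: divide the aggregate throughput claim
# by the client count claim to get an even per-client share.
TOTAL_TB_PER_SEC=10      # 10 TB/s aggregate read/write
CLIENTS=20000            # 20,000 clients

# 10 TB/s = 10,000,000 MB/s, split evenly across 20,000 clients.
PER_CLIENT_MB_PER_SEC=$(( TOTAL_TB_PER_SEC * 1000000 / CLIENTS ))
echo "${PER_CLIENT_MB_PER_SEC} MB/s per client"   # prints "500 MB/s per client"
```

Real workloads are rarely this even, but 500 MB/s of sustained filesystem bandwidth per client across 20,000 clients gives a sense of the scale involved.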
The architecture parallelizes both data and metadata at scale: files are striped across many object storage targets (OSTs), and the namespace can be distributed across multiple metadata targets (MDTs). The file system is POSIX compatible and works with different storage media, including TLC/QLC NVMe and HDD.
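As a sketch of what this parallelism looks like from the client side, Lustre's `lfs setstripe` utility controls how files are striped across OSTs. The mount point and stripe parameters below are arbitrary examples, not recommendations:

```shell
# Stripe new files in this directory across 8 OSTs in 4 MiB chunks.
# /mnt/lustre is an example mount point; tune -c/-S to the workload.
lfs setstripe -c 8 -S 4M /mnt/lustre/scratch/bigjob

# A stripe count of -1 spreads a file across every available OST,
# maximizing parallel bandwidth for a single large file.
lfs setstripe -c -1 /mnt/lustre/scratch/checkpoint.dat

# Inspect the resulting layout.
lfs getstripe /mnt/lustre/scratch/checkpoint.dat
```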
Clients access storage directly, with no tiering layer required, and can use local NVMe for client-side caching. The file system can also be re-exported via NFS, SMB, or S3.
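A minimal sketch of direct client access plus an NFS re-export might look like the following. The MGS address `mgs@tcp0`, the filesystem name `lfs01`, and the subnet are placeholders, and the commands assume a running Lustre cluster and NFS server, so treat this as a deployment outline rather than a recipe:

```shell
# Mount the Lustre filesystem directly on a client; the client talks to
# the OSTs and MDTs itself, with no intermediate tiering layer.
mount -t lustre mgs@tcp0:/lfs01 /mnt/lustre

# Re-export the mount over NFS for hosts without a Lustre client:
# add an export entry, then reload the NFS export table.
echo "/mnt/lustre 10.0.0.0/24(rw,no_root_squash)" >> /etc/exports
exportfs -ra
```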
Security features include AES256/fscrypt encryption, Kerberos authentication, and subdirectory isolation with nodemap controls.
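As an illustration of the subdirectory isolation mentioned above, Lustre's nodemap feature can pin a group of clients to a subdirectory fileset. The tenant name, address range, and fileset path below are made-up examples:

```shell
# Create a nodemap for one tenant and assign it a client address range.
lctl nodemap_add tenant1
lctl nodemap_add_range --name tenant1 --range 10.0.1.[1-254]@tcp

# Restrict those clients to the /tenant1 subdirectory and drop their
# administrative privileges.
lctl set_param -P nodemap.tenant1.fileset=/tenant1
lctl nodemap_modify --name tenant1 --property admin --value 0

# Activate nodemap enforcement filesystem-wide.
lctl nodemap_activate 1
```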
What's Coming in 2026
Version 2.18 is in progress with several major features:
- Erasure-coded files: Being developed by DDN, TLC, and Oak Ridge National Laboratory, this reduces storage overhead while maintaining data protection.
- Trash can/undelete: A joint effort between DDN and TLC that lets users recover accidentally deleted files.
- Fault-tolerant MGS: Improves reliability of the Management Server (MGS), a critical component in Lustre deployments.
- Client-side data compression: DDN is building this to reduce storage requirements and network bandwidth.
- Large folio client I/O optimization: HPE's contribution to improve I/O performance on modern Linux kernels.
- GPU peer-to-peer RDMA: AWS is adding direct GPU-to-GPU data transfers without CPU involvement.
Version 2.15.8 remains the long-term support release and will continue receiving updates through 2026.
Future Development Priorities
TLC is working with partners to identify critical roadmap projects. The company plans to announce an updated long-term roadmap at the Lustre User Group conference (LUG 2026) in April.
Areas being discussed include:
- Accelerated recovery mechanisms
- Metadata redundancy
- Metadata writeback cache
The company will expand its team in 2026 to focus on six strategic areas:
- Improved availability
- Expanded filesystem resiliency
- Better serviceability and ease of use
- Enhanced multi-tenancy
- Easier Quality of Service management
- Modernized tooling and monitoring
The Business Model
TLC offers tailored subscription services for different customer needs, including expert consulting, production support contracts, custom feature development, performance tuning and optimization, training and knowledge transfer, and deployment and migration services.
The company positions itself as an independent, vendor-neutral partner. This matters in an open-source ecosystem where multiple companies contribute to the codebase.
The Lustre 2.17 commits chart shows contributions from Oracle, DDN, AWS, AEON, Oak Ridge, Whamcloud, HPE, NVIDIA, Google, Microsoft, and others. TLC aims to work across this entire community rather than representing any single vendor's interests.
Why This Matters
Lustre has been around for 25+ years. That longevity comes with real-world testing and feedback that newer filesystems lack.
As Steve Crusan and Brock Johnson from Hudson River Trading noted in a recent presentation, "Lustre endures—25+ years of real-world testing, feedback, and expertise. Most competitors haven't been battle-tested at scale."
The formation of TLC gives organizations running Lustre deployments an independent option for support and development. For developers working on HPC or AI infrastructure, it means continued investment in a proven technology that scales to exascale workloads.