The Lustre Collective: Open-Source Veterans Launch Independent Filesystem Support Company

Three longtime Lustre developers have formed The Lustre Collective, an independent company focused on advancing the parallel filesystem that powers some of the world's largest supercomputers and AI systems.

The company launched at the SC25 supercomputing conference in November 2025. The founders are Andreas Dilger, who has led Lustre development since 1999, Peter Jones, and Colin Faber.

Why Lustre Still Matters

Lustre runs on 8 of the top 10 HPC systems and over 60% of the top 100 systems in 2025. It handles storage for exascale supercomputers like Frontier at Oak Ridge National Laboratory and El Capitan at Lawrence Livermore National Laboratory.

The file system also powers major AI infrastructure. Examples include NVIDIA's EOS AI DGX SuperPOD and xAI's Colossus AI supercomputer.

All major public clouds now offer Lustre as a first-party service. That includes AWS, Azure, Google Cloud, and Oracle.

Technical Capabilities

The figures cited for current Lustre deployments:

  • 10TB/s+ read/write throughput
  • 100M+ IOPS
  • 700PB+ capacity with 100B+ files
  • Support for 20,000+ clients and 100,000+ GPUs

The architecture handles both data and metadata in parallel at scale. The file system is POSIX-compatible and works with different storage types, including TLC and QLC NVMe flash and HDD.
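
To make "parallel data" concrete: by default, Lustre stripes a file RAID-0 style across several storage objects, so different parts of one file can be served by different servers at the same time. The sketch below is illustrative arithmetic only, not Lustre code. The function name, the 1 MiB stripe size, and the stripe count of 4 are assumptions chosen for the example.

```python
MIB = 1024 * 1024


def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a byte offset in a file to (stripe index, offset inside that stripe's object).

    Hypothetical helper: models plain round-robin (RAID-0 style) striping.
    """
    chunk = offset // stripe_size          # which stripe-sized chunk of the file
    stripe_index = chunk % stripe_count    # chunks rotate round-robin over the objects
    # Each object holds every stripe_count-th chunk, packed back to back.
    object_offset = (chunk // stripe_count) * stripe_size + offset % stripe_size
    return stripe_index, object_offset


# Assumed layout for illustration: 1 MiB stripes over 4 objects.
print(locate(0, MIB, 4))             # (0, 0)
print(locate(1 * MIB, MIB, 4))       # (1, 0)
print(locate(5 * MIB + 10, MIB, 4))  # (1, 1048586)
```

With this layout, four clients reading consecutive 1 MiB chunks each touch a different object, which is where the aggregate throughput comes from.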

Lustre supports direct client access without needing tiering. You can use client NVMe for local caching. The file system can be re-exported via NFS, SMB, or S3 protocols.

Security features include AES256/fscrypt encryption, Kerberos authentication, and subdirectory isolation with nodemap controls.

What's Coming in 2026

Version 2.18 is in progress with several major features:

Erasure-Coded Files - Being developed by DDN, The Lustre Collective (TLC), and Oak Ridge National Laboratory. Erasure coding reduces storage overhead compared with full replication while maintaining data protection.

Trash Can/Undelete - A joint effort between DDN and TLC that lets users recover accidentally deleted files.

Fault-Tolerant MGS - Improves reliability of the Management Service, a critical component in Lustre deployments.

Client-Side Data Compression - DDN is building this to reduce storage requirements and network bandwidth.

Large Folio Client IO Optimization - HPE's contribution to improve I/O performance on modern Linux kernels.

GPU Peer-to-Peer RDMA - AWS is adding direct GPU-to-GPU data transfers without CPU involvement.
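
The storage-overhead argument for erasure-coded files comes down to simple arithmetic. The sketch below compares raw capacity consumed per usable byte for a generic k+m erasure code against two-way mirroring. The 8+2 geometry is an assumption picked for illustration; the article does not say which layouts Lustre 2.18 will support, and the function name is hypothetical.

```python
def raw_bytes_per_usable_byte(data_units: int, parity_units: int) -> float:
    """Raw capacity consumed for each byte of user data in a k+m scheme.

    Hypothetical helper: a stripe of `data_units` data blocks is protected by
    `parity_units` redundancy blocks and survives the loss of up to
    `parity_units` blocks.
    """
    return (data_units + parity_units) / data_units


# Two-way mirroring is the 1+1 case: one copy of the data plus one full replica.
mirror = raw_bytes_per_usable_byte(1, 1)
# Assumed example geometry: 8 data blocks + 2 parity blocks.
ec_8_2 = raw_bytes_per_usable_byte(8, 2)

print(f"mirroring: {mirror:.2f}x raw per usable byte")  # 2.00x
print(f"8+2 EC:    {ec_8_2:.2f}x raw per usable byte")  # 1.25x
```

Under these assumptions the 8+2 layout tolerates two lost blocks per stripe while consuming 1.25 bytes of raw storage per usable byte, versus 2 bytes for a mirror that tolerates one lost copy.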

Version 2.15.8 remains the long-term support release and will continue receiving updates through 2026.

Future Development Priorities

TLC is working with partners to identify critical roadmap projects. The company plans to announce an updated long-term roadmap at LUG 2026, the Lustre User Group meeting, in April.

Areas being discussed include:

  • Accelerated recovery mechanisms
  • Metadata redundancy
  • Metadata writeback cache

The company will expand its team in 2026 to focus on six strategic areas: improved availability, expanded filesystem resiliency, better serviceability and ease of use, enhanced multi-tenancy, easier Quality of Service management, and modernized tooling and monitoring.

The Business Model

TLC offers tailored subscription services for different customer needs. Services include expert consulting, production support contracts, custom feature development, performance tuning and optimization, training and knowledge transfer, and deployment and migration services.

The company positions itself as an independent, vendor-neutral partner. This matters in an open-source ecosystem where multiple companies contribute to the codebase.

The commit breakdown for Lustre 2.17 shows contributions from Oracle, DDN, AWS, AEON, Oak Ridge, Whamcloud, HPE, NVIDIA, Google, Microsoft, and others. TLC aims to work across this entire community rather than representing any single vendor's interests.

Why This Matters

Lustre has been around for 25+ years. That longevity comes with real-world testing and feedback that newer filesystems lack.

As Steve Crusan and Brock Johnson from Hudson River Trading noted in a recent presentation, "Lustre endures—25+ years of real-world testing, feedback, and expertise. Most competitors haven't been battle-tested at scale."

The formation of TLC gives organizations running Lustre deployments an independent option for support and development. For developers working on HPC or AI infrastructure, it means continued investment in a proven technology that scales to exascale workloads.
