Testing AI configs one at a time is costing you weeks—RapidFire runs them all at once.

RapidFire.ai Speeds Up AI Experimentation So You Can Actually Ship

Most AI projects never make it to production. An MIT study found that 95% get stuck in the prototype phase. The problem isn't a lack of ideas or use cases. It's that teams can't get the results they need fast enough.

Here's what typically happens: You build a RAG pipeline. You test one chunking strategy. Wait. Check the results. Try a different embedding model. Wait again. Test a new prompt template. More waiting. This sequential process drags on for weeks while you burn through API tokens or tie up GPUs.

RapidFire.ai, launching November 4 at Ray Summit, takes a different approach. Instead of testing configurations one at a time, it runs them all in parallel.

The Real Problem with RAG Experimentation

"RAG is not a silver bullet," says Arun Kumar, co-founder and CTO of RapidFire.ai and a computer science professor at UC San Diego. "There are a lot of knobs to tune, and people have underappreciated that."

Those knobs include data chunking strategies, embedding models, retrieval schemes, rerankers, generator models, and prompt templates. Each combination produces different results. Finding the right combination traditionally means testing each one sequentially.
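To make the knob count concrete, here's a back-of-the-envelope sketch in plain Python. The option lists are made up for illustration, not RapidFire defaults:

```python
from itertools import product

# Hypothetical option lists -- the real knobs and values depend on your pipeline.
chunk_sizes = [256, 512, 1024]
embedding_models = ["bge-small-en", "text-embedding-3-small"]
rerankers = [None, "cross-encoder"]
prompt_templates = ["concise", "detailed"]

configs = list(product(chunk_sizes, embedding_models, rerankers, prompt_templates))
print(len(configs))  # 3 * 2 * 2 * 2 = 24 combinations
```

Two dozen combinations from just four knobs, before you even vary the retrieval scheme or the generator model.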

This creates three problems. First, it's slow: finding a good configuration takes days or weeks. Second, it's expensive: you're either burning GPU cycles on self-hosted models or hitting rate limits on API-based models like OpenAI's. Third, you often don't know when to stop experimenting, or even where to start.

"Where do I start and when do I stop?" asks Jack Norris, CEO and co-founder. "This levels the playing field so you don't have to have great intuition to understand where to start or when to stop. You can see the results."

How RapidFire Works

RapidFire uses what they call "hyperparallelized experimentation." Multiple configurations run concurrently on shared data. The system aggregates evaluation metrics in real time and provides valid confidence intervals for RAG experiments. For fine-tuning, learning curves appear together so you can compare approaches as they train.

The dynamic control feature stops underperforming configurations and reallocates those resources to better performers. If you're using API-based models, it automatically balances token spend across configurations. For self-hosted models, it maximizes GPU utilization even on a single GPU.
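The article doesn't spell out RapidFire's scheduler, but the underlying idea resembles successive halving: score every configuration on a slice of data, then stop funding the laggards. A rough, generic sketch (not RapidFire's actual code) of that control loop:

```python
import random

def evaluate(config, eval_batch):
    """Stand-in scorer: in practice this runs the RAG pipeline on the batch
    and computes your evaluation metric (e.g., answer accuracy)."""
    return random.random()

def run_with_pruning(configs, eval_batches, keep_fraction=0.5):
    """Evaluate all configs on successive slices of data, dropping the
    weakest performers each round so resources flow to the leaders."""
    scores = {c: [] for c in configs}
    survivors = list(configs)
    for batch in eval_batches:
        for config in survivors:  # in a real system these run concurrently
            scores[config].append(evaluate(config, batch))
        ranked = sorted(survivors,
                        key=lambda c: sum(scores[c]) / len(scores[c]),
                        reverse=True)
        survivors = ranked[:max(1, int(len(ranked) * keep_fraction))]
    return survivors, scores
```

The point isn't this particular pruning rule; it's that the decision to kill or keep a configuration happens automatically, mid-run, instead of after a full sequential pass.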

"People think you can just keep sending stuff to OpenAI," Kumar explains. "They don't realize there are rate limits on token spend. Even if you're willing to pay, you're limited on how many tokens you can process per minute. Our adaptive execution engine balances that automatically."

The result: Kumar and Norris claim 20x improvements in speed and cost compared to traditional sequential testing.

Who Should Use This

RapidFire isn't for everyone. If you're building a basic chatbot or you haven't figured out your evaluation metrics yet, this is probably overkill.

"Companies that are ready for this have understood what the eval metrics are for their AI use case," Kumar says. "They know how to write an LLM-as-judge implementation or evaluation functions to measure how good their pipeline is."

The sweet spot for customers includes financial services, healthcare, retail, and telecom companies—organizations with substantial textual data and clear use cases. AI startups where product accuracy is critical are also good candidates.

"We have customers with use cases that are only internal because the accuracy isn't good enough for external applications," Norris explains. "If they can change the chunking or rerankers to get better results, they can move into production."

What Surprised the Founders

Kumar and Norris have been working on deep learning optimization systems for years. But they were still surprised by how quickly LLM adoption happened—and how creative teams have become with evaluation metrics.

"We've heard everything from custom code with heuristics to knowledge-based approaches to using small language models inside the eval function," Kumar says. "It's been fascinating to see how creative people have become."

That creativity in evaluation has enabled teams to ship products with "good enough" metrics, then improve based on real customer data. But experimentation remains the bottleneck.

Built on Open Source

RapidFire launches as an open-source package built on Ray, the distributed computing framework. It integrates with tools you're already using: PyTorch, TensorFlow, Hugging Face, Keras, and Jupyter notebooks. It works with major cloud providers (AWS, Google Cloud, Azure) and NVIDIA infrastructure, as well as on-premises GPU clusters.
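RapidFire's own API isn't shown in this article, but Ray itself makes the parallel-experiments pattern easy to picture. A generic Ray sketch (not RapidFire code; `evaluate_config` is a placeholder):

```python
import ray

ray.init()  # connects to an existing cluster or starts a local one

@ray.remote
def evaluate_config(config):
    # Placeholder: build the pipeline described by `config`, run it over
    # the eval set, and return its metrics.
    return {"config": config, "score": 0.0}

configs = [{"chunk_size": 256}, {"chunk_size": 512}, {"chunk_size": 1024}]
futures = [evaluate_config.remote(c) for c in configs]  # all three run concurrently
results = ray.get(futures)
```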

The company was founded by Andrew Ng's AI Fund, pairing Norris, whose background is in big data and analytics, with Kumar, who has spent 15 years at UCSD building systems for machine learning and deep learning.

"This gap between generalist LLMs and proprietary data is the least developed part of the stack," Norris says. "That's where we're focused."

If you're stuck testing RAG configurations one at a time, RapidFire might help you ship faster. The open-source package launches November 4, 2025 at 9am ET.
