Beyond Elasticsearch: The Rise of Unified Vector and Keyword Search


This post was written by James Luan, VP of Engineering at Zilliz and a key maintainer of the Milvus open-source project.

The Evolution of Modern Search Architecture

You're debugging a critical production issue, and you know the solution exists in your documentation. Your vector search returns semantically relevant results about error handling and system recovery but somehow misses that one crucial Stack Overflow answer you saved last month with the exact error code. Sound familiar?

This common frustration highlights a fundamental challenge in modern search technology: the need to balance semantic understanding with precise term matching. The rise of AI and large language models has brought powerful semantic search capabilities through vector embeddings while revealing the continued importance of traditional keyword matching. When supporting customers, for instance, vector search brilliantly handles natural language queries like "my payment isn't going through" by understanding user intent. Still, keyword search remains essential for matching specific product codes and error messages.

Recent experiences with RAG implementations across the industry have revealed interesting insights about search technology. While dense vector embeddings excel at capturing semantic relationships and contextual understanding, traditional BM25 keyword matching still proves superior in certain scenarios, particularly with domain-specific queries and rare terms. This complementary strength of different search approaches has pushed the community toward exploring unified solutions.

However, implementing hybrid search at scale presents its own set of challenges. While frameworks like LangChain or LlamaIndex have made proof-of-concept implementations straightforward, scaling these solutions to production reveals significant architectural complexities. The conventional approach requires maintaining separate vector databases and search engines, resulting in the following:

  • Data redundancy across multiple storage systems
  • Complex consistency management between separate indices
  • Fragmented security controls and access patterns
  • Increased latency due to cross-system synchronization
  • Higher operational complexity in high-throughput environments

An example of the complex architecture a hybrid search deployment often requires
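
To make that operational overhead concrete, here is a minimal sketch of what the conventional two-system approach looks like at query time: fan the query out to both engines, then fuse the two ranked lists client-side with reciprocal rank fusion. The vector_db_search, keyword_search, and embed callables are hypothetical stand-ins, not any particular product's API.

```python
# Client-side fusion across two separate systems (illustrative sketch).

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of doc IDs with reciprocal rank fusion (RRF)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query, embed, vector_db_search, keyword_search, top_k=10):
    """Query both systems, then fuse. Each call is a separate network hop,
    and both indices must already agree on which documents exist."""
    dense_hits = vector_db_search(embed(query), limit=top_k * 2)  # semantic
    keyword_hits = keyword_search(query, limit=top_k * 2)         # lexical
    return reciprocal_rank_fusion([dense_hits, keyword_hits])[:top_k]
```

Every line of this glue code is a place for the two systems to drift apart: inserts, deletions, and permission changes all have to land twice, and the extra hop shows up directly in query latency.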

Elasticsearch's Technical Limitations

As the long-standing leader in enterprise search, Elasticsearch has powered everything from e-commerce platforms to logging systems for over a decade. Its dominance in keyword-based search and its extensive enterprise feature set made it an obvious candidate for bridging the gap between traditional and AI-powered search. Recognizing this opportunity, Elasticsearch introduced vector search capabilities in version 8.0, aiming to unify traditional keyword search with modern vector operations under a single platform.

However, despite its market leadership and ambitious steps toward vector search capabilities, Elasticsearch faces several fundamental constraints that impact its effectiveness in modern search architectures:

Performance and Resource Management

The tight coupling between write operations, index construction, and query processing creates substantial overhead during data ingestion and indexing. This architectural decision leads to CPU and I/O contention, particularly during bulk updates. The "near real-time" search design introduces latency that can impair AI applications requiring rapid interactions or dynamic decision-making.
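
The near-real-time behavior is easy to observe with the Elasticsearch Python client: a freshly indexed document is invisible to search until the next refresh, and forcing refreshes to close that gap makes ingestion more expensive. A rough sketch, assuming a local 8.x cluster:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.index(index="support-docs", id="1",
         document={"text": "checkout fails with error E4021"})

# By default, new documents only become searchable after the periodic
# refresh (index.refresh_interval, 1s by default).
resp = es.search(index="support-docs", query={"match": {"text": "E4021"}})
print(resp["hits"]["total"]["value"])  # may still be 0 here

es.indices.refresh(index="support-docs")  # force visibility, at a cost
resp = es.search(index="support-docs", query={"match": {"text": "E4021"}})
print(resp["hits"]["total"]["value"])  # 1
```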

Architectural Constraints

The JVM-based architecture introduces additional complexity through memory management overhead and garbage collection pauses. Organizations often find themselves over-provisioning resources to maintain stable performance as data volumes grow. The non-cloud-native design, with tightly coupled storage and compute, requires simultaneous scaling of both resources, reducing deployment flexibility.

Scaling Limitations

The shard management system presents a fundamental dilemma: too many shards on a small dataset degrade performance, while too few shards on a large dataset limit scalability and cause data distribution imbalances. In multi-replica scenarios, the requirement to build each shard's index independently multiplies compute costs and reduces resource efficiency.
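
Part of what makes the dilemma sticky is that the primary shard count is fixed at index creation; changing it later means a split, shrink, or full reindex. A small sketch of the up-front commitment, again with the Elasticsearch Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Primary shard count cannot be changed in place after creation, so it
# has to be guessed from projected data volume; every replica then
# builds its own copy of each shard's index.
es.indices.create(
    index="products",
    settings={"number_of_shards": 12, "number_of_replicas": 2},
)
```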

Vector Search Performance

The vector search implementation, built on Lucene's core, shows notable performance gaps compared to specialized engines, particularly in high-dimensional spaces. Internal benchmarks indicate performance degradation becomes acute beyond 128 dimensions, with query latency increasing exponentially in high-concurrency scenarios.

Technical Innovation: A New Approach to Unified Search

Recent advances in search technology have emerged from extensive experience with real-world search challenges across thousands of production deployments. Through deep engagement with the open-source community, a consistent pattern has surfaced: while vector search excels at semantic matching, organizations often need to maintain a separate system for traditional keyword search, leading to architectural complexity and operational overhead.

This insight led to a fundamental reimagining of how unified search could work in the open-source vector database Milvus. The result is Sparse-BM25, introduced in Milvus 2.5: an approach that expresses keyword relevance through sparse vector operations while preserving the precision of traditional keyword matching. It builds on Milvus's strengths in vector search while addressing the broader community's need for unified search capabilities.
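
For a feel of what this looks like in practice, here is a condensed sketch of wiring up full-text search over Sparse-BM25 through pymilvus. It follows the Milvus 2.5 documentation from memory, so treat the exact signatures as approximate and check the current docs:

```python
from pymilvus import MilvusClient, DataType, Function, FunctionType

client = MilvusClient(uri="http://localhost:19530")

# A raw text field plus a sparse vector field that Milvus populates by
# running its built-in BM25 function over the analyzed text.
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=True)
schema.add_field("text", DataType.VARCHAR, max_length=65535,
                 enable_analyzer=True)
schema.add_field("sparse", DataType.SPARSE_FLOAT_VECTOR)
schema.add_function(Function(name="text_bm25",
                             function_type=FunctionType.BM25,
                             input_field_names=["text"],
                             output_field_names=["sparse"]))

index_params = client.prepare_index_params()
index_params.add_index(field_name="sparse",
                       index_type="SPARSE_INVERTED_INDEX",
                       metric_type="BM25")
client.create_collection("docs", schema=schema, index_params=index_params)

client.insert("docs", [{"text": "payment fails with error E4021 at checkout"}])

# Queries are raw text as well; the sparse query vector is built server-side.
hits = client.search(collection_name="docs", data=["E4021 checkout error"],
                     anns_field="sparse", limit=3, output_fields=["text"])
```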

Core Technical Architecture

The implementation introduces several key technical innovations:

  • Advanced text processing pipeline powered by Tantivy, delivering
    sophisticated stemming, lemmatization, and stop word filtering
    capabilities with support for over 30 languages
  • Distributed term frequency management system that efficiently handles
    corpus-wide statistics at scale, with automatic rebalancing and
    optimization
  • Novel sparse vector generation utilizing Corpus TF for documents and
    Query TF with global IDF for queries, optimized for high-cardinality
    vocabularies (see the scoring sketch after this list)
  • Custom BM25 distance function optimized for high-dimensional sparse
    vector spaces, with configurable saturation parameters

The system implements a sophisticated indexing hierarchy that adapts to query patterns. The WAND-based inverted index provides efficient retrieval for standard queries, while Block-Max WAND support, currently under development, promises further optimization for long-tail queries. Through careful pruning of long-tail terms and vector quantization, the system achieves more than a 5x performance improvement while keeping recall degradation under 1% and cutting memory usage by over 50%.

The graph index support extends beyond basic HNSW implementation, incorporating dynamic graph pruning and rebuilding based on query patterns. This approach has shown particular effectiveness in handling high-dimensional sparse vectors, where traditional inverted indices struggle with the curse of dimensionality. The system automatically maintains optimal graph connectivity while minimizing index maintenance overhead.

Technical Advances in Modern Search Systems

Algorithm Flexibility

Modern search systems transform similarity calculations into vector distance computations, enabling more complex queries and corpus distance analysis. Recent research like "End-to-End Query Term Weighting" has introduced innovations such as the Term Weighting BERT (TW-BERT) algorithm, which uses BERT to infer n-gram term weights from queries and construct query representations. When combined with BM25 for calculating document relevance, these approaches demonstrate significant improvements over traditional token-based methods in both in-domain and out-of-domain scenarios.
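
In spirit, the scoring side of such methods is a small change to the classical pipeline: a learned model supplies the query-term weights in place of raw query term frequencies, and the document side stays the same. A hypothetical sketch (the weights here are hard-coded; a TW-BERT-style model would produce them):

```python
def weighted_lexical_score(learned_weights, doc_vec):
    """Score a document using model-supplied query-term weights instead of
    raw query TF; doc_vec maps terms to their BM25 contributions."""
    return sum(w * doc_vec.get(term, 0.0)
               for term, w in learned_weights.items())

# Weights a term-weighting model might assign for the query
# "payment not going through" (illustrative values only):
learned_weights = {"payment": 1.8, "not": 0.1, "going": 0.2, "through": 0.3}
```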

Efficiency Considerations

Advanced search architectures implement lexical search through sparse vectors, combining traditional inverted index compression techniques with lossy compression for dense embeddings. Through careful pruning of long-tail terms and vector quantization, these systems can achieve substantial performance improvements while maintaining high recall rates and reducing resource usage. Ongoing research continues to optimize data compression to further reduce storage costs and query I/O overhead.
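
The pruning mentioned above can be as simple as keeping only each sparse vector's highest-weight dimensions before indexing; because BM25 mass concentrates in a few informative terms, the recall cost of dropping the tail tends to be small. A toy sketch of magnitude-based pruning:

```python
def prune_sparse(vec, keep_ratio=0.2):
    """Keep only the highest-weight fraction of a sparse vector's terms,
    discarding the long tail that contributes little to the final score."""
    keep = max(1, int(len(vec) * keep_ratio))
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    return dict(top)
```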

Handling Complex Queries

While traditional search engines commonly use WAND (Weak AND) technology to optimize inverted index queries, its effectiveness can diminish with long queries due to excessive inverted list intersections and reduced pruning efficiency. Modern approaches address this by combining sparse embeddings with graph indices like HNSW, showing particular promise for sparse vectors with high dimensionality.
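
The pruning idea behind WAND can be illustrated with a toy document-at-a-time scorer: before fully scoring a candidate, compare an optimistic upper bound against the current top-k threshold and skip documents that provably cannot qualify. This is a deliberate simplification; real WAND walks sorted posting cursors with pivot selection rather than doing set lookups.

```python
import heapq

def exact_score(doc, terms, postings):
    return sum(postings[t].get(doc, 0.0) for t in terms if t in postings)

def topk_with_pruning(terms, postings, upper_bound, k=10):
    """postings: term -> {doc_id: weight}; upper_bound: term -> max weight
    in that term's posting list."""
    heap = []  # min-heap of (score, doc); heap[0][0] is the entry threshold
    candidates = {d for t in terms if t in postings for d in postings[t]}
    for doc in candidates:
        # Best case: every matching term contributes its maximum weight.
        optimistic = sum(upper_bound[t] for t in terms
                         if t in postings and doc in postings[t])
        if len(heap) == k and optimistic <= heap[0][0]:
            continue  # provably cannot enter the top-k: skip full scoring
        s = exact_score(doc, terms, postings)
        if len(heap) < k:
            heapq.heappush(heap, (s, doc))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc))
    return sorted(heap, reverse=True)
```

The long-query weakness is visible here: with many query terms, the optimistic bound almost always clears the threshold, the skip branch rarely fires, and every candidate ends up fully scored, which is exactly where graph-based indices over sparse embeddings can help.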

These technical advances in unified vector and keyword search are particularly significant for RAG applications, where queries often combine natural language questions with specific technical terms or metadata filters. Efficiently processing these mixed queries while maintaining high performance is crucial for production deployments.
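
Inside a unified engine, that mixed query becomes a single call. A sketch against pymilvus's hybrid search API, assuming a collection that has both a dense embedding field (here `dense`) and the sparse BM25 field from the earlier sketch; `embed` is a stand-in for a real embedding model, and signatures should be checked against current docs:

```python
from pymilvus import MilvusClient, AnnSearchRequest, RRFRanker

client = MilvusClient(uri="http://localhost:19530")

def embed(text):
    """Stand-in for a real embedding model (returns a dummy vector here)."""
    return [0.0] * 768

dense_req = AnnSearchRequest(
    data=[embed("my payment isn't going through")],  # semantic half
    anns_field="dense", param={"metric_type": "IP"}, limit=20)
sparse_req = AnnSearchRequest(
    data=["error E4021"],                            # exact-term half
    anns_field="sparse", param={"metric_type": "BM25"}, limit=20)

# Both halves run inside one engine over one copy of the data and are
# fused server-side, with no cross-system synchronization.
hits = client.hybrid_search(collection_name="docs",
                            reqs=[dense_req, sparse_req],
                            ranker=RRFRanker(), limit=5,
                            output_fields=["text"])
```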

Looking Forward: The Future of Search

Remember that frustrated developer from our opening scenario? Today's reality is that teams shouldn't have to choose between semantic understanding and precise matching, performance and flexibility, or simplicity and scale. As our industry continues to push the boundaries of AI and search technology, we're moving toward a future where search infrastructure becomes truly intelligent - understanding not just what users are searching for but why they're searching for it.

The next frontier isn't just about combining vector and keyword search - it's about creating systems that can dynamically adapt to different types of queries, learn from user interactions, and seamlessly integrate with the growing ecosystem of AI tools and frameworks. As we continue to build and experiment with these technologies in Milvus, we're not just solving technical challenges - we're reshaping how humans interact with and make sense of the vast amounts of information at our disposal.

This is an incredibly exciting time for the developers and engineers working on these challenges every day. The tools and capabilities we're building today will form the foundation of how future generations will access and understand information. And that's worth getting right.
