DevLog 20250820: Towards Unified Chat Gateway
Overview
Ever since the general availability of ChatGPT, the problem of building a custom chatbot seemed - at least for a while - like it should have been greatly simplified.
Then came the widespread use of RAG, and now we've entered the era of MCP. A marvelous journey indeed.
But let's return to the chatbot problem.
Traditional Chatbot Development
The first chatbot I built had 5,000 lines of code - most of them simple if-else statements. The last one I built had basic English grammar built-in and could even converse with other bots.
Traditionally, chatbots were built around a few core methods:
- Rule-Based Systems – Early bots relied on scripted "if-then" rules or pattern matching (like ELIZA). They could handle predictable inputs but broke down when faced with anything unexpected.
- Retrieval-Based Bots – Instead of generating new text, these selected the best response from a predefined set using keyword search or similarity measures. They worked well for FAQs but lacked flexibility.
- Generative Models (Pre-LLM) – With Seq2Seq models (RNNs, LSTMs), chatbots began producing novel responses. While more dynamic, these struggled with coherence and factual accuracy.
- Hybrid Approaches – Developers often combined rules, retrieval, and basic generative models, managed by a dialogue manager to maintain context. Knowledge bases or ontologies were common in customer support.
Less traditionally ones like Perplexity by Eric Zinda seems to have full understanding of semantics of words (and is built using Prolog).
Importantly, in practice, most production "Chat" windows on websites were not fully automated. They were a blend of machine + human, where the chatbot handled basic queries and human agents stood by to take over for complex requests. This ensured customer satisfaction while keeping automation lightweight. A notable example, widely used today, is Crisp. I mention it here only because I consider it a competitor to what we are aiming to offer.
The Challenges with Using LLM Services
The Python community is blessed. They always seem to get the latest of everything - machine learning libraries, deep learning frameworks, and APIs. For most Python users, invoking ChatGPT or other inference providers (like Hugging Face or NIM) is just a matter of calling an API. For the rest of the world (e.g., C# developers), it usually means making HTTP requests - until an official native library is available.
There's also the matter of streaming API responses. This is often marketed as a big UI boost, and I've attended LangChain seminars emphasizing how responsive GUIs make LLMs appear "less slow." But in reality, for non-trivial tasks, I expect some thinking time. As long as there's a clear visual cue showing the service is alive, I'm fine. Streaming isn't always essential.

That's also exactly what happens in IM apps for conversations between real people.
How Knowledge Retrieval Works
LLMs predict the next word. The essence of an LLM-based chat agent is to give it the right context. RAG provides that context by feeding the LLM (as system prompt) the necessary information, automatically retrieved through similarity search.
The most extreme form of "RAG" - which I've heard of in production systems at NVIDIA's GDA conference - is simply passing everything needed into the system prompt. Don't laugh - sometimes the simplest solution is the most effective.
Vector databases store these embeddings, but today there are far too many to choose from. For simple cases, you can get by with a basic cosine similarity calculation if the dataset isn't too large.
The Challenges
Building a real-world chatbot typically involves:
- Connecting to a backend LLM service
- Assembling system prompts - either by concatenating everything or using RAG
- For RAG systems: uploading documents, generating embeddings, storing them, and enabling queries
- Providing a functional GUI
That's it! Simple in theory - but even the first step can be daunting.
First, there are so many LLM services to choose from. On top of that, higher-level offerings like Google NotebookLM, AWS Bedrock, Hugging Face Inference Engine, and NIM complicate the picture.
Here's the secret: the only way to know which works best for you is to try them - extensively.
Second, even when using a hosted inference service instead of self-hosting, there's the issue of authentication if your product is public-facing. Without safeguards, anyone could hijack your service. That means setting up an authentication server.
System prompt assembly and vector database work, by contrast, are routine engineering tasks - straightforward enough once you know what you're doing.
What I Really Want
As a developer, what I really want is:
- I have a bunch of files/documents
- I upload them to a service
- The service creates a ready-to-use agent
- I simply call that specialized agent for my purpose
In short, I'd like to embed a chatbot with just a snippet like this:
<script src="/widget/chat-widget.js"
        data-chat-widget
        data-project="ChatGateway"
        data-title="Chat with Methodox (AI)"
        data-display-name="Chat Gateway"
        data-welcome-message="Hi! I'm your AI assistant. Ask me anything."
        data-position="right"
        data-open="false"
        data-primary="#2563eb"></script>
Surely, many providers already offer something similar:
- Chatbase
- Azure
- Clause Enterprise
- Tidio
- Octane AI
- Social Intents
- Regal...
But here's the reality:
- Chatbase leans heavily on analytics and FAQ-style bots.
- Azure gives you the parts (Cognitive Search + OpenAI) but makes you engineer the pipeline.
- Clause Enterprise is weighed down by compliance workflows.
- Tidio focuses on rule-based customer support flows.
- Octane AI is optimized for e-commerce funnels.
- Social Intents is designed for live handoffs, not retrieval.
- Regal focuses on sales/CRM engagement.
All of these involve vendor lock-in, integration overhead, or enterprise baggage. You end up hacking around features you don't need.
In short, there's no truly lightweight option for just documents + embeddings + an LLM wrapper.
Summary
To recap, a proper service for this use case should be:
- A host for uploaded documents
- An API server for RAG-based retrieval
- Designed for direct public use
The last point is crucial: such a service must allow direct frontend embedding, without worrying about API key protection.
What's Next
We're now building exactly the service I've been describing: Methodox Chat Gateway.
Chat Gateway is a production-ready chat layer that lets you add an AI assistant to any site or app - powered by your own knowledge base. Think of it as documents + embeddings + an LLM wrapper, packaged into something you can deploy in minutes.
With just a single <script> tag, you can drop a polished chat widget into your frontend, or spin up a full-page project chat at /Project/{name}. Each project has its own scoped assistant, model settings, and optional vector store for retrieval. RAG status is even shown transparently (“RAG: on/off”) so your users know when answers are grounded in your documents.
Behind the scenes, Chat Gateway provides:
- Simple embedding – add a chat button with no-code widget setup.
- Project-scoped assistants – each project is isolated, configurable, and connected only to its own knowledge base.
- Hosted project pages – share live demos or support portals instantly.
- Straightforward API – post user input and get structured responses, with continuity across sessions.
- Built-in guardrails – request limits and concurrency caps to keep things reliable.
Unlike heavier enterprise platforms, Chat Gateway is designed for direct public use: no extra infrastructure for API key protection, no bloated CRM features, and no vendor lock-in. It's simply a lightweight way to put your content to work through an AI assistant.
If you want to see it in action, check out the Playground or visit the live demo at /Project/ChatGateway.
We'll be sharing a full launch announcement soon  -  but the vision is clear: a frictionless, developer-friendly way to embed intelligent, knowledge-backed chat into any experience.
References