Retrieval-Augmented Generation (RAG) is a powerful AI approach that combines retrieval-based systems with generative language models to produce more accurate, context-aware, and grounded responses. Rather than relying solely on the model's training data, RAG fetches relevant documents or knowledge snippets from an external data source and passes them to the model as additional context.
In this article, we'll walk through how to build a simple RAG system using Python and OpenAI’s GPT models combined with a basic document retrieval technique.
What is a RAG System?
Retrieval: The system first searches a database for documents or passages relevant to the user's query.
Augmentation: The retrieved content is passed as context to a large language model (LLM).
Generation: The LLM generates a response conditioned on both the query and the retrieved documents.
This approach helps improve factual accuracy and relevance, especially in domains that require up-to-date or domain-specific knowledge.
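Conceptually, the whole pipeline fits in a few lines. Here is a minimal sketch, where retrieve, augment, and generate are placeholder names for the functions we will build in the steps below:

def rag_answer(query):
    docs = retrieve(query)           # Retrieval: find relevant documents
    prompt = augment(query, docs)    # Augmentation: add them to the prompt as context
    return generate(prompt)          # Generation: let the LLM answer from that context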
Step 1: Prepare Your Document Store
You need some textual data to retrieve from. For demonstration, let's say we have a small collection of documents stored as simple text snippets.
documents = [
    "Python is a high-level programming language known for its readability and versatility.",
    "OpenAI develops state-of-the-art AI models including GPT series.",
    "Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.",
    "Vector databases can store document embeddings for efficient similarity search."
]
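In a real application you would load files or web pages and split them into chunks before indexing. A minimal sketch of naive fixed-size chunking (the 500-character size is an arbitrary choice for illustration):

def chunk_text(text, chunk_size=500):
    # Naive fixed-size chunking; production systems usually split on
    # sentence or paragraph boundaries and add some overlap between chunks.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]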
Step 2: Generate Embeddings for Documents
To retrieve relevant documents efficiently, we embed both the documents and the query into a shared vector space, then find the closest matches. First, embed each document once up front (this uses the OpenAI Python SDK v1 client; set your API key accordingly):
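from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # or set the OPENAI_API_KEY environment variable

def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding

document_embeddings = [get_embedding(doc) for doc in documents]

Step 3: Retrieve the Most Relevant Documents
With the document embeddings in place, we embed the incoming query the same way and score it against every document using cosine similarity, keeping the top matches.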
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def retrieve_documents(query, document_embeddings, documents, top_k=1):
    query_embedding = get_embedding(query)
    similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in document_embeddings]
    # Get indices of the top_k most similar documents
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]
# Example
query = "What is OpenAI?"
relevant_docs = retrieve_documents(query, document_embeddings, documents, top_k=2)
print("Relevant documents:", relevant_docs)
Step 4: Construct the Prompt for GPT
We now create a prompt that includes the retrieved documents as context, plus the user’s query.
def construct_prompt(documents, question):
    context = "\n\n".join(documents)
    prompt = (
        f"Use the following context to answer the question:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt
prompt = construct_prompt(relevant_docs, query)
print(prompt)
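For illustration, if the retriever returned the OpenAI and RAG snippets, the printed prompt would look like this:

Use the following context to answer the question:
OpenAI develops state-of-the-art AI models including GPT series.

Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.

Question: What is OpenAI?
Answer: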
Step 5: Generate an Answer Using OpenAI GPT
Now we call OpenAI's Chat Completions API to generate an answer grounded in this prompt.
def generate_answer(prompt, model="gpt-4o-mini", max_tokens=150):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()
answer = generate_answer(prompt)
print("Answer:", answer)
Putting It All Together
Here is a compact example that integrates all steps:
from openai import OpenAI
import numpy as np

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # or set the OPENAI_API_KEY environment variable

documents = [
    "Python is a high-level programming language known for its readability and versatility.",
    "OpenAI develops state-of-the-art AI models including GPT series.",
    "Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.",
    "Vector databases can store document embeddings for efficient similarity search."
]

def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding

document_embeddings = [get_embedding(doc) for doc in documents]

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def retrieve_documents(query, document_embeddings, documents, top_k=2):
    query_embedding = get_embedding(query)
    similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in document_embeddings]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]

def construct_prompt(documents, question):
    context = "\n\n".join(documents)
    return f"Use the following context to answer the question:\n{context}\n\nQuestion: {question}\nAnswer:"

def generate_answer(prompt, model="gpt-4o-mini", max_tokens=150):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()

# Example usage
query = "What is OpenAI?"
relevant_docs = retrieve_documents(query, document_embeddings, documents)
prompt = construct_prompt(relevant_docs, query)
answer = generate_answer(prompt)
print("Answer:", answer)
I hope this post helps you build your own RAG system. Thanks for reading!