Integrating a Retrieval-Augmented Generation (RAG) System Using Python and OpenAI


Retrieval-Augmented Generation (RAG) is an AI approach that combines retrieval-based systems with generative language models to produce more accurate, context-aware, and grounded responses. Rather than relying solely on the model’s training data, RAG fetches relevant documents or knowledge snippets from an external data source and passes them to the model as context.
In this article, we'll walk through how to build a simple RAG system using Python and OpenAI’s GPT models combined with a basic document retrieval technique.

What is a RAG System?

  • Retrieval: The system first searches a database for documents or
    passages relevant to the user's query.

  • Augmentation: The retrieved content is passed as context to a large
    language model (LLM).

  • Generation: The LLM generates a response conditioned on both the
    query and the retrieved documents.

This approach helps improve factual accuracy and relevance, especially for questions that depend on up-to-date or domain-specific knowledge.
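The three stages can be sketched without any API calls. The following is only a toy illustration: the word-overlap scoring, the placeholder `generate` function, and the names `toy_docs` and `toy_question` are assumptions made for this sketch, not part of the system built in the rest of the post.

```python
def retrieve(query, docs, top_k=1):
    """Toy retrieval: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def augment(query, retrieved):
    """Augmentation: place the retrieved text in front of the question."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Placeholder for the LLM call; it only reports the prompt size."""
    return f"[an LLM would answer here, given {len(prompt)} characters of prompt]"

toy_docs = [
    "Python is a programming language.",
    "OpenAI develops GPT models.",
]
toy_question = "Who develops GPT models?"
toy_prompt = augment(toy_question, retrieve(toy_question, toy_docs))
print(generate(toy_prompt))
```

The steps below replace the word-overlap scoring with embeddings and the placeholder with a real model call.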

Step 1: Prepare Your Document Store

You need some textual data to retrieve from. For demonstration, let's say we have a small collection of documents stored as simple text snippets.

documents = [
    "Python is a high-level programming language known for its readability and versatility.",
    "OpenAI develops state-of-the-art AI models including GPT series.",
    "Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.",
    "Vector databases can store document embeddings for efficient similarity search."
]
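Real source material is usually longer than one sentence, so it may need to be cut into snippets before it goes into a list like the one above. A minimal sketch, assuming a simple fixed-size word window is good enough; the function name `split_into_snippets` and the `max_words` value are hypothetical choices for illustration:

```python
def split_into_snippets(text, max_words=40):
    """Split text into consecutive snippets of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]

# Stand-in for a long document: 100 placeholder words
long_text = " ".join(f"word{i}" for i in range(100))
snippets = split_into_snippets(long_text, max_words=40)
print(len(snippets))
```

A word window can cut a sentence in half; splitting on sentence or paragraph boundaries is a common refinement.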

Step 2: Generate Embeddings for Documents

To retrieve relevant documents efficiently, we embed documents and queries into a vector space, then find the closest ones.
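To make "closest" concrete, here is a small offline illustration using cosine similarity. The three-dimensional vectors and their names are made up for this sketch; real embeddings come from a model and have far more dimensions.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up "embeddings", chosen by hand for illustration only
toy_vectors = {
    "doc_about_python": [0.9, 0.1, 0.0],
    "doc_about_openai": [0.1, 0.9, 0.2],
}
toy_query = [0.2, 0.8, 0.1]  # hypothetical embedding of a question about OpenAI

scores = {name: cosine(toy_query, vec) for name, vec in toy_vectors.items()}
best_match = max(scores, key=scores.get)
print(scores, best_match)
```

The query vector points in nearly the same direction as `doc_about_openai`, so that document scores highest.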

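from openai import OpenAI

# Client for the openai Python package, version 1.0 or later
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

def get_embedding(text, model="text-embedding-ada-002"):
    # Request an embedding vector for a single piece of text
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding

# Embed every document once, up front
document_embeddings = [get_embedding(doc) for doc in documents]

Step 3: Retrieve Relevant Documents

To find the closest documents, we compare the query embedding with each document embedding using cosine similarity and keep the top matches.

import numpy as np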

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def retrieve_documents(query, document_embeddings, documents, top_k=1):
    query_embedding = get_embedding(query)
    similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in document_embeddings]
    # Get indices of top_k most similar documents
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]

# Example
query = "What is OpenAI?"
relevant_docs = retrieve_documents(query, document_embeddings, documents, top_k=2)
print("Relevant documents:", relevant_docs)
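For larger collections, the per-document loop can be replaced by a single matrix product. The sketch below is one possible variant, not the method used elsewhere in this post: the function name `top_k_by_cosine` is hypothetical, and the vectors are made-up stand-ins so the example runs without an API key.

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_matrix, top_k=2):
    """Return indices of the top_k rows of doc_matrix most similar to query_vec."""
    doc_matrix = np.asarray(doc_matrix, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # Normalize so that a dot product equals cosine similarity
    doc_unit = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    similarities = doc_unit @ query_unit
    # argsort is ascending, so reverse to get the highest scores first
    return np.argsort(similarities)[::-1][:top_k].tolist()

# Made-up vectors standing in for real embeddings
fake_doc_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
fake_query_embedding = [0.9, 0.1, 0.0]
top_indices = top_k_by_cosine(fake_query_embedding, fake_doc_embeddings, top_k=2)
print(top_indices)
```

The returned indices can be used to look up the matching entries in the `documents` list.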

Step 4: Construct the Prompt for GPT

We now create a prompt that includes the retrieved documents as context, plus the user’s query.

def construct_prompt(documents, question):
    context = "\n\n".join(documents)
    prompt = (
        f"Use the following context to answer the question:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt

prompt = construct_prompt(relevant_docs, query)
print(prompt)
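One possible refinement is to number the snippets and tell the model what to do when the context does not contain the answer. This is a sketch of that idea; the function name `construct_grounded_prompt` and the instruction wording are assumptions, not a tested prompt.

```python
def construct_grounded_prompt(documents, question):
    """Build a prompt that numbers each snippet and asks the model to admit missing context."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the numbered context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

example_prompt = construct_grounded_prompt(
    ["OpenAI develops state-of-the-art AI models including GPT series."],
    "What is OpenAI?",
)
print(example_prompt)
```

Numbered snippets also make it easier to ask the model to say which snippet it relied on.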

Step 5: Generate an Answer Using OpenAI GPT

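Now call OpenAI's chat completions API to generate an answer based on this prompt.

from openai import OpenAI

# Create the client (skip this if you already created one earlier)
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

def generate_answer(prompt, model="gpt-4o-mini", max_tokens=150):
    # openai>=1.0 interface; openai.ChatCompletion.create was removed in 1.0
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()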

answer = generate_answer(prompt)
print("Answer:", answer)
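Network calls can fail temporarily, so it may be worth wrapping the call in a small retry helper. The following is a generic sketch under that assumption: `call_with_retries` and `flaky_call` are hypothetical names, and the stand-in function simulates failures so the example runs without an API key.

```python
import time

def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait and retry, doubling the delay each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Give up and re-raise once the last attempt has failed
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))

# Demonstration with a stand-in function that fails twice before succeeding
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = call_with_retries(flaky_call, base_delay=0.0)
print(result, attempts["count"])
```

With the function from this step it could be used as `call_with_retries(lambda: generate_answer(prompt))`. Catching every exception keeps the sketch short; in practice you would likely narrow it to the transient error types your client library raises.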

Putting It All Together

Here is a compact example that integrates all steps:

from openai import OpenAI
import numpy as np

# Requires the openai Python package, version 1.0 or later
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

documents = [
    "Python is a high-level programming language known for its readability and versatility.",
    "OpenAI develops state-of-the-art AI models including GPT series.",
    "Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.",
    "Vector databases can store document embeddings for efficient similarity search."
]

def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding

# Embed every document once, up front
document_embeddings = [get_embedding(doc) for doc in documents]

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def retrieve_documents(query, document_embeddings, documents, top_k=2):
    query_embedding = get_embedding(query)
    similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in document_embeddings]
    # Get indices of the top_k most similar documents
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]

def construct_prompt(documents, question):
    context = "\n\n".join(documents)
    return f"Use the following context to answer the question:\n{context}\n\nQuestion: {question}\nAnswer:"

def generate_answer(prompt, model="gpt-4o-mini", max_tokens=150):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()

# Example usage
query = "What is OpenAI?"
relevant_docs = retrieve_documents(query, document_embeddings, documents)
prompt = construct_prompt(relevant_docs, query)
answer = generate_answer(prompt)
print("Answer:", answer)
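The example above recomputes every document embedding on each run. One way to avoid that is a small on-disk cache. This is a sketch, assuming a JSON file keyed by the document text is acceptable; `load_or_compute_embeddings`, `fake_embed`, and the file name are hypothetical, and the fake embedding function stands in for a real API call so the example runs offline.

```python
import json
import os
import tempfile

def load_or_compute_embeddings(texts, embed_fn, cache_path):
    """Return one embedding per text, reusing cached vectors from a JSON file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)
    for text in texts:
        if text not in cache:
            # Only call the embedding function for texts not seen before
            cache[text] = list(embed_fn(text))
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return [cache[text] for text in texts]

# Stand-in embedding function that records how often it is called
calls = []

def fake_embed(text):
    calls.append(text)
    return [float(len(text)), 1.0]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embeddings.json")
    first = load_or_compute_embeddings(["a", "bb"], fake_embed, path)
    second = load_or_compute_embeddings(["a", "bb"], fake_embed, path)

print(first == second, len(calls))
```

In the example above, this could be used as `document_embeddings = load_or_compute_embeddings(documents, get_embedding, "embeddings.json")`. For large collections a vector database is the more usual choice.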

I hope this post helps you build your own RAG system.
Thanks!


Really helpful breakdown of building a simple RAG system — thanks for sharing this! Curious though, have you tried plugging this into a larger vector DB like Pinecone or Weaviate? Wondering how it scales with more complex datasets.

Thanks for your comment. I have used Pinecone as the vector database for a RAG system.
I left vector databases out of this post because it is meant to be a simple RAG example.
