Retrieval-Augmented Generation (RAG) is a powerful AI approach that combines retrieval-based systems with generative language models to produce more accurate, context-aware, and grounded responses. Rather than relying solely on the model's training data, RAG fetches relevant documents or knowledge snippets from an external data source and passes them to the model as additional context.
In this article, we'll walk through how to build a simple RAG system using Python and OpenAI’s GPT models combined with a basic document retrieval technique.
What is a RAG System?
Retrieval: The system first searches a database for documents or passages relevant to the user's query.
Augmentation: The retrieved content is passed as context to a large language model (LLM).
Generation: The LLM generates a response conditioned on both the query and the retrieved documents.
This approach helps improve factual accuracy and relevance, especially in domains that require up-to-date or domain-specific knowledge.
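Conceptually, the whole pipeline fits in a few lines. Here is a minimal sketch, where retrieve, augment, and generate are placeholder names for the functions we will build in the steps below:

def rag_answer(query):
    docs = retrieve(query)           # Retrieval: find relevant documents
    prompt = augment(query, docs)    # Augmentation: add them to the prompt as context
    return generate(prompt)          # Generation: let the LLM answer from that context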
Step 1: Prepare Your Document Store
You need some textual data to retrieve from. For demonstration, let's say we have a small collection of documents stored as simple text snippets.
documents = [
    "Python is a high-level programming language known for its readability and versatility.",
    "OpenAI develops state-of-the-art AI models including GPT series.",
    "Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.",
    "Vector databases can store document embeddings for efficient similarity search."
]
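In a real application you would load files or web pages and split them into chunks before indexing. A minimal sketch of naive fixed-size chunking (the 500-character size is an arbitrary choice for illustration):

def chunk_text(text, chunk_size=500):
    # Naive fixed-size chunking; production systems usually split on
    # sentence or paragraph boundaries and add some overlap between chunks.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]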
Step 2: Generate Embeddings for Documents
To retrieve relevant documents efficiently, we embed both the documents and the query into a shared vector space, then find the closest matches. First, embed each document once up front (this uses the OpenAI Python SDK v1 client; set your API key accordingly):
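from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # or set the OPENAI_API_KEY environment variable

def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding

document_embeddings = [get_embedding(doc) for doc in documents]

Step 3: Retrieve the Most Relevant Documents
With the document embeddings in place, we embed the incoming query the same way and score it against every document using cosine similarity, keeping the top matches.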
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def retrieve_documents(query, document_embeddings, documents, top_k=1):
    query_embedding = get_embedding(query)
    similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in document_embeddings]
    # Get indices of the top_k most similar documents
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]
# Example
query = "What is OpenAI?"
relevant_docs = retrieve_documents(query, document_embeddings, documents, top_k=2)
print("Relevant documents:", relevant_docs)
Step 4: Construct the Prompt for GPT
We now create a prompt that includes the retrieved documents as context, plus the user’s query.
def construct_prompt(documents, question):
    context = "\n\n".join(documents)
    prompt = (
        f"Use the following context to answer the question:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt
prompt = construct_prompt(relevant_docs, query)
print(prompt)
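For illustration, if the retriever returned the OpenAI and RAG snippets, the printed prompt would look like this:

Use the following context to answer the question:
OpenAI develops state-of-the-art AI models including GPT series.

Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.

Question: What is OpenAI?
Answer: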
Step 5: Generate an Answer Using OpenAI GPT
Now we call OpenAI's Chat Completions API to generate an answer grounded in this prompt.
def generate_answer(prompt, model="gpt-4o-mini", max_tokens=150):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()
answer = generate_answer(prompt)
print("Answer:", answer)
Putting It All Together
Here is a compact example that integrates all steps:
from openai import OpenAI
import numpy as np

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # or set the OPENAI_API_KEY environment variable

documents = [
    "Python is a high-level programming language known for its readability and versatility.",
    "OpenAI develops state-of-the-art AI models including GPT series.",
    "Retrieval-Augmented Generation combines retrieval and generative methods to improve answer accuracy.",
    "Vector databases can store document embeddings for efficient similarity search."
]

def get_embedding(text, model="text-embedding-ada-002"):
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding

document_embeddings = [get_embedding(doc) for doc in documents]

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def retrieve_documents(query, document_embeddings, documents, top_k=2):
    query_embedding = get_embedding(query)
    similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in document_embeddings]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]

def construct_prompt(documents, question):
    context = "\n\n".join(documents)
    return f"Use the following context to answer the question:\n{context}\n\nQuestion: {question}\nAnswer:"

def generate_answer(prompt, model="gpt-4o-mini", max_tokens=150):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()

# Example usage
query = "What is OpenAI?"
relevant_docs = retrieve_documents(query, document_embeddings, documents)
prompt = construct_prompt(relevant_docs, query)
answer = generate_answer(prompt)
print("Answer:", answer)
I hope this post helps you build your own RAG system. Thanks for reading!