I Built a Tool for AI Agent Testing

Fernando Richter posted Jul 24 1 min read

Excited to share my latest open-source project: the AI Agent Tester!

As AI models become more integrated into our applications, how do we ensure their responses are consistent and reliable? Manually testing prompts is slow and doesn't scale. That's why I created this tool.

The AI Agent Tester automates the validation process. It reads prompts from a simple CSV file, sends them to an AI (like OpenAI's GPT), and checks the responses for expected keywords.
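The flow described above (CSV in, model call, keyword check) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `prompt` and `expected_keywords`, the `;` separator, and the `fake_model` stand-in for a real API call are all assumptions made for the example.

```python
import csv
import io
from typing import Callable


def load_cases(csv_text: str) -> list[dict]:
    """Parse test cases from CSV text.

    Assumes hypothetical columns 'prompt' and 'expected_keywords',
    with keywords separated by ';'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cases = []
    for row in reader:
        keywords = [k.strip() for k in row["expected_keywords"].split(";") if k.strip()]
        cases.append({"prompt": row["prompt"], "keywords": keywords})
    return cases


def run_cases(cases: list[dict], ask: Callable[[str], str]) -> list[dict]:
    """Send each prompt through `ask` and record which keywords are missing."""
    results = []
    for case in cases:
        response = ask(case["prompt"])
        lowered = response.lower()
        # Plain case-insensitive substring check; no stemming at this stage.
        missing = [k for k in case["keywords"] if k.lower() not in lowered]
        results.append({
            "prompt": case["prompt"],
            "response": response,
            "missing": missing,
            "passed": not missing,
        })
    return results


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical, for illustration only)."""
    if "France" in prompt:
        return "Paris is the capital of France."
    return "I am not sure."


SAMPLE_CSV = (
    "prompt,expected_keywords\n"
    "What is the capital of France?,Paris;France\n"
    "What is the capital of Peru?,Lima\n"
)

results = run_cases(load_cases(SAMPLE_CSV), fake_model)
for r in results:
    print(r["prompt"], "->", "Success" if r["passed"] else "Fail")
```

Passing the model call in as a function keeps the runner independent of any particular provider, so a real client can replace `fake_model` without touching the validation logic.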

Here’s what makes it effective:

Intelligent Validation: It uses "stemming" with NLTK to recognize word variations (e.g., 'fly', 'flying', 'flies'), making validation more robust.
Detailed Reports: It generates a JSON report with the status (Success/Fail) for each prompt, along with the AI's full response.
Easy to Use: It is built with Python and requires minimal setup. It even has automatic proxy support for corporate environments.
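To make the stemming and reporting ideas concrete, here is a small sketch. It is not the tool's implementation: `crude_stem` is a deliberately simple stand-in so the example runs without extra dependencies, and the report field names (`prompt`, `status`, `response`) are assumptions. With NLTK installed, `nltk.stem.PorterStemmer().stem` could be passed as the `stem` argument instead.

```python
import json
import re
from typing import Callable


def crude_stem(word: str) -> str:
    """Very rough suffix stripper, used only as a stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contains_keyword(response: str, keyword: str,
                     stem: Callable[[str], str] = crude_stem) -> bool:
    """Return True if any word in the response shares a stem with the keyword."""
    tokens = re.findall(r"[A-Za-z']+", response)
    stems = {stem(token) for token in tokens}
    return stem(keyword) in stems


def build_report(cases: list[tuple[str, str, list[str]]],
                 stem: Callable[[str], str] = crude_stem) -> str:
    """Build a JSON report from (prompt, response, keywords) tuples."""
    entries = []
    for prompt, response, keywords in cases:
        ok = all(contains_keyword(response, k, stem) for k in keywords)
        entries.append({
            "prompt": prompt,
            "status": "Success" if ok else "Fail",
            "response": response,  # full response kept for later inspection
        })
    return json.dumps(entries, indent=2)


report = build_report([
    ("How do I ship the app?", "We are deploying it tonight.", ["deploy"]),
    ("Is the cache enabled?", "No comment.", ["cache"]),
])
print(report)
```

Note that a stemmer only strips affixes, so irregular forms such as 'flew' would not be matched to 'fly'; that case needs lemmatization or one of the semantic approaches discussed in the comments.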

This project is for any developer or QA engineer working with Large Language Models who wants to add a layer of automated testing to their workflow.

It's open-source, and I would love to get your feedback or contributions!

Check out the project on GitHub: https://github.com/lfrichter/ai-agent-test

1 Comment

Muzzamil Abbas · Answer 1 · 2025-10-20T12:16:33+0000

Muzzamil Abbas • 5 days

Nice idea using stemming to make the checks less brittle. In what ways do you see this kind of tester evolving once models start generating more fluid and implicit answers rather than keyword-visible ones?

Fernando Richter • 5 days

@Muzzamil Abbas Great point, Muzzamil! Keyword validation is ideal for regression and fact testing.

To handle more fluid and implicit responses, I see the future of the tester as threefold:

Semantic Validation (LLM as Evaluator): Using a second LLM to judge whether the response meets the intent or meaning of the requirement, instead of searching for keywords.
Structural Verification: Integrating parsers to validate specific formats (JSON, code, etc.).
Embedding Evaluation: Comparing the vector distance between the response and the ideal output, ensuring semantic similarity even with different vocabulary.
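The structural and embedding checks from the list above might look something like the sketch below. Everything here is hypothetical: the function names, the required-keys rule, and the 0.8 threshold are illustrative choices, and the vectors are hand-written placeholders for what an embedding model would return.

```python
import json
import math


def is_valid_json_with_keys(response: str, required_keys: set[str]) -> bool:
    """Structural check: the response parses as a JSON object with the required keys."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data.keys())


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantically_close(response_vec: list[float], ideal_vec: list[float],
                          threshold: float = 0.8) -> bool:
    """Embedding check: pass when similarity reaches an assumed threshold."""
    return cosine_similarity(response_vec, ideal_vec) >= threshold


# Structural check on a response that is expected to be JSON.
print(is_valid_json_with_keys('{"status": "ok", "items": []}', {"status", "items"}))

# Embedding check with placeholder vectors standing in for real embeddings.
print(is_semantically_close([1.0, 0.1], [1.0, 0.0]))
```

In practice the threshold would need tuning per embedding model, since similarity scores are not comparable across models.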

The first point is the most viable and will be fundamental to the future of the tool. Your ideas and contributions are very welcome in this evolution!
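As a rough illustration of the LLM-as-evaluator idea, the sketch below sends a grading prompt to a second model and reads back a verdict. The prompt wording, the PASS/FAIL convention, and `fake_judge` (a stand-in for a real model call) are all assumptions for this example, not part of the tool.

```python
from typing import Callable

# Hypothetical grading prompt; the wording is an assumption for this sketch.
JUDGE_TEMPLATE = (
    "You are grading an AI response.\n"
    "Requirement: {requirement}\n"
    "Response: {response}\n"
    "Answer with exactly PASS or FAIL."
)


def judge_response(requirement: str, response: str,
                   judge: Callable[[str], str]) -> bool:
    """Ask a second model whether the response satisfies the requirement."""
    prompt = JUDGE_TEMPLATE.format(requirement=requirement, response=response)
    verdict = judge(prompt).strip().upper()
    # Anything that is not a clear PASS counts as a failure.
    return verdict.startswith("PASS")


def fake_judge(prompt: str) -> str:
    """Stand-in evaluator: passes when the graded response mentions a refund."""
    response_line = next(
        line for line in prompt.splitlines() if line.startswith("Response:")
    )
    return "PASS" if "refund" in response_line.lower() else "FAIL"


print(judge_response("Offer the customer a refund",
                     "We will issue a refund within five days.", fake_judge))
print(judge_response("Offer the customer a refund",
                     "Please wait for our reply.", fake_judge))
```

Treating unclear verdicts as failures is a conservative choice: a judge that rambles or hedges will surface as a failed test rather than a silent pass.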
