Nice idea using stemming to make the checks less brittle. In what ways do you see this kind of tester evolving once models start generating more fluid and implicit answers rather than keyword-visible ones?
I Built a Tool for AI Agent Testing
Fernando Richter
2 Comments
Fernando Richter
@Muzzamil Abbas Great point, Muzzamil! Keyword validation is well suited to regression and fact testing.
To handle more fluid and implicit responses, I see the future of the tester as threefold:
- Semantic Validation (LLM as Evaluator): Using a second LLM to judge whether the response meets the intent or meaning of the requirement, instead of searching for keywords.
- Structural Verification: Integrating parsers to validate specific formats (JSON, code, etc.).
- Embedding Evaluation: Comparing the vector distance between the response and the ideal output, ensuring semantic similarity even with different vocabulary.
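As a rough illustration of the second and third ideas, here is a minimal sketch using only the Python standard library. The function names, the required-keys check, and the 0.8 threshold are assumptions made for this example and are not part of the tool. The embedding vectors are assumed to come from whichever embedding model you already use.

```python
import json
import math


def is_valid_json_with_keys(response: str, required_keys: set) -> bool:
    """Structural check: the response must parse as a JSON object
    that contains every required key (hypothetical helper)."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar"
    return dot / (norm_a * norm_b)


def is_semantically_close(response_vec, ideal_vec, threshold: float = 0.8) -> bool:
    """Embedding check: pass when the response vector is close enough to
    the ideal output's vector. The threshold is an assumed value to tune."""
    return cosine_similarity(response_vec, ideal_vec) >= threshold
```

The structural check is deterministic and cheap, so it can run on every test case. The similarity threshold would likely need tuning per embedding model and per test suite.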
The first point is the most viable and will be fundamental to the future of the tool. Your ideas and contributions are very welcome in this evolution!
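To make the LLM-as-evaluator idea concrete, here is a minimal sketch. It assumes the second model is reached through a plain callable, called `ask_llm` here, that takes a prompt string and returns the model's text. That callable, the prompt wording, and the PASS/FAIL convention are all hypothetical choices for illustration, not the tool's actual design.

```python
from typing import Callable

# Assumed grading prompt; the wording and the PASS/FAIL convention are illustrative.
JUDGE_PROMPT = (
    "You are grading an AI agent's answer.\n"
    "Requirement: {requirement}\n"
    "Answer: {answer}\n"
    "Reply with exactly PASS or FAIL."
)


def judge_response(requirement: str, answer: str,
                   ask_llm: Callable[[str], str]) -> bool:
    """Ask a second model whether the answer meets the requirement.

    `ask_llm` is a hypothetical hook: wrap whichever LLM client you use
    in a function that maps a prompt string to the model's reply text.
    """
    prompt = JUDGE_PROMPT.format(requirement=requirement, answer=answer)
    verdict = ask_llm(prompt).strip().upper()
    # Anything that does not clearly start with PASS counts as a failure.
    return verdict.startswith("PASS")
```

Passing the judge in as a function keeps the tester independent of any particular provider, and lets the test suite itself be exercised with a stub such as `lambda prompt: "PASS"`.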
- © 2026 Coder Legion