Nice idea using stemming to make the checks less brittle. In what ways do you see this kind of tester evolving once models start generating more fluid and implicit answers rather than keyword-visible ones?
I Built a Tool for AI Agent Testing
Fernando Richter
Fernando Richter
@Muzzamil Abbas Great point, Muzzamil! Keyword validation is well suited to regression and factual testing.
To handle more fluid and implicit responses, I see the future of the tester as threefold:
- Semantic Validation (LLM as Evaluator): Using a second LLM to judge whether the response meets the intent or meaning of the requirement, instead of searching for keywords.
- Structural Verification: Integrating parsers to validate specific formats (JSON, code, etc.).
- Embedding Evaluation: Comparing embeddings of the response and the ideal output (e.g. by cosine similarity), ensuring semantic closeness even when the vocabulary differs.
The first point is the most viable and will be fundamental to the future of the tool. Your ideas and contributions are very welcome in this evolution!
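To make the first point concrete, here is a minimal sketch of the LLM-as-evaluator idea. The `judge` callable is an assumption, not part of the tool: you would plug in any client for a second model that returns a verdict string (the `stub_judge` below only stands in for such a call).

```python
# Sketch of "LLM as Evaluator": a second model judges whether a response
# meets the *intent* of a requirement, instead of matching keywords.
# `judge` is a hypothetical callable: prompt in, verdict string out.

def semantic_validate(requirement: str, response: str, judge) -> bool:
    """Ask a judge model whether `response` satisfies `requirement`."""
    prompt = (
        "Requirement: " + requirement + "\n"
        "Response: " + response + "\n"
        "Does the response satisfy the intent of the requirement? "
        "Answer strictly PASS or FAIL."
    )
    verdict = judge(prompt)  # e.g. a chat-completion call to a real LLM
    return verdict.strip().upper().startswith("PASS")

# Stub judge for demonstration only -- a real one would call an LLM API.
def stub_judge(prompt: str) -> str:
    return "PASS"
```

The key design choice is forcing the judge into a strict PASS/FAIL output so the test harness can parse the verdict deterministically.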
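The structural-verification point can be sketched for the JSON case: parse the response and check that the expected keys are present. The function name and key set are illustrative, not part of the current tool.

```python
import json

def verify_json_structure(response: str, required_keys: set) -> bool:
    """Check that the agent's response is valid JSON containing the expected keys."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False  # response is not even well-formed JSON
    return isinstance(data, dict) and required_keys.issubset(data.keys())
```

The same pattern extends to other formats: swap the parser (a code linter, an XML parser) and the structural predicate.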
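Finally, a toy illustration of the embedding-evaluation idea: score the response against an ideal output by cosine similarity and pass if it clears a threshold. The bag-of-words "embedding" below is only a dependency-free placeholder; a real implementation would use a proper embedding model, which is exactly what lets it match different vocabulary with the same meaning.

```python
import math
from collections import Counter

def bow_embed(text: str) -> Counter:
    # Placeholder vectorizer -- substitute a real embedding model here.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def embedding_check(response: str, ideal: str, threshold: float = 0.5) -> bool:
    """Pass if the response is semantically close enough to the ideal output."""
    return cosine_similarity(bow_embed(response), bow_embed(ideal)) >= threshold
```

The threshold is the main tuning knob: too low and the check accepts off-topic answers, too high and it rejects valid rephrasings.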