Nice idea using stemming to make the checks less brittle. In what ways do you see this kind of tester evolving once models start generating more fluid and implicit answers rather than keyword-visible ones?
I Built a Tool for AI Agent Testing
Fernando Richter
Fernando Richter
@Muzzamil Abbas Great point, Muzzamil! Keyword validation works well for regression tests and factual checks.
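For reference, a stemmed keyword check of the kind discussed here might look roughly like the sketch below. It is not the tool's actual code: `stem` is a hypothetical, deliberately crude suffix stripper standing in for a real stemmer such as Porter's, and `missing_keywords` is an illustrative name.

```python
import re

# Hypothetical, deliberately crude suffix list standing in for a real stemmer
# (e.g. Porter). Checked in order; the first suffix that fits is stripped.
_SUFFIXES = ("ations", "ation", "ings", "ing", "edly", "ed", "es", "e", "s")


def stem(word: str) -> str:
    """Lowercase a word and strip the first matching suffix, keeping >= 3 chars."""
    word = word.lower()
    for suffix in _SUFFIXES:
        # Don't turn "process" into "proces" by stripping the final "s" of "ss".
        if suffix == "s" and word.endswith("ss"):
            continue
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_tokens(text: str) -> set[str]:
    """Split text into lowercase alphanumeric tokens and stem each one."""
    return {stem(tok) for tok in re.findall(r"[a-z0-9]+", text.lower())}


def missing_keywords(response: str, keywords: list[str]) -> list[str]:
    """Return the expected keywords whose stems do not appear in the response."""
    tokens = stemmed_tokens(response)
    return [kw for kw in keywords if stem(kw) not in tokens]


answer = "The agent validated both refunds and is processing the orders"
print(missing_keywords(answer, ["validate", "refund", "process", "order", "shipping"]))
```

Stemming lets "validated" match "validate" and "refunds" match "refund", but it still needs the concept to appear as a word, which is exactly the limitation the question points at.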
To handle more fluid and implicit responses, I see the tester evolving in three directions:
- Semantic Validation (LLM as Evaluator): Using a second LLM to judge whether the response satisfies the intent of the requirement, rather than searching for keywords.
- Structural Verification: Integrating parsers to validate specific formats (JSON, code, etc.).
- Embedding Evaluation: Comparing the vector distance between the embeddings of the response and of the ideal output, so that answers with similar meaning can pass even when they use different vocabulary.
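The two deterministic ideas from the list, structural verification and embedding evaluation, could be sketched as follows. This is an illustration under assumptions, not part of the tool: `check_json_structure` and `semantically_close` are hypothetical names, `embed` stands for whatever embedding model you plug in, and the 0.8 threshold is an arbitrary placeholder that would need tuning.

```python
import json
import math
from typing import Callable, Sequence


def check_json_structure(response: str, required_keys: dict[str, type]) -> list[str]:
    """Parse the response as a JSON object and list structural problems (empty = pass)."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key, expected_type in required_keys.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return problems


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantically_close(
    response: str,
    reference: str,
    embed: Callable[[str], Sequence[float]],  # hypothetical: any text -> vector model
    threshold: float = 0.8,  # placeholder; tune per model and test suite
) -> bool:
    """Embed both texts and pass if their cosine similarity reaches the threshold."""
    return cosine_similarity(embed(response), embed(reference)) >= threshold
```

Cosine similarity is used here rather than raw Euclidean distance because it ignores vector length, which is the usual choice for comparing text embeddings.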
The first option is the most practical and will be central to the tool's future. Ideas and contributions toward this are very welcome!
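For the LLM-as-evaluator approach, a minimal harness might look like this. It assumes the judge model is exposed as a plain `llm(prompt) -> str` callable (hypothetical; wire it to whichever client you use) and that the model can be asked to reply in JSON. The prompt wording and the `Verdict` type are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    reason: str


# Illustrative judge prompt; doubled braces are literal braces after .format().
JUDGE_PROMPT = """You are grading an AI agent's answer.
Requirement: {requirement}
Answer: {answer}
Does the answer satisfy the intent of the requirement, even if it uses different wording?
Reply with JSON only: {{"pass": true or false, "reason": "<one sentence>"}}"""


def judge(requirement: str, answer: str, llm: Callable[[str], str]) -> Verdict:
    """Ask a second model (any prompt -> text callable) to grade the answer."""
    raw = llm(JUDGE_PROMPT.format(requirement=requirement, answer=answer))
    try:
        data = json.loads(raw)
        # Only a literal JSON true counts as a pass; "true" as a string does not.
        return Verdict(passed=data["pass"] is True, reason=str(data.get("reason", "")))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Treat unparseable judge output as a failure rather than a silent pass.
        return Verdict(passed=False, reason=f"unparseable judge output: {raw!r}")
```

Failing closed on malformed output keeps a misbehaving judge from silently letting regressions through; in practice you would also want to spot-check the judge's verdicts against human labels.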