From Prompts to Goals: The Rise of Outcome-Driven Development
5 Comments
[Sergey C Kryukov] This sounds great in theory, but outcomes are often fuzzy in real projects. How do you avoid vague or shifting success criteria?
@[Sergey C Kryukov] Fair point, and honestly it's the part of this model that hasn't been fully worked out yet.
Goal-setting only works if the goal is specific and measurable. "Lower error rates" is a goal. "Better code" is not. The teams most likely to get value from Jitro-style agents are the ones that already have clear quality metrics tied to observable outcomes — test coverage percentage, p99 latency, a defined accessibility standard. If those don't exist, you're not ready for goal-driven agents; you're still in the phase of figuring out what "done" means.
Shifting criteria is the harder problem. If you move the target mid-run, you've essentially invalidated whatever the agent was optimizing for. That argues for treating goals like sprint commitments — defined, bounded, and not changed while the work is in flight.
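For concreteness, here's a minimal sketch of what a frozen, measurable goal might look like. The class and field names are hypothetical, not Jitro's actual interface; the point is the shape, not the API:

```python
from dataclasses import dataclass

# Hypothetical goal spec -- illustrative only, not any real tool's API.
# frozen=True makes the goal immutable once defined, mirroring the
# "sprint commitment" rule: no moving the target while work is in flight.
@dataclass(frozen=True)
class GoalSpec:
    description: str  # human-readable intent
    metric: str       # the observable outcome being optimized
    target: float     # measurable threshold that defines "done"
    baseline: float   # where the codebase stands today

goals = [
    GoalSpec("Reduce checkout API tail latency",
             metric="p99_latency_ms", target=250.0, baseline=410.0),
    GoalSpec("Raise branch coverage on the payments module",
             metric="branch_coverage_pct", target=85.0, baseline=62.0),
]
```

Note what isn't in there: nothing like "better code." Every field is observable, and the record can't be mutated mid-run.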
None of this is an argument against the approach. It's an argument for engineering rigor around how you define success before you hand it off to an agent. The discipline that's always separated good engineering teams from average ones turns out to matter even more here.
Interesting shift. Moving from prompts to outcomes feels less like “better autocomplete” and more like actually changing the developer’s role. If tools like this really work, the hard part won’t be using them; it’ll be knowing what goals are worth setting in the first place. That’s where real engineering judgment still matters.
Sharp piece, Tom!
The "engineering director" framing is exactly right, and it surfaces something worth naming: the transition from prompts to goals doesn't remove the input quality problem, it just moves it up a layer.
A weakly specified goal is a prompt with more leverage. If "improve test coverage" doesn't define what counts as meaningful coverage, which paths matter, or what the tradeoff against velocity looks like, an autonomous agent can burn hours optimizing the wrong surface and produce PR noise instead of signal.
The compute bill and the review burden both scale with goal ambiguity, not goal ambition.
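To make that concrete, compare an ambiguous goal with a bounded one. The field names below are illustrative, not any particular tool's schema:

```python
# Two versions of "improve test coverage" -- the gap between them is the point.
vague_goal = "Improve test coverage."

specified_goal = {
    "metric": "branch_coverage_pct",
    "target": 85,
    # which paths matter: weight the code the business actually depends on
    "priority_paths": ["src/payments/", "src/auth/"],
    "excluded_paths": ["src/legacy/", "scripts/"],
    # the tradeoff against velocity, stated up front
    "constraints": [
        "no single PR over 400 changed lines",
        "CI wall-clock time may not grow more than 10%",
    ],
}
```

An agent handed the first version can satisfy it a hundred cheap, useless ways. The second version bounds both the compute bill and the review burden.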
I ran 500 production software prompts through an 8-dimension quality rubric today. The average score was 13/80, and the lowest-scoring dimension across the set was examples, at 1.01 out of 10: engineers describe what they want without ever showing what "good" looks like. That gap gets more expensive, not less, when the agent is acting asynchronously over a persistent workspace.
The developers who do well with Jitro-style agents won't be the ones who write the cleanest prompts. They'll be the ones who can specify outcomes with the rigor of a product spec: acceptance criteria, counter-examples, and what's explicitly out of scope.
That skill is rarer than prompt fluency, and the people who have it will extract an order of magnitude more value from goal-driven agents than everyone else.
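As a hedged sketch, an outcome spec with that rigor might look something like this; every name here is hypothetical, and the structure is the argument:

```python
# A hypothetical outcome spec written with product-spec rigor.
outcome_spec = {
    "goal": "Harden input validation on the public REST endpoints",
    "acceptance_criteria": [
        "all request bodies validated against a published schema",
        "malformed input returns 400 with a machine-readable error code",
    ],
    # counter-examples: what a plausible-but-wrong solution looks like
    "counter_examples": [
        "silently coercing bad types instead of rejecting them",
        "validating only the happy-path fields exercised by current tests",
    ],
    "out_of_scope": [
        "internal gRPC services",
        "rate limiting (tracked separately)",
    ],
}
```

The counter-examples are the part prompt fluency never teaches: they tell the agent which locally plausible solutions to reject.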