The first stage of AI work is prompting.
The last stage is removing the model from most of the workflow.
That sounds backwards.
It is not.
When a workflow is new, the LLM is useful because the work is still ambiguous. You are discovering what good looks like. You try a prompt, read the output, adjust the examples, change the tone, add constraints, and run it again.
That is a good use of AI.
But if the same workflow keeps coming back, and you are still explaining it to the model every time, you are not building capability. You are repeating yourself with a better interface.
The mature workflow is not one where the LLM does everything.
The mature workflow is one where the LLM only handles the part where ambiguity is useful.
Everything else becomes process.
Prompting is where most people start because it is the fastest way to get feedback.
You ask:
● "Write this article."
● "Make it sound less generic."
● "Use my style."
● "Add examples."
● "Make the intro stronger."
At this stage, the model is helping you figure out the shape of the task.
You do not fully know the target yet. You are exploring. You are testing whether the idea even works. You are using the model as a thinking partner, a drafter, a critic, and sometimes a mirror.
That is fine.
The mistake is treating this as the final form.
If you have to keep saying the same thing, you do not have a workflow. You have a recurring conversation.
The next stage is usually better prompting.
You add examples. You add constraints. You add a target audience. You tell the model what to avoid. You define the output format. You write a longer system prompt.
The output improves.
For a while, this feels like progress.
But longer prompts have a hidden failure mode: the model still has to remember and balance everything at once.
Style rules. Factual constraints. Tone. Audience. Platform conventions. Examples. Edge cases. Forbidden phrases. Review criteria.
All of it goes into one big instruction block.
The prompt becomes a pile of expectations, and the model is still the one deciding which expectations matter in the moment.
That is fragile.
At some point, "make the prompt better" stops being the right move.
The next level is turning repeated prompting into a skill.
A skill packages the context and process (a sketch in code follows the list):
● what files or sources to read
● what examples matter
● what tone to use
● what scripts to run
● what output format is expected
● what review criteria should be applied
● what fallback path to use when something breaks
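In code form, a skill is closer to data than to prose. Here is a minimal sketch; every field, path, and value below is a hypothetical placeholder, not a fixed format:

```python
# A skill as structured data rather than a long instruction block.
# All names and paths here are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Skill:
    sources: list[str]          # what files or sources to read
    examples: list[str]         # what examples matter
    tone: str                   # what tone to use
    scripts: list[str]          # what scripts to run
    output_format: str          # what output format is expected
    review_criteria: list[str]  # what review criteria should be applied
    fallback: str               # what to do when something breaks

weekly_article = Skill(
    sources=["notes/this-week.md", "style-guide.md"],
    examples=["published/best-three.md"],
    tone="direct, first person, no filler",
    scripts=["scripts/check_links.py"],
    output_format="markdown, H2 sections, under 1200 words",
    review_criteria=["one clear argument", "concrete next action"],
    fallback="stop and ask instead of guessing",
)
```

The point is not the exact fields. The point is that the skill is inspectable and versionable instead of living in chat history.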
This is a real improvement.
The workflow becomes portable. You stop explaining everything from scratch. The model gets the right context faster. You are no longer relying on whatever happens to be in the current chat thread.
For many AI workflows, this is the first serious productivity jump.
But skills have their own failure mode.
They can become too large.
A skill can start as a clean reusable process and slowly turn into another giant prompt.
More instructions.
More exceptions.
More "always do this."
More "never do that."
More examples.
More scripts.
More personal preferences.
At some point, the skill is not making the workflow reliable. It is just giving the model more things to interpret.
The model still has to decide what matters.
The model still has to judge whether the output is good enough.
The model is still checking its own homework.
That is the point where the workflow needs to move outside the LLM.
Not all of it.
The stable parts.
· · ·
Developers already understand this when it comes to code.
We do not ask a developer, human or AI:
"Does this code look correct?"
We run the formatter.
We run the linter.
We run the tests.
We run the type checker.
We run CI.
We use pre-commit hooks.
The model can generate code, but it does not get to decide whether the code passes.
The gate decides.
That is the important shift.
If a rule can be checked deterministically, it should not live only inside a prompt.
For code, the gates are obvious (a runnable sketch follows the list):
● formatting
● linting
● type checks
● unit tests
● integration tests
● golden files
● JSON schema validation
● pre-commit hooks
● CI checks
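A gate runner can be as small as this. The sketch below assumes ruff, mypy, and pytest are installed; swap in whatever tools your stack uses:

```python
# A minimal gate runner: the model drafts, these commands decide.
import subprocess
import sys

GATES = [
    ["ruff", "format", "--check", "."],  # formatting
    ["ruff", "check", "."],              # linting
    ["mypy", "."],                       # type checks
    ["pytest", "-q"],                    # tests
]

def run_gates() -> int:
    for cmd in GATES:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"GATE FAILED: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode  # the gate decides, not the model
    print("all gates passed")
    return 0

if __name__ == "__main__":
    sys.exit(run_gates())
```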
The agent can still do useful work. It can draft the patch, explain a tradeoff, write a test, debug a failure, or propose a smaller path.
But the agent should not be the final authority on whether the work satisfies the standard.
That authority should be outside the model.
Content feels less deterministic than code, so people keep more of the workflow inside the prompt.
But the same principle applies.
If you have found a content formula that works, do not just ask the model to remember it.
Turn it into gates.
For example, before publishing an article:
● Does the title match the actual promise of the article?
● Does the first section create tension quickly?
● Is the intended reader obvious?
● Is there one clear argument?
● Does every section move the argument forward?
● Are there generic AI phrases that should be removed?
● Does the article contain a real observation or only recycled advice?
● Is there a concrete next action for the reader?
● Does the tag strategy match the article, not just the broad topic?
These checks are not perfect.
But they are better than "make it good."
"Make it good" is a vibe.
A gate is a standard.
The more often a workflow matters, the more it deserves standards that do not depend on the model's mood.
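Some of these gates are already scriptable today. A minimal sketch, assuming a hypothetical banned-phrase list and an article passed in as a file path:

```python
# Two content gates: banned phrases and a crude next-action check.
# The phrase list and the heuristic are hypothetical starting points.
import re
import sys

BANNED_PHRASES = ["delve into", "game-changer", "in today's fast-paced world"]

def find_banned_phrases(text: str) -> list[str]:
    lowered = text.lower()
    return [p for p in BANNED_PHRASES if p in lowered]

def has_next_action(text: str) -> bool:
    # Crude proxy: the closing section should address the reader directly.
    return bool(re.search(r"\byour?\b", text[-800:].lower()))

if __name__ == "__main__":
    article = open(sys.argv[1], encoding="utf-8").read()
    problems = [f"banned phrase: {p}" for p in find_banned_phrases(article)]
    if not has_next_action(article):
        problems.append("no clear next action for the reader")
    print("\n".join(problems) if problems else "content gates passed")
    sys.exit(1 if problems else 0)
```

The heuristics are rough. The exit code is not.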
· · ·
The mature version of an AI workflow is not:
"The LLM does everything."
It is:
"The LLM does the part where ambiguity is useful. The system handles the rest."
That might mean the model only does 20% of the workflow.
And that is a good thing.
For a content workflow, the LLM might:
● propose angles
● compare possible hooks
● draft sections
● rewrite unclear paragraphs
● find contradictions
● suggest examples
But the system should handle:
● loading the source notes
● selecting the target platform
● applying the tag strategy
● checking title/promise match
● scanning for banned phrases
● verifying links
● enforcing formatting
● creating the publishing task
For a coding workflow, the LLM might:
● inspect the codebase
● implement the change
● write tests
● explain a failure
● reduce a patch
But the system should handle:
● formatting
● linting
● type checking
● test execution
● schema validation
● contract checks
● CI gates
That is not using AI less because AI is weak.
It is using AI less because the workflow is becoming stronger.
· · ·
There is a simple rule I keep coming back to:
If code can check it, do not ask the model to remember it.
If a script can clean it, do not spend tokens reasoning about it.
If a test can catch it, do not rely on a sentence in a prompt.
The LLM should be used when ambiguity remains.
It should not be the first line of defense for things that are already measurable.
This is where many AI workflows waste effort. They ask the model to handle everything:
● classify the task
● remember the rules
● generate the output
● check the output
● decide whether the output is done
● explain why it is done
That is too much responsibility in one probabilistic step.
Split the work.
Let the model handle the ambiguous part.
Let deterministic systems handle the stable part.
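In code, the split can look like this. `call_model` stands in for whatever LLM client you use, and the gates are hypothetical checks like the ones above; none of this is a fixed API:

```python
# The split: the model handles the ambiguous middle, gates decide when it's done.
from typing import Callable

def call_model(task: str, text: str) -> str:
    raise NotImplementedError("wire up your own LLM client here")  # hypothetical

def no_banned_phrases(text: str) -> list[str]:
    banned = ["delve into", "game-changer"]  # hypothetical list
    return [f"banned phrase: {p}" for p in banned if p in text.lower()]

def within_length(text: str) -> list[str]:
    return [] if len(text.split()) <= 1200 else ["over 1200 words"]

GATES: list[Callable[[str], list[str]]] = [no_banned_phrases, within_length]

def run_workflow(notes: str, max_rounds: int = 3) -> str:
    draft = call_model("draft an article from these notes", notes)  # ambiguous part
    for _ in range(max_rounds):
        errors = [e for gate in GATES for e in gate(draft)]  # stable part
        if not errors:
            return draft  # the gates decided, not the model
        draft = call_model("revise to fix: " + "; ".join(errors), draft)
    raise RuntimeError("gates still failing after revision rounds")
```

The model never declares the work done. It only gets another turn.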
If you want to know where your workflow is immature, ask these questions:
- What do I keep explaining to the model again and again?
That probably belongs in a skill.
- What does the model keep judging by itself?
That probably belongs in a gate.
- What failure would be obvious to a script, linter, test, schema, or checklist?
That should not live only in the prompt.
- If I removed the LLM tomorrow, which parts of the workflow would still be clear?
Those parts are real process.
- Which parts only work because the model is being generous?
Those parts are risk.
This self-check is uncomfortable because it reveals how much "automation" is just trust in a model call.
But that is the point.
Capability is not how much work you can hand to the model.
Capability is how much of the workflow still holds when the model is only doing the part it is actually good at.
The pattern looks like this:
Prompt -> Skill -> Gate -> System
Or:
Ask -> Package -> Validate -> Automate
When the task is new, prompt.
When the task repeats, create a skill.
When the skill succeeds, move the stable checks into gates.
When the gates are stable, reduce the LLM's responsibility.
This is the path from beginner prompting to a 20% LLM workflow.
It does not make the model irrelevant.
It puts the model in the right place.
There is a lot of advice about better prompts.
Some of it is useful.
But better prompting is not the destination.
Better prompting helps you discover the workflow.
Skills help you repeat the workflow.
Gates help you trust the workflow.
Systems help you scale the workflow.
If you stop at prompting, every task stays a negotiation.
If you stop at skills, every process still depends on the model interpreting the skill correctly.
If you add gates, the model has something it must pass.
That is the difference between a helpful assistant and a reliable workflow.
The useful question is no longer:
"How do I write a better prompt?"
The useful question is:
"Which part of this should stop being a prompt?"
If it is repeated context, make it a skill.
If it is a stable rule, make it a checklist.
If it is measurable, make it a test.
If it is non-negotiable, make it a gate.
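"Make it a test" can be literal. A minimal sketch, assuming pytest and a hypothetical draft location:

```python
# Content rules as tests; run with `pytest`. The path is hypothetical.
from pathlib import Path

DRAFT = Path("drafts/current.md")

def test_draft_has_title():
    first_line = DRAFT.read_text(encoding="utf-8").splitlines()[0]
    assert first_line.startswith("# "), "draft must start with a title"

def test_no_placeholder_text():
    assert "TODO" not in DRAFT.read_text(encoding="utf-8")
```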
Let the LLM handle ambiguity.
Make the system handle standards.
Use prompts to discover.
Use skills to repeat.
Use gates to scale.
And when the workflow is mature, let the LLM do less.
That is not a downgrade.
That is capability becoming real.
Which part of your AI workflow should stop being a prompt?