ReplyJet — How I Built an AI That Replies Like a Real Human (Not a Chatbot)
TL;DR: I built a Next.js + Groq app that detects customer intent and generates short, structured, natural-sounding replies in Egyptian dialect Arabic or English — in under 400ms. Here's the full architecture, the prompt engineering that actually worked, and the mistakes I made along the way.

The Problem
Every business selling online in the Middle East deals with the same pain: customers send angry, impatient, or vague messages on WhatsApp and Facebook — and whoever is replying either sounds too formal, too robotic, or just pastes the wrong template.
The gap isn't AI capability. It's tone.
Most AI tools that support Arabic output sound like a translated textbook. Sentences are technically correct but nobody actually talks that way. Customers notice. Trust drops. Sales are lost.
I wanted to close that gap with a tool built specifically around how people actually write and speak.
What ReplyJet Does
ReplyJet takes a raw customer message, auto-detects its intent, then generates a short, structured, human reply in Egyptian colloquial Arabic or English — in under one second.
Example:
Input: "I'm going to destroy your restaurant if you don't fix this"
Output: "We completely understand your frustration, and we sincerely apologize for the inconvenience. Let us fix this right away — could you send us your order number?"
That reply took ~380ms. No template. No copy-paste. Pure inference with a structured prompt.
Architecture
```
User Input + Mode Selection
        |
        v
  Mode === "auto"?
    YES --> detectUserIntent()  (keyword scan, zero API cost)
    NO  --> use selected mode   (complaint / close_sale / follow_up)
        |
        v
  buildSystemPrompt()  (per intent x tone x language)
        |
        v
  Groq API  (llama-3.1-8b-instant, temp=0.3, max_tokens=60-400)
        |
        v
  { reply, intent, tone, language, mode } --> UI + History
```
Tech Stack
| Layer | Technology | Version |
| --- | --- | --- |
| Framework | Next.js App Router | 14.2.3 |
| UI | React | 18.2.0 |
| AI Provider | Groq API | llama-3.1-8b-instant |
| Deployment | Vercel | — |
| Language | JavaScript | ES2024 |
The Hard Part: Prompt Engineering for Dialect Arabic
This is where most AI tools fail, and where ReplyJet does something different.
What didn't work
```
Reply in Egyptian Arabic in a friendly, professional tone.
```
This gives a different phrasing every single time. Sometimes formal. Sometimes casual. Sometimes formal-but-with-emojis. Never consistent — which is a problem when a business needs every reply to sound like it came from the same person.
What worked: Forced Structure
Instead of describing the desired output, I gave the model a fixed script to follow:
```
You MUST follow this EXACT structure:

1. Start with:
   "We completely understand your frustration"
   OR
   "We're sorry this happened"
2. Then apology:
   "We sincerely apologize for the inconvenience"
3. Then action:
   "Let us fix this right away"
4. Then ask:
   "Could you send us your order number?"

RULES:
- Use ONLY short sentences
- Do NOT generate new sentence styles
- Do NOT deviate from the structure above
```
The model follows this reliably at temperature=0.3. The output sounds like a real support agent every time.
Key insight: At low temperature, the model becomes a compliant script follower, not a creative writer. That's exactly what customer support needs.
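To make this concrete, here is a minimal sketch of what a `buildSystemPrompt()` along these lines could look like. The complaint structure is the one shown above; the sales and fallback wording, the language line, and the function shape are my assumptions, not the actual ReplyJet source:

```javascript
// Fixed scripts per intent. Only the "angry" structure is taken from the
// post; the "sales" and "normal" variants are illustrative placeholders.
const STRUCTURES = {
  angry: `You MUST follow this EXACT structure:
1. Start with: "We completely understand your frustration" OR "We're sorry this happened"
2. Then apology: "We sincerely apologize for the inconvenience"
3. Then action: "Let us fix this right away"
4. Then ask: "Could you send us your order number?"
RULES:
- Use ONLY short sentences
- Do NOT generate new sentence styles
- Do NOT deviate from the structure above`,
  sales: `Lead with value, answer the question directly, and end with a clear call to action. Keep it to 2-3 short sentences.`,
  normal: `Answer helpfully in 2-3 short sentences. No filler.`
};

// Assemble the system prompt from tone, language, and the fixed structure.
function buildSystemPrompt(intent, tone, language) {
  const langLine = language === "ar"
    ? "Reply in Egyptian colloquial Arabic, the way people actually speak."
    : "Reply in natural, conversational English.";
  return `You are a ${tone} customer support agent.\n${langLine}\n${STRUCTURES[intent] ?? STRUCTURES.normal}`;
}
```

The point of the lookup table is that every reply for a given intent starts from the same script, so consistency comes from the prompt, not from the model's mood.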
Intent Detection — Free and Instant
Before hitting the Groq API, ReplyJet runs a fast local keyword scan to classify intent:
```javascript
function detectUserIntent(message) {
  const lower = message.toLowerCase();

  const angrySignals = [
    "destroy", "angry", "complaint", "refund", "unacceptable",
    "terrible", "worst", "never again", "lawsuit", "disgusting"
  ];
  const salesSignals = [
    "price", "how much", "cost", "available", "buy", "order",
    "discount", "shipping", "stock", "delivery"
  ];

  if (angrySignals.some(s => lower.includes(s))) return "angry";
  if (salesSignals.some(s => lower.includes(s))) return "sales";
  return "normal";
}
```
This costs zero tokens, runs in microseconds, and is accurate for ~95% of real support messages. The mode selector lets users override it manually when the message is ambiguous.
The API Route
The full AI logic lives in one clean Next.js API route:
```javascript
// app/api/generate/route.js
import { NextResponse } from "next/server";
import Groq from "groq-sdk";
// Import path is illustrative — wherever your helpers live
import { detectUserIntent, buildSystemPrompt } from "@/lib/prompts";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

export async function POST(req) {
  const { message, tone, language, mode, maxTokens } = await req.json();

  if (!message?.trim()) {
    return NextResponse.json({ error: "Message required" }, { status: 400 });
  }

  // Resolve intent: use the mode override or auto-detect
  const intent = mode === "auto" ? detectUserIntent(message) : mode;

  // Build the structured prompt per intent x tone x language
  const systemPrompt = buildSystemPrompt(intent, tone, language);

  // Groq inference
  const response = await groq.chat.completions.create({
    model: "llama-3.1-8b-instant",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: message }
    ],
    temperature: 0.3,
    max_tokens: maxTokens || 180
  });

  const reply = response.choices[0].message.content.trim();

  return NextResponse.json({
    success: true,
    data: { reply, intent, tone, language, mode }
  });
}
```
No middleware. No over-engineering. The entire intelligence is in buildSystemPrompt().
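For completeness, here is a hedged sketch of how the UI might call this route. The endpoint path and response shape match the route above; the default options and error handling are my assumptions:

```javascript
// Minimal client-side helper for the /api/generate route shown above.
// Defaults (tone, language, mode) are illustrative, not ReplyJet's actual ones.
async function generateReply(message, opts = {}) {
  const { tone = "friendly", language = "ar", mode = "auto" } = opts;

  const res = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, tone, language, mode })
  });

  if (!res.ok) throw new Error(`Request failed: ${res.status}`);

  // The route wraps its payload as { success, data: { reply, ... } }
  const { data } = await res.json();
  return data.reply;
}
```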
Why Groq?
I tested three providers running the same model (llama-3.1-8b):
| Provider | Avg Latency | Cost per 1M tokens |
| --- | --- | --- |
| Groq | ~380ms | Free tier available |
| Together AI | ~900ms | $0.18 |
| OpenRouter | ~1,100ms | $0.055 |
For a tool where users expect instant feedback — typing a message and seeing a reply appear — Groq's speed was the deciding factor. The free tier covers development and early production comfortably.
Supported Modes
| Mode | Trigger | What the reply does |
| --- | --- | --- |
| Auto | Default | Detects intent from keywords |
| Complaint | Angry customer | Structured apology → action → order request |
| Close Sale | Price / availability questions | Value first → CTA |
| Follow Up | Manual re-engagement | Warm message → soft CTA |
5 Things I Learned Building This
1. Temperature is the most important parameter.
At temp=0.7, the model gets creative and unpredictable. At temp=0.3, it stays on-structure. For customer support, consistency beats creativity every time.
2. Forced structure beats description.
"Sound natural and professional" produces inconsistent output. Numbered steps with explicit sentence starters produce reliable, human-sounding output.
3. Intent detection should be offline-first.
Using an LLM to classify intent wastes tokens and adds 300–500ms of latency. A keyword scan handles 95% of cases at zero cost.
4. Short replies win.
I cap angry responses at 180 tokens. A WhatsApp customer reading a reply wants 2–3 sentences max. Longer = ignored, or worse, escalates the anger.
5. One API route is enough.
I started with a service layer, middleware, and multiple handlers. I ended up with one clean route.js. Simpler code, faster debugging, easier deployment.
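The token caps from lesson 4 fit in a small lookup table. Only the 180-token cap for angry replies is stated above; the other values are illustrative, chosen to stay inside the 60-400 range from the architecture diagram:

```javascript
// Hypothetical per-intent token caps. angry: 180 is from the post;
// the rest are placeholder values within the documented 60-400 range.
const TOKEN_CAPS = { angry: 180, sales: 120, normal: 100, follow_up: 60 };

// Fall back to the strictest documented cap for unknown intents.
const capFor = (intent) => TOKEN_CAPS[intent] ?? 180;
```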
Roadmap
- [x] Intent detection + structured reply generation (v1.0.0)
- [x] Regenerate, History, Settings, Keyboard shortcuts (v1.1.0)
- [x] Mode selector: Complaint / Close Sale / Follow Up (v1.2.0)
- [ ] Templates library + saved reply sets (v1.3.0)
- [ ] Chrome extension + WhatsApp / Facebook integration (v2.0.0)
Try It / Contribute
The full codebase is on GitHub: github.com/SamoTech/ReplyJet
PRs are welcome. If you're building anything in the customer support or e-commerce space in Arabic-speaking markets, I'd love to hear what you're working on.
Built by Ossama Hashim · SamoTech · Cairo
Tags: #AI #NextJS #Groq #PromptEngineering #CustomerSupport #OpenSource #JavaScript #LLM