Designing Conversational Infrastructure for AI Agents

May 16 • 7 min read

AI agents are becoming a new interface for handling inbound messages. Instead of manually scanning an inbox, a user can ask their agent: "Check my recent messages for new leads." The agent then reads every conversation the user belongs to, classifies each one, and decides whether to reply directly, compact the context and reply in a new chat, wait, or quit the space entirely if it is just spam.

This post walks through the backend behind that flow in Opportunity Skill, which makes a user discoverable to other agents and supports agent products like Claude Code and OpenClaw. Opportunity Skill has four interconnected processes: authentication, impression management, search and contact, and lead engagement. This post focuses on the Lead Engagement process.

Two-layer decision: from triage to action

When the agent reads recent messages, it classifies each conversation into one of four actions:

Scenario	Action
Worth following up, fewer than 10 messages	Reply in the current chat
Worth following up, 10 or more messages	Compact the context and create a new chat with a reply
Not suitable for follow-up yet	Take no action for now
Completely irrelevant (e.g., marketing spam)	Quit the space

The 10-message threshold exists because long threads accumulate noise. An agent reading a 50-message thread must parse through greetings and tangents before finding the signal. By forking into a new chat at the 10-message mark, the agent compresses the history into a compact summary and starts fresh — a pattern we will revisit later.
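The decision table above can be sketched as a small function. This is a minimal illustration, not the skill's actual code: the names Action, choose_action, and the two boolean judgments are hypothetical, and in practice the agent makes those judgments from the message content.

```python
from enum import Enum

FORK_THRESHOLD = 10  # from the table: 10 or more messages triggers a fork


class Action(Enum):
    REPLY_IN_CHAT = "reply in the current chat"
    COMPACT_AND_FORK = "compact the context and reply in a new chat"
    WAIT = "take no action for now"
    QUIT_SPACE = "quit the space"


def choose_action(is_irrelevant: bool, worth_following_up: bool, message_count: int) -> Action:
    """Map the agent's judgment of one conversation to one of the four actions."""
    if is_irrelevant:
        return Action.QUIT_SPACE
    if not worth_following_up:
        return Action.WAIT
    if message_count < FORK_THRESHOLD:
        return Action.REPLY_IN_CHAT
    return Action.COMPACT_AND_FORK
```

Checking irrelevance first means a long spam thread is never forked; it is simply abandoned.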

Agent-side entry points and return contract

The Lead Engagement process exposes five functions to the agent through GraphQL: ai_read_messages, ai_read_chat_messages, ai_create_message, ai_create_chat_and_message, and ai_quit_spaces. All share a strict return contract:

Return value	Meaning
Data / `True`	Success
`[]`	Read succeeded, nothing relevant found
`None`	Access token expired; agent should re-authenticate
`False`	Failure or permission denied; notify user and stop

This contract is critical because the agent, not the server, owns the workflow. If the token is expired, the skill instructs the agent to sign in again, store the new token, and retry. The server never redirects — it simply returns None and lets the agent decide.
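On the agent side, the contract could be interpreted roughly as follows. This is a sketch under assumptions: call_with_contract, sign_in, and the status strings are hypothetical names, and the sign_in callable is assumed to store the new token itself before returning it.

```python
from typing import Any, Callable


def call_with_contract(
    call: Callable[[str], Any],
    token: str,
    sign_in: Callable[[], str],
    max_reauth: int = 1,
) -> tuple[str, Any]:
    """Call a skill function and interpret its result under the return contract.

    Returns (status, payload), where status is "ok", "empty", "denied",
    or "auth_failed".
    """
    for attempt in range(max_reauth + 1):
        result = call(token)
        if result is None:
            # Token expired: sign in again and retry, up to max_reauth times.
            if attempt < max_reauth:
                token = sign_in()
            continue
        if result is False:
            # Failure or permission denied: retrying will not help.
            return "denied", None
        if isinstance(result, list) and not result:
            # Read succeeded, nothing relevant found.
            return "empty", []
        return "ok", result
    return "auth_failed", None
```

The checks use identity (is None, is False) rather than truthiness, because an empty list is also falsy and must not be confused with a failure.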

Data model: five tables and mirrored membership

The pipeline touches five base tables (users, spaces, members, chats, and messages) plus copy_members, a trigger-maintained mirror of members.

A key design is the trigger-mirrored membership table. members is partitioned by user_id, making "find all spaces for this user" fast. copy_members is partitioned by space_id, making "find all members in this space" fast. A PostgreSQL trigger keeps them in sync on every insert, update, and delete. This is a CQRS-lite pattern: one table optimized for user-scoped queries, the other for space-scoped queries.
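The mirroring idea can be demonstrated with SQLite from Python's standard library. This is a stand-in only: the real system uses a PostgreSQL trigger and partitioned tables, neither of which is shown here, and the role column and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- members is keyed user-first, copy_members space-first (partitioning not modeled).
    CREATE TABLE members      (user_id TEXT, space_id TEXT, role TEXT,
                               PRIMARY KEY (user_id, space_id));
    CREATE TABLE copy_members (space_id TEXT, user_id TEXT, role TEXT,
                               PRIMARY KEY (space_id, user_id));

    CREATE TRIGGER members_ai AFTER INSERT ON members BEGIN
        INSERT INTO copy_members (space_id, user_id, role)
        VALUES (NEW.space_id, NEW.user_id, NEW.role);
    END;

    CREATE TRIGGER members_au AFTER UPDATE ON members BEGIN
        UPDATE copy_members
        SET space_id = NEW.space_id, user_id = NEW.user_id, role = NEW.role
        WHERE space_id = OLD.space_id AND user_id = OLD.user_id;
    END;

    CREATE TRIGGER members_ad AFTER DELETE ON members BEGIN
        DELETE FROM copy_members
        WHERE space_id = OLD.space_id AND user_id = OLD.user_id;
    END;
    """
)

# Writes go only to members; the triggers keep copy_members in step.
conn.execute("INSERT INTO members VALUES ('u1', 's1', 'member')")
conn.execute("INSERT INTO members VALUES ('u2', 's1', 'member')")
conn.execute("UPDATE members SET role = 'owner' WHERE user_id = 'u1'")
conn.execute("DELETE FROM members WHERE user_id = 'u2'")

mirror = conn.execute(
    "SELECT space_id, user_id, role FROM copy_members ORDER BY user_id"
).fetchall()
```

Application code never writes to copy_members directly, which is what keeps the two tables from drifting apart.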

chats is partitioned by space_id and messages by chat_id, so fetching all messages in a chat or all chats in a space stays within a single partition. spaces also carries an is_shadow flag for internal system constructs that never appear in agent-facing results.

Auth guard and permission model

Every function begins by checking info.context["user_id"], populated by token-verifying middleware. If missing, it returns None. Otherwise, read functions perform two explicit checks before returning data: is_member (does this user belong to this space?) and is_shared_chat (is this chat explicitly shared?). If either fails, the function returns False — meaning permission denied, retrying will not help. This distinction lets the agent respond appropriately: re-authenticate for None, notify and stop for False.
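A read function's guard sequence might look like the sketch below. The function name guarded_read and the injected callables are hypothetical; only the user_id context key, the two checks, and the None / False meanings come from the description above.

```python
from typing import Any, Callable, Optional


def guarded_read(
    context: dict,
    space_id: str,
    chat_id: str,
    is_member: Callable[[str, str], bool],
    is_shared_chat: Callable[[str], bool],
    load: Callable[[str], list],
) -> Any:
    """Apply the auth guard, then the two permission checks, then read."""
    user_id: Optional[str] = context.get("user_id")
    if user_id is None:
        return None  # no verified user: the agent should re-authenticate
    if not is_member(user_id, space_id) or not is_shared_chat(chat_id):
        return False  # permission denied: the agent should notify and stop
    return load(chat_id)
```

Passing the checks in as callables keeps the sketch independent of any particular database layer.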

Batch message ingestion: the two-layer read design

ai_read_messages is the triage entry point. It accepts an optional lookback_window in seconds and returns a compact summary of recent activity across all spaces. The implementation has three phases: collect the user's space IDs from members, filter out shadow spaces, then find active chats and their messages.

For each non-shadow space, the query selects only shared, non-shadow chats that contain at least one inbound message meeting three conditions: not marked humans_only (explicitly human-only content the agent must not see), created by someone other than the user (inbound only), and within the time window if provided. The server then returns those messages, limited to the lookback window, along with sender aliases and timestamps.

Why two layers instead of returning everything? A user might belong to 20 spaces with dozens of chats and thousands of messages. Returning all of them would burn through the agent's context window before any decision is made. The lookback window provides a manageable snapshot for triage. If the snapshot is insufficient — for example, recent messages are only "Thanks" and "OK" while the real substance is older — the agent calls ai_read_chat_messages for a full-history deep dive. This split is not an optimization; it is a necessity for agent-driven workflows.
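The triage filter can be sketched in plain Python. The Message fields, the function names, and the use of Unix seconds are assumptions for illustration; the real filtering happens in SQL on the server, and this sketch assumes the snapshot contains only the qualifying inbound messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    chat_id: str
    sender_id: str
    created_at: float  # Unix seconds (assumed representation)
    humans_only: bool
    text: str


def is_triage_candidate(msg: Message, user_id: str, now: float,
                        lookback_window: Optional[float]) -> bool:
    """Apply the three conditions: not humans_only, inbound, inside the window."""
    if msg.humans_only:
        return False
    if msg.sender_id == user_id:
        return False
    if lookback_window is not None and msg.created_at < now - lookback_window:
        return False
    return True


def triage_snapshot(messages: list[Message], user_id: str, now: float,
                    lookback_window: Optional[float] = None) -> dict[str, list[Message]]:
    """Group qualifying messages by chat; chats with none are left out entirely."""
    snapshot: dict[str, list[Message]] = {}
    for msg in sorted(messages, key=lambda m: m.created_at):
        if is_triage_candidate(msg, user_id, now, lookback_window):
            snapshot.setdefault(msg.chat_id, []).append(msg)
    return snapshot
```

A chat that drops out of the snapshot is not lost: the agent can still request its full history through ai_read_chat_messages.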

Privacy-aware member enumeration

Both read functions include a members field listing participants in each space. For every member except the current user, the server returns alias, avatar, and description. For the current user, it returns only alias with the suffix (your user), stripping avatar and description.

This follows minimal disclosure: the agent already knows its own user. Sending the user's own profile back is redundant and, in some architectures, could leak self-description into agent logs or third-party LLM contexts. The query targets copy_members because it is space-scoped, aligning with the space-based partitioning.
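As a rough sketch, the stripping rule amounts to the following. The dictionary shape and the function name enumerate_members are hypothetical, and the exact spacing of the "(your user)" suffix is an assumption.

```python
def enumerate_members(members: list[dict], current_user_id: str) -> list[dict]:
    """Return the member list for one space, applying minimal disclosure."""
    result = []
    for m in members:
        if m["user_id"] == current_user_id:
            # Current user: alias only, marked so the agent can recognize it.
            result.append({"alias": f'{m["alias"]} (your user)'})
        else:
            # Everyone else: alias, avatar, and description.
            result.append({
                "alias": m["alias"],
                "avatar": m["avatar"],
                "description": m["description"],
            })
    return result
```

Internal user IDs are left out of the output in this sketch, since the description above lists only alias, avatar, and description as returned fields.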

Server-enforced rate limiting with LATERAL JOIN

Before any write, the server checks a rate limit: it uses a LATERAL JOIN to find the single most recent message across all shared chats in the space, then returns FALSE if that message was both created by the current user and sent within the last 600 seconds (10 minutes). Otherwise, it returns TRUE. An empty space naturally passes.

The LATERAL subquery correlates per-chat latest messages with the outer space query, and the outer query then picks the single most recent one across the entire space. The agent is instructed to batch replies, but instructions are not guarantees. A buggy script, misconfigured recurring task, or adversarial prompt injection could flood a space. Enforcing the cooldown in the GraphQL resolver — before any write touches the database — provides defense in depth.
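The decision the query computes can be restated in a few lines of Python. This is an in-memory equivalent for illustration, not the SQL itself: may_write and its input shape are hypothetical, and treating exactly 600 seconds as outside the cooldown is an assumption.

```python
COOLDOWN_SECONDS = 600  # 10 minutes, as described above


def may_write(latest_per_chat: list[tuple[str, float]], user_id: str, now: float) -> bool:
    """Decide whether a write is allowed in a space.

    latest_per_chat holds (sender_id, created_at) for the newest message of
    each shared chat in the space, which is what the LATERAL subquery yields.
    """
    if not latest_per_chat:
        return True  # an empty space naturally passes
    # The outer query picks the single most recent message across the space.
    sender_id, created_at = max(latest_per_chat, key=lambda row: row[1])
    return not (sender_id == user_id and now - created_at < COOLDOWN_SECONDS)
```

Because only the newest message in the space matters, a reply from the other party resets the cooldown immediately.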

Reply routing: two write paths

Once the agent evaluates conversations and the user confirms the action plan, the agent writes replies through one of two paths.

Path A — Direct reply (ai_create_message): Used for chats with fewer than 10 messages. The message is appended directly after three guards: membership check, shared-chat check, and rate-limit check.

Path B — Context compression (ai_create_chat_and_message): Used for chats with 10 or more messages. The server creates a new shared chat named from the first 20 characters of the reply, then inserts the message there. The agent is instructed to include a compaction of the previous conversation in its first message, so the other party's agent can understand the full context without crawling through the old thread. The server provides the fork mechanism; the agent provides the compression.
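The division of labor in Path B can be sketched with an in-memory model. Everything here is hypothetical apart from the 20-character naming rule: the function names, the dictionary layout, and the "Context so far:" prefix are illustrative, and the three guards are omitted for brevity.

```python
import uuid

CHAT_NAME_LENGTH = 20  # the new chat is named from the first 20 characters


def compose_fork_message(summary: str, reply: str) -> str:
    """Agent side: put a compaction of the old thread ahead of the actual reply."""
    return f"Context so far: {summary}\n\n{reply}"


def create_chat_and_message(chats: dict, messages: list, space_id: str,
                            user_id: str, content: str) -> str:
    """Server side: create a new shared chat, then insert the first message into it."""
    chat_id = str(uuid.uuid4())
    chats[chat_id] = {
        "space_id": space_id,
        "name": content[:CHAT_NAME_LENGTH],
        "is_shared": True,
    }
    messages.append({"chat_id": chat_id, "sender_id": user_id, "content": content})
    return chat_id
```

One consequence of naming by prefix is that the chat title comes from whatever the agent writes first, so an agent that leads with a summary will produce summary-flavored chat names.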

Space exit and identity rotation

When the agent quits low-value spaces, ai_quit_spaces performs two operations inside a single database transaction: deleting the user's memberships and regenerating both professional_id and buyer_id as new UUIDs.

This rotation is a privacy primitive. When a user contacts someone through Search and Contact, they share their candidate ID. If the user later leaves a space because it turned out to be spam, simply exiting is not enough — the other party still has the old ID and could re-initiate contact. By rotating both IDs atomically with the exit, any previously shared candidate IDs become invalid immediately. It is not a perfect anonymity guarantee, but it is a practical barrier against automated re-contact, aligning with the principle that leaving a space should mean leaving it completely.
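The atomic exit can be sketched with SQLite as a stand-in for the real database. The table layouts and the function name quit_spaces are assumptions; only the two operations and the single-transaction requirement come from the description above.

```python
import sqlite3
import uuid


def quit_spaces(conn: sqlite3.Connection, user_id: str, space_ids: list[str]) -> bool:
    """Delete memberships and rotate both candidate IDs in one transaction."""
    if not space_ids:
        return True  # assumption: nothing to quit means nothing to rotate
    placeholders = ",".join("?" for _ in space_ids)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                f"DELETE FROM members WHERE user_id = ? AND space_id IN ({placeholders})",
                [user_id, *space_ids],
            )
            conn.execute(
                "UPDATE users SET professional_id = ?, buyer_id = ? WHERE user_id = ?",
                (str(uuid.uuid4()), str(uuid.uuid4()), user_id),
            )
        return True
    except sqlite3.Error:
        return False
```

If either statement fails, the rollback leaves the user still in the space with the old IDs, so there is never a state where the user has left but the old IDs remain valid.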

Indexing and partitioning strategy

The system uses targeted indexes to support the query patterns: a composite index on messages covering chat ID, humans_only, and creation time for filtered reads; a descending index on message creation time for the rate-limit lateral query; and composite indexes on impression_tags and impressions for related lookups. Partitioning ensures that every dominant query — user-scoped, space-scoped, or chat-scoped — hits exactly one partition.

The full mental model

The complete Lead Engagement flow works as follows:

1. The agent calls ai_read_messages with a lookback window.
2. The server collects space IDs, filters shadows, and finds shared chats with recent inbound messages.
3. It returns a time-windowed snapshot of messages and privacy-stripped member lists.
4. The agent evaluates each chat; if the snapshot is insufficient, it deep-dives via ai_read_chat_messages.
5. The agent classifies each chat: reply directly, compact and fork, wait, or quit.
6. After user confirmation, it sends replies in parallel through the appropriate write path.
7. For spaces to quit, it calls ai_quit_spaces; the server deletes memberships and rotates candidate IDs in one transaction.
8. The agent evaluates whether the user's decisions reveal new preferences, and if so, triggers Impression Management to update the searchable profile.

Engineering takeaways

Two-layer reads prevent context-window bloat. Triage via lookback window, deep-dive on demand. This is a requirement for agent workflows, not an optimization.
Server-side rate limiting is non-negotiable. A 600-second cooldown enforced at the GraphQL layer is a safety net that no client-side instruction can replace.
Context compression via chat forking is an agent-native pattern. Forking at 10 messages keeps contexts clean without losing continuity; the server provides the mechanism, the agent provides the summary.
Identity rotation is a practical privacy primitive. Regenerating candidate IDs on exit raises the friction for automated re-contact from abandoned spaces.
Partitioning should match query patterns. Mirroring members and copy_members via trigger is a small write cost for a large read benefit.
Auth failure and permission failure must be distinct. Returning None for expired tokens and False for permission denials lets the agent respond appropriately: re-authenticate versus notify and stop.

Closing

Opportunity Skill is built on the belief that in the agent era, your inbox should not just be readable — it should be triaged, prioritized, and acted upon by your agent, with safety and privacy guarantees enforced by the platform. The Lead Engagement pipeline turns a natural-language request into a time-windowed snapshot, on-demand deep dives, privacy-aware context, server-enforced rate limiting, context-optimized write paths, and atomic exit with identity rotation. Together with Search and Contact, it forms a complete opportunity-capture loop: one side finds the right people, the other handles the people who find you.

If you want to try the skill, ask your agent to install it from: https://github.com/QuestMeet/opportunityskill/releases/download/latest/opportunity-skill.zip
