Architecting a Two-Stage Semantic Search Pipeline

May 16 • 8 min read

AI agents are becoming a new interface for finding people. Instead of opening a marketplace and manually filtering profiles, a user can now say: "Help me find a few SaaS founders who might need my backend architecture services" or "Find remote Rust freelancers who have experience with early-stage infrastructure products."

In Opportunity Skill, the user's AI agent turns that request into a semantic search query, calls the QuestMeet backend, receives a compact list of matched candidates, and drafts tailored collaboration proposals. This post walks through the backend search function behind that flow.

The search engine combines PostgreSQL, pgvector cosine distance, HNSW indexes, tag-level semantic recall, active-user filtering, cubic similarity scoring, LATERAL JOIN impression reranking, and separate buyer/professional identity perspectives. The goal: given a natural-language request from an AI agent, return the candidates worth contacting, together with enough semantic context for the agent to explain the match and write a good proposal.

Context: What Opportunity Skill does

Opportunity Skill makes a user discoverable to other agents and works with agent products such as Claude Code and OpenClaw. It has four interconnected processes: authentication, impression management, search and contact, and lead engagement. This article focuses on the search and contact process.

When the user asks their agent to find buyers or professionals, the agent calls ai_search_buyers or ai_search_professionals. These functions communicate with QuestMeet through GraphQL. Return values have clear semantics for the agent: a list of dicts means relevant candidates were found; an empty list means the request succeeded but nothing matched; None means the token is expired (agent should re-authenticate); False means something failed (notify user and stop). The agent, not the server, owns the workflow.
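To make the return-value contract concrete, here is a minimal sketch of how an agent-side wrapper might branch on it. The function name next_action and the action labels are hypothetical, chosen only for illustration; the four return shapes are the ones described above.

```python
def next_action(result) -> str:
    """Map a search return value to the agent's next step (labels are illustrative)."""
    # Check the sentinels by identity first: `not result` is true for None,
    # False, and [] alike, which would blur three distinct outcomes.
    if result is None:
        return "reauthenticate"          # token expired
    if result is False:
        return "notify_user_and_stop"    # something failed
    if isinstance(result, list):
        # An empty list is a successful request with no matches.
        return "present_candidates" if result else "report_no_matches"
    raise TypeError(f"unexpected search result: {result!r}")
```

The identity checks matter because an empty list, None, and False are all falsy in Python, so a plain truthiness test would not be able to tell "no matches" from "re-authenticate" or "stop".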

GraphQL entry points and identity separation

On the server side, both search fields wrap the same internal function, differing only in the perspective argument: "Buyer" for searching employers/clients, "Professional" for searching freelancers/employees. This distinction is not cosmetic. The same person can be both — a founder may want to hire developers while also being discoverable as a product consultant. These identities should not share the same matching context. Each user has two external candidate IDs (professional_id and buyer_id), and the API returns the appropriate one as candidate_id without the agent needing to know which internal column was used.
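A small sketch of the ID selection, assuming the perspective strings are exactly "Buyer" and "Professional" as above. The UserIds class and candidate_id_for helper are hypothetical names, not the production code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserIds:
    """Hypothetical holder for a user's two external candidate IDs."""
    professional_id: str
    buyer_id: str


# Which external ID to expose for each search perspective.
_ID_FIELD = {"Buyer": "buyer_id", "Professional": "professional_id"}


def candidate_id_for(user: UserIds, perspective: str) -> str:
    """Return the identity-specific ID that the API would expose as candidate_id."""
    try:
        field = _ID_FIELD[perspective]
    except KeyError:
        raise ValueError(f"unknown perspective: {perspective!r}") from None
    return getattr(user, field)
```

The agent only ever sees candidate_id, so the choice of internal column stays a server-side detail.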

The data model

The search touches five tables: users, logins, tags, impressions, and impression_tags. The core idea is that the user's profile is not just a human-readable display profile, but a set of impressions written for AI agents to search and reason over. An impression is a structured statement about a user's expertise, collaboration style, communication preference, leadership style, taste, or requirements. Each impression is associated with 1–5 tags, which are embedded into vector space and used as a lightweight semantic recall layer.

Both tags and impressions carry two vector columns: odd_embedding and even_embedding, each 1536-dimensional, with a constraint that exactly one is present. This dual-column design supports embedding model rotation without downtime.
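The "exactly one is present" rule can be illustrated in application code as follows. This is only a sketch: in the real schema the rule is a database constraint, and the helper name populated_column is an assumption.

```python
from typing import Optional, Sequence

EMBEDDING_DIM = 1536  # dimensionality stated in the article


def populated_column(
    odd_embedding: Optional[Sequence[float]],
    even_embedding: Optional[Sequence[float]],
) -> str:
    """Return the name of the populated column; raise unless exactly one is set."""
    has_odd = odd_embedding is not None
    has_even = even_embedding is not None
    if has_odd == has_even:  # both set, or both missing
        raise ValueError("exactly one of odd_embedding / even_embedding must be set")
    if has_odd:
        name, vector = "odd_embedding", odd_embedding
    else:
        name, vector = "even_embedding", even_embedding
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"{name} must have {EMBEDDING_DIM} dimensions")
    return name
```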

The full search pipeline at a glance

The pipeline proceeds through these stages: vectorize the natural-language query → search tags using pgvector cosine distance (keeping tags with distance ≤ 0.4, limited to 100) → map matched tags back to users via impression_tags (only public impressions, only users active in the last month) → score users by summing the cube of tag similarity (similarity³) → take the top 100 users via heapq.nlargest → exclude the current user → rerank each candidate's impressions using LATERAL JOIN (keeping impressions with distance ≤ 0.28, up to 10 per user) → return name, badges, candidate_id, description, and impressions with creation dates.

Step-by-step architecture

Step 1: Auth guard

The internal function checks info.context["user_id"], populated after access token verification. If missing, it returns None. The server does not attempt to redirect — it only tells the agent "you are not authenticated." The skill then instructs the agent to re-authenticate, store the new token, and retry. This keeps the backend simple and makes the agent responsible for workflow recovery.
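As a rough sketch of the guard, assuming the context behaves like a dict and that the rest of the pipeline is passed in as a callable (search_guard and run_search are hypothetical names):

```python
from typing import Any, Callable, Mapping


def search_guard(context: Mapping[str, Any], run_search: Callable[[Any], Any]) -> Any:
    """Return None when no verified user is present; otherwise run the search."""
    user_id = context.get("user_id")  # populated only after token verification
    if user_id is None:
        return None  # tells the agent "you are not authenticated"
    return run_search(user_id)
```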

Step 2: Vectorize the query

The user's natural-language request is embedded into a 1536-dimensional vector before querying PostgreSQL.

Step 3: Tag-level semantic recall

The first database query searches the tags table using pgvector's cosine distance operator (<=>), where distance = 1 − cosine_similarity. The filter distance <= 0.4 means cosine similarity ≥ 0.6 — intentionally not too strict, since this is the recall stage. The LIMIT 100 prevents broad queries from pulling too many tags into the next stage. HNSW indexes on both embedding columns (with m = 32, ef_construction = 128) keep semantic tag recall fast as the vocabulary grows.
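To show what the distance filter does, here is a brute-force sketch of tag recall in plain Python. It is not how the database executes the query (pgvector uses the HNSW index), and the nearest-first ordering before the limit is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity the <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recall_tags(
    query_vec: Sequence[float],
    tags: Dict[str, Sequence[float]],
    max_distance: float = 0.4,
    limit: int = 100,
) -> List[Tuple[str, float]]:
    """Return (tag_id, distance) pairs within max_distance, nearest first."""
    scored = [(tag_id, cosine_distance(query_vec, vec)) for tag_id, vec in tags.items()]
    hits = [(tag_id, d) for tag_id, d in scored if d <= max_distance]
    hits.sort(key=lambda pair: pair[1])
    return hits[:limit]
```

With a query of [1, 0], a tag at [1, 1] has cosine similarity of about 0.707 (distance about 0.293) and passes, while an orthogonal tag at [0, 1] has distance 1.0 and is dropped.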

Step 4: Map tags back to active public users

The server maps matched tag IDs back to users through impression_tags, which acts as an inverted index from tags to the users whose public impressions carry them. Three conditions apply together: the impression is public (is_public = TRUE), the user has logged in within the last month (checked via a join against logins), and the user has at least one matching tag (guaranteed by joining on impression_tags). A partial index on impression_tags (tag_id, user_id) WHERE is_public IS TRUE keeps this reverse lookup efficient.
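The same filtering can be sketched in memory. The row shapes here are assumptions made for illustration, "the last month" is approximated as 30 days, and this sketch deduplicates (tag_id, user_id) pairs, which the real query may or may not do.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def map_tags_to_users(
    tag_distances: Dict[str, float],
    impression_tags: Iterable[Tuple[str, str, bool]],  # (tag_id, user_id, is_public)
    last_login: Dict[str, datetime],
    now: datetime,
    window: timedelta = timedelta(days=30),  # assumption: "last month" ~ 30 days
) -> Set[Tuple[str, str]]:
    """Return (tag_id, user_id) pairs for matched tags on public impressions of active users."""
    cutoff = now - window
    pairs = set()
    for tag_id, user_id, is_public in impression_tags:
        if tag_id not in tag_distances:   # tag was not recalled in step 3
            continue
        if not is_public:                 # private impressions never match
            continue
        seen = last_login.get(user_id)
        if seen is None or seen < cutoff:  # inactive or never logged in
            continue
        pairs.add((tag_id, user_id))
    return pairs
```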

Step 5: Cubic similarity scoring

For every matched (tag_id, user_id) pair, the server converts cosine distance back to similarity (1.0 − distance) and adds the cube of that similarity to the user's score. Why cube? Because weak semantic matches should not dominate the ranking just because they are numerous. After the distance ≤ 0.4 filter, similarity ranges from 0.6 to 1.0. Compare: a similarity of 0.99 stays strong after cubing (about 0.97), while 0.60 drops to 0.216. A candidate with strong matches on "TypeScript," "Type Safety," and "Software Architecture" should outrank someone with only broad matches like "JavaScript" and "Web Development." Cubic scoring makes that more likely without completely discarding weaker supporting signals.
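A minimal sketch of the scoring loop, assuming the matched pairs and tag distances are already in memory (score_users and its argument shapes are illustrative):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def score_users(
    pairs: Iterable[Tuple[str, str]],  # (tag_id, user_id)
    tag_distances: Dict[str, float],   # tag_id -> cosine distance to the query
) -> Dict[str, float]:
    """Sum similarity cubed per user over all matched tags."""
    scores = defaultdict(float)
    for tag_id, user_id in pairs:
        similarity = 1.0 - tag_distances[tag_id]
        scores[user_id] += similarity ** 3
    return dict(scores)


# One strong match versus four weak ones:
#   cubic:  0.99**3 ~= 0.970   vs   4 * 0.6**3 = 0.864   -> the strong match wins
#   linear: 0.99               vs   4 * 0.6    = 2.4     -> the weak matches would win
distances = {"strong": 0.01, "w1": 0.4, "w2": 0.4, "w3": 0.4, "w4": 0.4}
pairs = [("strong", "alice")] + [(t, "bob") for t in ("w1", "w2", "w3", "w4")]
scores = score_users(pairs, distances)
```

In this example a single near-exact tag match outranks four borderline ones, which is the behavior the cube is meant to produce.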

Step 6: Keep the top 100 candidates

heapq.nlargest(100, ...) selects the top-scoring users without fully sorting the entire candidate set — cheaper for large sets. The current user is excluded, so the final count may be fewer than 100. This list is not the final result; it is a small candidate set for impression-level reranking.
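A short sketch of this step, following the order described above (top-k first, then drop the current user). The function name and argument shapes are assumptions.

```python
import heapq
from typing import Dict, List


def top_candidates(scores: Dict[str, float], current_user_id: str, k: int = 100) -> List[str]:
    """Return up to k user IDs by descending score, excluding the searcher."""
    # nlargest keeps a heap of size k instead of sorting every scored user.
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    # Excluding after selection means the result can be shorter than k.
    return [user_id for user_id, _ in top if user_id != current_user_id]
```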

Step 7: Impression-level reranking with LATERAL JOIN

Tag-level recall is fast and broad, but final results should be based on actual impression text. This is where LATERAL JOIN shines.

The server passes the candidate user IDs (up to 100) as an array and uses unnest to turn them into rows, processing all candidates in one query instead of up to 100 round trips. A LATERAL JOIN runs a per-user subquery that can reference the outer query's users.user_id. For each candidate, the database selects that user's most relevant public impressions under the requested perspective. Two filters apply inside the subquery: the perspective parameter prevents identity leakage (hiring preferences should not affect freelancer rankings), and distance <= 0.28 keeps only impressions that are semantically close to the query. If none of a candidate's impressions pass this threshold, the lateral subquery returns no rows and the candidate is filtered out. The pipeline therefore has two quality gates overall: tag-level recall and impression-level verification.
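The effect of the lateral subquery can be imitated in plain Python. This is an in-memory sketch rather than the SQL itself: the dict keys are hypothetical, and each impression's distance to the query is assumed to be precomputed.

```python
from typing import Dict, List


def rerank_impressions(
    candidate_ids: List[str],
    impressions: Dict[str, List[dict]],  # user_id -> impression rows
    perspective: str,
    max_distance: float = 0.28,
    per_user: int = 10,
) -> Dict[str, List[dict]]:
    """For each candidate, keep the closest public impressions under one perspective."""
    results = {}
    for user_id in candidate_ids:  # plays the role of unnest(candidate_ids)
        hits = [
            imp
            for imp in impressions.get(user_id, [])
            if imp["is_public"]
            and imp["perspective"] == perspective
            and imp["distance"] <= max_distance
        ]
        if not hits:
            continue  # no qualifying rows: the candidate drops out of the result
        hits.sort(key=lambda imp: imp["distance"])
        results[user_id] = hits[:per_user]
    return results
```

The loop body corresponds to the per-user subquery, and skipping users with no hits mirrors how a candidate with no qualifying impressions disappears from the joined result.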

Up to 10 impressions per candidate are returned, aggregated into a compact Markdown list with creation dates. This format is intentionally simple — the consumer is an AI agent, not a frontend component.
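The exact output format is not shown here, so the following is only a guess at what a compact Markdown list with creation dates could look like. The bullet layout and the ISO date format are assumptions.

```python
from datetime import date
from typing import Iterable, Tuple


def format_impressions(impressions: Iterable[Tuple[str, date]]) -> str:
    """Render (text, created_on) pairs as one Markdown bullet per impression."""
    return "\n".join(f"- {text} ({created.isoformat()})" for text, created in impressions)
```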

The query also includes AND users.is_shadow IS FALSE to handle a multi-node architecture where foreign key placeholders (shadow data) exist across regions for data-residency compliance. This ensures only actual, localized user records appear in results.

Step 8: Return an agent-readable payload

The final payload contains name, badges (subscription/trust markers), candidate_id (identity-specific), description, and impressions (up to 10 query-relevant statements with dates). The agent uses this evidence to explain matches and draft personalized proposals. After user confirmation, it calls ai_contact_candidate in parallel for each selected candidate. The server finds the right people; the agent says the right thing.
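For readers who think in types, the payload might be modeled roughly as below. The field names come from the description above; the field types, the class name, and the summarize helper are assumptions.

```python
from typing import List, TypedDict


class CandidatePayload(TypedDict):
    """Assumed shape of one search result as seen by the agent."""
    name: str
    badges: List[str]   # subscription/trust markers
    candidate_id: str   # identity-specific external ID
    description: str
    impressions: str    # Markdown list, up to 10 query-relevant statements


def summarize(candidate: CandidatePayload) -> str:
    """One-line summary an agent could show before drafting a proposal."""
    count = sum(1 for line in candidate["impressions"].splitlines() if line.startswith("- "))
    return f'{candidate["name"]} ({candidate["candidate_id"]}): {count} relevant impressions'
```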

Handling two embedding models

The production code supports a two-model mode for embedding rotation. When two models are configured, the query is embedded with both, and both vector columns are searched. Tag matches from both embedding spaces contribute to the same scoring dictionary. The impression reranking query uses UNION ALL across both embedding columns, then filters combined results with the same distance <= 0.28 threshold. This makes the search tolerant of data encoded with either model, useful during migration.
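A sketch of how matches from both embedding spaces can feed one scoring dictionary, assuming each space yields (user_id, distance) pairs (merge_scores is a hypothetical name):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def merge_scores(
    matches_by_model: Iterable[Iterable[Tuple[str, float]]],
) -> Dict[str, float]:
    """Accumulate cubic similarity per user across all embedding spaces."""
    scores = defaultdict(float)
    for matches in matches_by_model:       # one iterable per embedding column
        for user_id, distance in matches:  # one entry per matched (tag, user) pair
            scores[user_id] += (1.0 - distance) ** 3
    return dict(scores)
```

Because each row carries exactly one of the two embeddings, a given tag can only match in one of the spaces, so summing across both should not count the same tag twice.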

Why not search all impressions globally?

A simpler design would embed the query and run a global vector search over all impressions. But impressions are longer and more numerous than tags — searching the entire impression table globally would make the expensive part of the pipeline happen too early. The two-stage approach uses tags as a lightweight proxy for semantic recall, reserving impression-level search for after the candidate set is reduced to 100 users. This split is not just a performance optimization: tags help identify relevant candidates, and impressions help explain why they are a good fit.

Relevant indexes

HNSW indexes on tags for both embedding columns (m = 32, ef_construction = 128); a composite index on logins (updated_at, user_id) for active-user filtering; a partial index on impression_tags (tag_id, user_id) WHERE is_public IS TRUE for public tag-to-user lookup; and a composite index on impressions (user_id, perspective) for per-user impression retrieval. The impressions table is partitioned by user_id, keeping per-user retrieval predictable as the table grows.

Engineering takeaways

AI-agent APIs should return reasoning context, not just records. The response includes query-relevant impressions because the agent needs them to explain matches and write personalized messages.
Semantic tags are a useful middle layer. Raw keyword search is too brittle; global impression vector search is too expensive too early. Tags provide a compact recall layer.
Weak semantic matches should not dominate rankings. Linear scoring lets many weak matches overpower fewer strong ones. Cubic scoring is a simple fix.
Separate identity perspectives matter. Mixing buyer and professional impressions would create strange matches. Keeping buyer_id, professional_id, and perspective separate keeps the search context clean.
Auth failure should be part of the function contract. Returning None for expired tokens and False for other failures lets the agent respond appropriately — re-authenticate versus stop.
Two-stage search is a design pattern, not just an optimization. Tags for recall, impressions for precision. The split improves both performance and result quality.

Closing

Opportunity Skill is built on the belief that in the AI-agent era, your profile should not only be readable by humans — it should be searchable, interpretable, and actionable by agents. The search function described here turns a natural-language request into semantically matched candidates, identity-aware IDs, relevant profile evidence, and compact context that an AI agent can use to draft a proposal. Together with Impression Management and Lead Engagement, it forms a self-reinforcing loop where every preference you reveal makes you more precisely discoverable.

If you want to try the skill, ask your agent to install it from: https://github.com/QuestMeet/opportunityskill/releases/download/latest/opportunity-skill.zip

Siyu Wang

San Francisco, CA / Beijing, China • questmeet.ai

Founder of QuestMeet. Full-stack builder. Deep into PostgreSQL, backend architecture, semantic search, and agent skills. Former VC & management consultant.
