VeritasLab Built a Behavioral Intelligence Layer for Solana. Here's What We Learned.


Most Web3 security tools ask: is this contract dangerous?

We asked a different question: who is the person behind it, and what have they done before?

That shift — from contract risk to actor risk — changed everything about how we built NexusVeritas.

The Problem Nobody Was Solving

Solana processes thousands of token launches daily. The existing security tools fall into three buckets:

Audit firms (CertiK, Hacken) analyze code before deployment. They tell you whether a contract is technically sound. They don't tell you whether the person who wrote it has launched 400 identical tokens in the last 30 days.

On-chain analytics (Arkham, Nansen, Bubblemaps) map money flows. They're descriptive — they tell you what happened. They don't model behavioral risk.

Anti-scam scanners (GoPlus, De.Fi) flag known honeypot patterns and approval risks. Rule-based, reactive, no memory across actors.

None of them answer the question that actually matters before you interact with a token: what kind of operator created this, and does their behavioral pattern match known scam profiles?

The Architecture: Actor Risk Engine

We built what we call an Actor Risk Engine (ARE). The pipeline is straightforward:

tokens_clean.txt
→ enrich_parallel.js (fetch on-chain data via Helius RPC)
→ operator_classify.js (behavioral classification)
→ db_insert.js (pgvector storage)

Each creator wallet gets enriched with three feature layers:

Structural — wallet age, funding sources count, funding concentration

Behavioral — total incoming SOL, transfer patterns, recycling loops

Operational — tokens created, signature density, launch frequency

Across the three layers, 25 normalized features form vector_v2, stored in PostgreSQL with pgvector for cosine similarity search.
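To make the comparison step concrete, here is a minimal sketch of how raw features could be scaled into [0, 1] and compared with cosine similarity. pgvector's `<=>` operator returns cosine distance, which is 1 minus this value. The three feature names and their ranges are illustrative assumptions, not the real 25-feature vector_v2 layout.

```javascript
// Hypothetical feature ranges used for min-max scaling (not the real vector_v2 spec).
const RANGES = {
  wallet_age_days: [0, 365],
  funding_sources: [0, 50],
  tokens_created: [0, 1000],
};

// Scale each raw feature into [0, 1], clamping values that fall outside the range.
function normalize(raw) {
  return Object.entries(RANGES).map(([name, [min, max]]) => {
    const scaled = (raw[name] - min) / (max - min);
    return Math.min(1, Math.max(0, scaled));
  });
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

const fresh = normalize({ wallet_age_days: 1, funding_sources: 1, tokens_created: 600 });
const veteran = normalize({ wallet_age_days: 300, funding_sources: 12, tokens_created: 40 });
console.log(cosineSimilarity(fresh, veteran).toFixed(3));
```

Clamping keeps outliers from stretching the scale, but it also pushes extreme wallets onto the 0 and 1 boundaries, which matters for the similarity problem described later.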

The Classification Layer

We defined 8 operator archetypes based on signal rules with confidence weights:

| Archetype | Baseline Risk | Key Signals |
| --- | --- | --- |
| WALLET_FACTORY | elevated | single token, fresh wallet, micro funding |
| WALLET_FACTORY_HUB | elevated | recycling loop + factory pattern |
| ROTATION_OPERATOR | high | split init pattern, 500+ tokens, no visible funding |
| INDUSTRIAL_DEPLOYER | neutral | 500+ tokens, 3000+ signatures, high activity |
| INFRASTRUCTURE_HUB | unknown | 50+ SOL incoming, many funding sources |
| EXCHANGE_FUNDED_DEPLOYER | unknown | hub pattern + exchange funding signals |
| PROFESSIONAL_CREATOR | low-medium | 20–500 tokens, visible funding, 7+ days active |
| CASUAL_CREATOR | low | under 20 tokens, externally funded |

Classification returns operator_class, confidence, baseline_risk, and matched_signals — the exact rules that fired. No black box.

Example output:

```json
{
  "operator_class": "ROTATION_OPERATOR",
  "confidence": 0.85,
  "baseline_risk": "high",
  "matched_signals": [
    "split_init_pattern",
    "fresh_wallet",
    "tokens_created_500_plus"
  ]
}
```
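A weighted-rule classifier that produces this output shape can be sketched in a few lines. The signal names below are taken from the example output and the archetype table. The thresholds, the per-signal weights, the restriction to two archetypes, and the confidence formula (summed weight of the signals that fired) are all assumptions for illustration, not the rules operator_classify.js actually uses.

```javascript
// Signal predicates over an enriched wallet record. Thresholds are illustrative assumptions.
const SIGNALS = {
  split_init_pattern: (w) => w.split_init_pattern === true,
  fresh_wallet: (w) => w.wallet_age_days < 7,
  tokens_created_500_plus: (w) => w.tokens_created >= 500,
  tokens_under_20: (w) => w.tokens_created < 20,
  externally_funded: (w) => w.funding_sources > 0,
};

// Two of the eight archetypes, with hypothetical per-signal weights.
const ARCHETYPES = [
  {
    operator_class: 'ROTATION_OPERATOR',
    baseline_risk: 'high',
    weights: { split_init_pattern: 0.4, fresh_wallet: 0.2, tokens_created_500_plus: 0.4 },
  },
  {
    operator_class: 'CASUAL_CREATOR',
    baseline_risk: 'low',
    weights: { tokens_under_20: 0.6, externally_funded: 0.4 },
  },
];

// Score every archetype by the summed weight of its signals that fired; keep the best.
function classify(wallet) {
  let best = null;
  for (const { operator_class, baseline_risk, weights } of ARCHETYPES) {
    const matched_signals = Object.keys(weights).filter((name) => SIGNALS[name](wallet));
    const raw = matched_signals.reduce((sum, name) => sum + weights[name], 0);
    const confidence = Math.round(raw * 100) / 100; // avoid float noise in the output
    if (!best || confidence > best.confidence) {
      best = { operator_class, confidence, baseline_risk, matched_signals };
    }
  }
  return best;
}

console.log(classify({
  split_init_pattern: true, wallet_age_days: 2, tokens_created: 800, funding_sources: 0,
}));
```

Returning matched_signals alongside the label is what keeps the result auditable: a reviewer can see which rules fired without rerunning the classifier.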

What We Got Wrong (and Fixed)

recycling_loop was a stub

For weeks, recycling_loop was hardcoded to false. The field existed in the schema, the classifier used it as a signal, but no detection logic ran.

The fix: in getFunding(), we now track both directions of SOL flow. If a wallet received SOL from address X and also sent SOL back to X within the same transaction batch — it's a loop.

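```javascript
// Assumed setup (not shown in the original excerpt): `transfers` is the wallet's
// list of native SOL transfers, with amounts in lamports.
const inc = new Map(); // funder address -> { count, sol }
const out = new Set(); // addresses this wallet sent SOL to
for (const t of transfers) {
  // incoming: someone → addr (skip transfers under 500,000 lamports = 0.0005 SOL)
  if (t.toUserAccount === addr && t.amount >= 500000) {
    const f = t.fromUserAccount;
    const e = inc.get(f) ?? { count: 0, sol: 0 };
    inc.set(f, { count: e.count + 1, sol: e.sol + t.amount / 1e9 });
  }
  // outgoing: addr → someone
  if (t.fromUserAccount === addr && t.amount >= 500000) {
    out.add(t.toUserAccount);
  }
}
// recycling_loop: received from X and sent back to X
const recycling_loop = [...inc.keys()].some(f => out.has(f));
```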

Simple set intersection. It catches the pattern whenever both legs of the loop fall inside the 50-transaction window.

Graph flood attack vector

build_graph.js builds a funding graph: which wallets funded which operators, and which wallets funded those funders. Without degree limits, a single high-volume transit wallet could anchor hundreds of nodes and create false cluster signals.

We added MAX_DEGREE_THRESHOLD = 50. Any wallet outside knownServices.json with more than 50 child nodes gets flagged HIGH_DENSITY_TRANSIT and excluded from hub scoring:

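```javascript
const highDensityNodes = new Set();
for (const [parent, children] of parentMap.entries()) {
  // Over the degree limit and not an exempt known service.
  // Note: this comparison assumes KNOWN_SET stores lowercased addresses.
  if (children.length > MAX_DEGREE_THRESHOLD && !KNOWN_SET.has(parent.toLowerCase())) {
    highDensityNodes.add(parent);
    graph.get(parent).flag = 'HIGH_DENSITY_TRANSIT';
  }
}

for (const [addr, node] of graph.entries()) {
  // Transit wallets never count as hubs.
  if (node.flag === 'HIGH_DENSITY_TRANSIT') { node.hub_score = 0; continue; }
  // Assumption: childCount is the node's child count in parentMap
  // (the original excerpt used childCount without defining it).
  const childCount = parentMap.get(addr)?.length ?? 0;
  node.hub_score = hubScore(node.creator_count, childCount, !!node.parent);
}
```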

Known services (exchanges, DEXes) are exempt — they legitimately move high volumes.

The Honest Bottleneck: Signal Quality

After classifying 565 creators, we hit a wall: median (P50) pairwise similarity = 1.0.

Cosine similarity across the full dataset was collapsing to near-identical values. candidate_discovery.js — which finds behaviorally similar operators across archetypes — was returning noise.

The cause: most vectors had too many features at boundary values (0 or 1). Not enough continuous signal variance to differentiate actors.

This is not a bug in the vector math. It's a signal quality problem. The classifier has 8 archetypes and returns sensible labels, but the underlying features don't yet carry enough entropy to power similarity search.
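The collapse is easy to reproduce on synthetic data. The sketch below computes the median pairwise cosine similarity for two made-up datasets: one where features sit at 0 or 1 and most wallets share the same pattern, and one with continuous values. The vectors are invented for illustration and are not drawn from the real dataset.

```javascript
// Cosine similarity of two equal-length vectors; returns 0 if either has zero norm.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Median (P50) of all pairwise similarities; needs at least two vectors.
function medianPairwiseSimilarity(vectors) {
  const sims = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      sims.push(cosineSimilarity(vectors[i], vectors[j]));
    }
  }
  if (sims.length === 0) return NaN;
  sims.sort((x, y) => x - y);
  const mid = Math.floor(sims.length / 2);
  return sims.length % 2 ? sims[mid] : (sims[mid - 1] + sims[mid]) / 2;
}

// Synthetic: every feature pinned at a boundary, most wallets identical.
const saturated = [
  [1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 1],
];

// Synthetic: continuous features, each wallet pointing in its own direction.
const continuous = [
  [0.9, 0.1, 0.4, 0.7], [0.2, 0.05, 0.8, 0.3], [0.6, 0.3, 0.1, 0.9],
  [0.1, 0.7, 0.5, 0.2], [0.4, 0.2, 0.9, 0.6],
];

console.log('saturated P50:', medianPairwiseSimilarity(saturated));
console.log('continuous P50:', medianPairwiseSimilarity(continuous).toFixed(3));
```

In the saturated set, six of the ten pairs are identical vectors, so the median lands exactly on 1.0 even though one wallet is clearly different. The math is working as intended; the inputs simply carry too little variation.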

The fix requires temporal vectors — separating behavioral features into 30d, 90d, and all-time windows. A ROTATION_OPERATOR who was highly active 90 days ago but dormant recently looks different from one who is active today. Current vectors flatten that timeline.
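One way the windowing could look, as a sketch under assumptions: each launch is an object with a `timestamp` in milliseconds, and the only feature shown is launch count per window plus a derived recent-activity share. The field names and the derived ratio are hypothetical, not the planned schema.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Window lengths in milliseconds; all_time has no cutoff.
const WINDOWS = { d30: 30 * DAY_MS, d90: 90 * DAY_MS, all_time: Infinity };

// Count launches that fall inside each window, measured back from `now`.
function launchCountsByWindow(launches, now) {
  const counts = {};
  for (const [name, span] of Object.entries(WINDOWS)) {
    counts[name] = launches.filter((l) => now - l.timestamp <= span).length;
  }
  return counts;
}

// Share of lifetime launches that happened in the last 30 days (0 if none at all).
function recentShare(counts) {
  return counts.all_time === 0 ? 0 : counts.d30 / counts.all_time;
}

const now = Date.UTC(2025, 0, 1);
const daysAgo = (n) => ({ timestamp: now - n * DAY_MS });

// Same lifetime total, different timelines.
const dormant = launchCountsByWindow([daysAgo(60), daysAgo(70), daysAgo(200)], now);
const active = launchCountsByWindow([daysAgo(1), daysAgo(5), daysAgo(200)], now);

console.log(dormant, recentShare(dormant));
console.log(active, recentShare(active));
```

Both wallets have three lifetime launches, so an all-time count treats them as identical. The per-window counts and the ratio separate them, and the ratio is a continuous value rather than another 0/1 flag.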

This is Q3 work. Until then, similarity search is degraded.

What "Actor-Centric" Actually Means in Practice

The fundamental reframe is this: a contract is not the threat. The operator is.

The same wallet that launched a clean token six months ago can launch a rug today. CertiK audits the contract. We track the wallet across every token it ever touched, model its behavioral evolution, and flag when the pattern shifts toward known high-risk profiles.

This is closer to how financial crime detection works — Visa doesn't just flag a bad transaction, it models cardholder behavior over time and detects deviation. We're building that for on-chain actors.

The practical output for integrators:

Wallets (Phantom, Backpack): show a risk warning at transaction simulation time, before the user signs
DEX aggregators (Jupiter): filter token listings based on creator archetype
Launchpads: gate participation based on operator history

One API call, one score, full explanation.
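On the integrator side, the decision logic could be as small as a lookup over the classifier output. The sketch below is hypothetical: it assumes the integrator receives an object shaped like the example output above, and the action names, the confidence floor, and the risk-to-action mapping are invented for illustration. It does not describe a published API.

```javascript
// Hypothetical policy: baseline_risk value -> what a wallet UI does before signing.
const ACTION_BY_RISK = {
  high: 'warn_strong',
  elevated: 'warn',
  'low-medium': 'allow',
  low: 'allow',
  neutral: 'allow',
  unknown: 'allow',
};

// Turn a classifier result into a UI decision plus the reasons behind it.
function decideAction(assessment, minConfidence = 0.7) {
  // Missing or low-confidence results are treated as "no signal", not as safe.
  if (!assessment || assessment.confidence < minConfidence) {
    return { action: 'no_signal', reasons: [] };
  }
  return {
    action: ACTION_BY_RISK[assessment.baseline_risk] ?? 'no_signal',
    reasons: assessment.matched_signals,
  };
}

console.log(decideAction({
  operator_class: 'ROTATION_OPERATOR',
  confidence: 0.85,
  baseline_risk: 'high',
  matched_signals: ['split_init_pattern', 'fresh_wallet', 'tokens_created_500_plus'],
}));
```

Passing matched_signals through as reasons lets the wallet show the user why a warning appeared, not only that it did.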

Stack Decisions

PostgreSQL + pgvector over a dedicated graph DB. At current scale (565 operators, target 1000+), adjacency lists in Postgres are sufficient. Memgraph becomes relevant at 1000–1500 operators with real-time query requirements. Premature infrastructure is how you burn runway.

Rule-based classification over ML at this stage. XGBoost makes sense after 500+ labeled examples. Before that, weighted rules with explainable outputs are more reliable and debuggable than a model trained on insufficient data.

Helius RPC for Solana data. The enhanced API gives parsed transaction data without manual deserialization. Worth the cost at this stage.

Node.js pipeline over a compiled language. The bottleneck is network I/O (RPC calls), not CPU. Node's async model handles CONCURRENCY=5 cleanly.
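For readers who have not written a bounded worker pool in Node, here is a minimal sketch of capping in-flight requests at CONCURRENCY. The helper name `mapWithConcurrency` and the `fakeEnrich` stand-in are assumptions for illustration; the real enrich_parallel.js may be structured differently.

```javascript
const CONCURRENCY = 5;

// Run `task` over every item with at most `limit` promises in flight.
// Results are returned in input order.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker pulls items until none remain. JavaScript is single-threaded,
  // so reading and incrementing `next` cannot race.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Stand-in for an RPC call: resolves after a short delay.
const fakeEnrich = (addr) =>
  new Promise((resolve) => setTimeout(() => resolve({ addr, enriched: true }), 10));

const wallets = ['w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8'];
mapWithConcurrency(wallets, CONCURRENCY, fakeEnrich)
  .then((rows) => console.log(`enriched ${rows.length} wallets`));
```

If any task rejects, Promise.all rejects and the remaining results are lost, so a production version would likely wrap each call in its own retry and error handling.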

Where This Goes

The roadmap is staged around signal quality, not architectural complexity:

Now: improve signal primitives (split_init_pattern, temporal window analysis), scale to 1000+ creators

Q3: temporal vectors (30d/90d/all_time), XGBoost baseline, Review UI

Q4: Memgraph migration at load trigger, GNN embeddings for vector_v3, B2B API (Phantom, Jupiter integration)

The hard part is not the graph database or the embeddings pipeline. The hard part is accumulating enough clean signal data to make behavioral classification reliable at scale.

We're building that now.

NexusVeritas is open-source. GitHub: github.com/cryptaveritas

This article is based on internal technical documentation by VeritasLab.
Authenticity certified by CryptaVeritas digital fingerprint protocol.
Document hash (SHA-256): 3369f77ea8d34d39b6427f3f31c8c9834ea3812f814c9a810cc73fcac6ec73a9
Verify: cryptaveritas.github.io/cryptaveritas-verify
