UltrafastSecp256k1 v4.0 — Optional Secondary secp256k1 Backend for Evaluation

May 16 • 4 min read

Overview
UltrafastSecp256k1 v4.0 is a high-performance secp256k1 engine built for evaluation as an optional secondary backend for Bitcoin Core. The goal is not to replace libsecp256k1, but to make it possible to measure, compare, and selectively enable an alternative implementation under controlled conditions.

This post presents the current state of the project, its integration model, and the evidence gathered through continuous audit infrastructure.

Repository: https://github.com/shrec/UltrafastSecp256k1
Release v4.0.0: https://github.com/shrec/UltrafastSecp256k1/releases/tag/v4.0.0

Integration Model
Integration uses a shim layer that exposes the identical secp256k1.h API surface. Bitcoin Core can be built with the alternative backend using a single CMake flag:

cmake -B build -DSECP256K1_BACKEND=ultrafast
cmake --build build
The default backend remains libsecp256k1 unchanged. All existing Bitcoin Core C++ source files remain unmodified — only the CMake build system references a different library.
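
For a side-by-side evaluation it can help to script both configurations. The sketch below only assembles the command lines and does not run them. The helper name `cmake_commands` and the build directory names are hypothetical; `SECP256K1_BACKEND=ultrafast` is the flag shown above, and `CMAKE_BUILD_TYPE` and `CMAKE_INTERPROCEDURAL_OPTIMIZATION` are standard CMake variables. It assumes that leaving the backend flag unset selects the default libsecp256k1.

```python
from typing import List, Optional


def cmake_commands(build_dir: str, backend: Optional[str] = None,
                   lto: bool = True) -> List[List[str]]:
    """Return the configure and build command lines for one configuration.

    backend=None leaves SECP256K1_BACKEND unset, which is assumed here to
    keep the default libsecp256k1 backend.
    """
    configure = ["cmake", "-B", build_dir, "-DCMAKE_BUILD_TYPE=Release"]
    if backend is not None:
        configure.append(f"-DSECP256K1_BACKEND={backend}")
    if lto:
        # Standard CMake switch for link-time optimisation.
        configure.append("-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON")
    build = ["cmake", "--build", build_dir]
    return [configure, build]


if __name__ == "__main__":
    configs = (
        cmake_commands("build-default"),
        cmake_commands("build-ultra", backend="ultrafast"),
    )
    for config in configs:
        for command in config:
            print(" ".join(command))
```

Keeping the two build trees in separate directories makes it straightforward to run the same benchmark binary from each and to roll back by simply using the default tree.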

Fork demonstrating this integration: https://github.com/shrec/bitcoin/tree/feature/ultrafast-secp256k1-backend

Shim API coverage:

secp256k1.h — context, pubkey, seckey
secp256k1_extrakeys.h — keypair, x-only pubkey (BIP-340/341)
secp256k1_schnorrsig.h — Schnorr sign/verify (BIP-340)
secp256k1_ecdh.h — ECDH
secp256k1_recovery.h — ECDSA recovery
secp256k1_ellswift.h — ElligatorSwift (BIP-324)
secp256k1_musig.h — MuSig2 (BIP-327, all 14 functions)
Performance — Bitcoin Core Integration Paths
All numbers from bench_bitcoin (Bitcoin Core's native benchmark harness) on Intel i5-14400F, GCC 14.2.0, Release+LTO, intel_pstate/no_turbo=1, taskset -c 0, nice -20, 5 runs.

Canonical artifact: docs/BITCOIN_CORE_BENCH_RESULTS.json

Transaction signing:

| Benchmark | Ultra | libsecp256k1 | Delta |
| --- | --- | --- | --- |
| SignSchnorrWithMerkleRoot | 83.9 µs | 113.4 µs | +35% faster |
| SignSchnorrWithNullMerkleRoot | 84.0 µs | 113.0 µs | +35% faster |
| SignTransactionECDSA | 149.5 µs | 165.1 µs | +10% faster |
| SignTransactionSchnorr | 125.4 µs | 137.5 µs | +10% faster |

Script verification:

| Benchmark | Ultra | libsecp256k1 | Delta |
| --- | --- | --- | --- |
| VerifyScriptP2TR_KeyPath | 45.4 µs | 46.3 µs | +2.0% faster |
| VerifyScriptP2TR_ScriptPath | 76.5 µs | 83.8 µs | +10% faster |
| VerifyScriptP2WPKH | 46.0 µs | 45.8 µs | parity (within noise) |

Block validation aggregate (ConnectBlock, 2000 unique signatures):

| Scenario | Ultra | libsecp256k1 | Delta |
| --- | --- | --- | --- |
| All ECDSA | 254.3 ms | 257.4 ms | +1.2% faster |
| All Schnorr | 253.0 ms | 255.3 ms | +0.9% faster |
| Mixed (2k Schnorr + 1k ECDSA) | 253.9 ms | 257.7 ms | +1.5% faster |

Without LTO: ConnectBlock is ~0.5–1.0% slower than libsecp256k1 due to i-cache pressure from a larger code footprint. LTO is required for Ultra to win the aggregate. This tradeoff is documented in docs/SHIM_KNOWN_DIVERGENCES.md.
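
The Delta columns in the integration tables appear consistent with computing how much longer the libsecp256k1 run takes relative to the Ultra run. The sketch below recomputes a few rows under that assumption; the function name `speedup_percent` is hypothetical, and the timings are copied from the tables in this section.

```python
def speedup_percent(ultra: float, baseline: float) -> float:
    """Percent by which the baseline time exceeds the Ultra time.

    Positive values mean Ultra is faster. Both arguments must use the
    same unit (for example microseconds).
    """
    return (baseline / ultra - 1.0) * 100.0


# (Ultra, libsecp256k1) timings taken from the tables in this section.
rows = {
    "SignSchnorrWithMerkleRoot (µs)": (83.9, 113.4),
    "SignTransactionECDSA (µs)": (149.5, 165.1),
    "VerifyScriptP2TR_ScriptPath (µs)": (76.5, 83.8),
    "ConnectBlock, all ECDSA (ms)": (254.3, 257.4),
}

for name, (ultra, baseline) in rows.items():
    print(f"{name}: {speedup_percent(ultra, baseline):+.1f}%")
```

Recomputing the column from the raw timings in the canonical JSON artifact is a quick sanity check that the published percentages and the underlying data agree.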

Bitcoin Core test suite: 749/749 passing with Ultra backend (GCC 14.2.0, May 2026).

Performance — Constant-Time Signing Primitives
From docs/bench_unified_2026-05-16_gcc14_x86-64.json. CT-vs-CT co-measured in the same run (ratios are TSC-independent):

| Operation | Ultra CT | libsecp256k1 | Ratio |
| --- | --- | --- | --- |
| CT ECDSA sign | 21.6 µs | 59.7 µs | 1.30× faster |
| CT Schnorr sign (BIP-340) | 18.1 µs | 46.5 µs | 1.28× faster |
| Schnorr verify | 84.3 µs | 84.3 µs | equal |

Field arithmetic primitives:

| Primitive | Ultra | libsecp256k1 | Ratio |
| --- | --- | --- | --- |
| field_mul | 20.1 ns | 26.5 ns | 1.32× |
| field_sqr | 18.7 ns | 22.3 ns | 1.19× |
| field_inv | 1253.9 ns | 1506.0 ns | 1.20× |
| field_from_bytes | 4.8 ns | 13.0 ns | 2.71× |

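
For readers unfamiliar with what these primitives compute, here is a reference-semantics sketch over the secp256k1 field prime p = 2^256 - 2^32 - 977. It uses Python big integers and is not how either library implements them (both use limb-based representations and dedicated inversion algorithms). The function names mirror the table rows for readability only, and reducing the input in `field_from_bytes` rather than rejecting out-of-range values is an assumption.

```python
# secp256k1 base field prime.
P = 2**256 - 2**32 - 977


def field_mul(a: int, b: int) -> int:
    """Multiply two field elements modulo P."""
    return (a * b) % P


def field_sqr(a: int) -> int:
    """Square a field element modulo P."""
    return (a * a) % P


def field_inv(a: int) -> int:
    """Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P."""
    if a % P == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    return pow(a, P - 2, P)


def field_from_bytes(data: bytes) -> int:
    """Interpret 32 big-endian bytes as a field element (reduced mod P)."""
    if len(data) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(data, "big") % P


if __name__ == "__main__":
    x = field_from_bytes(bytes(range(32)))
    # An element times its inverse is 1.
    print(field_mul(x, field_inv(x)) == 1)
```

A model like this is mainly useful as an oracle in differential tests: an optimized implementation should agree with it on every input.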
Continuous Audit and Assurance (CAAS)
The repository includes a continuous audit infrastructure tracking security regressions across every commit.

Current state at v4.0.0:

262 exploit PoC modules covering 20+ CVE/attack classes — all pass
369 total audit modules (262 exploit PoC + 107 non-exploit)
CAAS autonomy score: 100/100 (8/8 gates)
Source: docs/SECURITY_AUTONOMY_KPI.json

Audit surfaces include: nonce reuse, side-channel timing (dudect), CT boundary verification, batch verify soundness, MuSig2/FROST protocol attacks, adaptor signatures, DER BIP-66 strict parsing, BIP-340/RFC-6979 known-answer tests, Wycheproof vectors, structure-aware fuzzing, differential testing vs libsecp256k1, and Python-based algebraic property testing.
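
As an illustration of what Python-based algebraic property testing can look like, the sketch below checks a few group-law identities on secp256k1 using plain affine arithmetic. It is not taken from the repository's audit modules; all function names are hypothetical, and only the curve constants are standard.

```python
import random

# Standard secp256k1 domain parameters.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def on_curve(pt):
    """True if pt satisfies y^2 = x^3 + 7 (None is the point at infinity)."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def add(a, b):
    """Affine point addition, including doubling and inverse cases."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, P - 2, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, pt):
    """Double-and-add scalar multiplication (not constant-time)."""
    result, addend = None, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def check_properties(trials=4, seed=1):
    """Check generator validity, group order, and additivity of k -> k*G."""
    rng = random.Random(seed)
    assert on_curve(G)
    assert mul(N, G) is None
    for _ in range(trials):
        a, b = rng.randrange(1, N), rng.randrange(1, N)
        lhs = mul((a + b) % N, G)
        assert lhs == add(mul(a, G), mul(b, G))
        assert on_curve(lhs)
    return True


if __name__ == "__main__":
    print(check_properties())
```

Properties of this kind complement known-answer tests because they hold for every input, so randomly chosen scalars can expose bugs that fixed vectors miss.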

Security properties enforced:

Constant-time signing: ECDSA, Schnorr, MuSig2, FROST, BIP-324 XDH
Per-context DPA blinding (secp256k1_context_randomize fully implemented)
Strict scalar parsing for private key inputs (parse_bytes_strict_nonzero)
Fail-closed batch signing APIs
BIP-66 strict DER enforcement in all shim paths
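
To make the strict scalar parsing rule concrete: a private key is valid only if it is exactly 32 bytes and, read as a big-endian integer, lies in the range [1, n-1], where n is the secp256k1 group order. The sketch below shows that check in Python. The function name is hypothetical and only modeled on the `parse_bytes_strict_nonzero` name mentioned above; Python integer operations are not constant-time, so this illustrates the acceptance rule only.

```python
# secp256k1 group order.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def parse_scalar_strict_nonzero(data: bytes) -> int:
    """Parse a 32-byte big-endian secret key, rejecting 0 and values >= N.

    Strict parsing means out-of-range inputs are rejected rather than
    silently reduced modulo N.
    """
    if len(data) != 32:
        raise ValueError("secret key must be exactly 32 bytes")
    k = int.from_bytes(data, "big")
    if k == 0 or k >= N:
        raise ValueError("secret key out of range [1, n-1]")
    return k


if __name__ == "__main__":
    print(parse_scalar_strict_nonzero((1).to_bytes(32, "big")))
    for bad in (bytes(32), N.to_bytes(32, "big"), b"\x01" * 31):
        try:
            parse_scalar_strict_nonzero(bad)
        except ValueError as exc:
            print("rejected:", exc)
```

Rejecting instead of reducing matters because a reduced key would sign correctly while differing from the bytes the caller supplied, which can hide upstream bugs.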
Benchmarking Methodology
Benchmarks distinguish between:

CT vs CT — constant-time Ultra signing vs constant-time libsecp signing (production-equivalent, fair comparison)
Bitcoin Core integration — bench_bitcoin binary, real validation pipeline
LTO vs no-LTO — both measured and documented
Warm-cache vs cold-cache — noted where applicable
All claimed improvements include error percentage from nanobench's internal statistics. Inconclusive results (overlapping ranges) are reported as such. Results are reproducible from the canonical JSON artifacts in the repository.
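
One way to apply the overlapping-ranges rule is sketched below: each result becomes an interval of median ± error percent, and a comparison counts as a win only when the intervals are disjoint. The function names are hypothetical, and the 1% error figures in the example are placeholders rather than values from the benchmark artifacts.

```python
from typing import Tuple


def interval(median: float, err_pct: float) -> Tuple[float, float]:
    """Return (low, high) bounds for a median with a relative error in percent."""
    half_width = median * err_pct / 100.0
    return (median - half_width, median + half_width)


def classify(ultra: float, ultra_err: float,
             base: float, base_err: float) -> str:
    """Classify Ultra against the baseline; lower time is better."""
    u_lo, u_hi = interval(ultra, ultra_err)
    b_lo, b_hi = interval(base, base_err)
    if u_hi < b_lo:
        return "faster"
    if b_hi < u_lo:
        return "slower"
    return "inconclusive"  # ranges overlap


if __name__ == "__main__":
    # Medians from the tables above; the 1.0% errors are illustrative only.
    print(classify(83.9, 1.0, 113.4, 1.0))
    print(classify(46.0, 1.0, 45.8, 1.0))
```

With the placeholder errors, the Schnorr signing row is classified as faster and the P2WPKH row as inconclusive, matching how those rows are described in the tables.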

Reproducibility
Builds are deterministic:

-ffile-prefix-map strips source paths from debug info
SOURCE_DATE_EPOCH awareness
Fixed -march=x86-64-v3 (no host-native variation)
SLSA provenance attached to v4.0.0 release artifacts (via Sigstore/Cosign)
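
A simple way to check a determinism claim is to build twice, ideally in different directories, and compare artifact digests. The sketch below does the comparison step only; the function names and the example paths are hypothetical and not taken from the repository.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(artifact_a: str, artifact_b: str) -> bool:
    """True if two build artifacts are byte-for-byte identical."""
    return sha256_file(artifact_a) == sha256_file(artifact_b)


if __name__ == "__main__":
    import sys

    # Usage: python compare_builds.py build1/libfoo.a build2/libfoo.a
    first, second = sys.argv[1], sys.argv[2]
    print("identical" if builds_match(first, second) else "different")
```

If the digests differ, tools that diff binaries section by section can help locate the source of non-determinism, such as embedded paths or timestamps.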
Platform Coverage
CI passing on all platforms as of v4.0.0:

| Platform | Architecture | Compiler |
| --- | --- | --- |
| Linux | x86-64-v3 | GCC 14 / Clang 17 |
| Linux | ARM64 | GCC 14 |
| Linux | RISC-V 64 | GCC 14 |
| macOS | ARM64 (Apple Silicon) | Clang 15 |
| Windows | x86-64 | MSVC / GCC |

Additional: Android ARM64 (NDK), WASM, ESP32, STM32.

Known Limitations
ConnectBlock improvement requires Release+LTO. Without LTO: ~0.5–1.0% slower than libsecp256k1.
GPU backends (CUDA, OpenCL, Metal) are present but not part of the Bitcoin Core evaluation profile.
This evaluation covers the CPU backend only.
Current Objective
No mandatory integration path is proposed. The current objective is to make an alternative backend available for evaluation on technical grounds, with reproducible evidence, minimal integration surface, and an easy rollback path.

The default backend remains libsecp256k1. All existing behavior is preserved unless the new backend is explicitly enabled at build time.

Reviewer Entry Path
git clone https://github.com/shrec/UltrafastSecp256k1
cd UltrafastSecp256k1
python3 ci/verify_external_audit_bundle.py --allow-commit-mismatch
Key documents:

docs/BITCOIN_CORE_BACKEND_EVIDENCE.md — full reviewer package
docs/BENCHMARKS.md — benchmark methodology and raw data
docs/AUDIT_CHANGELOG.md — security audit history
docs/SHIM_KNOWN_DIVERGENCES.md — documented behavioral differences from libsecp256k1
All performance numbers are from controlled benchmark runs with turbo boost disabled. Raw data is available in the repository. Canonical benchmark artifacts: docs/BITCOIN_CORE_BENCH_RESULTS.json and docs/bench_unified_2026-05-16_gcc14_x86-64.json.

4 Comments

juhist · Answer 1 · 2026-05-18T05:45:42+0000

juhist • May 18

Actually a cool read. Optional backend support sounds useful for testing/flexibility. Was compatibility or raw performance the bigger challenge here?

Vano Chkheidze • May 18

@[juhist] Thanks — honestly compatibility/convergence ended up being the harder part.

Raw performance engineering is comparatively straightforward once you isolate invariants, reduce recomputation, optimize hot paths, and benchmark carefully.

The difficult part was making everything continuously agree with each other:

shim behavior vs libsecp semantics
CI vs actual execution paths
benchmark claims vs generated artifacts
docs vs implementation
audit tooling vs real cryptographic properties
platform-specific behavior
rollback-safe integration

A large amount of the v4.0 cycle was spent fixing small edge-case bugs, compatibility mismatches, CI drift, and verification inconsistencies rather than chasing raw speed.

Performance improvements still happened during that process, but convergence/compatibility definitely consumed more engineering time overall.

Gimi · Answer 2 · 2026-05-19T02:33:01+0000

Gimi • May 18

The optional backend approach is a smart idea. No forced adoption, just measurable evidence.
262 exploit PoC modules in your CAAS is solid coverage. Most implementations stop at known-answer tests and fuzzing. Timing side channels and batch soundness are often missed.
I have a small question on LTO. The ConnectBlock wins are real, but the i-cache penalty without it creates a tradeoff. Do you recommend gating the Ultra flag behind Release+LTO detection for downstream integrators who might not control their build pipeline?

Hope to hear from you soon!

Vano Chkheidze • May 18

@[Gimi] Yes, a non-blocking warning at CMake configure time is the right approach.

We just landed it. When SECP256K1_BACKEND=ultrafast + CMAKE_BUILD_TYPE=Release + no LTO, cmake now prints:

UltrafastSecp256k1: LTO is OFF
ConnectBlock throughput will be ~10% lower than bundled libsecp256k1
without Link Time Optimisation. With LTO it is +1-2% faster.
For production / packaged builds add:
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON
Why non-blocking (not error): Packagers running Debian/Ubuntu hardened builds, local dev builds, and CI without cpupower access all have legitimate reasons to skip LTO. Refusing the build would be hostile to the exact audience we want to adopt it.

The actual numbers from turbo-locked canonical benchmarks:

No-LTO + stripped build: ~-10% ConnectBlock (I-cache, 1.2 MB .text vs 400 KB)
LTO + any build: +1-2% ConnectBlock, +20-26% Schnorr signing
The I-cache story is honest: the algorithm depth that gives the signing wins (GLV, SafeGCD, FE52) is exactly what creates the .text pressure. LTO is the clean resolution — the linker colocates hot functions from both Bitcoin Core and Ultra together, eliminating the competition for L2/L3 cache lines. The warning makes that tradeoff visible without being gatekeeping.
