ÉLAN

Build agents that recover, coordinate, and ship.

Élan is a BEAM-native multi-agent runtime with durable state, git-native provenance, and policy-governed tool orchestration. It is designed for long-running autonomous systems that keep their promises even when machines do not.

Name origin: Élan comes from the French word "élan", meaning momentum or spirited energy.


Status

Working prototype. The runtime core, recovery path, agent loop, embedding API, and recursive exploration API are implemented locally. Embedded-consumer hardening is now complete; the Epistemic Integrity Layer (Phase 1–2) is live — RAG grounding, ConfabulumGate, GroundtraceRecord telemetry, and CompetenceSignal are all wired.

Durable state: Replay-safe events, checkpoints, tasks, messages, and retention controls are wired.

Agent loop: Multi-turn tool calls, streamed steps, and operator controls are live.

Embeddable: CLI/TUI and headless OTP embedding now share the same runtime and config surface.

RAG Grounding: 4-step stack protocol with Perplexity sonar retrieval and citation-grounded verify step.

ConfabulumGate: 9-type confabulum taxonomy gate applied after each pipeline step; halts at score > 0.65.

GroundtraceTelemetry: Per-step signed records emitted to append-only audit_store_<run_id>.jsonl hash chain.

Why Élan

Recovery correctness

Agents restart with explicit state, checkpoints, and idempotent side effects so that a crash never becomes data loss.

Operational truth

A full audit trail of events, decisions, and tool usage gives you a clear source of truth.

Supervised coordination

Supervised agents coordinate through durable tasks, leases, retries, and explicit provenance boundaries.

Design principles

Let it crash: Supervision trees make failure visible and recoverable.

Explicit state: Transitions are validated and recorded, not implied.

Idempotent actions: Side effects are tracked so repeats are safe.

Policy-governed: Capabilities are checked before execution.
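The "idempotent actions" principle can be illustrated with a minimal sketch. It assumes a ledger keyed by an idempotency key; the module name IdempotentSketch, the function run_once/3, and the in-memory map are hypothetical and are not part of Élan's API.

```elixir
defmodule IdempotentSketch do
  @moduledoc "Hypothetical sketch: run a side effect at most once per idempotency key."

  # `ledger` is a plain map of key => recorded result. A real runtime would
  # keep this in durable storage; a map is used here only for illustration.
  def run_once(ledger, key, effect) when is_map(ledger) and is_function(effect, 0) do
    case Map.fetch(ledger, key) do
      {:ok, result} ->
        # Already executed: return the recorded result without re-running.
        {:replayed, result, ledger}

      :error ->
        result = effect.()
        {:executed, result, Map.put(ledger, key, result)}
    end
  end
end
```

A repeated call with the same key returns the recorded result instead of running the effect again. The sketch does not cover a crash between running the effect and recording it, which a durable implementation would have to handle.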

Architecture snapshot

Élan runs one agent per supervised process, backed by an event log, checkpoint store, task graph, and message log. Agents use gen_statem for explicit transitions, a bounded tool-calling loop for reasoning, callback-based step streaming for callers, and git branches plus worktrees for provenance isolation. The same runtime can run under the CLI/TUI or be embedded as a library in another OTP application.
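As an illustration of explicit transitions with gen_statem, the following is a minimal sketch. The states, events, and the module name AgentFsmSketch are assumptions made for the example; Élan's actual state machine is not shown in this document.

```elixir
defmodule AgentFsmSketch do
  @moduledoc "Hypothetical agent lifecycle as a gen_statem with validated transitions."
  @behaviour :gen_statem

  # Allowed transitions; state and event names are illustrative assumptions.
  @transitions %{
    {:idle, :start} => :running,
    {:running, :checkpoint} => :running,
    {:running, :finish} => :done,
    {:running, :fail} => :idle
  }

  def start_link, do: :gen_statem.start_link(__MODULE__, [], [])
  def fire(pid, event), do: :gen_statem.call(pid, {:fire, event})
  def history(pid), do: :gen_statem.call(pid, :history)

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(_args), do: {:ok, :idle, %{history: []}}

  @impl true
  def handle_event({:call, from}, {:fire, event}, state, data) do
    case Map.fetch(@transitions, {state, event}) do
      {:ok, next} ->
        # Record every accepted transition so it can be audited later.
        data = %{data | history: [{state, event, next} | data.history]}
        {:next_state, next, data, [{:reply, from, {:ok, next}}]}

      :error ->
        # Reject anything outside the table instead of guessing.
        {:keep_state_and_data, [{:reply, from, {:error, {:invalid_transition, state, event}}}]}
    end
  end

  def handle_event({:call, from}, :history, _state, data) do
    {:keep_state_and_data, [{:reply, from, Enum.reverse(data.history)}]}
  end
end
```

Keeping the transition table as data means an invalid event is rejected with an explicit error rather than silently changing state.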

The BenchArena/stack execution layer now runs a 4-step protocol, with a confabulation gate after the decompose, verify, and synthesise stages:

Question
  ↓
[Decompose] ─── ConfabulumRate gate ─── halt?
  ↓
[RAG Retrieve] (Perplexity sonar × N sub-questions, citation-grounded)
  ↓
[Verify + grounded context] ─── ConfabulumRate gate ─── halt?
  ↓
[Synthesise] ─── ConfabulumRate gate ─── halt?
  ↓
Answer (confidence_score, certainty_vocab, GroundtraceRecord)
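The flow in the diagram can be sketched as a gated pipeline. This is an illustration under stated assumptions: every step and the gate are injected functions, and the names StackPipelineSketch and run/3 and the {:halted, step, type, score} return shape are hypothetical.

```elixir
defmodule StackPipelineSketch do
  # Each step is an injected function so the sketch runs without a model or network.
  # `gate` is a 1-arity function returning {:pass, score} or {:halt, type, score}.
  def run(question, steps, gate) do
    with {:ok, subs} <- gated(steps.decompose.(question), gate, :decompose),
         passages = steps.retrieve.(subs),
         {:ok, verified} <- gated(steps.verify.(subs, passages), gate, :verify),
         {:ok, answer} <- gated(steps.synthesise.(verified), gate, :synthesise) do
      {:ok, answer}
    end
  end

  # Apply the gate to a step's output; a halt short-circuits the `with` above.
  defp gated(output, gate, step) do
    case gate.(output) do
      {:pass, _score} -> {:ok, output}
      {:halt, type, score} -> {:halted, step, type, score}
    end
  end
end
```

The retrieve step is not gated, matching the diagram, and the first halt stops the pipeline and reports which step triggered it.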

ConfabulumRate Gate

9-type taxonomy applied after the decompose, verify, and synthesise steps. gate/2 returns {:pass, score} or {:halt, type, score}. Synthesis is blocked on halt. Threshold: 0.65 per type.

GroundtraceTelemetry

Per-step signed record emitted to audit_store_<run_id>.jsonl. Each record carries prev_record_hash — tamper-evident by construction.

Epistemic Integrity Layer

Phases 1A–1C are designed to close the TruthfulQA regression (stack 53.3% vs standard 100%) by adding retrieval grounding, confabulation gating, and confidence propagation. The goal is to move the stack from “formally interesting” to “demonstrably trustworthy”; the parity rerun is listed under Roadmap as the next step.

RAG Grounding Phase 1A

The stack adapter’s 3-step protocol was extended to a 4-step protocol: decompose → retrieve → verify → synthesise.

Each sub-question triggers a Perplexity sonar retrieval call (up to 5 sub-questions, capped for latency).
Retrieved passages and citation URLs augment the verify-step prompt as grounded context blocks.
Verify step explicitly prefers retrieved sources over model memory when they conflict.
Result: model memory replaced by retrieved, timestamped, citable groundtrace.
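The retrieval step described above might look like the following sketch. The search function is a hypothetical stand-in for the Perplexity sonar call, and the module name, result shape, and context-block format are assumptions; only the cap of 5 sub-questions comes from the text.

```elixir
defmodule RetrieveSketch do
  # Cap stated in the text: at most 5 sub-questions trigger retrieval.
  @max_sub_questions 5

  # `search` is a hypothetical 1-arity function standing in for the retrieval
  # call; it returns a list of %{url: ..., passage: ...} maps.
  def grounded_context(sub_questions, search) do
    sub_questions
    |> Enum.take(@max_sub_questions)
    |> Enum.flat_map(fn q ->
      q |> search.() |> Enum.map(&Map.put(&1, :question, q))
    end)
    |> Enum.map_join("\n", fn s -> "[source #{s.url}] (#{s.question}) #{s.passage}" end)
  end
end
```

The returned string is the kind of grounded context block that could be prepended to the verify-step prompt.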

ConfabulumRate Gate Phase 1B

Promoted from post-hoc classifier to in-pipeline gate. Synthesis is blocked on any halt signal.

9 confabulum types:

factual_error, temporal_confusion, entity_substitution, causal_inversion, scope_collapse, authority_fabrication, numeric_drift, modal_confusion, negation_flip

gate/2 scores the answer across all 9 types using lightweight textual heuristics.
Returns {:pass, aggregate_score} if all types are below threshold (0.65).
Returns {:halt, worst_type, worst_score} if any type exceeds the threshold.
Gate is called after decompose, verify, and synthesise. A halt scores 0.0 in BenchArena.
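The pass/halt decision can be sketched as follows. This is not the project's gate/2: the per-type scores are taken as input rather than computed from text, the aggregate is assumed to be the arithmetic mean, and a score exactly equal to the threshold is assumed to pass. The source specifies none of these three points.

```elixir
defmodule GateSketch do
  @types ~w(factual_error temporal_confusion entity_substitution causal_inversion
            scope_collapse authority_fabrication numeric_drift modal_confusion negation_flip)a

  def types, do: @types

  # `scores` maps each type to a float in 0.0..1.0; missing types count as 0.0.
  # Assumptions: a score equal to the threshold passes, and the aggregate is
  # the arithmetic mean over all nine types.
  def decide(scores, threshold \\ 0.65) do
    {worst_type, worst_score} =
      @types
      |> Enum.map(&{&1, Map.get(scores, &1, 0.0)})
      |> Enum.max_by(fn {_type, score} -> score end)

    if worst_score > threshold do
      {:halt, worst_type, worst_score}
    else
      total = @types |> Enum.map(&Map.get(scores, &1, 0.0)) |> Enum.sum()
      {:pass, total / length(@types)}
    end
  end
end
```

A single type above the threshold is enough to halt, regardless of how low the other eight scores are.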

ConfidenceScore + CertaintyVocabulary Phase 1C

confidence_score: float() added to each SubTask, propagated as min(own, min(deps)) — confidence degrades monotonically along dependency chains. CompetenceSignal injects a vocabulary-appropriate prefix into the synthesis prompt, enforcing honest hedging at the generation step.

Verified (≥ 0.95): declarative, machine-checkable

HighConfidence (0.80–0.95): qualified with source

Moderate (0.60–0.80): hedged, caveats warranted

Uncertain (0.40–0.60): human review recommended

Halted (< 0.40): synthesis blocked
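The propagation rule min(own, min(deps)) and the vocabulary bands can be sketched together. The task map shape and the module name ConfidenceSketch are hypothetical. The published ranges share their boundary values (for example 0.95), so assigning a boundary value to the upper band is an assumption.

```elixir
defmodule ConfidenceSketch do
  # Propagate confidence as min(own, min(deps)).
  # `tasks` maps a task id to %{own: float, deps: [id]}; the shape is assumed
  # and the dependency graph must be acyclic.
  def propagate(tasks, id) do
    %{own: own, deps: deps} = Map.fetch!(tasks, id)
    deps |> Enum.map(&propagate(tasks, &1)) |> Enum.reduce(own, &min/2)
  end

  # Half-open bands: a boundary value falls into the upper band (assumption).
  def vocab(score) when score >= 0.95, do: :verified
  def vocab(score) when score >= 0.80, do: :high_confidence
  def vocab(score) when score >= 0.60, do: :moderate
  def vocab(score) when score >= 0.40, do: :uncertain
  def vocab(_score), do: :halted
end
```

In a chain c → b → a with own scores 0.99, 0.97, and 0.90, task c ends up at 0.90: a task can never be more confident than its weakest dependency.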

Audit Trail — GroundtraceRecord

Phase 2 — Complete

Every BenchArena run — or production pipeline execution — produces a complete, signed, append-only GroundtraceRecord per SubTask. An auditor with the record can reconstruct the full execution path without access to the original runtime.

GroundtraceRecord schema Phase 2A

20-field immutable struct. Each field is content-addressed; the hash chain links every record to its predecessor.

%GroundtraceRecord{
  record_id:          UUID,          # globally unique, content-addressed
  run_id:             String,        # links to parent BenchArena/pipeline run
  subtask_id:         String,        # SubTask.id from SemanticIR
  adapter:            atom(),        # :stack | :agent_loop | :perplexity_standard
  model_id:           String,        # e.g. "sonar-pro-20260401"
  model_temperature:  Float,         # 0.0 for deterministic mode
  prompt_hash:        String,        # SHA-256 of exact prompt sent
  retrieved_sources:  [%{url, title, retrieved_at, passage_hash}],
  raw_response_hash:  String,        # SHA-256 of raw API response
  tokens_in:          Integer,
  tokens_out:         Integer,
  latency_ms:         Integer,
  confabulum_verdict: ConfabulumVerdict,  # Pass | Halt(type, score)
  confidence_score:   Float,
  certainty_vocab:    CertaintyVocabulary,
  score:              Float,         # BenchArena score for this SubTask
  timestamp_utc:      DateTime,
  prev_record_hash:   String,        # hash of previous record in chain
  record_hash:        String         # SHA-256(all fields except record_hash)
}

Hash chain & tamper evidence Phase 2A

Each record carries prev_record_hash — any modification to a historical record invalidates all subsequent hashes.
valid_chain?/1 verifies full chain integrity; the tamper-evidence property is proved in Lean 4 (chain_tamper_evident theorem).
verify_record/1 performs single-record tamper check by recomputing and comparing the hash.
Store: audit_store_<run_id>.jsonl — append-only JSON-Lines, written to bench_results/.
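A minimal sketch of the hash-chain idea follows. The functions are named after valid_chain?/1 and verify_record/1 for readability but are not the project's implementation. The canonical encoding (term_to_binary over a key-sorted list), the "genesis" sentinel for the first record, and the append/2 helper are all assumptions.

```elixir
defmodule ChainSketch do
  # Hash every field except :record_hash. :erlang.term_to_binary/1 over a
  # key-sorted list is an assumed canonical encoding, not the project's format.
  def record_hash(record) do
    payload = record |> Map.delete(:record_hash) |> Enum.sort() |> :erlang.term_to_binary()
    :crypto.hash(:sha256, payload) |> Base.encode16(case: :lower)
  end

  # Append a record, linking it to the current tail of the chain.
  # "genesis" is a hypothetical sentinel for the first record.
  def append(chain, fields) do
    prev =
      case List.last(chain) do
        nil -> "genesis"
        last -> last.record_hash
      end

    record = Map.put(fields, :prev_record_hash, prev)
    chain ++ [Map.put(record, :record_hash, record_hash(record))]
  end

  # Single-record check: recompute the hash and compare.
  def verify_record(record), do: record.record_hash == record_hash(record)

  # Full-chain check: every link matches and every record hash is valid.
  def valid_chain?(chain) do
    links_ok =
      chain
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.all?(fn [a, b] -> b.prev_record_hash == a.record_hash end)

    links_ok and Enum.all?(chain, &verify_record/1)
  end
end
```

Changing any field of an earlier record makes its recomputed hash differ from the stored one. If the attacker also rewrites that hash, the next record's prev_record_hash no longer matches.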

GroundtraceTelemetry hooks Phase 2B

Agent-based process maintaining run_id, prev_record_hash, and record_count state.
emit/3 called after each adapter step — builds record, appends to AuditStore, updates hash chain.
Emits [:bench_arena, :groundtrace, :emitted] telemetry event (logged at debug level).
Graceful no-op if agent not running or store path not writable.
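The hook behaviour listed above can be sketched with a plain Agent. The argument list of emit/3 is not given in the source, so (step, adapter, payload) is an assumption, as are the module name, the hand-built JSON line, and the "genesis" starting hash. The telemetry event is omitted so that the sketch needs no dependencies.

```elixir
defmodule EmitterSketch do
  # Start the agent holding run_id, store path, previous hash, and record count.
  def start_link(run_id, dir) do
    path = Path.join(dir, "audit_store_#{run_id}.jsonl")
    init = %{run_id: run_id, path: path, prev: "genesis", count: 0}
    Agent.start_link(fn -> init end, name: __MODULE__)
  end

  # Returns :ok after appending a line, or :noop when the agent is not
  # running or the store cannot be written.
  def emit(step, adapter, payload) do
    if Process.whereis(__MODULE__) do
      Agent.get_and_update(__MODULE__, &append(&1, step, adapter, payload))
    else
      :noop
    end
  end

  def state, do: Agent.get(__MODULE__, & &1)

  defp append(state, step, adapter, payload) do
    body = "#{state.run_id}|#{step}|#{adapter}|#{sha(payload)}|#{state.prev}"
    hash = sha(body)

    # Hand-built JSON is safe here only because every value is a plain identifier or hex string.
    line =
      ~s({"run_id":"#{state.run_id}","step":"#{step}","adapter":"#{adapter}","prev_record_hash":"#{state.prev}","record_hash":"#{hash}"}\n)

    case File.write(state.path, line, [:append]) do
      :ok -> {:ok, %{state | prev: hash, count: state.count + 1}}
      {:error, _reason} -> {:noop, state}
    end
  end

  defp sha(data), do: :crypto.hash(:sha256, data) |> Base.encode16(case: :lower)
end
```

The agent state advances only after a successful write, so a failed append leaves the chain pointing at the last record that actually reached the store.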

Rule 17a-4 WORM path

The GroundtraceRecord schema provides the foundation. Rule 17a-4 compliance is a storage policy on top: configurable 6-year retention, non-deletable records, retrieval SLA. File-based initially; production deployment requires WORM storage (AWS S3 Object Lock or equivalent).

SEC Rule 17a-4, FINRA 4511, WORM AuditStore

Roadmap

Runtime foundation Complete: supervision, recovery, checkpoints, tasks, and retention

Safety and provenance Complete: policy engine, safe tools, DataHandles, and git-native worktrees

Agent loop and UX Complete: multi-turn loop, step streaming, recursive exploration, CLI/TUI, and embedding API

Phase 1 ✓ Complete: RAG grounding (Perplexity sonar) + ConfabulumRate gate + ConfidenceScore / CertaintyVocabulary

Phase 2 ✓ Complete: GroundtraceRecord schema (Lean 4 + Elixir) + telemetry hooks + AuditStore

Phase 3 ✓ Complete: CompetenceSignal wired into synthesis step, with the vocabulary gate enforced at generation

External validation In progress: real Vertex, real Postgres, full-suite reruns, and first-consumer proof

Next: TruthfulQA parity verification. Run run_regression_truthfulqa.exs; target ≥ 95% on the stack adapter; gate in CI on merge to main.

FINRA / SEC / SOC2 Ready

Enterprise Compliance

Five compliance modules are built into the Élan orchestration layer, covering all CRITICAL gaps identified in the Block.xyz-grade fintech compliance audit.

506 total tests

5 fintech modules

82 new compliance tests

ComplianceAuditLog

π 94.2

FINRA 4511, SEC 17a-4, SOC2 CC7, PCI Req.10

Immutable HMAC-chained event log with 6-year retention, chain integrity verification, and regulatory export API.
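An HMAC-chained log can be sketched in a few lines. This is an illustration, not the ComplianceAuditLog module: the entry shape, the empty-string starting value, and the names HmacChainSketch, append/3, and intact?/2 are assumptions. Retention and export are out of scope here.

```elixir
defmodule HmacChainSketch do
  # Each entry's MAC covers the previous MAC plus the event, so entries are
  # bound to their position in the log. The log is kept newest-first.
  def append(log, event, key) when is_binary(event) do
    prev =
      case log do
        [] -> ""
        [%{mac: mac} | _] -> mac
      end

    [%{event: event, mac: mac(key, prev, event)} | log]
  end

  # Walk the log oldest-first and recompute every MAC with the same key.
  def intact?(log, key) do
    log
    |> Enum.reverse()
    |> Enum.reduce_while("", fn %{event: e, mac: m}, prev ->
      if m == mac(key, prev, e), do: {:cont, m}, else: {:halt, :broken}
    end)
    |> Kernel.!=(:broken)
  end

  defp mac(key, prev, event) do
    :crypto.mac(:hmac, :sha256, key, [prev, event]) |> Base.encode16(case: :lower)
  end
end
```

Unlike a plain hash chain, verification needs the secret key, so someone who can edit the log but does not hold the key cannot recompute valid MACs.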

AgentPolicyEngine

π 91.7

SOC2 CC6, FINRA 3110, SR 11-7

RBAC with kill-switch per agent class. Autonomous agents gated by risk tier and human review thresholds.
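The checks described for this module might compose as in the sketch below. The policy shape, the field names, the order of checks, and the module name PolicySketch are all hypothetical; they only illustrate a kill switch per agent class, capability-based access, and a human-review threshold per risk tier.

```elixir
defmodule PolicySketch do
  # Hypothetical policy shape:
  #   killed:            MapSet of agent classes with the kill switch engaged
  #   capabilities:      %{class => [capability]}
  #   review_thresholds: %{risk_tier => risk score at which a human must approve}
  def authorize(policy, %{class: class, capability: cap, risk_tier: tier, risk_score: score}) do
    cond do
      MapSet.member?(policy.killed, class) ->
        {:deny, :kill_switch}

      cap not in Map.get(policy.capabilities, class, []) ->
        {:deny, :capability}

      # An unknown tier defaults to 0.0, so it always requires review.
      score >= Map.get(policy.review_thresholds, tier, 0.0) ->
        {:review, :human_approval_required}

      true ->
        :allow
    end
  end
end
```

The kill switch is checked first so that a disabled agent class is denied even for capabilities it would otherwise hold.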

ModelRegistry

π 88.4

SR 11-7, FINRA 3110, SOC2 CC8

Full SR 11-7 model inventory: risk tiers, validation status, pre-deployment approval workflow, board reporting.

NonRepudiationChain

π 79.3

FINRA 4511, SEC 17a-4

Cryptographic HMAC chain-of-custody. Every agent action is provably linked — tampering is detected at verification.

IncidentResponse

π 71.4

SOC2 CC9, BSA, PCI DSS 12.10

P0/P1 incidents auto-kill affected agents. Circuit breaker halts all operations. Automated RCA + SOC2-compliant reports.
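The auto-kill and circuit-breaker behaviour can be sketched as a pure state transition. The state shape, the severity atoms, and the choice to open the breaker on every P0/P1 incident are assumptions. The source does not say what trips the breaker or how it is reset.

```elixir
defmodule IncidentSketch do
  # Hypothetical state: running agents, killed agents, and a global breaker.
  def new(agents), do: %{running: agents, killed: [], breaker: :closed}

  # P0/P1: kill the affected agents that are running and open the breaker.
  def handle(state, %{severity: sev, agents: affected}) when sev in [:p0, :p1] do
    killed = Enum.filter(affected, &(&1 in state.running))
    %{state | running: state.running -- killed, killed: state.killed ++ killed, breaker: :open}
  end

  # Lower severities leave the state untouched in this sketch.
  def handle(state, _incident), do: state

  # Work is permitted only while the breaker is closed and the agent is alive.
  def permit?(state, agent), do: state.breaker == :closed and agent in state.running

  # Closing the breaker does not revive killed agents.
  def reset(state), do: %{state | breaker: :closed}
end
```

Keeping the kill list separate from the breaker means that resetting the breaker resumes healthy agents without reviving the ones that were killed.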

FINRA, SEC, SOC2, BSA/AML, PCI-DSS, GDPR, SR 11-7

SRA / ABA / IRAP / EU AI Act Ready

Legal AI Compliance

Four legal-grade compliance modules are built into the Élan orchestration layer. They cover attorney-client privilege compartmentalisation (privilege protects confidential communications between lawyer and client), per-matter data residency (where data is physically stored and processed), EU AI Act QMS conformance, and IRAP Essential Eight controls (IRAP is the Information Security Registered Assessors Program, Australia's government information-security certification).

506 total tests

4 legal AI modules

70 new legal tests

PrivilegeGuard

π 96.4

ABA Rule 1.6, SRA Standard 6.3, US v. Heppner

Attorney-client privilege compartmentalisation at the orchestration layer. Zero-retention tagging for privileged context windows — privilege waiver is the #1 enterprise law firm sales blocker.

DataResidencyEngine

π 86.2

GDPR Art.44-49 UK IDTA IRAP AU-onshore

Per-matter jurisdiction tagging. Agent inference routes to the correct regional execution zone — PROTECTED matters enforced as AU-only. Magic Circle / Big Law data-residency gate satisfied.
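Jurisdiction-based routing can be sketched as a lookup plus a filter. The zone names, the jurisdiction table, and the module name ResidencySketch are hypothetical. Only the rule that PROTECTED matters stay AU-only is taken from the text.

```elixir
defmodule ResidencySketch do
  # Allowed execution zones per jurisdiction tag. All names are hypothetical.
  @allowed %{au: [:au_onshore], eu: [:eu_west], uk: [:uk_south]}

  # PROTECTED matters are AU-only regardless of any other tag.
  def allowed_zones(%{classification: :protected}), do: [:au_onshore]
  def allowed_zones(%{jurisdiction: j}), do: Map.get(@allowed, j, [])

  # Pick the first candidate zone that the matter is allowed to run in.
  def route(matter, candidate_zones) do
    case Enum.find(candidate_zones, &(&1 in allowed_zones(matter))) do
      nil -> {:error, :no_compliant_zone}
      zone -> {:ok, zone}
    end
  end
end
```

Returning an error when no candidate zone is compliant makes the caller fail closed rather than fall back to a default region.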

EUAIActQMS

π 83.5

EU AI Act Art.9, Art.10, Art.13, Art.17

Quality Management System for EU AI Act conformance. Risk management, training data governance, human oversight chain, conformity assessment — ahead of the August 2026 Annex III deadline.

ISMControlSuite

π 79.1

IRAP ISM Essential Eight ML3 ISM-2074

ACSC Essential Eight at Maturity Level 3: MFA, application control, patch management, audit logging. ISM-2074 AI usage policy enforced — required for Australian government legal work.

SRA ABA MRPC GDPR Art.44 EU AI Act IRAP ISO 27001 UK IDTA

Build with Élan

Élan is ready for first-consumer proof. If you care about resilient agents and provable execution, explore the PRD, wire it into a host app, and help validate the live provider and persistence paths.
