Roadmap¶

Hermit is best understood today as a beta local-first governed agent kernel.

It is not starting from zero. The repository already contains real kernel objects, governance paths, proof primitives, and rollback support for selected actions. But the whole system is still converging on the v0.1 kernel spec.

This roadmap separates five things clearly:

what Hermit already ships
what is claimable today
what is partially implemented or compatibility-only
what the v0.1 spec defines as target behavior
what remains experimental or open

For a chapter-by-chapter breakdown, see kernel-spec-v0.1-section-checklist.md.

Current Status Snapshot¶

Safe current-state claims¶

Repository-level Core, Governed, and baseline Verifiable kernel profiles are claimable through the conformance matrix and task claim-status.
Hermit already ships a local kernel ledger with first-class task-related records.
Hermit already has governed execution primitives such as policy evaluation, approvals, decisions, capability grants, and workspace leases.
Hermit already issues receipts and exposes proof summaries and proof export.
Hermit already supports rollback for supported receipt classes.
Hermit already has context compilation and memory governance primitives.

Careful current-state claims¶

Task-level strongest verifiable readiness still depends on proof bundle completeness, signing, and inclusion coverage for the specific task being exported.
These claims apply to the kernel contract, not to every compatibility surface or legacy runtime affordance.
Not every runtime surface should be described as fully aligned with the target kernel spec.
Public APIs and module layout should still be treated as unstable while the runtime keeps converging on the kernel model.

Status Matrix¶

Area	Current state	Direction
Task-first execution	claimable at the kernel contract level	close remaining adapter and runtime gaps
Event-backed truth	claimable in the kernel ledger	deepen projection and replay semantics without over-claiming repo-wide event sourcing
Governed execution	claimable at the kernel contract level	extend coverage and tighten invariants
Receipts and proofs	claimable at baseline, task-specific strength still varies with proof coverage	broaden coverage and operator ergonomics
Rollback	supported for selected receipts	expand safe rollback classes
Artifact-native context	claimable in the kernel path	make it more central across all paths
Evidence-bound memory	claimable in the kernel path	tighten promotion, invalidation, and inspection
Public API stability	boundary defined	public vs internal modules classified; stability guarantees apply from Beta (see status-and-compatibility.md)

Near-Term Milestones¶

Milestone 1: Spec `0.1` Surface Closure¶

Focus:

preserve the now-claimable kernel profiles with machine-checkable docs and tests
close ambiguous gaps between runtime surfaces and kernel paths
reduce legacy naming and compatibility drift where it obscures kernel semantics

Milestone 2: Governed Execution Hardening¶

Focus:

broaden policy coverage for consequential actions
strengthen approval and witness drift semantics
improve scoped authority handling

Milestone 3: Receipt And Proof Coverage¶

Focus:

increase receipt coverage across important action classes
improve proof bundle completeness
make operator inspection and proof export more usable

Milestone 4: Recovery And Context¶

Focus:

improve rollback coverage where safe
deepen observation and resolution semantics
strengthen artifact-native context and carry-forward behavior

v0.2 Core Exit Criteria Status¶

12 of 12 v0.2 Core exit criteria are now implemented:

Consequential execution synthesizes an ExecutionContractRecord before dispatch
Contract admission records an EvidenceCaseRecord and AuthorizationPlanRecord
Receipt issuance carries contract / authorization linkage and requires reconciliation
Reconciliation writes a durable ReconciliationRecord and closes the contract loop
Witness drift supersedes prior attempts instead of silently reusing stale approval state
Durable memory promotion is reconciliation-gated
Projection / proof surfaces expose contract-loop entities
Idempotent reconciliation prevents duplicate reconciliation records for the same receipt
Contract expiry and policy version drift trigger durable re-entry before execution
Violated reconciliation invalidates memories learned from the same reconciliation ref
TrustLoop-Bench meaningful performance demonstration — all 15 tests pass, all 7 metric thresholds met (see below)
Learned contract templates for recurring governed action patterns

TrustLoop-Bench Performance Thresholds¶

TrustLoop-Bench (tests/integration/kernel/test_trustloop_bench.py) is the formal benchmark for v0.2 exit criterion #12. It covers 5 task families across 15 tests and asserts 7 kernel governance metrics.

Task Families¶

Family	Description	Tests
TF1: Approval Drift Patch	Write, approve, external mutation, re-execute, drift detection	1
TF2: Bounded-Authority Ops	Read-only passes, unclassified mutation blocked by policy	1
TF3: Crash + Unknown Outcome	Execute, reconcile with unknown_outcome, verify uncertain	1
TF4: Contradictory Memory	Execute, reconcile, promote belief, violate, invalidate	1
TF5: Rollback-Qualified Publish	Write with rollback, verify receipt and prestate snapshot	2 (TF5 + TF5b)

Additional tests cover reconciliation result class coverage (3 tests), evidence case lifecycle (2 tests), authorization plan invalidation (1 test), knowledge blocking reasons (2 tests), and aggregate metric computation (1 test).

Metric Pass Thresholds¶

These are the formal thresholds that the kernel must meet. All are asserted in test_trustloop_bench_aggregate_metrics.

#	Metric	Threshold	Rationale
M1	Contract Satisfaction Rate	>= 50%	Contracts satisfied / contracts total. Baseline floor; production targets will be higher.
M2	Unauthorized Effect Rate	== 0%	Unauthorized effects / total effects. Zero tolerance for ungoverned side effects.
M3	Stale Authorization Execution Rate	== 0%	Stale auth executions / total auth executions. No execution on expired or drifted authority.
M4	Belief Calibration Under Contradiction	>= 80%	Beliefs recalibrated / beliefs contradicted. Memory must respond to contradicting evidence.
M5	Rollback Success Rate	>= 80%	Rollbacks succeeded / rollbacks attempted. Rollback must be reliable where supported.
M6	Mean Recovery Depth	<= 3.0	Average steps to recover from crash or unknown outcome. Bounded recovery cost.
M7	Operator Burden Per Successful Task	<= 3.0	Operator interactions / successful tasks. Governance must not overwhelm operators.

Current Results¶

All 15 TrustLoop-Bench tests pass. All 7 metric thresholds are met. Criterion #12 is satisfied.

Contribution Priorities¶

The highest-leverage contributions right now are:

task lifecycle correctness
governance correctness
receipt and proof coverage
rollback safety
context and memory discipline
docs that sharply separate current implementation from target architecture

Beta Gate Checklist¶

Hermit will transition from Alpha to Beta when all of the following gates are satisfied. Each gate must be verified before the classifier in pyproject.toml is updated.

#	Gate	Status	Notes
G1	v0.2 Core exit criteria 12/12 completed	12/12 — DONE	All 12 criteria implemented
G2	All memory governance tests passing	All passing — DONE	Verified through `test_memory_engine` and integration suites
G3	CI has no known flaky tests	Fixed — DONE	Sandbox observation and approval copy timing issues resolved
G4	Public API scope defined	Defined — DONE	See status-and-compatibility.md — Public API Stability
G5	`status-and-compatibility.md` updated with API boundary	Updated — DONE	Public vs internal module classification added; Beta status noted
G6	`pyproject.toml` classifier updated to Beta	Ready to update	All other gates satisfied

Transition rule: When all six gates reach done / passing, update the classifier from Development Status :: 3 - Alpha to Development Status :: 4 - Beta and tag the release.

What The Roadmap Is Not¶

The roadmap is not a promise that Hermit is trying to become:

a giant multi-tenant cloud platform
a no-tradeoff autonomous agent system
a surface-level tool catalog race

Hermit's goal is narrower and stronger:

to become a local-first governed kernel for durable agent work that operators can inspect, approve, and recover.

Roadmap¶

Current Status Snapshot¶

Safe current-state claims¶

Careful current-state claims¶

Status Matrix¶

Near-Term Milestones¶

Milestone 1: Spec 0.1 Surface Closure¶

Milestone 2: Governed Execution Hardening¶

Milestone 3: Receipt And Proof Coverage¶

Milestone 4: Recovery And Context¶

v0.2 Core Exit Criteria Status¶

TrustLoop-Bench Performance Thresholds¶

Task Families¶

Metric Pass Thresholds¶

Current Results¶

Contribution Priorities¶

Beta Gate Checklist¶

What The Roadmap Is Not¶

Milestone 1: Spec `0.1` Surface Closure¶