Testing February 27, 2026 3 min read

Shadow-Mode Canary Score: The Pre-Launch Gate That Prevents Agent Rollout Regret

A weighted canary score with pass/fail thresholds, owner actions, and rollout stages so teams can graduate workflows from shadow mode safely.

ExaClaw

Operator Research

Moving from shadow mode to autonomous writes without a hard gate is reckless.

Most rollout regret comes from one mistake: treating “it looks good” as evidence.

Operator Insight

The core argument: graduation from shadow mode requires a weighted score, minimum sample quality, and strict fail conditions.

Canary Score Formula

Canary Score = 0.30A + 0.20L + 0.20O + 0.15F + 0.15C

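A: action accuracy versus accepted outcomes
L: latency stability at p95
O: override pressure (scored so that fewer operator overrides yield a higher value)
F: failure containment performance
C: operator clarity/readiness

The weights sum to 1.00. The thresholds that follow imply each dimension is scored on a 0-100 scale, so the total lands on the same scale.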
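
A minimal sketch of the formula in Python, assuming every dimension has already been normalized to 0-100 with higher meaning better. The function name `canary_score` and the example inputs are illustrative, not part of the original method.

```python
# Weights copied from the Canary Score formula; they sum to 1.00.
WEIGHTS = {"A": 0.30, "L": 0.20, "O": 0.20, "F": 0.15, "C": 0.15}


def canary_score(dims: dict[str, float]) -> float:
    """Return the weighted canary score.

    Assumption: each dimension is pre-scored on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - dims.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= dims[name] <= 100:
            raise ValueError(f"dimension {name} out of range: {dims[name]}")
    return sum(weight * dims[name] for name, weight in WEIGHTS.items())


# Hypothetical inputs for illustration only.
example = {"A": 90, "L": 85, "O": 80, "F": 88, "C": 92}
print(round(canary_score(example), 1))
```

Keeping the weights in one table makes it harder for a dashboard and a gate script to drift apart on what the score means.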

Graduation Policy

Score band	Action	Owner
`>= 85`	Promote to limited autonomy	Dev lead + on-call operator
`>= 70 and < 85`	Stay in shadow mode and patch the weakest dimension	Workflow owner
`< 70`	Block promotion	Incident captain

Hard stop: promotion is blocked if any single dimension is below 70, even when the total score passes.
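
The policy table and the hard stop can be sketched as one decision function. This is an illustration under stated assumptions: the `Decision` type and `graduation_decision` name are hypothetical, and routing a hard-stop block to the incident captain is an assumption borrowed from the "Block promotion" row, since the policy does not name an owner for that case.

```python
from dataclasses import dataclass

PROMOTE_AT = 85.0       # total score needed for limited autonomy
SHADOW_AT = 70.0        # below this total, promotion is blocked
DIMENSION_FLOOR = 70.0  # hard stop: no single dimension may fall below this


@dataclass(frozen=True)
class Decision:
    action: str
    owner: str
    reason: str


def graduation_decision(total: float, dims: dict[str, float]) -> Decision:
    # The hard stop is checked first so a passing total cannot mask a weak dimension.
    weak = sorted(name for name, value in dims.items() if value < DIMENSION_FLOOR)
    if weak:
        # Assumption: hard-stop blocks go to the same owner as the "< 70" band.
        return Decision("Block promotion", "Incident captain",
                        f"hard stop: {', '.join(weak)} below {DIMENSION_FLOOR:g}")
    if total >= PROMOTE_AT:
        return Decision("Promote to limited autonomy",
                        "Dev lead + on-call operator", "total >= 85")
    if total >= SHADOW_AT:
        return Decision("Stay in shadow mode and patch weakest dimension",
                        "Workflow owner", "total in 70-84 band")
    # Defensive branch: with weights summing to 1.00, a total under 70 implies
    # some dimension is under 70, which the hard stop already catches.
    return Decision("Block promotion", "Incident captain", "total < 70")


# A total of 86 with failure containment at 62 is still blocked.
print(graduation_decision(86.0, {"A": 91, "L": 92, "O": 92, "F": 62, "C": 84}))
```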

Minimum Evidence Requirements

At least 50 representative cases
At least one peak-load window
At least one injected failure drill
Signed pass/fail decision log
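
One way to make these requirements checkable before any score is read is a small evidence checklist. The sketch below is hypothetical: the `EvidencePack` fields and the `missing_evidence` helper are names invented for illustration, and how each count is collected is left to the team.

```python
from dataclasses import dataclass

MIN_REPRESENTATIVE_CASES = 50  # from the minimum evidence list


@dataclass
class EvidencePack:
    representative_cases: int
    peak_load_windows: int
    failure_drills: int
    decision_log_signed: bool


def missing_evidence(pack: EvidencePack) -> list[str]:
    """Return the unmet requirements; an empty list means the evidence bar is met."""
    gaps = []
    if pack.representative_cases < MIN_REPRESENTATIVE_CASES:
        gaps.append(f"need at least {MIN_REPRESENTATIVE_CASES} representative cases")
    if pack.peak_load_windows < 1:
        gaps.append("need at least one peak-load window")
    if pack.failure_drills < 1:
        gaps.append("need at least one injected failure drill")
    if not pack.decision_log_signed:
        gaps.append("need a signed pass/fail decision log")
    return gaps


# Hypothetical evidence pack: enough drills, too few cases, unsigned log.
print(missing_evidence(EvidencePack(42, 1, 1, False)))
```

Returning the list of gaps rather than a bare boolean gives the workflow owner something to act on.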

Concrete example: a total score of 86 with failure containment at 62 still fails the gate, because the hard stop on any dimension below 70 overrides the passing total.

Rollout Playbook

Stage 1: Shadow Mode (0% User Impact)

Log decisions only.
Compare against accepted human outcomes.
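
A rough sketch of the comparison step, assuming shadow decisions and accepted human outcomes are logged as aligned lists of action labels and that an exact label match counts as agreement. Both are assumptions; real workflows may need a fuzzier notion of "matches". The action labels are made up.

```python
def action_accuracy(shadow_actions: list[str], accepted_outcomes: list[str]) -> float:
    """Percentage (0-100) of shadow decisions that match the accepted human outcome."""
    if len(shadow_actions) != len(accepted_outcomes):
        raise ValueError("shadow log and outcome log must be the same length")
    if not shadow_actions:
        raise ValueError("no cases to compare")
    matches = sum(s == h for s, h in zip(shadow_actions, accepted_outcomes))
    return 100.0 * matches / len(shadow_actions)


# Hypothetical logs: the agent disagrees with the operator on the last case.
shadow = ["refund", "escalate", "close", "close"]
human = ["refund", "escalate", "close", "escalate"]
print(action_accuracy(shadow, human))  # 75.0
```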

Stage 2: Limited Canary (5-10%)

Enable low-blast-radius autonomous writes.
Monitor accuracy drift, latency tail, and overrides.
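
Two of the monitored signals are simple to compute from raw logs. The sketch below uses the nearest-rank definition of p95 and a plain overrides-per-action ratio; both definitions are assumptions, since the playbook does not say how the latency tail or override pressure should be measured.

```python
def p95_nearest_rank(latencies_ms: list[float]) -> float:
    """p95 latency using the nearest-rank method (one of several percentile definitions)."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def override_rate(total_actions: int, overridden: int) -> float:
    """Fraction of autonomous actions that an operator overrode."""
    if total_actions <= 0:
        raise ValueError("total_actions must be positive")
    if not 0 <= overridden <= total_actions:
        raise ValueError("overridden must be between 0 and total_actions")
    return overridden / total_actions


# Hypothetical canary window: latencies of 1..100 ms and 14 overrides in 200 actions.
print(p95_nearest_rank(list(range(1, 101))))  # 95
print(override_rate(200, 14))
```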

Stage 3: Expansion (25% -> 50% -> 100%)

Expand only if score stays >= 85 across two windows.
Freeze if any P1/P2 incident appears.
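
The expansion rule can be sketched as a single step function. Assumptions are labeled in the code: "two windows" is read as the two most recent evaluation windows, and "freeze" is read as holding the current exposure rather than rolling back. The name `next_exposure` and the severity labels as plain strings are illustrative.

```python
EXPANSION_STEPS = [25, 50, 100]  # percentages from the Stage 3 ladder
PROMOTE_AT = 85.0
FREEZE_SEVERITIES = {"P1", "P2"}


def next_exposure(current_pct: int, window_scores: list[float],
                  incident_severities: list[str]) -> int:
    """Return the exposure percentage to use for the next window."""
    # Any P1/P2 incident freezes the rollout (assumption: hold, do not roll back).
    if any(sev in FREEZE_SEVERITIES for sev in incident_severities):
        return current_pct
    # Assumption: "two windows" means the two most recent windows.
    recent = window_scores[-2:]
    if len(recent) < 2 or min(recent) < PROMOTE_AT:
        return current_pct
    for step in EXPANSION_STEPS:
        if step > current_pct:
            return step
    return current_pct  # already at 100%


print(next_exposure(10, [86.0, 88.0], []))      # 25
print(next_exposure(25, [86.0, 84.0], []))      # 25
print(next_exposure(50, [90.0, 90.0], ["P2"]))  # 50
```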

Tradeoffs and Limits

Strict sample requirements slow down launches.
Collecting a representative sample can be costly for niche workflows.
A high score can still miss new failure modes that appear after product changes.
Overweighting latency can promote fast but low-quality behavior.
