Latency Budgets for Agent Tool Calls: Keep the Loop Fast Without Sacrificing Accuracy

Most “slow model” complaints are really about slow tool orchestration.

Without a hard latency budget, multi-tool agents quietly drift from useful to unusable.

Operator Insight

The core argument: latency must be budgeted per workflow step with explicit degrade rules, not optimized ad hoc.

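L_e2e = L_plan + sum(L_tool_i) + L_post

Here L_plan is the planner/model time, L_tool_i is the latency of the i-th tool hop, and L_post is post-processing time.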

If you run five sequential tool hops at 1.6s p95 each, the tool calls alone can consume roughly 8s, the entire budget below, before any post-processing.
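The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the function name `end_to_end_latency` is hypothetical, and the planner and post-processing values are assumed placeholders. Adding per-hop p95 values is a rough planning estimate, since the p95 of a sum is not generally the sum of the p95s.

```python
import math


def end_to_end_latency(plan_s: float, tool_hops_s: list[float], post_s: float) -> float:
    """L_e2e = L_plan + sum(L_tool_i) + L_post, all values in seconds."""
    return math.fsum([plan_s, *tool_hops_s, post_s])


# Five sequential tool hops at 1.6s p95 each, as in the text.
tool_hops = [1.6] * 5
tool_total = math.fsum(tool_hops)

# plan_s and post_s are assumed placeholder values, for illustration only.
total = end_to_end_latency(plan_s=1.5, tool_hops_s=tool_hops, post_s=1.0)

print(f"tools: {tool_total:.1f}s, end to end: {total:.1f}s")
# prints "tools: 8.0s, end to end: 10.5s"
```

With these assumed inputs the tool hops alone account for most of the end-to-end time, which is why the hop count deserves as much scrutiny as model speed.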

| Step | p95 target | Degrade action | Owner |
| --- | --- | --- | --- |
| Planner/model | `1.5s` | Smaller reasoning profile for low-risk intents | Model owner |
| Retrieval/read tools | `2.0s` | Return partial context and continue async enrichment | Data/tool owner |
| External write tools | `2.5s` | Queue write with confirmation step | Workflow owner |
| Post-processing | `1.0s` | Trim non-critical formatting | App owner |
| Transport overhead | `1.0s` | Send immediate progress state | Channel owner |

Total: 8.0s p95 budget.
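One way to make the budget enforceable is to keep it as data next to the workflow. The sketch below is an assumption about how that might look: the dictionary keys, the `StepBudget` type, and the `degrade_actions` helper are hypothetical names, while the targets, actions, and owners are copied from the budget table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepBudget:
    p95_target_s: float
    degrade_action: str
    owner: str


# Values come from the budget table; the keys are hypothetical step names.
BUDGET = {
    "planner": StepBudget(1.5, "Smaller reasoning profile for low-risk intents", "Model owner"),
    "retrieval": StepBudget(2.0, "Return partial context and continue async enrichment", "Data/tool owner"),
    "external_write": StepBudget(2.5, "Queue write with confirmation step", "Workflow owner"),
    "post_processing": StepBudget(1.0, "Trim non-critical formatting", "App owner"),
    "transport": StepBudget(1.0, "Send immediate progress state", "Channel owner"),
}


def total_budget_s(budget: dict[str, StepBudget]) -> float:
    """Sum the per-step p95 targets to get the workflow budget in seconds."""
    return round(sum(step.p95_target_s for step in budget.values()), 3)


def degrade_actions(
    measured_p95_s: dict[str, float],
    budget: dict[str, StepBudget] = BUDGET,
) -> list[tuple[str, str, str]]:
    """Return (step, degrade action, owner) for each step over its p95 target."""
    return [
        (step, budget[step].degrade_action, budget[step].owner)
        for step, p95 in measured_p95_s.items()
        if step in budget and p95 > budget[step].p95_target_s
    ]


# Example measurements are made up for illustration.
over_budget = degrade_actions({"planner": 1.2, "retrieval": 2.4})
print(total_budget_s(BUDGET), over_budget)
```

Keeping the owner in the same record as the degrade action means a breach report already says who decides whether the degrade rule stays on.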

| Metric | Trigger | Immediate action |
| --- | --- | --- |
| Workflow p95 | `> 8s` for 30 min | Reduce concurrency and queue non-urgent jobs |
| Timeout rate | `> 2%` over 1h | Switch to fallback dependency path |
| Tool-level p95 | Above budget for 3 windows | Bypass or replace slow tool |
| Queue median wait | `> 2s` | Re-tier queue priorities or add workers |
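The triggers above can be evaluated mechanically once the metrics are aggregated. The following is a minimal sketch under stated assumptions: `MetricsSnapshot` and its field names are hypothetical, the values are assumed to be pre-aggregated by whatever monitoring system is in use, and "3 windows" is read here as a simple count because the table does not say whether the windows must be consecutive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsSnapshot:
    # Hypothetical field names; values are assumed to be aggregated upstream.
    minutes_workflow_p95_over_8s: float  # how long workflow p95 has stayed above 8s
    timeout_rate_1h: float               # fraction of calls timed out in the last hour
    tool_windows_over_budget: int        # windows where a tool's p95 exceeded its budget
    queue_median_wait_s: float           # median queue wait in seconds


def triggered_actions(m: MetricsSnapshot) -> list[str]:
    """Map a metrics snapshot to the immediate actions from the trigger table."""
    actions = []
    if m.minutes_workflow_p95_over_8s >= 30:
        actions.append("Reduce concurrency and queue non-urgent jobs")
    if m.timeout_rate_1h > 0.02:
        actions.append("Switch to fallback dependency path")
    if m.tool_windows_over_budget >= 3:
        actions.append("Bypass or replace slow tool")
    if m.queue_median_wait_s > 2.0:
        actions.append("Re-tier queue priorities or add workers")
    return actions


# Example values are made up for illustration.
snapshot = MetricsSnapshot(
    minutes_workflow_p95_over_8s=45,
    timeout_rate_1h=0.01,
    tool_windows_over_budget=3,
    queue_median_wait_s=1.2,
)
for action in triggered_actions(snapshot):
    print(action)
```

Each rule is checked independently, so one snapshot can return several actions at once. Whether they should all fire together or in priority order is a policy decision the table leaves open.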