Agent Drilldown
Agent Drilldown is where you go when you need to know exactly what your agents are doing with the resources they consume. Every run is attributed, every token is counted, and every failure is surfaced — giving you the visibility to optimize cost, catch reliability issues early, and prove that your AI workforce is delivering value.
This is the tab that turns “I think agents are working” into “I know exactly what each agent ran, how much it cost, and whether it succeeded.”
KPI cards
Six cards give you the headline numbers for agent execution:
| Card | What it tells you |
|---|---|
| Runs | Total agent sessions completed in the time window, with a failure count. The dual-series sparkline shows successful vs failed runs over time. |
| Failure Rate | Percentage of runs that ended in failure. Zero is the target. Any non-zero rate deserves investigation — click through to see which agents failed and why. |
| Tokens | Total tokens consumed across all runs, broken down into input and output tokens, with the associated cost. This is your primary cost signal. The sparkline trends token usage over the window. |
| Avg Duration | Mean run duration across all sessions. Watch for this creeping upward — longer runs mean higher cost and slower feedback loops. |
| Score Coverage | Percentage of runs that received a quality score (via review or audit). Low coverage means you’re flying blind on agent output quality. |
| Execution Ops | Total commands executed and conversation turns across all runs. A high operation count combined with few completed runs may indicate agents are looping without making progress. |
Every card links directly to the Agent Sessions view, filtered to the relevant metric.
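To make the card definitions concrete, here is a minimal sketch of how the headline numbers could be derived from raw run records. The `Run` record, its field names, and `kpi_summary` are hypothetical names chosen for illustration; they are not the product's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    # Hypothetical run record; field names are assumptions for illustration.
    agent: str
    failed: bool
    input_tokens: int
    output_tokens: int
    duration_s: float
    score: Optional[float] = None  # None means the run was never scored


def kpi_summary(runs):
    """Compute Runs, Failure Rate, Tokens, Avg Duration, and Score Coverage."""
    total = len(runs)
    if total == 0:
        # Avoid dividing by zero when the time window contains no runs.
        return {"runs": 0, "failures": 0, "failure_rate_pct": 0.0,
                "tokens": 0, "avg_duration_s": 0.0, "score_coverage_pct": 0.0}
    failures = sum(1 for r in runs if r.failed)
    scored = sum(1 for r in runs if r.score is not None)
    return {
        "runs": total,
        "failures": failures,
        "failure_rate_pct": 100.0 * failures / total,
        "tokens": sum(r.input_tokens + r.output_tokens for r in runs),
        "avg_duration_s": sum(r.duration_s for r in runs) / total,
        "score_coverage_pct": 100.0 * scored / total,
    }


runs = [
    Run("planner", False, 1200, 300, 40.0, score=0.9),
    Run("planner", True, 800, 100, 20.0),
    Run("coder", False, 2000, 600, 60.0, score=0.8),
    Run("coder", False, 1000, 200, 40.0),
]
summary = kpi_summary(runs)
print(summary)
```

With this sample, one of four runs failed and two of four were scored, so the failure rate is 25% and score coverage is 50%.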
Token usage charts
The center panel is the heart of the drilldown — a multi-view chart that breaks down token consumption across three dimensions. Toggle between views using the tab buttons above the chart.
By Agent
The default view. A stacked bar chart showing total tokens per agent, split into three categories:
- Input (uncached) — tokens sent to the model that weren’t served from cache.
- Output — tokens generated by the model.
- Cached — input tokens served from prompt cache, reducing cost.
This view answers: Which agents are consuming the most resources? Agents with disproportionately high token counts may need prompt optimization, capability tuning, or task scope reduction.
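The three-way split can be reproduced from exported run data. The sketch below assumes each run is a dictionary with hypothetical keys `agent`, `input_tokens`, `cached_tokens`, and `output_tokens`, and that `input_tokens` includes the cached portion; check how your own export counts cached tokens before relying on that subtraction.

```python
from collections import defaultdict


def tokens_by_agent(runs):
    """Aggregate tokens per agent into uncached input, output, and cached."""
    totals = defaultdict(lambda: {"input_uncached": 0, "output": 0, "cached": 0})
    for run in runs:
        bucket = totals[run["agent"]]
        # Assumption: input_tokens counts cached tokens too, so subtract them.
        bucket["input_uncached"] += run["input_tokens"] - run["cached_tokens"]
        bucket["output"] += run["output_tokens"]
        bucket["cached"] += run["cached_tokens"]
    # Largest consumers first, matching how you would read the bar chart.
    return sorted(totals.items(),
                  key=lambda item: sum(item[1].values()),
                  reverse=True)


sample_runs = [
    {"agent": "coder", "input_tokens": 5000, "cached_tokens": 3000, "output_tokens": 800},
    {"agent": "coder", "input_tokens": 2000, "cached_tokens": 500, "output_tokens": 400},
    {"agent": "reviewer", "input_tokens": 1500, "cached_tokens": 0, "output_tokens": 200},
]
ranked = tokens_by_agent(sample_runs)
for agent, split in ranked:
    print(agent, split)
```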
By Provider
A bar chart showing total tokens grouped by provider (e.g., Codex, Claude). Use this to understand your provider cost split and make informed decisions about which providers to use for which workloads.
By Model
A stacked bar chart showing tokens per model (e.g., GPT 5.4, Opus), split by input, output, and cached. This is your most granular cost optimization lever — if one model is consuming far more tokens than another for similar work, consider whether a more efficient model could handle that workload.
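One way to compare models doing similar work is to normalize by run count, since raw totals favor whichever model ran less often. This is a rough sketch with hypothetical keys (`model`, `total_tokens`) and placeholder model names; it only makes sense when the runs being compared cover comparable tasks.

```python
from collections import defaultdict


def tokens_per_run_by_model(runs):
    """Return the mean tokens per run for each model."""
    tokens = defaultdict(int)
    counts = defaultdict(int)
    for run in runs:
        tokens[run["model"]] += run["total_tokens"]
        counts[run["model"]] += 1
    return {model: tokens[model] / counts[model] for model in tokens}


model_runs = [
    {"model": "model-a", "total_tokens": 4000},
    {"model": "model-a", "total_tokens": 6000},
    {"model": "model-b", "total_tokens": 2000},
    {"model": "model-b", "total_tokens": 3000},
]
per_run = tokens_per_run_by_model(model_runs)
print(per_run)
```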
Drilldown Perspectives carousel
Four perspectives provide targeted operational insights:
Token Pressure
Ranks agents by total token consumption with cost attribution. Each row shows the agent handle, total tokens, cost, and run count. The severity badge (HIGH, MEDIUM, LOW) is based on concentration, meaning the share of total token usage attributable to a single agent. When one agent dominates token usage, that is flagged as high pressure.
Click Open on any agent to jump to that agent's filtered session list.
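The concentration idea can be sketched as each agent's share of total tokens mapped to a badge. The exact rule and cutoffs the product uses are not documented here, so the thresholds below (50% for HIGH, 25% for MEDIUM) and the function name are assumptions for illustration only.

```python
def token_pressure(agent_tokens, high=0.5, medium=0.25):
    """Rank agents by token share and attach a severity label.

    The `high` and `medium` thresholds are hypothetical, not the
    product's actual cutoffs.
    """
    total = sum(agent_tokens.values())
    rows = []
    for agent, tokens in sorted(agent_tokens.items(),
                                key=lambda kv: kv[1], reverse=True):
        share = tokens / total if total else 0.0
        if share >= high:
            severity = "HIGH"
        elif share >= medium:
            severity = "MEDIUM"
        else:
            severity = "LOW"
        rows.append((agent, tokens, round(share, 3), severity))
    return rows


pressure = token_pressure({"coder": 8000, "planner": 1500, "reviewer": 500})
for row in pressure:
    print(row)
```

In this sample, `coder` holds 80% of all tokens and is the only agent labeled HIGH, which makes it the first candidate for optimization.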
Score Coverage
Shows which agents have been scored and their average quality ratings. Low score coverage means you’re running agents without verifying output quality — a risk that compounds over time.
Reliability Watch
Surfaces agents with the highest failure rates. Each row shows the agent handle, failure count, and total runs. Zero failures across all agents is the target state. Any failures here should trigger a session review to understand the root cause.
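A comparable ranking can be built from per-agent failure and run counts. In this sketch the input shape (agent handle mapped to a `(failures, total_runs)` tuple) and the tie-break on raw failure count are assumptions, not the product's documented ordering.

```python
def reliability_watch(stats):
    """Rank agents by failure rate, highest first.

    `stats` maps an agent handle to a (failures, total_runs) tuple.
    Agents with no runs are skipped because their rate is undefined.
    """
    rows = [
        (agent, failures, total, failures / total)
        for agent, (failures, total) in stats.items()
        if total > 0
    ]
    # Sort by failure rate, then by raw failure count as a tie-breaker.
    return sorted(rows, key=lambda r: (r[3], r[1]), reverse=True)


watch = reliability_watch({
    "coder": (2, 10),
    "planner": (0, 25),
    "reviewer": (1, 4),
    "idle": (0, 0),
})
for agent, failures, total, rate in watch:
    print(f"{agent}: {failures}/{total} failed ({rate:.0%})")
```

Note that `reviewer` ranks above `coder` despite fewer failures, because one failure in four runs is a higher rate than two in ten.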
Attribution Confidence
Reports how well agent runs are mapped to known agent identities. Runs tagged as “unknown target” are sessions that couldn’t be attributed to a registered agent, which leaves blind spots in your execution trail.
| Confidence level | What it means |
|---|---|
| High | Run is cleanly attributed to a known agent. |
| Medium | Run is attributed but with some ambiguity. |
| Low | Attribution is uncertain — review the session metadata. |
| Unknown | Run couldn’t be mapped to any registered agent. |
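To quantify the blind spot, you can tally runs per confidence level and look at the share that is unknown. The sketch below assumes each run carries a lowercase level string matching the table above; that representation is hypothetical.

```python
from collections import Counter

# Level names mirror the table above; the lowercase strings are an assumption.
LEVELS = ("high", "medium", "low", "unknown")


def attribution_breakdown(run_levels):
    """Return {level: (count, percent_of_runs)} for every confidence level."""
    counts = Counter(run_levels)
    total = sum(counts.values())

    def percent(count):
        return 100.0 * count / total if total else 0.0

    return {level: (counts[level], percent(counts[level])) for level in LEVELS}


levels = ["high"] * 7 + ["medium"] + ["unknown"] * 2
breakdown = attribution_breakdown(levels)
print(breakdown)
```

Here two of ten runs are unattributed, so 20% of the execution trail cannot be tied to a registered agent.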
Putting it to work
Step 1 — Check Failure Rate and Runs
Start here every time. Zero failures with a healthy run count means your agents are reliable. Any failures warrant clicking through to the session details.
Step 2 — Review Token Pressure
Switch to the Token Pressure perspective. If one agent is consuming 80% of your tokens, that’s the one to optimize first — whether through prompt tuning, model selection, or task scoping.
Step 3 — Compare By Model
Toggle to the By Model chart view. If a cheaper model can handle the same workload, you’ve found an easy cost win. Use the provider and model views together to build a cost optimization strategy.
Step 4 — Verify Score Coverage
Low score coverage means you’re trusting agent output without verification. Increase audit frequency or enable automated scoring to close the gap.
Next steps
- Open Agent Sessions to inspect individual runs, replay decision trails, and see exactly what each agent did.
- Use Agents to adjust agent configurations, prompts, or model assignments based on what you’ve learned.
- Return to Workspace Pulse to check whether agent activity is translating into delivery progress.