# Reviewable AI Agent Workflows

Autonomous agents are easy to demo and hard to trust. The missing layer is not another prompt. It is reviewable execution: evidence, findings, scores, gates, and approval records attached to the work itself.

Reviewable AI Agent Workflows are how Valdr keeps agent output from becoming untraceable automation. Every serious workflow needs a way to inspect what happened, decide whether it is good enough, and stop the work before it ships if it is not.

Valdr makes review part of the runtime, not an afterthought.

## What changes

Without reviewable workflows, agent output arrives as a blob of text or a diff with unclear provenance. Reviewers must reconstruct the prompt, the constraints, the commands, the files changed, and the reasoning after the fact.

With Valdr, review and audit attach to the execution timeline. Reviewers can inspect the session, publish findings, request changes, approve work, and record scores. Auditors can evaluate the run against structured dimensions and preserve the result for dashboards and future decisions.

{{< cards >}}
  {{< card title="Evidence travels with the work" subtitle="Reviewers can inspect transcripts, prompts, events, diffs, task context, and prior decisions" icon="eye" >}}
  {{< card title="Approval gates are explicit" subtitle="Work can stop at review, audit, or readiness checks before it advances" icon="shield-check" >}}
  {{< card title="Findings become workflow state" subtitle="Review decisions, comments, scores, and recommendations attach to the task and session" icon="clipboard-list" >}}
  {{< card title="Audits measure quality" subtitle="Seven-dimension scorecards turn agent runs into comparable, inspectable evidence" icon="chart-bar" >}}
{{< /cards >}}

## What reviewers can now do

Reviewable workflows turn agent output into an evidence package.

| Review goal | What Valdr enables |
|-------------|--------------------|
| Understand what happened | Inspect the session timeline, prompt, config, tool calls, commands, and worktree diff |
| Evaluate against acceptance criteria | Review the output in the context of the task, requirements, and project expectations |
| Publish findings | Record review comments, status, recommendation, and lightweight scores |
| Route changes | Send feedback back to the executor session or launch a follow-up run |
| Audit quality | Launch auditor agents and ingest structured scorecards for the run |
| Gate completion | Require approved reviews and score evidence before work advances |

This is the difference between "the agent says it is done" and "the work is reviewable."

## Review and audit are different layers

Valdr separates review from audit because they answer different questions.

| Layer | Question it answers | Typical output |
|-------|---------------------|----------------|
| **Review** | Is this work acceptable for the task? | Findings, comments, approval, changes requested, task-level score |
| **Audit** | How did this session perform as an agent run? | Evidence review, seven-dimension scorecard, scored session history |

Review is the delivery gate. Audit is the quality and reliability signal. Together they make agent work inspectable at both the task level and the system level.
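
One way to picture the two layers is as two separate record types: one keyed to the task and one keyed to the session. The sketch below is illustrative only. The class names, field names, and status values are assumptions, not Valdr's schema, and the dimension names are placeholders because this page does not name the seven dimensions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    # Hypothetical status names; the real values are defined by Valdr.
    APPROVED = "approved"
    CHANGES_REQUESTED = "changes_requested"
    REJECTED = "rejected"


@dataclass
class Review:
    """Task-level delivery gate: is this work acceptable for the task?"""
    task_id: str
    session_id: str
    status: ReviewStatus
    findings: list = field(default_factory=list)
    score: Optional[float] = None  # lightweight task-level score


@dataclass
class AuditScorecard:
    """Session-level quality signal: how did this run perform?"""
    session_id: str
    dimensions: dict  # dimension name -> score

    def __post_init__(self) -> None:
        # The scorecard is described as seven-dimensional, so reject
        # anything else. Callers supply the dimension names.
        if len(self.dimensions) != 7:
            raise ValueError("an audit scorecard needs exactly seven dimensions")

    def mean_score(self) -> float:
        return sum(self.dimensions.values()) / len(self.dimensions)


review = Review("task-1", "session-1", ReviewStatus.APPROVED,
                ["Tests cover the new path"], 4.0)
# Placeholder dimension names (dimension_1 .. dimension_7).
audit = AuditScorecard("session-1", {f"dimension_{i}": 3.0 for i in range(1, 8)})
```

Keeping the two types separate mirrors the split above: a `Review` can block delivery of one task, while `AuditScorecard` values accumulate per session and can be compared across runs.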

## The reviewable workflow loop

{{% steps %}}

### Capture execution

The executor session records prompt, config, transcript, events, commands, and file changes.

### Launch review

A reviewer sees the task, source session, worktree diff, and relevant context instead of starting from a cold diff.

### Publish decision

The reviewer records findings, score, recommendation, and status. Approval and change requests become workflow state.

### Audit the session

An auditor can inspect compact evidence and produce a seven-dimension scorecard that stays attached to the scored session.

### Advance or route feedback

If gates pass, the task can move forward. If gates fail, the orchestrator routes feedback to the executor or launches follow-up work.

{{% /steps %}}

Every step leaves evidence. That is what makes agent workflows operational instead of speculative.
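
The last step of the loop can be sketched as a small decision function. This is a minimal illustration, not Valdr's orchestrator: the `Action` names and the `decide` function are hypothetical, and the rule for choosing between routing feedback and launching follow-up work (whether the executor session is still active) is an assumption made for the example.

```python
from enum import Enum


class Action(Enum):
    ADVANCE = "advance"
    ROUTE_FEEDBACK = "route_feedback"  # send findings back to the executor
    FOLLOW_UP = "launch_follow_up"     # start a new run to address findings


def decide(review_approved: bool, audit_passed: bool, executor_active: bool) -> Action:
    """Advance if all gates pass; otherwise send the work back."""
    if review_approved and audit_passed:
        return Action.ADVANCE
    # Gates failed. Assumed policy: reuse the executor session if it is
    # still active, otherwise launch follow-up work.
    return Action.ROUTE_FEEDBACK if executor_active else Action.FOLLOW_UP


print(decide(review_approved=True, audit_passed=True, executor_active=True).value)
print(decide(review_approved=False, audit_passed=True, executor_active=False).value)
```

The point of the sketch is that a failed gate never falls through to `ADVANCE`; every failure path produces an explicit next action.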

## What gets preserved

| Evidence | Why it matters |
|----------|----------------|
| **Session transcript** | Shows the agent's reasoning, tool calls, commands, and responses |
| **Prompt and config** | Show which instructions, model, tools, and launcher preset shaped the run |
| **Worktree diff** | Shows the actual code or docs changed by the agent |
| **Review comments** | Capture human or reviewer-agent findings in the task record |
| **Review status** | Makes approval, rejection, and changes-requested states explicit |
| **Audit scorecards** | Provide structured quality signals across agent runs |
| **Execution timeline** | Keeps feedback, follow-up instructions, and review decisions attached to the same run |

Many AI agent systems skip this step. Valdr treats review evidence as part of the product, not cleanup after the work is done.
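
As a rough sketch, the rows of the table above can be modeled as one evidence package with a completeness check. The `EvidencePackage` class and its field names are hypothetical, chosen to match the table rows. Valdr's actual storage format is not described on this page.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidencePackage:
    # One field per row of the evidence table; all names are illustrative.
    session_transcript: Optional[str] = None
    prompt_and_config: Optional[str] = None
    worktree_diff: Optional[str] = None
    review_comments: Optional[str] = None
    review_status: Optional[str] = None
    audit_scorecards: Optional[str] = None
    execution_timeline: Optional[str] = None

    def missing(self) -> list:
        """Return the names of evidence items that were not captured."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


package = EvidencePackage(
    session_transcript="user: fix the failing test",
    worktree_diff="--- a/app.py",
)
print(package.missing())
```

A check like `missing()` makes gaps visible before a reviewer starts, instead of leaving them to be discovered mid-review.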

Over time, review history becomes durable operational evidence instead of disappearing into chat transcripts and pull request comments. Teams can see which agents repeatedly need correction, which workflows produce reliable output, and where standards should be encoded as capabilities.
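
To make "which agents repeatedly need correction" concrete, here is a small sketch that computes a per-agent correction rate from review history. The history records, agent names, and status strings are made-up sample data, and `correction_rate` is a hypothetical helper, not a Valdr API.

```python
from collections import Counter

# Made-up sample history: (agent name, review status) pairs.
history = [
    ("agent-a", "approved"),
    ("agent-b", "changes_requested"),
    ("agent-b", "changes_requested"),
    ("agent-a", "approved"),
    ("agent-b", "approved"),
]


def correction_rate(history):
    """Fraction of each agent's reviews that did not end in approval."""
    total = Counter(agent for agent, _ in history)
    corrected = Counter(agent for agent, status in history if status != "approved")
    return {agent: corrected[agent] / total[agent] for agent in total}


rates = correction_rate(history)
for agent, rate in sorted(rates.items()):
    print(f"{agent}: {rate:.0%} of reviews needed correction")
```

A rate like this is only a starting signal. It shows where to look, not why the corrections were needed.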

## Pairing with orchestration

Reviewable workflows are the governance layer in the Valdr stack:

- **Workspace Knowledge** gives reviewers source and decision context.
- **Agent Sessions** preserve the execution timeline.
- **Team Capabilities** define review and audit standards.
- **Multi-Agent Orchestration** routes work through reviewers, auditors, and feedback loops.

Together, they let teams ship agent-assisted work without pretending autonomy removes the need for judgment.

## Guardrails

Review and audit should make workflows safer without hiding responsibility:

- Do not treat executor output as complete until review gates pass.
- Keep reviewer and auditor roles distinct when quality matters.
- Attach review feedback to the execution timeline instead of starting disconnected chats.
- Use structured scores as evidence, not as a substitute for reviewing the actual work.
- Let failed reviews or low scores stop the workflow.
- Preserve enough transcript and diff context for later debugging.

Good governance does not slow agent workflows down by default. It stops the wrong work at the right point.
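
Several of the guardrails above can be expressed as a single gate check that returns the reasons a workflow must stop. This is a sketch under stated assumptions: `GateInput`, `gate_blockers`, the `"approved"` status string, and the `min_score` threshold of 3.0 are all hypothetical and would need to match your own standards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateInput:
    review_status: Optional[str]  # e.g. "approved"; None if no review yet
    audit_score: Optional[float]  # aggregate audit score; None if not audited
    reviewer: str
    auditor: str


def gate_blockers(g: GateInput, min_score: float = 3.0) -> list:
    """Return the reasons the workflow must stop; an empty list means it may advance."""
    blockers = []
    if g.review_status != "approved":
        blockers.append("review not approved")
    if g.audit_score is None:
        blockers.append("no audit score recorded")
    elif g.audit_score < min_score:
        blockers.append("audit score below threshold")
    if g.reviewer == g.auditor:
        blockers.append("reviewer and auditor must be distinct")
    return blockers


print(gate_blockers(GateInput("approved", 4.2, "reviewer-1", "auditor-1")))
print(gate_blockers(GateInput("changes_requested", 2.0, "reviewer-1", "reviewer-1")))
```

Returning reasons rather than a bare boolean keeps the stop decision itself reviewable: the blockers can be attached to the execution timeline alongside the rest of the evidence.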

## Next steps

- Read the [`pm_review` MCP reference](/valdr/docs/valdr-mcp/reviews/) for review lifecycle actions.
- Read the [`pm_audit` MCP reference](/valdr/docs/valdr-mcp/audits/) for auditor launches and score ingestion.
- Pair this with [Multi-Agent Workflow Orchestration](/valdr/docs/features/orchestration/) to see how review gates control workflow progression.

