Every number here is synthetic — and honestly labeled.
Tracewell is a portfolio demo: the 50 runs are fabricated, but the token counts are derived from real prompt lengths and the failure modes are ones you'd actually get paged for. No live model, no real systems, no invented benchmarks. That discipline is the point.
What this is
Portfolio demo showing production agent observability concepts
50 synthetic runs across 3 fictional agents at Redwood Labs (B2B SaaS)
Token counts derived from real prompt lengths — not fabricated
The v3→v4 compliance policy regression models a real class of production bug
Failure modes match what real teams get paged for at 3 AM
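As a rough illustration of what "derived from real prompt lengths" could mean, the sketch below estimates a token count from a prompt's character length. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text), not Tracewell's documented method, and `estimate_tokens` and the sample prompt are hypothetical.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# This is a rule of thumb, not the ratio Tracewell actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(prompt: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from the character length of a prompt."""
    if not prompt:
        return 0
    # Round up so a short, non-empty prompt never counts as zero tokens.
    return math.ceil(len(prompt) / chars_per_token)


# Hypothetical system prompt, for illustration only.
system_prompt = "You are a contract-review agent. Flag clauses that violate policy."
print(estimate_tokens(system_prompt))
```

A real tokenizer would give different counts; the point is only that the number comes from the actual text of a prompt and is not picked by hand.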
Honest limits
No live model — replay routes to a local JSON fixture, not a real LLM
No real agent data — all runs, companies, and contracts are synthetic
No ingestion SDK — this does not instrument real agents; it visualizes pre-recorded data
No persistent storage — session state is in-memory only; refreshing resets everything
No multi-agent orchestration graphs — single-agent linear step trees only
Prompt diffs compare system-prompt snapshots, not full multi-turn conversation histories
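To make the last limit concrete, here is a minimal sketch of a snapshot-level prompt diff using Python's standard `difflib`. It is not the demo's implementation: `diff_system_prompts` and both policy texts are hypothetical, written only to show how a v3 to v4 change in a single system prompt would surface.

```python
import difflib
from typing import List


def diff_system_prompts(old: str, new: str,
                        old_label: str = "v3",
                        new_label: str = "v4") -> List[str]:
    """Return unified-diff lines between two system-prompt snapshots."""
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # inputs have no trailing newlines, so emit none
    ))


# Hypothetical snapshots; the real demo's prompts are not shown in the source.
POLICY_V3 = (
    "You are a compliance agent.\n"
    "Always escalate contracts missing a data-processing addendum.\n"
    "Cite the policy section you relied on."
)
POLICY_V4 = (
    "You are a compliance agent.\n"
    "Escalate contracts missing a data-processing addendum when time permits.\n"
    "Cite the policy section you relied on."
)

for line in diff_system_prompts(POLICY_V3, POLICY_V4):
    print(line)
```

Because only the two snapshots are compared, anything that changed in later conversation turns stays invisible to this kind of diff, which is the limit the list describes.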