GenAI Observability Platform
AI Engineering · Data Engineering
Overview
A purpose-built observability platform for production GenAI workloads. The system captures every LLM call across any provider (including direct API integrations and multi-provider routing services) and surfaces token usage, response times, full inputs/outputs, and cost metrics through an intuitive interface designed for navigating millions of calls per day.
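The production schema is the client's own, but the kind of per-call record such a platform captures can be sketched as a small dataclass. All field names here are illustrative assumptions, not the actual data model:

```python
from dataclasses import dataclass, field, asdict
import time
import uuid

@dataclass
class CallEvent:
    # Illustrative fields only; the real schema is domain-specific.
    provider: str           # e.g. a direct API or a routing service
    model: str
    agent: str              # which agent in the workflow made the call
    workflow_id: str
    input: str              # full input payload
    output: str             # full model response
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    cost_usd: float
    call_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

event = CallEvent(
    provider="openai", model="gpt-4o", agent="planner", workflow_id="wf-123",
    input="Summarize the report...", output="The report covers...",
    prompt_tokens=812, completion_tokens=143,
    latency_ms=2140.5, cost_usd=0.0041,
)
record = asdict(event)  # JSON-serializable dict, ready for the pipeline
```

Capturing provider, model, agent, and workflow on every event is what makes the later cost and performance breakdowns possible along each of those dimensions.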
Challenge
The client operated complex multi-agent workflows across multiple LLM providers and routing services, generating millions of GenAI calls daily. Existing observability tools weren't built for LLM workloads. They couldn't correlate calls across providers, trace agent decision chains, or surface token-level cost breakdowns. The client also needed the instrumentation to have near-zero performance impact on their production agents, ruling out synchronous logging approaches.
Solution
We built Workflows: a fully async observability pipeline designed for minimal instrumentation overhead. A lightweight SDK captures call metadata, full inputs, and full outputs, then pushes events to SQS, decoupling telemetry collection from agent execution. An ingestion service processes the queue and indexes structured call data into Elasticsearch, optimized for high-cardinality queries across provider, model, agent, workflow, and time dimensions.

The frontend provides intuitive drill-down from high-level workflow traces to individual call payloads, with full input/output inspection at every level. Built-in reporting lets the client generate output-based analytics, compare model performance across versions and releases, and track quality and cost trends over time.

The entire system was designed around the client's domain: data models, navigation patterns, terminology, and reporting workflows all reflect how their team actually operates, not a generic dashboard bolted on after the fact.
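The fire-and-forget pattern at the heart of the SDK can be sketched as follows: the agent's hot path does only a non-blocking queue put, while a background thread batches events and ships them. The `send_batch` callable stands in for an SQS `send_message_batch` call (an assumption here; an in-memory sink is used below so the sketch is self-contained):

```python
import json
import queue
import threading

class AsyncTelemetry:
    """Minimal sketch of an async telemetry pipeline. The hot path
    (record) never blocks or raises into agent code; a daemon thread
    drains the queue and hands batches to `send_batch`."""

    def __init__(self, send_batch, batch_size=10, flush_interval=0.5):
        self._q = queue.Queue(maxsize=10_000)
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event: dict) -> None:
        # Non-blocking: drop the event on overflow rather than stall the agent.
        try:
            self._q.put_nowait(event)
        except queue.Full:
            pass

    def _drain(self) -> None:
        batch = []
        # Exit only once shutdown is requested AND the queue is empty.
        while not (self._stop.is_set() and self._q.empty()):
            try:
                batch.append(self._q.get(timeout=self._flush_interval))
            except queue.Empty:
                pass
            # Flush when the batch is full or there is nothing left to wait for.
            if batch and (len(batch) >= self._batch_size or self._q.empty()):
                self._send_batch([json.dumps(e) for e in batch])
                batch = []

    def close(self) -> None:
        self._stop.set()
        self._worker.join()

# In production, send_batch would wrap an SQS client's send_message_batch;
# here a plain list demonstrates the flow end to end.
shipped = []
telemetry = AsyncTelemetry(send_batch=shipped.extend, batch_size=2)
for i in range(4):
    telemetry.record({"call_id": i, "provider": "openai"})
telemetry.close()  # all four events ship without ever blocking the caller
```

Because the only work on the agent's thread is an in-memory enqueue, per-call overhead stays negligible regardless of queue depth downstream, which is the property that makes the sub-millisecond instrumentation target achievable.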
Results
1M+ daily GenAI calls tracked
Provider-agnostic: any LLM, any routing service
<1ms instrumentation overhead
Sub-second query latency at scale