rigami.
AI Practice
/// evals

Evals & observability

We treat LLM apps like any other production system, with offline evals, online evals, regression suites, tracing, and token budgets.

LangSmith, Braintrust, OpenTelemetry
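As a rough illustration of what an offline eval with a token budget can look like, here is a minimal Python sketch. It is not tied to LangSmith, Braintrust, or OpenTelemetry, and every name in it (`EvalCase`, `ModelResult`, `run_offline_eval`, `fake_model`) is hypothetical. The substring check and the word-count "token" estimate are simplifying assumptions made only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str  # substring the answer must contain (assumed scoring rule)


@dataclass
class ModelResult:
    text: str
    tokens_used: int


def run_offline_eval(
    model: Callable[[str], ModelResult],
    cases: list[EvalCase],
    min_pass_rate: float,
    token_budget: int,
) -> dict:
    """Run every case once and gate on both pass rate and total token spend."""
    passed = 0
    total_tokens = 0
    for case in cases:
        result = model(case.prompt)
        total_tokens += result.tokens_used
        if case.expected.lower() in result.text.lower():
            passed += 1
    pass_rate = passed / len(cases) if cases else 0.0
    return {
        "pass_rate": pass_rate,
        "total_tokens": total_tokens,
        "ok": pass_rate >= min_pass_rate and total_tokens <= token_budget,
    }


def fake_model(prompt: str) -> ModelResult:
    """Hypothetical stand-in for a real model call; counts words as tokens."""
    answers = {
        "Capital of France?": "Paris is the capital.",
        "2 + 2?": "The answer is 4.",
    }
    text = answers.get(prompt, "I don't know.")
    return ModelResult(text=text, tokens_used=len(prompt.split()) + len(text.split()))


cases = [
    EvalCase("Capital of France?", "paris"),
    EvalCase("2 + 2?", "4"),
    EvalCase("Largest planet?", "jupiter"),
]
report = run_offline_eval(fake_model, cases, min_pass_rate=0.6, token_budget=100)
print(report)
```

Run in CI, a gate like this turns the eval set into a regression suite: a prompt or model change that drops the pass rate or exceeds the budget sets `ok` to false and can block the release.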
/// How we deliver

Production AI, not demo-AI.

Every AI feature we ship is wrapped in the same discipline as the rest of our production systems: offline evals, online evals, tracing, cost budgets, and rollback.

We pick the right model for each job (OpenAI GPT-5, Anthropic Claude Sonnet 4.5, Google Gemini 3, Gemini Nano Banana for images) and we stay portable, with no vendor lock-in.
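One way to keep model choice portable is to route each job through a small vendor-neutral interface. The Python sketch below shows the idea under stated assumptions: `ChatModel`, `EchoModel`, and `ModelRouter` are hypothetical names, and `EchoModel` is a stand-in for a real vendor adapter, not any provider's SDK.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Vendor-neutral interface; each provider gets its own thin adapter."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical stand-in for a vendor adapter; it only echoes the prompt."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class ModelRouter:
    """Maps a job name to a model and falls back to a default for unknown jobs."""

    def __init__(self, default: ChatModel) -> None:
        self._default = default
        self._routes: dict[str, ChatModel] = {}

    def register(self, job: str, model: ChatModel) -> None:
        self._routes[job] = model

    def complete(self, job: str, prompt: str) -> str:
        return self._routes.get(job, self._default).complete(prompt)


router = ModelRouter(default=EchoModel("general"))
router.register("summarize", EchoModel("fast"))
print(router.complete("summarize", "Summarize this ticket."))
```

Because callers depend only on the job name, swapping the model behind a job is a one-line change to the `register` call, and the same eval set can be rerun against the replacement before it goes live.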

You own the prompts, the eval sets, the observability, and the infra. We leave you a system you can run, not a black box.

Ready to ship AI that survives production?

Start a conversation