01 / AI

AI agents & RAG

Production AI agents and retrieval pipelines that survive contact with real users.

What we ship

An agent or RAG feature that handles ten thousand real queries a week without melting. The whole thing: prompt strategy, retrieval, tool surface, rate limiting, observability, eval harness, cost router.

Model-agnostic by design. Claude, GPT, Gemini, open-weights via Bedrock or Vertex, all routed by request shape so the average request hits the cheapest model that can answer it. Most engagements cut per-session cost six to ten times versus a 'send everything to Opus' baseline.

Where teams usually break

Vector store with no tenant filter. CSV upload that overrides the system prompt. Embeddings recomputed every chat turn. SQL tool exposed to the model. No max-steps cap, so a prompt injection eats your token budget.

We have shipped fixes for every one of these in production. The patterns are in the field-notes blog. The work is in your codebase.

Stack we have in production

Claude (Anthropic API, Bedrock), GPT-5 and GPT-4o (OpenAI, Azure OpenAI), Gemini (Vertex), open-weights via Bedrock or vLLM. Retrieval on Azure AI Search, AWS Knowledge Bases, Vertex Vector Search, or pgvector. Embeddings via voyage-3, text-embedding-3, or gte. Frameworks: Claude Agent SDK, LangGraph, LlamaIndex, or hand-rolled when the abstraction costs more than it saves.

What you receive

Deliverables, by line item.

Working agent or RAG endpoint deployed to your cloud
Retrieval pipeline with tenant isolation and cache
Tool surface defined and constrained (no raw SQL or shell)
Eval harness that runs in CI on every change
Cost-router middleware (Haiku for routing, Sonnet for reasoning, Opus where it earns it)
Production runbook + on-call dashboard

How the engagement runs

Four steps, no mystery.

01
Discovery call
Thirty minutes with the senior engineer who does the work. Your environment, your constraint, your deadline.
02
Written scope in 48h
Deliverables, timeline, and one fixed price, in writing. Sign it through your normal procurement flow.
03
Build in your tenant
Work lands in your repos and your cloud accounts from week one. A demo of working output every week.
04
Handoff, documented
Runbook, architecture decisions, and a working session with your team. You own everything we built.

Other practice areas

Engagements often combine these.

02 / Cloud

Multi-cloud architecture

03 / DevOps

DevOps & IaC

04 / Marketplace

Marketplace engineering