AI Orchestration
That Ships.
We are two MIT CSAIL PhDs who built an agent harness and a human-preference learning framework before the field had names for either. We design and build AI orchestration for companies that need it to work in production.
What we do
Services for companies serious about production-grade AI systems.
AI System Design & Build
We design and build the full stack of your AI system — multi-agent orchestration, LLM harnesses, structured output, RAG systems, and tool use. From architecture to handoff, production-grade and maintainable by your team.
AI Evaluation & Metrics
We design evaluation frameworks that measure what your AI system actually does, not proxies that drift from user value as you optimize against them. This includes human preference collection, benchmark design, and audits of existing metrics for Goodhart's Law failure modes.
AI System Audits
We review your existing AI system for reliability, cost efficiency, latency, and risk. You'll receive a prioritized report with concrete recommendations, not a generic framework.
Advisory
Ongoing strategic guidance for leadership. We help CTOs and VPs make informed decisions on AI vendor selection, build-vs-buy tradeoffs, team structure, and roadmap prioritization.
Why Vassar A.I.
The name comes from the address. The Stata Center — home of MIT's Computer Science and Artificial Intelligence Laboratory — sits at 32 Vassar Street in Cambridge. We spent our doctoral years there, working on problems that didn't have names yet.
2014, MIT CSAIL
CoMo
An orchestration layer that held a structured conversation with a user, translated intent into what the AI backend required, managed state, selected and retried tools on failure, and closed a verification loop.
Today this is called
Agent harness
2012, MIT CSAIL
CLAIRE
A framework for ranking AI systems from forced-choice pairwise human comparisons — proven unique global optimum, active selection to minimize annotation cost, evaluations for under $20.
Today this is called
RLHF preference modeling
We didn't learn this from a conference talk. We built the underlying systems.
Work
Selected engagements — client names withheld by agreement.
Academic publisher · 2022
Healthcare Workforce Training
We designed and built an AI-powered test-generation and assessment system to help train nurses during the COVID-era shortage — a high-stakes application at the intersection of healthcare education and language AI, at a moment when the cost of getting it wrong was measured in patient outcomes.
Book a Free 30-Min Session
No pitch. No sales funnel. Just a focused conversation with an AI orchestration expert to review your current setup, challenges, or plans.
For CTOs, VPs, Directors, and technical leads
About Vassar A.I.
Co-Founder
Andrew Sabisch
Andrew's doctoral research at MIT CSAIL produced CoMo — a whiteboard that converses about code. The core contribution was the Mixed-Initiative Code-Generation Framework: an orchestration layer that decided what to ask, when to ask it, and how to translate user-expressed intent into what the AI backend required. It managed state, selected and retried tools on failure, and closed the loop through animated code verification. In 2026, this pattern has a name: agent harness. Back in 2014, Andrew was proud to call it his dissertation.
Co-Founder
Ali Mohammad
Ali's doctoral research at MIT CSAIL produced CLAIRE, a framework for ranking AI systems from forced-choice pairwise human comparisons. It came with a proof of a unique global optimum, used active selection to minimize annotation cost, and outperformed BLEU at capturing what humans actually preferred. The preference-modeling step of RLHF, the technique behind InstructGPT and ChatGPT, has the same structure: learn a ranking from pairwise human judgments. Ali built the mathematical infrastructure for it in 2012, a decade before the field needed it at scale. His thesis also diagnosed the failure mode that ML practitioners now file under Goodhart's Law: optimize a proxy metric long enough and it stops measuring what you care about.