AI orchestration for an accounting firm managing 100+ client administrations

The client

For a specialized accounting firm managing 100+ client administrations, we're developing an AI automation stack on top of their existing accounting platform (Yuki). The firm handles administration and bookkeeping for a wide range of clients, an operation where volume, auditability, and compliance all need to line up at once.

The client has asked not to be named; technical details are presented independently of any client-specific data.

The problem

The client has used a well-established accounting platform for years, with its own IDR stack (OCR, classification, supplier recognition per document). That platform does one thing well: extracting the right basics per individual document. What it doesn't do:

Spot patterns across administrations — if supplier X is always booked to GL account Y in administration A, you'd want that to be proposed in administration B too
Proactively flag to an operator that a known document type is stuck somewhere in the workflow
Perform one-click booking via the SOAP API once the operator approves
Generate multi-admin reports without the operator manually exporting CSVs from 100+ administrations

With 100+ administrations, the manual effort behind each of those actions is multiplied by the number of administrations. That's exactly the gap our stack fills.

Our positioning: complementary, not a replacement

A key design choice: we do not replace the platform's IDR features. We explicitly position our stack as complementary. The accounting platform does what it does well (per-document recognition). Our stack adds the cross-administration layer: orchestration, supplier-to-GL suggestions based on history, one-click booking, and monitoring of where the platform's own automation gets stuck.

That distinction prevents endless re-implementation and keeps our stack sharply focused on where the real gains are.

The architecture

We've designed the stack in seven layers, each with a clear responsibility:

1. Intake — documents and triggers

Email intake via Gmail PubSub
Webhook intake (POST /hooks/document-intake)
Manual upload via the admin dashboard
Scheduled jobs (nightly sync at 02:00, hourly retry, morning review)

2. Intake server

Express server that deduplicates incoming PDFs (by SHA-256 hash of the file plus a supplier|invoice|amount key), stores them, and triggers the AI orchestration layer.
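The deduplication step can be sketched roughly as follows. This is a minimal in-memory illustration, not the production code: the `InvoiceKey` shape, the `Deduplicator` class, and the choice to flag a document when either the file hash or the business key has been seen before are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the fields extracted from an invoice.
interface InvoiceKey {
  supplier: string;
  invoiceNumber: string;
  amountCents: number; // integer cents avoid float formatting differences
}

// Content fingerprint: identical bytes always produce the same digest.
function contentHash(pdf: Uint8Array): string {
  return createHash("sha256").update(pdf).digest("hex");
}

// Business key in supplier|invoice|amount form, normalised so that casing
// and stray whitespace do not defeat the comparison.
function businessKey(k: InvoiceKey): string {
  const norm = (s: string) => s.trim().toLowerCase();
  return [norm(k.supplier), norm(k.invoiceNumber), String(k.amountCents)].join("|");
}

class Deduplicator {
  private readonly hashes = new Set<string>();
  private readonly keys = new Set<string>();

  // Returns why a document counts as a duplicate, or null if it is new.
  // Assumption: a match on either the hash or the business key is enough.
  check(pdf: Uint8Array, key: InvoiceKey): "same-file" | "same-invoice" | null {
    const h = contentHash(pdf);
    const b = businessKey(key);
    if (this.hashes.has(h)) return "same-file";
    if (this.keys.has(b)) return "same-invoice";
    this.hashes.add(h);
    this.keys.add(b);
    return null;
  }
}
```

The two checks catch different cases: the hash catches the same file arriving twice (for example by email and by upload), while the business key catches a re-scanned or re-exported copy of the same invoice whose bytes differ.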

3. AI orchestration layer

A locally-running MCP-compatible agent gateway that coordinates seven specialized AI skills:

Skill	Role
Admin context	Load administrative context for decisions
Document archive	Store PDF in the accounting platform
Supplier match	Match invoice to a known supplier
Build proposal	Generate booking proposal including GL account
Exception review	Handle deviations and edge cases
Review callback	Process operator approval or rejection
Nightly sync	Sync the local datastore with the accounting platform

The agents talk to the accounting platform via two MCP servers: one for the SOAP API (29 tools), and a custom browser-automation MCP (on Playwright) for functionality not available through SOAP. We've open-sourced the SOAP-MCP server so the community can benefit from it too.

4. Policy engine

For each booking proposal an explicit policy determines which of three outcomes applies:

Outcome	Condition	Action
Auto-post	Exact match · amount below threshold	Posted automatically
Review queue	Likely match · amount above threshold	Routed to operator
Dead letter	Duplicate or hard block	Held for manual action

Thresholds and rules are configurable per client and per supplier type — so the policy stays explainable and steerable.
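As an illustration, the three-way decision could look like the sketch below. All type and field names are hypothetical, and the table leaves some combinations open. This example assumes that duplicates and hard blocks are checked first, that auto-posting requires both an exact match and an amount below the threshold, and that everything else (including a missing match) goes to the review queue.

```typescript
type Outcome = "auto-post" | "review-queue" | "dead-letter";
type MatchQuality = "exact" | "likely" | "none";

// Hypothetical proposal fields; the real proposal carries more context.
interface Proposal {
  supplierType: string;
  amountCents: number;
  match: MatchQuality;
  isDuplicate: boolean;
  hardBlock: boolean;
}

// Thresholds per supplier type, with a default for everything else.
interface Policy {
  defaultThresholdCents: number;
  thresholdBySupplierType: Map<string, number>;
}

interface Decision {
  outcome: Outcome;
  reason: string; // recorded so the decision can be explained later
}

function decide(p: Proposal, policy: Policy): Decision {
  // Blocking conditions win over everything else.
  if (p.isDuplicate) return { outcome: "dead-letter", reason: "duplicate" };
  if (p.hardBlock) return { outcome: "dead-letter", reason: "hard block" };

  const threshold =
    policy.thresholdBySupplierType.get(p.supplierType) ?? policy.defaultThresholdCents;

  // Auto-post only when both conditions hold (assumption, see lead-in).
  if (p.match === "exact" && p.amountCents < threshold) {
    return { outcome: "auto-post", reason: `exact match below ${threshold} cents` };
  }
  const why = p.match !== "exact" ? `match is ${p.match}` : "amount at or above threshold";
  return { outcome: "review-queue", reason: why };
}
```

Returning a reason string alongside the outcome is what keeps the policy explainable: the same text can be shown to the operator and written to the audit trail.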

5. Human-in-the-loop via Telegram

Exceptions aren't dropped but actively presented. For each review or dead-letter invoice, the operator receives a Telegram message with inline buttons: approve, reject, or retry. The decision is sent back via a review callback and logged in the audit trail.
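A sketch of how such a message and its callback could be shaped. The `chat_id`, `text`, `reply_markup`, `inline_keyboard`, and `callback_data` fields follow the Telegram Bot API `sendMessage` method. The `action:proposalId` encoding and all function names are assumptions made for this example, and the actual HTTP call and audit logging are left out.

```typescript
type ReviewAction = "approve" | "reject" | "retry";
const ACTIONS: readonly ReviewAction[] = ["approve", "reject", "retry"];
const LABELS: Record<ReviewAction, string> = {
  approve: "Approve",
  reject: "Reject",
  retry: "Retry",
};

interface InlineButton {
  text: string;
  callback_data: string;
}

interface SendMessagePayload {
  chat_id: number;
  text: string;
  reply_markup: { inline_keyboard: InlineButton[][] };
}

// Telegram limits callback_data to 64 bytes, so the button carries only an
// action and an identifier; the proposal itself stays in the datastore.
function encodeCallback(action: ReviewAction, proposalId: string): string {
  const data = `${action}:${proposalId}`;
  if (new TextEncoder().encode(data).length > 64) {
    throw new Error("callback_data exceeds 64 bytes");
  }
  return data;
}

// Parses what the bot receives when a button is pressed. Unknown actions or
// malformed data yield null instead of being trusted.
function parseCallback(data: string): { action: ReviewAction; proposalId: string } | null {
  const sep = data.indexOf(":");
  if (sep < 1) return null;
  const action = ACTIONS.find((a) => a === data.slice(0, sep));
  const proposalId = data.slice(sep + 1);
  if (action === undefined || proposalId.length === 0) return null;
  return { action, proposalId };
}

// Builds the JSON body for sendMessage with one row of three buttons.
function buildReviewMessage(chatId: number, proposalId: string, summary: string): SendMessagePayload {
  const row = ACTIONS.map((action) => ({
    text: LABELS[action],
    callback_data: encodeCallback(action, proposalId),
  }));
  return { chat_id: chatId, text: summary, reply_markup: { inline_keyboard: [row] } };
}
```

Validating the callback against a fixed list of actions matters here because the payload comes back from outside the system; only a recognised action on a known proposal should ever reach the booking step.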

6. Admin dashboard

A Next.js admin dashboard where the operator:

views workflow status per administration (open, data-entry complete, in progress, settled)
generates multi-admin reports (GL balances, depreciation, salary costs, workflow)
tunes the policy engine (thresholds, supplier rules, GL labels)
searches the audit trail for compliance questions

7. Infra: observability and self-healing

PM2 for process management
Watchdog polling gateway health, auto-restarting on failure, and sending Telegram alerts on issues
Circuit breaker patterns: a max number of restarts per hour to prevent stuck-state loops
Immutable audit log in SQLite — every decision, prompt, and tool call is reproducible after the fact
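The restart cap from the list above can be illustrated with a small sliding-window counter. This is a sketch under assumptions: the `RestartBudget` name, the one-hour default window, and the idea that the watchdog calls `tryRestart` before each restart are illustrative, not taken from the actual watchdog.

```typescript
// Allows at most `maxRestarts` restarts within any sliding time window.
class RestartBudget {
  private readonly restarts: number[] = []; // timestamps in ms, oldest first
  private readonly maxRestarts: number;
  private readonly windowMs: number;

  constructor(maxRestarts: number, windowMs: number = 60 * 60 * 1000) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  // Returns true and records the restart when the budget allows it.
  // Returns false when the budget is exhausted: the caller should stop
  // restarting and alert an operator instead.
  tryRestart(nowMs: number = Date.now()): boolean {
    // Forget restarts that have aged out of the window.
    for (;;) {
      const oldest = this.restarts[0];
      if (oldest === undefined || nowMs - oldest < this.windowMs) break;
      this.restarts.shift();
    }
    if (this.restarts.length >= this.maxRestarts) return false;
    this.restarts.push(nowMs);
    return true;
  }
}
```

A watchdog loop would call `tryRestart()` whenever a health check fails and only restart the gateway when it returns true. Passing the timestamp in explicitly keeps the class easy to test without waiting for real time to pass.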

What we've delivered, without overclaiming

At the time of writing, the stack is not yet fully in production. We've deliberately chosen a careful rollout per pipeline component, with the client closely involved at every step. What already works:

✅ Full document intake pipeline (email / webhook / upload with deduplication)
✅ AI skills for archiving, supplier matching, and proposal generation
✅ Policy engine with three-state classification
✅ Telegram integration for human-in-the-loop
✅ Multi-administration nightly sync (100+ administrations via SOAP)
✅ Admin dashboard with workflow overviews and reports
✅ Watchdog + self-healing
✅ Open-source MCP server published on npm + GitHub

In development / on the roadmap:

🚧 Cross-admin GL suggestions based on supplier_gl_history
🚧 Full multi-admin reporting pipeline
🚧 Gradual rollout to production per client administration
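To make the first roadmap item concrete: one simple way to derive a cross-admin GL suggestion is a majority vote over past bookings of the same supplier in other administrations. The sketch below is only an illustration of that idea, not the planned implementation. The row shape, the function name, and the 80% agreement threshold are hypothetical, and it assumes GL account codes are comparable across administrations.

```typescript
// Hypothetical row shape for a supplier-to-GL booking history.
interface HistoryRow {
  administrationId: string;
  supplierId: string;
  glAccount: string;
}

interface Suggestion {
  glAccount: string;
  support: number; // bookings that used this GL account
  total: number;   // all bookings considered for this supplier
}

// Suggests the GL account most often used for a supplier in OTHER
// administrations, but only if its share of bookings reaches minShare.
function suggestGlAccount(
  history: readonly HistoryRow[],
  supplierId: string,
  targetAdministrationId: string,
  minShare: number = 0.8,
): Suggestion | null {
  const counts = new Map<string, number>();
  let total = 0;
  for (const row of history) {
    if (row.supplierId !== supplierId) continue;
    if (row.administrationId === targetAdministrationId) continue;
    counts.set(row.glAccount, (counts.get(row.glAccount) ?? 0) + 1);
    total += 1;
  }

  let best: Suggestion | null = null;
  for (const [glAccount, support] of counts) {
    if (best === null || support > best.support) {
      best = { glAccount, support, total };
    }
  }
  // No history, or history too mixed to make a confident suggestion.
  if (best === null || best.support / total < minShare) return null;
  return best;
}
```

Returning the support counts along with the account lets the operator see why a suggestion was made, and a null result simply means the proposal falls back to the normal flow.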

What we take from this

This engagement gave us solid patterns we now apply with other clients as well:

Build complementary instead of replacing — accept what existing platforms do well and focus on the white space
MCP as integration standard for AI agents — production-grade, auditable, and reusable
A policy engine for agent autonomy — write down explicitly what the AI may and may not do, instead of hoping the model behaves
Human-in-the-loop as a feature, not a fallback — operator buttons via Telegram/Slack actually make automation easier to adopt
Audit trail as the basis for trust — immutable logging makes every AI decision explainable later
Open-source publication of what's generic enough to extract — strengthens expertise and brings community feedback in

Stack and approach

Layer	Choice
Runtime	Node.js 20+, TypeScript (strict, ESM)
Monorepo	npm workspaces
Frontend dashboard	Next.js 16 (App Router), React 19, Tailwind CSS
Backend (intake)	Express 5
Database	SQLite via better-sqlite3 (local, fast, audit-trail-friendly)
Browser automation	Playwright (Chromium)
AI protocol	Model Context Protocol (`@modelcontextprotocol/sdk`)
LLM	Claude (via MCP-compatible agent gateway)
Human-in-the-loop	Telegram bot with inline keyboards
Process mgmt	PM2 with ecosystem config and cron_restart
Hosting	Dedicated local machine + SMB mount for logs/db
Testing	Vitest

Who is this relevant for?

Accounting firms with dozens or hundreds of client administrations seeking cross-admin automation
Finance organizations wanting to strengthen their existing accounting platform with AI without lock-in
Software partners building agentic workflows around a SOAP or REST API
Anyone evaluating how to integrate AI safely into a production process with audit trail and human control

Looking into AI orchestration yourself?

Do you have a process spread across dozens of administrations or sites, and want to explore how AI could speed up orchestration and reporting while keeping control and compliance? Schedule a no-obligation conversation via info@codemill.dev or the contact form. We'd love to think it through with you.