AI orchestration for an accounting firm managing 100+ client administrations

An end-to-end AI automation stack on top of an existing accounting platform that automates cross-administration work for an accounting firm: orchestration, GL-account suggestions, reporting, and human-in-the-loop review.

Client
Accounting firm
Status
In active development, phased rollout to production
Tech stack
TypeScript · Node.js 20+ · Next.js 16 · Express 5 · SQLite (better-sqlite3) · Playwright · MCP SDK · Claude · Telegram bot · PM2

The client

For a specialized accounting firm managing 100+ client administrations, we're developing an AI automation stack on top of their existing accounting platform (Yuki). The firm handles administration and bookkeeping for a wide range of clients, an operation in which volume, auditability, and compliance all have to line up at once.

The client has asked not to be named; technical details are presented independently of any client-specific data.

The problem

The client has used a well-established accounting platform for years, with its own IDR stack (OCR, classification, and supplier recognition per document). That platform does one thing well: extracting the basic data from each individual document. What it doesn't do:

  • Spot patterns across administrations — if supplier X is always booked to GL account Y in administration A, you'd want that to be proposed in administration B too
  • Proactively flag to an operator that a known document type is stuck somewhere in the workflow
  • Perform one-click booking via the SOAP API once the operator approves
  • Generate multi-admin reports without the operator manually exporting CSVs from 100+ administrations

With 100+ administrations, each of these manual steps has to be repeated for every administration, so the effort grows directly with the size of the portfolio. That's exactly the gap our stack fills.

Our positioning: complementary, not a replacement

A key design choice: we do not replace the platform's IDR features. We explicitly position our stack as complementary. The accounting platform does what it does well (per-document recognition). Our stack adds the cross-administration layer: orchestration, supplier-to-GL suggestions based on history, one-click booking, and monitoring of where the platform's own automation gets stuck.

That distinction prevents endless re-implementation and keeps our stack sharply focused on where the real gains are.

The architecture

We've designed the stack in seven layers, each with a clear responsibility:

1. Intake — documents and triggers

  • Email intake via Gmail Pub/Sub
  • Webhook intake (POST /hooks/document-intake)
  • Manual upload via the admin dashboard
  • Scheduled jobs (nightly sync at 02:00, hourly retry, morning review)
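The nightly sync has to touch 100+ administrations without overloading the SOAP API or letting one failing administration stop the whole run. A minimal sketch, assuming a hypothetical per-administration `sync` function and a hypothetical `syncAll` helper: a fixed pool of workers pulls administrations from a shared list, and failures are collected so the hourly retry job can re-run only those. The 02:00 trigger itself is left out.

```typescript
// Hypothetical per-administration sync (e.g. a wrapper around the SOAP calls)
type SyncFn = (adminId: string) => Promise<void>;

interface SyncReport {
  succeeded: string[];
  failed: { adminId: string; error: string }[];
}

// Runs `sync` for every administration with at most `concurrency` in flight
async function syncAll(adminIds: string[], sync: SyncFn, concurrency = 5): Promise<SyncReport> {
  const report: SyncReport = { succeeded: [], failed: [] };
  let next = 0;

  // Each worker keeps pulling the next administration until the list is exhausted.
  // `next++` is safe here: JavaScript runs one worker at a time between awaits.
  async function worker(): Promise<void> {
    while (next < adminIds.length) {
      const adminId = adminIds[next++];
      try {
        await sync(adminId);
        report.succeeded.push(adminId);
      } catch (err) {
        // One failing administration must not abort the nightly run
        report.failed.push({ adminId, error: err instanceof Error ? err.message : String(err) });
      }
    }
  }

  const poolSize = Math.min(concurrency, adminIds.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return report;
}
```

An hourly retry could then call `syncAll(report.failed.map((f) => f.adminId), sync)` with the previous report, so only the administrations that failed are retried.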

2. Intake server

An Express server that deduplicates incoming PDFs (by SHA-256 file hash and by a supplier|invoice number|amount key), stores them, and triggers the AI orchestration layer.
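The two deduplication keys catch different cases: the SHA-256 hash catches the exact same file arriving twice (say, by email and by upload), while the supplier|invoice|amount key catches a re-scan or re-export of the same invoice with different bytes. A minimal in-memory sketch, assuming a hypothetical `IntakeDeduplicator` class and `InvoiceKey` shape with amounts in integer cents:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the fields extracted from an incoming invoice
interface InvoiceKey {
  supplier: string;
  invoiceNumber: string;
  amountCents: number; // integer cents avoid floating-point key mismatches
}

type DedupResult =
  | { duplicate: false }
  | { duplicate: true; reason: "same-file" | "same-invoice" };

class IntakeDeduplicator {
  private readonly fileHashes = new Set<string>();
  private readonly invoiceKeys = new Set<string>();

  static fileHash(pdf: Uint8Array): string {
    return createHash("sha256").update(pdf).digest("hex");
  }

  static invoiceKey(k: InvoiceKey): string {
    // Light normalization so trivial formatting differences still collide
    const supplier = k.supplier.trim().toLowerCase();
    const invoice = k.invoiceNumber.trim().toUpperCase();
    return `${supplier}|${invoice}|${k.amountCents}`;
  }

  check(pdf: Uint8Array, key: InvoiceKey): DedupResult {
    const hash = IntakeDeduplicator.fileHash(pdf);
    if (this.fileHashes.has(hash)) return { duplicate: true, reason: "same-file" };

    const composite = IntakeDeduplicator.invoiceKey(key);
    if (this.invoiceKeys.has(composite)) return { duplicate: true, reason: "same-invoice" };

    // Only remember documents that were accepted
    this.fileHashes.add(hash);
    this.invoiceKeys.add(composite);
    return { duplicate: false };
  }
}
```

In a persistent setup the two sets would more likely be UNIQUE indexes in the SQLite datastore, so deduplication survives restarts; the in-memory version only shows the logic.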

3. AI orchestration layer

A locally running, MCP-compatible agent gateway that coordinates seven specialized AI skills:

| Skill | Role |
|---|---|
| Admin context | Load administrative context for decisions |
| Document archive | Store the PDF in the accounting platform |
| Supplier match | Match the invoice to a known supplier |
| Build proposal | Generate a booking proposal, including the GL account |
| Exception review | Handle deviations and edge cases |
| Review callback | Process operator approval or rejection |
| Nightly sync | Sync the local datastore with the accounting platform |

The agents talk to the accounting platform via two MCP servers: one for the SOAP API (29 tools), and a custom browser-automation MCP (on Playwright) for functionality not available through SOAP. We've open-sourced the SOAP-MCP server so the community can benefit from it too.

4. Policy engine

For each booking proposal, an explicit policy determines which of three outcomes applies:

| Outcome | Condition | Action |
|---|---|---|
| Auto-post | Exact match · amount below threshold | Posted automatically |
| Review queue | Likely match · amount above threshold | Routed to operator |
| Dead letter | Duplicate or hard block | Held for manual action |

Thresholds and rules are configurable per client and per supplier type — so the policy stays explainable and steerable.
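A minimal sketch of such a policy, assuming hypothetical `BookingProposal`, `PolicyConfig`, `resolvePolicy`, and `classify` names. The table leaves open how mixed cases route (an exact match above the threshold, for example); this sketch assumes anything that is neither auto-postable nor hard-blocked goes to the review queue, and that supplier-type settings override client settings. Every outcome carries a reason string, which is what keeps the decision explainable in the audit trail.

```typescript
type Outcome = "auto-post" | "review" | "dead-letter";

interface BookingProposal {
  matchType: "exact" | "likely" | "none"; // hypothetical match levels
  amountCents: number;
  isDuplicate: boolean;
  hardBlock: boolean; // hypothetical, e.g. blocked supplier or closed period
}

interface Policy {
  autoPostThresholdCents: number;
}

interface PolicyConfig {
  defaults: Policy;
  perClient?: Record<string, Partial<Policy>>;
  perSupplierType?: Record<string, Partial<Policy>>;
}

// Assumed precedence: defaults < client < supplier type
function resolvePolicy(cfg: PolicyConfig, clientId: string, supplierType: string): Policy {
  return {
    ...cfg.defaults,
    ...cfg.perClient?.[clientId],
    ...cfg.perSupplierType?.[supplierType],
  };
}

function classify(p: BookingProposal, policy: Policy): { outcome: Outcome; reason: string } {
  // Hard stops first: these never reach auto-post or the review queue
  if (p.isDuplicate) return { outcome: "dead-letter", reason: "duplicate" };
  if (p.hardBlock) return { outcome: "dead-letter", reason: "hard block" };

  if (p.matchType === "exact" && p.amountCents < policy.autoPostThresholdCents) {
    return { outcome: "auto-post", reason: "exact match below threshold" };
  }
  // Everything else needs a human decision
  const reason = p.matchType !== "exact" ? "match not exact" : "amount at or above threshold";
  return { outcome: "review", reason };
}
```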

5. Human-in-the-loop via Telegram

Exceptions aren't dropped; they're actively put in front of the operator. For each review or dead-letter invoice, the operator receives a Telegram message with inline buttons: approve, reject, or retry. The decision is sent back via a review callback and logged in the audit trail.
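In the Telegram Bot API, inline buttons are sent as a `reply_markup.inline_keyboard` on `sendMessage`, and a button press comes back as a callback query carrying the button's `callback_data` (limited to 64 bytes). A minimal sketch, assuming a hypothetical `review:<action>:<invoiceId>` encoding and hypothetical helper names; it builds the message, parses the callback, and posts via plain `fetch` rather than a bot framework:

```typescript
type ReviewAction = "approve" | "reject" | "retry";
const ACTIONS: readonly ReviewAction[] = ["approve", "reject", "retry"];
const MAX_CALLBACK_BYTES = 64; // Telegram's limit for callback_data

interface InlineButton { text: string; callback_data: string; }
interface SendMessagePayload {
  chat_id: number | string;
  text: string;
  reply_markup: { inline_keyboard: InlineButton[][] };
}

// One row with three buttons; the invoice ID travels inside callback_data
function buildReviewMessage(chatId: number | string, invoiceId: string, summary: string): SendMessagePayload {
  const row = ACTIONS.map((action) => {
    const callback_data = `review:${action}:${invoiceId}`;
    if (new TextEncoder().encode(callback_data).length > MAX_CALLBACK_BYTES) {
      throw new Error(`callback_data too long for invoice ${invoiceId}`);
    }
    return { text: action.charAt(0).toUpperCase() + action.slice(1), callback_data };
  });
  return { chat_id: chatId, text: summary, reply_markup: { inline_keyboard: [row] } };
}

// Turns callback_data back into a decision; returns null for anything unexpected
function parseReviewCallback(data: string): { action: ReviewAction; invoiceId: string } | null {
  const [prefix, action, ...rest] = data.split(":");
  const invoiceId = rest.join(":");
  if (prefix !== "review" || !ACTIONS.includes(action as ReviewAction) || invoiceId === "") return null;
  return { action: action as ReviewAction, invoiceId };
}

async function sendReviewMessage(botToken: string, payload: SendMessagePayload): Promise<unknown> {
  const res = await fetch(`https://api.telegram.org/bot${botToken}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`sendMessage failed with HTTP ${res.status}`);
  return res.json();
}
```

On receiving a callback query, the bot would also call `answerCallbackQuery` so the operator's client stops showing a loading state, then hand the parsed decision to the review-callback skill.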

6. Admin dashboard

A Next.js admin dashboard where the operator:

  • views workflow status per administration (open, data-entry complete, in progress, settled)
  • generates multi-admin reports (GL balances, depreciation, salary costs, workflow)
  • tunes the policy engine (thresholds, supplier rules, GL labels)
  • searches the audit trail for compliance questions
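The workflow overview replaces what would otherwise be one manual CSV export per administration. A minimal sketch, assuming hypothetical `WorkflowItem` rows (one document or period in one administration) and hypothetical `workflowOverview` and `overviewToCsv` helpers: count items per status per administration, then emit one CSV for the whole portfolio.

```typescript
type WorkflowStatus = "open" | "data-entry-complete" | "in-progress" | "settled";
const STATUSES: readonly WorkflowStatus[] = ["open", "data-entry-complete", "in-progress", "settled"];

// Hypothetical row: one workflow item in one administration
interface WorkflowItem {
  adminId: string;
  status: WorkflowStatus;
}

type StatusCounts = Record<WorkflowStatus, number>;

function workflowOverview(items: WorkflowItem[]): Map<string, StatusCounts> {
  const overview = new Map<string, StatusCounts>();
  for (const { adminId, status } of items) {
    let counts = overview.get(adminId);
    if (counts === undefined) {
      counts = { open: 0, "data-entry-complete": 0, "in-progress": 0, settled: 0 };
      overview.set(adminId, counts);
    }
    counts[status] += 1;
  }
  return overview;
}

// One CSV across all administrations, sorted by administration ID.
// Assumes administration IDs contain no commas or quotes.
function overviewToCsv(overview: Map<string, StatusCounts>): string {
  const header = ["administration", ...STATUSES].join(",");
  const rows = [...overview.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([adminId, counts]) => [adminId, ...STATUSES.map((s) => counts[s])].join(","));
  return [header, ...rows].join("\n");
}
```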

7. Infra: observability and self-healing

  • PM2 for process management
  • Watchdog that polls gateway health, restarts the gateway on failure, and sends Telegram alerts on issues
  • Circuit breaker: a maximum number of restarts per hour, to prevent restart loops when the gateway is stuck
  • Immutable audit log in SQLite — every decision, prompt, and tool call is reproducible after the fact
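The watchdog and circuit breaker can be sketched together: restart on a failed health check, but stop restarting (and escalate instead) once the hourly budget is spent. A minimal sketch, assuming hypothetical `RestartCircuitBreaker`, `watchdogTick`, and `httpHealthCheck` names; the health URL and the restart and alert functions are injected, so PM2 and Telegram stay outside the logic.

```typescript
// Allows at most `maxRestarts` restarts within a sliding time window
class RestartCircuitBreaker {
  private restarts: number[] = [];
  private readonly maxRestarts: number;
  private readonly windowMs: number;

  constructor(maxRestarts: number, windowMs = 60 * 60 * 1000) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  // Returns true (and records the restart) while still under the limit
  tryRestart(now: number = Date.now()): boolean {
    this.restarts = this.restarts.filter((t) => now - t < this.windowMs);
    if (this.restarts.length >= this.maxRestarts) return false;
    this.restarts.push(now);
    return true;
  }
}

interface WatchdogDeps {
  checkHealth: () => Promise<boolean>;
  restart: () => Promise<void>;
  alert: (message: string) => Promise<void>; // e.g. a Telegram message
}

type TickResult = "healthy" | "restarted" | "breaker-open";

async function watchdogTick(deps: WatchdogDeps, breaker: RestartCircuitBreaker, now: number = Date.now()): Promise<TickResult> {
  if (await deps.checkHealth()) return "healthy";
  if (breaker.tryRestart(now)) {
    await deps.restart();
    await deps.alert("Gateway unhealthy: restarted automatically");
    return "restarted";
  }
  // Breaker open: stop restarting and hand over to a human
  await deps.alert("Gateway unhealthy: restart limit reached, manual action required");
  return "breaker-open";
}

// Example health probe against a hypothetical HTTP endpoint; errors count as unhealthy
function httpHealthCheck(url: string, timeoutMs = 5000): () => Promise<boolean> {
  return () => fetch(url, { signal: AbortSignal.timeout(timeoutMs) }).then((r) => r.ok, () => false);
}
```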

What we've delivered, without overclaiming

At the time of writing, the stack is not yet fully in production. We've deliberately chosen a careful rollout per pipeline component, with the client closely involved at every step. What already works:

  • ✅ Full document intake pipeline (email / webhook / upload with deduplication)
  • ✅ AI skills for archiving, supplier matching, and proposal generation
  • ✅ Policy engine with three-state classification
  • ✅ Telegram integration for human-in-the-loop
  • ✅ Multi-administration nightly sync (100+ administrations via SOAP)
  • ✅ Admin dashboard with workflow overviews and reports
  • ✅ Watchdog + self-healing
  • ✅ Open-source MCP server published on npm + GitHub

In development / on the roadmap:

  • 🚧 Cross-admin GL suggestions based on supplier_gl_history
  • 🚧 Full multi-admin reporting pipeline
  • 🚧 Gradual rollout to production per client administration
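The cross-admin GL suggestion from the problem statement (supplier X is booked to GL account Y in administration A, so propose Y in administration B) can be sketched as a vote over `supplier_gl_history`. A minimal sketch, assuming a hypothetical row shape, a hypothetical `suggestGlAccount` function, a supplier identity shared across administrations, and a shared or mapped chart of accounts (in practice GL numbering can differ per administration, so a mapping step may be needed first). It only learns from other administrations and makes no suggestion without enough independent evidence.

```typescript
// Hypothetical row shape for the supplier_gl_history table
interface SupplierGlHistoryRow {
  adminId: string;
  supplierId: string; // assumes a supplier identity shared across administrations
  glAccount: string;  // assumes a shared or mapped chart of accounts
  count: number;      // number of bookings of this supplier to this GL account
}

interface GlSuggestion {
  glAccount: string;
  confidence: number;       // share of the supplier's bookings in other administrations
  supportingAdmins: number; // how many other administrations used this account
}

function suggestGlAccount(
  history: SupplierGlHistoryRow[],
  supplierId: string,
  targetAdminId: string,
  minSupportingAdmins = 2,
): GlSuggestion | null {
  const totals = new Map<string, { count: number; admins: Set<string> }>();
  let total = 0;
  for (const row of history) {
    // Only learn from *other* administrations for this supplier
    if (row.supplierId !== supplierId || row.adminId === targetAdminId) continue;
    const t = totals.get(row.glAccount) ?? { count: 0, admins: new Set<string>() };
    t.count += row.count;
    t.admins.add(row.adminId);
    totals.set(row.glAccount, t);
    total += row.count;
  }
  if (total === 0) return null;

  let best: GlSuggestion | null = null;
  for (const [glAccount, t] of totals) {
    const confidence = t.count / total;
    if (best === null || confidence > best.confidence) {
      best = { glAccount, confidence, supportingAdmins: t.admins.size };
    }
  }
  // Too little cross-admin evidence: leave it to the operator
  if (best === null || best.supportingAdmins < minSupportingAdmins) return null;
  return best;
}
```

The suggestion would then enter the policy engine as a "likely" match rather than being booked directly, which keeps the operator in the loop until the history is trusted.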

What we take from this

This engagement gave us solid patterns we now apply with other clients as well:

  • Build complementary instead of replacing — accept what existing platforms do well and focus on the white space
  • MCP as integration standard for AI agents — production-grade, auditable, and reusable
  • A policy engine for agent autonomy — write down explicitly what the AI may and may not do, instead of hoping the model behaves
  • Human-in-the-loop as a feature, not a fallback — operator buttons via Telegram/Slack actually make automation easier to adopt
  • Audit trail as the basis for trust — immutable logging makes every AI decision explainable later
  • Open-source publication of what's generic enough to extract — strengthens expertise and brings community feedback in
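The infrastructure list describes the audit log as immutable and stored in SQLite. At the database level, SQLite can refuse changes to such a table with `BEFORE UPDATE` / `BEFORE DELETE` triggers that call `RAISE(ABORT, ...)`. A complementary technique, shown here as an assumption rather than the stack's documented design, is to hash-chain the entries so tampering is detectable even outside the database. A minimal in-memory sketch with hypothetical `HashChainedAuditLog` and `entryHash` names:

```typescript
import { createHash } from "node:crypto";

type AuditKind = "decision" | "prompt" | "tool_call";

interface AuditEntry {
  seq: number;
  timestamp: string;
  kind: AuditKind;
  payload: unknown;
  prevHash: string;
  hash: string;
}

const GENESIS_HASH = "0".repeat(64);

// Hash the entry's content together with the previous entry's hash
function entryHash(seq: number, timestamp: string, kind: AuditKind, payload: unknown, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify([seq, timestamp, kind, payload, prevHash])).digest("hex");
}

class HashChainedAuditLog {
  private readonly entries: AuditEntry[] = [];

  append(kind: AuditKind, payload: unknown, timestamp: string = new Date().toISOString()): AuditEntry {
    const seq = this.entries.length;
    const prevHash = seq === 0 ? GENESIS_HASH : this.entries[seq - 1].hash;
    const hash = entryHash(seq, timestamp, kind, payload, prevHash);
    const entry: AuditEntry = { seq, timestamp, kind, payload, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  // Recomputes every hash: an edited or reordered entry, or a gap in the
  // sequence, makes verification fail
  verify(): boolean {
    let prev = GENESIS_HASH;
    for (let i = 0; i < this.entries.length; i++) {
      const e = this.entries[i];
      if (e.seq !== i || e.prevHash !== prev) return false;
      if (e.hash !== entryHash(e.seq, e.timestamp, e.kind, e.payload, e.prevHash)) return false;
      prev = e.hash;
    }
    return true;
  }

  all(): readonly AuditEntry[] {
    return this.entries;
  }
}
```

A hash chain alone cannot detect entries cut off the end of the log; storing the latest hash somewhere separate (or periodically alerting it out) closes that gap.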

Stack and approach

| Layer | Choice |
|---|---|
| Runtime | Node.js 20+, TypeScript (strict, ESM) |
| Monorepo | npm workspaces |
| Frontend dashboard | Next.js 16 (App Router), React 19, Tailwind CSS |
| Backend (intake) | Express 5 |
| Database | SQLite via better-sqlite3 (local, fast, audit-trail-friendly) |
| Browser automation | Playwright (Chromium) |
| AI protocol | Model Context Protocol (@modelcontextprotocol/sdk) |
| LLM | Claude (via MCP-compatible agent gateway) |
| Human-in-the-loop | Telegram bot with inline keyboards |
| Process management | PM2 with ecosystem config and cron_restart |
| Hosting | Dedicated local machine + SMB mount for logs/db |
| Testing | Vitest |

Who is this relevant for?

  • Accounting firms with dozens or hundreds of client administrations seeking cross-admin automation
  • Finance organizations wanting to strengthen their existing accounting platform with AI without lock-in
  • Software partners building agentic workflows around a SOAP or REST API
  • Anyone evaluating how to integrate AI safely into a production process with audit trail and human control


Looking into AI orchestration yourself?

Do you have a process spread across dozens of administrations or sites, and want to explore how AI could speed up orchestration and reporting — while keeping control and compliance? Schedule a no-obligation conversation via info@codemill.dev or the contact form. We'd love to think through it with you.