The client
For a specialized accounting firm managing 100+ client administrations, we're developing an AI automation stack on top of their existing accounting platform (Yuki). The firm handles administration and bookkeeping for a wide range of clients, an operation where volume, auditability, and compliance all have to line up at once.
The client has asked not to be named; technical details are presented independently of any client-specific data.
The problem
The client has used a well-established accounting platform for years, with its own IDR stack (OCR, classification, supplier recognition per document). That platform does one thing well: extracting the right basics per individual document. What it doesn't do:
- Spot patterns across administrations — if supplier X is always booked to GL account Y in administration A, you'd want that to be proposed in administration B too
- Proactively flag to an operator that a known document type is stuck somewhere in the workflow
- Perform one-click booking via the SOAP API once the operator approves
- Generate multi-admin reports without the operator manually exporting CSVs from 100+ administrations
With 100+ administrations, the manual work behind each of those gaps grows with every administration added. That's exactly the gap our stack fills.
Our positioning: complementary, not a replacement
A key design choice: we do not replace the platform's IDR features. We explicitly position our stack as complementary. The accounting platform does what it does well (per-document recognition). Our stack adds the cross-administration layer: orchestration, supplier-to-GL suggestions based on history, one-click booking, and monitoring of where the platform's own automation gets stuck.
That distinction prevents endless re-implementation and keeps our stack sharply focused on where the real gains are.
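To make the supplier-to-GL idea concrete, here is a minimal sketch of a history-based suggestion. It is an illustration, not the production code: the row shape, the `suggestGl` function, and the 80% agreement threshold are all assumptions, and it presumes that supplier IDs and GL account codes are comparable across administrations.

```typescript
// Hypothetical row shape for a supplier-to-GL history table.
interface GlHistoryRow {
  administrationId: string;
  supplierId: string;
  glAccount: string;
}

interface GlSuggestion {
  glAccount: string;
  support: number; // bookings in other administrations using this GL account
  total: number;   // all bookings for this supplier in other administrations
}

// Suggest a GL account for `supplierId` in `targetAdmin`, based only on how
// OTHER administrations booked the same supplier. Returns null when there is
// no history or when the history does not agree strongly enough.
function suggestGl(
  history: GlHistoryRow[],
  supplierId: string,
  targetAdmin: string,
  minShare = 0.8, // assumed threshold, would be tuned per client
): GlSuggestion | null {
  const counts = new Map<string, number>();
  let total = 0;
  for (const row of history) {
    if (row.supplierId !== supplierId) continue;
    if (row.administrationId === targetAdmin) continue;
    counts.set(row.glAccount, (counts.get(row.glAccount) ?? 0) + 1);
    total++;
  }

  let best: GlSuggestion | null = null;
  for (const [glAccount, support] of counts) {
    if (best === null || support > best.support) {
      best = { glAccount, support, total };
    }
  }
  if (best === null) return null;
  return best.support / total >= minShare ? best : null;
}
```

Returning `support` and `total` alongside the account keeps the suggestion explainable: an operator can see how many earlier bookings it rests on.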
The architecture
We've designed the stack in seven layers, each with a clear responsibility:
1. Intake — documents and triggers
- Email intake via Gmail PubSub
- Webhook intake (POST /hooks/document-intake)
- Manual upload via the admin dashboard
- Scheduled jobs (nightly sync at 02:00, hourly retry, morning review)
2. Intake server
An Express server that deduplicates incoming PDFs (SHA-256 + supplier|invoice|amount), stores them, and triggers the AI orchestration layer.
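One way to read that two-part deduplication is sketched below: a SHA-256 hash over the raw PDF bytes catches the same file arriving twice, and a normalized `supplier|invoice|amount` key catches the same invoice arriving as a different file. The `InvoiceFields` shape, the `Deduplicator` class, and the in-memory sets are assumptions for illustration; a real intake server would persist both keys.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the fields extracted from an invoice.
interface InvoiceFields {
  supplier: string;
  invoiceNumber: string;
  amountCents: number; // integer cents avoids float formatting differences
}

// Content key: SHA-256 over the raw PDF bytes.
function contentHash(pdf: Uint8Array): string {
  return createHash("sha256").update(pdf).digest("hex");
}

// Business key: normalized supplier|invoice|amount.
function businessKey(f: InvoiceFields): string {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");
  return [norm(f.supplier), norm(f.invoiceNumber), String(f.amountCents)].join("|");
}

type DuplicateReason = "same-file" | "same-invoice";

class Deduplicator {
  private readonly seenHashes = new Set<string>();
  private readonly seenKeys = new Set<string>();

  // Returns the reason a document counts as a duplicate, or null when it is
  // new. New documents are recorded so later copies are detected.
  check(pdf: Uint8Array, fields: InvoiceFields): DuplicateReason | null {
    const hash = contentHash(pdf);
    const key = businessKey(fields);
    if (this.seenHashes.has(hash)) return "same-file";
    if (this.seenKeys.has(key)) return "same-invoice";
    this.seenHashes.add(hash);
    this.seenKeys.add(key);
    return null;
  }
}
```

Keeping the two checks separate also gives the operator a more useful reason when a document is held as a duplicate.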
3. AI orchestration layer
A locally-running MCP-compatible agent gateway that coordinates seven specialized AI skills:
| Skill | Role |
|---|---|
| Admin context | Load administrative context for decisions |
| Document archive | Store PDF in the accounting platform |
| Supplier match | Match invoice to a known supplier |
| Build proposal | Generate booking proposal including GL account |
| Exception review | Handle deviations and edge cases |
| Review callback | Process operator approval or rejection |
| Nightly sync | Sync the local datastore with the accounting platform |
The agents talk to the accounting platform via two MCP servers: one for the SOAP API (29 tools), and a custom browser-automation MCP (on Playwright) for functionality not available through SOAP. We've open-sourced the SOAP-MCP server so the community can benefit from it too.
4. Policy engine
For each booking proposal an explicit policy determines which of three outcomes applies:
| Outcome | Condition | Action |
|---|---|---|
| Auto-post | Exact match · amount below threshold | Posted automatically |
| Review queue | Likely match · amount above threshold | Routed to operator |
| Dead letter | Duplicate or hard block | Held for manual action |
Thresholds and rules are configurable per client and per supplier type — so the policy stays explainable and steerable.
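The three-outcome table can be expressed as a small pure function. The sketch below is an assumption-laden illustration rather than the actual engine: the `Proposal` and `PolicyConfig` shapes are invented, the table's conditions are read as "exact match and amount below threshold" for auto-posting, and everything that is neither blocked nor auto-postable is routed to review.

```typescript
type Outcome = "auto-post" | "review-queue" | "dead-letter";

// Hypothetical input shape for one booking proposal.
interface Proposal {
  match: "exact" | "likely";
  amountCents: number;
  supplierType: string;
  isDuplicate: boolean;
  hardBlock: boolean;
}

// Hypothetical per-client configuration with per-supplier-type overrides.
interface PolicyConfig {
  defaultLimitCents: number;
  limitBySupplierType: Map<string, number>;
}

interface Decision {
  outcome: Outcome;
  reason: string; // stored with the decision so it can be explained later
}

function limitFor(config: PolicyConfig, supplierType: string): number {
  return config.limitBySupplierType.get(supplierType) ?? config.defaultLimitCents;
}

function decide(p: Proposal, config: PolicyConfig): Decision {
  // Blocks are checked first so nothing blocked can ever be auto-posted.
  if (p.isDuplicate) return { outcome: "dead-letter", reason: "duplicate" };
  if (p.hardBlock) return { outcome: "dead-letter", reason: "hard block" };

  const limit = limitFor(config, p.supplierType);
  if (p.match === "exact" && p.amountCents < limit) {
    return { outcome: "auto-post", reason: `exact match below ${limit} cents` };
  }
  const why = p.match === "exact" ? "amount at or above threshold" : "match is not exact";
  return { outcome: "review-queue", reason: why };
}
```

Because `decide` has no side effects, the same inputs always give the same outcome, which makes a policy change easy to test against historical proposals before it goes live.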
5. Human-in-the-loop via Telegram
Exceptions aren't dropped but actively presented. For each review or dead-letter invoice, the operator receives a Telegram message with inline buttons: approve, reject, or retry. The decision is sent back via a review callback and logged in the audit trail.
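As an illustration of the message side, the sketch below builds a Telegram Bot API `sendMessage` payload with an inline keyboard and parses the callback that comes back. The `action:proposalId` callback format and the helper names are assumptions; the 64-byte check reflects the Bot API's limit on `callback_data`. Sending the payload and writing to the audit trail are left out.

```typescript
type ReviewAction = "approve" | "reject" | "retry";
const ACTIONS: readonly ReviewAction[] = ["approve", "reject", "retry"];

interface InlineButton {
  text: string;
  callback_data: string;
}

interface SendMessagePayload {
  chat_id: number | string;
  text: string;
  reply_markup: { inline_keyboard: InlineButton[][] };
}

// Assumed callback format: "<action>:<proposalId>".
function encodeCallback(action: ReviewAction, proposalId: string): string {
  const data = `${action}:${proposalId}`;
  // Telegram limits callback_data to 64 bytes.
  if (new TextEncoder().encode(data).length > 64) {
    throw new Error("callback_data exceeds 64 bytes");
  }
  return data;
}

// Parses callback data; returns null for anything not produced by encodeCallback.
function decodeCallback(data: string): { action: ReviewAction; proposalId: string } | null {
  const i = data.indexOf(":");
  if (i < 0) return null;
  const action = data.slice(0, i) as ReviewAction;
  const proposalId = data.slice(i + 1);
  if (!ACTIONS.includes(action) || proposalId === "") return null;
  return { action, proposalId };
}

// Builds the JSON body for a sendMessage call with one row of three buttons.
function buildReviewMessage(
  chatId: number | string,
  proposalId: string,
  summary: string,
): SendMessagePayload {
  const labels: Record<ReviewAction, string> = {
    approve: "Approve",
    reject: "Reject",
    retry: "Retry",
  };
  const row = ACTIONS.map((a) => ({
    text: labels[a],
    callback_data: encodeCallback(a, proposalId),
  }));
  return { chat_id: chatId, text: summary, reply_markup: { inline_keyboard: [row] } };
}
```

Validating the callback on the way back in matters here: button data travels through the client, so the review callback should only accept actions and IDs it recognizes.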
6. Admin dashboard
A Next.js admin dashboard where the operator:
- views workflow status per administration (open, data-entry complete, in progress, settled)
- generates multi-admin reports (GL balances, depreciation, salary costs, workflow)
- tunes the policy engine (thresholds, supplier rules, GL labels)
- searches the audit trail for compliance questions
7. Infra: observability and self-healing
- PM2 for process management
- Watchdog polling gateway health, auto-restarting on failure, and sending Telegram alerts on issues
- Circuit breaker patterns: a max number of restarts per hour to prevent stuck-state loops
- Immutable audit log in SQLite — every decision, prompt, and tool call is reproducible after the fact
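The restart cap in the list above can be sketched as a sliding-window budget. This is a simplified illustration with assumed names (`RestartBudget`, `onHealthCheck`) and an assumed one-hour window; the actual health probe, PM2 restart call, and Telegram alert are not shown.

```typescript
// Allows at most `maxRestarts` restarts within any window of `windowMs`.
class RestartBudget {
  private restarts: number[] = [];
  private readonly maxRestarts: number;
  private readonly windowMs: number;

  constructor(maxRestarts: number, windowMs: number = 60 * 60 * 1000) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  // Records a restart and returns true if the budget allows it;
  // returns false (recording nothing) when the budget is exhausted.
  tryRestart(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    this.restarts = this.restarts.filter((t) => t > cutoff);
    if (this.restarts.length >= this.maxRestarts) return false;
    this.restarts.push(nowMs);
    return true;
  }
}

type WatchdogAction = "none" | "restart" | "alert-only";

// Decides what the watchdog should do after one health poll.
// "alert-only" means: stop restarting and hand the problem to a human.
function onHealthCheck(healthy: boolean, budget: RestartBudget, nowMs: number): WatchdogAction {
  if (healthy) return "none";
  return budget.tryRestart(nowMs) ? "restart" : "alert-only";
}
```

Passing the current time in as an argument, instead of calling `Date.now()` inside the class, keeps the budget deterministic and easy to test.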
What we've delivered, without overclaiming
At the time of writing, the stack is not yet fully in production. We've deliberately chosen a careful rollout per pipeline component, with the client closely involved at every step. What already works:
- ✅ Full document intake pipeline (email / webhook / upload with deduplication)
- ✅ AI skills for archiving, supplier matching, and proposal generation
- ✅ Policy engine with three-state classification
- ✅ Telegram integration for human-in-the-loop
- ✅ Multi-administration nightly sync (100+ administrations via SOAP)
- ✅ Admin dashboard with workflow overviews and reports
- ✅ Watchdog + self-healing
- ✅ Open-source MCP server published on npm + GitHub
In development / on the roadmap:
- 🚧 Cross-admin GL suggestions based on supplier_gl_history
- 🚧 Full multi-admin reporting pipeline
- 🚧 Gradual rollout to production per client administration
What we take from this
This engagement gave us solid patterns we now apply with other clients as well:
- Build complementary instead of replacing — accept what existing platforms do well and focus on the white space
- MCP as integration standard for AI agents — production-grade, auditable, and reusable
- A policy engine for agent autonomy — write down explicitly what the AI may and may not do, instead of hoping the model behaves
- Human-in-the-loop as a feature, not a fallback — operator buttons via Telegram/Slack actually make automation easier to adopt
- Audit trail as the basis for trust — immutable logging makes every AI decision explainable later
- Open-source publication of what's generic enough to extract — strengthens expertise and brings community feedback in
Stack and approach
| Layer | Choice |
|---|---|
| Runtime | Node.js 20+, TypeScript (strict, ESM) |
| Monorepo | npm workspaces |
| Frontend dashboard | Next.js 16 (App Router), React 19, Tailwind CSS |
| Backend (intake) | Express 5 |
| Database | SQLite via better-sqlite3 (local, fast, audit-trail-friendly) |
| Browser automation | Playwright (Chromium) |
| AI protocol | Model Context Protocol (@modelcontextprotocol/sdk) |
| LLM | Claude (via MCP-compatible agent gateway) |
| Human-in-the-loop | Telegram bot with inline keyboards |
| Process mgmt | PM2 with ecosystem config and cron_restart |
| Hosting | Dedicated local machine + SMB mount for logs/db |
| Testing | Vitest |
Who is this relevant for?
- Accounting firms with dozens or hundreds of client administrations seeking cross-admin automation
- Finance organizations wanting to strengthen their existing accounting platform with AI without lock-in
- Software partners building agentic workflows around a SOAP or REST API
- Anyone evaluating how to integrate AI safely into a production process with audit trail and human control
Related
- 🧩 Open-source MCP server that came out of this project
- 📘 Technical blog post on MCP and accounting
- 🤖 Our AI services
Looking into AI orchestration yourself?
Do you have a process spread across dozens of administrations or sites, and want to explore how AI could speed up orchestration and reporting — while keeping control and compliance? Schedule a no-obligation conversation via info@codemill.dev or the contact form. We'd love to think through it with you.
