97% test coverage, 6 AI vendors orchestrated under hard cost ceilings

Built for clinical practices and high-trust service businesses

NXA Content Factory, Telegram-Driven Generative Content Engine

An AI-native content orchestration engine for clinical practices. Produces high-fidelity short-form videos and images (Urdu and English) from a single Telegram command, guarded by Human-in-the-Loop approvals and strict financial guardrails. 194 tests, 97.46% coverage.

Node.js 22 LTS + TypeScript 5.5 (ESM), oclif (CLI) + grammY (Telegram), better-sqlite3 (Local Ledger), Cloudflare R2 (Object Storage), Google Veo 3.1 Fast (Video AI), Claude Haiku 4.5 (Reasoning), fal.ai + OpenAI TTS + Uplift Orator, FFmpeg (Media Stitching)

Published May 2026

Private repository, hexagonal architecture available on request

TL;DR

An AI-native content orchestration engine for clinical practices and high-trust service businesses. Send /video "Doctor gives advice on skincare" to a Telegram bot, and the system runs a multi-vendor AI pipeline (Claude Haiku 4.5 for scripting, Veo 3.1 Fast for video, fal.ai Hummingbird for lipsync, OpenAI / Uplift Orator for TTS, FFmpeg for stitching) and delivers a brand-aligned video for approval. 194 tests, 97.46% coverage. Built on a hexagonal (ports and adapters) architecture.

The Problem

Clinical practices, aesthetic clinics, and high-trust service businesses cannot scale content production manually:

Hiring an in-house team is expensive and slow to ramp.
Generative AI alone hallucinates compliance-sensitive details (especially in healthcare-adjacent verticals).
Multi-vendor AI orchestration is fragile without idempotency, cost ledgers, and circuit breakers. One bad night of unattended generation can ring up a four-figure invoice.

What was needed: a Human-in-the-Loop pipeline that lets a non-technical operator generate brand-aligned video and image content from a single Telegram command, with hard cost ceilings, full audit trail, and zero unattended overspend.

Outcome

194 tests passing across the production pipeline

97.46% code coverage, enforced in CI

6 AI vendors orchestrated under cost ceilings and circuit breakers

Stub mode for local development, zero API spend

The factory is built like a financial system:

Per-run cost ceilings prevent any single command from exceeding budget. Every token, second of video, and image generated is tracked in a local SQLite ledger.
Operational kill-switch (/halt) pauses all paid vendor generation system-wide on a single Telegram command.
Atomic SQLite job queue with SHA-256 idempotency keys prevents double-billing during crashes. Vendor job IDs are persisted before polling begins.
Auto-lifecycle storage on Cloudflare R2 with 7-day rolling retention, so generated media never accumulates beyond a week.
Multi-brand personas via YAML configuration (Doctor, Founder, Neutral) for dynamic identity swapping across runs.
Built-in cron commands auto-prune expired R2 assets and snapshot the database daily.
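
The guardrails listed above can be illustrated with a small sketch. This is not the project's code: `VendorRequest`, `idempotencyKey`, and `RunLedger` are hypothetical names, costs are assumed to be tracked in integer cents, and an in-memory `Map` stands in for the SQLite ledger so the example stays self-contained.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one billable vendor request.
interface VendorRequest {
  runId: string;
  stage: string; // e.g. "tts" or "video"
  vendor: string;
  payload: Record<string, string | number>;
}

// Derive a deterministic SHA-256 key: the same request always yields the same key.
// Payload keys are sorted so property order cannot change the hash.
function idempotencyKey(req: VendorRequest): string {
  const sortedPayload = Object.keys(req.payload)
    .sort()
    .map((key) => [key, req.payload[key]]);
  const canonical = JSON.stringify([req.runId, req.stage, req.vendor, sortedPayload]);
  return createHash("sha256").update(canonical).digest("hex");
}

type ReserveResult = "reserved" | "duplicate" | "halted" | "over-ceiling";

// In-memory stand-in for the ledger (assumption: the real system uses SQLite).
class RunLedger {
  private readonly charges = new Map<string, number>(); // key -> cost in cents
  private readonly ceilingCents: number;
  private halted = false;

  constructor(ceilingCents: number) {
    this.ceilingCents = ceilingCents;
  }

  get spentCents(): number {
    let total = 0;
    this.charges.forEach((cents) => {
      total += cents;
    });
    return total;
  }

  // Kill-switch, analogous to the /halt and /resume commands.
  halt(): void {
    this.halted = true;
  }

  resume(): void {
    this.halted = false;
  }

  // Record a charge only if it is new, the system is running,
  // and the run stays under its ceiling.
  reserve(key: string, estimateCents: number): ReserveResult {
    if (this.charges.has(key)) return "duplicate";
    if (this.halted) return "halted";
    if (this.spentCents + estimateCents > this.ceilingCents) return "over-ceiling";
    this.charges.set(key, estimateCents);
    return "reserved";
  }
}
```

Because the key is derived from the request rather than generated at random, a worker that restarts after a crash recomputes the same key and finds the existing ledger entry instead of paying the vendor a second time. In a SQLite-backed version, the `Map` lookup would presumably become a uniqueness constraint on the key column.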

Tech Stack

Component	Technology	Purpose
Runtime	Node 22.20 LTS, TypeScript 5.5 (ESM)	Modern strict TypeScript, native ESM
Interface	oclif 4 (CLI) + grammY (Telegram long-polling)	Operator interface on phone or shell
Database	better-sqlite3 12	Local ledger, job queue, idempotency keys
Object Storage	Cloudflare R2 (`@aws-sdk/client-s3`)	Generated media with 7-day rolling retention
Image AI	Nano Banana 2 and Flux Kontext Pro (via fal.ai)	High-fidelity image generation
Video AI	Google Veo 3.1 Fast	720p / 1080p cinematic video generation
Audio AI	Uplift Orator (Urdu) + OpenAI `tts-1` (English)	Bilingual hyper-realistic voiceovers
Lipsync	Tavus Hummingbird-0 (via fal.ai)	Avatar lip-sync to generated audio
LLM Reasoning	Anthropic Claude Haiku 4.5	Script refinement, caption generation
Media Stitching	`ffmpeg-static`	Final asset composition


Key Functionality

Telegram bot interface for full operational control from a phone: /video, /image, /status, /cost, /halt, /resume, /cancel.
6-stage video pipeline: scripting (Claude Haiku 4.5) → captions → bilingual TTS (Uplift Orator / OpenAI) → cinematic video (Veo 3.1 Fast) → lipsync (Tavus Hummingbird) → FFmpeg stitch, followed by Telegram delivery for HITL approval.
Hexagonal architecture (ports and adapters): pure business use cases isolated from external API providers and CLI / Telegram adapters.
Cost ledger and reporting via `factory cost-report --run <run-id>`: a CLI table that breaks down token and per-second cost by vendor for each run.
Stub provider mode for $0 local development. Fully smoke-test the entire pipeline without API spend.
Brand persona system with YAML-seeded identities for dynamic per-run brand swapping.
Health check probe and automated cron pruning + DB backups.
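
The ports-and-adapters split and the stub provider mode described above can be sketched as follows. All names here (`VideoProvider`, `StubVideoProvider`, `produceClip`) and the `stub://` URL scheme are illustrative assumptions, not the project's actual interfaces. The point is only that the use case depends on a port, so a zero-cost stub adapter can replace a paid vendor adapter without touching business logic.

```typescript
// Result of one generation call; cost is tracked in integer cents (assumption).
interface VideoClip {
  url: string;
  costCents: number;
}

// Port: what the use case needs from any video vendor.
interface VideoProvider {
  readonly name: string;
  generate(prompt: string, seconds: number): Promise<VideoClip>;
}

// Stub adapter: returns a canned asset reference, makes no network call, costs nothing.
class StubVideoProvider implements VideoProvider {
  readonly name = "stub";

  async generate(prompt: string, seconds: number): Promise<VideoClip> {
    return {
      url: `stub://video/${seconds}s/${encodeURIComponent(prompt)}`,
      costCents: 0,
    };
  }
}

interface ClipResult {
  provider: string;
  clip: VideoClip;
}

// Use case: knows only the port, never a concrete vendor SDK.
async function produceClip(
  provider: VideoProvider,
  script: string,
  seconds = 8,
): Promise<ClipResult> {
  if (script.trim() === "") {
    throw new Error("script must not be empty");
  }
  const clip = await provider.generate(script, seconds);
  return { provider: provider.name, clip };
}
```

A live adapter would implement the same `VideoProvider` interface around the vendor's API and report a non-zero `costCents`, while tests and local smoke runs pass `new StubVideoProvider()` instead.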

Telegram bot interaction, cost ledger CLI output, hexagonal architecture diagram

Demo Video, Coming Soon

Walkthrough, Telegram command to brand-aligned video in 5 minutes

Estimated length: ~5 minutes

Building something similar?

If a multi-agent pipeline, voice AI deployment, or production automation system is on your roadmap, let's talk through how this applies to your context.
