NXA Content Factory, Telegram-Driven Generative Content Engine
An AI-native content orchestration engine for clinical practices. Produces high-fidelity short-form videos and images (Urdu and English) from a single Telegram command, guarded by Human-in-the-Loop approvals and strict financial guardrails. 194 tests, 97.46% coverage.
TL;DR
An AI-native content orchestration engine for clinical practices and high-trust service businesses. Send /video "Doctor gives advice on skincare" to a Telegram bot, and the system runs a multi-vendor AI pipeline (Claude Haiku 4.5 for scripting, Veo 3.1 Fast for video, Tavus Hummingbird via fal.ai for lipsync, Uplift Orator / OpenAI for TTS, FFmpeg for stitching) and delivers a brand-aligned video for approval. 194 tests, 97.46% coverage. Built on a hexagonal (ports-and-adapters) architecture.
The Problem
Clinical practices, aesthetic clinics, and high-trust service businesses cannot scale content production manually:
- Hiring an in-house team is expensive and slow to ramp.
- Generative AI alone hallucinates compliance-sensitive details (especially in healthcare-adjacent verticals).
- Multi-vendor AI orchestration is fragile without idempotency, cost ledgers, and circuit breakers. One bad night of unattended generation can ring up a four-figure invoice.
What was needed: a Human-in-the-Loop pipeline that lets a non-technical operator generate brand-aligned video and image content from a single Telegram command, with hard cost ceilings, full audit trail, and zero unattended overspend.
Outcome
The factory is built like a financial system:
- Per-run cost ceilings prevent any single command from exceeding budget. Every token, second of video, and image generated is tracked in a local SQLite ledger.
- Operational kill-switch (/halt) pauses all paid vendor generation system-wide on a single Telegram command.
- Atomic SQLite job queue with SHA-256 idempotency keys prevents double-billing during crashes. Vendor job IDs are persisted before polling begins.
- Auto-lifecycle storage on Cloudflare R2 with 7-day rolling retention, ensuring zero storage bloat.
- Multi-brand personas via YAML configuration (Doctor, Founder, Neutral) for dynamic identity swapping across runs.
- Built-in cron commands auto-prune expired R2 assets and snapshot the database daily.
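The guardrails above (idempotency keys, per-run ceilings, and the /halt switch) can be illustrated with a minimal sketch. It is not the project's implementation: the names `idempotencyKey`, `CostGuard`, and `LedgerEntry` are hypothetical, and an in-memory array stands in for the SQLite ledger so the example stays self-contained.

```typescript
import { createHash } from "node:crypto";

// Hypothetical ledger row; the project's real SQLite schema is not shown in this write-up.
interface LedgerEntry {
  runId: string;
  vendor: string;
  costUsd: number;
}

// Derive a deterministic SHA-256 key from the request, so a retry after a crash
// maps to the same job instead of creating (and billing) a new one.
// Note: JSON.stringify is sensitive to object key order, so callers should
// build the payload consistently.
function idempotencyKey(runId: string, stage: string, payload: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify([runId, stage, payload]))
    .digest("hex");
}

// In-memory stand-in for the ledger plus the two guardrails:
// a per-run cost ceiling and a global halt flag.
class CostGuard {
  private readonly entries: LedgerEntry[] = [];
  private readonly seenKeys = new Set<string>();
  private readonly ceilingUsd: number;
  private halted = false;

  constructor(ceilingUsd: number) {
    this.ceilingUsd = ceilingUsd;
  }

  halt(): void {
    this.halted = true;
  }

  resume(): void {
    this.halted = false;
  }

  // Total recorded spend for one run.
  spent(runId: string): number {
    return this.entries
      .filter((e) => e.runId === runId)
      .reduce((sum, e) => sum + e.costUsd, 0);
  }

  // Returns true if the charge was recorded, false if this key was already
  // recorded (a duplicate is a no-op). Throws, without recording anything,
  // when generation is halted or the charge would push the run past its ceiling.
  charge(key: string, entry: LedgerEntry): boolean {
    if (this.seenKeys.has(key)) return false;
    if (this.halted) throw new Error("paid generation is halted");
    if (this.spent(entry.runId) + entry.costUsd > this.ceilingUsd) {
      throw new Error(`run ${entry.runId} would exceed its cost ceiling`);
    }
    this.seenKeys.add(key);
    this.entries.push(entry);
    return true;
  }
}
```

In this sketch the check happens before the charge is recorded, so a rejected request leaves the ledger unchanged. A persistent version would presumably perform the key check and the insert inside one database transaction.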
Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| Runtime | Node 22.20 LTS, TypeScript 5.5 (ESM) | Modern strict TypeScript, native ESM |
| Interface | oclif 4 (CLI) + grammY (Telegram long-polling) | Operator interface on phone or shell |
| Database | better-sqlite3 12 | Local ledger, job queue, idempotency keys |
| Object Storage | Cloudflare R2 (@aws-sdk/client-s3) | Generated media with 7-day rolling retention |
| Image AI | Nano Banana 2 and Flux Kontext Pro (via fal.ai) | High-fidelity image generation |
| Video AI | Google Veo 3.1 Fast | 720p / 1080p cinematic video generation |
| Audio AI | Uplift Orator (Urdu) + OpenAI tts-1 (English) | Bilingual hyper-realistic voiceovers |
| Lipsync | Tavus Hummingbird-0 (via fal.ai) | Avatar lip-sync to generated audio |
| LLM Reasoning | Anthropic Claude Haiku 4.5 | Script refinement, caption generation |
| Media Stitching | ffmpeg-static | Final asset composition |
Key Functionality
- Telegram bot interface for full operational control from a phone: /video, /image, /status, /cost, /halt, /resume, /cancel.
- 6-stage video pipeline: scripting (Claude Haiku 4.5) → captions → bilingual TTS (Uplift Orator / OpenAI) → cinematic video (Veo 3.1 Fast) → lipsync (Tavus Hummingbird) → FFmpeg stitch → Telegram delivery for HITL approval.
- Hexagonal architecture (ports and adapters): pure business use cases isolated from external API providers and CLI / Telegram adapters.
- Cost ledger and reporting via factory cost-report --run <run-id>: detailed CLI table breakdown of token / second cost per vendor per run.
- Stub provider mode for $0 local development. Fully smoke-test the entire pipeline without API spend.
- Brand persona system with YAML-seeded identities for dynamic per-run brand swapping.
- Health check probe and automated cron pruning + DB backups.
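The ports-and-adapters split and the stub provider mode can be sketched as follows. This is an illustration under assumptions, not the project's code: the port names (`ScriptProvider`, `VideoProvider`), the stub classes, and the `produceVideo` use case are all hypothetical, and only two of the six pipeline stages are modelled.

```typescript
// Ports: what the use case needs from any vendor, with no vendor details.
interface ScriptProvider {
  refine(brief: string): Promise<string>;
}

interface Clip {
  url: string;
  costUsd: number;
}

interface VideoProvider {
  generate(script: string): Promise<Clip>;
}

// Stub adapters: deterministic, no network calls, zero cost.
// Real adapters (for example an LLM or video vendor client) would
// implement the same interfaces.
class StubScriptProvider implements ScriptProvider {
  async refine(brief: string): Promise<string> {
    return `SCRIPT: ${brief}`;
  }
}

class StubVideoProvider implements VideoProvider {
  async generate(script: string): Promise<Clip> {
    return { url: `stub://video/${script.length}`, costUsd: 0 };
  }
}

interface VideoResult {
  script: string;
  videoUrl: string;
  costUsd: number;
  status: "awaiting_approval";
}

// Use case: depends only on the ports, so the same function runs against
// stubs in development and real vendors in production.
async function produceVideo(
  brief: string,
  deps: { script: ScriptProvider; video: VideoProvider },
): Promise<VideoResult> {
  const script = await deps.script.refine(brief);
  const clip = await deps.video.generate(script);
  // The result is held for human approval rather than published directly.
  return { script, videoUrl: clip.url, costUsd: clip.costUsd, status: "awaiting_approval" };
}
```

Because the use case receives its providers as arguments, switching between stub and paid vendors in this sketch is a wiring decision made at startup, with no change to the business logic.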
Telegram bot interaction, cost ledger CLI output, hexagonal architecture diagram
Walkthrough, Telegram command to brand-aligned video in 5 minutes
Building something similar?
If a multi-agent pipeline, voice AI deployment, or production automation system is on your roadmap, let's talk through how this applies to your context.