97% test coverage, 6 AI vendors orchestrated under hard cost ceilings
Built for clinical practices and high-trust service businesses

NXA Content Factory, Telegram-Driven Generative Content Engine

An AI-native content orchestration engine for clinical practices. Produces high-fidelity short-form videos and images in Urdu and English from a single Telegram command, gated by human-in-the-loop approval and hard per-run cost ceilings. 194 tests, 97.46% coverage.

Node.js 22 LTS + TypeScript 5.5 (ESM) · oclif (CLI) + grammY (Telegram) · better-sqlite3 (Local Ledger) · Cloudflare R2 (Object Storage) · Google Veo 3.1 Fast (Video AI) · Claude Haiku 4.5 (Reasoning) · fal.ai + OpenAI TTS + Uplift Orator · FFmpeg (Media Stitching)
Published May 2026 · Private repository, hexagonal architecture available on request

TL;DR

An AI-native content orchestration engine for clinical practices and high-trust service businesses. Send /video "Doctor gives advice on skincare" to a Telegram bot, and the system runs a multi-vendor AI pipeline (Claude Haiku 4.5 for scripting, Veo 3.1 Fast for video, Tavus Hummingbird-0 via fal.ai for lipsync, Uplift Orator or OpenAI for TTS, FFmpeg for stitching) and returns a brand-aligned video for approval. 194 tests, 97.46% coverage. Built on a hexagonal (ports and adapters) architecture.

The Problem

Clinical practices, aesthetic clinics, and high-trust service businesses struggle to scale content production, whether by hand or with off-the-shelf AI:

  1. Hiring an in-house team is expensive and slow to ramp.
  2. Generative AI alone hallucinates compliance-sensitive details (especially in healthcare-adjacent verticals).
  3. Multi-vendor AI orchestration is fragile without idempotency, cost ledgers, and circuit breakers. One bad night of unattended generation can ring up a four-figure invoice.

What was needed: a Human-in-the-Loop pipeline that lets a non-technical operator generate brand-aligned video and image content from a single Telegram command, with hard cost ceilings, a full audit trail, and zero unattended overspend.

Outcome

  • 194 tests passing across the production pipeline
  • 97.46% code coverage, enforced in CI
  • 6 AI vendors orchestrated under cost ceilings and circuit breakers
  • $0 API spend in stub mode for local development

The factory is built like a financial system:

  • Per-run cost ceilings prevent any single command from exceeding budget. Every token, second of video, and image generated is tracked in a local SQLite ledger.
  • Operational kill-switch (/halt) pauses all paid vendor generation system-wide on a single Telegram command.
  • Atomic SQLite job queue with SHA-256 idempotency keys prevents double-billing during crashes. Vendor job IDs are persisted before polling begins.
  • Auto-lifecycle storage on Cloudflare R2 with 7-day rolling retention, ensuring zero storage bloat.
  • Multi-brand personas via YAML configuration (Doctor, Founder, Neutral) for dynamic identity swapping across runs.
  • Built-in cron commands auto-prune expired R2 assets and snapshot the database daily.
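
The guardrails listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the names `CostLedger`, `JobStore`, `idempotencyKey`, and `submitOnce` are hypothetical, and in-memory collections stand in for the SQLite tables so the example runs without dependencies.

```typescript
import { createHash } from "node:crypto";

// All names are hypothetical; in-memory collections stand in for the SQLite ledger.
interface LedgerEntry {
  runId: string;
  vendor: string;
  costCents: number; // integer cents avoid floating-point drift
}

class CostLedger {
  private readonly ceilingCents: number;
  private readonly entries: LedgerEntry[] = [];
  private halted = false;

  constructor(ceilingCents: number) {
    this.ceilingCents = ceilingCents;
  }

  halt(): void { this.halted = true; }    // the flag a /halt command might set
  resume(): void { this.halted = false; } // the flag a /resume command might clear

  spentCents(runId: string): number {
    return this.entries
      .filter((e) => e.runId === runId)
      .reduce((sum, e) => sum + e.costCents, 0);
  }

  // Record a charge, refusing it if generation is paused
  // or if the run would exceed its per-run ceiling.
  charge(entry: LedgerEntry): void {
    if (this.halted) throw new Error("paid generation is halted");
    if (this.spentCents(entry.runId) + entry.costCents > this.ceilingCents) {
      throw new Error(`run ${entry.runId} would exceed its cost ceiling`);
    }
    this.entries.push(entry);
  }
}

// Deterministic SHA-256 key: the same run, stage, and input always hash identically.
function idempotencyKey(runId: string, stage: string, input: string): string {
  return createHash("sha256")
    .update(JSON.stringify([runId, stage, input]))
    .digest("hex");
}

class JobStore {
  private readonly vendorJobIds = new Map<string, string>();

  // Submit at most once per key; a retry after a crash reuses the stored vendor job ID.
  submitOnce(key: string, submit: () => string): string {
    const existing = this.vendorJobIds.get(key);
    if (existing !== undefined) return existing;
    const jobId = submit();
    this.vendorJobIds.set(key, jobId); // stored before any polling would start
    return jobId;
  }
}
```

In a design like this, a restarted worker recomputes the same key for the same stage input and finds the stored vendor job ID instead of paying for a second submission. A durable version would write the key and job ID inside a single SQLite transaction.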

Tech Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Runtime | Node 22.20 LTS, TypeScript 5.5 (ESM) | Modern strict TypeScript, native ESM |
| Interface | oclif 4 (CLI) + grammY (Telegram long-polling) | Operator interface on phone or shell |
| Database | better-sqlite3 12 | Local ledger, job queue, idempotency keys |
| Object Storage | Cloudflare R2 (@aws-sdk/client-s3) | Generated media with 7-day rolling retention |
| Image AI | Nano Banana 2 and Flux Kontext Pro (via fal.ai) | High-fidelity image generation |
| Video AI | Google Veo 3.1 Fast | 720p / 1080p cinematic video generation |
| Audio AI | Uplift Orator (Urdu) + OpenAI tts-1 (English) | Bilingual hyper-realistic voiceovers |
| Lipsync | Tavus Hummingbird-0 (via fal.ai) | Avatar lip-sync to generated audio |
| LLM Reasoning | Anthropic Claude Haiku 4.5 | Script refinement, caption generation |
| Media Stitching | ffmpeg-static | Final asset composition |

Key Functionality

  • Telegram bot interface for full operational control from a phone: /video, /image, /status, /cost, /halt, /resume, /cancel.
  • 6-stage video pipeline: scripting (Claude Haiku 4.5) → captions → bilingual TTS (Uplift Orator / OpenAI) → cinematic video (Veo 3.1 Fast) → lipsync (Tavus Hummingbird) → FFmpeg stitch, followed by Telegram delivery for HITL approval.
  • Hexagonal architecture (ports and adapters): pure business use cases isolated from external API providers and CLI / Telegram adapters.
  • Cost ledger and reporting via factory cost-report --run <run-id>: a CLI table that breaks down token and per-second costs by vendor for each run.
  • Stub provider mode for $0 local development. Fully smoke-test the entire pipeline without API spend.
  • Brand persona system with YAML-seeded identities for dynamic per-run brand swapping.
  • Health check probe and automated cron pruning + DB backups.
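
As an illustration of how ports and adapters make a $0 stub mode possible, here is a minimal sketch. The names `StageProvider`, `StubProvider`, and `runVideoPipeline` are hypothetical, and the stages are chained linearly for brevity, which is a simplification: a real pipeline would likely pass both audio and video into the lipsync stage.

```typescript
// Port: the only interface the use case depends on. All names are hypothetical.
interface StageProvider {
  readonly stage: string;
  run(input: string): Promise<string>;
}

// Stub adapter: returns a deterministic placeholder and never calls a paid API.
class StubProvider implements StageProvider {
  readonly stage: string;

  constructor(stage: string) {
    this.stage = stage;
  }

  async run(input: string): Promise<string> {
    return `${this.stage}(${input})`;
  }
}

// Use case: pure orchestration, unaware of which adapter sits behind each port.
async function runVideoPipeline(
  prompt: string,
  providers: StageProvider[],
): Promise<string> {
  let artifact = prompt;
  for (const provider of providers) {
    artifact = await provider.run(artifact); // each stage consumes the previous output
  }
  return artifact;
}

// The six stages named in the list above, each wired to a stub adapter.
const STAGES = ["script", "captions", "tts", "video", "lipsync", "stitch"];
const stubPipeline: StageProvider[] = STAGES.map((s) => new StubProvider(s));
```

Swapping in a paid vendor would then mean supplying a different `StageProvider` implementation for that stage while `runVideoPipeline` stays unchanged, which is what lets the same orchestration code be smoke-tested without API spend.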

[Figure: Telegram bot interaction, cost ledger CLI output, hexagonal architecture diagram]

Demo Video, Coming Soon

Walkthrough, Telegram command to brand-aligned video in 5 minutes

Estimated length: ~5 minutes

Building something similar?

If a multi-agent pipeline, voice AI deployment, or production automation system is on your roadmap, let's talk through how this applies to your context.