I'm excited to share the MVP of my latest personal project: a fully automated, end-to-end YouTube video factory. The system turns a simple text idea into a complete, published video and supports both short-form (Shorts) and long-form content.

It orchestrates a suite of powerful APIs and tools to automate the entire workflow:
- Scripting: OpenAI API (GPT)
- Voiceover: ElevenLabs API for high-quality TTS
- Visuals: Flexible image generation via Replicate (SDXL) or a cost-effective alternative using the Pixabay API
- Subtitles: Precise, locally run transcription with OpenAI Whisper
- Music: Royalty-free background music sourced from the Jamendo API
- Assembly: A robust, two-step FFmpeg process that ensures smooth, "judder-free" animations
- Publishing: Direct upload to YouTube via the YouTube Data API v3
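
As a rough illustration of how stages like these can be chained, here is a minimal sketch. It is not the project's actual code: the `Job` class, the stage names, and the stub stages are hypothetical placeholders, and a real stage would call the corresponding API instead of writing a dummy value.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    """Carries the idea and everything produced so far (hypothetical structure)."""
    idea: str
    artifacts: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


Stage = Callable[[Job], None]

# Stage names mirror the workflow list above; the names themselves are assumptions.
STAGE_NAMES = ["script", "voiceover", "visuals", "subtitles",
               "music", "assembly", "publish"]


def make_stub_stage(name: str, output: str) -> Stage:
    """Build a placeholder stage that only records a dummy artifact."""
    def stage(job: Job) -> None:
        job.artifacts[name] = output
    stage.__name__ = name
    return stage


def run_pipeline(job: Job, stages: list[Stage]) -> Job:
    """Run stages in order and stop at the first failure."""
    for stage in stages:
        try:
            stage(job)
            job.log.append(f"{stage.__name__}: ok")
        except Exception as exc:
            # Later stages depend on earlier artifacts, so do not continue.
            job.log.append(f"{stage.__name__}: failed ({exc})")
            break
    return job


stages = [make_stub_stage(name, f"{name}.out") for name in STAGE_NAMES]
result = run_pipeline(Job(idea="Why the sky is blue"), stages)
```

Stopping at the first failed stage is one possible design choice: it avoids uploading a video whose voiceover or visuals never got generated, at the cost of not retrying automatically.
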
Beyond just connecting APIs, the focus was on building a resilient and developer-friendly system. This includes:
- A comprehensive `health_check.py` script to validate the environment.
- Automated `pytest` tests for key integrations like Pixabay and Jamendo, verifying that API errors are handled gracefully.
- A full scheduling system for macOS using `launchd` to automate future uploads.
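
To give a feel for what an environment check of this kind might look like, here is a minimal sketch. It is an assumption-laden illustration, not the contents of the real `health_check.py`: the environment variable names and the `check_environment` function are hypothetical, and the only external tool checked is `ffmpeg`.

```python
import os
import shutil

# Hypothetical variable names; the real project may use different ones.
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY", "PIXABAY_API_KEY"]
REQUIRED_BINARIES = ["ffmpeg"]


def check_environment(env=None, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means healthy.

    `env` and `which` are injectable so the check can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"missing environment variable: {name}")
    for binary in REQUIRED_BINARIES:
        if which(binary) is None:
            problems.append(f"binary not found on PATH: {binary}")
    return problems


def main() -> int:
    """Print each problem and return a shell-style exit code."""
    issues = check_environment()
    for issue in issues:
        print("FAIL", issue)
    return 1 if issues else 0
```

Returning a non-zero exit code from `main()` would let a scheduler or shell script skip the upload run when the environment is incomplete.
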
This project was a deep dive into system architecture, complex automation pipelines, and solving low-level integration challenges.
You can see the system's architecture in these diagrams:
I'm always open to feedback and connecting with fellow developers!