YouTube Montage
v1.1.0
Real YouTube footage, single-pass cloned narration, frame-synced captions.
A cinematic storytelling video built from sourced YouTube clips, intercut with title cards and narration. Best for evolution stories, retrospectives, 'state of X' explainers, founder stories, and any editorial piece where the visual layer comes from public sources.
Good for
- Evolution / history of an industry or technology
- State-of-the-field recap videos
- Retrospective montages tied to a thesis
- Founder / company origin stories built from archival footage
- News-moment explainers with a clear editorial angle
Not this template
- Product demos (use product-spotlight)
- Original-shoot brand films
- Talking-head research reports (use research-talkinghead)
- Under-30-second social teasers
Inputs (what the brief needs)
| Field | Type | Required | Description |
|---|---|---|---|
| title | string | required | Public-facing title of the video (used in manifest + cards). |
| topic | string | required | 1-2 sentences: what the video is about. This is what the story-writer starts from. |
| intent | string | required | Why the video exists: what feeling or argument it's supposed to land in the viewer's head. |
| duration | number | optional | Target duration in seconds (30-180). Auto-rebudgeted to fit the narration. Default: 95. |
| aspectRatio | enum | optional | Aspect ratio to render in. Default: 16:9. |
| emphasis | string[] | optional | 3-5 key beats or talking points to make sure the script hits. |
| avoid | string[] | optional | Framings, tones, or phrasings to stay away from (e.g. 'doomer framing', 'buzzwords'). |
| references | string[] | optional | Optional reference videos or articles the story-writer can pull from. |
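A brief matching the fields above could be checked before it reaches the story-writer. The following is a minimal sketch, not the template's actual loader: `Brief` and `validate_brief` are hypothetical names, and because the table does not list the allowed `aspectRatio` values, only the documented default is used and the enum is not validated.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical container mirroring the input table; not the template's real schema."""
    title: str
    topic: str
    intent: str
    duration: float = 95        # seconds, documented default
    aspectRatio: str = "16:9"   # documented default; other enum values are not listed
    emphasis: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)


def validate_brief(brief: Brief) -> list[str]:
    """Return human-readable problems; an empty list means the brief looks usable."""
    problems = []
    # The three required string fields must be non-empty.
    for name in ("title", "topic", "intent"):
        if not getattr(brief, name).strip():
            problems.append(f"{name} is required")
    # The table gives 30-180 seconds as the accepted target range.
    if not 30 <= brief.duration <= 180:
        problems.append("duration must be between 30 and 180 seconds")
    # emphasis is optional, but when present it should carry 3-5 beats.
    if brief.emphasis and not 3 <= len(brief.emphasis) <= 5:
        problems.append("emphasis should list 3-5 beats")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller show every issue with a brief in a single pass.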
Pipeline (what happens when you run it)
- 1. Source YouTube clips (~3m 0s)
For each body scene: yt-dlp search 12 candidates → source-quality filter (blacklist + relevance ≥20%) → rank by quality+relevance+trust → try top results in order → download → watermark check → use clip or fall through.
Tools: yt-dlp, source-quality-filter, watermark-ocr
- 2. Generate single-pass narration (~25s)
Concatenate all scene narration (plain period+space separators, no ellipses) → ONE ElevenLabs /with-timestamps call → decode base64 audio → ffmpeg atempo=0.85 → cut into per-scene MP3s using character alignment → persist alignment-slices.json for caption sync.
Tools: elevenlabs-with-timestamps, ffmpeg-atempo, ffmpeg-cut
- 3. Auto-rebudget scenes to actual narration (~1s)
Measure each voiceover file's actual duration, extend any scene whose narration overflows its pre-budgeted slot (+0.5s breathing room), recompute contiguous startSec cursor. Total video duration grows from the target (this is intentional).
- 4. Preflight audio review (~1s)
Build the audio timeline from script + VO files (no rendering). Refuse to proceed if: two narrations overlap, a narration exceeds its scene, source-audio clips overlap. NON-BYPASSABLE.
Tools: preflight-review
- 5. Render with Remotion (~4m 40s)
Invoke `npx remotion render` with inputProps={script, clipPaths, voiceoverPaths, alignmentSlices}. The composition dispatches per-scene renderers, wraps every Audio in a duration-bounded Sequence, drives captions from alignment-slices.
Tools: remotion-render
- 6. Deterministic post-render review (~8s)
Extract sample frames, compute audio metrics, run all deterministic rubric checks (black frames, audio clipping, fps, resolution, duration). Flag critical issues.
Tools: deterministic-review
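The auto-rebudget and preflight steps are simple timeline arithmetic, and can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's real code: the `Scene` class and every field name except `startSec` are hypothetical, narration is assumed to start at the beginning of its scene, and "+0.5s breathing room" is read as "extend the scene to narration length plus 0.5 s".

```python
from dataclasses import dataclass

BREATHING_ROOM_SEC = 0.5  # taken from the auto-rebudget step description


@dataclass
class Scene:
    """Hypothetical scene record; only startSec is a name used in the text."""
    id: str
    durationSec: float    # pre-budgeted slot
    narrationSec: float   # measured voiceover length, 0 if the scene has none
    startSec: float = 0.0


def rebudget(scenes: list[Scene]) -> float:
    """Extend overflowing scenes in place, lay them end to end, return total duration."""
    cursor = 0.0
    for scene in scenes:
        if scene.narrationSec > scene.durationSec:
            scene.durationSec = scene.narrationSec + BREATHING_ROOM_SEC
        scene.startSec = cursor          # contiguous cursor: no gaps, no overlaps
        cursor += scene.durationSec
    return cursor


def preflight(scenes: list[Scene]) -> list[str]:
    """Return blocking issues; rendering should proceed only when the list is empty."""
    issues = []
    prev_vo_end = 0.0
    for scene in scenes:
        if scene.narrationSec > scene.durationSec:
            issues.append(f"{scene.id}: narration exceeds its scene")
        if scene.narrationSec > 0 and scene.startSec < prev_vo_end:
            issues.append(f"{scene.id}: narration overlaps the previous narration")
        prev_vo_end = max(prev_vo_end, scene.startSec + scene.narrationSec)
    return issues
```

With slots of 5 s, 5 s, and 3 s and narrations of 4 s, 6.2 s, and none, the second scene grows to 6.7 s and the total becomes 14.7 s instead of 13 s, which is the intentional growth the step describes. The source-audio overlap check is omitted here because the text does not say how source-audio clips are represented.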
Voice profile (locked across this template)
Cloned voice from the Zavis reference reel. The entire script is generated in a single ElevenLabs /with-timestamps call, then post-processed with ffmpeg atempo=0.85 for cinematic pacing. All Zavis YouTube Montage videos use this voice.
How this voice was cloned
Method: ElevenLabs Instant Voice Cloning (IVC) — POST /v1/voices/add
Reference: Instagram Reel
- Downloaded the Reel video via yt-dlp
- Extracted the audio track with ffmpeg at 44.1kHz mono
- Uploaded the audio as a sample to ElevenLabs voice cloning
- Received voice_id Eju2qVkYu4KE2cJnwGzA
- The raw clone speaks ~15% too fast — we post-process every generation through ffmpeg atempo=0.85 (pitch-preserving) to land it at cinematic pacing.
- Voice settings were tuned over v3 → v5: stability 0.55 → 0.78 (v4's conversational drift was causing filler pauses), style 0.40 → 0.15 (lower = fewer breath/um artifacts), similarity_boost 0.85 → 0.90 (stronger identity lock).
- Do NOT lower stability or raise style without reading the v4 failure notes in the Playbook.
- The combined narration text is sent with plain period+space scene separators — NEVER ellipses, which caused the v4 narration stutter bug.
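The separator rule and the tempo adjustment interact when the single narration file is cut into scenes, and the arithmetic can be sketched like this. It is a hedged illustration, not the pipeline's code: `join_narration` and `scene_time_slices` are hypothetical helpers, each scene is assumed to have non-empty narration, and the two timing lists are assumed to be the per-character start and end times from the alignment data (my understanding is that the response exposes them as `character_start_times_seconds` and `character_end_times_seconds`). It also assumes cutting happens after atempo, so every timestamp is divided by 0.85.

```python
TEMPO = 0.85  # ffmpeg atempo factor from the text; 0.85x playback lengthens audio by 1/0.85


def join_narration(scene_texts):
    """Join scene narration with plain period+space separators.

    Returns the combined text and each scene's (start, end) character span,
    where end is exclusive.
    """
    parts, spans, cursor = [], [], 0
    for text in scene_texts:
        text = text.strip()
        if "..." in text or "\u2026" in text:
            raise ValueError("ellipses are not allowed in narration")
        if not text.endswith((".", "!", "?")):
            text += "."
        spans.append((cursor, cursor + len(text)))
        parts.append(text)
        cursor += len(text) + 1  # +1 for the single separating space
    return " ".join(parts), spans


def scene_time_slices(spans, char_starts, char_ends, tempo=TEMPO):
    """Map each scene's character span to (start_sec, end_sec) in the slowed audio."""
    slices = []
    for first, end in spans:
        # Timestamps describe the original audio; slowing it stretches them by 1/tempo.
        slices.append((char_starts[first] / tempo, char_ends[end - 1] / tempo))
    return slices
```

For example, a character that ends at 0.6 s in the raw clone ends at about 0.706 s after the atempo pass, so using unscaled timestamps would cut every scene early and pull captions ahead of the voice.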
Skills loaded (in order)
Zavis brand context, voice, audience, non-negotiables.
System architecture — what lives where, what loads when.
Visual discipline: X& reference, color palette, typography scale, motion language.
This template's authoritative reference. Load FIRST after video-os when making a montage.
Beat-sheet structure: three-act arc, tension curve, landing.
First 3 seconds: promise, question, or image that earns the next 92.
Retention budget: when to escalate, when to rest, how the CTA integrates.
Single-pass narration discipline: full sentences, no ellipses, voice settings, parallel narrative rule.
Two-tier query strategy: named entities vs concept/atmosphere. THIS IS THE #1 QUALITY LEVER.
Post-render review loop: deterministic checks + vision checks + decision tree.
Review rubric (template-specific)
- Every source-audio clip's keyword is actually heard at the trim point
- Every muted b-roll clip's visual is on-topic for that beat (Tier 1 entities are recognizable, Tier 2 concepts are atmospherically aligned)
- No clip is longer than the scene it's placed in
- Title cards are not held longer than 4 seconds
- The narrator never starts speaking during a source-audio clip
- Captions land within 100ms of the spoken word (alignment-driven)
- End card shows exactly one Zavis wordmark
- No visible watermarks, channel bugs, or 'click to download' banners
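Some of these rubric items are purely numeric and could be checked without looking at frames. The sketch below covers three of them (title-card hold time, clip length versus scene length, caption timing). It is a hypothetical illustration: the dictionary keys, the `"title-card"` kind label, and both function names are assumptions, and the on-topic, keyword, and watermark items still need a human or vision-based review.

```python
MAX_TITLE_CARD_SEC = 4.0       # "Title cards are not held longer than 4 seconds"
CAPTION_TOLERANCE_SEC = 0.100  # "Captions land within 100ms of the spoken word"


def check_scene_timing(scenes):
    """scenes: list of dicts with hypothetical keys 'id', 'kind', 'durationSec', 'clipSec'."""
    issues = []
    for scene in scenes:
        if scene["kind"] == "title-card" and scene["durationSec"] > MAX_TITLE_CARD_SEC:
            issues.append(f"{scene['id']}: title card held longer than 4s")
        clip_sec = scene.get("clipSec")  # absent for scenes without a sourced clip
        if clip_sec is not None and clip_sec > scene["durationSec"]:
            issues.append(f"{scene['id']}: clip is longer than its scene")
    return issues


def check_caption_sync(captions, spoken):
    """Both arguments are ordered lists of (word, start_sec); returns words that drift."""
    return [
        word
        for (word, shown_at), (_, heard_at) in zip(captions, spoken)
        if abs(shown_at - heard_at) > CAPTION_TOLERANCE_SEC
    ]
```

A run would pass these three items only if both functions return empty lists.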
Past generations
| Run | Title | Date | Duration | Scenes | Status |
|---|---|---|---|---|---|
| 20260411-080631 | The Evolution of AI | 4/11/2026 | 108.7s | 16 | approved |