Zavis Prompt Craft

You are writing prompts that another agent (or this same agent in a later step) will execute. The prompts you write are the difference between a generic AI video and a Zavis video. This skill teaches the exact prompt structures that produce the right output.

IMPORTANT: This skill pairs with the Playbook at packages/templates/youtube-montage/PLAYBOOK.md. Read the Playbook first — it has the first principles. This skill has the prompts themselves.

Rule zero: never prompt for "the topic"

The single most common failure is an agent starting script generation with "write me a video about X." The output will be generic because the input is generic. Always convert a topic into an ARGUMENT before prompting for script content.

The argument-extraction prompt

When the user gives you a topic, run this internal prompt first:

Given the topic "{topic}" and the video intent "{intent}":

1. What is the one-sentence argument this video makes about the topic?
   The argument should be:
   - A claim, not just a description
   - Contrarian, surprising, or reframing (not obvious)
   - Specific enough that a viewer could agree or disagree
   - Short enough to say in one breath

2. What is the tension curve this argument creates?
   - What's the setup (the world before)?
   - What's the dip (the struggle/failure/winter)?
   - What's the rise (the turning point)?
   - What's the landing (where we are now, and what comes next)?

3. How does Zavis naturally fit into the landing?
   The answer must be EMERGENT — Zavis is one example of "what happens next"
   in the story's own logic. Never "Zavis is a {product category}."

Keep iterating until the argument feels inevitable. If you can't state the argument in one sentence, you don't have a video yet.
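
The prompt above carries two placeholders, {topic} and {intent}. A minimal sketch of filling them, assuming placeholders are single camelCase identifiers in braces; `fillTemplate` is a hypothetical helper, not part of the Zavis pipeline. It refuses to return a prompt that still has unfilled slots, and it leaves literal braces such as {product category} alone because they contain a space.

```typescript
// Hypothetical helper (not from the Zavis codebase): fills {camelCase}
// placeholders and throws if any of them has no value.
function fillTemplate(template: string, values: Record<string, string>): string {
  const missing: string[] = [];
  const filled = template.replace(
    /\{([A-Za-z][A-Za-z0-9]*)\}/g,
    (match: string, key: string) => {
      const value = values[key];
      if (typeof value === "string") {
        return value;
      }
      missing.push(key);
      return match; // keep the slot visible in the error case
    }
  );
  if (missing.length > 0) {
    throw new Error("Unfilled placeholders: " + missing.join(", "));
  }
  return filled;
}

// Only the opening line of the argument-extraction prompt is shown here.
const openingLine = 'Given the topic "{topic}" and the video intent "{intent}":';
const filledOpening = fillTemplate(openingLine, {
  topic: "Evolution of AI",
  intent: "Make viewers feel AI is still at the beginning, not the peak.",
});
```

Throwing on a missing value is a design choice: a prompt that reaches the model with a raw {topic} in it tends to produce exactly the generic output Rule zero warns about.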

Example argument extraction (from the canonical case study)

Topic: Evolution of AI
Intent: Make viewers feel AI is still at the beginning, not the peak.

Argument extraction run:

1. Argument: "Seventy years of quiet work, five years of explosion, and we're still at the beginning."
2. Tension curve:
   - Setup: 1956 Dartmouth, the idea.
   - Dip: the AI winter, sixty years.
   - Rise: 2012-2022 acceleration.
   - Landing: the people building what comes next are already here.
3. Zavis fit: Zavis fits as "one of those people." The argument sets up a logical slot for "who builds the next phase," and Zavis occupies it without being named.

The master prompt: script generation

Once you have the argument, use this prompt to generate the full script. Fill in the placeholders.

You are writing a script for a Zavis YouTube Montage video. The template
is versioned 1.1.0 and the full reference is in
packages/templates/youtube-montage/PLAYBOOK.md and
skills/zavis-template-youtube-montage/SKILL.md.

THE BRIEF
---------
Title: {title}
Topic: {topic}
Intent: {intent}
Duration: {duration}s (soft target; auto-rebudget will extend it if narration needs more)
Aspect ratio: {aspectRatio}

THE ARGUMENT (derived)
----------------------
{one-sentence argument}

THE TENSION CURVE (derived)
---------------------------
Setup: {what the world looked like at the start}
Dip: {the struggle, failure, or quiet period — load-bearing for contrast}
Rise: {the turning points and acceleration}
Landing: {where we are now, and what comes next — sets up Zavis integration}

CONSTRAINTS (non-negotiable)
----------------------------
1. Narration is parallel-narrative: nothing the narrator says should also
   appear as on-screen text in the same scene.
2. Narration is full sentences, not period-split fragments. The only
   allowed period-splits are intentional list-rhythm moments
   (e.g., "GPT one. GPT two. GPT three.").
3. No ellipses anywhere. Use commas for micro-pauses and periods for
   sentence-end pauses.
4. Every scene with a youtube-clip declares a tier:
   - Tier 1: query is the entity name + disambiguator (year, event,
     "interview", "launch", "keynote", "demo", "press conference")
   - Tier 2: query is a shot description (subject + action + composition
     + lighting/mood) with ≤3 substantive tokens
5. Zavis enters the narrative ONLY in the closing reflection, the CTA,
   and the end card. Never as a feature list. Never as a generic CTA.
6. The first sentence the narrator speaks must be ≤8 words, contain a
   concrete noun, and open a curiosity loop that the rest of the video
   answers.
7. The end card tagline must NOT be the literal word "Zavis" — the
   EndCard component auto-suppresses it. Use an integrated message that
   extends the video's argument.

OUTPUT FORMAT
-------------
Write a ScriptDocument in TypeScript compatible with the schema at
packages/pipeline/src/script/schema.ts. Include:
- brief (title, topic, intent, duration, aspectRatio, voice="zavis",
  emphasis[], avoid[], customNotes)
- scenes[]: each scene with id, type, startSec, durationSec, narration?,
  visual (kind, query or primary, etc.), textOverlay?
- narrator: "zavis"
- music: { mood, energyArc }
- metadata: { createdAt, iteration, version, notes }

SCENES (required structure)
---------------------------
- 1 cold-open color-fill scene (0.0-1.3s)
- 1 hook title-card scene (1.3-6.5s) — the hook sentence
- 8-12 body scenes alternating Tier 1 and Tier 2 visuals
- 1 closing reflection body scene
- 1 CTA title-card scene (integrated tagline)
- 1 end-card outro scene (tagline + integrated subtagline)

Narration pacing: ~160 words per minute. For a {duration}s video, aim for
~{targetWords} words total narration.

Music energy arc: one value per scene in [0, 1], describing the mood
intensity. Usually starts low, dips at the "winter" beat, builds toward
peak at the climax, settles for closing + CTA, fades on end card.

Before writing, STATE THE ARGUMENT in one sentence and confirm the
tension curve makes sense.
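
Two values in the master prompt are plain arithmetic: the {targetWords} placeholder and the shape of the music energy arc. The sketch below shows one way to compute and check them, assuming the 160 wpm pacing stated in the prompt. The function names are illustrative and are not taken from the pipeline.

```typescript
// Pacing constant from the prompt's "~160 words per minute".
const NARRATION_WPM = 160;

// Computes the {targetWords} placeholder for a video of the given length.
function targetWords(durationSec: number): number {
  return Math.round((durationSec * NARRATION_WPM) / 60);
}

// Returns a list of problems with a music energy arc; empty means it passed.
// The arc needs one value per scene, each within [0, 1].
function checkEnergyArc(energyArc: number[], sceneCount: number): string[] {
  const problems: string[] = [];
  if (energyArc.length !== sceneCount) {
    problems.push("expected " + sceneCount + " values, got " + energyArc.length);
  }
  energyArc.forEach((value, index) => {
    // Written as a negated range test so NaN is also rejected.
    if (!(value >= 0 && value <= 1)) {
      problems.push("value at index " + index + " is outside [0, 1]");
    }
  });
  return problems;
}

const wordsForNinetySeconds = targetWords(90); // 240
const arcProblems = checkEnergyArc([0.2, 0.5, 0.3, 0.9, 0.4], 5); // []
```

Running the arc check on the model's output before rendering catches an arc that has fewer entries than the script has scenes.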

Prompt variations for specific steps

Generating just the beat sheet (pre-narration)

Given the argument "{argument}" and the tension curve below:
- Setup: {setup}
- Dip: {dip}
- Rise: {rise}
- Landing: {landing}

Write 12-18 single-line beats in tension order. For each beat:
- A one-sentence description of what happens
- Whether it's a hook / body / title-card / cta / outro
- Whether its visual is Tier 1 (named entity) or Tier 2 (concept)
- An estimated duration in seconds (aim for within ±0.5s of the final scene length)

Do NOT write the narration yet. Just the beat structure.
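
If the beat sheet comes back as structured data, its shape can be checked before any narration is written. This is a sketch under assumptions: the `Beat` interface and its field names are hypothetical, chosen to mirror the four items the prompt asks for, and do not come from the schema file.

```typescript
// Hypothetical shape for one beat; field names are illustrative only.
type BeatKind = "hook" | "body" | "title-card" | "cta" | "outro";

interface Beat {
  description: string; // one sentence, single line
  kind: BeatKind;
  tier: 1 | 2; // Tier 1 = named entity, Tier 2 = concept
  estimatedSec: number;
}

// Returns a list of problems; an empty list means the sheet passed.
function checkBeatSheet(beats: Beat[]): string[] {
  const problems: string[] = [];
  if (beats.length < 12 || beats.length > 18) {
    problems.push("expected 12-18 beats, got " + beats.length);
  }
  beats.forEach((beat, index) => {
    if (beat.description.indexOf("\n") !== -1) {
      problems.push("beat " + index + ": description must be a single line");
    }
    if (!(beat.estimatedSec > 0)) {
      problems.push("beat " + index + ": estimated duration must be positive");
    }
  });
  return problems;
}

// Sums the estimates so the total can be compared with the brief's duration.
function totalEstimatedSec(beats: Beat[]): number {
  return beats.reduce((sum, beat) => sum + beat.estimatedSec, 0);
}
```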

Generating narration for a single beat

Given the beat "{beat description}" and:
- The visual that will be on screen: {visual description}
- The caption that will appear: {caption text}
- The narrator voice: Zavis (cloned, single-pass, tempo 0.85, stability 0.78)

Write the narration for this beat in 1-3 full sentences. Constraints:
- Do NOT say what the caption already says.
- Do NOT say what the visual obviously shows.
- DO add context, consequence, contrast, feeling, or time.
- Total length should fit in approximately {duration}s at 160 wpm
  = {wordBudget} words.
- Write in "talking to a smart friend at a bar" voice, not classroom prose.
- Full sentences only — period-splits allowed ONLY for deliberate list rhythm.
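
The {wordBudget} placeholder follows directly from the beat duration and the 160 wpm pace. A minimal sketch, assuming words are counted by splitting on whitespace and that the budget rounds down so narration does not overrun the scene. Both choices are assumptions here, not documented pipeline behavior.

```typescript
// Words that fit in a beat of the given length, rounded down.
function wordBudget(durationSec: number, wordsPerMinute: number = 160): number {
  return Math.floor((durationSec * wordsPerMinute) / 60);
}

// Counts whitespace-separated words; an empty string counts as zero.
function countWords(narration: string): number {
  const trimmed = narration.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when the narration fits the beat at the stated pace.
function fitsBeat(narration: string, durationSec: number): boolean {
  return countWords(narration) <= wordBudget(durationSec);
}

const sixSecondBudget = wordBudget(6); // 16
const fitsSixSeconds = fitsBeat("Nobody noticed the first sixty years.", 6); // true
```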

Generating visual queries from a script

Given this scene from the script:
- Beat: {beat}
- Narration: "{narration}"
- Caption: "{caption}"

Decide the query tier:

Tier 1 check: does the narration mention a specific person, product,
event, or company that the viewer will recognize and expect to see?
- If YES → Tier 1. Write a query in the form
  "{entityName} {context} {year}" — e.g., "Sam Altman OpenAI interview 2021".
  Do NOT add "stock" / "cinematic" / "b-roll" to Tier 1 queries — they
  filter out the real footage you want.
- If NO → Tier 2. Write a shot description with ≤3 substantive tokens
  (stop-words don't count). Include a mood/composition word like
  "cinematic" / "slow motion" / "aerial" / "macro".

Then run the candidate through the relevance gate mentally:
- Tokenize the query (stop-word filtered)
- Imagine YouTube's top result
- Does the imagined title share ≥20% of the query's substantive tokens?
- If no, rewrite the query.
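
The relevance gate above can be sketched as a token-overlap check. The stop-word list and the tokenizer below are assumptions made for illustration; the pipeline's actual list and tokenization rules are not given in this skill. The candidate title stands in for the imagined top result.

```typescript
// Assumed stop-word list; the real pipeline may use a different one.
const GATE_STOP_WORDS: string[] = [
  "a", "an", "the", "of", "in", "on", "at", "for", "to", "and", "or", "with", "by", "from", "is",
];

// Lowercases, splits on non-alphanumerics, and drops stop-words.
function substantiveTokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0 && GATE_STOP_WORDS.indexOf(token) === -1);
}

// True when the title shares at least minShare of the query's substantive tokens.
function passesRelevanceGate(query: string, candidateTitle: string, minShare: number = 0.2): boolean {
  const queryTokens = substantiveTokens(query);
  if (queryTokens.length === 0) {
    return false; // nothing substantive to match against
  }
  const titleTokens = substantiveTokens(candidateTitle);
  const shared = queryTokens.filter((token) => titleTokens.indexOf(token) !== -1).length;
  return shared / queryTokens.length >= minShare;
}

// Tier 2 length rule from the prompt: at most 3 substantive tokens.
function withinTier2Length(query: string): boolean {
  return substantiveTokens(query).length <= 3;
}
```

With these assumptions, "Sam Altman OpenAI interview 2021" against the title "Sam Altman on the future of OpenAI" shares three of five substantive tokens, which clears the 20% threshold.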

Reviewing a generated script

You are reviewing a generated ScriptDocument for a Zavis YouTube Montage.
Check it against this rubric:

ARGUMENT
- [ ] Is there a clear one-sentence argument?
- [ ] Does the tension curve have a dip and a rise?
- [ ] Does the landing set up Zavis naturally (not as an interruption)?

NARRATION
- [ ] Is the first sentence ≤8 words, concrete, curiosity-opening?
- [ ] Are there any ellipses in narration text? (must be zero)
- [ ] Are there any period-split fragments that aren't list rhythm? (must be none)
- [ ] Is any narration line saying the same thing as a caption or overlay? (must be none)

VISUALS
- [ ] Is every youtube-clip scene classified as Tier 1 or Tier 2?
- [ ] Do Tier 1 queries contain the entity name + a disambiguator?
- [ ] Do Tier 2 queries have ≤3 substantive tokens?
- [ ] Do any queries contain "stock" / "free" / "no copyright"? (must be none)

STRUCTURE
- [ ] Is there a cold-open, hook, 8+ body scenes, closing, CTA, end card?
- [ ] Does the total duration fit within 80-110% of target?
- [ ] Does every scene's durationSec accommodate the narration at 160 wpm?

ZAVIS
- [ ] Does Zavis appear only in the closing/CTA/end card (not mid-video)?
- [ ] Is the end card tagline NOT the literal word "Zavis"?

For each failing item, propose a specific fix. Do not rewrite the whole
script — just the parts that failed.
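
Several rubric items are mechanical and can be checked in code before the review prompt runs, leaving the judgment calls (argument clarity, parallel narrative, Zavis placement) to the reviewer. The sketch below covers four of them: ellipses, hook length, banned query terms, and total duration. `SceneDraft` is a hypothetical minimal shape, not the real ScriptDocument schema.

```typescript
// Hypothetical minimal scene shape; the real schema lives in
// packages/pipeline/src/script/schema.ts and may differ.
interface SceneDraft {
  durationSec: number;
  narration?: string;
  query?: string;
}

// Query terms the rubric bans, matched as whole words.
const BANNED_QUERY_TERMS: RegExp[] = [/\bstock\b/, /\bfree\b/, /\bno copyright\b/];

// Returns one message per failed check; an empty list means all four passed.
function mechanicalRubricFailures(scenes: SceneDraft[], targetSec: number): string[] {
  const failures: string[] = [];
  let hookChecked = false;
  let totalSec = 0;

  scenes.forEach((scene, index) => {
    totalSec += scene.durationSec;
    const narration = scene.narration || "";

    // Ellipses: three dots or the single-character form.
    if (/\.\.\.|\u2026/.test(narration)) {
      failures.push("scene " + index + ": ellipsis in narration");
    }

    // Hook: the first sentence of the first narrated scene, 8 words at most.
    if (!hookChecked && narration.trim() !== "") {
      hookChecked = true;
      const hook = (narration.split(/[.!?]/)[0] || "").trim();
      if (hook.split(/\s+/).length > 8) {
        failures.push("scene " + index + ": first sentence exceeds 8 words");
      }
    }

    const query = (scene.query || "").toLowerCase();
    BANNED_QUERY_TERMS.forEach((term) => {
      if (term.test(query)) {
        failures.push("scene " + index + ": banned query term " + term.source);
      }
    });
  });

  if (totalSec < 0.8 * targetSec || totalSec > 1.1 * targetSec) {
    failures.push("total duration " + totalSec + "s is outside 80-110% of target");
  }
  return failures;
}
```

Each returned message names the scene index, which fits the instruction to fix only the parts that failed.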

The meta-prompt rule: always cite the Playbook

When you invoke any of the prompts above, always include this line:

Full reference: packages/templates/youtube-montage/PLAYBOOK.md

This tells the downstream agent (or you in a later step) where to find the first principles, the case study, and the failure modes. The prompts above are the MINIMUM — the Playbook has the context that prevents drift over long runs.
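
One way to make the citation hard to forget is to append it in code rather than by hand. A minimal sketch, with `withPlaybookReference` as a hypothetical helper name; it adds the line once and leaves prompts that already contain it unchanged.

```typescript
// The citation line from this skill, kept in one place.
const PLAYBOOK_LINE = "Full reference: packages/templates/youtube-montage/PLAYBOOK.md";

// Appends the Playbook citation unless the prompt already includes it.
function withPlaybookReference(promptText: string): string {
  if (promptText.indexOf(PLAYBOOK_LINE) !== -1) {
    return promptText;
  }
  return promptText.replace(/\s+$/, "") + "\n\n" + PLAYBOOK_LINE + "\n";
}
```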

Prompt hygiene

DO

✓ State the argument BEFORE writing any script content
✓ Include the constraint list verbatim in every script-generation prompt
✓ Derive the tension curve explicitly, as a separate step from script writing
✓ Write the hook LAST, after the body and landing
✓ Validate against the review rubric after generation
✓ Iterate on individual beats instead of regenerating the whole script
✓ Cite the Playbook

DON'T

✗ Start with "write me a video about X"
✗ Let the model pick the duration — declare it in the brief
✗ Skip the argument-extraction step
✗ Generate narration and queries in the same pass (quality drops)
✗ Let the model decide whether Zavis goes in the middle of the video (it doesn't)
✗ Use ellipses or period-splits in the prompt's example narration (it teaches the model the wrong pattern)
✗ Trust a script that hasn't been reviewed against the rubric

When the prompt isn't working

If you run the master prompt and the output feels flat, diagnose in this order:

1. Is the argument clear? If you can't restate it, the model can't either. Iterate on the argument first.
2. Is the tension curve shaped? If everything is rising or everything is flat, there's no contrast. Add a dip or cut some rises.
3. Are the constraints in the prompt? The master prompt includes 7 non-negotiable constraints. If you trimmed them to "save tokens," the model will drift.
4. Did you write the hook first? Move the hook generation to AFTER the body + landing.
5. Is the voice prompt clear? The model needs to know it's writing for the cloned voice, at 160 wpm, in talking-to-a-friend register. If you didn't specify, it'll write classroom prose.

If all of the above are correct and the output is still flat, the ARGUMENT is probably still wrong. Iterate on it.