Do not treat video context as “more prompt.” Treat it as pre-production. The more you decide before rendering, the fewer credits you spend fixing avoidable script, voice, product, or pacing issues.
Start with the workflow
Pick the workflow before you attach files. Different video types need different context.

| Workflow | Best for | Context that matters most |
|---|---|---|
| Full AI video | Fast exploration, broad hooks, concept testing, visual ideas. | Audience, angle, product facts, visual style, claims, target language, and examples of what good looks like. |
| UGC speaking / talking-head | Direct-response hooks, founder-style messages, testimonials, longer spoken explanations. | Script, creator description, delivery style, language fit, claims, pronunciation, and testimonial boundaries. |
| UGC lifestyle | Product/app story with a person on screen but less dependence on exact lip-sync. | Scene plan, product shots, market setting, voiceover, B-roll, and target-language context. |
| B-roll voiceover | Apps, SaaS, websites, high-trust products, product demos, exact UI, and non-English/localized ads. | Screen recordings, product clips, storyboard, voiceover, captions, overlays, and what each clip proves. |
| Animated brand/product video | Stylized product motion, brand characters, simple explainers, or abstract offers. | Brand style, product shape, colors, logo treatment, motion references, and end-card rules. |
| Studio / hybrid edit | Final assembly, multi-scene videos, reused footage, overlays, captions, music, and export. | Ordered scenes, clip timing, caption style, CTA, aspect ratios, and approval notes. |
What to provide
Script and storyboard
Hook, angle, scene order, voiceover, CTA, target length, and which moments must be shown rather than invented.
Product and UI references
Product shots, packaging, labels, app screenshots, website captures, screen recordings, demos, and any exact visual assets.
B-roll and source footage
Existing clips, customer footage, founder footage, testimonials, app walkthroughs, product handling, lifestyle clips, and old ads to reuse.
Reference videos
Examples of pacing, framing, creator delivery, edit density, caption style, hook structure, or visual mood.
Voice and pronunciation
Language, market, accent, age, gender, pace, tone, custom pronunciation, brand-name spelling, acronyms, and words that are often misread.
Claims and guardrails
Approved claims, forbidden claims, required disclaimers, testimonial rules, legal restrictions, and anything the ad must not imply.
Caption and overlay rules
Caption style, position, emphasis, on-screen text, subtitles, end card, offer text, CTA, and what should be edited after render.
Export requirements
Target platforms, aspect ratios, safe zones, duration, file handoff needs, and whether the output is for testing or final launch.
What belongs where
Some video information is durable. Some only belongs to one video.

| Context | Put it here | Example |
|---|---|---|
| Brand-wide video style | Brand context | Caption style, tone, logo placement, visual polish level, “do not use fake testimonials.” |
| Saved video defaults | Video Context tab | Captions on/off/auto, caption preset, caption position, shadow, font, colors, end-card behavior, and pronunciation audio. |
| Product-specific truth | Product context | Exact packaging, app screens, claims, pricing, feature names, demos, and offer restrictions. |
| One video brief | Current task/chat | “Make a 20-second Father’s Day variant with a calm founder-style voice.” |
| Exact source footage | Asset Drive or attached to the task | Screen recording, product clip, founder video, customer testimonial, or B-roll scene. |
| One-off inspiration | Attached reference | “Use this competitor ad’s pacing, but not their brand, claims, or visuals.” |
| Reusable learning | Saved context | “For this market, B-roll voiceover beats synthetic talking-head testimonials.” |
Saved video defaults
Use Context & Skills → Video Context for video defaults that should apply across future generations for the brand. These settings are saved at brand level and are passed to the video agent before it generates.

| Setting | What it controls | How to use it |
|---|---|---|
| Captions | Whether generated videos should receive post-processing captions. | Use On when captions should be a fixed brand default, Off when videos should stay clean unless requested, and Auto when Superscale should decide per video. |
| End Card | Whether videos should receive a final brand/product card after the generated video. | Use On for a fixed closing-card default, Off to avoid automatic end cards, and Auto when an end card should be used only when it improves the ad. |
| Caption position | Where captions appear in the frame. | Choose top, center, or bottom based on the footage and platform safe zones. |
| Caption shadow | Caption separation from busy footage. | Increase the shadow for noisy or high-motion backgrounds; reduce it for clean, premium, or editorial videos. |
| Baseline text | The default caption text style. | Set font, weight, and color for ordinary caption words. |
| Highlighted text | The style for emphasized caption words. | Use a brand color or heavier weight when specific words should pop. |
| Caption preset | The overall caption look. | Pick the preset that matches the brand: plain/corporate for clean demos, bold creator presets for direct-response UGC, or softer presets for lifestyle. |
| Brand pronunciation | How the brand name should be said in generated videos. | Upload a short .mp3, .wav, or .m4a clip when the brand name, acronym, or product name is easy to mispronounce. |
Auto does not mean “ignore this setting.” It means Superscale makes the choice for the current video, and if it uses captions or an end card, it should use the saved brand style.
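
The On/Off/Auto behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not Superscale's API: `Toggle`, `CaptionStyle`, `VideoDefaults`, `resolve`, and `caption_plan` are hypothetical names, and `per_video_choice` stands in for whatever decision Superscale makes for a given video in Auto mode. Explicit per-task requests that override Off are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Toggle(Enum):
    """Hypothetical tri-state used for Captions and End Card."""
    ON = "on"
    OFF = "off"
    AUTO = "auto"


@dataclass(frozen=True)
class CaptionStyle:
    """Hypothetical saved brand caption style."""
    preset: str = "plain"
    position: str = "bottom"  # top, center, or bottom
    shadow: str = "medium"


@dataclass(frozen=True)
class VideoDefaults:
    """Hypothetical brand-level defaults passed along before generation."""
    captions: Toggle = Toggle.AUTO
    end_card: Toggle = Toggle.AUTO
    caption_style: CaptionStyle = field(default_factory=CaptionStyle)


def resolve(toggle: Toggle, per_video_choice: bool) -> bool:
    """Return True when the layer should be applied to this video."""
    if toggle is Toggle.ON:
        return True
    if toggle is Toggle.OFF:
        return False
    # AUTO defers to the per-video decision instead of ignoring the setting.
    return per_video_choice


def caption_plan(defaults: VideoDefaults, per_video_choice: bool) -> Optional[CaptionStyle]:
    """If captions are used at all, they use the saved brand style."""
    if not resolve(defaults.captions, per_video_choice):
        return None
    return defaults.caption_style
```

The point of the sketch is the last function: Auto changes whether captions appear, never which style they use.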
Use real footage when exactness matters
AI can generate useful motion and scenes, but it should not be asked to invent details that have to be commercially exact. Use real footage, screenshots, or product references for:
- App UI, website flows, dashboards, checkout pages, and onboarding steps.
- Product packaging, labels, bottles, supplements, books, devices, and physical product shape.
- Founder, customer, testimonial, clinical, financial, legal, or regulated proof.
- Precise text, pricing, small UI labels, logos, disclaimers, and compliance language.
- Game footage, software walkthroughs, before/after proof, or any result that must be literal.

AI generation is a better fit for:
- Hook scenes, transitions, explainers, creator intros, visual metaphors, background lifestyle shots, and fast concept variations.
- Shots where mood and message matter more than exact product fidelity.
- Early exploration before you commit to a higher-control hybrid edit.
If a logo, product, UI, or text must not change, attach the exact source asset and say what must stay fixed. Do not rely on the video model to redraw exact commercial assets from memory.
Script and scene plan
Video context should tell Superscale what each scene is supposed to do. A good scene plan is short but explicit.

| Scene field | What to define |
|---|---|
| Job | Hook, problem, proof, demo, objection, comparison, social proof, or closing beat. |
| Visual source | AI-generated scene, product clip, screen recording, uploaded B-roll, static image, or previous generation. |
| Spoken line | What the creator or voiceover says in that scene. |
| On-screen text | Headline, callout, price, offer, disclaimer, or CTA that should be added as an overlay or edit layer. |
| Must preserve | Product, logo, UI, face, voice, text, claim, timing, or exact clip. |
| Can vary | Hook, setting, creator, pacing, B-roll, CTA, music, overlay, or caption style. |
Do not ask the video model to draw captions, subtitles, logo holds, or end-card frames inside the generated scene. In Superscale’s current video flow, captions and end cards are post-processing layers. Keep the generated video focused on the scene, then apply captions and the end card through the video settings or the final edit.
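
One way to keep a scene plan explicit is to write it down as structured data and check it before rendering. The sketch below is a hypothetical illustration, not a Superscale format: the `Scene` fields mirror the table above, while `EXACT_ITEMS`, `review_scene`, and the `"ai-generated"` source label are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Items that, per the guidance above, should come from a real source asset
# when they must not change (assumed set for this sketch).
EXACT_ITEMS = {"product", "logo", "ui", "text"}


@dataclass
class Scene:
    """One row of a scene plan (hypothetical structure)."""
    job: str                      # hook, problem, proof, demo, ...
    visual_source: str            # e.g. "ai-generated", "screen recording"
    spoken_line: str = ""
    on_screen_text: str = ""      # applied as an overlay, not drawn in-scene
    must_preserve: list[str] = field(default_factory=list)
    can_vary: list[str] = field(default_factory=list)


def review_scene(scene: Scene) -> list[str]:
    """Return warnings for a scene that is likely to waste a render."""
    warnings = []
    preserve = {item.lower() for item in scene.must_preserve}
    vary = {item.lower() for item in scene.can_vary}

    # Exact commercial assets should not be redrawn by the video model.
    exact = sorted(preserve & EXACT_ITEMS)
    if scene.visual_source == "ai-generated" and exact:
        warnings.append("attach source assets for: " + ", ".join(exact))

    # The same element cannot be both fixed and free to change.
    conflict = sorted(preserve & vary)
    if conflict:
        warnings.append("listed as both fixed and variable: " + ", ".join(conflict))
    return warnings
```

For example, a demo scene marked `"ai-generated"` with `must_preserve=["Logo", "UI"]` would return a single warning asking for the logo and UI source assets.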
Voice and localization
Voice context is about market fit, not just language. Before rendering, specify:
- Target market and language variant, not just the language.
- Accent and delivery: energetic, calm, founder-like, creator-style, premium, playful, technical, native, or studio.
- Pronunciation hints for brand names, product names, acronyms, payment terms, and unusual words.
- Whether the voice is native audio, voiceover, or a silent/ambient scene with captions.
- Whether visuals need localization too: currency, screenshots, people, examples, slang, competitors, and claims.
Captions, overlays, and end cards
Captions are part of the creative, not a finishing detail. Decide them before rendering when they affect pacing or safe zones. Use context to define:
- Caption preset or style: subtle, bold creator captions, high-contrast, editorial, playful, or plain.
- Position: bottom, top, center, or safe-zone-specific.
- Shadow strength: none, light, medium, or heavy depending on how busy the footage is.
- Baseline and highlighted text: font, weight, color, and which words or phrases need emphasis.
- Overlay copy: headline, proof point, CTA, offer, discount, disclaimer, or app/store badge.
- End card: final screen, logo, product, URL, CTA, promo code, or app-store prompt.
Before spending video credits
Run this check before the first render.
Lock the goal
Decide whether the video is a hook test, product demo, testimonial-style ad, localization, retargeting creative, or winner iteration.
Choose the workflow
Pick full AI, UGC speaking, UGC lifestyle, B-roll voiceover, animated, or Studio/hybrid before choosing assets.
Approve the script
Review hook, claims, spoken line, CTA, and target length before rendering. Fix script problems in chat, not after video generation.
Attach exact inputs
Add the product shot, UI screenshot, screen recording, B-roll, voice notes, reference ad, or testimonial that should guide the output.
Mark what cannot change
Say what must stay exact: product, label, logo, UI, price, claim, disclaimer, voice, caption, or source clip.
Check voice and market
Add language, accent, pronunciation, delivery style, and localization notes before a full batch.
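
The six steps above can also be expressed as a small pre-render check. This is a minimal sketch under stated assumptions: `VideoBrief`, `WORKFLOWS`, and `missing_steps` are hypothetical names for illustration, and the workflow labels are lowercase versions of the six workflows on this page, not identifiers from Superscale.

```python
from dataclasses import dataclass, field

# Lowercase labels for the six workflows described on this page (assumed spelling).
WORKFLOWS = {
    "full ai",
    "ugc speaking",
    "ugc lifestyle",
    "b-roll voiceover",
    "animated",
    "studio/hybrid",
}


@dataclass
class VideoBrief:
    """Hypothetical record of what has been decided before the first render."""
    goal: str = ""
    workflow: str = ""
    script_approved: bool = False
    attached_inputs: list[str] = field(default_factory=list)
    must_stay_exact: list[str] = field(default_factory=list)
    voice_notes: str = ""


def missing_steps(brief: VideoBrief) -> list[str]:
    """Return the checklist steps that are still open, in checklist order."""
    checks = [
        ("Lock the goal", bool(brief.goal.strip())),
        ("Choose the workflow", brief.workflow.lower() in WORKFLOWS),
        ("Approve the script", brief.script_approved),
        ("Attach exact inputs", bool(brief.attached_inputs)),
        ("Mark what cannot change", bool(brief.must_stay_exact)),
        ("Check voice and market", bool(brief.voice_notes.strip())),
    ]
    return [name for name, done in checks if not done]
```

An empty list means every step has been addressed; anything else names what to settle in chat before spending credits.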
Common video context mistakes
Asking the model to invent exact product footage
If the product, UI, packaging, or claim must be accurate, attach the real asset. Use AI to create around the asset, not to guess it.
Treating B-roll as decoration
B-roll should prove something: show the app flow, product texture, use case, result, objection, or moment the voiceover is talking about.
Approving the video before approving the script
Script, claims, and pronunciation are cheaper to fix before render. If the message is wrong, do not spend credits to discover that in motion.
Forgetting the target market
“Spanish” is not the same as “Mexican Spanish.” Localization includes examples, screenshots, people, currency, and cultural context.
Using synthetic testimonials where trust is fragile
For regulated, premium, legal, health, finance, or high-trust offers, avoid fake first-person proof. Use approved claims, real proof, founder-style explainers, or B-roll voiceover instead.
Retrying the same broken generation
If a video gets stuck, renders unusably, or keeps drifting, stop and diagnose the cause. Tighten context, switch workflow, or report the generation instead of burning credits.
Format shortcuts
| If you are making | Start with |
|---|---|
| Mobile app video | App screen recording, app-store screenshots, top review language, target user, and a B-roll voiceover or talking-head-over-screen workflow. |
| SaaS or website video | Screen recording, website screenshots, feature proof, customer problem, voiceover, and overlays that explain the UI. |
| Physical product video | Product shots from multiple angles, packaging rules, claims, lifestyle use cases, and B-roll showing the real product. |
| Localized video | Target market, language variant, voice/accent, localized screenshots, local offer, and pronunciation hints. |
| Regulated or high-trust video | Approved claims, disclaimers, real proof, compliance no-gos, and a hybrid workflow that avoids fake testimonial claims. |
| Winner iteration | The winning ad, what made it work, what variable to change, and what must stay fixed. |