Velo’s AI handles three core jobs automatically: generating the script, producing the narration, and recording the final video. Together, these steps take your raw input (a recording, a link, a document, or a guided walkthrough) and turn it into a polished, narrated video without requiring you to be present for the final output.

Script Generation

After your recording session or file upload, the AI processes your narration audio, screen interactions, and any uploaded content. It produces a structured, scene-by-scene script that you can review and edit before the final video is recorded. The script is the foundation of your Velo: every subsequent step flows from it.

Voice Narration

The AI speaks the script using your VeloTwin voice, a clone of your real voice built from a short training recording. Every word of narration in the final video is spoken by your AI voice, maintaining a consistent, natural-sounding delivery throughout.

Video Recording

A browser agent reads the completed script and records the final video by replaying your screen — navigating, clicking, and interacting exactly as the script instructs. You don’t need to be present during this step.

The Chat Interface

All of this happens inside a chat. The AI keeps you in the loop at every step, showing you the script, asking for confirmation, flagging issues, and responding to your instructions. You can type requests at any point to change a scene, adjust the tone, add a step, or regenerate a section.

Script Flow by Creation Method

The script flow works differently depending on how you started your Velo.

Capture Screen and Upload a Recording: The script is generated after your session or upload. You review and edit it in the editor once the initial video is built.

Paste a Link and Upload a PDF: The script is built from your URL or document before the video is recorded. You review and edit Scene Cards in the chat first, then the video is generated from the approved script.