LTX 2 Fast: Text-to-Video AI with 4K Motion & Sound Sync

ltx/ltx-2/fast/text-to-video

Generate instant video previews with LTX 2 Fast, a text-to-video model optimized for speed and low latency.

Prompt *

A photorealistic portrait of a stylish young woman sitting at an outdoor Parisian café with red woven bistro chairs and circular white tables. She has long, smooth, straight, honey-blonde hair, parted neatly in the middle. She is wearing a classic black beret, a crisp white button-up shirt, and a dark blue tie with diagonal gold stripes. Over her shirt, she wears a slightly oversized black leather jacket. She accessorizes with pearl and gold hoop earrings on one ear. Her makeup features sharp, winged eyeliner, defined eyebrows, and a natural nude lip color, giving her a chic, confident look. The scene captures a soft afternoon ambient light, with clear reflections on the glass window behind her. Through the window, vintage-style golden lettering that reads "ROYH..." is visible, partially cut off due to her posture. In the background, a Parisian street is subtly mirrored in the glass. The camera angle is eye-level, front-facing and slightly to the right, creating a natural, elegant composition. The color palette is warm and clean with a touch of Parisian café ambience. Extremely detailed, 8k resolution, sharp focus, shallow depth of field, photorealistic style, natural lighting, cinematic atmosphere, high fashion editorial aesthetic.

The textual description used to generate the video.

Duration

Resolution

The resolution of the generated video.

Aspect Ratio (W:H)

The aspect ratio of the generated video.

Frames Per Second

Frames per second of the generated video.

Generate Audio

Whether to generate audio for the generated video.


The rate is $0.04 per second for 1080p, $0.08 per second for 1440p, and $0.16 per second for 2160p.
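As a rough illustration of the pricing above, the sketch below estimates the cost of a single clip. It assumes the per-second rate applies to the length of the generated video; the function name `estimate_cost` is hypothetical and not part of any RunComfy API.

```python
# Per-second rates in USD, copied from the pricing line above.
RATES_PER_SECOND = {"1080p": 0.04, "1440p": 0.08, "2160p": 0.16}


def estimate_cost(duration_s: int, resolution: str = "1080p") -> float:
    """Estimate the cost of one clip in USD.

    Assumption: billing is proportional to the output video duration.
    """
    if resolution not in RATES_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Round to cents to avoid floating-point noise in the result.
    return round(duration_s * RATES_PER_SECOND[resolution], 2)


if __name__ == "__main__":
    for res in RATES_PER_SECOND:
        print(res, estimate_cost(6, res))
```

Under that assumption, a 6-second clip at 2160p costs four times as much as the same clip at 1080p, which is one more reason to iterate at 1080p first.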

Introduction to LTX 2 Fast Text-to-Video

Start from the main model page: [LTX 2 Pro Text-to-Video](https://www.runcomfy.com/models/ltx/ltx-2/pro/text-to-video). LTX 2 Fast Text-to-Video generates video from text prompts and is optimized for speed, which makes it well suited to real-time previewing, quick ideation, and any workflow where immediate feedback is key.

What this mode does

LTX 2 Fast Text-to-Video prioritizes generation speed. It produces video clips from text prompts significantly faster than the Pro model, allowing for rapid iteration and experimentation. Use this mode to explore concepts, test prompts, and create drafts before finalizing with a high-fidelity generation.

Speed profile (Fast)

High Velocity: Optimized for minimal latency and quick turnaround.
Preview-Ready: Ideal for checking composition, motion, and timing in seconds.
Resolution: Defaults to 1080p for maximum generation speed.
Trade-off: Fast mode gives up some fine texture detail in exchange for near-interactive performance. 2160p (4K) remains selectable, but it increases generation time relative to the 1080p default.

How Fast Text-to-Video works

Streamlined Inference: Uses a lighter-weight inference process to reduce computation time per frame.
Prompt-Driven: Turns your text description into motion immediately, focusing on broad strokes and narrative flow.
Efficient DiT: Leverages the efficiency of the Diffusion Transformer architecture to maintain structural coherence even at Fast-mode speeds.

Retake workflow

Fast mode is a natural starting point for the Retake workflow:

1) Rapidly generate multiple variations in Fast mode to find the best motion or composition.

2) Select your favorite clip.

3) Send it to Retake to refine, upscale, or adjust specific elements: LTX 2 Retake Video

Inputs

prompt (required): Describe the scene, action, and style.
duration: 6s to 20s. Longer durations (12s and above) require 1080p at 25 fps.
resolution: 1080p (fastest), 1440p, 2160p.
aspect_ratio: 16:9.
fps: 25 or 50.
generate_audio: true/false (default true).
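The inputs above can be sketched as a small validated structure. This is a minimal illustration, not an official client: the class `FastT2VInputs` and the function `build_payload` are hypothetical names, and the defaults for `duration` (6) and `fps` (25) are assumptions, since this page only states defaults for resolution and `generate_audio`.

```python
from dataclasses import asdict, dataclass

ALLOWED_RESOLUTIONS = ("1080p", "1440p", "2160p")
ALLOWED_FPS = (25, 50)


@dataclass
class FastT2VInputs:
    """Hypothetical container for the inputs listed above."""

    prompt: str
    duration: int = 6            # seconds; default is an assumption
    resolution: str = "1080p"    # documented default
    aspect_ratio: str = "16:9"   # only value listed
    fps: int = 25                # default is an assumption
    generate_audio: bool = True  # documented default

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt is required")
        if not 6 <= self.duration <= 20:
            raise ValueError("duration must be between 6 and 20 seconds")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
        if self.aspect_ratio != "16:9":
            raise ValueError("aspect_ratio must be 16:9")
        if self.fps not in ALLOWED_FPS:
            raise ValueError(f"fps must be one of {ALLOWED_FPS}")
        # Long clips are restricted to 1080p at 25 fps. The threshold is
        # applied to anything above 10s, which covers the "12s+" rule here
        # and the "above 10 seconds" wording in the FAQ.
        if self.duration > 10 and (self.resolution != "1080p" or self.fps != 25):
            raise ValueError("durations above 10s require 1080p at 25 fps")


def build_payload(inputs: FastT2VInputs) -> dict:
    """Validate the inputs and return them as a plain dict."""
    inputs.validate()
    return asdict(inputs)


if __name__ == "__main__":
    print(build_payload(FastT2VInputs(prompt="A red kite over a beach")))
```

Validating locally before submitting a job catches the long-duration restriction early, instead of finding out after a failed generation.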

Recommended settings

Rapid Prototyping: 1080p, 25 FPS, 6s duration. This gives the fastest feedback loop.
Smooth Motion Preview: 1080p, 50 FPS, 6s duration.
Concept Exploration: Use diverse prompts with generate_audio enabled to test both visual and sonic atmosphere quickly.
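The recommended settings above could be captured as reusable presets. The sketch below is illustrative only: the preset keys and the helper names `preset_payload` and `concept_exploration_batch` are hypothetical, and the Concept Exploration helper assumes the Rapid Prototyping settings because this page does not specify resolution, FPS, or duration for that case.

```python
# Settings taken from the recommendations above; the key names are made up.
PRESETS = {
    "rapid_prototyping": {"resolution": "1080p", "fps": 25, "duration": 6},
    "smooth_motion_preview": {"resolution": "1080p", "fps": 50, "duration": 6},
}


def preset_payload(prompt: str, preset: str, generate_audio: bool = True) -> dict:
    """Combine a prompt with one of the named presets."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    return {
        "prompt": prompt,
        "aspect_ratio": "16:9",
        "generate_audio": generate_audio,
        **PRESETS[preset],
    }


def concept_exploration_batch(prompts: list[str]) -> list[dict]:
    """One payload per prompt, with audio enabled.

    Assumption: reuses the Rapid Prototyping settings for the fastest loop.
    """
    return [preset_payload(p, "rapid_prototyping", generate_audio=True) for p in prompts]


if __name__ == "__main__":
    for payload in concept_exploration_batch(["foggy harbor at dawn", "neon alley in rain"]):
        print(payload)
```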

Related Models

pixverse/v5.5/effects

Transform stills into narrative clips with synced audio and fluid camera motion.

pika-2-2/text-to-video

Create high quality videos from text prompts using Pika 2.2.

kling-2-6/motion-control-pro

Cinematic motion model for fluid scene creation and adaptive visual editing.

kling-2-6/motion-control-standard

AI-driven motion conversion tool enabling precise, stable animation creation.

wan-2-2/lora/image-to-video

Transform stills into cinematic motion with open-source precision tools.

wan-2-2/text-to-video

Generate high quality videos from text prompts with Wan 2.2 Plus.

Frequently Asked Questions

What makes LTX 2 Fast Text-to-Video different?

It is engineered for speed. While Pro focuses on maximum fidelity, Fast Text-to-Video minimizes latency, allowing you to generate and view video concepts in seconds. It is the better choice when turnaround time is the priority.

Does Fast mode support 4K and audio?

Yes. You can still select up to 2160p (4K) resolution and enable audio generation. However, increasing the resolution to 4K increases generation time compared with the 1080p baseline.

What are the ideal use cases for this mode?

Use it for rapid brainstorming, prompt testing, storyboarding, and creating quick social media drafts. It lets you iterate on ideas quickly before committing to a final high-fidelity render.

Can I upgrade a Fast video to Pro quality?

Indirectly, yes. Use Fast mode to lock in your prompt and composition. Then, use the same prompt and seed in the Pro Text-to-Video workflow, or take your Fast result into the Retake Video tool for refinement.

What duration limits apply to Fast mode?

Fast mode supports standard durations (6s, 8s, 10s) and extended durations up to 20s. Note that durations above 10 seconds are currently optimized for 1080p resolution and 25 FPS to ensure the generation succeeds.

Is the audio quality the same as Pro?

Yes, the audio generation capability is identical. The "Fast" designation applies to the video generation steps and optimization, not the audio synthesis.

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.
