Self Forcing | Autoregressive Keyframe-to-Video Generation

Workflow Name: RunComfy/Self Forcing

Workflow ID: 0000...1233

Self Forcing trains autoregressive video diffusion models by simulating the inference process during training: it performs autoregressive rollout with KV caching, so the model is conditioned on its own generated frames. This resolves the train-test distribution mismatch and enables real-time, streaming video generation on a single RTX 4090 while matching the quality of state-of-the-art diffusion models. In this workflow, you provide start and end keyframe images along with a guiding text prompt, and the model generates a smooth, identity-consistent video between them.

Self Forcing: Autoregressive Keyframe-to-Video Generation

Self Forcing is a keyframe-driven video generation model. It synthesizes smooth, high-quality video by generating the motion between a start keyframe and an end keyframe, guided by a descriptive text prompt.

Built on an autoregressive video diffusion architecture with KV caching, Self Forcing generates temporally consistent, identity-preserving motion across frames. Combining keyframes with text allows fluid transitions while maintaining subject structure and style throughout the generated video.

Why Use Self Forcing?

Self Forcing offers:

Keyframe-Based Generation: start and end reference images control appearance and motion
Prompt + Keyframe Control: text descriptions are blended with the structure of the reference images
Autoregressive Motion: smooth, temporally consistent transitions between frames
Identity Preservation: subject fidelity is maintained across generated sequences
Streamlined Video Creation: well suited to character-driven storytelling, cinematic animation, and concept video synthesis

Whether you're generating animations, cinematic sequences, or identity-consistent AI videos, Self Forcing gives you control over both appearance and motion while keeping that motion smooth and realistic.

Input Images

In this section, you upload your Start Keyframe and End Keyframe images. These two images define how the generated video begins and ends.

Upload both reference images using the provided Load Image nodes.
Use the optional Resizing and Cropping nodes to bring both images to the same aspect ratio and framing.
Well-aligned, consistently cropped keyframes improve motion consistency throughout the generated sequence.
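
To illustrate what matching the aspect ratio of the two keyframes involves, here is a minimal sketch that computes a centered crop box for each image. The function name `center_crop_box` and the 832x480 target size are assumptions chosen for illustration; they are not taken from the workflow, whose Resizing and Cropping nodes do this for you.

```python
def center_crop_box(width, height, target_ratio):
    """Return (left, top, right, bottom) for the largest centered crop
    whose width / height equals target_ratio (up to rounding)."""
    if width / height > target_ratio:
        # Image is too wide: keep the full height and trim the sides.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already matches): keep the full width, trim top and bottom.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


# Hypothetical target: an 832x480 output (assumed, not specified by the workflow).
target = 832 / 480
start_box = center_crop_box(1920, 1080, target)  # a 16:9 start keyframe
end_box = center_crop_box(1024, 1024, target)    # a square end keyframe
print(start_box, end_box)
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so both keyframes can be cropped to the same ratio before being resized to the same resolution.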

Video Duration

Set the total number of frames to generate.

Higher frame counts allow more gradual, fluid transitions between the keyframes.
Lower frame counts produce quicker transitions.
Typical range: 16–48 frames, depending on the desired length and motion complexity.
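
As a rough guide, the frame count translates into clip length once a playback frame rate is fixed. The sketch below assumes 16 frames per second; that value is an assumption for illustration, so check the frame rate set in your video output node.

```python
def clip_seconds(num_frames, fps=16):
    """Clip duration in seconds for a given frame count.

    fps=16 is an assumed default, not a value stated by the workflow.
    """
    return num_frames / fps


for frames in (16, 32, 48):
    print(f"{frames} frames at 16 fps: {clip_seconds(frames):.1f} s")
```

Under that assumption, the typical 16–48 frame range corresponds to clips of roughly one to three seconds.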

Model

This group loads the Self Forcing autoregressive video diffusion model. The workflow selects the correct model version automatically.

The model generates video by autoregressive rollout with KV caching.
This produces stable, temporally coherent motion.
It allows real-time inference on high-end GPUs such as the RTX 4090.
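
The idea behind autoregressive rollout with KV caching can be sketched with a toy example. This is not the Self Forcing implementation: `ToyModel`, `encode`, and `predict` are hypothetical stand-ins that operate on plain numbers, whereas the real model caches attention keys and values for latent video frames. The sketch only shows why caching helps: each frame's context representation is computed once and reused, instead of being recomputed at every step.

```python
class ToyModel:
    """Hypothetical stand-in for a video model; works on floats, not frames."""

    def __init__(self):
        self.encode_calls = 0

    def encode(self, frame):
        # Stand-in for computing a frame's attention keys/values.
        self.encode_calls += 1
        return frame * 0.5

    def predict(self, encoded_context):
        # Stand-in for generating the next frame from the encoded context.
        return sum(encoded_context) + 1.0


def rollout_cached(model, first_frame, num_frames):
    """Generate frames one by one, encoding each frame exactly once."""
    frames = [first_frame]
    kv_cache = [model.encode(first_frame)]
    while len(frames) < num_frames:
        new_frame = model.predict(kv_cache)
        frames.append(new_frame)
        kv_cache.append(model.encode(new_frame))  # reused on later steps
    return frames


def rollout_uncached(model, first_frame, num_frames):
    """Same rollout, but re-encodes the whole history at every step."""
    frames = [first_frame]
    while len(frames) < num_frames:
        context = [model.encode(f) for f in frames]
        frames.append(model.predict(context))
    return frames


cached_model, uncached_model = ToyModel(), ToyModel()
cached = rollout_cached(cached_model, 1.0, 5)
uncached = rollout_uncached(uncached_model, 1.0, 5)
print(cached == uncached, cached_model.encode_calls, uncached_model.encode_calls)
```

Both rollouts produce the same frames, but the cached version performs a number of encode calls that grows linearly with the frame count, while the uncached version grows quadratically. That saving is what makes streaming, frame-by-frame generation practical.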

Prompts

In this section, you enter the Text Prompt that guides generation.

Combine the prompt with your keyframes to influence style, background, or motion context.
Use clear, descriptive language to get the most control over the result.
Negative prompts can be used to suppress unwanted elements.

Outputs

Once generation is complete:

Your video is saved automatically to the ComfyUI > output folder inside your ComfyUI directory.
Files are stored as MP4 video clips or as image sequences, depending on configuration.
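
If you want to pick up the most recent result from a script, a minimal sketch along these lines may help. The helper name `newest_output`, the `ComfyUI/output` path, and the list of file extensions are assumptions; adjust them to your installation and output configuration.

```python
from pathlib import Path


def newest_output(output_dir, suffixes=(".mp4", ".png")):
    """Return the most recently modified file with one of the given
    suffixes under output_dir (including subfolders), or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    files = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    ]
    # Compare by modification time; default=None covers an empty folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `newest_output("ComfyUI/output")` would return the latest clip or frame if your ComfyUI directory sits in the current working directory, which is an assumption about your setup.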

Acknowledgement

This workflow uses the Self Forcing model developed by guandeh17.
It integrates the Wan Video Wrapper nodes by kijai to run Self Forcing video generation inside ComfyUI.
Full credit goes to both authors for the original model development and the integration work.

GitHub Repository: https://github.com/guandeh17/Self-Forcing
