Wan2.2 Fun Inp: First-to-Last Frame Video Generation in ComfyUI
Wan2.2 Fun Inp turns two still images into a coherent video by guiding the model from a first frame to a last frame with natural interpolation in between. It is designed for artists, animators, and filmmakers who want cinematic consistency while retaining prompt control. The workflow ships with two parallel presets so you can prioritize either ultra-fast 4-step synthesis or more general fp8-scaled generation, both powered by Wan 2.2 Fun Inpaint.
Key models in the ComfyUI Wan2.2 Fun Inp workflow
- Wan 2.2 Fun Inpaint 14B (fp8 scaled): the main diffusion backbone specialized for “Fun Inpaint” video generation. Two variants are included: high noise for larger motion and creative transitions, and low noise when you need tighter fidelity to your start/end frames.
  - High noise: wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors
  - Low noise: wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors
- Lightning 4-Step LoRA for I2V: an optional LoRA that compresses the sampling schedule to just four steps for rapid iteration, ideal for previews and quick drafts.
  - Low noise LoRA: wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
  - High noise LoRA: wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
- Wan VAE: handles latent–pixel conversions used by Wan models; it preserves detail and tone during decode/encode. See the Wan 2.2 package on Hugging Face.
- CLIP text encoder: encodes your positive and negative prompts into conditioning vectors that steer the visual narrative. Reference implementation: openai/CLIP.
- ComfyUI Video Helper Suite (export): combines generated frames into an MP4 at your chosen frame rate. Repo: ComfyUI-VideoHelperSuite.
How to use the ComfyUI Wan2.2 Fun Inp workflow
Step 1 — Choose a branch
The graph contains two parallel groups you can toggle depending on speed versus generality. Enable only one at a time for clean runs.
Group: Wan2.2_fun_Inp fp8_scaled + 4 steps LoRA
Use this for very fast previews. The group loads the Wan 2.2 backbone plus a Lightning 4-Step LoRA and routes your prompts through the short sampler path. Provide your start and end images, then adjust the high-level parameters as needed. Internally, WanFunInpaintToVideo (#111) seeds the trajectory from first to last frame, while a short sampler refines motion and structure in a handful of steps.
Group: Wan2.2_fun_Inp fp8_scaled
Choose this when you want a broader operating range without the 4-step constraint. This path uses the fp8-scaled Wan 2.2 model directly, maintaining the same first-to-last frame guidance but with a standard sampler budget for more nuanced detail recovery and motion shaping. The node WanFunInpaintToVideo (#148) anchors the trajectory and hands off to the downstream sampler for refinement.
Step 2 — Upload start and end images
Both groups include an Upload start and end images section. Plug in a start image that sets the opening composition and an end image that defines the final pose or scene. The workflow interpolates the motion and appearance between them while respecting your text prompts. For best results, keep the aspect ratio consistent across both images.
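If the two stills are framed differently, it can help to normalize them before loading. Below is a minimal Pillow sketch that center-crops and resizes both images to the default 576×1024 frame; the file names are hypothetical, and you can achieve the same result inside ComfyUI with its image scale/crop nodes.

```python
from PIL import Image

def crop_to_aspect(path, target_w=576, target_h=1024):
    """Center-crop an image to the target aspect ratio, then resize it."""
    img = Image.open(path).convert("RGB")
    src_w, src_h = img.size
    target_ratio = target_w / target_h

    if src_w / src_h > target_ratio:
        # Source is wider than the target: trim the sides.
        new_w = int(src_h * target_ratio)
        left = (src_w - new_w) // 2
        img = img.crop((left, 0, left + new_w, src_h))
    else:
        # Source is taller than the target: trim top and bottom.
        new_h = int(src_w / target_ratio)
        top = (src_h - new_h) // 2
        img = img.crop((0, top, src_w, top + new_h))

    return img.resize((target_w, target_h), Image.LANCZOS)

# Hypothetical file names; replace with your own inputs.
crop_to_aspect("start.png").save("start_576x1024.png")
crop_to_aspect("end.png").save("end_576x1024.png")
```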
Step 3 — Prompt
Write what you want to see in the Positive Prompt and what to avoid in the Negative Prompt. The nodes CLIP Text Encode (Positive Prompt) and CLIP Text Encode (Negative Prompt) transform your text into conditioning that steers content, style, and dynamics. Use concise, scene-oriented phrases (actions, camera cues, materials, mood) rather than long lists.
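For example, a positive prompt like “a woman in a red coat turns toward the camera, slow dolly-in, soft window light, gentle fabric motion” paired with a negative prompt like “blurry, distorted hands, flicker, text, watermark” gives the model clear scene intent. These strings are only illustrative, not the workflow’s saved defaults.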
Step 4 — Video size & length
Set width, height, and length in the WanFunInpaintToVideo node to define spatial resolution and frame count. Defaults are tuned for a tall 576×1024 video with about 3–4 seconds of motion at 24 fps. Longer sequences generally benefit from the fp8-scaled path; short previews are great with the 4-step LoRA group.
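Wan-family video nodes generally expect length to be of the form 4k + 1 (for example 49, 65, or 81 frames) because of the VAE’s temporal compression. If you prefer to think in seconds, a small helper like the sketch below converts a duration into a valid frame count; the function name is hypothetical, and the 4k + 1 constraint is an assumption carried over from other Wan workflows.

```python
def wan_length(seconds: float, fps: int = 24) -> int:
    """Return a frame count of the form 4k + 1 close to seconds * fps.

    Assumes the common Wan constraint that (length - 1) is divisible by 4.
    """
    frames = round(seconds * fps)
    k = max(1, round((frames - 1) / 4))
    return 4 * k + 1

print(wan_length(3.5))    # ~3.5 s at 24 fps -> 85 frames
print(wan_length(5, 16))  # 5 s at 16 fps    -> 81 frames
```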
Export to MP4
VHS_VideoCombine assembles frames into an MP4 with a default 24 fps and a quality-friendly CRF. The file names are prefixed for each branch (for example, Fun_Inp and Fun_Inp_4_Step) so you can compare outputs easily. Adjust the frame rate if you need slower or faster playback.
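If you ever need to re-encode outside ComfyUI, the export is roughly equivalent to running ffmpeg over a frame sequence. The sketch below assumes frames named frame_00001.png onward and a CRF of 19; it is a stand-in, not the exact pipeline VHS_VideoCombine uses.

```python
import subprocess

# Rough stand-in for the MP4 export: 24 fps, H.264, CRF 19, yuv420p for broad
# player compatibility. Frame naming and CRF here are assumptions.
subprocess.run([
    "ffmpeg",
    "-framerate", "24",
    "-i", "frame_%05d.png",
    "-c:v", "libx264",
    "-crf", "19",
    "-pix_fmt", "yuv420p",
    "Fun_Inp.mp4",
], check=True)
```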
Running only one branch
Box-select a group and use Ctrl+B to enable or disable it. If you enable the fp8_scaled group, disable the fp8_scaled + 4 steps LoRA group, and vice versa. You can also use ComfyUI’s partial execution features to run just the sections you are tweaking.
Key nodes in the ComfyUI Wan2.2 Fun Inp workflow
WanFunInpaintToVideo (#111 and #148)
The core engine that blends your start_image and end_image into a continuous latent trajectory. It accepts width, height, and length to set video size and duration, then emits a latent sequence plus updated positive/negative conditioning. Start here when tuning continuity, pacing, or composition across the shot.
UNETLoader (#101, #102)
Chooses the Wan 2.2 Fun Inpaint model variant. Use high noise for bolder motion and more transformative interpolations. Use low noise when preserving the start and end frame identity and texture is the priority. Run either variant with or without the 4-step LoRA, depending on your speed needs.
ModelSamplingSD3 (#93)
Configures the sampling shift that shapes the sigma schedule used downstream. Keep it aligned with the chosen LoRA or fp8 path. If you see temporal flicker, modest adjustments to the shift value (or to the downstream sampler’s steps) can smooth transitions without over-sharpening details.
KSamplerAdvanced (#150)
Applies a refinement pass to the latent sequence. Increase steps slightly if you need crisper micro-detail on faces, hands, or thin structures; reduce steps for softer, dreamier motion. Avoid extreme CFG or step counts that can destabilize temporal consistency.
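As a rough guide, Lightning-style 4-step LoRAs are usually sampled with CFG near 1.0, while the plain fp8-scaled path takes a normal step budget and moderate CFG. The values below are common community starting points, not necessarily the defaults saved in this workflow.

```python
# Illustrative starting points only; tune against your own footage.
SAMPLER_PRESETS = {
    "fp8_scaled + 4 steps LoRA": {"steps": 4,  "cfg": 1.0, "sampler": "euler", "scheduler": "simple"},
    "fp8_scaled":                {"steps": 20, "cfg": 4.0, "sampler": "euler", "scheduler": "simple"},
}
```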
VHS_VideoCombine (#159)
Merges rendered frames to MP4. Adjust frame_rate for motion feel and playback speed, and keep the default pix_fmt for broad player compatibility. Lower CRF yields larger files with finer gradients; higher CRF compresses more aggressively.
Optional extras
- Match the aspect ratio of your start and end images to the selected width×height to reduce unwanted cropping or warping.
- For character shots, keep clothing, lighting, and camera angle broadly consistent between the first and last frames to encourage stable identity.
- Start with a short Wan2.2 Fun Inp preview using the 4-step LoRA group, then switch to the fp8-scaled group for your final render.
- If the middle of the clip feels too static, try the high noise model; if transitions look chaotic, try low noise and simplify the prompt.
- Keep prompts focused on scene intent (action, atmosphere, camera moves) rather than long adjective chains; Wan2.2 Fun Inp responds best to clear direction.
Acknowledgements
The Wan 2.2 Fun Inp workflow expands the creative possibilities of AI video generation by bridging start-to-end frame control with natural interpolation. It’s a versatile tool for artists, animators, and filmmakers who want cinematic consistency in their AI-driven projects.
Special thanks to the ComfyUI and Wan teams for enabling seamless Fun Inp workflow integration into next-gen creative pipelines.
