Happy Horse 1.1 is Alibaba's natively multimodal video model. The full model family covers four capabilities — text-to-video, image-to-video, reference-to-video, and video editing — all built on one architecture that generates picture and sound together in a single pass, with synchronized dialogue, ambient noise, and Foley locked to the action instead of added afterward.
The model is tuned for film-grade results: believable, physically grounded motion, rich light and shadow, and cinematic camera work such as push-ins, pull-outs, and rack-focus shifts. It holds character identity and scene continuity across multi-shot sequences and renders a wide range of looks, from ink-wash painting to paper-craft and clay stop-motion.
This release refines the earlier 1.0 generation. Action that used to feel sluggish now carries more pace and weight, and the model handles a wider range of subjects — including Asian faces — with steadier likeness.
On RunComfy, this Happy Horse 1.1 page currently runs the text-to-video mode: you write a prompt and get a short clip with built-in sound. The other modes (image-to-video, reference-to-video, and editing) are part of the same model family.
Output format:
| Property | Value |
|---|---|
| Resolution | 720P or 1080P |
| Frame rate | 24 fps |
| Duration | 3-15 s |
| Aspect ratio | 16:9, 9:16, 1:1 |
| Audio | Included |
Input parameters:
| Parameter | Required | Type | Default | Range / Options | Description |
|---|---|---|---|---|---|
| prompt | Yes | string | — | — | Text description of the scene, action, and camera movement. |
| resolution | No | string | 1080P | 720P, 1080P | Output resolution tier. |
| ratio | No | string | 16:9 | 16:9, 9:16, 1:1 | Aspect ratio of the video. |
| duration | No | integer | 5 | 3-15 | Clip length in seconds. |
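A minimal sketch of assembling and checking these inputs before submitting a job. The field names, defaults, and allowed values come from the parameter table above; the helper name `build_request` and the plain dict payload are assumptions for illustration, and the exact request schema RunComfy expects may differ.

```python
from typing import Any

# Allowed values taken from the parameter table; the dict-shaped payload is an assumption.
RESOLUTIONS = {"720P", "1080P"}
RATIOS = {"16:9", "9:16", "1:1"}
MIN_DURATION, MAX_DURATION = 3, 15


def build_request(
    prompt: str,
    resolution: str = "1080P",
    ratio: str = "16:9",
    duration: int = 5,
) -> dict[str, Any]:
    """Validate inputs against the documented ranges and return a payload dict (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt is required and must not be empty")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if ratio not in RATIOS:
        raise ValueError(f"ratio must be one of {sorted(RATIOS)}")
    if not isinstance(duration, int) or not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be an integer from {MIN_DURATION} to {MAX_DURATION}")
    return {
        "prompt": prompt,
        "resolution": resolution,
        "ratio": ratio,
        "duration": duration,
    }


payload = build_request("A chestnut horse gallops along a misty beach at dawn", duration=8)
print(payload)
```

Validating locally like this catches out-of-range values (for example a 20-second duration) before any generation time is spent.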
Pricing is time-based and depends on resolution:
| Resolution | Rate |
|---|---|
| 720P | $0.13 per second |
| 1080P | $0.16 per second |
Estimated cost examples
| Duration | 720P | 1080P |
|---|---|---|
| 5 s (default) | ~$0.65 | ~$0.80 |
| 10 s | ~$1.30 | ~$1.60 |
| 15 s | ~$1.95 | ~$2.40 |
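The estimates above are simply the per-second rate multiplied by the clip length. The sketch below reproduces them, assuming billing is exactly rate × whole seconds with no minimum charge or extra rounding; `estimate_cost_usd` is a hypothetical helper, and the RunComfy page remains the source of truth for current prices.

```python
# Per-second rates from the pricing table, stored in integer cents to avoid float drift.
RATE_CENTS_PER_SECOND = {"720P": 13, "1080P": 16}


def estimate_cost_usd(duration_s: int, resolution: str = "1080P") -> float:
    """Return the estimated charge in USD for one clip: rate per second times seconds."""
    if resolution not in RATE_CENTS_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    cents = RATE_CENTS_PER_SECOND[resolution] * duration_s
    return cents / 100


# Rebuild the example table row by row.
for seconds in (5, 10, 15):
    low = estimate_cost_usd(seconds, "720P")
    high = estimate_cost_usd(seconds, "1080P")
    print(f"{seconds:>2} s  720P ${low:.2f}  1080P ${high:.2f}")
```

Keeping rates in integer cents means the multiplication is exact, and the division to dollars happens only once at the end.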
Happy Horse 1.1 is a text-to-video model that turns a written prompt into a short clip with natural, physically grounded motion and built-in audio. It suits short social spots, scene prototyping, and product or ad motion where you want sound and movement together from a single description.
Happy Horse 1.1 refines known pain points from the 1.0 generation, with livelier motion pacing instead of sluggish action and steadier handling of diverse subjects, including improved Asian-face fidelity. Based on publicly available information, these changes make Happy Horse 1.1 a more dependable choice for character-driven clips.
Happy Horse 1.1 generates synchronized audio in the same pass as the video, so each clip arrives with sound rather than as a silent draft. You can describe sound cues in your prompt to guide the ambience and effects.
Happy Horse 1.1 supports 720P and 1080P output at 24 fps, with clip lengths from 3 to 15 seconds. You can also set the aspect ratio to 16:9, 9:16, or 1:1 to match landscape, vertical, or square placements.
Lead with the main subject and one clear action, then add setting, lighting, and any camera movement. Because Happy Horse 1.1 renders motion and sound together, naming the movement and the audio you want gives more predictable results.
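One way to keep that ordering consistent across many generations is a small template helper. This is a hypothetical sketch, not a RunComfy or model feature: `build_prompt` and its field names are illustrative, and the model simply receives the resulting string as its `prompt` parameter.

```python
def build_prompt(
    subject: str,
    action: str,
    setting: str = "",
    lighting: str = "",
    camera: str = "",
    audio: str = "",
) -> str:
    """Join prompt parts in the suggested order: subject and action first, then context."""
    parts = [f"{subject} {action}".strip().rstrip(".")]
    # Optional details follow the lead clause; empty fields are skipped.
    for detail in (setting, lighting, camera, audio):
        detail = detail.strip().rstrip(".")
        if detail:
            parts.append(detail[0].upper() + detail[1:])
    return ". ".join(parts) + "."


prompt = build_prompt(
    subject="A chestnut horse",
    action="gallops through shallow surf",
    setting="on an empty beach at dawn",
    lighting="soft golden backlight with long shadows",
    camera="slow push-in from a low angle",
    audio="sound of hooves splashing and distant gulls",
)
print(prompt)
```

Naming the camera move and the audio as separate fields makes it harder to forget either one, which matters here because the model renders motion and sound together.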
The model takes a text prompt plus resolution, aspect ratio, and duration controls, with duration limited to 3-15 seconds. Check the current RunComfy parameter panel for the exact limits, since some options may vary by provider settings.
You can prototype Happy Horse 1.1 in the RunComfy model UI and then call the same model through the RunComfy API with identical parameters to automate generation. You don't need to host or scale the model yourself.
Generations with Happy Horse 1.1 are billed per second of video and paid in USD or credits: $0.13 per second at 720P and $0.16 per second at 1080P. For example, a 5-second 1080P clip costs about $0.80; see the Generation section on the page for current details.
RunComfy is the premier ComfyUI platform, offering an online ComfyUI environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Models, enabling artists to harness the latest AI tools to create incredible art.





