Generate cinematic clips faster with multimodal references, lip-sync, and camera control


This template on RunComfy uses Alibaba's async video-synthesis API with the happyhorse-1.0-r2v model. You upload 1 to 9 reference images, refer to each one in the prompt as character1, character2, character3 … in the order they appear, and the model fuses those subjects into a single coherent video while preserving identity, color, materials, and composition.
Instead of choosing between text-to-video freedom and image-to-video fidelity, the model lets you bring a cast (a character, an outfit, a prop, an accessory) into one prompt and direct them with natural language. Powered by a 15B-parameter unified Transformer with DMD-2 distillation, the model delivers 1080P output at competitive speed without sacrificing facial fidelity, garment detail, or scene continuity.
Output format: video / resolution tier: 720P or 1080P / duration: 3–15 seconds / aspect ratio: 16:9, 9:16, 1:1, 4:3, 3:4 / reference images: 1–9 per generation
| Parameter | Required | Type | Default | Range / Options | Description |
|---|---|---|---|---|---|
| image_url_1 | Yes | string | — | JPEG, JPG, PNG, WEBP | First reference image, tagged as character1 in the prompt. |
| image_url_2 … image_url_9 | No | string | — | JPEG, JPG, PNG, WEBP | Optional additional reference images, tagged as character2 … character9. |
| prompt | Yes | string | — | max 2500 Chinese / 5000 non-Chinese chars | Scene, motion, camera, lighting; use character1/character2/… to reference each image. |
| aspect_ratio | No | string | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 | Output aspect ratio. |
| resolution | No | string | 1080P | 720P, 1080P | Output video resolution tier. |
| duration | No | integer | 5 | 3–15 | Output video duration in seconds. |
| seed | No | integer | 0 | 0 to 2147483647 | Optional random seed. Use 0 to let the provider choose one automatically. |
| watermark | No | boolean | false | true, false | Whether to include the provider watermark on the generated video. |
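For reference, here is a minimal submit-and-poll sketch in Python. The endpoint path, auth header, and response fields (`task_id`, `status`, `video_url`) are assumptions, since the async API's exact routes are not documented on this page; the payload keys mirror the parameter table above.

```python
import time
import requests

# Placeholder endpoint and key: substitute whatever the provider documents
# for the happyhorse-1.0-r2v async task API. These values are assumptions.
SUBMIT_URL = "https://api.example.com/v1/video/happyhorse-1.0-r2v/tasks"
API_KEY = "YOUR_API_KEY"

payload = {
    "prompt": (
        "character1 wearing character2, holding character3, "
        "walking through a sunlit corridor, slow dolly in"
    ),
    "image_url_1": "https://cdn.example.com/refs/person.png",  # character1
    "image_url_2": "https://cdn.example.com/refs/outfit.png",  # character2
    "image_url_3": "https://cdn.example.com/refs/prop.png",    # character3
    "aspect_ratio": "16:9",  # 16:9, 9:16, 1:1, 4:3, 3:4
    "resolution": "1080P",   # 720P or 1080P
    "duration": 5,           # 3-15 seconds
    "seed": 0,               # 0 = let the provider pick a seed
    "watermark": False,
}
headers = {"Authorization": f"Bearer {API_KEY}"}

# Submit the async generation task.
task = requests.post(SUBMIT_URL, json=payload, headers=headers, timeout=30)
task.raise_for_status()
task_id = task.json()["task_id"]  # assumed response field

# Poll until the clip is ready (status values are assumptions).
while True:
    status = requests.get(f"{SUBMIT_URL}/{task_id}", headers=headers, timeout=30).json()
    if status["status"] in ("succeeded", "failed"):
        break
    time.sleep(5)

print(status.get("video_url"))
```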
HappyHorse 1.0 Reference to Video is the multi-image subject-to-video mode of HappyHorse 1.0 — the #1 Arena-ranked video model (Elo 1392). It accepts 1 to 9 reference images plus a text prompt that tags each subject as character1, character2, character3 …, then fuses them into a single coherent 720P/1080P clip with stable identity, outfit, and prop fidelity.
Text-to-video starts from words only; image-to-video animates one source frame; reference-to-video brings multiple subjects (a person, a costume, an accessory, a prop) into the same generation and lets you direct them with one prompt. It combines the freedom of text prompting with the identity-locking strength of reference images.
The reference order is fixed by upload position. Image 1 is character1, image 2 is character2, image 3 is character3, and so on up to character9. In your prompt you write something like “character1 wearing character2, holding character3, walking through a sunlit corridor” — the model binds each tag to the matching reference image.
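Since binding is purely positional, it can help to sanity-check that every characterN tag in the prompt has a matching image before submitting. A small illustrative helper (not part of the API) might look like this:

```python
import re

def check_character_tags(prompt: str, image_urls: list[str]) -> None:
    """Ensure every characterN tag in the prompt has a reference image.

    Binding is positional: image_urls[0] is character1, image_urls[1] is
    character2, and so on, up to character9.
    """
    tags = {int(n) for n in re.findall(r"character(\d)", prompt)}
    for n in sorted(tags):
        if n < 1 or n > len(image_urls):
            raise ValueError(
                f"prompt references character{n} but only "
                f"{len(image_urls)} reference image(s) were given"
            )

check_character_tags(
    "character1 wearing character2, holding character3, walking through a sunlit corridor",
    ["person.png", "outfit.png", "prop.png"],  # character1..character3 in upload order
)
```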
The model outputs native 720P or 1080P clips with selectable durations from 3 to 15 seconds, across 16:9, 9:16, 1:1, 4:3, and 3:4 aspect ratios. Output quality is suitable for ad delivery and social publishing without re-grading.
Each reference image must be JPEG, JPG, PNG, or WEBP, with a short side of at least 400 pixels (720P or higher recommended) and a file size under 10MB, served from a public HTTP/HTTPS URL. Avoid blurry, heavily compressed, or watermarked sources — sharp, well-lit references give the model the best chance to lock identity.
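Those constraints are easy to verify up front. The sketch below assumes the requests and Pillow libraries are available; the 400-pixel short side and 10MB ceiling come from the requirements above.

```python
import io
import requests
from PIL import Image

ALLOWED_FORMATS = {"JPEG", "PNG", "WEBP"}  # Pillow reports JPG files as JPEG
MAX_BYTES = 10 * 1024 * 1024               # file size under 10MB
MIN_SHORT_SIDE = 400                       # short side of at least 400 pixels

def validate_reference(url: str) -> None:
    """Fetch a reference image and check URL scheme, size, format, and resolution."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("reference must be served from a public HTTP/HTTPS URL")
    data = requests.get(url, timeout=30).content
    if len(data) > MAX_BYTES:
        raise ValueError(f"{len(data)} bytes exceeds the 10MB limit")
    img = Image.open(io.BytesIO(data))
    if img.format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format {img.format}")
    if min(img.size) < MIN_SHORT_SIDE:
        raise ValueError(f"short side {min(img.size)}px is below {MIN_SHORT_SIDE}px")
```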
Anchor each character tag in one sentence, then describe motion and camera language: drift, dolly in, orbit, tilt up, push, reveal. State what must stay locked (face, outfit, packaging), add lighting evolution for a cinematic feel, and keep each clip to one clear visual beat. Reuse the same seed when comparing prompt or reference variants.
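As a worked example of that advice, the snippet below holds the seed fixed and varies only the camera clause, so any difference between the two clips comes from the prompt rather than the sampler (illustrative values throughout):

```python
# Compare two camera treatments against a fixed seed.
BASE = (
    "character1 wearing character2, holding character3, "
    "walking through a sunlit corridor. "
    "Face, outfit, and packaging stay locked; warm light shifts to cool dusk. "
)
SEED = 42  # reuse the same seed when comparing prompt or reference variants

for camera in ("slow dolly in, low angle", "orbit right, tilt up to reveal"):
    print({"prompt": BASE + camera, "seed": SEED})
```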
The model is ideal for multi-character storytelling, virtual try-on with prop swaps, character + outfit + accessory videos, brand asset assembly, packaging-to-presentation transitions, and cinematic ad teasers where you already have a cast of reference assets and need them moving together with stable identity.