This workflow brings ByteDance USO to ComfyUI for creators who want identity‑faithful characters and precise style transfer in one place. Built on FLUX.1‑dev, it supports subject‑driven, style‑driven, and combined generation so you can place a character into new scenes while keeping likeness, apply styles from reference images, or do both at once.
Use ByteDance USO when you need strong subject coherence with flexible, high‑quality style control. The graph includes two complementary branches: a subject+style path that conditions on an identity image, and a prompt‑driven path that can be used with or without style references. Both paths save images independently so you can compare results quickly.
The graph has two branches that can run independently. The upper branch uses an identity image plus style references; the lower branch is prompt‑driven and can optionally include style references. Generate from either branch or both.
This step initializes FLUX.1‑dev, the ByteDance USO LoRA, the USO projector, and the SigCLIP vision encoder. It prepares the base model for unified style and subject guidance. Both branches load the same set so you can run subject+style or prompt workflows without reconfiguring models. Once loaded, the model stream is ready for USO’s reference processors.
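If you prefer to prototype outside the graph, the sketch below shows a rough diffusers‑based stand‑in for this loading step, assuming FLUX.1‑dev plus a LoRA file; the USO projector and SigCLIP vision encoder are handled by the USO nodes in ComfyUI and have no direct counterpart here, and the LoRA path is a placeholder.

```python
# Minimal sketch (assumption): loading FLUX.1-dev plus a LoRA with diffusers,
# as a rough stand-in for the ComfyUI loader nodes. The USO projector and
# SigCLIP vision encoder are managed by the USO nodes and are not shown here.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",      # base model used by this workflow
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("path/to/uso_lora.safetensors")  # placeholder path
pipe.enable_model_cpu_offload()          # helps on limited-VRAM GPUs
```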
Provide a clean identity image of your character. The workflow scales it to a suitable working size and encodes it into a latent that preserves key facial or character features. This latent is fused with your prompt so ByteDance USO can place the subject into new scenes while keeping identity. Omit this step if you want style‑only or text‑only generation.
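For intuition about what “encodes it into a latent” means, here is a minimal stand‑in using a generic diffusers VAE; the workflow itself uses FLUX’s VAE through ComfyUI’s encode nodes, so the model ID and preprocessing below are placeholder assumptions.

```python
# Minimal sketch (assumption): VAE-encoding a reference image into a latent.
# The workflow uses FLUX's VAE through ComfyUI nodes; this stand-in only
# illustrates the image -> latent step.
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # placeholder VAE
to_tensor = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
])

img = Image.open("identity.png").convert("RGB")   # clean subject reference
x = to_tensor(img).unsqueeze(0) * 2.0 - 1.0       # scale pixels to [-1, 1]
with torch.no_grad():
    latent = vae.encode(x).latent_dist.sample() * vae.config.scaling_factor
print(latent.shape)  # [1, 4, 64, 64] for a 512x512 input with this VAE
```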
Add one or two style images to guide palette, materials, and brushwork. Each image is encoded with the vision model and applied through USO’s style reference nodes, which layer style influence onto the loaded model. Order matters when using two references: the second is applied on top of the first. You can bypass this group to run a pure subject‑driven or text‑only pass.
Write an intent‑driven prompt for composition, mood, and details. In the subject+style branch, your prompt is combined with the identity latent and USO’s guidance so text, subject, and style pull in the same direction. In the prompt‑driven branch, the text alone (optionally with style references) steers the image. Keep prompts specific; avoid contradicting the chosen style.
Pick the target resolution for generation. The chosen size influences composition tightness and detail density, especially for portraits vs full‑body shots. If VRAM is limited, start smaller and scale up later. Both branches expose a simple image‑size node so you can tailor aspect and fidelity to your use case.
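If you want to sanity‑check sizes before spending VRAM, a small hypothetical helper like the one below turns an aspect ratio and a megapixel budget into width and height values snapped to multiples of 64, a common latent‑friendly granularity; it is not a node in this workflow.

```python
# Minimal sketch (assumption): pick a width/height for a target aspect ratio
# and megapixel budget, snapped to multiples of 64. Hypothetical helper only.
import math

def pick_size(aspect_w: int, aspect_h: int, megapixels: float = 1.0, step: int = 64):
    target_px = megapixels * 1_000_000
    unit = math.sqrt(target_px / (aspect_w * aspect_h))
    width = max(step, round(aspect_w * unit / step) * step)
    height = max(step, round(aspect_h * unit / step) * step)
    return width, height

print(pick_size(3, 4))    # portrait, ~1 MP -> (896, 1152)
print(pick_size(16, 9))   # wide scene, ~1 MP -> (1344, 768)
```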
Each branch samples with a standard sampler, decodes to RGB, and saves to its own output. You will typically get two images per run: one styled subject result and one prompt‑driven result. Iterate by adjusting the prompt or swapping references; resample to explore alternatives or fix the seed for repeatability.
USOStyleReference (#56): Applies a style image to the current model stream using the USO projector and CLIP‑Vision features. Use one reference for a strong, coherent look or chain two for nuanced blends; the second reference refines the first. If the style dominates too much, try a single, cleaner reference or simplify its content.
ReferenceLatent (#44): Injects the encoded subject latent into the conditioning path so ByteDance USO preserves identity. Works best with uncluttered identity photos that clearly show the character’s face or defining features. If identity slips, feed a more complete reference or reduce conflicting style cues.
FluxKontextMultiReferenceLatentMethod (#41): Combines multiple reference signals within the FLUX context pathway. This is where subject and prompt context are balanced before sampling. If results feel over‑constrained, relax the references; if they drift, strengthen the subject imagery or simplify the prompt.
FluxGuidance (#35): Controls the strength of text guidance relative to reference signals. Lower values let subject and style lead; higher values enforce the prompt more strongly. Adjust when you see either prompt underfitting (raise guidance) or style/subject being overridden (lower guidance).
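For intuition, the classic classifier‑free guidance blend below illustrates the trade‑off this value expresses: larger scales pull the prediction harder toward the text conditioning. FLUX.1‑dev is guidance‑distilled, so FluxGuidance feeds the value into the model’s conditioning rather than computing this mix at each step; treat the snippet as an illustration only.

```python
# Illustration only: the classic classifier-free guidance blend. FLUX.1-dev is
# guidance-distilled, so FluxGuidance passes a guidance value into the model's
# conditioning instead of computing this mix explicitly during sampling.
import torch

def cfg_blend(noise_uncond: torch.Tensor, noise_text: torch.Tensor, scale: float) -> torch.Tensor:
    # scale = 1.0 ignores the prompt direction; higher values enforce it harder.
    return noise_uncond + scale * (noise_text - noise_uncond)

uncond = torch.randn(1, 4, 64, 64)
text = torch.randn(1, 4, 64, 64)
print(cfg_blend(uncond, text, scale=3.5).shape)  # torch.Size([1, 4, 64, 64])
```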
ImageScaleToMaxDimension (#109): Prepares the identity image for stable feature extraction. Smaller max sizes favor broader composition; larger sizes help when the reference is a tight portrait and you need crisper identity cues. Tune based on whether your subject reference is full‑body or a headshot.
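Conceptually this amounts to resizing so the longest side equals a chosen maximum while preserving aspect ratio. A minimal PIL sketch of that idea, with the caveat that the node’s exact resampling choices may differ:

```python
# Minimal sketch (assumption): scale an image so its longest side equals
# `max_dim`, preserving aspect ratio. The node's resampling filter and
# rounding may differ.
from PIL import Image

def scale_to_max_dimension(img: Image.Image, max_dim: int) -> Image.Image:
    scale = max_dim / max(img.width, img.height)
    new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
    return img.resize(new_size, Image.LANCZOS)

ref = Image.open("identity.png")
print(scale_to_max_dimension(ref, 512).size)  # longest side becomes 512
```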
EasyCache (#95): Speeds up inference by reusing intermediate states when changes are minor. Great for prompt tweaks and rapid iteration, but it can slightly reduce micro‑detail. Disable it for final, highest‑quality renders.
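The underlying pattern is ordinary memoization: if the inputs to an expensive stage have not changed, reuse the stored result. A toy sketch of that idea, not EasyCache’s actual implementation:

```python
# Toy sketch of the caching idea (not EasyCache's actual implementation):
# reuse an expensive stage's output whenever its inputs are unchanged.
from functools import lru_cache

@lru_cache(maxsize=32)
def expensive_stage(prompt: str, seed: int, steps: int) -> str:
    print("recomputing...")          # only runs on a cache miss
    return f"intermediate({prompt!r}, {seed}, {steps})"

expensive_stage("castle at dusk", 42, 20)   # computed
expensive_stage("castle at dusk", 42, 20)   # served from cache
expensive_stage("castle at dawn", 42, 20)   # prompt changed -> recomputed
```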
KSampler (#31): Runs the diffusion steps and controls stochasticity via seed and sampler choice. Increase steps for more detail, or lock the seed to reproduce a look while changing references. If textures look noisy, try a different sampler or fewer steps with stronger style guidance.
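Seed‑locked repeatability comes down to seeding the noise source: the same seed yields the same starting noise, so only your prompt or reference changes show up in the result. A minimal PyTorch sketch:

```python
# Minimal sketch: a fixed seed reproduces the same initial latent noise,
# which is what makes seed-locked runs repeatable while you swap references.
import torch

def initial_noise(seed: int, shape=(1, 4, 64, 64)) -> torch.Tensor:
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=gen)

a = initial_noise(31)
b = initial_noise(31)
print(torch.equal(a, b))  # True: same seed, same starting noise
```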
This workflow implements and builds upon the following works and resources. We gratefully acknowledge ByteDance for the USO model and the ComfyUI team for their ByteDance USO ComfyUI Native Workflow tutorial, as well as their ongoing contributions and maintenance. For authoritative details, please refer to the original documentation and repositories linked below.
Note: Use of the referenced models, datasets, and code is subject to the respective licenses and terms provided by their authors and maintainers.
RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.