Seamlessly craft, edit, and fuse images for storytelling, branding, and beyond
AIDC-AI (Alibaba)
Text-to-image generation
Advanced multimodal architecture optimized for high-level semantic understanding and visual generation
Supports multiple aspect ratios including Square HD, Portrait, and Landscape formats
Ovis Image is AIDC-AI's latest text-to-image model. It relies on deep visual-language alignment so that the generated output follows the user's intent closely. Unlike older diffusion models that may struggle with long prompts, Ovis Image maintains coherence across detailed scenarios.
On RunComfy, Ovis Image is hosted as a managed, scalable service exposed in three complementary ways:
Prompt, adjust parameters like guidance scale and steps, and run text-to-image jobs directly in your browser.
Ideal for testing prompt fidelity and exploring the capabilities of Ovis Image before integration.
From the playground view, you can use the model as an API and call it from your own apps or services.
This provides a private, production-ready endpoint matching the configuration you tested.
In all cases, inference runs on RunComfy’s cloud GPUs, so no local hardware, drivers, or downloads are needed.
Ovis Image on RunComfy exposes a streamlined set of parameters designed for ease of use and consistent results.
When configuring the model, the most important parameter is prompt, a string. Ovis Image is designed to handle long, descriptive prompts (for example specific camera angles, lighting, or outfit details). The negative_prompt string does the opposite: it tells Ovis Image what to exclude, such as "blur," "low quality," or "distortion."
For processing control, num_inference_steps sets the number of denoising steps. The default is 28, and Ovis Image typically operates within a range of 20 to 50 steps; higher values increase detail but require more processing time. The guidance_scale (a float defaulting to 5) controls how strictly Ovis Image follows the text prompt. You can adjust it between 3.0 and 10.0: higher values make the output adhere more closely to the text, while lower values allow for more creative interpretation.
To control the visual dimensions, Ovis Image uses the image_size parameter. This allows you to select from various enum options including square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, and landscape_16_9. By default, Ovis Image uses landscape_4_3.
For reproducibility, Ovis Image accepts a seed integer (from 0 to MAX). Setting a specific seed lets you reproduce the same image in future runs, provided the other parameters are unchanged. Finally, the output_format parameter sets the file type. Ovis Image defaults to png, but also supports jpeg and webp.
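The parameters described above can be gathered into one JSON-style input object. The following is a minimal sketch, not RunComfy's own client code: the `OvisImageInput` class and its two methods are hypothetical helpers introduced for illustration, and leaving unset optional fields out of the payload is an assumption. Only the field names, defaults, enum values, and typical ranges come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Enum values and defaults are taken from the parameter description above.
IMAGE_SIZES = (
    "square_hd", "square",
    "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
)
OUTPUT_FORMATS = ("png", "jpeg", "webp")


@dataclass
class OvisImageInput:  # hypothetical helper, not part of any RunComfy SDK
    prompt: str
    negative_prompt: str = ""
    num_inference_steps: int = 28
    guidance_scale: float = 5.0
    image_size: str = "landscape_4_3"
    output_format: str = "png"
    seed: Optional[int] = None

    def to_payload(self) -> dict:
        """Validate the enum fields and return a JSON-serializable dict."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.image_size not in IMAGE_SIZES:
            raise ValueError(f"unknown image_size: {self.image_size!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output_format: {self.output_format!r}")
        payload = {
            "prompt": self.prompt,
            "num_inference_steps": self.num_inference_steps,
            "guidance_scale": self.guidance_scale,
            "image_size": self.image_size,
            "output_format": self.output_format,
        }
        # Assumption: optional fields are simply left out when unset.
        if self.negative_prompt:
            payload["negative_prompt"] = self.negative_prompt
        if self.seed is not None:
            payload["seed"] = self.seed
        return payload

    def outside_typical_range(self) -> list:
        """Return names of parameters outside the ranges described as typical."""
        flagged = []
        if not 20 <= self.num_inference_steps <= 50:
            flagged.append("num_inference_steps")
        if not 3.0 <= self.guidance_scale <= 10.0:
            flagged.append("guidance_scale")
        return flagged


example = OvisImageInput(
    prompt="iPhone photograph of a hiker on a ridge, natural lighting",
    negative_prompt="blur, low quality, distortion",
    seed=42,
).to_payload()
```

The 20 to 50 step range and the 3.0 to 10.0 guidance range are reported rather than enforced here, since the text describes them as typical operating ranges and not as hard limits.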
Use 28–35 steps (num_inference_steps) and a guidance_scale of 5–6. Include camera details (e.g., "iPhone photograph," "natural lighting") and specific textures in your prompt to get the best results from Ovis Image.
If your prompt involves multiple subjects or specific spatial arrangements, increase guidance_scale to 7.0 so that Ovis Image adheres more strictly to the text description.
For rapid iteration with Ovis Image, reduce num_inference_steps to 20 and use jpeg output format to minimize latency and file size.
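These three recommendations can be kept as named overrides on top of the documented defaults. The sketch below is illustrative only: the preset names (`photoreal`, `multi_subject`, `draft`) and the `settings_for` helper are hypothetical, and the photoreal values of 30 steps and 5.5 guidance are simply one pick from inside the recommended 28–35 and 5–6 ranges.

```python
# Documented defaults for the three parameters the recommendations touch.
DEFAULTS = {
    "num_inference_steps": 28,
    "guidance_scale": 5.0,
    "output_format": "png",
}

# Hypothetical preset names; each entry lists only the values it overrides.
PRESETS = {
    # One pick from inside the recommended 28-35 step and 5-6 guidance ranges.
    "photoreal": {"num_inference_steps": 30, "guidance_scale": 5.5},
    # Multiple subjects or specific spatial arrangements: raise guidance to 7.0.
    "multi_subject": {"guidance_scale": 7.0},
    # Rapid iteration: fewer steps and jpeg output.
    "draft": {"num_inference_steps": 20, "output_format": "jpeg"},
}


def settings_for(goal: str) -> dict:
    """Return the defaults with the chosen preset's overrides applied."""
    if goal not in PRESETS:
        raise KeyError(f"unknown preset: {goal!r}")
    return {**DEFAULTS, **PRESETS[goal]}


draft_settings = settings_for("draft")
```

Merging overrides into a copy of the defaults keeps every preset explicit about what it changes and leaves the remaining parameters at their documented values.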
Ovis Image targets high-fidelity outputs suitable for commercial and creative use.
Use image_size presets (e.g., landscape_4_3) rather than custom pixel dimensions to ensure Ovis Image stays within its training distribution.
Use negative_prompt to scrub generic digital artifacts.
Ovis Image excels at scenarios requiring high semantic understanding:
Generate "gorpcore" or streetwear imagery with specific clothing textures and outdoor backgrounds using Ovis Image.
Create unique assets for social media campaigns that require specific brand colors or moods described in text.
Rapidly visualize scripts or concepts where specific actions and interactions are described in the prompt.
https://huggingface.co/AIDC-AI
https://github.com/AIDC-AI/Ovis
Ovis Image models generally follow the licensing terms provided by AIDC-AI. Users should verify the specific model license on the official Hugging Face repository before engaging in large-scale commercial applications.
RunComfy facilitates the infrastructure to run these models but does not supersede the original Ovis Image licensing terms.
Yes. RunComfy integrates the official Ovis Image model architecture from AIDC-AI. We provide a managed environment that allows you to run Ovis Image without needing to configure local GPU hardware or handle complex environment dependencies.
Commercial usage depends on the specific license terms set by AIDC-AI for the Ovis Image model. While RunComfy provides the infrastructure to run the model, we do not grant commercial rights to the model weights themselves. Please consult the official AIDC-AI repository to verify if your intended commercial use of Ovis Image is permitted.
Ovis Image is optimized for rapid inference on RunComfy’s cloud GPUs. Typically, generating a standard resolution image (e.g., landscape_4_3) takes only a few seconds. However, increasing the num_inference_steps beyond the default 28 or maximizing the guidance_scale may slightly increase the generation time for Ovis Image.
Ovis Image is tuned for specific aspect ratios to ensure maximum visual coherence. On RunComfy, we support optimized presets including square_hd, portrait_16_9, and landscape_4_3. Adhering to these presets ensures Ovis Image delivers the best possible composition and texture details without the artifacts often seen in arbitrary resolutions.
Ovis Image is specifically designed for high semantic understanding. Unlike some older models that ignore parts of long descriptions, Ovis Image excels at adhering to detailed prompts that describe camera angles, lighting conditions, and specific subject attributes, making it ideal for professional creators requiring precision.
Moving Ovis Image to production is straightforward. Once you have settled on your parameters (such as prompt, seed, and guidance_scale) in the playground, you can call Ovis Image programmatically through the RunComfy API. The API accepts the same JSON inputs used in the UI, so a configuration tested in the playground carries over unchanged.
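As a rough illustration of reusing playground inputs in code, the sketch below builds an HTTP POST request carrying the same JSON. The endpoint URL, the token placeholder, and the bearer-token header are assumptions made for the example, not documented RunComfy values; take the real endpoint and authentication scheme from the API view of the playground. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the endpoint and credentials
# shown for your own account.
API_URL = "https://example.invalid/ovis-image"
API_TOKEN = "YOUR_API_TOKEN"


def build_request(inputs: dict, url: str = API_URL,
                  token: str = API_TOKEN) -> urllib.request.Request:
    """Wrap playground-style JSON inputs in a POST request object."""
    body = json.dumps(inputs).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token authentication.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# The same inputs that were tuned in the playground.
playground_inputs = {
    "prompt": "streetwear portrait on a mountain trail, natural lighting",
    "seed": 42,
    "guidance_scale": 5.0,
}
request = build_request(playground_inputs)

# Once the URL and token are real, the request could be sent with:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes it easy to log or inspect the exact JSON body before it goes out.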