Run ComfyUI as an API

Production-grade, zero-ops, auto-scaling


How to Run the ComfyUI API

1. Build/Test in ComfyUI Cloud

Create your own ComfyUI workflows in the cloud, export the workflow's API JSON, and choose which parameters you want to adjust at runtime.

Then, use Cloud Save to bundle your nodes, models, dependencies, and runtime into one reproducible container, ready to deploy as a production-grade ComfyUI API.
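
For reference, the exported API JSON maps each node ID to its class and inputs; the snippet below is a minimal sketch (the node IDs, classes, and values are illustrative, not from any particular workflow):

JSON
{
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": "futuristic cityscape", "clip": ["4", 1] }
  },
  "189": {
    "class_type": "LoadImage",
    "inputs": { "image": "example.png" }
  }
}

The node IDs you expose here ("6" and "189" in this sketch) are the ones you can override per request once the workflow is deployed.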


2. Deploy Workflows as an API

Pick a saved workflow, choose the hardware you need, and set simple autoscaling rules. Once deployed, your ComfyUI API gets a unique deployment_id that your apps can use to send requests.

Monitor performance, scale up or down as needed, and manage multiple API versions seamlessly.


3. Scale On-Demand

Your ComfyUI API automatically scales up when requests come in and scales down to zero when things are quiet, with no extra work needed.

After deployment, you can use the API endpoints to send requests, check progress, get results, or cancel jobs.

cURL
curl --request POST \
  --url https://api.runcomfy.net/prod/v1/deployments/{DEPLOYMENT_ID}/inference \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "overrides": {
      "6": { "inputs": { "text": "futuristic cityscape" } },
      "189": { "inputs": { "image": "https://example.com/new-image.jpg" } }
    }
  }'
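
The same request in Python, as a minimal sketch using the requests library (DEPLOYMENT_ID and YOUR_API_KEY are placeholders, matching the cURL example above):

Python
import requests

# Placeholders: fill in from your RunComfy dashboard.
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    f"https://api.runcomfy.net/prod/v1/deployments/{DEPLOYMENT_ID}/inference",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "overrides": {
            "6": {"inputs": {"text": "futuristic cityscape"}},
            "189": {"inputs": {"image": "https://example.com/new-image.jpg"}},
        }
    },
)
response.raise_for_status()
# The response should include an identifier you can use to check progress
# or fetch results via the other endpoints.
print(response.json())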

The Easiest Way to Use ComfyUI API

No-Hassle Deployment

Launch your ComfyUI API in one click from a Cloud Save. No Docker, no CUDA setup, no Kubernetes. Everything runs with the exact nodes, models, and libraries you saved, so results are always consistent.

High-Performance GPUs

Pick the GPU power you need, from 16GB (T4/A4000) to 80GB (A100/H100) and up to 141GB (H200), so you can run heavy models smoothly and reliably.

Scale On-Demand

Your API automatically scales up for traffic bursts and down to zero when idle. Control queue sizes and keep-warm settings to keep latency low and costs in check.

Workflow Versioning

Update with confidence. Manage workflow versions and use rolling updates to add features or roll back without interrupting running jobs.

Real-Time Monitoring

Stay on top of performance with a live dashboard. See request counts, queue times, cold starts, execution speed, and usage patterns to optimize your setup.

200+ Ready-to-Deploy Templates

Start fast with over 200 ready-made community workflows. Explore and customize them to fit your needs, save your version to the cloud, and deploy it as your own ComfyUI API in just minutes.

From Prototype to Production, RunComfy Makes ComfyUI API Easier Than Ever.

Frequently Asked Questions

What is RunComfy, and how does it differ from local ComfyUI for ComfyUI API?

RunComfy Serverless API turns your ComfyUI workflows into production-grade ComfyUI APIs with auto-scaling and no operations needed. This lets you focus on building generative AI without infrastructure worries. Unlike local ComfyUI setups that require hardware management, CUDA setup, and ongoing monitoring, RunComfy Serverless API handles deployment, scaling, and consistency in the cloud. Your ComfyUI API runs reliably on high-performance GPUs, making it easy to go from prototype to production. For more details, please read the RunComfy Serverless API documentation.

How can I deploy a ComfyUI workflow as a ComfyUI API service?

To deploy a ComfyUI workflow as a ComfyUI API service on RunComfy, start by building it in ComfyUI Cloud and saving it along with your nodes, models, and dependencies. Then, select GPU hardware, set autoscaling rules, and deploy with a few clicks. This creates a serverless ComfyUI API that scales automatically, processes requests asynchronously, and provides endpoints for inferences. You'll have a ready-to-use ComfyUI API without dealing with Docker, Kubernetes, or manual configurations; everything is reproducible and consistent.

How do I start deploying a ComfyUI workflow as a ComfyUI API?

To deploy your ComfyUI workflow as a ComfyUI API on RunComfy, start in ComfyUI Cloud where you can easily create or edit your workflow. Once it's ready, export it as a simple API JSON file and pick the parts you want to tweak during runs, like prompts or seeds—this keeps things flexible. From there, just click Cloud Save. RunComfy takes care of the rest by bundling your workflow, nodes, models, and full setup into a ready-to-use container, so you skip all the technical headaches. Finally, deploy it by selecting your preferred GPU and basic scaling options. You'll instantly get a unique deployment ID to connect your ComfyUI API to your apps or projects. The whole thing is designed to be quick and hassle-free, letting you focus on your creative ideas while getting a scalable ComfyUI API without any DevOps work. For more details, check RunComfy Serverless API - Quickstart documentation.

How do I export a ComfyUI workflow in ComfyUI API format?

For the latest version of ComfyUI, open the ComfyUI interface, locate the Workflow menu in the upper-left corner, and select "Export (API)" from the options. This will generate a JSON file that includes all your nodes, inputs, default values, and connections. For older versions, you need to enable dev mode in the settings (click the gear icon next to Queue Size or in the menu box, then check the "Enable Dev mode Options" box), which will make the "Save (API Format)" button appear in the menu.

What GPUs are available for ComfyUI API, and how do I choose the right one for my workflow?

RunComfy offers a range of high-performance GPUs for your ComfyUI API deployments, with VRAM from 16GB for basic workflows to 141GB for intensive models. To choose the right one for your ComfyUI API workflow, consider your model's size and memory needs: start with around 48GB (like X-Large or X-Large Plus) for most typical tasks to ensure smooth performance, then scale up or down based on testing. Monitor usage in the dashboard to optimize. For full details, visit the RunComfy Pricing page.

Can I use custom nodes, models, or dependencies in my deployed ComfyUI API?

Yes, you can easily include custom nodes, models, or dependencies in your deployed ComfyUI API. Simply add them when saving your workflow in ComfyUI Cloud, such as custom nodes, models, or specific libraries, and they'll be bundled into the container. RunComfy automatically recreates your exact environment for consistent, reliable results every time. No extra setup is required after deployment, so you can build advanced ComfyUI APIs that fit your specific needs.

Can I use RunComfy templates to deploy a ComfyUI API, and can I customize them?

Yes, RunComfy's 200+ templates let you deploy a ComfyUI API quickly, with ready-made workflows covering the latest models. Browse community workflows, fork one, tweak nodes or parameters, and save it as your own. Then deploy it as a customized ComfyUI API. All your changes stay private.

What are the API endpoints after deploying a ComfyUI API, and how do I use them?

After deploying your ComfyUI API, you have endpoints for key actions: POST to queue inferences, GET to check job status or progress, GET to retrieve results like images or videos, and POST to cancel jobs. Use your deployment_id in HTTP/REST requests, with API keys for security. This asynchronous design keeps your ComfyUI API efficient, so you can track jobs easily. For full details, visit the RunComfy Serverless API - Async Queue Endpoints documentation.
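
For illustration, an asynchronous client might look like the sketch below. The status and result paths, field names, and state values here are assumptions made for readability; the actual endpoint shapes are defined in the Async Queue Endpoints documentation.

Python
import time
import requests

# Placeholders: substitute your own values from the RunComfy dashboard.
BASE = "https://api.runcomfy.net/prod/v1/deployments/YOUR_DEPLOYMENT_ID"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Queue an inference (this endpoint appears in the cURL example above).
job = requests.post(f"{BASE}/inference", headers=HEADERS, json={"overrides": {}})
job.raise_for_status()
job_id = job.json()["request_id"]  # hypothetical field name; check the response schema

# 2. Poll for completion. This path and these state values are assumptions
#    for illustration; see the Async Queue Endpoints docs for the real ones.
while True:
    status = requests.get(f"{BASE}/requests/{job_id}/status", headers=HEADERS).json()
    if status.get("state") in ("completed", "failed"):
        break
    time.sleep(2)

# 3. Retrieve outputs (hypothetical path); results typically reference
#    generated images or videos by URL.
result = requests.get(f"{BASE}/requests/{job_id}/result", headers=HEADERS).json()
print(result)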

Can I integrate the ComfyUI API with my existing tech stack?

Yes, you can easily integrate the ComfyUI API with your existing tech stack. It uses simple HTTP/REST calls and JSON data, so it works with common tools like curl, Python, or JavaScript. Check the Quickstart for ready-to-use code snippets to get started fast.

How does auto-scaling work for ComfyUI API, and can I control it to manage costs?

Auto-scaling for your ComfyUI API increases instances during busy times and scales to zero when idle, keeping things efficient. You can set min/max instances, queue sizes, and keep-warm times to fine-tune latency and costs. You're only charged for active GPU time, with no fees for downtime. This flexible control helps you run a cost-effective ComfyUI API that matches your traffic patterns.
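
To make the knobs concrete, here is a hypothetical sketch of how these settings trade off latency against cost (the field names are invented for illustration, not RunComfy's actual configuration schema):

Python
# All field names below are hypothetical, for illustration only.
scaling_config = {
    "min_instances": 0,        # scale to zero when idle: no charges during quiet periods
    "max_instances": 5,        # cap concurrent GPUs to bound worst-case spend
    "max_queue_size": 50,      # limit queued requests to protect tail latency
    "keep_warm_seconds": 300,  # keep an instance alive after a job to avoid cold starts
}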

How can I monitor and optimize my ComfyUI API's performance?

You can monitor your ComfyUI API with a real-time dashboard that shows request counts, queue times, cold starts, execution speeds, and usage patterns. You can also review billing data in the dashboard to track and optimize costs based on GPU time. Use these insights to adjust your GPU choices and scaling rules. This helps you keep your ComfyUI API running smoothly, fix issues fast, and manage expenses effectively.

What happens if I need to update my ComfyUI workflow without downtime?

To update your ComfyUI workflow without downtime, save changes as a new version under the same name; this bundles the updates into a fresh container while keeping your live ComfyUI API running on the current version. When ready, edit the deployment to switch to the new version, which rolls out gradually: existing jobs complete on the old one, and new requests use the update. Roll back anytime by selecting a previous version. This ensures your ComfyUI API stays stable and available. For more details, refer to RunComfy Serverless API - Workflow Versions and RunComfy Serverless API - Edit a Deployment.

How is my data kept secure on RunComfy?

Your workflows run on dedicated, isolated GPUs, which guarantees complete resource separation so that no processes or memory are ever shared with other users. This ensures that your computation environment remains private and independent, providing both stability and security. Each ComfyUI execution environment, including the operating system, Python runtime, ComfyUI core, workflow definitions, models, and custom nodes, is encapsulated in its own secure cloud container. These containers are persistent, allowing your entire setup to be reliably reproduced across sessions while remaining fully private to you. Access to these environments is strictly controlled: only you can manage or expose your containerized setup, and no third party, including RunComfy, can access it unless you explicitly choose to share.

Are there any limitations on ComfyUI workflow complexity or ComfyUI API usage?

Most ComfyUI workflows run smoothly with the ComfyUI API. However, very large models may require GPUs with higher VRAM to avoid memory-related issues. The number of concurrent jobs you can run depends on your scaling configuration, and queue limits can be adjusted to fit your workload. For high-volume or specialized needs, enterprise support is available; please reach out to us at hi@runcomfy.com.

How does billing work for the ComfyUI API?

Billing for the ComfyUI API follows a pay-per-use model. You are only charged for the exact number of seconds your GPU is actively running, giving you full cost efficiency and flexibility. For more details, please see the RunComfy Serverless API - Billing documentation.
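
As a purely illustrative calculation (the rate below is hypothetical; see the Billing documentation for actual prices), per-second billing works out like this:

Python
gpu_rate_per_second = 0.0005   # hypothetical rate, not an actual RunComfy price
active_seconds = 40            # seconds the GPU spent actively running your job
print(f"cost: ${gpu_rate_per_second * active_seconds:.4f}")  # cost: $0.0200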

What kind of support is available if I run into issues with the ComfyUI API?

If you encounter issues while using the ComfyUI API, we recommend first checking the official documentation RunComfy Serverless API – Error Handling, which covers common error codes and troubleshooting steps. If the problem persists or you need additional assistance, you can always contact us at hi@runcomfy.com.

Do you offer services for enterprises or teams?

Yes, we provide solutions tailored for enterprises and teams. For more details and customized support, please contact us directly at hi@runcomfy.com.