If you train LoRAs with Ostris AI Toolkit, you’ve probably hit this at least once:
- Your AI Toolkit training Samples / Previews look great.
- The same LoRA looks different when you run inference in ComfyUI, Diffusers, or another stack.
In most cases, your LoRA isn’t “broken”; your inference pipeline is just different.
Small differences add up fast: base model variant, scheduler/step semantics, VAE/CLIP defaults, resolution snapping, and even how the LoRA is applied (adapter vs merge/fuse, model-family quirks).
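To make the last point concrete, here is a minimal sketch of the two common ways a LoRA gets applied in Diffusers. It assumes an SDXL base and Diffusers’ PEFT-backed LoRA loader; the model id and LoRA path are placeholders, not the repo’s actual code:

```python
# Minimal sketch, assuming an SDXL base and Diffusers' PEFT-backed
# LoRA loader. Model id and LoRA path are placeholders.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

pipe.load_lora_weights("path/to/my_lora.safetensors", adapter_name="my_lora")

# Option A: keep the LoRA as a runtime adapter and set its weight per call.
pipe.set_adapters(["my_lora"], adapter_weights=[0.8])

# Option B: merge (fuse) the LoRA into the base weights instead.
# Numerically close to Option A but not always bit-identical, and some
# model families bind LoRA layers differently -- a common source of drift.
# pipe.fuse_lora(lora_scale=0.8)
# pipe.unload_lora_weights()
```

Whether a given stack does A or B, and at what scale, is exactly the kind of detail that makes “the same LoRA” render differently across tools.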
To make AI Toolkit-style inference easier to reproduce, audit, and debug, RunComfy publishes the reference inference implementation we use for AI Toolkit LoRAs as open source, built with Hugging Face Diffusers.
GitHub repo: runcomfy-com/ai-toolkit-inference
What this open-source repo is for
Use this repo when you want to:
- Reproduce AI Toolkit Samples/Previews outside AI Toolkit (with the same inference logic)
- Debug “training preview vs inference” drift by inspecting and controlling every part of the pipeline
- Build your own inference service (e.g., run behind an API) using a Diffusers-based implementation; a minimal server sketch follows at the end of this section
If your main goal is simply “run my LoRA and match the training Samples,” you may not need to read the code — RunComfy also ships the same preview-matching behavior via managed inference (Playground/API) and ComfyUI workflows.
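For the build-your-own route, a hypothetical skeleton of a Diffusers pipeline behind an async API might look like the following. The endpoint name, request fields, and model id are illustrative, not the repo’s actual interface:

```python
# Hypothetical async-server sketch; endpoint, fields, and model id are
# illustrative, not the repo's actual API.
import asyncio
import io

import torch
from diffusers import DiffusionPipeline
from fastapi import FastAPI, Response
from pydantic import BaseModel

app = FastAPI()
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")


class GenerateRequest(BaseModel):
    prompt: str
    steps: int = 25
    guidance_scale: float = 4.0
    seed: int = 42


@app.post("/generate")
async def generate(req: GenerateRequest):
    # Diffusers inference is blocking; run it in a worker thread so the
    # event loop stays responsive.
    def run():
        generator = torch.Generator("cuda").manual_seed(req.seed)
        return pipe(
            req.prompt,
            num_inference_steps=req.steps,
            guidance_scale=req.guidance_scale,
            generator=generator,
        ).images[0]

    image = await asyncio.to_thread(run)
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")
```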
What’s inside the repo
The project is designed to make AI Toolkit preview behavior auditable and reproducible. It typically includes:
- Base-model specific Diffusers pipelines (image, edit/control, video — depending on the model family)
- AI Toolkit training YAML → inference settings (treat the YAML as the “contract”; see the sketch after this list)
- LoRA loading + application logic (adapter vs merge/fuse; model-family binding quirks)
- Resolution snapping rules to match AI Toolkit Samples/Previews (also covered in the sketch below)
- Optional async server example (e.g., FastAPI) to run inference behind an API
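To illustrate the YAML-as-contract and resolution-snapping items, here is a sketch of reading preview settings from an AI Toolkit-style training config and snapping a resolution. The key paths and the snapping divisor are assumptions that can vary by AI Toolkit version and model family; verify them against your own config:

```python
# Sketch: reuse the sample settings AI Toolkit used for previews.
# Key paths follow AI Toolkit-style configs but may differ per version.
import yaml  # pip install pyyaml


def load_sample_settings(config_path: str) -> dict:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    sample = cfg["config"]["process"][0]["sample"]
    return {
        "width": sample["width"],
        "height": sample["height"],
        "steps": sample["sample_steps"],
        "guidance_scale": sample["guidance_scale"],
        "seed": sample.get("seed", 42),
    }


def snap_resolution(width: int, height: int, divisor: int = 16) -> tuple[int, int]:
    # Snap to the nearest multiple of `divisor`. The real divisor is
    # model-dependent (VAE downsampling x patch size); 16 is an assumption.
    def snap(v: int) -> int:
        return max(divisor, round(v / divisor) * divisor)

    return snap(width), snap(height)
```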
How this relates to RunComfy Trainer Inference
RunComfy uses the same preview-matching idea:
- Lock the exact base model / variant (sketched below)
- Match model-family inference defaults
- Keep the same pipeline behavior used to render AI Toolkit training Samples/Previews
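In Diffusers terms, “lock the exact base model / variant” amounts to pinning everything that identifies the checkpoint rather than floating on defaults. The ids below are placeholders:

```python
# Pin the repo id, revision, weight variant, and dtype instead of
# relying on defaults. Ids below are placeholders.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # exact base, not a remix
    revision="main",          # ideally a specific commit hash
    variant="fp16",           # match the checkpoint variant used in training
    torch_dtype=torch.float16,
)
```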
You can run that aligned pipeline in two developer-friendly ways:
- Playground / API inference (fast validation + integration)
- ComfyUI inference (workflow per base model — load your LoRA and generate preview-matching results)
Guides:
- Playground/API parity: AI Toolkit Inference: Get Results That Match Your Training Samples
- ComfyUI preview-match workflows: AI Toolkit Inference in ComfyUI: Get Results That Match Your Training Samples
- Debugging drift: AI Toolkit Preview vs Inference Mismatch
Ready to start training?
