TL;DR — How fast is your PC for SDXL?
Calibrated 1024×1024 SDXL image generation times (30 steps + refiner, native ComfyUI): RTX 4090 3-5s, RTX 4070 9-14s, RTX 3060 12GB 20-32s, RX 7900 XTX 4-7s, M3 Max 10-17s, M3 Ultra 5-9s. Want an estimate for your machine? Run the 9bench test (15 seconds, browser-only), or skim the tier tables below.

Stable Diffusion XL is the most popular open-source image model in 2026. ComfyUI shipped 4M+ downloads. Automatic1111 still has its loyal user base. Forge, Fooocus, InvokeAI, SwarmUI — every flavour exists. They all run SDXL. The question for most users isn't "which UI" but "will my PC run it well?"

This article gives you calibrated answers. We'll cover the hardware test you can run in 15 seconds (no install), then break down expected SDXL performance by GPU tier with real seconds-per-image numbers from public ComfyUI benchmarks.

The 15-second hardware test (in your browser)

9bench.com runs a hardware probe via WebGPU + WebAssembly + Web Workers. Open the page, click Start, wait 15 seconds. Result: your CPU/GPU/RAM scores plus an AI Capabilities section that predicts SDXL feasibility on your hardware.

What it actually measures and how it predicts SDXL (a sketch of these probes follows the list):

  1. GPU detection via WEBGL_debug_renderer_info — extracts the actual GPU model
  2. FP16 support check via WebGPU shader-f16 feature — required for SDXL native speed (FP32 fallback is 2× slower)
  3. VRAM probe — checks max allocatable buffer size, infers usable VRAM
  4. GPU class lookup — matches your detected GPU against a curated table of 50+ entries with calibrated SDXL times sourced from ComfyUI benchmarks, TechPowerUp, and public Hugging Face Spaces measurements
  5. Result — predicts seconds per 1024×1024 image (low/high range based on sampler choice)
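
Here's that sketch, minimal and with illustrative names rather than 9bench's production source (TypeScript needs @webgpu/types for navigator.gpu):

```ts
// Minimal hardware probe sketch: GPU model, FP16 support, VRAM ceiling.
async function probeHardware() {
  // 1. GPU model via the WEBGL_debug_renderer_info extension
  const gl = document.createElement("canvas").getContext("webgl");
  const ext = gl?.getExtension("WEBGL_debug_renderer_info");
  const renderer: string =
    gl && ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : "unknown";

  // 2. FP16 support via the WebGPU "shader-f16" feature
  const adapter = await navigator.gpu?.requestAdapter();
  const fp16 = adapter?.features.has("shader-f16") ?? false;

  // 3. Rough VRAM ceiling from the adapter's buffer-size limit
  const maxBufferBytes = adapter?.limits.maxBufferSize ?? 0;

  return { renderer, fp16, maxBufferBytes };
}
```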

This isn't a deep-learning benchmark — we don't actually run SDXL in your browser (that would take minutes). It's a calibrated lookup based on your detected hardware, and it's honest about being a prediction, not a measurement.
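
Conceptually, the lookup is just a scan over a calibrated table. A simplified sketch, with three entries copied from the tier tables below; the shape of the table is an assumption, not the production code:

```ts
// Calibrated lookup: detected renderer string → predicted SDXL range.
type GpuEntry = { match: RegExp; vramGB: number; sdxlSec: [number, number] };

// Illustrative entries only; the real table has 50+ rows. The negative
// lookahead keeps the plain 4070 from matching its Ti/Super variants.
const GPU_TABLE: GpuEntry[] = [
  { match: /RTX 4090/i, vramGB: 24, sdxlSec: [3, 5] },
  { match: /RTX 4070(?!\s*(Ti|Super))/i, vramGB: 12, sdxlSec: [9, 14] },
  { match: /RTX 3060/i, vramGB: 12, sdxlSec: [20, 32] },
];

function predictSdxlSeconds(renderer: string): [number, number] | null {
  const entry = GPU_TABLE.find((e) => e.match.test(renderer));
  return entry ? entry.sdxlSec : null; // null → unknown GPU, no prediction
}
```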

📊 Want to actually run SDXL right now?
Try Hugging Face Spaces (browser-hosted, free with Hugging Face account) or Fooocus (one-click install, easiest local UI for SDXL). 9bench predicts whether it'll be fast on your machine. Those tools actually run it.

Tier-by-tier SDXL performance (1024×1024, 30 steps + refiner)

Numbers below are median seconds per image on stock SDXL (no LoRA, no ControlNet) generating a 1024×1024 image with 30 steps base + 10 steps refiner. Sampler: DPM++ 2M Karras. Sources: ComfyUI public benchmarks, Civitai user reports, Tom's Hardware AI workload tests.

Tier 1: Beast (under 6 seconds per image)

| GPU | VRAM | SDXL 1024² (sec) | Batch 4 feasible? |
|---|---|---|---|
| RTX 5090 | 32 GB | 2-4 | Yes (batch 8+) |
| RTX 4090 | 24 GB | 3-5 | Yes (batch 4-6) |
| RX 7900 XTX | 24 GB | 4-7 | Yes (batch 4) |
| RTX 5080 | 16 GB | 3-5 | Yes (batch 2-3) |
| RTX 4080 Super | 16 GB | 4-7 | Yes (batch 2-3) |
| Apple M3 Ultra | up to 192 GB unified | 5-9 | Yes (memory-rich) |

Tier 1 is "as fast as it gets". Suitable for: Civitai-style mass image creation, SDXL-Turbo experimentation at 60+ images/minute, ComfyUI animation workflows, and training LoRAs locally.

Tier 2: Workstation (6-15 seconds per image)

| GPU | VRAM | SDXL 1024² (sec) | Batch 2 feasible? |
|---|---|---|---|
| RTX 5070 Ti | 16 GB | 4-7 | Yes |
| RTX 4070 Ti Super | 16 GB | 6-10 | Yes |
| RTX 3090 (used) | 24 GB | 7-11 | Yes |
| RX 7900 XT | 20 GB | 5-8 | Yes |
| RTX 4070 Super | 12 GB | 8-13 | Yes (tight) |
| RX 7800 XT | 16 GB | 8-12 | Yes |
| Apple M3 Max | up to 64 GB unified | 10-17 | Yes |
| RTX 4070 | 12 GB | 9-14 | Yes (tight) |

Tier 2 is the practical creator tier. Generate 4-10 images per minute. Batch generation works. LoRAs and ControlNet add ~30-50% overhead. Refiner stays enabled.

Tier 3: Mainstream (15-30 seconds per image)

| GPU | VRAM | SDXL 1024² (sec) | Refiner advised? |
|---|---|---|---|
| RTX 4060 Ti 16GB | 16 GB | 12-20 | Yes |
| RTX 3080 | 10 GB | 9-14 | Yes (tight VRAM) |
| RTX 3070 Ti | 8 GB | 12-19 | Disable for batch |
| RX 7700 XT | 12 GB | 10-15 | Yes |
| RTX 4060 Laptop | 8 GB | 18-30 | Disable for batch |
| RTX 4060 | 8 GB | 14-22 | Disable for batch |
| RX 6700 XT | 12 GB | 22-35 | Yes |
| Apple M3 Pro | 18-36 GB unified | 14-24 | Yes |
| RTX 3060 12GB | 12 GB | 20-32 | Yes |

Tier 3 is "it works but be patient". Generate 2-4 images per minute. Use SDXL-Turbo or LCM LoRA for fast iteration; switch to full 30-step DPM++ for finals. 8 GB VRAM cards work but require --medvram and disabling the refiner for stable batch generation.

Tier 4: Working (30-90 seconds per image)

| GPU | VRAM | SDXL 1024² (sec) | Tips |
|---|---|---|---|
| GTX 1080 Ti | 11 GB | 30-50 | Use SDXL-Turbo (8 steps), no refiner |
| RTX 2060 / 2070 | 6-8 GB | 35-70 | --medvram, smaller resolution first |
| Apple M2 / M2 Pro | 16+ GB unified | 30-50 | Use Diffusers with MPS backend |
| Apple M1 Max | 32+ GB unified | 18-30 | Better than people expect |
| RX 6600 / 6650 XT | 8 GB | 28-45 | ROCm or Vulkan path; --medvram |
| Apple M1 Pro | 16+ GB unified | 25-40 | Diffusers MPS |

Tier 4 is "you can do it but switch to SDXL-Turbo or LCM-LoRA for usable iteration". Generate 1-2 images per minute on full 30-step. Generate 6-15 images per minute on Turbo (8 steps, no refiner). Most users on this tier should default to Turbo workflows.

Tier 5: Patient (90+ seconds, or skip SDXL for SD 1.5)

On Intel Iris Xe, AMD Radeon 680M, low-end APUs, or ancient discrete GPUs (GTX 1060 6 GB, RX 580): SDXL is technically possible but punishing. Better path: stick with SD 1.5 at 512×512, which this hardware handles far more comfortably, or lean on a free hosted option like Hugging Face Spaces when you need SDXL output.

The 8 GB VRAM trap

SDXL was designed for 12+ GB of VRAM, but the community has built escape hatches. If you have 8 GB:

  - Launch with --medvram (or --lowvram as a last resort) to offload model weights
  - Disable the refiner, especially for batch generation
  - Use SDXL-Turbo or an LCM LoRA to cut step counts
  - Generate at a lower resolution first, then upscale

All of these work. None are as comfortable as having 12+ GB. If you're shopping for a GPU primarily for SDXL: spend the extra $100-150 for a 12 GB+ card. RTX 3060 12GB used at $200 is the price/perf champion for budget SDXL.

⚠️ The "8 GB is fine" myth
Multiple YouTube tutorials and Reddit posts insist that 8 GB is "totally fine for SDXL". Technically correct. Practically: it means slower generation, can't run refiner with batch, can't use heavy ControlNet stacks, can't train LoRA on 1024² resolution. If SDXL is your main use case, treat 12 GB as the floor. 8 GB is for "I'll occasionally generate one image".

Apple Silicon for SDXL: better than the reputation suggests

Common misconception: "Apple is bad for image generation". Reality is more nuanced.

What's true: on a per-image basis, NVIDIA wins. M3 Max takes 10-17s vs the RTX 4090's 3-5s, making the 4090 roughly 3-4× faster.

What's not true: that Apple is unusably slow. M3 Pro at 14-24s/image is comparable to an RTX 3060 12GB at 20-32s. M3 Max at 10-17s/image edges out the RTX 4060 Ti and RTX 3070 Ti. M3 Ultra at 5-9s/image sits in Tier 1, behind only the fastest discrete cards (RTX 4080 Super and up, RX 7900 XTX).

Apple specifics:

  - Use ComfyUI or Diffusers with the MPS backend; it's the mature path on macOS
  - Unified memory is the advantage: an M3 Max with 64 GB fits ControlNet stacks and batch sizes that OOM on 8-12 GB discrete cards

SDXL in the browser (yes, this works in 2026)

Browser-based SDXL via WebGPU is now a real option. Hugging Face Spaces hosts dozens of SDXL/Turbo demos. ONNX Runtime Web supports SDXL inference. Browser performance:

| Hardware | Native SDXL (s) | Browser SDXL (s) | Slowdown |
|---|---|---|---|
| RTX 4090 | 3-5 | 20-40 | ~6-8× |
| RTX 4070 | 9-14 | 50-90 | ~6× |
| M3 Max | 10-17 | 40-80 | ~4-5× |
| RTX 3060 12GB | 20-32 | 90-150 | ~5× |

The slowdown ratio is similar to browser vs native LLM inference: roughly 4-8× depending on hardware. Browser SDXL works well for demos, "try this without installing anything" pages, and quick triage. It's not a replacement for native ComfyUI workflows.
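
If you want to experiment, a hedged sketch of loading one SDXL component (the UNet) with onnxruntime-web's WebGPU execution provider looks like this. The model URL is a placeholder; a real pipeline also needs the text encoders, the VAE, and a scheduler loop, and depending on your onnxruntime-web version you may need its dedicated WebGPU bundle:

```ts
// Sketch: load an SDXL UNet in the browser via onnxruntime-web.
import * as ort from "onnxruntime-web";

async function loadSdxlUnet(): Promise<ort.InferenceSession> {
  return ort.InferenceSession.create(
    "https://example.com/models/sdxl-unet.onnx", // placeholder URL
    {
      executionProviders: ["webgpu", "wasm"], // prefer WebGPU, fall back to WASM
      graphOptimizationLevel: "all",
    }
  );
}
```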

What 9bench specifically tells you about SDXL

Run 9bench.com, scroll to AI Capabilities. You'll see:

  - Your detected GPU model and class
  - FP16 (shader-f16) support status
  - Estimated usable VRAM
  - A predicted seconds-per-image range for 1024×1024 SDXL

All this in the same 15-second test that benchmarks your CPU/GPU/RAM. Free, no install, no signup. Open methodology — every number traceable to a public benchmark source.

Common questions

"Can I run SDXL on my GTX 1060 6GB?" Technically yes with --lowvram, but expect 60-120s/image and frequent crashes. SD 1.5 is the better fit for a 1060 6GB.

"Should I buy a used RTX 3090 for SDXL?" Yes — best price/perf VRAM at $700-900. 24 GB lets you run massive ControlNet stacks, train LoRAs, batch-generate without OOM. Faster than a new RTX 4070 for SDXL despite being 4 years old.

"Is SDXL faster on Linux or Windows?" Linux is 5-10% faster on NVIDIA (better CUDA driver overhead) and notably faster on AMD (ROCm 6.0+ matures Linux first). For most users the difference isn't worth dual-booting.

"Will SDXL ever run faster on my hardware?" Yes, every 6 months. SDXL optimisations keep landing — TensorRT for NVIDIA cuts 30%, ROCm Composable Kernels for AMD cut 20%, LCM-LoRA cuts steps by 4-8×. A GPU bought today will be ~2× faster in 2 years on the same SDXL workload.

"What about Stable Diffusion 3 / Flux / SDXL Lightning?" Flux.1 (Black Forest Labs) is heavier than SDXL — needs 12 GB minimum, 16 GB+ comfortable, ~50% slower per image. SD3 Medium is similar to SDXL. SDXL Lightning is much faster (1-step) but lower quality. The hardware tiers above scale similarly: if your tier handles SDXL, it handles SD3 and most Flux quants. Flux fp16 is the exception — that's an RTX 4080+ workload.

Test your PC for SDXL — 15 seconds

9bench detects your GPU, checks FP16 / VRAM / WebGPU support, and predicts SDXL generation time on your specific hardware. Browser-only, no install, no signup. Calibrated against ComfyUI / Automatic1111 public benchmarks.

Test my PC for SDXL →