Stable Diffusion XL is the most popular open-source image model in 2026. ComfyUI has passed 4M downloads, Automatic1111 still has its loyal user base, and every other flavour exists too: Forge, Fooocus, InvokeAI, SwarmUI. They all run SDXL. The question for most users isn't "which UI" but "will my PC run it well?"
This article gives you calibrated answers. We'll cover the hardware test you can run in 15 seconds (no install), then break down expected SDXL performance by GPU tier with real seconds-per-image numbers from public ComfyUI benchmarks.
The 15-second hardware test (in your browser)
9bench.com runs a hardware probe via WebGPU + WebAssembly + Web Workers. Open the page, click Start, wait 15 seconds. Result: your CPU/GPU/RAM scores plus an AI Capabilities section that predicts SDXL feasibility on your hardware.
What it actually measures and how it predicts SDXL:
- GPU detection via WEBGL_debug_renderer_info — extracts the actual GPU model
- FP16 support check via WebGPU shader-f16 feature — required for SDXL native speed (FP32 fallback is 2× slower)
- VRAM probe — checks max allocatable buffer size, infers usable VRAM
- GPU class lookup — matches your detected GPU against a curated table of 50+ entries with calibrated SDXL times sourced from ComfyUI benchmarks, TechPowerUp, public Hugging Face Spaces measurements
- Result — predicts seconds per 1024×1024 image (low/high range based on sampler choice)
This isn't a deep-learning benchmark — we don't actually run SDXL in your browser (that would take minutes). It's a calibrated lookup based on your detected hardware, and it's honest about being a prediction, not a measurement.
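In sketch form, the prediction step is just a table lookup plus an FP16 penalty. The entries and helper below are illustrative (the values mirror the tier tables later in this article), not 9bench's actual code:

```python
# Illustrative lookup, not 9bench's real table (which has 50+ calibrated rows).
SDXL_SECONDS_PER_IMAGE = {
    # detected-name substring: (low, high) seconds per 1024x1024 image
    "RTX 4090": (3, 5),
    "RTX 4070": (9, 14),
    "RTX 3060": (20, 32),
    "M3 Max": (10, 17),
}

def predict_sdxl_time(detected_gpu: str, fp16: bool) -> tuple[int, int] | None:
    """Return a (low, high) seconds-per-image estimate, or None if unknown."""
    for name, (low, high) in SDXL_SECONDS_PER_IMAGE.items():
        # Naive substring match; a real table needs careful model disambiguation.
        if name.lower() in detected_gpu.lower():
            # Without shader-f16 the pipeline falls back to FP32: ~2x slower.
            return (low, high) if fp16 else (low * 2, high * 2)
    return None

print(predict_sdxl_time("NVIDIA GeForce RTX 4070", fp16=True))  # (9, 14)
```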
Tier-by-tier SDXL performance (1024×1024, 30 steps + refiner)
Numbers below are median seconds per image for stock SDXL (no LoRA, no ControlNet) generating a 1024×1024 image with 30 steps base + 10 steps refiner. Sampler: DPM++ 2M Karras. Sources: ComfyUI public benchmarks, Civitai user reports, Tom's Hardware AI workload tests. Tier boundaries are approximate: where a card's range overlaps two tiers, placement also weighs VRAM, since VRAM decides whether batching and the refiner stay usable.
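For reference, here is roughly what that benchmark setup looks like as a Hugging Face Diffusers script. This is a sketch, not the exact harness behind the numbers: the checkpoints are the official Stability AI releases, and the 0.75 denoising handoff approximates the 30 + 10 step split.

```python
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    StableDiffusionXLImg2ImgPipeline,
    DPMSolverMultistepScheduler,
)

device = "cuda"  # "mps" on Apple Silicon
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to(device)
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to(device)

# DPM++ 2M Karras, the sampler behind the numbers above.
for pipe in (base, refiner):
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )

prompt = "a lighthouse on a cliff at dawn, volumetric light"
# Base runs the first 75% of 40 steps (~30), the refiner the last 25% (~10).
latents = base(
    prompt, height=1024, width=1024,
    num_inference_steps=40, denoising_end=0.75,
    output_type="latent",
).images
image = refiner(
    prompt, image=latents,
    num_inference_steps=40, denoising_start=0.75,
).images[0]
image.save("sdxl_benchmark.png")
```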
Tier 1: Beast (under 6 seconds per image)
| GPU | VRAM | SDXL 1024² (sec) | Batch 4 feasible? |
|---|---|---|---|
| RTX 5090 | 32 GB | 2-4 | Yes (batch 8+) |
| RTX 4090 | 24 GB | 3-5 | Yes (batch 4-6) |
| RX 7900 XTX | 24 GB | 4-7 | Yes (batch 4) |
| RTX 5080 | 16 GB | 3-5 | Yes (batch 2-3) |
| RTX 4080 Super | 16 GB | 4-7 | Yes (batch 2-3) |
| Apple M3 Ultra | up to 192 GB unified | 5-9 | Yes (memory-rich) |
Tier 1 is "make it as fast as it can be". Suitable for: Civitai-style mass image creation, SDXL-Turbo experimentation at 60+ images/minute, ComfyUI animation workflows, training LoRAs locally.
Tier 2: Workstation (6-15 seconds per image)
| GPU | VRAM | SDXL 1024² (sec) | Batch 2 feasible? |
|---|---|---|---|
| RTX 5070 Ti | 16 GB | 4-7 | Yes |
| RTX 4070 Ti Super | 16 GB | 6-10 | Yes |
| RTX 3090 (used) | 24 GB | 7-11 | Yes |
| RX 7900 XT | 20 GB | 5-8 | Yes |
| RTX 4070 Super | 12 GB | 8-13 | Yes (tight) |
| RX 7800 XT | 16 GB | 8-12 | Yes |
| Apple M3 Max | up to 64 GB unified | 10-17 | Yes |
| RTX 4070 | 12 GB | 9-14 | Yes (tight) |
Tier 2 is the practical creator tier. Generate 4-10 images per minute. Batch generation works. LoRAs and ControlNet add ~30-50% overhead. Refiner stays enabled.
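For a concrete sense of where that overhead comes from, here is a minimal Diffusers sketch of an SDXL + ControlNet + LoRA stack. The Canny ControlNet checkpoint is a real public release; the LoRA repo name is a placeholder for whatever style LoRA you use.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# SDXL Canny ControlNet published on the Hugging Face Hub.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Placeholder repo name: any SDXL-compatible style LoRA works here.
pipe.load_lora_weights("your-username/your-sdxl-lora")

canny = Image.new("RGB", (1024, 1024))  # stand-in; use a real Canny edge map
image = pipe(
    "a glass skyscraper at dusk",
    image=canny,                        # the extra ControlNet forward pass
    controlnet_conditioning_scale=0.5,  # is where the ~30-50% overhead goes
    num_inference_steps=30,
).images[0]
```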
Tier 3: Mainstream (15-30 seconds per image)
| GPU | VRAM | SDXL 1024² (sec) | Refiner advised? |
|---|---|---|---|
| RTX 4060 Ti 16GB | 16 GB | 12-20 | Yes |
| RTX 3080 | 10 GB | 9-14 | Yes (tight VRAM) |
| RTX 3070 Ti | 8 GB | 12-19 | Disable for batch |
| RX 7700 XT | 12 GB | 10-15 | Yes |
| RTX 4060 Laptop | 8 GB | 18-30 | Disable for batch |
| RTX 4060 | 8 GB | 14-22 | Disable for batch |
| RX 6700 XT | 12 GB | 22-35 | Yes |
| Apple M3 Pro | 18-36 GB unified | 14-24 | Yes |
| RTX 3060 12GB | 12 GB | 20-32 | Yes |
Tier 3 is "it works but be patient". Generate 2-4 images per minute. Use SDXL-Turbo or LCM LoRA for fast iteration; switch to full 30-step DPM++ for finals. 8 GB VRAM cards work but require --medvram and disabling the refiner for stable batch generation.
Tier 4: Working (30-90 seconds per image)
| GPU | VRAM | SDXL 1024² (sec) | Tips |
|---|---|---|---|
| GTX 1080 Ti | 11 GB | 30-50 | Use SDXL-Turbo (8 steps), no refiner |
| RTX 2060 / 2070 | 6-8 GB | 35-70 | --medvram, smaller resolution first |
| Apple M2 / M2 Pro | 16+ GB unified | 30-50 | Use Diffusers with MPS backend |
| Apple M1 Max | 32+ GB unified | 18-30 | Better than people expect |
| RX 6600 / 6650 XT | 8 GB | 28-45 | ROCm or Vulkan path; --medvram |
| Apple M1 Pro | 16+ GB unified | 25-40 | Diffusers MPS |
Tier 4 is "you can do it but switch to SDXL-Turbo or LCM-LoRA for usable iteration". Generate 1-2 images per minute on full 30-step. Generate 6-15 images per minute on Turbo (8 steps, no refiner). Most users on this tier should default to Turbo workflows.
Tier 5: Patient (90+ seconds, or skip SDXL for SD 1.5)
On Intel Iris Xe, AMD Radeon 680M, low-end APUs, or ancient discrete GPUs (GTX 1060 6 GB, RX 580): SDXL is technically possible but punishing. Better path:
- Use Stable Diffusion 1.5 instead — generates 512×512 images in 5-15s on the same hardware. Quality is lower but practical.
- Use SDXL-Turbo at 1 step — yes, just 1 step. Quality drops noticeably but 90s images become 15s images.
- Use the cloud — Hugging Face Spaces, Replicate, or RunPod give you 5-second SDXL for $0.001-0.01 per image, often cheaper than the electricity for hours of local generation.
The 8 GB VRAM trap
SDXL was designed for 12+ GB VRAM but the community has built escape hatches. If you have 8 GB:
- Automatic1111 / Forge: add --medvram-sdxl to your launch flags. ~30% slower but stable.
- ComfyUI: launch with --lowvram, or use the Tile VAE node + sequential CPU offload.
- Disable the refiner: the full SDXL pipeline is base + refiner. The refiner adds quality but doubles the VRAM peak. Skip it on 8 GB cards.
- Smaller initial resolution: generate at 832×832, upscale via Hires Fix to 1.25×. Final image is similar to 1024×1024 but VRAM peak is lower.
- SDXL-Turbo or LCM: 8 steps instead of 30, no refiner. Half the VRAM peak.
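If you script with Diffusers instead of using a UI, the rough equivalents of those switches are one-liners. A minimal sketch:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
)

# Rough analogues of the UI flags above. Note: no .to("cuda") here;
# offloading manages device placement itself.
pipe.enable_model_cpu_offload()  # ~ --medvram: stream submodules to the GPU on demand
pipe.enable_vae_tiling()         # ~ Tile VAE: decode the image in tiles
pipe.enable_vae_slicing()        # decode batched images one at a time

image = pipe(
    "a watercolor fox",
    height=832, width=832,       # smaller first pass, as suggested above
    num_inference_steps=30,
).images[0]
image.save("sdxl_8gb.png")
```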
All of these work. None are as comfortable as having 12+ GB. If you're shopping for a GPU primarily for SDXL: spend the extra $100-150 for a 12 GB+ card. RTX 3060 12GB used at $200 is the price/perf champion for budget SDXL.
Apple Silicon for SDXL: better than the reputation suggests
Common misconception: "Apple is bad for image generation". Reality is more nuanced.
What's true: on a per-image basis, NVIDIA wins. M3 Max takes 10-17s vs RTX 4090's 3-5s. The 4090 is roughly 3× faster.
What's not true: that Apple is unusably slow. M3 Pro at 14-24s/image is comparable to an RTX 3060 12GB at 20-32s. M3 Max at 10-17s/image beats the RTX 4060 Ti and RTX 3070 Ti. M3 Ultra at 5-9s/image beats everything except the RTX 4080 Super and up.
Apple specifics:
- Use Diffusers with the MPS backend (the Hugging Face Diffusers library) — the fastest route on Apple Silicon, roughly 2× faster than the Core ML route (see the sketch after this list)
- Use the Draw Things app (free, App Store) — the easiest UI, well-optimised for Apple Silicon
- Avoid Automatic1111 directly — it works, but is slower than Diffusers/Draw Things on a Mac
- Unified memory advantage: not useful for SDXL (the model fits in 24 GB anyway). It's an LLM win, not an SDXL win.
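A minimal Diffusers-on-MPS sketch, assuming a recent PyTorch build with MPS support:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("mps")                      # Metal Performance Shaders backend

pipe.enable_attention_slicing()  # lowers peak memory on unified-memory Macs

image = pipe(
    "a foggy harbor at sunrise",
    height=1024, width=1024, num_inference_steps=30,
).images[0]
image.save("sdxl_mps.png")
```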
SDXL in the browser (yes, this works in 2026)
Browser-based SDXL via WebGPU is now a real option. Hugging Face Spaces hosts dozens of SDXL/Turbo demos. ONNX Runtime Web supports SDXL inference. Browser performance:
| Hardware | Native SDXL (s) | Browser SDXL (s) | Slowdown |
|---|---|---|---|
| RTX 4090 | 3-5 | 20-40 | ~6-8× |
| RTX 4070 | 9-14 | 50-90 | ~6× |
| M3 Max | 10-17 | 40-80 | ~4-5× |
| RTX 3060 12GB | 20-32 | 90-150 | ~5× |
The slowdown ratio is similar to browser vs native LLM inference: roughly 4-8× depending on hardware. Browser SDXL works well for demos, "try this without installing anything" pages, and quick triage. It's not a replacement for native ComfyUI workflows.
What 9bench specifically tells you about SDXL
Run 9bench.com, scroll to AI Capabilities. You'll see:
- SDXL feasibility verdict: "Easy" / "Comfortable" / "Memory-saver mode required" / "Painful" / "Don't bother"
- Calibrated time-per-image range: e.g. "8-13 seconds per 1024×1024 image" — sourced from your detected GPU's class
- Browser vs Native breakdown: what the same hardware would do in ComfyUI native vs browser-only WebGPU
- VRAM verdict: enough for refiner, enough for batch, enough for LoRA training
- Recommended workflow: SDXL-Turbo vs full 30-step, refiner on or off, --medvram or not
All this in the same 15-second test that benchmarks your CPU/GPU/RAM. Free, no install, no signup. Open methodology — every number traceable to a public benchmark source.
Common questions
"Can I run SDXL on my GTX 1060 6GB?" Technically yes with --lowvram, but expect 60-120s/image and frequent crashes. SD 1.5 is the better fit for a 1060 6GB.
"Should I buy a used RTX 3090 for SDXL?" Yes — best price/perf VRAM at $700-900. 24 GB lets you run massive ControlNet stacks, train LoRAs, batch-generate without OOM. Faster than a new RTX 4070 for SDXL despite being 4 years old.
"Is SDXL faster on Linux or Windows?" Linux is 5-10% faster on NVIDIA (better CUDA driver overhead) and notably faster on AMD (ROCm 6.0+ matures Linux first). For most users the difference isn't worth dual-booting.
"Will SDXL ever run faster on my hardware?" Yes, every 6 months. SDXL optimisations keep landing — TensorRT for NVIDIA cuts 30%, ROCm Composable Kernels for AMD cut 20%, LCM-LoRA cuts steps by 4-8×. A GPU bought today will be ~2× faster in 2 years on the same SDXL workload.
"What about Stable Diffusion 3 / Flux / SDXL Lightning?" Flux.1 (Black Forest Labs) is heavier than SDXL — needs 12 GB minimum, 16 GB+ comfortable, ~50% slower per image. SD3 Medium is similar to SDXL. SDXL Lightning is much faster (1-step) but lower quality. The hardware tiers above scale similarly: if your tier handles SDXL, it handles SD3 and most Flux quants. Flux fp16 is the exception — that's an RTX 4080+ workload.
Test your PC for SDXL — 15 seconds
9bench detects your GPU, checks FP16 / VRAM / WebGPU support, and predicts SDXL generation time on your specific hardware. Browser-only, no install, no signup. Calibrated against ComfyUI / Automatic1111 public benchmarks.
Test my PC for SDXL →