First-time 9bench users often message me with the same observation: "My 16-core CPU only got 3.5× multi-core speedup. Geekbench shows 12×. Is your benchmark broken?"
No. The benchmark is fine. Your hardware is fine. The browser is the bottleneck. This article explains exactly why, with the math.
The three structural limits of browser benchmarking
Limit 1: Web Crypto API serialization (CPU multi-core)
9bench (like most browser benchmarks) uses crypto.subtle.digest() for SHA-256 hashing. This is hardware-accelerated on modern CPUs (SHA-NI on Intel/AMD, native ARM instructions on Apple Silicon and Snapdragon).
The catch: browser implementations of crypto.subtle are internally
serialized for security and consistency. When you call crypto.subtle.digest()
from 16 Web Workers simultaneously, the browser doesn't actually run them in parallel — it
queues them.
Why: the Web Crypto API was designed for security-critical operations where consistency matters more than throughput. Serializing avoids the subtle quirks of running hardware crypto in parallel (cache-level race conditions, power side-channel concerns).
Result: 9bench's multi-core SHA-256 measurement saturates at roughly the rate of 4-8 simultaneous executions, even on 16+ core CPUs. Real native multi-core SHA can be 12-16× single-core; browser caps at 3-5×.
Limit 2: Web Workers thread pool (CPU multi-core)
Even without Web Crypto serialization, Web Workers themselves have limits:
- Browser pool size: Chrome defaults to ~10-12 active worker threads regardless of navigator.hardwareConcurrency. You can spawn more, but they queue.
- Isolation overhead: each Worker gets its own JS context, V8 isolate, and GC heap. Spinning up 16 of these takes 50-100ms, which is significant when each Worker only does 200ms of work.
- OS thread scheduling: Worker threads are normal OS threads, but the browser process has multiple roles (main, renderer, GPU, network). Workers compete for CPU time with all of these.
Result: Even pure JavaScript work (no Web Crypto) typically achieves 60-80% multi-core efficiency in browser, vs 90%+ in native code.
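The startup-overhead point can be put into rough numbers. This is a toy model using the figures above (~75ms to spawn all Workers, 200ms of work each); it ignores pool caps and OS scheduling contention, so it is an optimistic upper bound.

```javascript
// Toy model: fraction of ideal N-way speedup left after Worker startup
// overhead. Assumes spawn happens once up front, then all Workers run
// their chunk fully in parallel.
function multiCoreEfficiency({ workers, workMsPerWorker, totalSpawnMs }) {
  const serialMs = workers * workMsPerWorker;        // same work on one core
  const parallelMs = totalSpawnMs + workMsPerWorker; // spawn, then parallel work
  const speedup = serialMs / parallelMs;
  return speedup / workers; // 1.0 would be perfect scaling
}

const eff = multiCoreEfficiency({ workers: 16, workMsPerWorker: 200, totalSpawnMs: 75 });
console.log((eff * 100).toFixed(0) + "%"); // ~73%, inside the 60-80% range above
```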
Limit 3: V8 / SpiderMonkey vectorization (RAM bandwidth)
9bench's RAM benchmark uses Float32Array operations on a 256MB working set: sequential read,
write, copy, and random-access patterns. In native C/C++, these compile to memcpy,
memset, and SIMD-vectorized loops that hit full memory bandwidth.
In V8 (Chrome's JS engine), Float32Array operations are fast but not always SIMD-vectorized. Specific issues:
- JIT inconsistency: V8's TurboFan compiler decides per-loop whether to vectorize. Sometimes it does, sometimes it doesn't.
- Bounds checking: Every TypedArray access is bounds-checked for security. Native memcpy skips this.
- GC awareness: Float32Arrays exist in JS heap, GC needs to track allocations. Adds bookkeeping.
- No explicit prefetching: Native code can prefetch upcoming memory; JS can't.
Result: Browser RAM measurements typically show 30-50% of native memcpy bandwidth. A DDR5-5600 system that achieves 45 GB/s in AIDA64 might show 12-20 GB/s in 9bench. Both are real numbers — the browser one is what your web apps actually achieve.
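A minimal version of the sequential-copy pattern looks like this. The buffer is shrunk from 9bench's 256MB working set so it runs quickly anywhere, and a real benchmark would warm up and take the best of several runs; this single cold pass is just a sketch.

```javascript
// Sketch of a sequential-copy bandwidth probe. dst.set(src) is the
// engine's bulk TypedArray copy; traffic counts both the read and the
// write side of the copy.
function copyBandwidthGBps(bytes = 64 * 1024 * 1024) {
  const src = new Float32Array(bytes / 4).fill(1.5);
  const dst = new Float32Array(bytes / 4);
  const t0 = performance.now();
  dst.set(src);
  const ms = Math.max(performance.now() - t0, 1e-3); // guard timer resolution
  return (2 * bytes) / (ms / 1000) / 1e9; // GB/s, read + write
}

console.log(copyBandwidthGBps().toFixed(1) + " GB/s on this engine/machine");
```

Compare the printed figure against your platform's native memcpy bandwidth (e.g. from AIDA64) to see the gap described above.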
What about WebGPU? (GPU performance)
Good news: WebGPU is the closest browser API to native performance. WebGPU compute shaders achieve 85-95% of native Metal/Vulkan/DirectX performance in most workloads.
The remaining 5-15% gap comes from:
- Validation overhead (WebGPU validates every command and buffer; native APIs are more permissive of garbage data)
- Browser process boundary (compute shaders submitted via IPC to GPU process)
- Single-shader-language requirement (WGSL adds slight compilation cost vs native shader languages)
For 9bench's GPU GFLOPS measurement, expect ~85-95% of what Geekbench Compute reports natively. Cleaner relative comparison than CPU/RAM measurements.
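Because WebGPU is not available everywhere, a benchmark's GPU path has to degrade gracefully. A sketch of the standard feature-detect (navigator.gpu, requestAdapter, and requestDevice are the real WebGPU API; the function name and fallback behavior are illustrative):

```javascript
// Sketch: feature-detect WebGPU before attempting the GPU benchmark.
// Returns null where navigator.gpu is missing (e.g. Node, or browsers
// without WebGPU enabled), so the GPU section can be skipped cleanly.
async function getComputeDevice() {
  const gpu = globalThis.navigator?.gpu;
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null; // e.g. blocklisted driver
  return adapter.requestDevice();
}

getComputeDevice().then((device) => {
  console.log(device ? "WebGPU compute available" : "WebGPU unavailable; skipping GPU test");
});
```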
The actual measurements (with real numbers)
Same machine — Ryzen 9 7950X (16 core, 32 thread) + RTX 4080 + 64GB DDR5-5600 — measured natively (Geekbench/Cinebench) vs 9bench (browser):
| Test | Native | 9bench (browser) | Browser efficiency |
|---|---|---|---|
| CPU single-core (SHA-256) | ~1.8M h/s | ~1.5M h/s | 83% |
| CPU multi-core (SHA-256) | ~22M h/s | ~6M h/s | 27% |
| GPU compute (matrix mul) | ~5500 GFLOPS | ~5000 GFLOPS | 91% |
| RAM read bandwidth | ~45 GB/s | ~14 GB/s | 31% |
| RAM random latency | ~75 ns | ~95 ns | 79% (latency, lower=better) |
This is consistent with the structural limits: GPU near-native (WebGPU mature), single-core CPU decent (no parallelism limits), multi-core CPU poor (Web Crypto serialization hits hard), RAM poor (V8 not vectorizing).
Why this is fine — actually fine
The point of a benchmark depends on what question you're asking:
Question 1: "What's the absolute peak this hardware can do?"
Use Geekbench 6, Cinebench 2024, or AIDA64. Browser benchmarks won't answer this.
Question 2: "How does my browser perform on this machine?"
Use 9bench, Speedometer 3.1, JetStream 3. They measure exactly what your web apps achieve.
Question 3: "Is machine A faster than machine B?"
Use any benchmark — relative comparison is reliable as long as you use the same one on both. 9bench excels here because it's instant + cross-platform.
Question 4: "Should I upgrade?"
Run 9bench. Bracket says "Office tier" or "Older"? Probably worth upgrading. "Mid-range" or above? Your hardware is fine; bottleneck is elsewhere (storage, RAM amount, software).
What 9bench does to be honest about limits
9bench is designed acknowledging these limits:
- Score formulas calibrated for browser-API output ranges, not native peaks. A 9bench score of 1500 represents a different absolute hardware level than a Geekbench 1500.
- Methodology page openly states all overhead expectations.
- Result snark adapts when RAM is the weakest component — instead of "upgrade your RAM," the message correctly notes "RAM scores low in all browser benchmarks, not necessarily your hardware."
- Multi-core efficiency reported as a separate metric — you can see exactly where the browser is capping your CPU.
The goal isn't to compete with Geekbench's accuracy. It's to give honest, relative numbers instantly + free + cross-platform.
Will this improve over time?
Slowly. Specifically:
- WebGPU: Already at 85-95% of native. Will probably reach 95-98% within 2-3 years as browsers optimize.
- Web Crypto: Unlikely to change much. Serialization is a security feature, not a bug. Future SHA-NI bypass paths might emerge but need careful security review.
- V8 / SpiderMonkey vectorization: Gradual improvement as TurboFan + Ion compilers improve. Maybe 50-70% native by 2028.
- Web Workers: Pool size limits unlikely to change for stability reasons. SharedArrayBuffer (already exists) lets workers share memory directly, which helps but doesn't fix Web Crypto serialization.
Realistic prediction: by 2028-2029, browser benchmarks will reach 90-95% of native across the board, instead of today's spread of roughly 30-90% depending on subsystem. Still not full parity, but closer.
What about Wasm SHA-256?
A frequently suggested fix: use a WebAssembly SHA-256 implementation instead of Web Crypto. Wasm code runs as ordinary compute inside each Worker, so it sidesteps crypto.subtle's internal serialization.
In testing: a Wasm SHA-256 (using Rust's sha2 crate compiled to WebAssembly)
achieves ~70-80% of native multi-core throughput when run across 16 Workers.
Significantly better than Web Crypto's 25-30%.
Why 9bench currently uses Web Crypto anyway: it's hardware-accelerated (SHA-NI / ARM SHA), which Wasm can't access. So while Wasm parallelizes better, each individual hash is slower. The two effects partially cancel out.
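That trade-off can be sketched with back-of-envelope numbers. The per-hash factor and scaling efficiencies below are illustrative assumptions consistent with the ranges quoted above, not measurements:

```javascript
// Toy comparison on a hypothetical 16-core CPU. perHash is one lane's
// hash rate relative to hardware-accelerated SHA-NI (= 1.0); efficiency
// is the fraction of ideal 16-way scaling each approach reaches.
const cores = 16;
const webCrypto = { perHash: 1.0, efficiency: 0.27 }; // fast hashes, serialized
const wasmSha   = { perHash: 0.4, efficiency: 0.75 }; // slower hashes, parallel

const totalThroughput = ({ perHash, efficiency }) => perHash * cores * efficiency;

console.log("Web Crypto:", totalThroughput(webCrypto).toFixed(2)); // 4.32
console.log("Wasm SHA-256:", totalThroughput(wasmSha).toFixed(2)); // 4.80
```

Under these assumed numbers the two approaches land within ~10% of each other, which is why the effects "partially cancel out."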
A future 9bench version may add a Wasm-SHA mode for users who want maximum multi-core measurement. For now, the Web Crypto approach is honest about the platform's actual limits.
The honest closing
Browser benchmarks score lower than native because the web platform is intentionally constrained for security and stability. This is not a bug. Real web apps face exactly the same constraints — so what 9bench measures is what your web apps actually achieve.
If you want to know your hardware's absolute peak, use Geekbench. If you want to know what your browser actually does with your hardware, use 9bench. Both are valid answers to different questions.
The big mental shift: stop comparing browser benchmark scores to native ones. They're different measurement systems. Compare browser-to-browser, native-to-native. That's where the numbers are meaningful.