Single-Threaded vs Multi-Threaded Key-Value Stores: Redis, Valkey, Dragonfly, KeyDB (2026)
Redis executes commands on one thread on purpose; DragonflyDB and KeyDB use every core. What single-threaded design buys (free atomicity, predictable latency), how shared-nothing differs from locked multi-threading, and why thread count is the wrong axis for a disk-bound store.
Redis built its reputation on a design choice that sounds like a mistake: it executes commands on a single thread. One core, one command at a time, no matter how many cores your server has. For most of Redis's life this was a feature, not a limitation. But in 2026 the most-cited reason to leave Redis is no longer the license change; it is throughput, and the challengers all make the same pitch: we use every core, Redis uses one. DragonflyDB claims 4.5x higher throughput than Valkey on GCP. KeyDB, the fork now owned by Snap, was built specifically to multi-thread Redis. So is single-threaded design obsolete? Not quite. It depends entirely on where your bottleneck actually is.
Why Redis chose one thread on purpose
A single-threaded command loop is not a relic; it is a simplification that buys three concrete things.
Atomicity for free. Because only one command runs at a time, every Redis command is atomic without any locks. INCR, LPUSH, SETNX, the whole command surface, can never interleave with another command mid-operation. This is why Redis primitives are so trusted for distributed locks, counters, and rate limiters: the engine gives you a guarantee that multi-threaded systems have to work for.
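To make the "atomicity for free" point concrete, here is a minimal sketch of a single-threaded command loop in Python. It is an illustration of the idea, not Redis's implementation: the names `run_command_loop`, `client`, and `demo` are hypothetical, and a `queue.Queue` stands in for the network. Many client threads submit INCR requests, but only the loop thread ever touches the store, so the read-modify-write needs no lock.

```python
import queue
import threading

def run_command_loop(commands: queue.Queue, store: dict) -> None:
    """The only code that touches `store`; it runs one command at a time."""
    while True:
        item = commands.get()
        if item is None:                # sentinel: shut the loop down
            break
        op, key, reply = item
        if op == "INCR":
            # Read-modify-write with no lock: nothing else can run in between.
            store[key] = store.get(key, 0) + 1
            reply.put(store[key])

def client(commands: queue.Queue, key: str, n: int) -> None:
    """A hypothetical client that sends n INCRs and waits for each reply."""
    reply = queue.Queue()
    for _ in range(n):
        commands.put(("INCR", key, reply))
        reply.get()

def demo(clients: int = 8, per_client: int = 1000) -> int:
    commands = queue.Queue()
    store = {}
    loop = threading.Thread(target=run_command_loop, args=(commands, store))
    loop.start()
    workers = [
        threading.Thread(target=client, args=(commands, "hits", per_client))
        for _ in range(clients)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    commands.put(None)                  # stop the loop after all clients finish
    loop.join()
    return store["hits"]
```

Calling `demo(8, 1000)` returns 8000 on every run, because every increment passes through the one loop thread. If the clients updated `store` directly from their own threads, the same read-modify-write would need a lock to guarantee that result.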
No locking overhead and no contention. Multi-threaded data structures need mutexes, atomics, or lock-free algorithms, and under contention those cost real time and real engineering. A single thread never contends with itself, so on a per-operation basis it is often faster than a multi-threaded engine that is busy coordinating.
Predictability. One thread means one clear story for latency. There is no thread-scheduling jitter, no lock convoy, no surprise tail latency from a contended structure. For a cache where p99 matters more than peak throughput, that calm is worth a lot.
It is worth being precise: modern Redis is not entirely single-threaded. Since Redis 6 it can use extra threads for network I/O (reading and writing socket buffers), and it has long used background threads for slow housekeeping such as fsync and, since Redis 4.0, freeing large objects. What stays single-threaded is command execution, the part that touches your data. That is the part the challengers parallelize.
How the multi-threaded challengers do it
There are two genuinely different ways to use more cores, and they are not equally clean.
KeyDB takes the direct route: multiple threads run the Redis command loop concurrently over a shared dataset, protected by locking. It is protocol-compatible and it works, but a shared keyspace guarded by locks reintroduces exactly the contention single-threaded Redis avoided, and the gains flatten as threads fight over hot keys.
DragonflyDB takes the more interesting route: shared-nothing. The keyspace is partitioned into slices, and each thread owns a slice outright. A thread only ever touches its own data, so there are no locks on the data path at all, and the architecture scales close to linearly with cores. This is the same shared-nothing idea behind ScyllaDB, and it is why Dragonfly can post numbers like 1.1M QPS on multi-core hardware. There are two catches. Operations spanning multiple slices (multi-key transactions, some atomic multi-key commands) need cross-thread coordination, which is more complex than a single thread's trivial atomicity. And the headline benchmarks are vendor-run on hardware chosen to show the architecture at its best. Treat the 4.5x as "real on the right workload," not "what you will see on yours."
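The shared-nothing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dragonfly's code: `Shard`, `ShardedStore`, and `shard_for` are hypothetical names, CRC32 is just a convenient stable hash, and CPython's global interpreter lock means the sketch shows ownership rather than real parallel speedup. Each shard has its own dict and its own thread, and a key is always routed to the one thread that owns it.

```python
import queue
import threading
import zlib

class Shard:
    """One slice of the keyspace, touched only by its own thread."""

    def __init__(self) -> None:
        self.data = {}
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self) -> None:
        while True:
            item = self.inbox.get()
            if item is None:            # sentinel: stop this shard
                break
            op, key, value, reply = item
            if op == "SET":
                self.data[key] = value  # no lock: only this thread writes here
                reply.put("OK")
            elif op == "GET":
                reply.put(self.data.get(key))

class ShardedStore:
    def __init__(self, num_shards: int = 4) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Stable hash so the same key always lands on the same owner.
        return zlib.crc32(key.encode()) % len(self.shards)

    def execute(self, op: str, key: str, value=None):
        reply = queue.Queue(maxsize=1)
        self.shards[self.shard_for(key)].inbox.put((op, key, value, reply))
        return reply.get()

    def close(self) -> None:
        for shard in self.shards:
            shard.inbox.put(None)
        for shard in self.shards:
            shard.thread.join()
```

Single-key commands never cross shards, which is why they need no locks. A command touching two keys owned by different shards has no such luck: the sketch has no way to run it atomically, and that gap is exactly the cross-thread coordination a real shared-nothing engine has to build.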
The question the benchmarks dodge: where is your bottleneck?
A multi-threaded engine only helps if CPU is your limit. For a pure in-memory store on a many-core box serving a firehose of small operations, it often is, and shared-nothing is a legitimately better design there. But three common situations make the thread count nearly irrelevant:
- You are network-bound, not CPU-bound. Plenty of real deployments saturate the NIC or hit client-side latency long before a single Redis core maxes out. Extra command threads do nothing for a network ceiling.
- You scale horizontally already. The classic answer to single-threaded throughput is to run more shards, one per core, via Redis Cluster. That recovers multi-core throughput while keeping each shard's simple atomic model. It is more moving parts, but it is a solved pattern.
- You are disk-bound. This is the big one for the disk-first category. If your store persists to disk and your working set does not fit in RAM, the bottleneck is storage I/O, not how many cores chew on commands. Adding command threads in front of a disk that is already the limit buys nothing.
That last point is why thread count is the wrong axis for choosing a disk-backed store. BaseKV is single-writer with many concurrent lock-free readers, inherited from its bbolt B+tree and its copy-on-write design: readers never block and never take locks because they read a consistent snapshot, while writes serialize through one writer. For a durable store whose ceiling is disk throughput and whose workload is read-heavy, that model extracts the concurrency that actually matters (read concurrency) without paying for write-side locking it would not benefit from. More command threads would be answering a question the workload is not asking.
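The single-writer, snapshot-reader model can be sketched as follows. This is a simplified illustration, not BaseKV's or bbolt's implementation: `SnapshotStore` is a hypothetical name, and it copies the whole dict on every write, where a copy-on-write B+tree copies only the pages on the path it changes. What it does show is the contract: writers serialize through one lock, and readers take no lock and keep seeing the version they started with.

```python
import threading
from types import MappingProxyType

class SnapshotStore:
    """Single writer at a time; readers get an immutable snapshot, lock-free."""

    def __init__(self) -> None:
        self._root = MappingProxyType({})   # current committed version
        self._write_lock = threading.Lock()

    def snapshot(self):
        # One reference read, no lock. The returned view is never mutated,
        # because writers publish a new version instead of changing this one.
        return self._root

    def put(self, key, value) -> None:
        with self._write_lock:              # writes serialize here
            new_version = dict(self._root)  # copy, then modify the copy
            new_version[key] = value
            self._root = MappingProxyType(new_version)  # publish by swapping
```

A reader that calls `snapshot()` before a write continues to see the old values after the write commits, and a reader that calls it afterwards sees the new ones. Neither ever waits on the writer.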
A short decision guide
Choose a single-threaded in-memory store (Redis, Valkey) when you want maximum compatibility, free atomic primitives, predictable latency, and you are fine scaling out with shards if you outgrow one core. This is most caching and most lock/counter/rate-limit work.
Choose a multi-threaded in-memory store (DragonflyDB for shared-nothing, KeyDB for locked) when you have measured that a single core's command throughput is your actual ceiling, you want to scale up on one big box rather than out across shards, and your workload is dominated by independent single-key operations that partition cleanly.
Choose a disk-first store (BaseKV and similar) when the real constraint is RAM cost or durability rather than CPU, in which case the entire single-versus-multi-thread debate is about a CPU bottleneck you do not have. The key-value store vs Redis decision tree walks through picking the category before picking the implementation, which is almost always the higher-leverage choice.
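One rough way to apply this guide is to estimate each ceiling and see which one binds first. The sketch below is back-of-the-envelope arithmetic, not a benchmark: `binding_ceiling` is a hypothetical helper, and every input is a number you would have to measure on your own hardware and workload.

```python
def binding_ceiling(
    cpu_ops_per_core: float,    # measured ops/sec one core can execute
    cores_used: int,            # 1 for a single command thread
    nic_bytes_per_sec: float,   # usable network bandwidth
    bytes_per_op: float,        # request plus response size on the wire
    disk_iops: float,           # sustained I/O operations per second
    disk_ios_per_op: float,     # average disk I/Os per operation (0 if all in RAM)
):
    """Return (name, ops_per_sec) of the lowest throughput ceiling."""
    ceilings = {
        "cpu": cpu_ops_per_core * cores_used,
        "network": nic_bytes_per_sec / bytes_per_op,
        # A store that never touches disk on the request path has no disk ceiling.
        "disk": disk_iops / disk_ios_per_op if disk_ios_per_op > 0 else float("inf"),
    }
    name = min(ceilings, key=ceilings.get)
    return name, ceilings[name]
```

With made-up inputs of 100,000 ops/sec per core, 1.25 GB/s of network, 1,000 bytes per operation, and no disk access, one core gives a CPU ceiling of 100,000 ops/sec, so more cores would help. At 16 cores the CPU ceiling rises to 1.6M while the network ceiling stays at 1.25M, so the network binds. If each operation averages half a disk I/O on a 50,000 IOPS device, the disk ceiling is 100,000 ops/sec regardless of thread count.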
The single-threaded design is not obsolete. It is optimal for a specific and very common shape of workload, and the multi-threaded challengers are optimal for a different one. The mistake is letting a benchmark headline pick for a bottleneck you have not measured.
Related: LSM-Tree vs B-Tree, Read, Write, and Space Amplification, Key-Value Store vs Redis in 2026, Distributed Locks with a Key-Value Store.