BaseKV Internals: Three Protocols, One File
How one BoltDB file serves Redis RESP, the Memcached text protocol, and a DynamoDB HTTP subset, all funneling through one write loop with protocol-isolated key namespaces.
This is part 5 of our series taking BaseKV apart from the inside. The earlier parts went deep on the storage layer: the disk-based Redis design bet, encoding Redis types into a B-tree, sorted sets on disk, and scanning over a live B-tree. Here we go up a layer, to the front door. BaseKV speaks three wire protocols at once: the Redis RESP protocol, the Memcached text protocol, and a subset of the DynamoDB HTTP API. They all land on the same BoltDB file. This part is about how that works, and about one honest detail in the README that is worth being precise about.
Three listeners, one process
When BaseKV boots it does not start three servers in the sense of three independent databases. It starts one Server that owns the BoltDB handle, and then wraps it in a MultiProtocolServer that attaches whatever protocol front-ends you asked for. The Redis server is always created first because it is the one that opens the file and constructs the *DB:
func NewMultiProtocolServerWithOptions(redisPort int, memcachedPort int, dataDir string, protocol Protocol, ...) (*MultiProtocolServer, error) {
	// Create Redis server (always needed as it manages the DB)
	redisServer, err := NewServerWithOptions(redisPort, dataDir, password, boltOptions)
	if err != nil {
		return nil, err
	}
	mps := &MultiProtocolServer{redisServer: redisServer, ...}
	if protocol == ProtocolMemcached || protocol == ProtocolBoth {
		mps.memcachedServer = NewMemcachedServer(redisServer.db, memcachedPort, redisServer.authManager, redisServer.rateLimiters)
	}
	// DynamoDB server (always enabled)
	mps.dynamodbServer = NewDynamoDBServer(redisServer.db, httpPort, redisServer.authManager, redisServer.rateLimiters)
	return mps, nil
}
The thing worth noticing is what gets passed to the Memcached and DynamoDB constructors: redisServer.db, redisServer.authManager, redisServer.rateLimiters. Not copies, not separate instances. The exact same *DB, the same auth manager, the same rate limiters. There is one storage engine and the protocol servers are thin adapters in front of it. Each one runs on its own port: Redis on 6380, Memcached on 11211, DynamoDB on 8000 by default, all from one binary.
Start() then launches each listener on its own goroutine and blocks on a stop channel. The three protocols are genuinely concurrent. The isolation that keeps them from corrupting each other is not in the listeners; it is in the storage layer and the namespacing, which we get to below.
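The launch-and-block shape described above can be illustrated with a small standalone sketch. This is not BaseKV's Start(); the frontEnd type, serve, and startAll are hypothetical names, and the "listeners" here only report that they started and then wait for the stop channel to close.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// frontEnd is a hypothetical stand-in for one protocol listener.
type frontEnd struct {
	name string
}

// serve reports that the listener is up, then blocks until stop is closed.
// A real listener would run its accept loop here.
func (f frontEnd) serve(stop <-chan struct{}, started chan<- string) {
	started <- f.name
	<-stop
}

// startAll launches one goroutine per front-end, waits for each to report in,
// and returns the sorted names plus a function that stops them all.
func startAll(fes []frontEnd) ([]string, func()) {
	stop := make(chan struct{})
	started := make(chan string, len(fes))
	var wg sync.WaitGroup
	for _, fe := range fes {
		wg.Add(1)
		go func(fe frontEnd) {
			defer wg.Done()
			fe.serve(stop, started)
		}(fe)
	}
	names := make([]string, 0, len(fes))
	for range fes {
		names = append(names, <-started)
	}
	sort.Strings(names)
	shutdown := func() {
		close(stop) // one close wakes every listener goroutine
		wg.Wait()
	}
	return names, shutdown
}

func main() {
	names, shutdown := startAll([]frontEnd{{name: "redis"}, {name: "memcached"}, {name: "dynamodb"}})
	fmt.Println(names)
	shutdown()
}
```

Closing a channel, rather than sending on it, is what lets a single stop signal reach any number of listeners.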
Redis: RESP via redcon
The Redis side leans on redcon, Josh Baker's RESP server library. We hand it three callbacks and it handles the protocol framing, the connection lifecycle, and the goroutine-per-connection model for us:
s.ps = redcon.NewServer(addr, s.handleRedconCommand, s.handleRedconConnect, s.handleRedconClose)
When a command arrives, handleRedconCommand looks the verb up in a registry of command handlers, does an auth check, and dispatches. We did not write a RESP parser. That is the right call: RESP is fiddly, redcon is well tested, and the protocol is not where the interesting work is. The interesting work is everything behind the handler, which is the storage engine from parts 1 through 4.
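As an illustration of the lookup, auth check, and dispatch sequence, here is a minimal sketch that does not use redcon at all. The registry type, its fields, and the PING and ECHO handlers are assumptions made for the example; BaseKV's real handler signatures and error strings may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// handler is a hypothetical command handler: arguments in, reply out.
type handler func(args []string) string

// registry maps upper-cased verbs to handlers.
type registry struct {
	handlers    map[string]handler
	requireAuth bool
}

func newRegistry(requireAuth bool) *registry {
	return &registry{
		requireAuth: requireAuth,
		handlers: map[string]handler{
			"PING": func(args []string) string { return "PONG" },
			"ECHO": func(args []string) string {
				if len(args) != 1 {
					return "ERR wrong number of arguments"
				}
				return args[0]
			},
		},
	}
}

// dispatch normalizes the verb, enforces auth, and calls the handler.
func (r *registry) dispatch(authed bool, args []string) string {
	if len(args) == 0 {
		return "ERR empty command"
	}
	verb := strings.ToUpper(args[0])
	if r.requireAuth && !authed && verb != "AUTH" {
		return "NOAUTH Authentication required"
	}
	h, ok := r.handlers[verb]
	if !ok {
		return "ERR unknown command '" + args[0] + "'"
	}
	return h(args[1:])
}

func main() {
	r := newRegistry(true)
	fmt.Println(r.dispatch(false, []string{"ping"}))
	fmt.Println(r.dispatch(true, []string{"ping"}))
	fmt.Println(r.dispatch(true, []string{"echo", "hello"}))
}
```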
Memcached: a hand-written text parser
The Memcached side is the opposite choice. There is no library; it is a raw TCP listener and a parser we wrote by hand, because the Memcached text protocol is small enough that a parser is a couple of switch statements:
func (ms *MemcachedServer) Start() error {
	listener, err := net.Listen("tcp", ms.addr)
	if err != nil {
		return err
	}
	ms.listener = listener
	for {
		conn, err := listener.Accept()
		if err != nil {
			return err
		}
		go ms.handleConnection(conn)
	}
}
Same goroutine-per-connection shape redcon gives us on the Redis side, just spelled out. Each connection gets a bufio.Scanner, and every non-empty line is a command. The parser splits on whitespace, upper-cases the verb, and dispatches GET, SET, ADD, REPLACE, DELETE, INCR, DECR, FLUSH_ALL, AUTH, and VERSION.
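The line handling described above is small enough to sketch in full. The following is an illustration under assumptions, not BaseKV's parser: parseLine and readCommands are hypothetical names, and the input is a string rather than a network connection.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseLine splits a command line on whitespace and upper-cases the verb.
// ok is false for empty or whitespace-only lines.
func parseLine(line string) (verb string, args []string, ok bool) {
	parts := strings.Fields(line)
	if len(parts) == 0 {
		return "", nil, false
	}
	return strings.ToUpper(parts[0]), parts[1:], true
}

// readCommands scans input line by line and collects the verbs it finds.
// bufio.Scanner's default split function strips the trailing \r\n.
func readCommands(input string) []string {
	var verbs []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		if verb, _, ok := parseLine(sc.Text()); ok {
			verbs = append(verbs, verb)
		}
	}
	return verbs
}

func main() {
	fmt.Println(readCommands("get user\r\n\r\nversion\r\n"))
}
```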
Here is the part that ties the protocols together. A Memcached SET does not implement storage itself. It rewrites the request into a Redis command struct and calls the same handler the RESP path calls:
func (ms *MemcachedServer) handleSet(parts []string, scanner *bufio.Scanner) string {
	key := parts[1]
	nsKey := NamespaceKey("memcached", key)
	exptime, _ := strconv.Atoi(parts[3])
	if !scanner.Scan() {
		return "ERROR\r\n"
	}
	data := scanner.Text()
	var cmd redcon.Command
	if exptime > 0 {
		cmd = redcon.Command{Args: [][]byte{[]byte("SET"), []byte(nsKey), []byte(data), []byte("EX"), []byte(strconv.Itoa(exptime))}}
	} else {
		cmd = redcon.Command{Args: [][]byte{[]byte("SET"), []byte(nsKey), []byte(data)}}
	}
	_, err := handleSet(ms.db, cmd)
	...
	return "STORED\r\n"
}
A Memcached SET key flags exptime bytes becomes a Redis SET key value EX exptime, expressed as a redcon.Command, fed straight into handleSet. The Memcached exptime reuses the Redis TTL machinery; FLUSH_ALL is a FLUSHDB; ADD and REPLACE are conditional sets done inside one db.Update so the existence check and the write are atomic. The Memcached server is a translation layer over the Redis command handlers, and through them over the one storage engine. We are honest about the simplifications: flags and the declared byte count are accepted and ignored, because the value lands in the same B-tree regardless.
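To make the translation concrete, here is a minimal sketch of the argument rewriting on its own, with plain strings instead of redcon.Command. The translateSet function is hypothetical, and the length and exptime validation is an addition for the example, not something the excerpt above shows.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// translateSet turns the fields of a Memcached "set" line plus its data line
// into Redis-style SET arguments. Flags and the byte count are ignored,
// matching the simplification described in the text.
func translateSet(parts []string, data string) ([]string, error) {
	// Expected shape: set <key> <flags> <exptime> <bytes>
	if len(parts) < 5 {
		return nil, errors.New("malformed set line")
	}
	exptime, err := strconv.Atoi(parts[3])
	if err != nil {
		return nil, fmt.Errorf("bad exptime %q: %w", parts[3], err)
	}
	args := []string{"SET", "memcached:" + parts[1], data}
	if exptime > 0 {
		// A positive exptime maps onto the Redis EX option.
		args = append(args, "EX", strconv.Itoa(exptime))
	}
	return args, nil
}

func main() {
	args, err := translateSet([]string{"set", "user", "0", "60", "2"}, "42")
	fmt.Println(args, err)
}
```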
DynamoDB: an HTTP subset dispatched on a header
The DynamoDB front-end is an http.Server with a single route. Every DynamoDB API call is a POST / whose operation lives in the X-Amz-Target header, so the whole router is one switch on that header:
target := r.Header.Get("X-Amz-Target")
switch target {
case "DynamoDB_20120810.PutItem":
	s.handlePutItem(w, r)
case "DynamoDB_20120810.GetItem":
	s.handleGetItem(w, r)
default:
	http.Error(w, fmt.Sprintf("Unsupported operation: %s", target), http.StatusBadRequest)
}
We support a deliberate subset: PutItem and GetItem. That is enough for the simple item-CRUD workload that draws people to DynamoDB in the first place, and it lets a real AWS SDK point at BaseKV and have basic writes and reads work. We are not reimplementing query planning, secondary indexes, or conditional expressions. If you want the longer argument for why this subset is useful, the cheap DynamoDB alternative piece covers the pricing and scale story.
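The header-based routing is easy to exercise in isolation. The sketch below uses net/http/httptest to drive a stand-in router in memory; route and call are hypothetical, and the handler bodies are placeholders that only echo which operation was selected.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// route is a stand-in router: one switch on the X-Amz-Target header.
func route(w http.ResponseWriter, r *http.Request) {
	target := r.Header.Get("X-Amz-Target")
	switch target {
	case "DynamoDB_20120810.PutItem":
		fmt.Fprint(w, "put")
	case "DynamoDB_20120810.GetItem":
		fmt.Fprint(w, "get")
	default:
		http.Error(w, "Unsupported operation: "+target, http.StatusBadRequest)
	}
}

// call issues an in-memory POST / with the given target header and returns
// the status code and trimmed response body.
func call(target string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader("{}"))
	req.Header.Set("X-Amz-Target", target)
	rec := httptest.NewRecorder()
	route(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(call("DynamoDB_20120810.GetItem"))
	fmt.Println(call("DynamoDB_20120810.Query"))
}
```

An operation outside the supported subset, such as Query in this example, falls through to the default branch and gets a 400.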
The shared core: one DB, one write loop, View for reads
All three front-ends resolve to the same *DB, and that is what makes coexistence safe rather than chaotic. Reads go through View, which is a plain read transaction over BoltDB:
func (db *DB) View(fn func(tx *Tx) error) error {
	return db.bdb.View(func(btx *bbolt.Tx) error {
		tx := &Tx{btx: btx, db: db}
		return fn(tx)
	})
}
Writes never touch BoltDB directly. Every write, no matter which protocol issued it, funnels through Update, which puts a writeRequest on a single channel and waits for a result:
func (db *DB) Update(fn func(tx *Tx) error) error {
	...
	req := &writeRequest{fn: fn, done: make(chan error, 1)}
	db.writeMu.Lock()
	db.writeCh <- req
	db.writeMu.Unlock()
	return <-req.done
}
A single goroutine drains writeCh and applies the requests one at a time. This is the write loop we cover in detail in part 6 on durability, the write loop, and TTL. The point for this part is the consequence: a Redis SET, a Memcached ADD, and a DynamoDB PutItem issued at the same moment from three different clients on three different ports do not race. They serialize onto the same write loop. There is one durability story, one fsync policy, one TTL scheduler, one compaction path, because there is one engine. That is the whole reason we did not build three databases.
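The serialization property can be demonstrated with a toy version of the pattern. This is a sketch under assumptions: the DB here wraps a plain map instead of BoltDB, there is no batching, fsync, or shutdown handling, and Open, writeLoop, and hammer are hypothetical names. What it shows is that many concurrent writers can mutate shared state with no lock around the data itself, because only the loop goroutine ever touches it.

```go
package main

import (
	"fmt"
	"sync"
)

// writeRequest carries one write closure and a channel for its result.
type writeRequest struct {
	fn   func(store map[string]int) error
	done chan error
}

// DB is a toy engine: a map guarded only by the single write loop.
type DB struct {
	store   map[string]int
	writeCh chan *writeRequest
}

// Open creates the DB and starts the one goroutine allowed to mutate store.
func Open() *DB {
	db := &DB{store: map[string]int{}, writeCh: make(chan *writeRequest)}
	go db.writeLoop()
	return db
}

// writeLoop applies requests strictly one at a time, in arrival order.
func (db *DB) writeLoop() {
	for req := range db.writeCh {
		req.done <- req.fn(db.store)
	}
}

// Update enqueues fn and blocks until the write loop has applied it.
func (db *DB) Update(fn func(store map[string]int) error) error {
	req := &writeRequest{fn: fn, done: make(chan error, 1)}
	db.writeCh <- req
	return <-req.done
}

// hammer runs n concurrent increments of one key and returns the final value.
func hammer(db *DB, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = db.Update(func(s map[string]int) error { s["n"]++; return nil })
		}()
	}
	wg.Wait()
	var got int
	_ = db.Update(func(s map[string]int) error { got = s["n"]; return nil })
	return got
}

func main() {
	fmt.Println(hammer(Open(), 100))
}
```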
Namespacing: the honest part
The README markets BaseKV as shared storage, one dataset, three protocols. That framing is mostly right and slightly too generous, and the precise version is more interesting than the marketing version.
Keys are prefixed per protocol before they hit the store. The prefixes live in namespace.go:
const (
	RedisNamespace     = "redis:"
	MemcachedNamespace = "memcached:"
	DynamoDBNamespace  = "dynamodb:"
)

func NamespaceKey(protocol, key string) string {
	switch protocol {
	case "redis":
		return RedisNamespace + key
	case "memcached":
		return MemcachedNamespace + key
	case "dynamodb":
		return DynamoDBNamespace + key
	default:
		return key
	}
}
On the Redis path, namespaceRedisCommand rewrites the key arguments of each command in flight, prepending redis: to the key position for GET, SET, HSET, multi-key commands like MGET, and so on. On the Memcached path you saw NamespaceKey("memcached", key) in handleSet. On the DynamoDB path the composite key gets NamespaceKey("dynamodb", ...).
So what actually happens is this: all three protocols write into one BoltDB file, in one keyspace, but each protocol lives in its own prefix-delimited region of that keyspace. A Memcached client that does SET user 42 stores memcached:user. A Redis client that does SET user 99 stores redis:user. These do not collide, and they also do not read each other. A Redis GET user will never see the value a Memcached client wrote, because it is looking at redis:user and the Memcached value is at memcached:user.
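The isolation is easy to see with a toy store. In this sketch NamespaceKey follows the function shown earlier, while the map-backed store type and its set and get methods are hypothetical stand-ins for the real engine.

```go
package main

import "fmt"

// NamespaceKey mirrors the prefixing function from namespace.go.
func NamespaceKey(protocol, key string) string {
	switch protocol {
	case "redis":
		return "redis:" + key
	case "memcached":
		return "memcached:" + key
	case "dynamodb":
		return "dynamodb:" + key
	default:
		return key
	}
}

// store is a hypothetical single keyspace shared by every protocol.
type store map[string]string

func (s store) set(protocol, key, value string) {
	s[NamespaceKey(protocol, key)] = value
}

func (s store) get(protocol, key string) (string, bool) {
	v, ok := s[NamespaceKey(protocol, key)]
	return v, ok
}

func main() {
	s := store{}
	s.set("memcached", "user", "42")
	s.set("redis", "user", "99")

	// Same logical key, two physical keys, no collision.
	r, _ := s.get("redis", "user")
	m, _ := s.get("memcached", "user")
	fmt.Println(r, m, len(s))

	// A key written only by Memcached is invisible to Redis.
	s.set("memcached", "session", "abc")
	_, visible := s.get("redis", "session")
	fmt.Println(visible)
}
```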
The accurate way to describe BaseKV is therefore: one database file with protocol-isolated namespaces, not magic cross-protocol data sharing. The protocols coexist in one file with isolation. They do not, by default, read each other's keys. If you want them to share a value you would have to deliberately bridge the namespaces yourself, which BaseKV does not do for you. We think the isolation is the right default. A Redis client and a Memcached client both reaching for the obvious key name session should not silently clobber each other, and the prefix guarantees they cannot.
What you do share is everything underneath the keyspace: one file to back up, one TTL scheduler expiring keys regardless of which protocol set them, one compaction job, one auth manager, one set of rate limiters. The unification is real at the operational layer. It is the data-sharing-across-protocols claim that needs the asterisk.
How a DynamoDB item is stored
DynamoDB items are the most interesting mapping because an item is not a single value, it is a bag of typed attributes. BaseKV flattens an item into one namespaced composite key per attribute. PutItem walks the item, picks the first string attribute as the primary key, and writes each field under a table:<table>:<pk>:<field> key, all inside one db.Update so the whole item lands atomically:
err := s.db.Update(func(tx *Tx) error {
	for field, value := range req.Item {
		var strValue string
		if value.S != nil {
			strValue = *value.S
		} else if value.N != nil {
			strValue = *value.N
		} else {
			strValue = string(value.B)
		}
		fieldKey := fmt.Sprintf("table:%s:%s:%s", req.TableName, primaryKey, field)
		namespacedKey := NamespaceKey("dynamodb", fieldKey)
		if err := tx.Set([]byte(namespacedKey), []byte(strValue), 0); err != nil {
			return err
		}
	}
	return nil
})
GetItem is the inverse. It rebuilds the prefix dynamodb:table:<table>:<pk>: and iterates every key under it, stripping the prefix to recover the field name and collecting the attributes back into an item:
prefix := NamespaceKey("dynamodb", fmt.Sprintf("table:%s:%s:", req.TableName, primaryKey))
err := s.db.View(func(tx *Tx) error {
	iter := tx.NewIterator(IteratorOptions{Prefix: []byte(prefix)})
	defer iter.Close()
	for iter.Valid() {
		fieldName := strings.TrimPrefix(string(iter.Key()), prefix)
		if fieldName != "" {
			value := string(iter.Value())
			item[fieldName] = AttributeValue{S: &value}
		}
		iter.Next()
	}
	return nil
})
This is where the prefix scan from part 4 earns its keep: reassembling an item is a single ordered range read over a contiguous run of keys, which is exactly what a B-tree is good at. Be aware of the trade-offs we are not hiding. The primary key is whatever the first string attribute happens to be, numbers and binary are read back out as strings, and on read every attribute comes back typed as S. This is a faithful enough PutItem/GetItem for small item-CRUD, not a drop-in for DynamoDB's full type system.
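The flatten-then-scan round trip can be sketched without BoltDB. In the example below a map plus a sorted key slice stands in for the B-tree, values are plain strings, and itemPrefix, putItem, and getItem are hypothetical helpers. It also shows why the trailing colon in the prefix matters: a scan for primary key u1 does not pick up the fields of u10.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// itemPrefix builds the key prefix shared by every field of one item.
func itemPrefix(table, pk string) string {
	return "dynamodb:table:" + table + ":" + pk + ":"
}

// putItem writes one key per attribute under the item's prefix.
func putItem(store map[string]string, table, pk string, item map[string]string) {
	for field, value := range item {
		store[itemPrefix(table, pk)+field] = value
	}
}

// getItem emulates an ordered prefix scan: sort the keys, seek to the first
// key at or after the prefix, and read until the prefix no longer matches.
func getItem(store map[string]string, table, pk string) map[string]string {
	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	prefix := itemPrefix(table, pk)
	item := map[string]string{}
	for i := sort.SearchStrings(keys, prefix); i < len(keys) && strings.HasPrefix(keys[i], prefix); i++ {
		if field := strings.TrimPrefix(keys[i], prefix); field != "" {
			item[field] = store[keys[i]]
		}
	}
	return item
}

func main() {
	store := map[string]string{}
	putItem(store, "users", "u1", map[string]string{"id": "u1", "name": "Ada"})
	putItem(store, "users", "u10", map[string]string{"id": "u10", "name": "Bob"})
	fmt.Println(getItem(store, "users", "u1"))
}
```

Sorting on every read is only there to imitate key order; a B-tree keeps keys ordered already, so the real scan is a seek followed by sequential reads.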
Why this is useful anyway
A skeptic could say BaseKV is three protocols glued to a key-value store, and the protocols cannot even see each other's data. Fair. Here is why the design still pays off.
It is one binary, one file, one process, one ops story. You run a single thing, you back up a single file, you reason about durability once. And you get to keep three mature client ecosystems. Your cache layer can keep talking the Memcached protocol with its existing client. Your application can use a Redis client for sessions and queues. A service that was written against the DynamoDB SDK can point at the HTTP port and keep working for the basic operations. None of them needs a rewrite, because BaseKV meets each one at its own wire protocol.
That also makes it a migration path. You can move a workload off managed Redis, off Memcached, or off DynamoDB one client at a time, pointing each at BaseKV without touching application code, and consolidate three backing services into one. The isolation that we were careful to be honest about is exactly what makes that safe: migrating your Memcached traffic cannot accidentally stomp on the data your Redis traffic already wrote. If you want the broader framing on where a persistent KV store fits relative to Redis, Key-Value vs Redis is the companion read.
Three protocols, one file. Isolated namespaces, shared engine. That is the trade we made, and being precise about it is more useful than the one-line pitch.
Continue the series: previous is part 4, scanning over a live B-tree; next is part 6, durability, the write loop, and TTL; the hub is part 1, the disk-based Redis design bet. Related: Key-Value vs Redis, A Cheap DynamoDB Alternative.