The Catalog Wars

Unity, Polaris, Gravitino, Nessie. The catalog wars nobody outside the industry knows about, and why the strategic asset is the metadata, not the table.

Backpressure

Backpressure is the polite word for 'we're overwhelmed.' What the four strategies are, why Kafka doesn't really do it, and what works in production.

What’s a Reranker?

Everyone talks about the embedding model. Almost nobody talks about the reranker. In any retrieval system that gets to acceptable quality, the reranker is doing more of the work.

Semantic Models

The word semantic is rather heavily used (often incorrectly) when discussing data models. The word semantic itself is an adjective relating to meaning in language or logic. When we think…

Embeddings = Lossy Compression

An embedding isn't meaning. It's a lossy compression with a similarity-preserving objective. Once you see it that way, every weird vector search behaviour makes sense.