Embeddings · Innotalent

Embeddings are the substrate underneath semantic search, recommendation and the retrieval step in retrieval-augmented generation. A model reads a chunk of content and emits a fixed-length list of numbers, a vector; a vector database stores those vectors and quickly finds the nearest neighbours to a query. Two pieces of text that mean the same thing end up close together in that vector space even when they share no keywords, which is what makes the technique feel like magic on a first demo.
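To make "nearest neighbours" concrete, here is a minimal sketch of the lookup a vector database performs, done by brute force with cosine similarity. The three-dimensional vectors are invented by hand for illustration and did not come from any real model; a production system would use vectors with hundreds or thousands of dimensions and an approximate index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, stored, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors, made up for this example (not real model output).
stored = {
    "how to reset a password": [0.9, 0.1, 0.0],
    "recover account access": [0.8, 0.2, 0.1],
    "quarterly revenue report": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # stands in for an embedded user question

print(nearest(query, stored))
```

The two account-related entries rank above the revenue report even though their titles share no words, because their toy vectors point in nearly the same direction as the query.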

The numbers themselves carry no meaning outside the model that produced them. An embedding from one foundation model is not comparable to one from another, and embeddings from an older version of the same model are not directly comparable to a newer version either. That has practical consequences once a system is in production.
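One way to keep that constraint from being violated by accident is to store the identifier of the producing model next to every vector and refuse to compare across identifiers. The sketch below assumes a hypothetical `StoredEmbedding` record and made-up model labels such as `embedder-v1`; none of these names come from a real library.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class StoredEmbedding:
    doc_id: str
    model_id: str               # hypothetical label, e.g. "embedder-v1"
    vector: Tuple[float, ...]

def dot_product(a: StoredEmbedding, b: StoredEmbedding) -> float:
    """Dot product of two embeddings, allowed only within one model."""
    if a.model_id != b.model_id:
        raise ValueError(
            f"cannot compare {a.model_id!r} with {b.model_id!r}: "
            "vectors from different models are not comparable"
        )
    if len(a.vector) != len(b.vector):
        raise ValueError("vectors have different dimensions")
    return sum(x * y for x, y in zip(a.vector, b.vector))

old = StoredEmbedding("doc-1", "embedder-v1", (0.6, 0.8))
same_model = StoredEmbedding("doc-2", "embedder-v1", (0.8, 0.6))
new = StoredEmbedding("doc-1", "embedder-v2", (0.6, 0.8))

print(dot_product(old, same_model))  # allowed: same model on both sides

try:
    dot_product(old, new)            # identical numbers, different model
except ValueError as err:
    print("rejected:", err)
```

Note that the check rejects the second comparison even though the numbers are identical: equal values from two different models do not imply equal meaning.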

An embedding is only as good as the model that produced it, and because vectors from different models cannot share an index, switching models means re-embedding everything you have stored. On a small corpus this is a weekend job. On a corpus with millions of documents it is a real migration with cost, downtime and re-indexing implications, so the choice of embedding model deserves the same care as the choice of LLM on top.
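The shape of such a migration can be sketched in a few lines: re-embed the corpus in batches into a fresh index, and only swap it in once every document has a new vector, so old and new vectors are never mixed. Everything here is an assumption for illustration: `embed_batch` stands in for whatever call your new model exposes, `fake_embed_batch` is a placeholder that returns meaningless numbers, and a plain dictionary plays the role of the vector store.

```python
from typing import Callable, Dict, Iterator, List

Vector = List[float]

def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def reembed_corpus(
    documents: Dict[str, str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 2,
) -> Dict[str, Vector]:
    """Build a complete new index with the new model; never mutate the old one."""
    new_index: Dict[str, Vector] = {}
    for id_batch in batched(list(documents), batch_size):
        texts = [documents[doc_id] for doc_id in id_batch]
        vectors = embed_batch(texts)
        if len(vectors) != len(texts):
            raise RuntimeError("model returned the wrong number of vectors")
        new_index.update(zip(id_batch, vectors))
    return new_index

def fake_embed_batch(texts: List[str]) -> List[Vector]:
    # Placeholder for a real model call; the values carry no meaning.
    return [[float(len(t)), float(t.count(" "))] for t in texts]

documents = {
    "a": "reset your password",
    "b": "quarterly revenue",
    "c": "account recovery steps",
}

new_index = reembed_corpus(documents, fake_embed_batch)

# Swap only when the new index covers the whole corpus.
if set(new_index) == set(documents):
    live_index = new_index
```

Building the new index alongside the old one costs extra storage for the duration of the migration, but it lets queries keep running against a consistent set of vectors until the swap.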
