Beyond sounding cool, a vector database is a specialized type of database designed to store, index, and search high-dimensional vectors: numerical representations of data such as text, images, audio, or video. In AI applications, these vectors are usually embeddings generated by machine learning models, which spread semantic features across many dimensions so that inputs with similar meaning end up close together. Instead of matching data by exact keywords (as in SQL text search), vector databases support semantic similarity search: finding items that are conceptually close, even if they don’t share exact words or identifiers.
| Feature | Relational DB | Vector DB |
|---|---|---|
| Query type | Exact match, filtering, joins | Approximate nearest neighbor (ANN) search |
| Indexing | B-trees, hash indexes | Graph-based (HNSW), quantization-based (IVF-PQ) |
| Data type | Text, numbers, structured data | Dense vectors (float32/float16) |
| Best for | Transactional and structured queries | Similarity search across unstructured data |
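The IVF in IVF-PQ stands for "inverted file": vectors are grouped into clusters around centroids, and a query scans only the few clusters whose centroids are nearest to it instead of the whole collection. The sketch below illustrates just that idea in pure Python, under stated assumptions: the centroids are hard-coded (a real index learns them, typically with k-means), distance is squared L2, and the PQ (product quantization) compression step is omitted entirely. `build_ivf` and `ivf_search` are hypothetical helpers, not a library API.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: squared_l2(query, vectors[vid]))[:k]


# Toy 2-d data; in a real index the centroids are learned, not chosen by hand.
vectors = {"a": [0.0, 0.1], "b": [0.2, 0.0], "c": [5.0, 5.1], "d": [4.9, 5.2]}
centroids = [[0.1, 0.05], [5.0, 5.1]]
lists = build_ivf(vectors, centroids)
print(lists)                                              # {0: ['a', 'b'], 1: ['c', 'd']}
print(ivf_search([5.0, 5.0], vectors, centroids, lists))  # ['c']
```

Raising `nprobe` scans more lists, trading speed for recall; with `nprobe` equal to the number of lists the search becomes exhaustive.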
Vector databases are to AI what SQL databases were to business software in the 1980s — the foundational infrastructure that turns theory into production at scale.
As AI continues to integrate into enterprise search, recommendation, and decision-making systems, vector databases will underpin how machines “remember” and “understand” information.
Scenario
["red running shoes", "blue hiking boots", "black leather sandals", "green trail sneakers"]
We want a user who searches for “scarlet jogging trainers” to get “red running shoes” back, even though the two phrases share no keywords.
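To see why keyword matching fails here, consider a naive matcher that returns products sharing at least one lowercase word with the query. This is a minimal sketch with a hypothetical `keyword_search` helper, not how any particular SQL engine implements text search:

```python
# Hypothetical naive keyword search: match on shared lowercase words.
products = ["red running shoes", "blue hiking boots",
            "black leather sandals", "green trail sneakers"]


def keyword_search(query, docs):
    """Return the docs that share at least one lowercase token with the query."""
    query_terms = set(query.lower().split())
    return [doc for doc in docs if query_terms & set(doc.lower().split())]


print(keyword_search("scarlet jogging trainers", products))  # prints []
print(keyword_search("red shoes", products))                 # prints ['red running shoes']
```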
We embed each product name with a pre-trained embedding model, e.g. sentence-transformers/all-MiniLM-L6-v2, which maps text to 384-dimensional vectors (only the first five components are shown):
red running shoes [0.12, -0.03, 0.45, 0.67, -0.22, ...]
blue hiking boots [0.02, 0.48, 0.35, -0.41, 0.19, ...]
black leather sandals [-0.33, 0.12, 0.14, 0.55, 0.08, ...]
green trail sneakers [0.15, -0.01, 0.51, 0.60, -0.20, ...]
The embedding model transforms text into points in high-dimensional space where semantically similar items are close together.
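Before bringing in a database, the idea can be checked with exact (brute-force) nearest-neighbor search: score every product against the query by cosine similarity and keep the best k. This sketch uses only the five components listed above, plus the first five components of the query embedding from the search step further down, so its scores will not match full 384-dimensional ones; only the ranking is meaningful. `cosine` and `top_k` are hypothetical helper names.

```python
import math

# First five components of each embedding as listed above (truncated, not full 384-d vectors).
catalog = {
    "red running shoes": [0.12, -0.03, 0.45, 0.67, -0.22],
    "blue hiking boots": [0.02, 0.48, 0.35, -0.41, 0.19],
    "black leather sandals": [-0.33, 0.12, 0.14, 0.55, 0.08],
    "green trail sneakers": [0.15, -0.01, 0.51, 0.60, -0.20],
}
# First five components of the "scarlet jogging trainers" query embedding.
query = [0.11, -0.05, 0.46, 0.68, -0.23]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, items, k=2):
    """Exact nearest-neighbour search: score every item, return the k best (score, name) pairs."""
    scored = sorted(((cosine(query_vec, vec), name) for name, vec in items.items()), reverse=True)
    return scored[:k]


for score, name in top_k(query, catalog):
    print(f"{name}: {score:.3f}")
```

Brute force costs one distance computation per stored vector per query; that linear cost is what ANN indexes such as HNSW are designed to avoid at scale.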
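We insert these vectors into a vector database such as Milvus, Pinecone, or Weaviate. With Milvus and its pymilvus client, we connect, define a schema, create the collection, and insert the embeddings (`product_vecs` below is a placeholder name for the four 384-dimensional product embeddings):

```python
from pymilvus import connections, CollectionSchema, FieldSchema, DataType, Collection

connections.connect(host="localhost", port="19530")  # assumes a Milvus server running locally

id_field = FieldSchema(name="product_id", dtype=DataType.INT64, is_primary=True)
vector_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384)
schema = CollectionSchema(fields=[id_field, vector_field], description="Product catalog")
collection = Collection("products", schema)

# Column-based insert: one list per field; product_vecs is a placeholder for the 4 embeddings
collection.insert([[0, 1, 2, 3], product_vecs])
```

Next, we build an Approximate Nearest Neighbor (ANN) index, e.g. HNSW (Hierarchical Navigable Small World graph):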
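```python
index_params = {
    "metric_type": "COSINE",  # Milvus metric names are uppercase: "L2", "IP", "COSINE"
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 200},
}
collection.create_index("embedding", index_params)
collection.load()  # the collection must be loaded into memory before it can be searched
```

- M: maximum number of connections per node (graph density); higher values improve recall but use more memory.
- efConstruction: size of the candidate list while building the index; higher values give a better-quality graph (accuracy) at the cost of slower builds (speed).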
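To make M, efConstruction, and the query-time ef more concrete, here is a simplified, single-layer navigable small-world graph in pure Python. It is not HNSW itself: it omits the hierarchy of layers, the neighbor-selection heuristic, and the cap on node degree, and `SimpleNSW` is a hypothetical class. Each new vector is linked to the `M` closest nodes found by a beam search of width `ef_construction`; queries run the same beam search with width `ef`, so larger values explore more of the graph at the cost of more distance computations.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity, so smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


class SimpleNSW:
    """Hypothetical single-layer navigable small-world graph (HNSW without the layers)."""

    def __init__(self, M=16, ef_construction=200):
        self.M = M                              # edges added per inserted node
        self.ef_construction = ef_construction  # beam width while inserting
        self.vectors = []                       # node id -> vector
        self.neighbors = []                     # node id -> set of neighbour ids (no degree cap)

    def _beam_search(self, query, ef):
        """Best-first search from node 0, keeping the ef closest nodes seen so far."""
        d0 = cosine_distance(query, self.vectors[0])
        visited = {0}
        candidates = [(d0, 0)]  # min-heap: closest unexpanded node on top
        best = [(-d0, 0)]       # max-heap via negation: furthest kept node on top
        while candidates:
            dist, node = heapq.heappop(candidates)
            if dist > -best[0][0]:
                break  # closest candidate is worse than every kept result
            for nb in self.neighbors[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                d = cosine_distance(query, self.vectors[nb])
                if len(best) < ef or d < -best[0][0]:
                    heapq.heappush(candidates, (d, nb))
                    heapq.heappush(best, (-d, nb))
                    if len(best) > ef:
                        heapq.heappop(best)  # drop the furthest kept node
        return sorted((-neg_d, n) for neg_d, n in best)  # (distance, node id), closest first

    def add(self, vector):
        node = len(self.vectors)
        self.vectors.append(vector)
        self.neighbors.append(set())
        if node > 0:
            # Link the new node to the M closest nodes found with beam width ef_construction.
            for _, other in self._beam_search(vector, self.ef_construction)[: self.M]:
                self.neighbors[node].add(other)
                self.neighbors[other].add(node)
        return node

    def search(self, query, k=2, ef=64):
        """Approximate k nearest neighbours, exploring with beam width ef (at least k)."""
        return self._beam_search(query, max(ef, k))[:k]


# Truncated 5-d vectors from the article, in order: red, blue, black, green.
vectors = [
    [0.12, -0.03, 0.45, 0.67, -0.22],
    [0.02, 0.48, 0.35, -0.41, 0.19],
    [-0.33, 0.12, 0.14, 0.55, 0.08],
    [0.15, -0.01, 0.51, 0.60, -0.20],
]
query = [0.11, -0.05, 0.46, 0.68, -0.23]

index = SimpleNSW(M=2, ef_construction=4)
for vec in vectors:
    index.add(vec)
print(index.search(query, k=2, ef=4))
```

With only four vectors every node is reached; the benefit of the graph appears with large collections, where a small `ef` visits only a tiny fraction of the nodes.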
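We embed the user query “scarlet jogging trainers” with the same model and run an ANN search:

```python
# Query embedding (truncated here; the full vector has 384 components)
query_vec = [0.11, -0.05, 0.46, 0.68, -0.23, ...]

# ef: size of the candidate list at query time (recall vs. latency)
search_params = {"metric_type": "COSINE", "params": {"ef": 64}}
results = collection.search([query_vec], "embedding", search_params, limit=2)
for hit in results[0]:
    print(hit.id, hit.distance)  # with COSINE, a higher score means more similar
```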
Scored by cosine similarity to the query, the products compare as follows:
Query vs. “red running shoes”: 0.94
Query vs. “green trail sneakers”: 0.88
Query vs. others: <0.60
Result: “red running shoes” is top-ranked.
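For reference, the cosine similarity between a query embedding q and a product embedding d, both 384-dimensional here, is:

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{384} q_i d_i}
         {\sqrt{\sum_{i=1}^{384} q_i^2} \, \sqrt{\sum_{i=1}^{384} d_i^2}}
```

When embeddings are L2-normalized to unit length, both norms equal 1 and cosine similarity reduces to the plain dot product.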