Embedding Models & Vector Databases
Embedding Model Comparison
3 min read
The embedding model is the foundation of semantic search. Choosing the right model dramatically impacts retrieval quality.
What Embeddings Do
Embeddings convert text into dense vectors that capture semantic meaning:
```python
from openai import OpenAI

client = OpenAI()

# Same meaning, different words → similar vectors
text1 = "The cat sat on the mat"
text2 = "A feline rested on the rug"

emb1 = client.embeddings.create(input=text1, model="text-embedding-3-small").data[0].embedding
emb2 = client.embeddings.create(input=text2, model="text-embedding-3-small").data[0].embedding

# Cosine similarity between emb1 and emb2 will be high (~0.85+)
```
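To make "similar vectors" concrete, here is a minimal cosine-similarity check for the two embeddings above; it assumes the `emb1`/`emb2` lists from the previous snippet and uses NumPy, which is not part of the original example.

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(emb1, emb2))  # high, e.g. ~0.85+
```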
Model Categories
| Category | Examples | Best For |
|---|---|---|
| Commercial APIs | OpenAI, Cohere, Voyage | Production, ease of use |
| Open Source | BGE, E5, GTE | Privacy, cost control, customization |
| Domain-Specific | Legal-BERT, BioBERT | Specialized domains |
Commercial API Models
OpenAI Embeddings
```python
from openai import OpenAI

client = OpenAI()

def embed_openai(texts: list[str], model: str = "text-embedding-3-small"):
    response = client.embeddings.create(input=texts, model=model)
    return [item.embedding for item in response.data]

# Models available:
# text-embedding-3-small: 1536 dims, $0.02/1M tokens
# text-embedding-3-large: 3072 dims, $0.13/1M tokens
# text-embedding-ada-002: 1536 dims (legacy)
```
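The text-embedding-3 models also accept an optional `dimensions` parameter that truncates vectors to a smaller size, trading a little quality for cheaper storage and faster search. A minimal sketch (the 256 value is just an example):

```python
from openai import OpenAI

client = OpenAI()

# Request truncated vectors from a text-embedding-3 model
response = client.embeddings.create(
    input="The cat sat on the mat",
    model="text-embedding-3-large",
    dimensions=256,  # smaller vectors → lower storage and search cost
)
print(len(response.data[0].embedding))  # 256
```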
Cohere Embed
```python
import cohere

co = cohere.Client()

def embed_cohere(texts: list[str], input_type: str = "search_document"):
    response = co.embed(
        texts=texts,
        model="embed-english-v3.0",
        input_type=input_type,  # "search_document" or "search_query"
    )
    return response.embeddings

# Using separate modes for documents vs. queries improves retrieval
```
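To illustrate why the two input types matter, the sketch below reuses the `embed_cohere` helper above plus NumPy (both assumptions beyond the original snippet): documents are embedded with `search_document`, the question with `search_query`, and passages are ranked by cosine similarity.

```python
import numpy as np

docs = [
    "Refund policy: items can be returned within 30 days.",
    "Shipping usually takes 3-5 business days.",
]
doc_vecs = np.array(embed_cohere(docs, input_type="search_document"))
query_vec = np.array(embed_cohere(["How long do I have to return an item?"],
                                  input_type="search_query")[0])

# Normalize, then dot product = cosine similarity
doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_vec = query_vec / np.linalg.norm(query_vec)
scores = doc_vecs @ query_vec
print(docs[int(scores.argmax())])  # → the refund/returns passage
```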
Voyage AI
```python
import voyageai

vo = voyageai.Client()

def embed_voyage(texts: list[str]):
    response = vo.embed(
        texts,
        model="voyage-large-2",
        input_type="document",  # or "query"
    )
    return response.embeddings

# Known for excellent code and legal domain performance
```
Open Source Models
BGE (BAAI General Embedding)
```python
from sentence_transformers import SentenceTransformer

# English BGE model; for multilingual use, see BAAI/bge-m3
model = SentenceTransformer('BAAI/bge-large-en-v1.5')

def embed_bge(texts: list[str]):
    # Passages are encoded as-is; normalized vectors allow cosine similarity via dot product
    return model.encode(texts, normalize_embeddings=True)

def embed_bge_query(query: str):
    # BGE recommends prefixing queries with an instruction for better retrieval
    instruction = "Represent this sentence for searching relevant passages: "
    return model.encode(instruction + query, normalize_embeddings=True)
```
E5 (EmbEddings from bidirEctional Encoder rEpresentations)
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('intfloat/e5-large-v2')

def embed_e5(texts: list[str], is_query: bool = False):
    # E5 requires "query: " / "passage: " prefixes
    if is_query:
        texts = [f"query: {t}" for t in texts]
    else:
        texts = [f"passage: {t}" for t in texts]
    return model.encode(texts, normalize_embeddings=True)
```
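A quick usage sketch of the helper above (the passages and query are invented examples): because the vectors are normalized, a plain dot product gives cosine similarity for ranking.

```python
passages = ["The Eiffel Tower is in Paris.", "Mount Fuji is Japan's tallest peak."]
passage_vecs = embed_e5(passages)  # "passage: " prefix applied
query_vec = embed_e5(["Where is the Eiffel Tower?"], is_query=True)[0]

# Normalized vectors → dot product equals cosine similarity
scores = passage_vecs @ query_vec
print(passages[int(scores.argmax())])  # → "The Eiffel Tower is in Paris."
```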
MTEB Benchmark Comparison
The Massive Text Embedding Benchmark (MTEB) provides standardized comparisons:
| Model | MTEB Score | Dimensions | Speed |
|---|---|---|---|
| voyage-large-2 | 68.28 | 1536 | Fast (API) |
| text-embedding-3-large | 64.59 | 3072 | Fast (API) |
| bge-large-en-v1.5 | 64.23 | 1024 | Medium |
| text-embedding-3-small | 62.26 | 1536 | Fast (API) |
| e5-large-v2 | 62.25 | 1024 | Medium |
Note: MTEB scores vary by task type. Check retrieval-specific benchmarks for RAG.
Choosing the Right Model
```
START
  │
  ▼
Data must stay on-premise?
  │
  ├─ YES → Open source (BGE, E5)
  │
  ▼
Specialized domain (legal, medical, code)?
  │
  ├─ YES → Domain-specific or Voyage
  │
  ▼
Multilingual requirements?
  │
  ├─ YES → BGE-M3 or Cohere multilingual
  │
  ▼
Budget constrained?
  │
  ├─ YES → text-embedding-3-small or open source
  │
  ▼
Default → text-embedding-3-small (best cost/quality ratio)
```
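The same flow can be codified in a few lines. This is only a sketch of the flowchart above, with hypothetical flag names, not a definitive selection policy.

```python
def choose_embedding_model(
    on_premise: bool = False,
    specialized_domain: bool = False,
    multilingual: bool = False,
    budget_constrained: bool = False,
) -> str:
    """Mirror the decision tree: the first matching constraint wins."""
    if on_premise:
        return "open source (BGE, E5)"
    if specialized_domain:
        return "domain-specific model or Voyage"
    if multilingual:
        return "BGE-M3 or Cohere multilingual"
    if budget_constrained:
        return "text-embedding-3-small or open source"
    return "text-embedding-3-small"  # default: best cost/quality ratio
```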
Implementation Tips
```python
class EmbeddingManager:
    """Manage embeddings with batching and caching."""

    def __init__(self, model_name: str, batch_size: int = 100):
        self.model_name = model_name
        self.batch_size = batch_size
        self.cache: dict[str, list[float]] = {}

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Only embed texts we haven't seen before
        uncached = [t for t in texts if t not in self.cache]
        if uncached:
            # Batch requests for efficiency
            for i in range(0, len(uncached), self.batch_size):
                batch = uncached[i:i + self.batch_size]
                embeddings = self._embed_batch(batch)
                for text, emb in zip(batch, embeddings):
                    self.cache[text] = emb
        return [self.cache[t] for t in texts]

    def _embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Implementation depends on the model/provider; override in a subclass
        raise NotImplementedError
```
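For example, a provider-specific subclass could supply `_embed_batch`. The `OpenAIEmbeddingManager` below is an illustrative assumption, not part of the original design.

```python
from openai import OpenAI

class OpenAIEmbeddingManager(EmbeddingManager):
    """Hypothetical subclass backing _embed_batch with the OpenAI embeddings API."""

    def __init__(self, model_name: str = "text-embedding-3-small", batch_size: int = 100):
        super().__init__(model_name, batch_size)
        self.client = OpenAI()

    def _embed_batch(self, texts: list[str]) -> list[list[float]]:
        response = self.client.embeddings.create(input=texts, model=self.model_name)
        return [item.embedding for item in response.data]

manager = OpenAIEmbeddingManager()
vectors = manager.embed(["chunk one", "chunk two"])
vectors_again = manager.embed(["chunk one"])  # served from the cache, no API call
```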
Cost Tip: OpenAI's text-embedding-3-small offers the best cost-to-quality ratio for most use cases. Only upgrade to larger models if retrieval quality demonstrably improves on your data.
Next, let's explore vector database options for storing and searching embeddings.