pgvector HNSW Postgres 18 Production Tuning Tutorial 2026
May 16, 2026
TL;DR
A production HNSW index in pgvector 0.8.2 on PostgreSQL 18 is six SQL statements, three GUCs, and one quantization trick away — but the defaults are tuned for a demo, not your workload. This tutorial walks through a Dockerised setup, builds an HNSW index in parallel, sweeps ef_search against measured recall, halves storage with halfvec, and uses pgvector 0.8's iterative scans to keep recall high under selective WHERE filters. Every code block has been written against pgvector/pgvector:0.8.2-pg18; copy-paste them in order and the EXPLAIN plans will match the structure described in each step.
The default hnsw.ef_search is 40, the default m is 16, and the default ef_construction is 64. Those are the right starting point — they are also why most of the "pgvector is slow" posts are wrong: nobody touched them. By the end of this guide, you'll know exactly how to move each one for your dataset and what it costs in latency, memory, and recall.
What you'll learn
- How to install pgvector 0.8.2 on PostgreSQL 18 with a pinned Docker image and why CVE-2026-3172 makes that version floor non-negotiable.
- How HNSW's `m`, `ef_construction`, and `ef_search` parameters map to recall, build time, and query latency.
- How to build an HNSW index in parallel with `max_parallel_maintenance_workers` and a properly sized `maintenance_work_mem`.
- How to run a reproducible recall-vs-`ef_search` sweep and pick a value for your traffic.
- How `halfvec` quantization halves index storage with negligible recall loss on 1536-dimensional OpenAI embeddings.
- How `hnsw.iterative_scan = 'relaxed_order'` rescues filtered vector queries that would otherwise return empty result sets.
- How to maintain HNSW indexes in production with `REINDEX INDEX CONCURRENTLY` and `VACUUM`.
Prerequisites
- Docker Desktop 4.30+ (or Docker Engine 24+) for the Postgres container.
- `psql` 16+ on the host (the client ships with most Postgres installs and Homebrew's `libpq` formula).
- Roughly 4 GB of free RAM if you follow the parallel-build section.
- About 15 minutes, and either an OpenAI API key for `text-embedding-3-small` (used here as the canonical 1536-dim example, ~$0.02 per 1M input tokens1) or any other source of 1536-dimensional float vectors.
The pinned versions used throughout:
| Component | Version | Notes |
|---|---|---|
| PostgreSQL | 18.3 | Out-of-cycle release dated 2026-02-262 |
| pgvector | 0.8.2 | Released 2026-02-26; fixes CVE-2026-31723 |
| Docker image | pgvector/pgvector:0.8.2-pg18 | Multi-arch (amd64/arm64) on Docker Hub4 |
Step 1: Run PostgreSQL 18 + pgvector 0.8.2 in Docker
The official pgvector/pgvector image is a pre-built PostgreSQL base image with the extension already compiled inside. Use the exact 0.8.2-pg18 tag — pg18 alone tracks "latest" and will drift the day pgvector 0.8.3 ships.
Create docker-compose.yml in an empty directory:
services:
db:
image: pgvector/pgvector:0.8.2-pg18
container_name: pgvector-tuning
environment:
POSTGRES_USER: pgv
POSTGRES_PASSWORD: pgv
POSTGRES_DB: vectors
ports:
- "5433:5432"
shm_size: 2gb
command: >
postgres
-c maintenance_work_mem=2GB
-c max_parallel_maintenance_workers=4
-c shared_buffers=1GB
-c effective_cache_size=3GB
volumes:
- pgvector_data:/var/lib/postgresql/data
volumes:
pgvector_data:
Three things worth flagging before you docker compose up:
- Port 5433 avoids a clash with any local Postgres you already have on 5432.
- `shm_size: 2gb` matches `maintenance_work_mem`. Parallel HNSW builds share memory through `/dev/shm`; without a matching `shm_size` the workers crash with cryptic OOM errors at the very end of an hours-long build.
- `max_parallel_maintenance_workers=4` raises the per-build cap from the Postgres default of 2. The actual ceiling is also bounded by `max_worker_processes` (default 8) and your vCPU count.
Start the container and enable the extension:
docker compose up -d
docker exec -it pgvector-tuning psql -U pgv -d vectors -c "CREATE EXTENSION IF NOT EXISTS vector;"
docker exec -it pgvector-tuning psql -U pgv -d vectors -c "SELECT extname, extversion FROM pg_extension WHERE extname='vector';"
Expected output:
extname | extversion
---------+------------
vector | 0.8.2
(1 row)
If you see 0.7.x or 0.8.0/0.8.1, you are on a vulnerable build. CVE-2026-3172 is a CVSS 8.1 buffer overflow that triggers during parallel HNSW index builds and can leak data from unrelated relations or crash the server.3 The fix is the upgrade to 0.8.2; the documented temporary workaround if you cannot upgrade immediately is to disable parallel maintenance workers entirely (SET max_parallel_maintenance_workers = 0), which removes the trigger at the cost of much slower index builds. This tutorial uses parallel builds deliberately, so version 0.8.2 is the floor.
Step 2: Schema, embeddings, and the right operator class
For the rest of the tutorial we'll use a documents table modeled after a real RAG corpus: a primary key, a metadata column for filtered search, and a 1536-dimensional vector column matching OpenAI's text-embedding-3-small output.1
CREATE TABLE documents (
id bigserial PRIMARY KEY,
source text NOT NULL,
language text NOT NULL,
content text NOT NULL,
embedding vector(1536) NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
For local benchmarking, generate a synthetic 100k-row dataset of random 1536-dimensional vectors (no need to normalize them — cosine distance is scale-invariant). Real embeddings are not random, but uniformly distributed random vectors are a fine stress test for index structure — they produce a worst-case "no clusters" graph that exposes tuning mistakes immediately:
INSERT INTO documents (source, language, content, embedding)
SELECT
  'synthetic-' || (gs % 10),
  CASE WHEN gs % 4 = 0 THEN 'ar' ELSE 'en' END,
  'doc ' || gs,
  (
    -- The "WHERE gs = gs" clause correlates the subquery with the outer row,
    -- forcing re-evaluation per row. An uncorrelated scalar subquery is planned
    -- as an InitPlan and runs exactly once — even with a volatile function like
    -- random() inside — which would give every row the same vector.
    SELECT array_agg((random() - 0.5)::real)::vector
    FROM generate_series(1, 1536)
    WHERE gs = gs
  )
FROM generate_series(1, 100000) AS gs;
That insert takes one to two minutes on a modern laptop and produces a few hundred megabytes of data — the embedding column dominates the table size.
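To check that claim on your own machine, the total on-disk footprint (heap plus the TOAST storage where large vector values land) is one query:

```sql
-- Table size including TOAST, which holds the out-of-line vector payloads.
SELECT pg_size_pretty(pg_total_relation_size('documents'));
```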
Now the critical decision: which distance operator? pgvector exposes four for vector and four matching ones for halfvec:
| Operator | Distance | Index op-class for vector | When to use |
|---|---|---|---|
<=> | Cosine | vector_cosine_ops | Modern LLM/embedding APIs (OpenAI, Cohere, Voyage) — their outputs are normalized; cosine is the canonical choice. |
<-> | L2 (Euclidean) | vector_l2_ops | Physical sensor data, anywhere magnitude matters. |
<#> | Negative inner product | vector_ip_ops | If your vectors are already L2-normalized, this is the fastest equivalent of cosine because the normalization step is skipped. |
<+> | L1 (Manhattan) | vector_l1_ops | Rare; useful for sparse-feature dictionaries and a handful of clustering algorithms. |
There is a subtle, expensive failure mode here: if the index op-class and the query operator don't agree, the planner silently falls back to a sequential scan. A million-row table that returns in single-digit milliseconds with a matching index can take tens of seconds with a mismatched one — the slowdown is dramatic enough that it usually surfaces as a production page rather than a code-review comment. Pick the operator before you build the index.
For this tutorial we'll use cosine, the right default for OpenAI-style embeddings.
Step 3: Build the HNSW index with parallel workers
The naive single-statement build does the right thing but blocks writes on the table for the duration of the build, which can be tens of minutes for a few million rows. In production, always use CREATE INDEX CONCURRENTLY — pgvector supports it for both HNSW and IVFFlat.
The default HNSW parameters from the pgvector README5 are:
| Parameter | Default | What it controls |
|---|---|---|
m | 16 | Maximum number of bidirectional connections per layer in the graph. Larger = higher recall, larger index, slower build. |
ef_construction | 64 | Size of the dynamic candidate list during build. Larger = better-quality graph, slower build. |
hnsw.ef_search | 40 | Query-time candidate list size. Larger = higher recall, higher query latency. |
For 1536-dimensional embeddings the community has converged on m=16 (don't change it unless you have a recall problem after exhausting ef_search) and ef_construction=128–200 as a reasonable production starting point.
Build the index:
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX CONCURRENTLY documents_embedding_hnsw_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
CREATE INDEX CONCURRENTLY cannot run inside a transaction block, which means you can't open a psql BEGIN first. Run it directly. On the 100k-row synthetic dataset above, the build completes in low single-digit minutes on a modern laptop with max_parallel_maintenance_workers=4; raising the worker count further (and max_worker_processes to match) typically cuts that time in half or better. Neon's parallel-build benchmark reports up to ~30× speedup at 1M vectors with 8 workers and a generously sized maintenance_work_mem.6
Two failure modes to plan for:
- `maintenance_work_mem` too small: once the in-memory graph outgrows the budget, pgvector emits a `NOTICE: hnsw graph no longer fits into maintenance_work_mem` and switches from in-memory to on-disk graph construction, which is dramatically slower. Watch `pg_stat_progress_create_index.tuples_done` for the per-second rate collapsing — that's the symptom. The fix is to raise `maintenance_work_mem` before retrying.
- `/dev/shm` too small: parallel workers communicate through shared memory. On Docker, the default `/dev/shm` is 64 MB. Without a matching `shm_size`, the build aborts at the merge phase with `could not resize shared memory segment`. The `docker-compose.yml` above sets `shm_size: 2gb`, which matches `maintenance_work_mem`. Keep them in lockstep.
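While the build runs, you can poll the progress view from a second psql session. A minimal probe (the columns shown are stock pg_stat_progress_create_index columns; sample it twice a minute apart to see the tuples-per-second rate):

```sql
-- Run from a second session while CREATE INDEX CONCURRENTLY is in flight.
SELECT phase,
       tuples_done,
       tuples_total,
       round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS pct_blocks
FROM pg_stat_progress_create_index;
```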
Verify the index exists and check its size:
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_indexes
JOIN pg_class c ON c.relname = indexname
JOIN pg_index ON pg_index.indexrelid = c.oid
WHERE tablename = 'documents';
Expected for 100k rows at 1536 dims with m=16: the index lands in the high-hundreds-of-megabytes range — the bulk of the size is the embedding payload duplicated into HNSW graph nodes plus the per-node connection lists. Extrapolated to 1M vectors at 1536 dims, an HNSW index on vector typically lands near 8 GB.7 That number drives every memory decision you'll make from here — the index has to fit in shared_buffers (or at least in the OS page cache reflected by effective_cache_size) or queries fall off a cliff as soon as graph pages get evicted.
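With the index in place, the op-class/operator mismatch from Step 2 is easy to demonstrate. A sketch, assuming the cosine index built above is the only vector index on the table — compare the two plans:

```sql
-- Matched: cosine operator against vector_cosine_ops → Index Scan on the HNSW index.
EXPLAIN
SELECT id FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 10;

-- Mismatched: L2 operator against a cosine-only index → Seq Scan plus Sort.
EXPLAIN
SELECT id FROM documents
ORDER BY embedding <-> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 10;
```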
Step 4: Tune ef_search — the recall-vs-latency sweep
The single most useful tuning exercise in pgvector is sweeping hnsw.ef_search against recall at fixed m and ef_construction. It is also the one almost no tutorial actually walks through.
The mechanic: hnsw.ef_search controls how many candidate vectors the search keeps in its priority queue while traversing the graph. The default of 40 is conservative; you can raise it to 1000.5 Recall rises monotonically with ef_search; query latency rises faster than linearly because each candidate costs a distance computation against the query vector.
Pick a query vector, define "ground truth" with a brute-force sequential scan, and measure HNSW recall@K at each ef_search:
-- A reference query: take row 12345's embedding and find its 50 nearest neighbours.
\set query_id 12345
-- Ground truth: exact scan, no index.
SET enable_indexscan = off;
CREATE TEMP TABLE gt_top50 AS
SELECT id FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = :query_id)
LIMIT 50;
SET enable_indexscan = on;
-- Recall@50 at ef_search = 10.
-- SET LOCAL only takes effect inside a transaction block (outside one it is a
-- no-op with a warning), so wrap each probe in BEGIN/COMMIT.
BEGIN;
SET LOCAL hnsw.ef_search = 10;
SELECT COUNT(*) AS hits FROM (
    SELECT id FROM documents
    ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = :query_id)
    LIMIT 50
) ann
WHERE id IN (SELECT id FROM gt_top50);
COMMIT;
Repeat the second block with SET LOCAL hnsw.ef_search set to 20, 40, 80, 160, 320, and 640. Plot recall on one axis and EXPLAIN (ANALYZE, BUFFERS) execution time on the other. The shape you will see, on every dataset large enough for HNSW to be worth using, is the same: recall climbs sharply between ef_search=10 and the default of 40, climbs more slowly to ~95–99% by ef_search in the 80–160 range, and asymptotes after that. Latency rises faster than linearly with ef_search because each additional candidate is a fresh distance computation against the query vector.
The practical takeaway: there is almost always a "knee" somewhere in the 80–200 range where each extra point of recall costs disproportionately more latency. Pick the smallest ef_search value that hits your recall target on representative queries; that's your production setting. Real embeddings often have clustered structure that HNSW navigates well, which typically pushes recall up at moderate ef_search values — your knee may land lower than a synthetic random-vector test suggests.
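If you'd rather not repeat the probe by hand, the whole sweep can be scripted server-side. A sketch, assuming the gt_top50 temp table from above still exists in the session; set_config(..., true) is the function form of SET LOCAL, and the literal 12345 is the same reference row used above:

```sql
DO $$
DECLARE
  ef   int;
  hits bigint;
BEGIN
  FOREACH ef IN ARRAY ARRAY[10, 20, 40, 80, 160, 320, 640] LOOP
    -- Transaction-local GUC change, equivalent to SET LOCAL.
    PERFORM set_config('hnsw.ef_search', ef::text, true);
    SELECT count(*) INTO hits FROM (
      SELECT id FROM documents
      ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 12345)
      LIMIT 50
    ) ann WHERE id IN (SELECT id FROM gt_top50);
    RAISE NOTICE 'ef_search=%  recall@50=%', ef, hits / 50.0;
  END LOOP;
END $$;
```

Pair the NOTICE output with \timing or EXPLAIN (ANALYZE) runs at the same values to get the latency axis.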
To make the change cluster-wide (requires superuser), use ALTER SYSTEM plus a config reload — ALTER ROLE ... SET hnsw.ef_search and ALTER DATABASE ... SET hnsw.ef_search are both rejected with permission denied to set parameter 'hnsw.ef_search' because of how pgvector registers the GUC.8
ALTER SYSTEM SET hnsw.ef_search = 80;
SELECT pg_reload_conf();
Or — preferred for application code — set it per-query inside a transaction:
BEGIN;
SET LOCAL hnsw.ef_search = 160;
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
COMMIT;
SET LOCAL is the right pattern for an application that has mixed traffic — a precise search endpoint with ef_search = 160 and a "more like this" sidebar with ef_search = 40 can coexist on the same connection pool without leaking state across requests.
Step 5: halfvec quantization — half the storage, equal recall
The single highest-ROI tuning move on most modern embedding workloads is switching the column type from vector to halfvec. halfvec stores each dimension in 16-bit IEEE half-precision instead of 32-bit float. Storage drops by half, the HNSW dimension limit doubles from 2000 to 4000, and on 1536-dim OpenAI-style embeddings the recall change is reported as near-identical to full precision.9
Add a halfvec column and a matching index:
ALTER TABLE documents ADD COLUMN embedding_h halfvec(1536);
UPDATE documents
SET embedding_h = embedding::halfvec(1536);
SET maintenance_work_mem = '2GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX CONCURRENTLY documents_embedding_h_hnsw_idx
ON documents
USING hnsw (embedding_h halfvec_cosine_ops)
WITH (m = 16, ef_construction = 128);
Two important details:
- The op-class name is
halfvec_cosine_ops, notvector_cosine_ops— using the wrong one against ahalfveccolumn will fail at index creation rather than silently fall back, which is the friendly failure mode. - The cast
::halfvec(1536)happens row by row; on a small dataset it completes in seconds, but on millions of rows it can dominate the migration window. For larger datasets, batch the conversion (or do it inside a single transaction with aLIMIT/OFFSETloop) or apply it as part of the initial ingestion path so new rows arrive inhalfvecform.
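For the millions-of-rows case, a server-side batched backfill might look like the following. This is a sketch: the id-range batching assumes a reasonably dense bigserial key, and COMMIT inside a DO block works only when the block is not itself wrapped in an outer transaction:

```sql
DO $$
DECLARE
  lo     bigint := 0;
  batch  constant int := 10000;
  max_id bigint;
BEGIN
  SELECT max(id) INTO max_id FROM documents;
  WHILE lo < max_id LOOP
    UPDATE documents
       SET embedding_h = embedding::halfvec(1536)
     WHERE id > lo AND id <= lo + batch
       AND embedding_h IS NULL;      -- idempotent: safe to re-run after a failure
    COMMIT;                          -- release locks, let autovacuum keep up
    lo := lo + batch;
  END LOOP;
END $$;
```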
Re-measure storage:
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_indexes
JOIN pg_class c ON c.relname = indexname
JOIN pg_index ON pg_index.indexrelid = c.oid
WHERE tablename = 'documents';
You should see the halfvec index come in meaningfully smaller than its vector counterpart. Jonathan Katz's reference figures place the gap at roughly 8 GB vs 3 GB for a 1M × 1536-dim corpus7 — about a 60% reduction on the index, driven by 50% smaller vector payloads inside the graph nodes plus better cache density once the index pages are smaller.10 If you're running multi-tenant on a single Postgres and storing tens of millions of embeddings, this is the difference between "needs a beefier instance" and "fits in shared_buffers."
If your embeddings come from a model that already outputs lower-precision values (e.g., some open-source Matryoshka-style models), you can go further to scalar (int8) or binary quantization with bit_hamming_ops — pgvector 0.7 introduced indexing on the bit type up to 64,000 dimensions.11 For most modern dense-embedding workloads, halfvec is the sweet spot.
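For reference, the binary-quantized variant is an expression index over pgvector's binary_quantize function, searched by Hamming distance. A sketch only (index name is illustrative, and in practice you re-rank the Hamming candidates against the original column to recover recall):

```sql
-- Binary quantization: 1 bit per dimension, bit_hamming_ops op-class.
CREATE INDEX CONCURRENTLY documents_embedding_bq_idx
ON documents
USING hnsw ((binary_quantize(embedding)::bit(1536)) bit_hamming_ops);
```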
Step 6: Filtered vector search and iterative scans
The hardest pgvector problem to debug in production isn't latency — it's empty result sets when you add a WHERE clause to a vector query. Pre-0.8, the planner fetched the top-ef_search candidates from the HNSW index and then applied your filter on top, which meant a query like "find the 10 nearest Arabic-language documents" could return 0 or 2 rows even though hundreds of matching documents existed in the corpus.12 pgvector 0.8.0 introduced iterative index scans specifically to fix this — the index keeps fetching candidates until the filter is satisfied or a tuple-scan budget is exhausted.13
Three modes, controlled by the hnsw.iterative_scan GUC (default off):
| Mode | Semantics | When to use |
|---|---|---|
off (default) | Original behaviour: take ef_search candidates, filter afterwards. | Backwards-compatible default; fine when filters are unselective (>50%). |
strict_order | Iteratively expand the candidate set while preserving exact distance ordering. | When you must preserve the exact nearest-first order even under filtering. |
relaxed_order | Iteratively expand with approximate ordering. Faster than strict_order. | The right default for most filtered RAG queries: a few near-equal candidates may swap positions, but recall is preserved. |
The candidate ceiling is governed by hnsw.max_scan_tuples (default 20000).14 If your filter is so selective that 20k tuples doesn't surface enough matches, raise it — or, more usefully, add a B-tree index on the filter column so the planner can pick a different strategy.
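That B-tree escape hatch is one statement; with it in place the planner can satisfy a highly selective filter from the B-tree and sort the survivors by exact distance instead of walking the HNSW graph (index name is illustrative):

```sql
CREATE INDEX CONCURRENTLY documents_language_idx
ON documents (language);
```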
Run the same nearest-neighbour query, restricted to Arabic-language documents:
-- Without iterative scans (default).
SET hnsw.iterative_scan = 'off';
EXPLAIN ANALYZE
SELECT id, source
FROM documents
WHERE language = 'ar'
ORDER BY embedding_h <=> (SELECT embedding_h FROM documents WHERE id = 12345)
LIMIT 10;
On our synthetic dataset, 25% of rows are language = 'ar', so the default behaviour returns ten results — but on a corpus where only 1% match the filter, you would typically get 2 or 3. Switch on iterative scans:
SET hnsw.iterative_scan = 'relaxed_order';
SET hnsw.max_scan_tuples = 20000;
EXPLAIN ANALYZE
SELECT id, source
FROM documents
WHERE language = 'ar'
ORDER BY embedding_h <=> (SELECT embedding_h FROM documents WHERE id = 12345)
LIMIT 10;
The EXPLAIN output now shows additional buffer reads (the index is being walked further) and consistently 10 rows returned. The cost is variable: filters that succeed cheaply add a small overhead; filters that require deep traversal cost meaningfully more. Measure on representative queries before flipping the global default.
The honest production advice: enable relaxed_order at the session level for the endpoints that perform filtered vector search, leave off for the unfiltered "more like this" endpoints, and reach for strict_order only when downstream code is sensitive to exact ranking.
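In SQL terms, a filtered endpoint's transaction would combine the two SET LOCAL calls — both GUC changes evaporate at COMMIT, so nothing leaks back into the pool. A sketch reusing the reference row from earlier:

```sql
BEGIN;
SET LOCAL hnsw.iterative_scan = 'relaxed_order';
SET LOCAL hnsw.ef_search = 40;
SELECT id, source
FROM documents
WHERE language = 'ar'
ORDER BY embedding_h <=> (SELECT embedding_h FROM documents WHERE id = 12345)
LIMIT 10;
COMMIT;
```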
Step 7: Maintaining HNSW in production
An HNSW index that performed well on day one can be meaningfully slower three months later, and the failure pattern is almost always the same: heavy updates and deletes have left the graph bloated, and the working set has spilled out of shared_buffers. The fix is mechanical:
REINDEX INDEX CONCURRENTLY documents_embedding_h_hnsw_idx;
VACUUM (ANALYZE) documents;
REINDEX INDEX CONCURRENTLY builds a new index alongside the old one and atomically swaps them, so concurrent queries are unaffected. It needs roughly the same maintenance_work_mem as the original build; if you tuned that GUC up for the initial build and then reverted, bump it back before reindexing.
Schedule the reindex during a known low-traffic window. Weekly is a common cadence; rate-limit it by index size so 50 GB tables don't all rebuild on the same Saturday. Pair it with VACUUM (ANALYZE) so the planner's row estimates stay accurate — pgvector 0.8.0 specifically improved how the planner picks between HNSW and B-tree indexes during filtered queries, and stale pg_class.reltuples will sabotage that.13
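To drive that size-based scheduling, you can list every HNSW index in the database with its size via the access-method catalog:

```sql
-- All HNSW indexes, largest first — feed this into the reindex scheduler.
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE am.amname = 'hnsw'
ORDER BY pg_relation_size(c.oid) DESC;
```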
Verification
If you have followed every step, the following queries should all succeed and produce the indicated output.
-- 1. Extension version is 0.8.2 (CVE-2026-3172 fixed).
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- Expected: 0.8.2
-- 2. Both indexes are HNSW with the right op-class.
SELECT
indexname,
indexdef
FROM pg_indexes
WHERE tablename = 'documents';
-- Expected: documents_embedding_hnsw_idx uses vector_cosine_ops;
-- documents_embedding_h_hnsw_idx uses halfvec_cosine_ops.
-- 3. The halfvec index is roughly half the size of the vector index.
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_indexes
JOIN pg_class c ON c.relname = indexname
JOIN pg_index ON pg_index.indexrelid = c.oid
WHERE tablename = 'documents';
-- 4. The planner picks the HNSW index for an unfiltered ORDER BY <=>.
EXPLAIN
SELECT id FROM documents
ORDER BY embedding_h <=> (SELECT embedding_h FROM documents WHERE id = 1)
LIMIT 10;
-- Expected: an "Index Scan using documents_embedding_h_hnsw_idx" node.
-- 5. Iterative scans are off by default in a fresh session.
SHOW hnsw.iterative_scan;
-- Expected: off
If any of those produce different output, the most likely culprits are: a stale Docker image (re-pull pgvector/pgvector:0.8.2-pg18), a maintenance_work_mem that was too small (the index still exists but is suboptimal — REINDEX it), or a query operator that doesn't match the index op-class (the EXPLAIN will say Seq Scan).
Troubleshooting
could not resize shared memory segment during CREATE INDEX CONCURRENTLY. Cause: Docker's /dev/shm is smaller than maintenance_work_mem. Fix: raise shm_size in docker-compose.yml to at least match the GUC, then restart the container.
ERROR: column cannot have more than 2000 dimensions for hnsw index when running CREATE INDEX. Cause: pgvector caps HNSW (and IVFFlat) on the vector type at 2000 dimensions because every index tuple has to fit inside PostgreSQL's 8 KB page. Fix: switch the column to halfvec, which doubles the limit to 4000 dimensions at half the bytes per dim; for the rare model that outputs >4000 dims, indexing on the bit type (binary quantization) extends the limit to 64,000.
Queries do an EXPLAIN ANALYZE Seq Scan despite the index existing. Cause: the query operator doesn't match the index op-class. A vector_cosine_ops index requires the <=> operator in the ORDER BY. Fix: change either the operator in the query or the op-class in the index — never assume the planner will pick "close enough."
Filtered queries return fewer rows than LIMIT. Cause: the default hnsw.iterative_scan = off plus a selective filter. Fix: SET hnsw.iterative_scan = 'relaxed_order' at the session level (or ALTER SYSTEM SET hnsw.iterative_scan = 'relaxed_order'; SELECT pg_reload_conf(); cluster-wide — pgvector's HNSW GUCs reject the role-level ALTER ROLE … SET form8), and raise hnsw.max_scan_tuples if you still see truncated results.
extversion shows 0.8.1 or earlier. Cause: a hosted Postgres (RDS, Aurora, Cloud SQL, Supabase) that hasn't promoted 0.8.2 yet, or a stale local image. Fix on hosted: check the provider's "available extensions" list; most rolled out 0.8.2 within weeks of the 2026-02-26 release. Fix locally: docker compose pull && docker compose up -d.
HNSW build hangs for hours on a few million rows. Cause: maintenance_work_mem is too small to hold the in-progress graph, so the build is paging to disk. Fix: bump maintenance_work_mem to at least 2× the eventual index size you measured at 100k rows (interpolated); if you can't fit, fall back to building the index outside the hot path on a logical replica and promoting it via pg_createsubscriber.15
When to pick something else
pgvector with HNSW is the right default for any workload where (a) your data already lives in Postgres or you want one operational system instead of two, (b) your vector count is at or below the low tens of millions, and (c) recall is the metric your product cares about. For a thoughtful comparison of the embedding models that feed into this pipeline, see our embedding models compared post.
If you're sitting on hundreds of millions of vectors with strict p99 latency targets in the single-millisecond range, a purpose-built ANN engine (Qdrant, Milvus, Vespa) will give you a tighter speed-recall curve at the cost of running a second database. If you're under 10 million vectors and value simplicity over the last 10% of performance, IVFFlat in pgvector is faster to build and uses less memory, at the cost of more tuning to keep recall up — pick HNSW unless you have a reason not to.
Next steps and further reading
- The companion post Zero-downtime Postgres 18 upgrade with pg_createsubscriber covers migrating from Postgres 17 to 18 in case you're not already on the target version.
- For connection-level concerns once pgvector is in production, Production Postgres pooling with PgBouncer and Supavisor walks through the pooler choices that interact with prepared-statement-heavy vector queries.
- If you also need real-time invalidation when embeddings change, Postgres LISTEN/NOTIFY for realtime presence shows the channel pattern that complements vector search nicely.
- The pgvector README5 is short, dense, and authoritative — re-read it after working through this tutorial; everything will land differently.
Footnotes
1. OpenAI text-embedding-3-small product page — 1536-dimensional output, $0.02 per 1M input tokens (Standard) / $0.01 (Batch), as of May 2026. https://openai.com/index/new-embedding-models-and-api-updates/
2. PostgreSQL 18.3 out-of-cycle release, February 26 2026 — addressed regressions introduced in the February 12, 2026 update for PostgreSQL 18.2 and earlier. https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-february-26-2026-3241/
3. pgvector 0.8.2 release announcement (2026-02-26) — fixes CVE-2026-3172, a CVSS 8.1 buffer overflow with parallel HNSW index builds; affects pgvector 0.6.0 through 0.8.1. https://www.postgresql.org/about/news/pgvector-082-released-3245/ ; NVD entry: https://nvd.nist.gov/vuln/detail/CVE-2026-3172
4. pgvector Docker Hub — `pgvector/pgvector:0.8.2-pg18` (multi-arch, last pushed February 2026). https://hub.docker.com/r/pgvector/pgvector/tags
5. pgvector README — canonical reference for `m` (default 16), `ef_construction` (default 64), `hnsw.ef_search` (default 40, max 1000), supported operators (`<=>`, `<->`, `<#>`, `<+>`), and op-classes (`vector_cosine_ops`, `vector_l2_ops`, `vector_ip_ops`, `vector_l1_ops`, plus matching `halfvec_*` and `bit_*`). https://github.com/pgvector/pgvector/blob/master/README.md
6. Neon — "pgvector: 30× faster index build for your vector embeddings" — benchmark of parallel HNSW builds at 1M vectors × 1536 dims with `max_parallel_maintenance_workers=8`. https://neon.com/blog/pgvector-30x-faster-index-build-for-your-vector-embeddings
7. Jonathan Katz — "Scalar and binary quantization for pgvector" — reports approximately 8 GB for a 1M × 1536-dim HNSW index built on `vector`, dropping to ~3 GB with `halfvec`. https://jkatz05.com/post/postgres/pgvector-scalar-binary-quantization/
8. pgvector issue #726, "options to specify hnsw.ef_search" — documents that `ALTER ROLE … SET hnsw.ef_search` and `ALTER DATABASE … SET hnsw.ef_search` are rejected with "permission denied to set parameter," and that `ALTER SYSTEM SET` plus `pg_reload_conf()` is the supported cluster-wide path. https://github.com/pgvector/pgvector/issues/726
9. Supabase — "What's new in pgvector 0.7.0" — `halfvec` reduces vector and index storage space by half with negligible recall change at typical embedding dimensions. https://supabase.com/blog/pgvector-0-7-0
10. AWS Database Blog — "Load vector embeddings up to 67× faster with pgvector and Amazon Aurora" — 67× HNSW build speedup with binary quantization on Aurora vs pgvector 0.5.1, and the 50%-storage figure for `halfvec`. https://aws.amazon.com/blogs/database/load-vector-embeddings-up-to-67x-faster-with-pgvector-and-amazon-aurora/
11. pgvector 0.7.0 release announcement — added `halfvec`, `sparsevec`, scalar/binary quantization helpers, and indexing on the `bit` type up to 64,000 dimensions. https://www.postgresql.org/about/news/pgvector-070-released-2852/
12. pgvector 0.8.0 release notes — describes the pre-0.8 filter-recall failure where `ef_search=40` and a 10%-selective filter typically yields only ~4 surviving rows. https://www.postgresql.org/about/news/pgvector-080-released-2952/
13. pgvector 0.8.0 — added iterative index scans (`hnsw.iterative_scan` / `ivfflat.iterative_scan`), improved cost estimation for HNSW vs B-tree planning under filters, and dropped support for PostgreSQL 12 (minimum is now PostgreSQL 13). https://www.postgresql.org/about/news/pgvector-080-released-2952/
14. `hnsw.max_scan_tuples` — pgvector 0.8 introduced this GUC to bound the iterative-scan budget; default 20000. https://docs.pgedge.com/pgvector/v0-8-0/iterative-index-scans/
15. PostgreSQL 18 `pg_createsubscriber` — bootstraps a logical replica from a physical replica, useful for zero-downtime upgrades and for offloading expensive index builds. https://www.postgresql.org/docs/18/app-pgcreatesubscriber.html