
Vector Indexes

Ferrosa runs approximate nearest-neighbour search over embeddings directly in the database. Choose the full-precision HNSW index for maximum recall, or the quantized HVQ index to read far fewer bytes per query when the index outgrows memory.

Beta: Vector indexing is under active development. HVQ (hybrid vector quantization) is a developer-preview path; the numbers below are reproducible from the in-tree evaluation harness.

Index strategies

A vector index answers "find the rows whose embedding is closest to this query vector". Ferrosa has three strategies. HNSW and HVQ are created through ordinary CQL DDL; IVFFlat is internal to the engine and has no DDL of its own.

Strategy	CQL	Best for
HNSW	`USING 'vector'` (default)	Highest recall. A navigable small-world graph; stores every vector in a sidecar.
IVFFlat	engine internal	k-means clustered lists; faster builds than HNSW.
HVQ	`USING 'vector' WITH OPTIONS = {'method':'hvq'}`	Near-HNSW recall while reading far fewer bytes per query — when the index is larger than memory or lives in object storage.

Beyond Cassandra: HVQ stores vectors as page-addressable quantized artifacts. A query routes to a few centroid lists and reads only the pages it needs, instead of materializing the whole index — the foundation for serving indexes that live in S3.
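The routing step described above can be sketched in a few lines of Rust. This is an illustration only, not Ferrosa's implementation: the function names, the use of squared Euclidean distance, and the in-memory centroid list are all assumptions made for the example.

```rust
/// Squared Euclidean distance between two equal-length vectors.
/// (Assumed metric for this sketch; the engine's metric may differ.)
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return the indices of the `nprobe` centroids closest to `query`,
/// nearest first. Only the lists behind these centroids would be read.
fn route_to_lists(query: &[f32], centroids: &[Vec<f32>], nprobe: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (squared_l2(query, c), i))
        .collect();
    // total_cmp gives a total order on f32, so the sort is well defined even with NaN.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(nprobe).map(|(_, i)| i).collect()
}
```

With 12 centroids and `nprobe = 4`, as in the evaluation further down, a reader following this scheme would touch the pages of 4 lists and skip the other 8.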

Quantization & staged rerank

HVQ compresses each vector with scalar quantization, trading a little precision for a large reduction in size and bytes moved. Multiple code widths are available:

Codec	Bits / dim	Role
Q8	8	Refinement tier — 1 byte per dimension.
Q4	4	Candidate tier — 2 dimensions per byte.
Q2	2	Coarse routing (behind a benchmark gate).
Q1	1	Experimental ultra-low-bit.
F32	32	Optional exact-rerank tier for survivors.
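To make the code widths concrete, the helper below computes the payload size of one vector at a given number of bits per dimension. It is a back-of-the-envelope sketch: `bytes_per_vector` is a hypothetical name, and it ignores headers, padding, and any per-list metadata the real format may carry.

```rust
/// Bytes needed to store one vector of `dims` dimensions at `bits` bits per
/// dimension, rounded up to a whole byte. Payload only; headers are ignored.
fn bytes_per_vector(dims: usize, bits: usize) -> usize {
    (dims * bits + 7) / 8
}
```

For a 16-dimension vector this gives 64 bytes at F32, 16 at Q8, 8 at Q4, 4 at Q2, and 2 at Q1.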

Search is staged: cheap quantized codes narrow the candidate set, then an exact rerank over the survivors restores ranking quality. Because the reader only fetches the pages for the probed lists, the bytes it moves scale with the number of lists probed, not with the total size of the index.
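A minimal sketch of the two stages, assuming a single 8-bit tier, a fixed value range `[lo, hi]` with `hi > lo`, and squared Euclidean distance. None of these names or choices come from the Ferrosa codebase; they only illustrate the narrow-then-rerank idea.

```rust
/// Map each value in `[lo, hi]` to an 8-bit code; values outside are clamped.
/// Assumes `hi > lo`.
fn quantize_q8(v: &[f32], lo: f32, hi: f32) -> Vec<u8> {
    let scale = 255.0 / (hi - lo);
    v.iter()
        .map(|&x| ((x.clamp(lo, hi) - lo) * scale).round() as u8)
        .collect()
}

/// Approximate squared distance computed directly on the 8-bit codes.
fn code_distance(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i32 - y as i32;
            (d * d) as u32
        })
        .sum()
}

/// Exact squared Euclidean distance on the original vectors.
fn exact_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Stage 1: keep the `shortlist` rows with the smallest code distance.
/// Stage 2: rerank only those survivors exactly and return `k` row ids.
fn staged_search(
    query: &[f32],
    rows: &[Vec<f32>],
    codes: &[Vec<u8>],
    lo: f32,
    hi: f32,
    shortlist: usize,
    k: usize,
) -> Vec<usize> {
    let qcode = quantize_q8(query, lo, hi);
    let mut coarse: Vec<(u32, usize)> = codes
        .iter()
        .enumerate()
        .map(|(i, c)| (code_distance(&qcode, c), i))
        .collect();
    coarse.sort();
    let mut exact: Vec<(f32, usize)> = coarse
        .into_iter()
        .take(shortlist)
        .map(|(_, i)| (exact_distance(query, &rows[i]), i))
        .collect();
    exact.sort_by(|a, b| a.0.total_cmp(&b.0));
    exact.into_iter().take(k).map(|(_, i)| i).collect()
}
```

The full-precision rows are touched only for the shortlist, which is why a larger `shortlist` buys ranking quality at the cost of more exact-tier reads.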

CQL reference

Vector columns

-- A fixed-dimension float vector column
CREATE TABLE documents (
  id int PRIMARY KEY,
  embedding vector<float, 4>
);

Creating a vector index

-- Default: full-precision HNSW
CREATE INDEX docs_ann ON documents (embedding) USING 'vector';

-- Or quantized HVQ: select the method explicitly.
-- Create one index or the other; both statements use the name docs_ann.
CREATE INDEX docs_ann ON documents (embedding)
  USING 'vector' WITH OPTIONS = {'method': 'hvq'};

Note: method accepts 'hnsw' (the default) or 'hvq'. Any other value is rejected at DDL time — there is no silent fallback.
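The "reject, never fall back" rule can be pictured as a strict parser for the `method` option. The sketch below is hypothetical: the `IndexMethod` enum, the `parse_method` function, the error text, and the case-sensitive matching are assumptions for illustration, not the engine's actual validation code.

```rust
/// Index methods that the `method` option accepts, per the note above.
#[derive(Debug, PartialEq)]
enum IndexMethod {
    Hnsw,
    Hvq,
}

/// Parse the optional `method` value. A missing option means HNSW (the
/// default); an unrecognized value is an error rather than a fallback.
/// Matching is case-sensitive here, which is an assumption of this sketch.
fn parse_method(value: Option<&str>) -> Result<IndexMethod, String> {
    match value {
        None | Some("hnsw") => Ok(IndexMethod::Hnsw),
        Some("hvq") => Ok(IndexMethod::Hvq),
        Some(other) => Err(format!("unknown vector index method '{other}'")),
    }
}
```

Returning an error for every unknown string means a typo such as `'hqv'` fails the DDL statement instead of quietly building an HNSW index.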

Nearest-neighbour query

-- Return the 3 rows closest to the query vector
SELECT id FROM documents
  ORDER BY embedding ANN OF [0.90, 0.10, 0.00, 0.00] LIMIT 3;

Evaluation: HNSW vs HVQ

The figures below come from the in-tree harness ferrosa-index/tests/eval_comparison.rs. It runs both indexes over a shared clustered corpus of 192 vectors (16 dimensions), issues 18 queries with 4 of 12 lists probed, and scores the results against exact brute-force truth. Reproduce with:

cargo test -p ferrosa-index --test eval_comparison -- --nocapture

Index	Size (bytes)	Bytes read / query	p50	p95	recall@10
HNSW (full sidecar)	48,515	48,515	2102 µs	2166 µs	1.000
HVQ (staged quantized IVF)	45,089	15,068	609 µs	614 µs	1.000
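For readers unfamiliar with the recall@10 column, the sketch below shows one common way to compute it: take the exact top-k by brute force, then measure what fraction of those ids the index returned. It is not taken from the harness; the function names and the squared Euclidean metric are assumptions.

```rust
/// Exact top-`k` row ids by squared Euclidean distance (the brute-force truth).
fn brute_force_top_k(query: &[f32], rows: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = rows
        .iter()
        .enumerate()
        .map(|(i, r)| {
            let d: f32 = query.iter().zip(r).map(|(x, y)| (x - y) * (x - y)).sum();
            (d, i)
        })
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, i)| i).collect()
}

/// Fraction of the true top-`k` ids that appear in the index's answer.
/// Order within the answer does not matter. An empty truth set counts as 1.0.
fn recall_at_k(truth: &[usize], returned: &[usize]) -> f64 {
    if truth.is_empty() {
        return 1.0;
    }
    let hits = truth.iter().filter(|&id| returned.contains(id)).count();
    hits as f64 / truth.len() as f64
}
```

A recall@10 of 1.000 therefore means every one of the 10 exact neighbours was present in the index's 10 results, averaged over the queries.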

On this corpus HVQ reads 3.2× fewer bytes per query and answers about 3.5× faster at p50/p95, with identical recall. The win comes from staged reads: HVQ fetches only the probed pages, while the HNSW path decodes the whole sidecar per query.
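The ratios quoted above follow directly from the table. The snippet below simply redoes the division with the table's figures; the constant and function names are made up for the example.

```rust
// Figures copied from the evaluation table (bytes read per query, p50 and p95 in µs).
const HNSW_BYTES_READ: f64 = 48_515.0;
const HVQ_BYTES_READ: f64 = 15_068.0;
const HNSW_P50_US: f64 = 2102.0;
const HVQ_P50_US: f64 = 609.0;
const HNSW_P95_US: f64 = 2166.0;
const HVQ_P95_US: f64 = 614.0;

/// Returns (bytes-read ratio, p50 speedup, p95 speedup) of HVQ relative to HNSW.
fn hvq_advantage() -> (f64, f64, f64) {
    (
        HNSW_BYTES_READ / HVQ_BYTES_READ,
        HNSW_P50_US / HVQ_P50_US,
        HNSW_P95_US / HVQ_P95_US,
    )
}
```

The three values come out at roughly 3.22, 3.45, and 3.53, which is where the rounded "3.2×" and "about 3.5×" come from.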

Honest caveats: this is a small, single-artifact microbenchmark. The bytes-read advantage is expected to grow with corpus size and with multi-sidecar reads; the design target is ≥5× on larger corpora. On-disk size is near parity here because the developer-preview staged format still retains full-precision vectors for exact rerank. The larger storage win comes from the production binary .qvec container with quantized-only tiers.

Runnable example

The Vector Indexes example is a complete, CI-executed walkthrough: it creates a BTree secondary index, an HNSW vector index, and an HVQ vector index, loads clustered embeddings, and runs the same ANN query against each — all from plain CQL.