Vector Indexes

Ferrosa runs approximate nearest-neighbour search over embeddings directly in the database. Choose the full-precision HNSW index for maximum recall, or the quantized HVQ index to read far fewer bytes per query when the index outgrows memory.

Beta: Vector indexing is under active development. HVQ (hybrid vector quantization) is a developer-preview path; the numbers below are reproducible from the in-tree evaluation harness.

Index strategies

A vector index answers "find the rows whose embedding is closest to this query vector". Ferrosa implements three strategies. HNSW and HVQ are selected through ordinary CQL DDL; IVFFlat is engine-internal and has no DDL option.

| Strategy | CQL | Best for |
|----------|-----|----------|
| HNSW | USING 'vector' (default) | Highest recall. A navigable small-world graph; stores every vector in a sidecar. |
| IVFFlat | engine internal | k-means clustered lists; faster builds than HNSW. |
| HVQ | USING 'vector' WITH OPTIONS = {'method':'hvq'} | Near-HNSW recall while reading far fewer bytes per query; suited to indexes that are larger than memory or live in object storage. |

Beyond Cassandra: HVQ stores vectors as page-addressable quantized artifacts. A query routes to a few centroid lists and reads only the pages it needs, instead of materializing the whole index. This is the foundation for serving indexes that live in S3.

Quantization & staged rerank

HVQ compresses each vector with scalar quantization, trading a little precision for a large reduction in size and bytes moved. Multiple code widths are available:

| Codec | Bits / dim | Role |
|-------|------------|------|
| Q8 | 8 | Refinement tier: 1 byte per dimension. |
| Q4 | 4 | Candidate tier: 2 dimensions per byte. |
| Q2 | 2 | Coarse routing (behind a benchmark gate). |
| Q1 | 1 | Experimental ultra-low-bit. |
| F32 | 32 | Optional exact-rerank tier for survivors. |
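As a rough illustration of what the Q8 and Q4 widths mean in bytes, the sketch below quantizes a float vector uniformly and packs 4-bit codes two per byte. It is not Ferrosa's codec: the function names, the fixed value range of -1.0 to 1.0, and the low-nibble-first packing order are all assumptions made for the example.

```python
def quantize(vec, bits, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to an integer code in [0, 2**bits - 1].

    The [lo, hi] range is an assumption; a real codec would learn or store it.
    """
    levels = (1 << bits) - 1
    codes = []
    for x in vec:
        x = min(max(x, lo), hi)  # clamp to the assumed range
        codes.append(round((x - lo) / (hi - lo) * levels))
    return codes


def dequantize(codes, bits, lo=-1.0, hi=1.0):
    """Approximate inverse of quantize()."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


def pack_q4(codes):
    """Pack 4-bit codes two per byte, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length input with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_q4(data, n):
    """Recover the first n 4-bit codes from packed bytes."""
    out = []
    for b in data:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out[:n]
```

For a 4-dimensional vector this yields 4 bytes of Q8 codes or 2 bytes of packed Q4 codes, against 16 bytes of F32. Each halving of the code width doubles the quantization step, which is why the narrower codes are used for routing and candidate generation rather than final ranking.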

Search is staged: cheap quantized codes narrow the candidate set, then an exact rerank over the survivors restores ranking quality. Because the reader only fetches the pages for the probed lists, the bytes it moves scale with the query, not the index.
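The staged flow can be sketched in a few lines. The example below is a toy model, not the engine's implementation: it assumes squared Euclidean distance, a single 8-bit tier, fixed centroids, and a hypothetical `staged_search` function that also counts the bytes a reader would touch.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance (the metric is an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


LO, HI = -2.0, 2.0  # assumed value range for the 8-bit codes


def encode_q8(vec):
    """One byte per dimension: clamp, then scale to 0..255."""
    return bytes(round((min(max(x, LO), HI) - LO) / (HI - LO) * 255) for x in vec)


def decode_q8(codes):
    return [LO + c / 255 * (HI - LO) for c in codes]


def build_lists(vectors, centroids):
    """Assign each row to its nearest centroid and store only its codes."""
    lists = [[] for _ in centroids]
    for row_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist2(vec, centroids[c]))
        lists[nearest].append((row_id, encode_q8(vec)))
    return lists


def staged_search(query, vectors, centroids, lists, k, nprobe, shortlist):
    # Stage 1: route to the nprobe closest centroid lists.
    probed = sorted(range(len(centroids)),
                    key=lambda c: dist2(query, centroids[c]))[:nprobe]
    bytes_read = 0
    scored = []
    # Stage 2: score every row in the probed lists from its quantized codes.
    for c in probed:
        for row_id, codes in lists[c]:
            bytes_read += len(codes)
            scored.append((dist2(query, decode_q8(codes)), row_id))
    survivors = [row_id for _, row_id in sorted(scored)[:shortlist]]
    # Stage 3: exact rerank of the survivors using full-precision floats.
    bytes_read += 4 * len(query) * len(survivors)  # 4 bytes per f32 component
    reranked = sorted(survivors, key=lambda r: dist2(query, vectors[r]))
    return reranked[:k], bytes_read


# Toy corpus: 3 clusters of 20 four-dimensional vectors each.
rng = random.Random(0)
centroids = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
vectors = [[x + rng.gauss(0.0, 0.05) for x in c] for c in centroids for _ in range(20)]
lists = build_lists(vectors, centroids)
top, bytes_read = staged_search([0.90, 0.10, 0.00, 0.00], vectors, centroids, lists,
                                k=3, nprobe=1, shortlist=5)
full_scan_bytes = 4 * 4 * len(vectors)  # reading every vector as f32
```

With one list probed, the sketch reads the codes of 20 rows plus 5 full-precision survivors, rather than all 60 vectors. Adding more rows to the lists that were not probed would not change `bytes_read`, which is the sense in which the cost follows the query rather than the index.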

CQL reference

Vector columns

-- A fixed-dimension float vector column
CREATE TABLE documents (
  id int PRIMARY KEY,
  title text,
  embedding vector<float, 4>
);

Creating a vector index

-- Default: full-precision HNSW
CREATE INDEX docs_ann ON documents (embedding) USING 'vector';

-- Quantized HVQ: select the method explicitly.
-- This is an alternative to the statement above; both reuse the name docs_ann.
CREATE INDEX docs_ann ON documents (embedding)
  USING 'vector' WITH OPTIONS = {'method': 'hvq'};

Note: method accepts 'hnsw' (the default) or 'hvq'. Any other value is rejected at DDL time; there is no silent fallback.

Nearest-neighbour query

-- Return the 3 rows closest to the query vector
SELECT id, title FROM documents
  ORDER BY embedding ANN OF [0.90, 0.10, 0.00, 0.00] LIMIT 3;
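
To make the semantics concrete, here is the exact computation that an ANN query approximates, written as a brute-force scan. The rows and titles are hypothetical sample data, and squared Euclidean distance is an assumption, since this page does not state which similarity function the index uses.

```python
# Hypothetical sample rows: (id, title, embedding)
documents = [
    (1, "alpha",   [1.00, 0.00, 0.00, 0.00]),
    (2, "beta",    [0.70, 0.30, 0.00, 0.00]),
    (3, "gamma",   [0.00, 1.00, 0.00, 0.00]),
    (4, "delta",   [0.00, 0.00, 1.00, 0.00]),
    (5, "epsilon", [0.50, 0.50, 0.00, 0.00]),
]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_nearest(query, rows, limit):
    """Rank every row by distance to the query and keep the closest `limit`."""
    ranked = sorted(rows, key=lambda row: squared_l2(query, row[2]))
    return [(row[0], row[1]) for row in ranked[:limit]]


result = exact_nearest([0.90, 0.10, 0.00, 0.00], documents, 3)
print(result)  # [(1, 'alpha'), (2, 'beta'), (5, 'epsilon')]
```

The scan touches every row, so its cost grows linearly with the table. A vector index exists to return the same rows, or nearly the same, without that full pass.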

Evaluation: HNSW vs HVQ

Measured by the in-tree harness ferrosa-index/tests/eval_comparison.rs on a shared clustered corpus of 192 vectors (16 dimensions, 18 queries, 4 of 12 lists probed), against exact brute-force truth. Reproduce with:

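cargo test -p ferrosa-index --test eval_comparison -- --nocapture

| Index | Size (bytes) | Bytes read / query | p50 latency | p95 latency | recall@10 |
|-------|--------------|--------------------|-------------|-------------|-----------|
| HNSW (full sidecar) | 48,515 | 48,515 | 2102 µs | 2166 µs | 1.000 |
| HVQ (staged quantized IVF) | 45,089 | 15,068 | 609 µs | 614 µs | 1.000 |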
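
The recall@10 column compares each index's answer with the brute-force answer. The sketch below assumes the common definition, the fraction of the true top k that the index also returns; the harness's exact formula is not shown on this page, and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(approx_ids, truth_ids, k):
    """Fraction of the true top-k row ids that also appear in the approximate top-k."""
    found = set(approx_ids[:k]) & set(truth_ids[:k])
    return len(found) / k


def mean_recall(per_query_pairs, k):
    """Average recall@k over (approx_ids, truth_ids) pairs, one pair per query."""
    scores = [recall_at_k(approx, truth, k) for approx, truth in per_query_pairs]
    return sum(scores) / len(scores)
```

Under this definition, order within the top k does not matter, and a value of 1.000 means every query returned exactly the same set of rows as the exhaustive scan.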

On this corpus HVQ reads 3.2× fewer bytes per query and answers about 3.5× faster at p50/p95, with identical recall. The win comes from staged reads: HVQ fetches only the probed pages, while the HNSW path decodes the whole sidecar per query.
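
The quoted ratios follow directly from the table, as this short check shows. The variable names are illustrative only; the figures are the ones reported above.

```python
# Figures copied from the evaluation table.
hnsw = {"size": 48_515, "bytes_read": 48_515, "p50_us": 2102, "p95_us": 2166}
hvq = {"size": 45_089, "bytes_read": 15_068, "p50_us": 609, "p95_us": 614}


def ratio(key):
    """How many times larger the HNSW figure is than the HVQ figure."""
    return hnsw[key] / hvq[key]


bytes_ratio = ratio("bytes_read")  # about 3.22
p50_ratio = ratio("p50_us")        # about 3.45
p95_ratio = ratio("p95_us")        # about 3.53

# Share of the HVQ artifact that one query reads, next to the share of lists probed.
hvq_read_fraction = hvq["bytes_read"] / hvq["size"]  # about 0.334
probed_fraction = 4 / 12                             # about 0.333
```

The last two values are close: one query reads about a third of the HVQ artifact, and 4 of 12 lists is a third. That is consistent with reads being roughly proportional to the lists probed, though the page does not break the byte count down, so treat it as an observation rather than an explanation.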

Honest caveats: this is a small single-artifact microbenchmark. The bytes-read advantage grows with corpus size and with multi-sidecar reads — the design target is ≥5× on larger corpora. The on-disk size is near parity here because the developer-preview staged format still retains full-precision vectors for exact rerank; the larger storage win comes from the production binary .qvec container with quantized-only tiers.

Runnable example

The Vector Indexes example is a complete, CI-executed walkthrough: it creates a BTree secondary index, an HNSW vector index, and an HVQ vector index, loads clustered embeddings, and runs the same ANN query against each — all from plain CQL.