Ferrosa runs approximate nearest-neighbour search over embeddings directly in the database. Choose the full-precision HNSW index for maximum recall, or the quantized HVQ index to read far fewer bytes per query when the index outgrows memory.
A vector index answers "find the rows whose embedding is closest to this query vector". Ferrosa implements three strategies. HNSW and HVQ are selected through ordinary CQL DDL; IVFFlat is engine-internal.
| Strategy | CQL | Best for |
|---|---|---|
| HNSW | USING 'vector' (default) | Highest recall. A navigable small-world graph; stores every vector in a sidecar. |
| IVFFlat | engine internal | k-means clustered lists; faster builds than HNSW. |
| HVQ | USING 'vector' WITH OPTIONS = {'method':'hvq'} | Near-HNSW recall while reading far fewer bytes per query — when the index is larger than memory or lives in object storage. |
HVQ compresses each vector with scalar quantization, trading a little precision for a large reduction in size and bytes moved. Multiple code widths are available:
| Codec | Bits / dim | Role |
|---|---|---|
| Q8 | 8 | Refinement tier — 1 byte per dimension. |
| Q4 | 4 | Candidate tier — 2 dimensions per byte. |
| Q2 | 2 | Coarse routing (behind a benchmark gate). |
| Q1 | 1 | Experimental ultra-low-bit. |
| F32 | 32 | Optional exact-rerank tier for survivors. |
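To make the bits-per-dimension column concrete, here is a minimal sketch of uniform scalar quantization and bit packing. It is not Ferrosa's codec: the value range `[-1, 1]`, the low-bits-first packing order, and the names `quantize`, `dequantize`, `pack`, and `bytes_per_vector` are all assumptions made for illustration.

```python
def quantize(vec, bits, lo=-1.0, hi=1.0):
    """Map each float, clamped to [lo, hi], to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for x in vec:
        clamped = min(max(x, lo), hi)
        codes.append(round((clamped - lo) / (hi - lo) * levels))
    return codes


def dequantize(codes, bits, lo=-1.0, hi=1.0):
    """Approximate inverse of quantize(); the error is at most half a step."""
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


def pack(codes, bits):
    """Pack codes of width 1, 2, 4, or 8 bits into bytes (assumed low bits first)."""
    per_byte = 8 // bits
    out = bytearray()
    for i in range(0, len(codes), per_byte):
        byte = 0
        for j, code in enumerate(codes[i:i + per_byte]):
            byte |= code << (j * bits)
        out.append(byte)
    return bytes(out)


def bytes_per_vector(dims, bits):
    """Storage for one vector's codes, rounded up to whole bytes."""
    return (dims * bits + 7) // 8


# For a 16-dimensional vector: Q8 needs 16 bytes, Q4 needs 8, F32 needs 64.
sizes = {bits: bytes_per_vector(16, bits) for bits in (32, 8, 4, 2, 1)}
```

Under these assumptions a Q4 code is a quarter of the size of the equivalent F32 vector's Q8 refinement pair, which is why the low-bit tiers are the cheap ones to scan first: halving the width halves the bytes read while doubling the quantization step.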
Search is staged: cheap quantized codes narrow the candidate set, then an exact rerank over the survivors restores ranking quality. Because the reader fetches only the pages for the probed lists, the bytes it moves scale with the number of lists probed rather than with the total size of the index.
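The staged flow described above can be sketched in a few lines. This is an illustrative model only, not Ferrosa's implementation: squared Euclidean distance, the 4-bit encoding over `[-1, 1]`, and every name here (`build_lists`, `staged_search`, `nprobe`, `n_candidates`) are assumptions.

```python
LEVELS = 15  # 4-bit codes, standing in for the Q4 candidate tier


def sq_dist(a, b):
    """Squared Euclidean distance (the metric is an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, lo=-1.0, hi=1.0):
    return [round((min(max(x, lo), hi) - lo) / (hi - lo) * LEVELS) for x in vec]


def decode(codes, lo=-1.0, hi=1.0):
    return [lo + c / LEVELS * (hi - lo) for c in codes]


def build_lists(vectors, centroids):
    """Assign each row to its nearest centroid and store only its quantized code."""
    lists = [[] for _ in centroids]
    for row_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append((row_id, encode(vec)))
    return lists


def staged_search(query, vectors, centroids, lists, nprobe, n_candidates, k):
    # Stage 1: pick the nprobe lists whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    probed = order[:nprobe]
    # Stage 2: score quantized codes from the probed lists only.
    scored = [
        (sq_dist(query, decode(codes)), row_id)
        for c in probed
        for row_id, codes in lists[c]
    ]
    survivors = [row_id for _, row_id in sorted(scored)[:n_candidates]]
    # Stage 3: exact rerank of the survivors against full-precision vectors.
    survivors.sort(key=lambda row_id: sq_dist(query, vectors[row_id]))
    return survivors[:k]


# Hypothetical data: two well-separated clusters.
vectors = {
    1: [0.90, 0.10, 0.0, 0.0],
    2: [0.80, 0.20, 0.0, 0.0],
    3: [0.85, 0.15, 0.0, 0.0],
    4: [-0.90, 0.00, 0.1, 0.0],
    5: [-0.80, 0.10, 0.0, 0.0],
}
centroids = [[0.85, 0.15, 0.0, 0.0], [-0.85, 0.05, 0.05, 0.0]]
lists = build_lists(vectors, centroids)
result = staged_search([0.90, 0.10, 0.0, 0.0], vectors, centroids, lists,
                       nprobe=1, n_candidates=3, k=2)  # -> [1, 3]
```

Only the one probed list is touched in stage 2, and full-precision vectors are read only for the survivors in stage 3, which is the property that keeps bytes read proportional to the lists probed.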
-- A fixed-dimension float vector column
CREATE TABLE documents (
    id int PRIMARY KEY,
    embedding vector<float, 4>
);
-- Default: full-precision HNSW
CREATE INDEX docs_ann ON documents (embedding) USING 'vector';

-- Quantized HVQ: select the method explicitly
CREATE INDEX docs_ann ON documents (embedding)
    USING 'vector' WITH OPTIONS = {'method': 'hvq'};
method accepts 'hnsw' (the default) or 'hvq'. Any other value is rejected at DDL time — there is no silent fallback.
-- Return the 3 rows closest to the query vector
SELECT id FROM documents
ORDER BY embedding ANN OF [0.90, 0.10, 0.00, 0.00]
LIMIT 3;
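For intuition about what the ANN query approximates, the sketch below computes the exact answer by brute force: score every row, keep the `k` closest. The rows are hypothetical, and squared Euclidean distance is an assumption, since the similarity function Ferrosa uses is not stated here.

```python
import heapq


def exact_top_k(rows, query, k):
    """Return the ids of the k rows whose embeddings are closest to `query`.

    `rows` is a list of (id, embedding) pairs. Every row is scored, which is
    the full scan that an ANN index exists to avoid.
    """
    def sq_dist(vec):
        return sum((x - q) ** 2 for x, q in zip(vec, query))

    closest = heapq.nsmallest(k, rows, key=lambda row: sq_dist(row[1]))
    return [row_id for row_id, _ in closest]


# Hypothetical contents of the documents table.
rows = [
    (1, [0.90, 0.10, 0.00, 0.00]),
    (2, [0.00, 1.00, 0.00, 0.00]),
    (3, [0.80, 0.20, 0.00, 0.00]),
    (4, [0.00, 0.00, 1.00, 0.00]),
    (5, [0.70, 0.30, 0.00, 0.00]),
]
top3 = exact_top_k(rows, [0.90, 0.10, 0.00, 0.00], 3)  # -> [1, 3, 5]
```

An index is judged by how often its answer matches this exhaustive result while reading far less data.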
The results below were measured with the in-tree harness ferrosa-index/tests/eval_comparison.rs on a shared clustered corpus of 192 vectors (16 dimensions, 18 queries, 4 of 12 lists probed) and scored against exact brute-force ground truth. Reproduce with:
cargo test -p ferrosa-index --test eval_comparison -- --nocapture
| Index | Size (bytes) | Bytes read / query | p50 | p95 | recall@10 |
|---|---|---|---|---|---|
| HNSW (full sidecar) | 48,515 | 48,515 | 2102 µs | 2166 µs | 1.000 |
| HVQ (staged quantized IVF) | 45,089 | 15,068 | 609 µs | 614 µs | 1.000 |
On this corpus HVQ reads 3.2× fewer bytes per query and answers about 3.5× faster at p50/p95, with identical recall. The win comes from staged reads: HVQ fetches only the probed pages, while the HNSW path decodes the whole sidecar per query.
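The ratios quoted above follow directly from the table, as the short check below shows. The `recall_at_k` helper uses the conventional definition of recall@k (the share of the true top-k that the index returned); that it matches the harness's own computation is an assumption.

```python
# Figures copied from the table above.
hnsw = {"size": 48_515, "read": 48_515, "p50_us": 2102, "p95_us": 2166}
hvq = {"size": 45_089, "read": 15_068, "p50_us": 609, "p95_us": 614}

bytes_ratio = hnsw["read"] / hvq["read"]      # about 3.2
p50_ratio = hnsw["p50_us"] / hvq["p50_us"]    # about 3.5
p95_ratio = hnsw["p95_us"] / hvq["p95_us"]    # about 3.5

# Share of the HVQ index read per query; compare with 4 of 12 lists probed.
read_fraction = hvq["read"] / hvq["size"]     # about 0.33


def recall_at_k(returned, truth, k):
    """Fraction of the exact top-k ids that appear in the returned top-k."""
    return len(set(returned[:k]) & set(truth[:k])) / k
```

The read fraction lands close to one third, which is consistent with probing 4 of the 12 lists: the bytes moved track the lists probed, not the whole index.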
.qvec container with quantized-only tiers.
The Vector Indexes example is a complete, CI-executed walkthrough: it creates a BTree secondary index, an HNSW vector index, and an HVQ vector index, loads clustered embeddings, and runs the same ANN query against each — all from plain CQL.