Search, vectors, graph, SPARQL, transactions, and experimental real-time CDC on the same tables, not separate stores. CQL-compatible, with Cypher over Bolt v5 and SPARQL 1.1. Strict-serializable transactions via Accord, CQL compatibility, and S3-backed storage are under active development. Local NVMe acts as a cache, while object storage is the durable layer. Start on a single node, then evaluate clustering and operational behavior with your own workload before going to production.
The Problem
In many Cassandra-style deployments, snapshots, cross-region replicas, and backup retention can become material cost centers. Exact numbers depend on retention policy, region, cloud provider, request volume, and recovery objectives, so Ferrosa frames object storage as a design option to evaluate rather than a guaranteed savings claim.
Snapshot and retention costs vary widely by provider and workload. For some deployments, keeping recovery data in object storage with lifecycle policies may reduce duplicated block-storage footprint; for others, request volume, cache hit rate, and operational constraints will dominate. Treat the preview as something to benchmark against your own recovery model.
Replication factor, snapshots, and cross-region copies all multiply the storage footprint. Ferrosa's storage model is intended to reduce some duplicate durable copies by making object storage the primary durable layer, but the economics need workload-specific validation.
On the JVM, garbage-collection pauses can cause unpredictable tail-latency spikes. Ferrosa is written in Rust, so there is no JVM GC in the database process, but tail latency still depends on workload, I/O, cache behavior, and clustering.
When a node dies, streaming hundreds of gigabytes from replicas can take hours, and the cluster runs degraded the entire time.
Compaction tuning, repair scheduling, bootstrap orchestration, heap sizing: keeping a Cassandra cluster healthy often takes a dedicated team.
Core Capabilities
One database for AI-native workloads: search, vectors, graph, SPARQL, transactions, and CDC, unified on S3-backed storage.
Ferrosa is designed so object storage is the durable layer rather than a secondary backup target. That can reduce duplicated block-storage snapshots in some deployments, especially when lifecycle policies move older recovery data to colder classes. The actual cost profile depends on region, retention window, request volume, cache hit rate, and restore expectations.
A planned control surface sets the retention policy: `ferrosa-ctl storage retention set --years 1`.
Ferrosa can use S3 Lifecycle rules to tier archives through Standard, Infrequent Access, and Glacier-style storage classes. Treat published cloud prices as inputs to your own cost model, not as fixed Ferrosa savings.
| Retention window (10 TB dataset) | Snapshot model | Ferrosa |
|---|---|---|
| 7 days | extra block-storage copies | lifecycle-priced archives |
| 30 days | provider + RF dependent | request + tier dependent |
| 1 year | retention-policy dependent | restore-SLA dependent |
| 3 years | usually archival policy | cold-tier policy |
Illustrative only: compare your current snapshot/replication policy with object-storage lifecycle pricing and expected request volume.
Durable storage lives in S3. Local NVMe is a hot cache, not a commitment. Nodes can be treated more ephemerally than block-storage-bound designs, but recovery time depends on schema, cache warmth, object-store latency, and cluster state.
NVMe is a bounded cache layer; S3 is the unbounded durable store. Data that doesn't
fit on NVMe lives in S3 and is pulled through on demand. The
local_cache_max_bytes config controls how much
NVMe is used. Size your local disks for your working set, not your total dataset.
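The pull-through behavior can be pictured as a read-through cache with a byte budget. The Python sketch below assumes LRU eviction and a hypothetical `fetch` callback standing in for an S3 GET; it is not Ferrosa's cache, and the exact eviction policy behind `local_cache_max_bytes` is not specified here.

```python
from collections import OrderedDict
from typing import Callable


class ByteBoundedCache:
    """Read-through LRU cache capped by total bytes (illustrative sketch only)."""

    def __init__(self, max_bytes: int, fetch: Callable[[str], bytes]):
        self.max_bytes = max_bytes  # plays the role of a local_cache_max_bytes-style cap
        self.fetch = fetch          # hypothetical object-store GET
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # hit: mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = self.fetch(key)             # miss: pull through from object storage
        if len(data) > self.max_bytes:
            return data                    # larger than the whole budget: serve uncached
        self.entries[key] = data
        self.used += len(data)
        while self.used > self.max_bytes:  # evict least recently used until within budget
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return data


# Stand-in "bucket" of three 40-byte objects and a 100-byte local budget
objects = {"sst-1": b"a" * 40, "sst-2": b"b" * 40, "sst-3": b"c" * 40}
cache = ByteBoundedCache(max_bytes=100, fetch=objects.__getitem__)
for key in ["sst-1", "sst-2", "sst-1", "sst-3"]:
    cache.get(key)
```

Here the working set (two hot objects) fits in the budget, while the cold one is evicted and would be fetched again on the next access, which is the trade-off behind sizing disks for the working set.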
Point-in-time recovery retention is planned as a single command:
`ferrosa-ctl storage retention set --years 1`.
Ferrosa is designed to configure S3 Lifecycle rules that automatically tier your commit-log archives
through the Standard, Infrequent Access, and Glacier storage classes. The cost of a full year of
point-in-time recovery can be modeled from your lifecycle policy. Cold SSTables may transition to
cheaper storage classes, but weigh restore latency, request charges, and cache misses before choosing a tiering policy.
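As a sketch of what such tiering might look like, the function below builds a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The `commitlog/` prefix, the rule ID, and the day thresholds are hypothetical placeholders, not the rules Ferrosa generates. The 30-day floor reflects S3's minimum age for transitions to STANDARD_IA; the remaining ordering check is our own sanity check.

```python
from typing import Any, Dict


def commitlog_lifecycle(prefix: str, ia_days: int, glacier_days: int,
                        expire_days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that tiers, then expires, archives."""
    if not (30 <= ia_days < glacier_days < expire_days):
        raise ValueError("expected 30 <= ia_days < glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",  # hypothetical rule name
                "Filter": {"Prefix": prefix},       # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                # Delete archives once they fall outside the recovery window
                "Expiration": {"Days": expire_days},
            }
        ]
    }


# Roughly a one-year point-in-time recovery window (placeholder thresholds)
config = commitlog_lifecycle("commitlog/", ia_days=30, glacier_days=90, expire_days=365)
```

Passing `config` as the `LifecycleConfiguration` argument of `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=...)` would apply it; S3-compatible endpoints may not support every storage class.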
Node retirement is just as simple: decommission a node and its S3 objects are
tagged for cleanup.
Works with AWS S3, MinIO, Cloudflare R2, or any S3-compatible endpoint. No vendor lock-in.
S3-backed durability · lifecycle-aware retention · benchmark with your workload

No JVM, no garbage collector, no stop-the-world GC pauses. Lock-free reads via ArcSwap, sharded memtables whose write locks are held for nanoseconds, and async I/O on Tokio keep the hot path short. Removing GC pauses eliminates one major source of tail latency; I/O, cache misses, and compaction still shape P99.
P99 latency: no GC tail

CQL native protocol v4, the same wire protocol your drivers already speak. Python, Java, Go, Rust (cdrs-tokio), and Node.js drivers connect without code changes. Supported features include DDL, DML, prepared statements, batch operations, lightweight transactions (IF NOT EXISTS, IF conditions), system keyspaces, ALLOW FILTERING, toJson(), counter increment/decrement, and collection types (map, set, list) with full UPDATE +/- operators and collection bind values in prepared statements.
CQL v4/v5 · Bolt v5 · SPARQL 1.1 · 0 driver changes required

Cypher and SPARQL endpoints run against CQL-backed graph and RDF tables in the developer preview. Vertices and edges are normal tables with schema extensions. The engine supports variable-length paths, aggregations, expression evaluation, and worst-case optimal joins via leapfrog triejoin for cyclic pattern matching. Clients connect over the Bolt v5 wire protocol (Neo4j drivers), HTTP/JSON for Cypher, or the W3C SPARQL Protocol for semantic web and RDF workloads.
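The core of leapfrog triejoin is a leapfrog intersection of sorted key lists: the iterator holding the smallest key repeatedly seeks forward to the current largest key until all iterators agree. The Python sketch below is a generic illustration of that technique, not Ferrosa's implementation; it then evaluates the cyclic pattern `(a)->(b), (b)->(c), (a)->(c)` one variable at a time, using sorted adjacency lists as the trie levels.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def leapfrog_intersect(lists: Sequence[List[int]]) -> List[int]:
    """Intersect sorted, duplicate-free lists with the leapfrog join."""
    if not lists or any(not xs for xs in lists):
        return []
    k = len(lists)
    pos = [0] * k
    order = sorted(range(k), key=lambda i: lists[i][0])  # iterators by current key
    hi = lists[order[-1]][0]                             # largest current key
    out: List[int] = []
    p = 0
    while True:
        i = order[p]
        lo = lists[i][pos[i]]
        if lo == hi:                 # smallest == largest: every iterator agrees
            out.append(lo)
            pos[i] += 1
        else:                        # leapfrog: seek the laggard to at least hi
            pos[i] = bisect_left(lists[i], hi, pos[i])
        if pos[i] == len(lists[i]):
            return out
        hi = lists[i][pos[i]]        # the iterator that just moved holds the max key
        p = (p + 1) % k


def triangles(edges: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Match (a)->(b), (b)->(c), (a)->(c) by binding a, then b, then c."""
    adj: Dict[int, List[int]] = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    adj = {s: sorted(set(ds)) for s, ds in adj.items()}  # one sorted trie level per source
    sources = sorted(adj)
    out = []
    for a in sources:                                       # a needs outgoing edges
        for b in leapfrog_intersect([adj[a], sources]):     # a->b, and b has out-edges
            for c in leapfrog_intersect([adj[b], adj[a]]):  # b->c and a->c
                out.append((a, b, c))
    return out


found = triangles([(1, 2), (2, 3), (1, 3), (3, 1)])
```

Binding variables one at a time and intersecting every relation that mentions the variable is what avoids the large intermediate results a pairwise join plan can produce on cyclic patterns.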
Cypher · SPARQL 1.1 · Bolt v5 · HTTP/JSON · RDF* · property paths

Ferrosa includes parser and storage-observer work for table subscriptions. Treat streaming of arbitrary CQL or Cypher queries as proposed or only partially implemented until the verification plans land.
The examples below are design targets for the developer preview, not production CDC guarantees.
SUBSCRIBE table-name EVERY 5s · experimental

```cql
-- Poll for changes every 5 seconds
SUBSCRIBE social.users EVERY 5s;

-- Push changes as they happen
SUBSCRIBE social.users DELTA;

-- Subscribe to a graph traversal
SUBSCRIBE MATCH (a:Person)-[:FOLLOWS]->(b) RETURN b.name EVERY 10s;
```
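One way to picture the `EVERY 5s` polling semantics is a diff between consecutive snapshots keyed by primary key. This is a hypothetical client-side sketch, not how Ferrosa's storage observer works; the dict-of-rows snapshot format and the event tuple shape are assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, Any, Optional[Row]]  # (kind, primary key, new row or None)


def diff_snapshots(old: Dict[Any, Row], new: Dict[Any, Row]) -> List[Change]:
    """Compare two polls keyed by primary key and emit insert/update/delete events."""
    changes: List[Change] = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))
    for key in old:                      # keys that vanished since the last poll
        if key not in new:
            changes.append(("delete", key, None))
    return changes


poll_1 = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
poll_2 = {1: {"name": "Alice B."}, 3: {"name": "Carol"}}
changes = diff_snapshots(poll_1, poll_2)
```

Polling only sees state at each tick, so a row inserted and deleted between two polls produces no event; that gap is one reason a push-style `DELTA` mode is listed separately.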
Eleven index types behind a single trait: B-tree for range scans, hash for O(1) lookups,
composite for multi-column queries, phonetic for fuzzy string matching (Soundex,
Metaphone, Double Metaphone, Caverphone), three vector methods for approximate nearest
neighbor search, filtered indexes with predicates, and full-text search with
BM25 ranked retrieval. Native vector<float, N> CQL type
for embedding storage — store and query high-dimensional vectors directly.
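For a feel of how a phonetic index groups similar-sounding names, here is a sketch of classic American Soundex, one of the algorithms listed above. It is a standalone illustration; Ferrosa's phonetic index internals and its Metaphone variants are not shown.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    chars = [c for c in name.lower() if c.isalpha()]
    if not chars:
        return ""
    out = chars[0].upper()          # the first letter is kept as a letter
    prev = codes.get(chars[0], "")  # but its code still suppresses an identical next code
    for ch in chars[1:]:
        if ch in "hw":
            continue                # h and w do not separate letters with the same code
        code = codes.get(ch, "")    # vowels (and y) map to "" and reset prev
        if code and code != prev:
            out += code
            if len(out) == 4:
                break
        prev = code
    return out.ljust(4, "0")        # pad short codes with zeros
```

Names that share a code (for example "Smith" and "Smyth", both `S530`) land in the same index bucket, which is what lets a query match spelling variants.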
For vector search, choose full-precision HNSW or the quantized
HVQ method (WITH OPTIONS = {'method':'hvq'}). In the in-tree
evaluation, HVQ reads 3.2× fewer bytes per query and answers
~3.5× faster at equal recall@10 vs the full HNSW sidecar.
See the Vector Indexes reference & evaluation.
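Recall@10 compares an approximate index's top 10 results with the exact top 10. A minimal sketch of the metric, using brute-force cosine similarity as ground truth; the toy vectors, the "approximate" result list, and the use of k = 3 are made-up inputs for illustration, not Ferrosa evaluation data.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Brute-force ground truth: ids of the k most cosine-similar vectors."""
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids: List[int], exact_ids: List[int], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.0]
exact = exact_top_k(query, vectors, k=3)
approx = [0, 3, 2]  # pretend output of an ANN index
score = recall_at_k(approx, exact, k=3)
```

Comparing two index methods "at equal recall@10" means tuning each until this score matches, then comparing bytes read and query latency.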
Indexes are storage-attached — built asynchronously after SSTable flush to keep
write-path impact bounded. Full-text indexes use inverted-index sidecar files with
a pluggable analyzer pipeline (tokenizer, stop words, Porter stemmer).
Per-index staleness tracking and operational metrics
via system_views.secondary_indexes let you monitor build progress
and lag in real time. CQL-compatible DDL with standard CREATE INDEX ... USING 'type' syntax.
```cql
-- B-tree index for range queries
CREATE INDEX idx_email ON users (email) USING 'btree';

-- Phonetic index for fuzzy name matching
CREATE INDEX idx_name ON users (last_name) USING 'phonetic'
  WITH OPTIONS = {'algorithm': 'double_metaphone'};

-- Vector index for ANN search
CREATE INDEX idx_embed ON documents (embedding) USING 'vector'
  WITH OPTIONS = {'method': 'hnsw', 'metric': 'cosine', 'dimensions': '768'};

-- Full-text search with BM25 ranking
CREATE INDEX idx_body ON articles (body) USING 'fulltext';
SELECT * FROM articles WHERE body = fts_match('distributed AND database');

-- Nearest neighbor query
SELECT * FROM documents ORDER BY embedding ANN OF [0.1, 0.2, ...] LIMIT 10;
```
Distributed transactions with strict serializability via the Accord consensus protocol (not Paxos, not 2PC), currently under active development. Multi-partition atomic operations with all-or-nothing semantics. INSERT IF NOT EXISTS, UPDATE/DELETE IF conditions, and batch compare-and-set are supported, and cross-shard transactions coordinate automatically.
1-RTT fast path through the leaseholder for low-latency commits; 2-RTT slow path when coordination is needed. Crash recovery via protocol log replay with .accord sidecar files. Dynamic electorate reconfiguration with a 4-gate join protocol for safe membership changes.
Accord protocol · transaction components · Jepsen verification planned

Accord transaction components and a Jepsen-style test harness exist, but public Jepsen results are still an open verification item.
```cql
-- Lightweight transactions (Cassandra-compatible)
INSERT INTO accounts (id, balance) VALUES ('acct-1', 1000) IF NOT EXISTS;

-- Conditional update with compare-and-set
UPDATE accounts SET balance = 900 WHERE id = 'acct-1' IF balance = 1000;

-- Multi-statement transaction
BEGIN TRANSACTION
  UPDATE accounts SET balance = balance - 100 WHERE id = 'acct-1' IF balance >= 100;
  UPDATE accounts SET balance = balance + 100 WHERE id = 'acct-2';
COMMIT TRANSACTION;
```
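The multi-statement example is all-or-nothing: if the `IF balance >= 100` condition fails, neither update applies. The single-threaded Python model below illustrates only those semantics (check the condition against one consistent view, then apply every write or none). It says nothing about how Accord orders or replicates transactions, and the `transfer` function is hypothetical.

```python
from typing import Dict


def transfer(balances: Dict[str, int], src: str, dst: str, amount: int) -> bool:
    """Debit src and credit dst together, or change nothing (single-threaded model)."""
    staged = dict(balances)                 # read every value from one consistent view
    if staged.get(src, 0) < amount:         # plays the role of IF balance >= amount
        return False                        # condition failed: abort with no writes
    staged[src] = staged.get(src, 0) - amount
    staged[dst] = staged.get(dst, 0) + amount
    balances.update(staged)                 # apply both writes together
    return True


accounts = {"acct-1": 1000, "acct-2": 0}
committed = transfer(accounts, "acct-1", "acct-2", 100)
rejected = transfer(accounts, "acct-2", "acct-1", 500)  # acct-2 now holds only 100
```

In a real cluster the hard part is making that "one consistent view" and "apply together" hold across partitions and replicas, which is the job the consensus protocol takes on.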
Pin hot tables to local NVMe for latency-sensitive reads. A single table attribute
(storage.pin = nvme) keeps SSTables on local disk, bypassing S3 entirely.
Ideal for session caches, materialized views, and lookup tables that tolerate node loss.
Optional pin_max_bytes cap evicts oldest SSTables when the budget is exceeded.
ALTER TABLE toggles pin mode on live tables — unpinning automatically uploads
existing SSTables to S3.
Built-in inverted index with BM25 ranked retrieval — no external search engine needed. Create a full-text index on any text column and query with boolean operators (AND, OR, NOT), phrase matching, and prefix wildcards. Pluggable analyzer pipeline: tokenizer, stop-word filter, Porter stemmer. Sidecar index files are built on flush and merged during compaction — off the foreground write acknowledgement path.
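To make BM25 ranked retrieval concrete, here is the standard Okapi BM25 scoring function over a tiny in-memory corpus. The whitespace tokenizer, the k1 = 1.2 and b = 0.75 defaults, and the smoothed IDF variant (the one Lucene uses) are assumptions for illustration; Ferrosa's analyzer pipeline (stop words, Porter stemming) and parameters may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> Dict[int, float]:
    """Score each document against the query terms with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]             # stand-in for a real analyzer
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs           # average document length
    df = Counter(term for t in tokenized for term in set(t))  # document frequency per term
    scores: Dict[int, float] = {}
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[i] = score
    return scores


docs = [
    "distributed database on object storage",
    "graph database",
    "distributed systems reading list",
]
scores = bm25_scores("distributed database", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
```

The first document matches both terms and ranks highest; between the two single-term matches, the shorter document wins because of length normalization.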
BM25 ranking · AND/OR/NOT/prefix · pluggable analyzers · async index build

Readers never block: ArcSwap provides wait-free atomic loading of the current view, and writers hold sharded memtable locks for only nanoseconds. There are no global locks and no coordination overhead on the hot path.
read-optimized metadata path
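A rough Python analogue of this read/write split, assuming a hash-sharded memtable where each shard has its own lock, and a copy-on-write tuple of SSTable names that readers fetch with a single reference read (loosely mirroring an ArcSwap-style pointer swap). Python has no ArcSwap, the class and its methods are hypothetical, and the sketch shows structure, not performance.

```python
import threading
from typing import Dict, Hashable, List, Optional, Tuple


class ShardedMemtable:
    """Writes lock only one shard; readers of the SSTable list take no lock."""

    def __init__(self, num_shards: int = 16):
        self._shards: List[Tuple[threading.Lock, Dict[Hashable, bytes]]] = [
            (threading.Lock(), {}) for _ in range(num_shards)
        ]
        self._sstables: Tuple[str, ...] = ()   # immutable, published view
        self._publish_lock = threading.Lock()  # serializes writers of the view only

    def _shard(self, key: Hashable) -> Tuple[threading.Lock, Dict[Hashable, bytes]]:
        return self._shards[hash(key) % len(self._shards)]

    def put(self, key: Hashable, value: bytes) -> None:
        lock, table = self._shard(key)
        with lock:                 # short critical section on a single shard
            table[key] = value

    def get(self, key: Hashable) -> Optional[bytes]:
        lock, table = self._shard(key)
        with lock:                 # this sketch locks the shard for memtable reads
            return table.get(key)

    def sstables(self) -> Tuple[str, ...]:
        return self._sstables      # one reference read; no lock taken

    def publish_sstable(self, name: str) -> None:
        with self._publish_lock:   # copy-on-write: build a new tuple, then swap it in
            self._sstables = self._sstables + (name,)


m = ShardedMemtable(num_shards=4)


def writer(t: int) -> None:
    for i in range(100):
        m.put((t, i), b"x")


threads = [threading.Thread(target=writer, args=(t,)) for t in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()

before = m.sstables()  # a reader's snapshot stays valid after later publishes
m.publish_sstable("sst-0001")
```

Because published views are immutable, a reader holding `before` never sees a half-updated list; writers pay for a copy instead.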
Self-hosted distributed tracing, metrics, and query analysis — no external
tools required. 25+ tracing spans across CQL, consensus, storage, and network
paths. Slow query detection with parameterized logging. Query fingerprint
tracking for automatic optimization recommendations. Per-client billing
metering. On-demand flame charts for CPU hot path analysis. All data in
system_observability virtual tables, queryable via standard CQL.
Optional OTLP export for enterprise monitoring stacks.
5-node clusters with Raft consensus, 8 tunable consistency levels (ONE through ALL, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM), and hinted handoff for replica failure recovery. Operator-controlled node join with approval gate, graceful decommission with streaming protocol, and skew-aware token rebalancing. Background maintenance handles auto-flush, compaction, and commit log GC. Ships as a systemd service with .deb packaging and Docker Compose support.
Raft · tunable CL · hinted handoff · token rebalancing

Automatic reconnection with exponential backoff for internode links. Graceful drain with a configurable timeout lets in-flight requests complete before shutdown. Connection-drop detection and per-connection request limits provide backpressure under load. The code is built to handle real-world failure scenarios, though production hardening is still in progress.
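Exponential backoff spaces out reconnection attempts so a flapping peer is not hammered. A minimal sketch of a capped delay schedule with optional "full jitter"; the base delay, the cap, and the jitter strategy are assumptions, since Ferrosa's actual reconnect parameters are not documented here.

```python
import random
from typing import List, Optional


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delay in seconds before each reconnect attempt: min(cap, base * 2**n)."""
    delays = []
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        if rng is not None:
            delay = rng.uniform(0, delay)  # full jitter spreads out simultaneous retries
        delays.append(delay)
    return delays


plain = backoff_delays(8)
jittered = backoff_delays(8, rng=random.Random(7))
```

Jitter matters when many nodes lose the same peer at once: without it, they all retry on the same schedule.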
auto-reconnect · graceful drain · backpressure

When a replica is temporarily unavailable, writes are stored as hints and replayed automatically when the node recovers. Configurable hint window and TTL cover the transient-failure path; for divergence that outlives the hint window, operator-initiated anti-entropy repair is the building block to reconcile replicas.
auto hint storage · replay on recovery
Operator-initiated Merkle-tree-based repair via the HTTP /repair endpoint
and the ferrosa-ctl repair CLI reconciles replicas that have diverged
beyond the hinted-handoff window. A bounded-memory streaming digest builds the
Merkle comparison incrementally, so the repair path scales to tables larger than
a node's memory. Repair is manual today — there is no scheduler — and consistency
under repair has not yet been Jepsen-verified.
Memtable backpressure, configurable via FERROSA_MEMTABLE_BACKPRESSURE_BYTES,
bounds in-memory write buffers under sustained write workloads. When the configured
threshold is reached, writers slow rather than letting memtables grow without limit,
keeping memory pressure predictable while flush and compaction drain the queue.
Operator-controlled join with approval gate prevents accidental cluster changes. Graceful decommission streams data to remaining nodes before removal. Token rebalancing with skew-aware algorithm redistributes load evenly across the ring.
join · decommission · rebalance
Native /metrics endpoint exposes cluster health, query latencies,
storage utilization, and connection counts in Prometheus format. No sidecar exporter
needed — scrape directly from each Ferrosa node. Integrates with Grafana, Datadog,
and any Prometheus-compatible monitoring stack.
User-defined functions execute as sandboxed WebAssembly — no Java, no JavaScript. Upload a compiled WASM binary to a table, reference it in CREATE FUNCTION with an AS clause. Write functions in Rust (preferred), C, Go, or any language that compiles to WASM. Full Wasmtime integration with WIT contract, compilation caching, and sandbox enforcement. Memory-limited, CPU-time-limited, no network or filesystem access. Designed for deterministic execution under configured limits.
WASM sandbox · WIT contract · Wasmtime compilation · moka cache · UDF + UDA

See It In Action
Your existing application code works unchanged. Just point your driver at Ferrosa.
```cql
-- Create a keyspace with S3-backed replication
CREATE KEYSPACE social WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

-- Tables work exactly as in Cassandra
CREATE TABLE social.users (
  user_id uuid PRIMARY KEY,
  name text,
  email text,
  created_at timestamp
);

-- Inserts, selects, updates: all standard CQL
INSERT INTO social.users (user_id, name, email, created_at)
  VALUES (uuid(), 'Alice', 'alice@example.com', toTimestamp(now()));
SELECT * FROM social.users WHERE user_id = ?;
```
```cql
-- Graph queries on the same data, no separate database
-- Mark tables as graph entities via schema extensions
ALTER TABLE social.users WITH extensions = {'graph.type': 'vertex', 'graph.label': 'Person'};
ALTER TABLE social.follows WITH extensions = {
  'graph.type': 'edge',
  'graph.label': 'FOLLOWS',
  'graph.source': 'Person',
  'graph.target': 'Person'
};
```

```cypher
// Then query with Cypher via the HTTP/JSON endpoint
MATCH (a:Person {name: 'Alice'})-[:FOLLOWS]->(b:Person)
RETURN b.name, b.email;

// Multi-hop: friends of friends
MATCH (a:Person)-[:FOLLOWS*2]->(c:Person)
WHERE a.name = 'Alice'
RETURN DISTINCT c.name;
```
```sparql
# SPARQL 1.1: semantic queries on the same data
# W3C standard for RDF, knowledge graphs, and provenance
# Each query below is a separate request, so each declares its own PREFIX.

# Find all people and who they know
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?friend
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
}
ORDER BY ?name
LIMIT 20

# Property paths: transitive closure (who can Alice reach?)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?reachable
WHERE {
  <http://example.org/alice> foaf:knows+ ?reachable .
}

# RDF* annotations: provenance on edges
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?o ?confidence
WHERE {
  << ?s foaf:knows ?o >> <http://example.org/confidence> ?confidence .
  FILTER(?confidence > 0.8)
}

# INSERT DATA: write triples via SPARQL Update
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
  <http://example.org/bob> foaf:name "Bob" .
  <http://example.org/bob> foaf:knows <http://example.org/alice> .
}
```
```cql
-- B-tree index for sorted range queries
CREATE INDEX idx_email ON social.users (email) USING 'btree';

-- Hash index for O(1) equality lookups
CREATE INDEX idx_uid ON social.users (user_id) USING 'hash';

-- Composite index on multiple columns
CREATE INDEX idx_name ON social.users (last_name, first_name) USING 'composite';

-- Phonetic index: fuzzy name matching
CREATE INDEX idx_snd ON social.users (last_name) USING 'phonetic'
  WITH OPTIONS = {'algorithm': 'double_metaphone'};
SELECT * FROM social.users WHERE last_name SOUNDS LIKE 'Smith';

-- Filtered index: partial index over a row subset
CREATE INDEX idx_active ON social.users (email) USING 'btree' WHERE status = 'active';

-- Vector index with HNSW for ANN search
CREATE INDEX idx_embed ON docs.articles (embedding) USING 'vector'
  WITH OPTIONS = {'method': 'hnsw', 'metric': 'cosine', 'dimensions': '768',
                  'm': '16', 'ef_construction': '200'};
SELECT * FROM docs.articles ORDER BY embedding ANN OF [0.1, 0.2, ...] LIMIT 10;

-- Monitor index health
SELECT index_name, status, lag_seconds FROM system_views.secondary_indexes;
```
```python
# Your existing Python code: just change the contact point
import uuid

from cassandra.cluster import Cluster

# Before: Apache Cassandra
# cluster = Cluster(['cassandra-node-1.prod'])

# After: Ferrosa, same driver, same API
cluster = Cluster(['ferrosa-node-1.prod'])
session = cluster.connect('social')

# Prepared statements work identically
stmt = session.prepare("""
    SELECT name, email FROM users WHERE user_id = ?
""")

# Placeholder: substitute the uuid of an existing row
user_id = uuid.UUID('00000000-0000-0000-0000-000000000000')
row = session.execute(stmt, [user_id]).one()
if row is not None:
    print(row.name)

# Batch operations, async queries, retry policies:
# everything your driver supports works unchanged.
cluster.shutdown()
```
```cql
-- Self-hosted observability via CQL virtual tables
SELECT * FROM system_observability.cql_stats;
SELECT * FROM system_observability.slow_queries;
SELECT * FROM system_observability.query_fingerprints;

-- Find queries causing full table scans
SELECT * FROM system_observability.full_scan_reasons;

-- Per-client billing metering
SELECT * FROM system_observability.client_usage;

-- Distributed traces (self-hosted, no Jaeger needed)
SELECT * FROM system_observability.spans WHERE trace_id = ?;

-- Prometheus metrics at /metrics, CLI: ferrosa-ctl monitor (TUI)
-- WebSocket push: ws://host:9090/api/ws (subscribe/unsubscribe)
```
How We Compare
Search, vectors, graph, SPARQL, transactions, CDC. One database. AI native.
| | Ferrosa | Apache Cassandra | ScyllaDB | DynamoDB | Amazon Keyspaces |
|---|---|---|---|---|---|
| Language | Rust | Java | C++ | Proprietary | Managed |
| GC Pauses | None | Yes (JVM) | None | N/A | N/A |
| Memory Safety | Guaranteed | GC-managed | Manual (C++) | N/A | N/A |
| CQL Compatible | ✓ v4/v5 | ✓ | ✓ | ✗ | Partial |
| S3-Native Storage | ✓ | ✗ | ✗ | ✗ | ✗ |
| Graph Queries | ✓ Cypher + Bolt | ✗ | ✗ | ✗ | ✗ |
| Vector Search | ✓ HNSW + IVFFlat | SAI (limited) | ✗ | ✗ | ✗ |
| Full-Text Search | ✓ BM25 + analyzers | SASI (deprecated) | ✗ | ✗ | ✗ |
| NVMe Table Pinning | ✓ per-table | ✗ | ✗ | DAX (cache) | ✗ |
| Secondary Indexes | 11 types (incl. FTS) | SAI / 2i | SI | GSI / LSI | GSI |
| Real-Time Pub/Sub | ✓ SUBSCRIBE (experimental) | CDC (external) | CDC (external) | DynamoDB Streams | ✗ |
| UDF Language | ✓ WASM | Java | Lua / WASM | N/A | N/A |
| Built-in Observability | ✓ CQL + TUI + Web + WS | JMX/nodetool | Prometheus | CloudWatch | CloudWatch |
| Production Ops | Hinted handoff, auto-reconnect, graceful drain, token rebalance, systemd | Manual (nodetool) | Manual + Operator | Managed | Managed |
| Transactions | ✓ Accord (strict serializable, in development) | LWT (Paxos) | LWT (Paxos) | ACID (multi-item, single region) | LWT |
| Consensus | Raft + Accord | Paxos (Accord in development) | Paxos (LWT) + Raft (schema, topology) | N/A (managed) | N/A (managed) |
| Node Recovery | Fast, no bulk streaming (depends on cache warmth and object-store latency) | Hours (streaming) | Hours (streaming) | Automatic | Automatic |
| Storage Cost Model | Object-store-first; tiering depends on workload and retention policy | Local / block storage; snapshot and replica costs depend on deployment | Local / block storage; cloud tier varies by operator and workload | Managed service pricing; depends on provisioned/on-demand capacity and region | Managed service pricing; depends on capacity mode, region, and traffic shape |
| Recovery Cost Model (30d) | Commit-log + object lifecycle model; validate against recovery objectives | Snapshot / replica retention model | Snapshot / replica retention model | Managed backup / restore pricing model | Managed backup / point-in-time recovery pricing model |
| Long-Term Recovery Model | Lifecycle-tiered object storage; cost depends on archive class and restore plan | Long-retention snapshot / archive model | Long-retention snapshot / archive model | Service retention limits and archive strategy vary by setup | Service retention limits and archive strategy vary by setup |
| Vendor Lock-In | None | None | Cloud tier | Full (AWS) | Full (AWS) |
Under the Hood
Each layer is an independent, tested crate. Use the full stack or embed individual components.
Built For
Migration
Same CQL protocol, same drivers, same consistency model. Import existing SSTables directly and migrate one keyspace at a time with dual-read verification. Most applications should need no code changes; check the CQL compatibility notes for known gaps.
Scale Seamlessly
Start with a single node on your laptop. Add a second node for high availability with automatic pair mode. Scale to 3+ nodes and Ferrosa forms a multi-node cluster with Raft consensus, tunable consistency levels, hinted handoff, and node lifecycle management. Same binary, same config format — just add nodes.
Graph + Relational
Stop running separate databases for each workload. Ferrosa speaks CQL for your transactional workloads and Cypher for graph traversals — same tables, same cluster, same operational surface.
Grow With You
Start on your laptop, then evaluate cluster features as they harden. The same Ferrosa binary exposes single-node, pair-mode, and cluster-mode paths, but production hardening is still in progress.
1 Node
Run a single Ferrosa node locally. Common CQL driver paths and graph endpoints are available for developer-preview testing without cluster setup or coordination overhead. Validate compatibility before moving an application between tiers.
2 Nodes
Add a second node and Ferrosa automatically enters pair mode — synchronous replication with operator-driven failover paths. All 11 DDL operations replicate automatically: CREATE, ALTER, and DROP for keyspaces, tables, and roles, plus GRANT and REVOKE. Schema catch-up on rejoin, operator switchover, and force-promote for split-brain recovery.
3+ Nodes
Add a third node to evaluate cluster mode: Raft consensus for metadata (openraft), Murmur3 token ring for data sharding, and a coordinator pattern with tunable consistency levels (ONE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM). Hinted handoff stores writes for temporarily failed replicas and replays on recovery. Streaming protocol handles node bootstrap and decommission. Skew-aware token rebalancing keeps load evenly distributed. Deploy with Docker Compose or bare metal.
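For the tunable consistency levels, the standard Cassandra-style replica counts are ONE = 1, QUORUM = floor(RF/2) + 1, and ALL = RF, and a read is guaranteed to overlap the latest acknowledged write when read replicas plus write replicas exceed RF. The sketch below encodes that rule for a single datacenter only; LOCAL_* levels and EACH_QUORUM (which apply quorum per datacenter) are deliberately left out, and Ferrosa's exact semantics should be checked against its own documentation.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level (single datacenter)."""
    counts = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    if level not in counts:
        raise ValueError(f"unsupported level in this sketch: {level}")
    needed = counts[level]
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every possible read replica set intersects every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap, while ONE plus ONE (1 + 1) can miss the newest write.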
Download
Ferrosa is under active development. Use the hosted setup script for the current developer-preview install path.
See the getting started guide for architecture and design details.
Install, configure, and connect your first CQL driver in minutes.
Step-by-step walkthroughs for IoT, analytics, e-commerce, and 10 more use cases — with runnable CQL scripts.
CQL compatibility notes with supported statements, types, and known gaps.
Connect with psql/psycopg2 over the Postgres protocol — SELECT, DML, transactions; differential-tested vs real PostgreSQL.
CQL, Cypher, SPARQL, and vector search — one database for all your workloads.
Ferrosa is in active development. Developer-preview binary releases are available via the hosted setup script.