Vetora logo

Storage Engines

LSM trees, B-trees, and the internal mechanics of how databases persist data.

Concepts

B-Tree and B+ TreeP0

B-trees and B+ trees are self-balancing tree data structures that maintain sorted data and allow O(log n) lookups, insertions, and deletions. They are the default index structure in most relational databases because their page-oriented design aligns perfectly with how disks and SSDs read and write data in fixed-size blocks.

LSM Tree (Log-Structured Merge Tree)P0

The Log-Structured Merge Tree (LSM tree) is a write-optimized data structure that converts random writes into sequential I/O by buffering changes in an in-memory sorted structure (memtable) and periodically flushing them to immutable sorted files (SSTables) on disk. Background compaction merges these files to maintain read performance.

LSM Tree vs B-TreeP0

A comprehensive comparison of the two dominant storage engine architectures: LSM trees (write-optimized, sequential I/O, compaction-based) and B-trees (read-optimized, in-place updates, page-based). Understanding this trade-off is fundamental to choosing the right database engine for a given workload.

Write-Ahead Log (WAL)P0

The Write-Ahead Log (WAL) is the fundamental crash-recovery mechanism used by virtually every database. The core rule is simple: log the change to durable storage BEFORE applying it to the actual data pages. This ensures that committed transactions can always be recovered after a crash, even if the data pages were not yet written to disk.

Bloom Filters and Cuckoo FiltersP0

Bloom filters are space-efficient probabilistic data structures for set membership testing. They can definitively say an element is NOT in a set (zero false negatives) but may incorrectly say an element IS in the set (false positives). Cuckoo filters extend this concept with support for deletion and better space efficiency at low false positive rates.