Every Delta Lake table is two things at once: a directory of Parquet data files, and a transaction log that describes what those files mean. The Parquet files are dumb storage. All the intelligence — schema, history, ACID guarantees — lives in the log.

Understanding the log is understanding Delta. Most docs explain what Delta does. This explains how.

The _delta_log/ directory

When you create a Delta table, Delta writes a _delta_log/ directory alongside your data:

my_table/
├── _delta_log/
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
│   ├── 00000000000000000002.json
│   └── 00000000000000000010.checkpoint.parquet
├── part-00000-a1b2c3.snappy.parquet
├── part-00000-d4e5f6.snappy.parquet
└── ...

Each .json file is one commit — a complete, atomic record of one write operation. The files are named with 20-digit zero-padded version numbers. Version 0 is the table creation. Version 1 is the first write. And so on, monotonically, forever.
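The naming rule is easy to reproduce. A minimal sketch in Python, where the helper names commit_file_name and parse_commit_version are made up for illustration and are not part of any Delta library:

```python
import re
from typing import Optional

# A commit file name is a 20-digit zero-padded version followed by ".json".
_COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def commit_file_name(version: int) -> str:
    """Return the log file name for a given commit version."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"{version:020d}.json"


def parse_commit_version(file_name: str) -> Optional[int]:
    """Return the version encoded in a commit file name, or None for other files."""
    match = _COMMIT_RE.match(file_name)
    return int(match.group(1)) if match else None
```

Zero-padding means a plain lexicographic sort of the file names is also a sort by version, which is convenient when listing the directory.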

The checkpoint file is a compacted snapshot — more on that shortly.

Anatomy of a commit file

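A commit JSON file is newline-delimited JSON. Each line is one action, and a commit is a sequence of actions applied atomically. (The examples below are pretty-printed for readability; in the log, each action occupies a single line.) The main action types: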

add — a new Parquet file is part of the table:

{
  "add": {
    "path": "part-00000-a1b2c3.snappy.parquet",
    "partitionValues": {"date": "2024-01-15"},
    "size": 102400,
    "modificationTime": 1705276800000,
    "dataChange": true,
    "stats": "{\"numRecords\":50000,\"minValues\":{\"id\":1},\"maxValues\":{\"id\":50000}}"
  }
}

remove — a file is no longer part of the table (it still exists on disk until VACUUM):

{
  "remove": {
    "path": "part-00000-old.snappy.parquet",
    "deletionTimestamp": 1705276800000,
    "dataChange": true
  }
}

metaData — schema or configuration change:

{
  "metaData": {
    "id": "3f7a2b91-...",
    "schemaString": "{\"type\":\"struct\",\"fields\":[...]}",
    "partitionColumns": ["date"],
    "configuration": {"delta.autoOptimize.optimizeWrite": "true"}
  }
}

protocol — minimum reader/writer versions required to interact with this table. When Delta enables a new feature (deletion vectors, column mapping), it bumps the protocol version so older readers fail fast rather than silently misread data.

A full INSERT into a partitioned table produces one commit with several add actions — one per output file. An UPDATE that rewrites three files produces three remove actions and three add actions in the same commit. All-or-nothing.
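To make the "one action per line" structure concrete, here is a small sketch that parses a commit's text into (action type, payload) pairs. The function name parse_commit and the sample commit are illustrative assumptions; real commits usually carry additional actions that are not covered here.

```python
import json


def parse_commit(text):
    """Parse newline-delimited JSON into a list of (action_type, payload) pairs."""
    actions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        # Each line is an object with a single key naming the action type.
        action_type, payload = next(iter(record.items()))
        actions.append((action_type, payload))
    return actions


# Hypothetical UPDATE commit that rewrites a single file.
sample_commit = "\n".join([
    json.dumps({"remove": {"path": "part-00000-old.snappy.parquet",
                           "deletionTimestamp": 1705276800000,
                           "dataChange": True}}),
    json.dumps({"add": {"path": "part-00000-a1b2c3.snappy.parquet",
                        "partitionValues": {"date": "2024-01-15"},
                        "size": 102400,
                        "modificationTime": 1705276800000,
                        "dataChange": True}}),
])
actions = parse_commit(sample_commit)
```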

How reads reconstruct table state

When you run SELECT * FROM my_table, Delta doesn’t just scan Parquet files — it first reconstructs the current table snapshot. The algorithm:

  1. Find the latest checkpoint file (if any)
  2. Load that checkpoint as the base state
  3. Replay all JSON commits after the checkpoint, in order
  4. The add actions that survive the replay (added and not later removed) are the set of files to scan

This is why Delta reads have a small overhead even before touching data: the log replay. For a table with a recent checkpoint and a handful of commits since, this is fast. For a table with thousands of commits and no checkpoint, it’s a full log scan — which is why checkpointing matters.
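The replay itself is a simple fold over actions. The sketch below assumes the checkpoint has already been loaded into a plain dictionary and each commit into a list of single-key action dictionaries; the replay function and its data shapes are illustrative, not Delta's internal representation.

```python
def replay(checkpoint_state, commits):
    """Fold commits (oldest first) on top of a checkpoint to get a snapshot.

    checkpoint_state: dict with optional "files" (path -> add payload),
        "metaData" and "protocol" keys, or None if there is no checkpoint.
    commits: iterable of action lists, each action a single-key dict.
    """
    base = checkpoint_state or {}
    snapshot = {
        "files": dict(base.get("files", {})),
        "metaData": base.get("metaData"),
        "protocol": base.get("protocol"),
    }
    for actions in commits:
        for action in actions:
            if "add" in action:
                snapshot["files"][action["add"]["path"]] = action["add"]
            elif "remove" in action:
                snapshot["files"].pop(action["remove"]["path"], None)
            elif "metaData" in action:
                snapshot["metaData"] = action["metaData"]  # latest wins
            elif "protocol" in action:
                snapshot["protocol"] = action["protocol"]  # latest wins
    return snapshot


# Hypothetical checkpoint with two files, followed by two later commits.
checkpoint = {"files": {"a.parquet": {"path": "a.parquet"},
                        "b.parquet": {"path": "b.parquet"}}}
later_commits = [
    [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
    [{"add": {"path": "d.parquet"}}],
]
files_to_scan = sorted(replay(checkpoint, later_commits)["files"])
```

The cost of this loop grows with the number of commits since the checkpoint, which is the overhead described above.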

The stats field on each add action is where data skipping comes from. Delta stores per-file min/max values and null counts. When your query has a filter like WHERE id > 40000, Delta can rule out files whose maxValues.id <= 40000 without opening them, because no row in such a file can match. This is entirely driven by log metadata: no index files, no separate statistics store.
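A rough sketch of that pruning decision for a single "column > threshold" filter. The function name may_match_greater_than is hypothetical, and a real engine handles many more predicate shapes; the point is only that the decision needs nothing beyond the stats string.

```python
import json


def may_match_greater_than(add_action, column, threshold):
    """Return False only when stats prove no row satisfies `column > threshold`."""
    stats_json = add_action.get("stats")
    if not stats_json:
        return True  # no stats recorded: the file must be scanned
    max_value = json.loads(stats_json).get("maxValues", {}).get(column)
    if max_value is None:
        return True  # no max for this column: cannot prune safely
    return max_value > threshold


# The stats string from the add example earlier in this article.
file_a = {
    "path": "part-00000-a1b2c3.snappy.parquet",
    "stats": "{\"numRecords\":50000,\"minValues\":{\"id\":1},\"maxValues\":{\"id\":50000}}",
}
scan_a = may_match_greater_than(file_a, "id", 40000)   # max 50000: must scan
skip_a = not may_match_greater_than(file_a, "id", 50000)  # max equals bound: skip
```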

Checkpoint files

Every 10 commits by default (configurable via delta.checkpointInterval), Delta writes a checkpoint. A checkpoint is a Parquet file that encodes the same state you would get by replaying all JSON commits from version 0: the currently-active add actions, the remove actions (tombstones) that have not yet expired, and the current metadata and protocol.

A checkpoint for version 10 means: to read the current state of the table, load 00000000000000000010.checkpoint.parquet, then replay commits 11, 12, 13… You never need to go back further.

Delta also writes a _last_checkpoint file:

{"version": 10, "size": 5}

This tells readers where to start — no need to list the entire _delta_log/ directory to find the latest checkpoint.
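Here is a sketch of how a reader might use that pointer to decide which commits to replay. The function name plan_log_read is an assumption, and for brevity the sketch lists the whole directory on a local filesystem, which a real reader on object storage would avoid by listing only from the checkpoint version onward.

```python
import json
import os
import re
import tempfile

COMMIT_RE = re.compile(r"(\d{20})\.json")


def plan_log_read(log_dir):
    """Return (checkpoint_version_or_None, sorted commit versions to replay)."""
    checkpoint_version = None
    pointer = os.path.join(log_dir, "_last_checkpoint")
    if os.path.exists(pointer):
        with open(pointer) as f:
            checkpoint_version = json.load(f)["version"]

    versions = sorted(
        int(m.group(1))
        for name in os.listdir(log_dir)
        if (m := COMMIT_RE.fullmatch(name))
    )
    # Without a checkpoint, every commit from version 0 must be replayed.
    start = -1 if checkpoint_version is None else checkpoint_version
    return checkpoint_version, [v for v in versions if v > start]


# Demo against a throwaway directory with commits 0..12 and a checkpoint at 10.
with tempfile.TemporaryDirectory() as log_dir:
    for v in range(13):
        open(os.path.join(log_dir, f"{v:020d}.json"), "w").close()
    with open(os.path.join(log_dir, "_last_checkpoint"), "w") as f:
        json.dump({"version": 10, "size": 5}, f)
    plan = plan_log_read(log_dir)
```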

For large tables with millions of files, a single checkpoint Parquet can itself become large. Delta 2.0+ supports multi-part checkpoints: the checkpoint is split into multiple Parquet files that can be read in parallel.

Optimistic concurrency and conflict detection

Delta uses optimistic concurrency: multiple writers proceed in parallel and only conflict at commit time. The protocol:

  1. Writer reads the current table version (say, version 5)
  2. Writer performs its work, prepares a set of add/remove actions
  3. Writer attempts to write version 6 (the next JSON file)
  4. If someone else already wrote version 6, the writer re-reads version 6, checks for conflicts, and retries at version 7

Conflict detection isn’t “did anyone else write?” but “did anyone else write in a way that invalidates my assumptions?” Delta tracks which files a transaction read and checks whether any of those files were removed or rewritten by intervening commits. An INSERT into one partition doesn’t conflict with a concurrent INSERT into a different partition. Two concurrent UPDATEs that rewrite the same files do: whichever commits second fails its conflict check.

This is what makes concurrent writes to the same Delta table possible without a distributed lock manager — but it’s also why high-concurrency, overlapping writes produce retries, and why very high write concurrency eventually needs careful partitioning strategy.
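The commit loop can be sketched on a local filesystem, where creating a file with O_EXCL gives the "only one writer wins this version" behaviour. Everything here is a simplification under stated assumptions: try_commit, commit_with_retry and ConflictError are hypothetical names, the conflict check only looks at removed files, and creating the file and writing its content are two steps here, whereas a real commit needs the file to appear atomically with its content.

```python
import json
import os
import tempfile


class ConflictError(Exception):
    """Hypothetical error type for this sketch."""


def commit_path(log_dir, version):
    return os.path.join(log_dir, f"{version:020d}.json")


def try_commit(log_dir, version, actions):
    """Create the commit file for `version`; return False if it already exists."""
    try:
        # O_EXCL makes creation fail if another writer already claimed the version.
        fd = os.open(commit_path(log_dir, version),
                     os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(json.dumps(a) for a in actions))
    return True


def commit_with_retry(log_dir, read_version, actions, files_read, max_attempts=5):
    """Try read_version + 1, then later versions, checking each winning commit."""
    version = read_version + 1
    for _ in range(max_attempts):
        if try_commit(log_dir, version, actions):
            return version
        # Lost the race: inspect what the winning commit did.
        with open(commit_path(log_dir, version)) as f:
            winner = [json.loads(line) for line in f if line.strip()]
        for action in winner:
            if action.get("remove", {}).get("path") in files_read:
                raise ConflictError(
                    f"version {version} removed a file this transaction read")
        version += 1
    raise ConflictError("gave up after too many attempts")


with tempfile.TemporaryDirectory() as log_dir:
    # Another writer lands version 6 first, touching an unrelated partition.
    try_commit(log_dir, 6, [{"add": {"path": "date=2024-01-16/x.parquet"}}])
    landed_at = commit_with_retry(
        log_dir, read_version=5,
        actions=[{"add": {"path": "date=2024-01-15/y.parquet"}}],
        files_read=set(),
    )
```

In the demo the writer read version 5, finds version 6 taken by a non-conflicting commit, and lands at version 7.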

Time travel and VACUUM

Because remove actions only mark files as logically deleted (they never delete the Parquet file itself), old table versions remain readable. SELECT * FROM my_table VERSION AS OF 3 replays the log up to version 3 and reads those files. This is time travel — zero copy, no separate snapshot storage.
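Time travel is the same replay, stopped early. A minimal sketch, assuming the log has already been loaded into a dictionary from version number to a list of single-key action dictionaries (files_as_of and the sample history are illustrative, and checkpoints are ignored for brevity):

```python
def files_as_of(commits, version):
    """Return the set of data file paths that make up the table at `version`."""
    if version not in commits:
        raise ValueError(f"version {version} is not in the log")
    active = set()
    for v in sorted(commits):
        if v > version:
            break  # later commits are ignored for this snapshot
        for action in commits[v]:
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


# Hypothetical history: create, two appends, then a rewrite of a.parquet.
history = {
    0: [],
    1: [{"add": {"path": "a.parquet"}}],
    2: [{"add": {"path": "b.parquet"}}],
    3: [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
}
at_v2 = files_as_of(history, 2)
at_v3 = files_as_of(history, 3)
```

Reading version 2 still needs a.parquet on disk even though version 3 removed it from the table, which is exactly why removal is only logical.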

VACUUM is what actually reclaims disk space. It:

  1. Scans the log to find all files referenced in the current snapshot and any version within the retention window (default: 7 days)
  2. Lists all physical Parquet files in the table directory
  3. Deletes any physical files not referenced by any retained version

The critical implication: once VACUUM has run with the default retention window, you can only count on time travel up to 7 days back. Lowering the retention window frees disk space faster but shortens how far back you can query. Setting it to zero and running VACUUM is irreversible: the data files behind those old snapshots are gone.
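The selection logic can be approximated as follows. This is a simplified model, not the actual VACUUM implementation: vacuum_candidates and its inputs are hypothetical, a removed file is treated as needed until its deletionTimestamp falls outside the retention window, and a file the log never referenced is kept until its modification time does.

```python
DAY_MS = 24 * 60 * 60 * 1000


def vacuum_candidates(physical_files, active_paths, tombstones, now_ms,
                      retention_ms=7 * DAY_MS):
    """Return the sorted paths that are safe to delete under this model.

    physical_files: dict path -> modification time (ms) of files on disk
    active_paths:   set of paths in the current snapshot
    tombstones:     dict path -> deletionTimestamp (ms) from remove actions
    """
    cutoff = now_ms - retention_ms
    deletable = []
    for path, mtime in physical_files.items():
        if path in active_paths:
            continue  # part of the current table
        removed_at = tombstones.get(path)
        if removed_at is not None:
            if removed_at < cutoff:
                deletable.append(path)  # no retained version references it
        elif mtime < cutoff:
            deletable.append(path)  # never committed and old enough
    return sorted(deletable)


now = 100 * DAY_MS
candidates = vacuum_candidates(
    physical_files={
        "active.parquet": 1 * DAY_MS,
        "old_removed.parquet": 1 * DAY_MS,
        "recent_removed.parquet": 1 * DAY_MS,
        "orphan_old.parquet": 50 * DAY_MS,
        "orphan_new.parquet": 99 * DAY_MS,
    },
    active_paths={"active.parquet"},
    tombstones={"old_removed.parquet": 80 * DAY_MS,
                "recent_removed.parquet": 97 * DAY_MS},
    now_ms=now,
)
```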

One subtlety: VACUUM doesn’t touch the _delta_log/ directory. Old JSON commits are removed by a separate mechanism, log retention cleanup: when Delta writes a checkpoint, it deletes log entries older than delta.logRetentionDuration (default: 30 days).

What this means in practice

A few things that fall out of understanding the log:

Small file compaction is a log operation. OPTIMIZE rewrites N small Parquet files into fewer large ones. From the log’s perspective, it’s just a commit with N remove actions and M add actions, each marked dataChange: false because no data is logically changed; the same rows now live in fewer, larger files. The old small files still exist on disk until VACUUM.
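As an illustration, this sketch builds the action list such a compaction commit might contain. The helper compaction_commit and the file names are hypothetical; the one detail worth noticing is that both the remove and the add actions set dataChange to false, since the table's rows are unchanged.

```python
import json


def compaction_commit(small_files, merged_files, now_ms):
    """Build remove/add actions for a rewrite that does not change table data.

    small_files:  iterable of paths being compacted away
    merged_files: iterable of (path, size_in_bytes) for the new larger files
    """
    actions = []
    for path in small_files:
        actions.append({"remove": {"path": path,
                                   "deletionTimestamp": now_ms,
                                   "dataChange": False}})
    for path, size in merged_files:
        actions.append({"add": {"path": path,
                                "partitionValues": {},
                                "size": size,
                                "modificationTime": now_ms,
                                "dataChange": False}})
    return actions


# Hypothetical compaction of three small files into one.
commit_actions = compaction_commit(
    small_files=["small-1.parquet", "small-2.parquet", "small-3.parquet"],
    merged_files=[("merged-1.parquet", 307200)],
    now_ms=1705276800000,
)
# Serialized the way a commit file stores it: one action per line.
commit_text = "\n".join(json.dumps(a) for a in commit_actions)
```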

Schema evolution is a metadata action. Adding a nullable column is a single metaData action in the log, zero bytes of Parquet rewritten. Renaming a column (with column mapping enabled) is similarly log-only — the Parquet files stay untouched, and Delta uses the column mapping in the metadata to reconcile names.
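To show how little is involved, here is a sketch that derives the new metaData action for an added nullable column. The function add_nullable_column is a made-up helper, and the sample schema is illustrative; it assumes schemaString holds a struct definition whose fields each have name, type, nullable and metadata keys.

```python
import json


def add_nullable_column(meta_data, name, type_name):
    """Return a new metaData action whose schema has one extra nullable field."""
    schema = json.loads(meta_data["schemaString"])
    if any(field["name"] == name for field in schema["fields"]):
        raise ValueError(f"column {name!r} already exists")
    schema["fields"].append(
        {"name": name, "type": type_name, "nullable": True, "metadata": {}})
    updated = dict(meta_data)  # leave the caller's dict untouched
    updated["schemaString"] = json.dumps(schema, separators=(",", ":"))
    return {"metaData": updated}


# Hypothetical current metadata with a single column.
current = {
    "schemaString": json.dumps({"type": "struct", "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}}]}),
    "partitionColumns": ["date"],
    "configuration": {},
}
new_action = add_nullable_column(current, "email", "string")
new_fields = json.loads(new_action["metaData"]["schemaString"])["fields"]
```

Existing Parquet files simply lack the new column, and readers fill it with nulls, so nothing on disk has to change.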

The log is the source of truth, not the filesystem. If a Parquet file exists in the table directory but has no add action in the log, Delta ignores it. This is how partial write failures are handled — the failed writer never committed its JSON file, so the orphaned Parquet files are invisible to readers and get cleaned up by VACUUM.