-
Building a Real-Time Data Platform on a Raspberry Pi: Architecture Overview
A self-contained streaming platform on a Raspberry Pi 400 — Zigbee sensors, MQTT, Kafka, Spark Structured Streaming, Delta Lake, and Grafana, no cloud required. This post covers the architecture and what we are building across the series.
raspberry-pi zigbee mqtt kafka spark delta-lake prometheus grafana series
-
Inside the Delta Lake Transaction Log
How Delta Lake uses a sequence of JSON commit files and Parquet checkpoints to turn a directory of Parquet files into an ACID-compliant table — and what that means for reads, writes, and time travel.
delta-lake internals storage
-
Spark's Physical Execution Model: From DAG to Task
The DAG is a logical plan. This post covers what actually happens between calling .count() and getting a number back — how stages are cut, what the task scheduler does, and where the shuffle lands on disk.
spark internals execution