Series: Building a Real-Time Data Platform from the Edge — Post 2 of 7. The full source is on GitHub: rpi-data-platform


This post is infrastructure. No Kafka, no Delta Lake, no interesting problems yet. But every subsequent post in this series assumes a specific baseline: 64-bit OS, SSH access, a locked-down user setup, and Docker Engine running on arm64. Get this right once and never think about it again.

If you already have a Pi running and Docker installed, skip ahead to the sections on cgroup v2 configuration and swap. These are the two settings most tutorials leave out, and they will cause silent failures later when Spark is under memory pressure.


Why 64-bit OS Lite

The Pi 400 ships with a 64-bit CPU (Cortex-A72). You want to run a 64-bit OS.

The reason is not performance in the abstract; it is container compatibility. The Docker images used in this series (Kafka, Spark, MinIO, Grafana) publish linux/arm64 builds. On a 32-bit OS, Docker pulls linux/arm/v7 images where they exist and fails outright for images that publish no 32-bit variant. Debugging that three posts from now is not fun.

Verify after flashing:

uname -m
# aarch64  ← correct
# armv7l   ← wrong, reflash

OS Lite, not Desktop. The full desktop environment costs roughly 200MB of RAM at idle — RAM that belongs to Kafka and Spark. Lite boots to a shell and nothing else.


Flash the SD card

Download Raspberry Pi Imager and flash Raspberry Pi OS Lite (64-bit) to a 32GB+ card.

Before writing, open the Advanced options (⚙️ or Ctrl+Shift+X) and configure everything there. This is the right place to do it — not after first boot.

Setting             Value
------------------  ----------------------------------------------
Hostname            rpi-data-platform
Enable SSH          ✓ (public key auth only)
SSH public key      paste your ~/.ssh/id_ed25519.pub
Username            something other than pi
Password            strong, but SSH will be key-only anyway
Locale / timezone   set correctly
WiFi credentials    optional; wired is better for a stationary Pi

Do not enable SSH with password auth. It is off-limits if the Pi is on any network with internet access.

Write the card, insert it into the Pi 400, and power on. The Pi 400 has the SD slot on the back-left edge.


First SSH connection

From your laptop:

ssh yourusername@rpi-data-platform.local

If .local resolution fails (common on some networks), find the IP from your router’s DHCP table or run arp -a on your laptop.
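Optionally, a host alias on the laptop saves typing in later posts. The following is a minimal sketch, not part of the original setup: the function name ssh_host_entry and the alias rpi are hypothetical, and the key path assumes the id_ed25519 key pasted into the Imager.

```shell
# Print an ~/.ssh/config stanza for the Pi.
# Arguments: alias, hostname (or IP), remote username.
# ssh_host_entry is a hypothetical helper, not an OpenSSH tool.
ssh_host_entry() {
  printf 'Host %s\n' "$1"
  printf '    HostName %s\n' "$2"
  printf '    User %s\n' "$3"
  printf '    IdentityFile ~/.ssh/id_ed25519\n'
  printf '    IdentitiesOnly yes\n'
}

# On the laptop (append once, then connect with: ssh rpi):
#   ssh_host_entry rpi rpi-data-platform.local yourusername >> ~/.ssh/config
```

If .local resolution is unreliable on your network, pass the Pi's IP address as the second argument instead of the hostname.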

Once in, immediately update:

sudo apt update && sudo apt full-upgrade -y
sudo reboot

Use full-upgrade rather than upgrade: it is allowed to remove packages when dependencies change, so updates that upgrade would hold back, kernel and firmware packages included, still get installed. Reboot before touching anything else.


System configuration

Expand the filesystem

Raspberry Pi OS normally expands the root filesystem to fill the card on first boot, but verify:

df -h /
# Filesystem should reflect the full SD card capacity

If it shows only a few GB, run sudo raspi-config → Advanced Options → Expand Filesystem.

Configure swap

The Pi 400 has 4GB RAM. Kafka and Spark together will push that limit. Swap is not a performance solution — it is a safety net that prevents the OOM killer from taking down a container mid-job.

First, check what you already have:

free -h

Look at the Swap row. If it already shows about 2.0Gi, you are done and nothing needs to change. The default swapfile size depends on the image release, so check rather than assume.

If Swap shows less than that (100Mi on older images), raise it:

sudo dphys-swapfile swapoff

sudo nano /etc/dphys-swapfile
# Change: CONF_SWAPSIZE=100
# To:     CONF_SWAPSIZE=2048

sudo dphys-swapfile setup
sudo dphys-swapfile swapon

free -h
# Swap: should now show ~2.0Gi
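If you prefer a scripted version of the nano edit, the steps above can be sketched as two small helpers. The names swap_total_mib and with_swapsize are hypothetical, and the optional path arguments exist only so the functions can be exercised against a copy of the file.

```shell
# Print total swap in MiB. /proc/meminfo reports SwapTotal in kB.
swap_total_mib() {
  awk '/^SwapTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Print a dphys-swapfile config with CONF_SWAPSIZE set to $2 (in MB).
# Matches both an active line and a commented-out "#CONF_SWAPSIZE=" line.
# If the file has no CONF_SWAPSIZE line at all, it is printed unchanged.
with_swapsize() {
  sed -E "s/^#?CONF_SWAPSIZE=.*/CONF_SWAPSIZE=$2/" "$1"
}

# On the Pi, between "swapoff" and "setup":
#   with_swapsize /etc/dphys-swapfile 2048 > /tmp/dphys-swapfile.new
#   sudo cp /tmp/dphys-swapfile.new /etc/dphys-swapfile
```

Writing to a temporary file first avoids redirecting onto the file that sed is still reading.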

cgroup v2, Docker, and why this matters

This is the step most Pi tutorials skip. Docker uses Linux control groups (cgroups) to enforce memory and CPU limits on containers. Spark’s executor memory setting (spark.executor.memory) sizes the JVM heap, but the container-level memory limit around it only works if Docker is actually enforcing limits, which requires the cgroup memory controller to be enabled through the right kernel flags.

Raspberry Pi OS Bookworm enables cgroup v2 by default, but the memory controller is often inactive. Check:

cat /sys/fs/cgroup/cgroup.controllers
# Should include: memory cpuset cpu io

If memory is missing, add the kernel parameters:

sudo nano /boot/firmware/cmdline.txt

Append to the single line (do not add a newline):

cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

Reboot, then verify:

cat /sys/fs/cgroup/cgroup.controllers
# memory cpuset cpu io pids

Without this, docker stats cannot report container memory usage and the memory limits set on the Spark containers are not enforced. You will not notice until Spark exhausts the Pi’s memory and the OOM killer starts terminating containers with no useful error message.
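The check and the cmdline.txt edit above can also be scripted so that they are safe to re-run. This is a sketch under assumptions: has_memory_controller and with_cgroup_flags are hypothetical helper names, and the path arguments exist only so the logic can be tried on a copy of the files.

```shell
# Succeed if the controller list contains "memory" as a whole word.
has_memory_controller() {
  tr ' ' '\n' < "${1:-/sys/fs/cgroup/cgroup.controllers}" | grep -qx 'memory'
}

# Print the first line of a cmdline.txt with the cgroup flags appended,
# unless cgroup_enable=memory is already there. Output is always one line.
with_cgroup_flags() {
  line=$(head -n 1 "$1")
  case " $line " in
    *" cgroup_enable=memory "*)
      printf '%s\n' "$line" ;;
    *)
      printf '%s %s\n' "$line" \
        'cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory' ;;
  esac
}

# On the Pi (write to a temp file first, then copy into place):
#   has_memory_controller || {
#     with_cgroup_flags /boot/firmware/cmdline.txt > /tmp/cmdline.new
#     sudo cp /tmp/cmdline.new /boot/firmware/cmdline.txt
#   }
```

Because the helper only ever emits the first line plus the flags, it cannot introduce the stray newline that the manual edit warns about.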

Set the GPU memory split

The Pi 400’s GPU shares system RAM. The default allocation is 76MB. Since there is no display attached, reclaim it:

sudo nano /boot/firmware/config.txt
# Add under [all]:
gpu_mem=16

16MB is the minimum. This frees ~60MB back to the OS — not a huge amount, but on a 4GB machine every bit counts when Spark is running.


Harden SSH

Edit /etc/ssh/sshd_config and make these changes:

PasswordAuthentication no
PermitRootLogin no
PubkeyAuthentication yes

Restart:

sudo systemctl restart ssh

Test by opening a second terminal and confirming you can still connect before closing the first session. Locking yourself out of a headless Pi is annoying.
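Editing sshd_config does not guarantee the values are in effect: Debian-based images include drop-in files from /etc/ssh/sshd_config.d/ near the top of the file, and for most keywords the first value read wins. A hedged way to confirm is to inspect the effective configuration that sshd -T prints. The helper name sshd_is_hardened below is hypothetical.

```shell
# Read `sshd -T` output on stdin and succeed only if all three settings
# from this section are in effect. sshd -T prints keywords in lower case.
sshd_is_hardened() {
  awk '
    $1 == "passwordauthentication" && $2 == "no"  { ok++ }
    $1 == "permitrootlogin"        && $2 == "no"  { ok++ }
    $1 == "pubkeyauthentication"   && $2 == "yes" { ok++ }
    END { exit (ok == 3 ? 0 : 1) }
  '
}

# On the Pi:
#   sudo sshd -t                       # syntax check before restarting ssh
#   sudo sshd -T | sshd_is_hardened && echo "hardened"
```

Running sshd -t before the restart catches a typo while the current session is still open.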


Install Docker Engine

Do not use the docker.io package from the Raspberry Pi OS repositories — it lags significantly behind upstream and lacks Docker Compose v2. Install from Docker’s official apt repository:

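# Remove any conflicting distro packages, one at a time, so that an
# unknown package name cannot abort the removal of the others
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do
  sudo apt remove -y "$pkg"
done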

# Install prerequisites
sudo apt install -y ca-certificates curl gnupg

# Add Docker's GPG key
sudo install -m 0755 -d /usr/share/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | \
  sudo gpg --dearmor -o /usr/share/keyrings/docker.gpg
sudo chmod a+r /usr/share/keyrings/docker.gpg

# Add the repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker.gpg] \
  https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Add your user to the docker group so you can run Docker without sudo:

sudo usermod -aG docker $USER
newgrp docker

Verify:

docker run --rm hello-world
docker compose version
# Docker Compose version v2.x.x or later

Note: the command is docker compose (subcommand), not docker-compose (standalone binary). The standalone v1 binary is end-of-life. Every script in this series uses the v2 syntax.

Verify arm64 image pulls correctly

docker run --rm --platform linux/arm64 alpine uname -m
# aarch64

If this fails with an exec format error or returns armv7l, the running system is not 64-bit, regardless of what the Imager said. Reflash.


Configure Docker daemon

Two settings worth making now before the stack grows:

sudo nano /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65536,
      "Soft": 65536
    }
  }
}

Log rotation: by default Docker’s json-file driver keeps container logs indefinitely. On a Pi with a 32GB SD card, a chatty Kafka broker will eventually fill the card. max-size: 10m and max-file: 3 together cap each container at 30MB of logs.

File descriptor limit: Kafka and Spark open many file descriptors. The Linux default (1024) is too low. 65536 is the standard production value — set it now rather than debug Too many open files errors in post 4.

Restart Docker:

sudo systemctl restart docker
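To see what the log rotation settings mean for the whole stack rather than one container, the arithmetic is max-size times max-file times the number of containers. A small sketch follows; the helper name log_cap_mb and the eight-container figure are illustrative assumptions, not numbers from this series.

```shell
# Worst-case json-file log usage in MB:
# max-size (MB) x max-file x number of containers.
log_cap_mb() {
  echo $(( $1 * $2 * $3 ))
}

log_cap_mb 10 3 1    # one container with the settings above: 30
log_cap_mb 10 3 8    # a hypothetical eight-container stack: 240
```

Note that daemon-level logging options only apply to containers created after the restart; containers that already exist keep their old log configuration until they are recreated.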

Verify the baseline

Before moving on, confirm everything is in order:

# 64-bit OS
uname -m                          # aarch64

# cgroup memory controller active
cat /sys/fs/cgroup/cgroup.controllers   # includes memory

# Swap configured
free -h                           # ~2G swap

# Docker running
sudo systemctl status docker      # active (running)

# Docker Compose v2
docker compose version            # Docker Compose version v2.x.x

# arm64 containers work
docker run --rm --platform linux/arm64 alpine uname -m  # aarch64

If all six checks pass, the Pi is ready.
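The six checks can also be wrapped in a small script that prints PASS or FAIL per line, which is handy after a reflash. This is a sketch: check and run_baseline are hypothetical names, and the 2000 MiB swap threshold is an assumed tolerance for a nominal 2GB swapfile.

```shell
failures=0

# Run a command quietly and report PASS/FAIL under the given label.
check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# The six checks from this section. Nothing runs until you call run_baseline.
run_baseline() {
  failures=0
  check "64-bit OS"         sh -c '[ "$(uname -m)" = aarch64 ]'
  check "cgroup memory"     grep -qw memory /sys/fs/cgroup/cgroup.controllers
  check "swap >= 2000 MiB"  awk '/^SwapTotal:/ { exit ($2 / 1024 >= 2000 ? 0 : 1) }' /proc/meminfo
  check "Docker running"    systemctl is-active --quiet docker
  check "Docker Compose v2" docker compose version
  check "arm64 containers"  sh -c '[ "$(docker run --rm --platform linux/arm64 alpine uname -m)" = aarch64 ]'
  echo "$failures check(s) failed"
  [ "$failures" -eq 0 ]
}
```

Paste the block into a shell on the Pi and call run_baseline; its exit status is zero only when every check passes.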


Permanent ethernet setup

If you plan to run the Pi 24/7 on a wired connection — which is the intended setup for this series — do these three things once the cable is in.

Assign a static IP at the router level. Find the Pi’s MAC address (ip link show eth0), then create a DHCP reservation in your router’s admin panel. The hostname rpi-data-platform.local continues to work, but you also get a predictable IP that survives reboots. Do it at the router, not on the Pi — it survives OS reinstalls and is easier to manage.

Disable WiFi and Bluetooth. Nothing in this stack uses either, and leaving them active on a 24/7 machine is unnecessary attack surface:

sudo nano /boot/firmware/config.txt

Add under [all]:

dtoverlay=disable-wifi
dtoverlay=disable-bt

Enable unattended security upgrades. You want security patches applied automatically, but not full upgrades that could reboot the Pi mid-job:

sudo apt install unattended-upgrades
sudo dpkg-reconfigure unattended-upgrades

Select yes when prompted. By default this applies security updates and never reboots on its own (Unattended-Upgrade::Automatic-Reboot defaults to false), so an updated kernel takes effect only when you choose to reboot manually.
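If you would rather pin the no-reboot behaviour explicitly than rely on the default, a local apt configuration override is one option. This is a sketch under assumptions: no_reboot_override is a hypothetical helper, and the file name 52unattended-upgrades-local is an assumed choice that sorts after the packaged configuration.

```shell
# Print an apt.conf.d override that disables automatic reboots.
no_reboot_override() {
  cat <<'EOF'
// Local override: never reboot automatically after an unattended upgrade.
Unattended-Upgrade::Automatic-Reboot "false";
EOF
}

# On the Pi:
#   no_reboot_override | sudo tee /etc/apt/apt.conf.d/52unattended-upgrades-local > /dev/null
#   sudo unattended-upgrade --dry-run --debug   # shows what would be installed
```

The dry run is worth doing once: it lists the packages that would be upgraded without changing anything.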

Reboot once to apply the config.txt changes:

sudo reboot

After this, WiFi is off, the IP is stable, and SSH sessions no longer pick up WiFi latency.


What this baseline gives you

Nothing visible yet. The Pi boots to a shell, accepts SSH connections, and runs Docker. That is exactly right.

What the next post adds is the first layer of the actual stack: Mosquitto as the MQTT broker and Zigbee2MQTT to pair the Sonoff sensors and translate their radio messages into JSON. Both run as Docker containers, both are configured via files mounted into the container — which means the configuration is version-controlled and reproducible.

The Zigbee dongle (/dev/ttyUSB0) is the one thing that cannot be containerised — it gets passed through to the Zigbee2MQTT container via the devices: key in Compose. That is also the first place where running on arm64 vs x86 matters at the hardware level: the Sonoff ZBDONGLE-E uses a Silicon Labs EFR32 chip with a CP210x USB-to-serial adapter, which the Pi kernel supports natively. No driver installation required.


Next: Post 3 — Zigbee2MQTT + Mosquitto: pairing sensors, understanding MQTT payloads