From dbeda276e14117219b637ddb0adf927b043a0088 Mon Sep 17 00:00:00 2001
From: marthsincemelee
Date: Sun, 10 May 2026 15:29:21 +0200
Subject: [PATCH] docs(home-assistant): design spec for ZBT-2 Thread + OTBR setup

Captures the architecture, operator workflow, and verification for
running the Connect ZBT-2 as an OpenThread Border Router on jupiter
(via nixos-unstable's services.openthread-border-router module), with
HA's otbr + thread integrations driving the Thread network and the
existing matter-server consuming credentials for Matter-over-Thread
device commissioning.

Supersedes the ZHA-direction commit on this branch (e8d09f4), which
will be reverted at the start of implementation.

Co-Authored-By: Claude Opus 4.7
---
 .../2026-05-10-zbt2-thread-otbr-design.md | 241 ++++++++++++++++++
 1 file changed, 241 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md

diff --git a/docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md b/docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md
new file mode 100644
index 0000000..83cd144
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md
@@ -0,0 +1,241 @@
+# ZBT-2 as a Thread Border Router for Home Assistant on `jupiter`
+
+**Date:** 2026-05-10
+**Branch:** `feature/ha-zbt-2-thread`
+**Status:** Design — pending implementation plan
+
+## Context
+
+Home Assistant on `jupiter` already runs natively (`services.home-assistant`) with the Matter integration and `services.matter-server` enabled, but has no Zigbee or Thread radio. The user has acquired a **Home Assistant Connect ZBT-2** (Nabu Casa's Silicon Labs EFR32MG24-based USB Zigbee/Thread radio).
+
+The user wants the dongle running as an **OpenThread Border Router (OTBR)** — Thread only, not Zigbee — so Matter-over-Thread devices can be onboarded through the existing HA Matter integration.
+
+A previous iteration of this work shipped `zha` enablement on the same branch (commit `e8d09f4`).
+That commit will be reverted as part of implementation; this design supersedes it.
+
+## Goals
+
+- Bring up `otbr-agent` on jupiter against the ZBT-2.
+- Have Home Assistant auto-discover the OTBR via mDNS and use its REST API to manage the Thread network.
+- Have `services.matter-server` (already enabled) consume Thread credentials from HA so Matter-over-Thread devices commission through the ZBT-2.
+- One-time, manual firmware flash from Zigbee NCP to OpenThread RCP via `universal-silabs-flasher` (option B from brainstorming — no HA-driven update flow).
+
+## Non-goals
+
+- **Multipan / multiprotocol** (Zigbee + Thread on one radio). Out of scope; the dongle will be Thread-only.
+- **Falling back to ZHA** if Thread misbehaves. Thread-only by choice; if it fails the response is to debug, not to dual-stack.
+- **HA-UI-driven firmware updates.** The HAOS "Silicon Labs Multiprotocol" add-on workflow doesn't translate to native NixOS without faking a supervisor; the user explicitly accepted CLI-only flashing.
+- **Thread network credential backups.** HA owns the dataset; standard HA backup hygiene (separate concern) covers it.
+
+## Architecture
+
+```
+             ┌──────────────────────── jupiter (NixOS) ────────────────────────┐
+             │                                                                 │
+ZBT-2 USB ──►│  /dev/serial/by-id/usb-Nabu_Casa_..._ZBT-2_-...                 │
+             │          │                                                      │
+             │          │  spinel+hdlc+uart, 115200 baud                       │
+             │          ▼                                                      │
+             │  ┌───────────────┐  REST :8081 (loopback) ┌──────────────────┐  │
+             │  │  otbr-agent   │ ◄─────────────────────►│  home-assistant  │  │
+             │  │   (systemd)   │                        │  + matter-server │  │
+             │  │  wpan0 ───────┼── advertises via ─┐    │  extraComponents:│  │
+             │  └───────────────┘  avahi (_meshcop) │    │    matter,       │  │
+             │                                      ▼    │    mobile_app,   │  │
+             │                   enp3s0 (LAN — backbone) │    otbr, thread  │  │
+             │                                           └──────────────────┘  │
+             └──────────────────────────────────────┬──────────────────────────┘
+                                                    │
+                                         home LAN ◄─┘
+                            (Matter-over-Thread devices join here)
+```
+
+### Components
+
+1.
+   **The radio.** ZBT-2, USB-attached, running OpenThread RCP firmware after a one-time flash.
+2. **`otbr-agent`** (systemd). Managed by the unstable `services.openthread-border-router` NixOS module imported via `inputs.nixpkgs-unstable`. Owns `wpan0`, talks Spinel to the dongle, exposes the OTBR REST API on `127.0.0.1:8081`, advertises `_meshcop._udp` over `enp3s0` via avahi.
+3. **Home Assistant** (already running). Gains the `otbr` and `thread` extra components. Discovers OTBR via mDNS, drives the REST API, supplies Thread operational datasets to `matter-server` during Matter commissioning.
+
+### Data flows
+
+- **OTBR ↔ ZBT-2:** Spinel-over-HDLC over UART. Built automatically by the module from `radio.device` as `spinel+hdlc+uart://?uart-baudrate=115200`.
+- **HA ↔ OTBR:** mDNS discovery (`_meshcop._udp`) → REST calls to `127.0.0.1:8081` for network management.
+- **Matter commissioning:** HA scans QR → `matter-server` does BLE commissioning → asks HA for Thread dataset → HA fetches from OTBR → ships to device → device joins Thread mesh through the ZBT-2.
+
+HA never opens the serial port directly; `matter-server` never talks to OTBR directly. HA brokers between them — that's why all four extra components are needed.
+
+## NixOS-side changes
+
+All changes live in **`modules/environments/home-assistant/default.nix`**. No host-level changes in `machines/jupiter/` (the existing profile activation handles that), no flake-level changes (the existing `_module.args.self = self;` wiring is sufficient).
+
+### Edited module sketch
+
+```nix
+{ config, lib, pkgs, self, ... }:
+let
+  cfg = config.my.profiles.home-assistant;
+  hostName = config.networking.hostName;
+in
+{
+  imports = [
+    # OTBR module isn't in 25.11 yet; use unstable's directly. Package
+    # comes from the existing `unstable` overlay.
+    "${self.inputs.nixpkgs-unstable}/nixos/modules/services/home-automation/openthread-border-router.nix"
+  ];
+
+  options.my.profiles.home-assistant.enable = lib.mkEnableOption "Home Automation";
+
+  config = lib.mkIf cfg.enable {
+    services.matter-server.enable = true;
+
+    services.home-assistant = {
+      enable = true;
+      openFirewall = true;
+      extraComponents = [
+        "matter"
+        "mobile_app"
+        "otbr"
+        "thread"
+      ];
+    };
+
+    services.home-assistant.config = {
+      name = "Home - Rechberg";
+      unit_system = "metric";
+      mobile_app = { };
+    };
+
+    services.openthread-border-router = {
+      enable = true;
+      package = pkgs.unstable.openthread-border-router;
+      openFirewall = true;
+      backboneInterfaces = [ "enp3s0" ]; # verify with `ip link` post-deploy
+      radio.device = "/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_-...";
+      # web.enable left default (off) — HA UI is the management surface
+    };
+
+    my.homepage.services = [
+      {
+        group = "Services";
+        name = "Home Assistant";
+        description = "Home automation";
+        href = "http://${hostName}:8123";
+        icon = "si-homeassistant";
+      }
+    ];
+  };
+}
+```
+
+### Reverts of the prior ZHA commit
+
+Drop both lines from commit `e8d09f4`:
+
+- `"zha"` from `extraComponents` (replaced by `"otbr"` + `"thread"`).
+- `users.users.hass.extraGroups = [ "dialout" ];` — `otbr-agent` runs as root and owns the device directly; HA never opens the serial port itself.
+
+Done by `git revert e8d09f4` at the start of implementation, before applying the new diff.
+
+### Decisions captured
+
+- **No `universal-silabs-flasher` in `environment.systemPackages`.** Flashing is a once-or-twice-a-year operation; `nix shell nixpkgs#python313Packages.universal-silabs-flasher` is sufficient when needed and avoids a perma-dep on a tool that's idle most of the time.
+- **No firmware pinning in the flake.** Consistent with option B (CLI-only manual flashing). The user fetches the `.gbl` from at update time.
+- **`backboneInterfaces = [ "enp3s0" ]`** as a starting value (per `machines/jupiter/hardware-configuration.nix:64`). To be verified against `ip link` after first deploy; correctable in a follow-up commit if the actual primary interface differs.
+
+## Operator workflow
+
+All commands the user runs themselves; nothing is SSH'd from the dev session.
+
+### Step 0 — branch hygiene (dev Mac)
+```
+git switch feature/ha-zbt-2-thread # already renamed
+git revert --no-edit e8d09f4 # drops ZHA + dialout commit
+```
+
+### Step 1 — apply the module changes (dev Mac)
+Edit `modules/environments/home-assistant/default.nix` per the sketch above. Leave `` as a placeholder; fill after Step 3.
+
+### Step 2 — eval-only sanity check (dev Mac)
+```
+nix flake check
+```
+or, equivalently,
+```
+nixos-rebuild dry-build --flake .#jupiter
+```
+Catches: bad import path, option typos, version skew between unstable and stable.
+
+### Step 3 — plug ZBT-2 into jupiter (still on stock Zigbee firmware)
+On jupiter:
+```
+ls -l /dev/serial/by-id/
+```
+Then on dev Mac: copy the full `usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_-...` path into `radio.device`, commit on the feature branch.
+
+### Step 4 — flash OpenThread RCP firmware (one-time, on jupiter)
+```
+nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
+  universal-silabs-flasher \
+    --device /dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_-... \
+    flash --firmware ~/ot-rcp-zbt-2-.gbl
+```
+Firmware download: latest ZBT-2 OpenThread RCP `.gbl` from .
+
+OTBR isn't running yet at this point, so there's no contention on the device.
+
+### Step 5 — rebuild (on jupiter)
+```
+sudo nixos-rebuild switch --flake .#jupiter
+```
+Brings up `otbr-agent.service`, opens TCP/8081, loads `otbr` + `thread` integrations in HA.
+
+### Step 6 — confirm HA discovered it
+- `http://jupiter:8123` → Settings → Devices & Services → "Open Thread Border Router" appears as auto-discovered within ~30 s.
+- Click "Configure", form a new Thread network (or import an existing dataset).
+- "Matter" integration page now shows Thread credentials available.
+
+### Step 7 — Matter-over-Thread smoke test
+Pair one Matter-over-Thread device end-to-end via the HA Companion app. Pairing should complete in 30–90 s. If it does, merge `feature/ha-zbt-2-thread` into `master`.
+
+### Future updates
+Same procedure as Step 4, except that `otbr-agent` now holds the serial port: stop `otbr-agent.service`, run the flasher with the new `.gbl`, then start the service again.
+
+## Failure modes
+
+| Symptom | Likely cause | Mitigation |
+|---|---|---|
+| `otbr-agent.service` fails: "Failed to open device" | Dongle unplugged or `radio.device` path stale (e.g. after replacement) | Module sets `Restart = "on-failure"`; check `systemctl status otbr-agent`, re-check `/dev/serial/by-id/`, update path. |
+| OTBR up but HA never discovers it | mDNS not propagating on `enp3s0` (most often: `backboneInterfaces` wrong) | `avahi-browse -r _meshcop._udp` should show one entry. If not: `ip link`, fix `backboneInterfaces`, rebuild. |
+| HA shows OTBR but Matter pairing times out | Thread mesh prefix not routed to LAN, or matter-server can't reach the device's IPv6 ULA | `nft list ruleset` should show OTBR's forwarding rules; `ip -6 route` should include the Thread mesh prefix. |
+| Dongle stuck after a half-completed flash | Flasher interrupted mid-write | Re-run the flash; bootloader stays addressable even if RCP firmware is corrupt. The tool detects bootloader-mode automatically. |
+| `nixos-rebuild` fails: "option `services.openthread-border-router` does not exist" | Unstable module import path wrong / not in scope | Caught by Step 2 (eval-only). Fix before deploy. |
+
+## Verification
+
+### Eval-only (dev Mac, before deploy)
+```
+nix flake check
+nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url
+nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
+```
+Expected: flake check passes; `radio.url` is a `spinel+hdlc+uart://...` string built from the by-id path; `extraComponents` includes `"otbr"` and `"thread"`.
+
+### Service-level (jupiter, after rebuild)
+```
+systemctl status otbr-agent.service
+journalctl -u otbr-agent.service -n 50 --no-pager
+ip link show wpan0
+avahi-browse -r -t _meshcop._udp
+curl -s http://127.0.0.1:8081/node/state
+```
+Expected: service active; `wpan0` exists (DOWN until HA forms a network — correct); one `_meshcop._udp` entry; REST returns a JSON state string.
+
+### Functional (HA UI)
+- "Open Thread Border Router" appears under auto-discovered integrations.
+- Forming a Thread network from the integration UI succeeds.
+- Pairing one Matter-over-Thread device end-to-end succeeds.
+
+## Open questions / risks
+
+- **Unstable module interface.** The `services.openthread-border-router` module is in `nixos-unstable` and its options may change shape before landing in 26.05. If options are renamed, the eval-only step catches it before deploy. Acceptable risk; we can pin the unstable input revision if churn becomes annoying.
+- **Backbone interface name.** `enp3s0` is a best guess from `hardware-configuration.nix:64`'s commented-out line. Definitive answer comes from `ip link` on the actual host. Trivial to correct if wrong.
+- **First-flash chicken-and-egg.** Deferred to `nix shell` rather than baked into the system, because the dongle must be flashed *before* `otbr-agent` claims it. This is documented in Step 4.
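
The first risk above mentions pinning the unstable input if the module keeps changing. One way that could look, as a minimal sketch: it assumes the flake input is named `nixpkgs-unstable` (as implied by `self.inputs.nixpkgs-unstable` in the module sketch) and that it is fetched from GitHub. The revision is a placeholder, not a tested commit.

```nix
# flake.nix (fragment): hypothetical pin, not part of this patch.
{
  inputs = {
    # Assumption: the input is called `nixpkgs-unstable`.
    # Replace the placeholder with a commit at which the OTBR module
    # is known to evaluate; `nix flake update` then leaves it alone.
    nixpkgs-unstable.url = "github:NixOS/nixpkgs/<known-good-commit-sha>";
  };
}
```

After changing the URL, re-running the Step 2 eval-only check would confirm that the pinned revision still provides `services.openthread-border-router` with the option names used in the module sketch.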