docs(home-assistant): design spec for ZBT-2 Thread + OTBR setup

Captures the architecture, operator workflow, and verification for
running the Connect ZBT-2 as an OpenThread Border Router on jupiter
(via nixos-unstable's services.openthread-border-router module),
with HA's otbr + thread integrations driving the Thread network
and the existing matter-server consuming credentials for
Matter-over-Thread device commissioning.

Supersedes the ZHA-direction commit on this branch (e8d09f4),
which will be reverted at the start of implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
marthsincemelee
2026-05-10 15:29:21 +02:00
parent e8d09f40f6
commit dbeda276e1
@@ -0,0 +1,241 @@
# ZBT-2 as a Thread Border Router for Home Assistant on `jupiter`
**Date:** 2026-05-10
**Branch:** `feature/ha-zbt-2-thread`
**Status:** Design — pending implementation plan
## Context
Home Assistant on `jupiter` already runs natively (`services.home-assistant`) with the Matter integration and `services.matter-server` enabled, but has no Zigbee or Thread radio. The user has acquired a **Home Assistant Connect ZBT-2** (Nabu Casa's Silicon Labs EFR32MG24-based USB Zigbee/Thread radio).
The user wants the dongle running as an **OpenThread Border Router (OTBR)** — Thread only, not Zigbee — so Matter-over-Thread devices can be onboarded through the existing HA Matter integration.
A previous iteration of this work shipped `zha` enablement on the same branch (commit `e8d09f4`). That commit will be reverted as part of implementation; this design supersedes it.
## Goals
- Bring up `otbr-agent` on jupiter against the ZBT-2.
- Have Home Assistant auto-discover the OTBR via mDNS and use its REST API to manage the Thread network.
- Have `services.matter-server` (already enabled) consume Thread credentials from HA so Matter-over-Thread devices commission through the ZBT-2.
- One-time, manual firmware flash from Zigbee NCP to OpenThread RCP via `universal-silabs-flasher` (option B from brainstorming — no HA-driven update flow).
## Non-goals
- **Multipan / multiprotocol** (Zigbee + Thread on one radio). Out of scope; the dongle will be Thread-only.
- **Falling back to ZHA** if Thread misbehaves. Thread-only by choice; if it fails the response is to debug, not to dual-stack.
- **HA-UI-driven firmware updates.** The HAOS "Silicon Labs Multiprotocol" add-on workflow doesn't translate to native NixOS without faking a supervisor; the user explicitly accepted CLI-only flashing.
- **Thread network credential backups.** HA owns the dataset; standard HA backup hygiene (separate concern) covers it.
## Architecture
```
             ┌────────────────────────── jupiter (NixOS) ──────────────────────────┐
             │                                                                     │
ZBT-2 USB ──►│  /dev/serial/by-id/usb-Nabu_Casa_..._ZBT-2_<serial>-...             │
             │          │                                                          │
             │          │ spinel+hdlc+uart, 115200 baud                            │
             │          ▼                                                          │
             │  ┌───────────────┐  REST :8081 (loopback)  ┌──────────────────┐     │
             │  │  otbr-agent   │ ◄─────────────────────► │  home-assistant  │     │
             │  │  (systemd)    │                         │  + matter-server │     │
             │  │  wpan0 ───────┼── advertises via ─┐     │  extraComponents:│     │
             │  └───────────────┘  avahi (_meshcop) │     │   matter,        │     │
             │                                      ▼     │   mobile_app,    │     │
             │                     enp3s0 (LAN, backbone) │   otbr, thread   │     │
             │                                            └──────────────────┘     │
             └──────────────────────────────────────┬──────────────────────────────┘
                                         home LAN ◄─┘
                           (Matter-over-Thread devices join here)
```
### Components
1. **The radio.** ZBT-2, USB-attached, running OpenThread RCP firmware after a one-time flash.
2. **`otbr-agent`** (systemd). Managed by the unstable `services.openthread-border-router` NixOS module imported via `inputs.nixpkgs-unstable`. Owns `wpan0`, talks Spinel to the dongle, exposes the OTBR REST API on `127.0.0.1:8081`, advertises `_meshcop._udp` over `enp3s0` via avahi.
3. **Home Assistant** (already running). Gains the `otbr` and `thread` extra components. Discovers OTBR via mDNS, drives the REST API, supplies Thread operational datasets to `matter-server` during Matter commissioning.
### Data flows
- **OTBR ↔ ZBT-2:** Spinel-over-HDLC over UART. Built automatically by the module from `radio.device` as `spinel+hdlc+uart://<device>?uart-baudrate=115200`.
- **HA ↔ OTBR:** mDNS discovery (`_meshcop._udp`) → REST calls to `127.0.0.1:8081` for network management.
- **Matter commissioning:** HA scans QR → `matter-server` does BLE commissioning → asks HA for Thread dataset → HA fetches from OTBR → ships to device → device joins Thread mesh through the ZBT-2.
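HA never opens the serial port directly; `matter-server` never talks to OTBR directly. HA brokers between them, which is why `otbr` and `thread` are required alongside the existing `matter` component. `mobile_app` serves the Companion app used for pairing in Step 7.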
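As a rough illustration of the OTBR ↔ ZBT-2 flow, the radio URL can be reproduced by hand to compare against what the module evaluates to. This is a minimal sketch: `radio_url` is a hypothetical helper, not part of the NixOS module, and it only mirrors the URL shape and baud rate quoted in this spec.

```shell
# Sketch only: radio_url is a hypothetical helper that mirrors the URL shape
# described above (spinel+hdlc+uart://<device>?uart-baudrate=115200).
radio_url() {
  # $1: serial device path; $2: optional baud rate (this spec uses 115200)
  printf 'spinel+hdlc+uart://%s?uart-baudrate=%s\n' "$1" "${2:-115200}"
}

# Example call (the device path here is illustrative):
#   radio_url /dev/ttyACM0
```

Because the device is an absolute path, the resulting URL has three slashes after the scheme; that is expected, not a typo.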
## NixOS-side changes
All changes live in **`modules/environments/home-assistant/default.nix`**. No host-level changes in `machines/jupiter/` (the existing profile activation handles that), no flake-level changes (the existing `_module.args.self = self;` wiring is sufficient).
### Edited module sketch
```nix
{ config, lib, pkgs, self, ... }:
let
cfg = config.my.profiles.home-assistant;
hostName = config.networking.hostName;
in
{
imports = [
# OTBR module isn't in 25.11 yet; use unstable's directly. Package
# comes from the existing `unstable` overlay.
"${self.inputs.nixpkgs-unstable}/nixos/modules/services/home-automation/openthread-border-router.nix"
];
options.my.profiles.home-assistant.enable = lib.mkEnableOption "Home Automation";
config = lib.mkIf cfg.enable {
services.matter-server.enable = true;
services.home-assistant = {
enable = true;
openFirewall = true;
extraComponents = [
"matter"
"mobile_app"
"otbr"
"thread"
];
};
services.home-assistant.config = {
name = "Home - Rechberg";
unit_system = "metric";
mobile_app = { };
};
services.openthread-border-router = {
enable = true;
package = pkgs.unstable.openthread-border-router;
openFirewall = true;
backboneInterfaces = [ "enp3s0" ]; # verify with `ip link` post-deploy
radio.device = "/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-...";
# web.enable left default (off) — HA UI is the management surface
};
my.homepage.services = [
{
group = "Services";
name = "Home Assistant";
description = "Home automation";
href = "http://${hostName}:8123";
icon = "si-homeassistant";
}
];
};
}
```
### Reverts of the prior ZHA commit
Drop both lines from commit `e8d09f4`:
- `"zha"` from `extraComponents` (replaced by `"otbr"` + `"thread"`).
- `users.users.hass.extraGroups = [ "dialout" ];` (`otbr-agent` runs as root and owns the device directly; HA never opens the serial port itself).
Done by `git revert e8d09f4` at the start of implementation, before applying the new diff.
### Decisions captured
- **No `universal-silabs-flasher` in `environment.systemPackages`.** Flashing is a once-or-twice-a-year operation; `nix shell nixpkgs#python313Packages.universal-silabs-flasher` is sufficient when needed and avoids a perma-dep on a tool that's idle most of the time.
- **No firmware pinning in the flake.** Consistent with option B (CLI-only manual flashing). The user fetches the `.gbl` from <https://github.com/NabuCasa/silabs-firmware-builder/releases> at update time.
- **`backboneInterfaces = [ "enp3s0" ]`** as a starting value (per `machines/jupiter/hardware-configuration.nix:64`). To be verified against `ip link` after first deploy; correctable in a follow-up commit if the actual primary interface differs.
## Operator workflow
The user runs every command themselves; nothing is SSH'd from the dev session.
### Step 0 — branch hygiene (dev Mac)
```
git switch feature/ha-zbt-2-thread # already renamed
git revert --no-edit e8d09f4 # drops ZHA + dialout commit
```
### Step 1 — apply the module changes (dev Mac)
Edit `modules/environments/home-assistant/default.nix` per the sketch above. Leave `<serial>` as a placeholder; fill it in after Step 3.
### Step 2 — eval-only sanity check (dev Mac)
```
nix flake check
```
or, scoped to the `jupiter` configuration alone,
```
nixos-rebuild dry-build --flake .#jupiter
```
Catches: bad import path, option typos, version skew between unstable and stable.
### Step 3 — plug ZBT-2 into jupiter (still on stock Zigbee firmware)
On jupiter:
```
ls -l /dev/serial/by-id/
```
Then on dev Mac: copy the full `usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-...` path into `radio.device`, commit on the feature branch.
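Picking the right entry out of the listing can be scripted. The following is a sketch under the assumption that the dongle's by-id name starts with the prefix quoted in this spec; `find_zbt2` is a hypothetical helper name.

```shell
# Sketch only: find_zbt2 is a hypothetical helper. The name prefix is the one
# quoted in this spec; the directory defaults to /dev/serial/by-id.
find_zbt2() {
  dir="${1:-/dev/serial/by-id}"
  for path in "$dir"/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_*; do
    # An unmatched glob stays literal, so check existence before printing.
    if [ -e "$path" ]; then
      printf '%s\n' "$path"
    fi
  done
}

# On jupiter:
#   find_zbt2
```

Exactly one line of output is the expected case; no output means the dongle is not enumerated, and the path should not be guessed.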
### Step 4 — flash OpenThread RCP firmware (one-time, on jupiter)
```
nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
universal-silabs-flasher \
--device /dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-... \
flash --firmware ~/ot-rcp-zbt-2-<version>.gbl
```
Firmware download: latest ZBT-2 OpenThread RCP `.gbl` from <https://github.com/NabuCasa/silabs-firmware-builder/releases>.
OTBR isn't running yet at this point, so there's no contention on the device.
### Step 5 — rebuild (on jupiter)
```
sudo nixos-rebuild switch --flake .#jupiter
```
Brings up `otbr-agent.service`, opens TCP/8081, loads `otbr` + `thread` integrations in HA.
### Step 6 — confirm HA discovered it
- `http://jupiter:8123` → Settings → Devices & Services → "Open Thread Border Router" appears as auto-discovered within ~30 s.
- Click "Configure", form a new Thread network (or import an existing dataset).
- "Matter" integration page now shows Thread credentials available.
### Step 7 — Matter-over-Thread smoke test
Pair one Matter-over-Thread device end-to-end via the HA Companion app. Pairing should complete in 30 to 90 s. If it does, merge `feature/ha-zbt-2-thread` into `master`.
### Future updates
Same flasher invocation as Step 4, but because OTBR now holds the device, stop `otbr-agent.service` first, run the flasher with the new `.gbl`, then start the service again.
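The stop, flash, start sequence could be wrapped as below. This is a sketch, not a tested tool: `reflash_zbt2`, `run`, and the `DRY_RUN` switch are hypothetical names, and the flasher arguments are copied from Step 4.

```shell
# Sketch only: run, reflash_zbt2 and DRY_RUN are hypothetical names.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # print the command instead of executing it
  else
    "$@"
  fi
}

reflash_zbt2() {
  # $1: by-id device path; $2: path to the .gbl firmware image
  # The && chain leaves otbr-agent stopped if flashing fails, so the flash
  # can be retried without the service contending for the device.
  run sudo systemctl stop otbr-agent.service &&
    run universal-silabs-flasher --device "$1" flash --firmware "$2" &&
    run sudo systemctl start otbr-agent.service
}

# Preview the commands without touching the device:
#   DRY_RUN=1 reflash_zbt2 /dev/serial/by-id/<device> ~/<firmware>.gbl
```

As in Step 4, the function would be called from inside `nix shell nixpkgs#python313Packages.universal-silabs-flasher` so that the flasher is on `PATH`.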
## Failure modes
| Symptom | Likely cause | Mitigation |
|---|---|---|
| `otbr-agent.service` fails: "Failed to open device" | Dongle unplugged or `radio.device` path stale (e.g. after replacement) | Module sets `Restart = "on-failure"`; check `systemctl status otbr-agent`, re-check `/dev/serial/by-id/`, update path. |
| OTBR up but HA never discovers it | mDNS not propagating on `enp3s0` (most often: `backboneInterfaces` wrong) | `avahi-browse -r _meshcop._udp` should show one entry. If not: `ip link`, fix `backboneInterfaces`, rebuild. |
| HA shows OTBR but Matter pairing times out | Thread mesh prefix not routed to LAN, or matter-server can't reach the device's IPv6 ULA | `nft list ruleset` should show OTBR's forwarding rules; `ip -6 route` should include the Thread mesh prefix. |
| Dongle stuck after a half-completed flash | Flasher interrupted mid-write | Re-run the flash; bootloader stays addressable even if RCP firmware is corrupt. The tool detects bootloader-mode automatically. |
| `nixos-rebuild` fails: "option `services.openthread-border-router` does not exist" | Unstable module import path wrong / not in scope | Caught by Step 2 (eval-only). Fix before deploy. |
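For the discovery row, the "one entry" check can be made mechanical. A sketch, assuming `avahi-browse`'s parsable mode (`-p`), where resolved records start with `=` and carry the service name in the fourth `;`-separated field; `count_meshcop` is a hypothetical helper.

```shell
# Sketch only: count_meshcop is a hypothetical helper. It reads parsable
# avahi-browse output on stdin and counts distinct resolved service names,
# so the same border router seen over IPv4 and IPv6 is counted once.
count_meshcop() {
  awk -F';' '
    $1 == "=" { names[$4] = 1 }
    END { n = 0; for (k in names) n++; print n }
  '
}

# On jupiter:
#   avahi-browse -r -t -p _meshcop._udp | count_meshcop
```

A result of `0` points at the mDNS/backbone cause in the table. A result above `1` suggests another border router is already advertising on the LAN.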
## Verification
### Eval-only (dev Mac, before deploy)
```
nix flake check
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: flake check passes; `radio.url` is a `spinel+hdlc+uart://...` string built from the by-id path; `extraComponents` includes `"otbr"` and `"thread"`.
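The `extraComponents` expectation can be checked without reading the JSON by eye. A minimal sketch: `missing_components` is a hypothetical helper that does a plain substring match on the quoted names, which is adequate for a flat array of strings but is not a JSON parser.

```shell
# Sketch only: missing_components is a hypothetical helper. It reads the JSON
# array on stdin and prints every required component name that is absent.
missing_components() {
  json=$(cat)
  for name in "$@"; do
    case "$json" in
      *"\"$name\""*) ;;              # quoted name found in the array
      *) printf '%s\n' "$name" ;;    # report the missing component
    esac
  done
}

# On the dev Mac:
#   nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents \
#     | missing_components otbr thread
```

Empty output means both components are present.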
### Service-level (jupiter, after rebuild)
```
systemctl status otbr-agent.service
journalctl -u otbr-agent.service -n 50 --no-pager
ip link show wpan0
avahi-browse -r -t _meshcop._udp
curl -s http://127.0.0.1:8081/node/state
```
Expected: service active; `wpan0` exists (DOWN until HA forms a network — correct); one `_meshcop._udp` entry; REST returns a JSON state string.
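The REST state string can be turned into a pass/fail signal. This sketch assumes the endpoint returns one of the OpenThread device roles (`disabled`, `detached`, `child`, `router`, `leader`) as a bare JSON string; `thread_attached` is a hypothetical helper.

```shell
# Sketch only: thread_attached is a hypothetical helper. It assumes the body
# of /node/state is a JSON string holding an OpenThread device role.
thread_attached() {
  # Strip the JSON quotes and any whitespace around the role name.
  state=$(printf '%s' "$1" | tr -d '"[:space:]')
  case "$state" in
    child|router|leader) return 0 ;;   # attached to a Thread network
    *) return 1 ;;                     # disabled, detached, or unexpected
  esac
}

# On jupiter:
#   thread_attached "$(curl -s http://127.0.0.1:8081/node/state)" && echo attached
```

Directly after the rebuild, before HA has formed a network, a non-attached result is consistent with `wpan0` still being DOWN.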
### Functional (HA UI)
- "Open Thread Border Router" appears under auto-discovered integrations.
- Forming a Thread network from the integration UI succeeds.
- Pairing one Matter-over-Thread device end-to-end succeeds.
## Open questions / risks
- **Unstable module interface.** The `services.openthread-border-router` module exists only in `nixos-unstable`, and its options may change before it lands in 26.05. If options are renamed, the eval-only step catches it before deploy. Acceptable risk; we can pin the unstable input revision if churn becomes annoying.
- **Backbone interface name.** `enp3s0` is a best guess from `hardware-configuration.nix:64`'s commented-out line. Definitive answer comes from `ip link` on the actual host. Trivial to correct if wrong.
- **First-flash chicken-and-egg.** The flasher is run from an ad-hoc `nix shell` rather than baked into the system, because the dongle must be flashed *before* `otbr-agent` claims it. Step 4 documents this.