docs(home-assistant): design spec for ZBT-2 Thread + OTBR setup
Captures the architecture, operator workflow, and verification for
running the Connect ZBT-2 as an OpenThread Border Router on jupiter
(via nixos-unstable's services.openthread-border-router module),
with HA's otbr + thread integrations driving the Thread network
and the existing matter-server consuming credentials for
Matter-over-Thread device commissioning.
Supersedes the ZHA-direction commit on this branch (e8d09f4),
which will be reverted at the start of implementation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# ZBT-2 as a Thread Border Router for Home Assistant on `jupiter`

**Date:** 2026-05-10
**Branch:** `feature/ha-zbt-2-thread`
**Status:** Design, pending implementation plan

## Context

Home Assistant on `jupiter` already runs natively (`services.home-assistant`) with the Matter integration and `services.matter-server` enabled, but has no Zigbee or Thread radio. The user has acquired a **Home Assistant Connect ZBT-2** (Nabu Casa's Silicon Labs EFR32MG24-based USB Zigbee/Thread radio).

The user wants the dongle running as an **OpenThread Border Router (OTBR)**, Thread only and not Zigbee, so Matter-over-Thread devices can be onboarded through the existing HA Matter integration.

A previous iteration of this work shipped `zha` enablement on the same branch (commit `e8d09f4`). That commit will be reverted as part of implementation; this design supersedes it.

## Goals

- Bring up `otbr-agent` on jupiter against the ZBT-2.
- Have Home Assistant auto-discover the OTBR via mDNS and use its REST API to manage the Thread network.
- Have `services.matter-server` (already enabled) consume Thread credentials from HA so Matter-over-Thread devices commission through the ZBT-2.
- One-time, manual firmware flash from Zigbee NCP to OpenThread RCP via `universal-silabs-flasher` (option B from brainstorming: no HA-driven update flow).

## Non-goals

- **Multipan / multiprotocol** (Zigbee + Thread on one radio). Out of scope; the dongle will be Thread-only.
- **Falling back to ZHA** if Thread misbehaves. Thread-only by choice; if it fails the response is to debug, not to dual-stack.
- **HA-UI-driven firmware updates.** The HAOS "Silicon Labs Multiprotocol" add-on workflow doesn't translate to native NixOS without faking a supervisor; the user explicitly accepted CLI-only flashing.
- **Thread network credential backups.** HA owns the dataset; standard HA backup hygiene (separate concern) covers it.

## Architecture

```
             ┌────────────────────────── jupiter (NixOS) ──────────────────────────┐
             │                                                                     │
ZBT-2 USB ──►│  /dev/serial/by-id/usb-Nabu_Casa_..._ZBT-2_<serial>-...             │
             │          │                                                          │
             │          │ spinel+hdlc+uart, 115200 baud                            │
             │          ▼                                                          │
             │  ┌───────────────┐  REST :8081 (loopback)     ┌──────────────────┐  │
             │  │  otbr-agent   │ ◄────────────────────────► │  home-assistant  │  │
             │  │  (systemd)    │                            │  + matter-server │  │
             │  │  wpan0 ───────┼── advertises via ─┐        │  extraComponents:│  │
             │  └───────────────┘  avahi (_meshcop) │        │    matter,       │  │
             │                                      ▼        │    mobile_app,   │  │
             │                        enp3s0 (LAN backbone)  │    otbr, thread  │  │
             │                                               └──────────────────┘  │
             └──────────────────────────────────┬──────────────────────────────────┘
                                                │
                                     home LAN ◄─┘
                             (Matter-over-Thread devices join here)
```

### Components

1. **The radio.** ZBT-2, USB-attached, running OpenThread RCP firmware after a one-time flash.
2. **`otbr-agent`** (systemd). Managed by the unstable `services.openthread-border-router` NixOS module imported via `inputs.nixpkgs-unstable`. Owns `wpan0`, talks Spinel to the dongle, exposes the OTBR REST API on `127.0.0.1:8081`, advertises `_meshcop._udp` over `enp3s0` via avahi.
3. **Home Assistant** (already running). Gains the `otbr` and `thread` extra components. Discovers OTBR via mDNS, drives the REST API, supplies Thread operational datasets to `matter-server` during Matter commissioning.

### Data flows

- **OTBR ↔ ZBT-2:** Spinel-over-HDLC over UART. Built automatically by the module from `radio.device` as `spinel+hdlc+uart://<device>?uart-baudrate=115200`.
- **HA ↔ OTBR:** mDNS discovery (`_meshcop._udp`) → REST calls to `127.0.0.1:8081` for network management.
- **Matter commissioning:** HA scans QR → `matter-server` does BLE commissioning → asks HA for Thread dataset → HA fetches from OTBR → ships to device → device joins Thread mesh through the ZBT-2.

HA never opens the serial port directly, and `matter-server` never talks to OTBR directly. HA brokers between them, which is why all four extra components (`matter`, `mobile_app`, `otbr`, `thread`) are needed.
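
The radio URL from the OTBR ↔ ZBT-2 flow can be illustrated with a small shell sketch. The function name `radio_url` is hypothetical and exists only for illustration; on the real system the string is generated by the NixOS module, not by a script. The sketch assumes the format quoted above and the 115200 baud default.

```shell
# Hypothetical helper: reproduces the URL format quoted in the data-flow list.
# $1 = serial device path, $2 = baud rate (defaults to 115200).
radio_url() {
  local device="$1"
  local baud="${2:-115200}"
  printf 'spinel+hdlc+uart://%s?uart-baudrate=%s\n' "$device" "$baud"
}
```

Calling `radio_url /dev/ttyACM0` prints the URL for that device; an absolute device path yields three slashes after the scheme, which is the shape to expect when comparing against `radio.url` in the eval output.
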
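
## NixOS-side changes

All changes live in **`modules/environments/home-assistant/default.nix`**. No host-level changes in `machines/jupiter/` (the existing profile activation handles that). No flake-level changes are expected either, with one caveat: the sketch below uses `self` inside `imports`, and NixOS resolves `imports` before `_module.args` is available, so the existing `_module.args.self = self;` wiring is only sufficient if `self` is also passed through `specialArgs`. If it is not, Step 2 fails with an infinite-recursion error and a small flake-level change is needed.

### Edited module sketch
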
```nix
{ config, lib, pkgs, self, ... }:
let
  cfg = config.my.profiles.home-assistant;
  hostName = config.networking.hostName;
in
{
  imports = [
    # OTBR module isn't in 25.11 yet; use unstable's directly. Package
    # comes from the existing `unstable` overlay.
    "${self.inputs.nixpkgs-unstable}/nixos/modules/services/home-automation/openthread-border-router.nix"
  ];

  options.my.profiles.home-assistant.enable = lib.mkEnableOption "Home Automation";

  config = lib.mkIf cfg.enable {
    services.matter-server.enable = true;

    services.home-assistant = {
      enable = true;
      openFirewall = true;
      extraComponents = [
        "matter"
        "mobile_app"
        "otbr"
        "thread"
      ];
    };

    services.home-assistant.config = {
      name = "Home - Rechberg";
      unit_system = "metric";
      mobile_app = { };
    };

    services.openthread-border-router = {
      enable = true;
      package = pkgs.unstable.openthread-border-router;
      openFirewall = true;
      backboneInterfaces = [ "enp3s0" ]; # verify with `ip link` post-deploy
      radio.device = "/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-...";
      # web.enable left default (off); HA UI is the management surface
    };

    my.homepage.services = [
      {
        group = "Services";
        name = "Home Assistant";
        description = "Home automation";
        href = "http://${hostName}:8123";
        icon = "si-homeassistant";
      }
    ];
  };
}
```

### Reverts of the prior ZHA commit

Drop both lines from commit `e8d09f4`:

- `"zha"` from `extraComponents` (replaced by `"otbr"` + `"thread"`).
- `users.users.hass.extraGroups = [ "dialout" ];`, because `otbr-agent` runs as root and owns the device directly; HA never opens the serial port itself.

Done by `git revert e8d09f4` at the start of implementation, before applying the new diff.

### Decisions captured

- **No `universal-silabs-flasher` in `environment.systemPackages`.** Flashing is a once-or-twice-a-year operation; `nix shell nixpkgs#python313Packages.universal-silabs-flasher` is sufficient when needed and avoids a perma-dep on a tool that's idle most of the time.
- **No firmware pinning in the flake.** Consistent with option B (CLI-only manual flashing). The user fetches the `.gbl` from <https://github.com/NabuCasa/silabs-firmware-builder/releases> at update time.
- **`backboneInterfaces = [ "enp3s0" ]`** as a starting value (per `machines/jupiter/hardware-configuration.nix:64`). To be verified against `ip link` after first deploy; correctable in a follow-up commit if the actual primary interface differs.

## Operator workflow

The user runs every command themselves; nothing is SSH'd from the dev session.

### Step 0: branch hygiene (dev Mac)

```
git switch feature/ha-zbt-2-thread   # already renamed
git revert --no-edit e8d09f4         # drops ZHA + dialout commit
```

### Step 1: apply the module changes (dev Mac)

Edit `modules/environments/home-assistant/default.nix` per the sketch above. Leave `<serial>` as a placeholder; fill after Step 3.

### Step 2: eval-only sanity check (dev Mac)

```
nix flake check
```

or, to evaluate and dry-build only the `jupiter` system,

```
nixos-rebuild dry-build --flake .#jupiter
```

Catches: bad import path, option typos, version skew between unstable and stable.

### Step 3: plug ZBT-2 into jupiter (still on stock Zigbee firmware)

On jupiter:

```
ls -l /dev/serial/by-id/
```

Then on dev Mac: copy the full `usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-...` path into `radio.device`, commit on the feature branch.
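
When several serial adapters are attached, picking the right entry out of the `ls` output by eye is error-prone. The following is a minimal sketch of a filter for Step 3; `find_zbt2` is a hypothetical helper name, and the glob assumes the `usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_` prefix shown above is what the dongle actually reports.

```shell
# Hypothetical helper: print every by-id entry that looks like a ZBT-2.
# $1 = directory to scan (defaults to /dev/serial/by-id).
# Returns 0 if at least one entry was found, 1 otherwise.
find_zbt2() {
  local dir="${1:-/dev/serial/by-id}"
  local found=1
  local path
  for path in "$dir"/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_*; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    [ -e "$path" ] || [ -L "$path" ] || continue
    printf '%s\n' "$path"
    found=0
  done
  return "$found"
}
```

Run without arguments on jupiter, it prints the full path to paste into `radio.device`, or exits non-zero if the dongle is not visible.
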

### Step 4: flash OpenThread RCP firmware (one-time, on jupiter)

```
nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
  universal-silabs-flasher \
    --device /dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-... \
    flash --firmware ~/ot-rcp-zbt-2-<version>.gbl
```

Firmware download: latest ZBT-2 OpenThread RCP `.gbl` from <https://github.com/NabuCasa/silabs-firmware-builder/releases>.

OTBR isn't running yet at this point, so there's no contention on the device.

### Step 5: rebuild (on jupiter)

```
sudo nixos-rebuild switch --flake .#jupiter
```

Brings up `otbr-agent.service`, opens TCP/8081, loads `otbr` + `thread` integrations in HA.

### Step 6: confirm HA discovered it

- `http://jupiter:8123` → Settings → Devices & Services → "Open Thread Border Router" appears as auto-discovered within ~30 s.
- Click "Configure", form a new Thread network (or import an existing dataset).
- "Matter" integration page now shows Thread credentials available.

### Step 7: Matter-over-Thread smoke test

Pair one Matter-over-Thread device end-to-end via the HA Companion app. Pairing should complete in 30–90 s. If it does, merge `feature/ha-zbt-2-thread` into `master`.

### Future updates

Same flasher invocation as Step 4, but wrapped in a service stop and start because `otbr-agent` now holds the device: stop `otbr-agent.service`, run the flasher with the new `.gbl`, start the service.
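
The stop, flash, start sequence for later updates could be wrapped as below. This is a sketch under assumptions: `run` and `update_rcp_firmware` are hypothetical names, the flasher arguments are copied from Step 4, and `universal-silabs-flasher` is assumed to be on `PATH` (for example inside the `nix shell` from Step 4). Setting `DRY_RUN=1` only prints the commands.

```shell
# Hypothetical wrapper: execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# update_rcp_firmware DEVICE FIRMWARE
# Stops otbr-agent so it releases the serial device, flashes, then restarts.
# The && chain means the service is NOT restarted if the flash fails.
update_rcp_firmware() {
  local device="$1"
  local firmware="$2"
  run sudo systemctl stop otbr-agent.service &&
    run universal-silabs-flasher --device "$device" flash --firmware "$firmware" &&
    run sudo systemctl start otbr-agent.service
}
```

Leaving the service stopped after a failed flash is a deliberate choice in this sketch: restarting `otbr-agent` against a half-written radio would only add noise to the journal while the flash is re-run.
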

## Failure modes

| Symptom | Likely cause | Mitigation |
|---|---|---|
| `otbr-agent.service` fails: "Failed to open device" | Dongle unplugged or `radio.device` path stale (e.g. after replacement) | Module sets `Restart = "on-failure"`; check `systemctl status otbr-agent`, re-check `/dev/serial/by-id/`, update path. |
| OTBR up but HA never discovers it | mDNS not propagating on `enp3s0` (most often: `backboneInterfaces` wrong) | `avahi-browse -r _meshcop._udp` should show one entry. If not: `ip link`, fix `backboneInterfaces`, rebuild. |
| HA shows OTBR but Matter pairing times out | Thread mesh prefix not routed to LAN, or matter-server can't reach the device's IPv6 ULA | `nft list ruleset` should show OTBR's forwarding rules; `ip -6 route` should include the Thread mesh prefix. |
| Dongle stuck after a half-completed flash | Flasher interrupted mid-write | Re-run the flash; bootloader stays addressable even if RCP firmware is corrupt. The tool detects bootloader-mode automatically. |
| `nixos-rebuild` fails: "option `services.openthread-border-router` does not exist" | Unstable module import path wrong / not in scope | Caught by Step 2 (eval-only). Fix before deploy. |

## Verification

### Eval-only (dev Mac, before deploy)

```
nix flake check
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```

Expected: flake check passes; `radio.url` is a `spinel+hdlc+uart://...` string built from the by-id path; `extraComponents` includes `"otbr"` and `"thread"`.
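
The `extraComponents` expectation can be checked mechanically rather than by reading the JSON. A minimal sketch follows; `has_component` is a hypothetical helper, and it assumes `nix eval --json` prints the list as a compact JSON array of strings.

```shell
# Hypothetical helper: succeed if the JSON text in $1 contains "$2" as a
# quoted element. Plain pattern matching, so no JSON parser is required.
has_component() {
  case "$1" in
    *"\"$2\""*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, capture the output of the third `nix eval` command above in a variable and call `has_component "$json" otbr && has_component "$json" thread`. Matching on the surrounding quotes keeps a partial name from passing.
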

### Service-level (jupiter, after rebuild)

```
systemctl status otbr-agent.service
journalctl -u otbr-agent.service -n 50 --no-pager
ip link show wpan0
avahi-browse -r -t _meshcop._udp
curl -s http://127.0.0.1:8081/node/state
```

Expected: service active; `wpan0` exists (DOWN until HA forms a network, which is expected); one `_meshcop._udp` entry; REST returns a JSON state string.
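
The state string returned by the REST call can be turned into a pass/fail check. The sketch below assumes the body is a bare JSON string naming an OpenThread device role (`disabled`, `detached`, `child`, `router`, or `leader`); `thread_attached` is a hypothetical helper name, and the exact response shape should be confirmed against the running agent.

```shell
# Hypothetical helper: succeed if the /node/state body in $1 names an
# attached role. Quotes and whitespace around the value are stripped first.
thread_attached() {
  local state
  state=$(printf '%s' "$1" | tr -d '" \n\r')
  case "$state" in
    child|router|leader) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical call would be `thread_attached "$(curl -s http://127.0.0.1:8081/node/state)"`. Before HA forms a network the check is expected to fail, which matches `wpan0` being DOWN at that stage.
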

### Functional (HA UI)

- "Open Thread Border Router" appears under auto-discovered integrations.
- Forming a Thread network from the integration UI succeeds.
- Pairing one Matter-over-Thread device end-to-end succeeds.

## Open questions / risks

- **Unstable module interface.** The `services.openthread-border-router` module is in `nixos-unstable` and its options may change shape before landing in 26.05. If options are renamed, the eval-only step catches it before deploy. Acceptable risk; we can pin the unstable input revision if churn becomes annoying.
- **Backbone interface name.** `enp3s0` is a best guess from `hardware-configuration.nix:64`'s commented-out line. Definitive answer comes from `ip link` on the actual host. Trivial to correct if wrong.
- **First-flash chicken-and-egg.** The flasher is run from `nix shell` rather than baked into the system, because the dongle must be flashed *before* `otbr-agent` claims it. This is documented in Step 4.