docs(home-assistant): implementation plan for ZBT-2 Thread + OTBR

Task-by-task plan covering: revert of prior ZHA commit, unstable
OTBR module import, OTBR enablement against the ZBT-2, firmware
flash via universal-silabs-flasher, rebuild on jupiter, and
end-to-end smoke test through the HA UI.

Designed for execution via superpowers:subagent-driven-development
or superpowers:executing-plans, with operator handoffs marked
explicitly (per the 'no SSH' workflow rule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
marthsincemelee
2026-05-10 15:36:12 +02:00
parent dbeda276e1
commit 6d12940205
@@ -0,0 +1,555 @@
# ZBT-2 Thread + OTBR Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Run the Home Assistant Connect ZBT-2 as an OpenThread Border Router on `jupiter`, fully integrated with the existing native `services.home-assistant` + `services.matter-server` stack so Matter-over-Thread devices commission through the dongle.
**Architecture:** Single NixOS module file (`modules/environments/home-assistant/default.nix`) is edited to import the `services.openthread-border-router` module from `nixos-unstable` (not yet in 25.11 stable), enable it against the ZBT-2's `/dev/serial/by-id/...` path, and add HA's `otbr` + `thread` extra components. The previous ZHA-direction commit on this branch is reverted first. The dongle is one-time-flashed from Zigbee NCP firmware to OpenThread RCP firmware via `universal-silabs-flasher` outside the NixOS lifecycle (per design decision: option B, CLI-only).
**Tech Stack:** Nix flakes (flake-parts), NixOS 25.11 stable + nixos-unstable, `services.openthread-border-router`, `services.home-assistant`, `services.matter-server`, `python313Packages.universal-silabs-flasher`.
**Spec:** [`docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md`](../specs/2026-05-10-zbt2-thread-otbr-design.md) — read this before starting.
**User feedback rules in force:**
- Never commit to `master`; this branch is `feature/ha-zbt-2-thread`. Final merge happens at the end via PR or operator-driven merge.
- Do not SSH to `jupiter`. All commands targeting jupiter are operator handoffs — present the command, the user runs it and pastes output back.
---
## File Map
| Action | File | Responsibility |
|--------|------|----------------|
| Modify | `modules/environments/home-assistant/default.nix` | Import unstable OTBR module; enable OTBR for the ZBT-2; add `otbr` + `thread` HA components |
| Create (auto) | _(no new files)_ | All work fits in the one module |
The `git revert` of `e8d09f4` automatically un-modifies the same file (drops `"zha"` and the `dialout` line). No host-level (`machines/jupiter/`) changes; no flake-level changes (the existing `_module.args.self = self;` in `machines/configuration.nix:21` already exposes `self.inputs.nixpkgs-unstable` to every module).
---
## Validation Approach (instead of unit tests)
This is a NixOS configuration change; there's no test framework. We use `nix eval` against `nixosConfigurations.jupiter.config.*` as the equivalent of unit tests — assert option resolution **before** the change (red), then **after** the change (green). Functional / smoke tests happen post-`nixos-rebuild` on jupiter via systemctl, mDNS, and the HA UI.
All `nix eval` commands run on the dev Mac. All `systemctl` / `journalctl` / `nixos-rebuild` commands run on jupiter (operator handoff).
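The red/green pattern described above can be wrapped in a small comparison helper. This is a minimal sketch: `check_eq` is a hypothetical name, not something that exists in the repo, and the literal value in the example stands in for real `nix eval --json` output.

```bash
# Hypothetical helper (not part of the repo): compare an expected value with
# an actual one and report PASS/FAIL, returning non-zero on mismatch.
check_eq() {
  local label="$1" expected="$2" actual="$3"
  if [ "$expected" = "$actual" ]; then
    printf 'PASS %s\n' "$label"
  else
    printf 'FAIL %s: expected %s, got %s\n' "$label" "$expected" "$actual"
    return 1
  fi
}

# Example with a literal value standing in for real `nix eval` output:
check_eq "extraComponents" '["matter","mobile_app"]' '["matter","mobile_app"]'
```

On the dev Mac the third argument would be `"$(nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents)"`. A plain string comparison is enough here because the expected values in this plan are written in the same compact JSON form.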
---
### Task 1: Revert the prior ZHA commit
**Files:**
- Modify: `modules/environments/home-assistant/default.nix` (via `git revert`)
- [ ] **Step 1: Verify pre-state**
On dev Mac, in the repo root:
```bash
git log --oneline -3
```
Expected: `dbeda27` (design spec) on top of `e8d09f4` (the ZHA commit) on top of `098e632`.
Also confirm current `extraComponents` includes `"zha"`:
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app","zha"]`
- [ ] **Step 2: Revert**
```bash
git revert --no-edit e8d09f4
```
Expected: revert commit created cleanly (no merge conflicts), single file changed.
- [ ] **Step 3: Verify post-state**
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app"]` (`zha` is gone).
```bash
nix eval --json .#nixosConfigurations.jupiter.config.users.users.hass.extraGroups
```
Expected: `[]` (`dialout` is gone).
```bash
git log --oneline -4
```
Expected: revert commit on top of `dbeda27` on top of `e8d09f4`.
(No explicit `git commit` step — `git revert` produced its own commit.)
---
### Task 2: Wire the unstable OTBR module import (still disabled)
This task gets the module into scope so options become available, but leaves `services.openthread-border-router.enable = false` (the default). The point is to confirm the import path works before adding device-specific config.
**Files:**
- Modify: `modules/environments/home-assistant/default.nix`
- [ ] **Step 1: Write the failing eval check**
On dev Mac:
```bash
nix eval --json .#nixosConfigurations.jupiter.options.services.openthread-border-router.enable.description 2>&1 | head -3
```
Expected: error containing `attribute 'openthread-border-router' missing` or similar — the option doesn't exist yet because the module isn't imported.
- [ ] **Step 2: Add `self` to the module's argument list and add the `imports` block**
Current header (`modules/environments/home-assistant/default.nix` lines 1-11):
```nix
# manages home automations
{
  config,
  lib,
  pkgs,
  ...
}:
let
  cfg = config.my.profiles.home-assistant;
  hostName = config.networking.hostName;
in
```
Replace lines 1-11 with:
```nix
# manages home automations
{
  config,
  lib,
  pkgs,
  self,
  ...
}:
let
  cfg = config.my.profiles.home-assistant;
  hostName = config.networking.hostName;
in
```
Then, immediately after the opening brace of the module body that follows `in` (i.e. at the top of the attribute set body, before `options.my.profiles.home-assistant`), add:
```nix
imports = [
  # services.openthread-border-router isn't in nixos-25.11; pull from
  # nixpkgs-unstable. Package comes from the existing unstable overlay.
  "${self.inputs.nixpkgs-unstable}/nixos/modules/services/home-automation/openthread-border-router.nix"
];
```
- [ ] **Step 3: Re-run the eval check**
```bash
nix eval --json .#nixosConfigurations.jupiter.options.services.openthread-border-router.enable.description 2>&1 | head -3
```
Expected: a JSON string describing the option (e.g. `"Whether to enable the OpenThread Border Router."`).
- [ ] **Step 4: Verify the service is currently disabled**
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.enable
```
Expected: `false`.
- [ ] **Step 5: Verify whole config still evaluates**
```bash
nix eval .#nixosConfigurations.jupiter.config.system.build.toplevel.drvPath
```
Expected: a `/nix/store/...drv` path. Pre-existing trace warnings (the `*.service ordered after network-online.target` ones) are fine; no errors.
- [ ] **Step 6: Commit**
```bash
git add modules/environments/home-assistant/default.nix
git commit -m "$(cat <<'EOF'
feat(home-assistant): import openthread-border-router module from unstable
Pulls the services.openthread-border-router NixOS module directly from
nixpkgs-unstable since it isn't in 25.11 yet. Service stays disabled
in this commit; configuration follows.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
EOF
)"
```
---
### Task 3: Operator handoff — get the ZBT-2 device path from jupiter
This task has no code. It collects the runtime parameter (USB serial number) that Task 4 needs.
**Files:** _(none)_
- [ ] **Step 1: Hand off**
Tell the operator:
> "Plug the ZBT-2 into a USB-2 port on jupiter (it's still on stock Zigbee firmware — that's fine for this step). Then run `ls -l /dev/serial/by-id/` on jupiter and paste the full output back. We're after the line that contains `Nabu_Casa_Home_Assistant_Connect_ZBT-2`."
- [ ] **Step 2: Wait for the operator's pasted output**
Expected shape: a line like
`lrwxrwxrwx 1 root root 13 May 10 14:30 usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial-string>-if00 -> ../../ttyACM0`
- [ ] **Step 3: Record the by-id path**
Capture the value `/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial-string>-if00` for use in Task 4. Use the **by-id** path (not `/dev/ttyACM0`) so USB renumbering can't break OTBR.
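Pulling the path out of the pasted listing can be done mechanically instead of by eye. The sketch below assumes the `ls -l` line shape shown in Step 2; `zbt2_by_id_path` is a hypothetical helper name, not something that exists in the repo.

```bash
# Hypothetical helper: read pasted `ls -l /dev/serial/by-id/` output on stdin
# and print the absolute by-id path of the ZBT-2 entry.
# Exits non-zero if no matching line is found.
zbt2_by_id_path() {
  awk '
    /Nabu_Casa_Home_Assistant_Connect_ZBT-2/ {
      # The symlink name is the field immediately before the "->" arrow.
      for (i = 1; i < NF; i++) {
        if ($(i + 1) == "->") {
          print "/dev/serial/by-id/" $i
          found = 1
          exit
        }
      }
    }
    END { exit !found }
  '
}
```

Usage: save the operator's paste to a file and run `zbt2_by_id_path < pasted.txt`. A non-zero exit status means the ZBT-2 line was not in the paste, which usually points at the dongle not being enumerated.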
---
### Task 4: Enable OTBR + add HA otbr/thread components
**Files:**
- Modify: `modules/environments/home-assistant/default.nix`
- [ ] **Step 1: Write the failing eval checks**
On dev Mac:
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.enable
```
Expected: `false` (still disabled from Task 2).
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app"]` — no `otbr`, no `thread` yet.
- [ ] **Step 2: Add `"otbr"` and `"thread"` to `extraComponents`**
In `modules/environments/home-assistant/default.nix`, locate the `extraComponents` list (currently `[ "matter" "mobile_app" ]`) and replace it with:
```nix
extraComponents = [
  "matter"
  "mobile_app"
  "otbr"
  "thread"
];
```
- [ ] **Step 3: Add the `services.openthread-border-router` block**
In the same file, **after** the `services.home-assistant.config = { ... };` block and **before** `my.homepage.services`, add:
```nix
services.openthread-border-router = {
  enable = true;
  package = pkgs.unstable.openthread-border-router;
  openFirewall = true;
  backboneInterfaces = [ "enp3s0" ];
  radio.device = "<PASTE-BY-ID-PATH-FROM-TASK-3>";
};
```
Replace `<PASTE-BY-ID-PATH-FROM-TASK-3>` with the literal string captured in Task 3 step 3 (e.g. `"/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_AB12CD34-if00"`).
- [ ] **Step 4: Run the green eval checks**
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app","otbr","thread"]`.
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.enable
```
Expected: `true`.
```bash
nix eval --raw .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url
```
Expected: a string like `spinel+hdlc+uart:///dev/serial/by-id/usb-Nabu_Casa_..._ZBT-2_<serial>-if00?uart-baudrate=115200` (the module composes this from `radio.device` automatically).
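To compare the evaluated URL against an exact string rather than a rough shape, the expected value can be built from the device path. This sketch only mirrors the example string above; `expected_radio_url` is a hypothetical helper, and the actual format and baud rate are whatever the locked module revision produces.

```bash
# Hypothetical helper: build the URL shape quoted above from a device path.
# The 115200 default is copied from the example string, not read from the
# module; pass a second argument if the module uses a different baud rate.
expected_radio_url() {
  local device="$1" baud="${2:-115200}"
  printf 'spinel+hdlc+uart://%s?uart-baudrate=%s\n' "$device" "$baud"
}
```

Usage: `[ "$(expected_radio_url "$DEVICE")" = "$(nix eval --raw .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url)" ]`, where `DEVICE` holds the by-id path from Task 3. A mismatch is not necessarily a failure; it may only mean the module appends further query parameters.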
- [ ] **Step 5: Full eval — system derivation must build**
```bash
nix eval .#nixosConfigurations.jupiter.config.system.build.toplevel.drvPath
```
Expected: a `/nix/store/...drv` path with no eval errors.
- [ ] **Step 6: `nix flake check` for good measure**
```bash
nix flake check
```
Expected: no errors. (Same pre-existing trace warnings as before are acceptable.)
- [ ] **Step 7: Commit**
```bash
git add modules/environments/home-assistant/default.nix
git commit -m "$(cat <<'EOF'
feat(home-assistant): enable OTBR for ZBT-2 + add HA otbr/thread components
Brings up otbr-agent against the ZBT-2 over Spinel/UART, opens the
REST API on :8081, and wires HA's otbr + thread integrations so
Matter-over-Thread devices can commission through the existing
matter-server.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
EOF
)"
```
---
### Task 5: Operator handoff — flash OpenThread RCP firmware on the dongle
The dongle is currently running Zigbee NCP firmware and won't speak Spinel until reflashed. This must happen **before** Task 6's rebuild (otherwise `otbr-agent` will try to talk to a Zigbee-firmware dongle and fail).
**Files:** _(none on dev Mac)_
- [ ] **Step 1: Hand off — fetch firmware**
Tell the operator:
> "On any machine with a browser: download the latest **ZBT-2 OpenThread RCP** `.gbl` from <https://github.com/NabuCasa/silabs-firmware-builder/releases>. The asset name will look like `ot-rcp-zbt-2-<version>.gbl`. Get it onto jupiter — `scp` it over, or just `curl` from jupiter's shell. Confirm by running `ls ~/ot-rcp-zbt-2-*.gbl` on jupiter and pasting the result."
- [ ] **Step 2: Wait for confirmation**
Expected: a single matching path, e.g. `/home/finn/ot-rcp-zbt-2-2025.10.0.gbl`.
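The "single matching path" expectation can also be checked with a small function before moving on to the flash. This is a sketch under the assumption that the file was saved to the operator's home directory; `single_gbl` is a hypothetical name.

```bash
# Hypothetical helper: succeed only if exactly one ot-rcp-zbt-2-*.gbl file
# exists in the given directory (default: $HOME), and print its path.
single_gbl() {
  local dir="${1:-$HOME}"
  local count=0
  local match=""
  local f
  for f in "$dir"/ot-rcp-zbt-2-*.gbl; do
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -e "$f" ]; then
      count=$((count + 1))
      match="$f"
    fi
  done
  if [ "$count" -ne 1 ]; then
    printf 'expected exactly one .gbl, found %d\n' "$count" >&2
    return 1
  fi
  printf '%s\n' "$match"
}
```

Rejecting more than one match guards against flashing a stale download left over from an earlier attempt.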
- [ ] **Step 3: Hand off — flash**
Tell the operator:
> "On jupiter, run (substituting the actual by-id path from Task 3 and the actual `.gbl` filename):
>
> ```bash
> nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
> universal-silabs-flasher \
> --device /dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-if00 \
> flash --firmware ~/ot-rcp-zbt-2-<version>.gbl
> ```
>
> Paste the full output. Expected duration: ~30 seconds. The tool detects the running firmware, drops the dongle into bootloader mode, writes the `.gbl`, and reboots back to RCP."
- [ ] **Step 4: Verify the flash succeeded**
Expected output ends with something like `Firmware update complete` (or equivalent success message). If the tool reports CRC failure / partial write — re-run; the bootloader stays addressable.
If the tool rejects the `flash --firmware <path>` form (universal-silabs-flasher's CLI has changed across versions), have the operator paste the output of `universal-silabs-flasher --help` and adapt the command to the subcommand syntax shown there. The `flash --firmware <path>` form has been stable since 1.0.x.
---
### Task 6: Operator handoff — `nixos-rebuild switch` on jupiter
**Files:** _(none on dev Mac)_
- [ ] **Step 1: Push the branch so jupiter can fetch it**
On dev Mac:
```bash
git push -u origin feature/ha-zbt-2-thread
```
(If the operator pulls via a different mechanism — local checkout, fileshare — adapt accordingly. The standard pattern in this repo is `git pull` on jupiter.)
- [ ] **Step 2: Hand off — pull + rebuild**
Tell the operator:
> "On jupiter:
>
> ```bash
> cd ~/development/nixos # or wherever the flake lives on jupiter
> git fetch origin
> git checkout feature/ha-zbt-2-thread
> sudo nixos-rebuild switch --flake .#jupiter
> ```
>
> Paste the tail of the output (everything from the first `building ...` line onward). Expected: build completes, switch to the new generation, no errors."
- [ ] **Step 3: Verify the switch succeeded**
If the operator's pasted output includes `error:` or the switch failed mid-activation, **stop here**. Common failure: option name mismatch with whatever version of nixos-unstable is locked in the flake. Fix on dev Mac, push, ask operator to pull + rebuild again.
If the rebuild succeeded, proceed to Task 7.
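The stop condition from Step 3 can be applied to the pasted output with a short filter. This is a minimal sketch: `rebuild_output_ok` is a hypothetical helper, and it only looks for the literal `error:` marker mentioned above, so a failed activation that prints no such line still needs a manual read.

```bash
# Hypothetical helper: read pasted rebuild output on stdin; fail if any line
# contains "error:" and print the offending lines (with line numbers).
rebuild_output_ok() {
  local errors
  errors="$(grep -n 'error:' || true)"
  if [ -n "$errors" ]; then
    printf '%s\n' "$errors"
    return 1
  fi
}
```

Usage: `rebuild_output_ok < pasted-rebuild.txt && echo "proceed to Task 7"`.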
---
### Task 7: Operator handoff — service-level verification on jupiter
**Files:** _(none)_
- [ ] **Step 1: Hand off — service health**
Tell the operator:
> "On jupiter, run each command and paste output:
>
> ```bash
> systemctl status otbr-agent.service --no-pager
> journalctl -u otbr-agent.service -n 50 --no-pager
> ip link show wpan0
> ```"
- [ ] **Step 2: Verify**
Expected:
- `systemctl status` reports `active (running)`.
- `journalctl` shows OTBR startup messages, no repeated restart loops.
- `ip link show wpan0` shows the interface exists; state DOWN is correct (HA hasn't formed a network yet).
If `otbr-agent` is in restart loop with `Failed to open device`: device path mismatch. Re-check Task 3's path.
- [ ] **Step 3: Hand off — mDNS publication**
Tell the operator:
> "On jupiter:
>
> ```bash
> avahi-browse -r -t _meshcop._udp
> ```"
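Expected: one entry whose hostname matches jupiter. Note that `_meshcop._udp` advertises the Border Agent's UDP port, which need not be the REST API port 8081 checked in Step 4.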
If empty: `backboneInterfaces` is probably wrong. Ask the operator to run `ip link show` on jupiter and paste the output; pick the actual primary LAN interface, update `backboneInterfaces`, and rebuild again.
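Narrowing the pasted `ip link show` output down to plausible backbone interfaces can be sketched as follows. `backbone_candidates` is a hypothetical helper; it only drops the loopback and the Thread interface, so choosing among the remaining names is still a judgment call.

```bash
# Hypothetical helper: read `ip link show` output on stdin and print interface
# names that could serve as the backbone (everything except lo and wpan0).
backbone_candidates() {
  awk '
    /^[0-9]+: / {
      name = $2
      sub(/:$/, "", name)   # drop the trailing colon
      sub(/@.*$/, "", name) # drop "@parent" suffixes such as veth0@if5
      if (name != "lo" && name != "wpan0") print name
    }
  '
}
```

Usage: `backbone_candidates < pasted-ip-link.txt`. If the list contains more than one name, prefer the wired interface that carries the LAN address.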
- [ ] **Step 4: Hand off — REST API reachability**
Tell the operator:
> "On jupiter:
>
> ```bash
> curl -s http://127.0.0.1:8081/node/state
> ```"
Expected: a JSON state string, most likely `"disabled"` (HA hasn't formed a network yet).
If connection refused: OTBR isn't actually listening — re-check `journalctl`.
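A quick sanity check on the response body, assuming the endpoint returns one of the OpenThread device roles as a quoted JSON string (older OTBR builds may answer differently). `valid_node_state` is a hypothetical helper name.

```bash
# Hypothetical helper: accept the body returned by /node/state and succeed
# if, after stripping quotes and whitespace, it is an OpenThread device role.
valid_node_state() {
  local state
  state="$(printf '%s' "$1" | tr -d '"[:space:]')"
  case "$state" in
    disabled|detached|child|router|leader) printf '%s\n' "$state" ;;
    *) return 1 ;;
  esac
}
```

Usage on jupiter: `valid_node_state "$(curl -s http://127.0.0.1:8081/node/state)"`. Before HA forms a network the printed role should be `disabled`; after Task 8 it would typically change to `leader`.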
---
### Task 8: Operator handoff — HA UI smoke test
**Files:** _(none)_
- [ ] **Step 1: Hand off — confirm discovery**
Tell the operator:
> "Open `http://jupiter:8123` in a browser. Go to **Settings → Devices & Services**. Within ~30s of the rebuild, you should see **'Open Thread Border Router'** under 'Discovered'. Click **Configure**. Let HA form a new Thread network (or import existing dataset if you have one). Tell me when that's done — and paste any errors if it doesn't work."
- [ ] **Step 2: Wait for confirmation**
Expected: HA reports the Thread network is formed; the OTBR integration appears under 'Configured'.
If discovery doesn't happen: cross-check with Task 7 step 3 (`avahi-browse`). HA reads from the system's avahi cache.
- [ ] **Step 3: Hand off — Matter-over-Thread pairing**
Tell the operator:
> "Pick one Matter-over-Thread device. Use the HA Companion app, scan its Matter QR code, and follow the prompts. Tell me when it's paired, or paste any errors. Pairing should complete in 30-90 s."
- [ ] **Step 4: Wait for confirmation**
Expected: device appears under both Matter and Thread integrations in HA, and is controllable from the dashboard.
If pairing times out: see "Failure modes" table in the spec — most likely Thread mesh prefix isn't routed back to LAN. Operator runs `nft list ruleset` and `ip -6 route` on jupiter; debug from there.
---
### Task 9: Merge to master
**Files:** _(none)_
- [ ] **Step 1: Final branch state**
On dev Mac:
```bash
git log --oneline master..feature/ha-zbt-2-thread
```
Expected commits, listed here oldest to newest (`git log` itself prints them newest first):
1. `e8d09f4` — original ZHA commit
2. `dbeda27` — design spec
3. `<revert hash>` — Revert "feat(home-assistant): enable ZHA for ZBT-2 Zigbee dongle"
4. `<task-2 hash>` — feat(home-assistant): import openthread-border-router module from unstable
5. `<task-4 hash>` — feat(home-assistant): enable OTBR for ZBT-2 + add HA otbr/thread components
That's a fine history to merge as-is (the ZHA→revert pair is honest about the pivot).
- [ ] **Step 2: Hand off — merge**
The user runs the merge themselves (per repo policy: never commit to master without explicit consent). Tell the operator:
> "If the smoke tests in Task 8 worked, merge with:
>
> ```bash
> git switch master
> git merge --no-ff feature/ha-zbt-2-thread
> git push origin master
> ```
>
> Or open a merge request / PR if you prefer review first."
- [ ] **Step 3: Optional cleanup**
After merge:
```bash
git branch -d feature/ha-zbt-2-thread
git push origin --delete feature/ha-zbt-2-thread
```
---
## Self-Review
**Spec coverage:**
- Goals (4 bullets) → Tasks 2 (OTBR module wiring), 4 (OTBR enable + HA components), 5 (firmware flash), 8 (Matter-over-Thread smoke test) ✓
- Non-goals → respected; no multipan, no auto-flash, no fallback paths ✓
- Architecture diagram → Task 4 produces the wiring shown; Tasks 6-8 verify it ✓
- File changes (one module) → Tasks 1, 2, 4 ✓
- Reverts of prior ZHA commit → Task 1 ✓
- Operator workflow steps 0-7 → Tasks 1, 2, 3, 4, 5, 6, 7, 8 ✓
- Verification (eval-only / service-level / functional) → Tasks 2/4/6/7/8 ✓
- Failure-mode table → referenced in Tasks 6, 7, 8 for triage ✓
**Placeholder scan:**
- `<PASTE-BY-ID-PATH-FROM-TASK-3>` in Task 4 step 3 is intentional — it's a runtime parameter the operator fills in, captured in Task 3.
- `<serial>`, `<version>` in shell commands are intentional placeholders for operator substitution.
- No "TBD", "TODO", "implement later", or vague "handle errors" steps.
**Type / name consistency:**
- `services.openthread-border-router` used consistently (matches the unstable module's option path).
- `pkgs.unstable.openthread-border-router` matches the overlay (`machines/configuration.nix:11`).
- `extraComponents` strings (`"otbr"`, `"thread"`) match HA Core integration names.
- `radio.device` → `radio.url` relationship documented (module composes `url` from `device`).