nixos/docs/superpowers/plans/2026-05-10-zbt2-thread-otbr.md
marthsincemelee 311e358d88 docs(plan): correct Task 2 scope — specialArgs needed for self in imports
The original plan claimed no flake-level changes were needed because
machines/configuration.nix:21 already passes `_module.args.self = self;`.
That's only true for `config`-time evaluation; `imports` are collected
before `config` is available, so referencing `self` in `imports` causes
infinite recursion. Fix: promote `self` to `specialArgs` on each
nixosSystem call. The implementer of Task 2 caught this on first
dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 15:52:20 +02:00

# ZBT-2 Thread + OTBR Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Run the Home Assistant Connect ZBT-2 as an OpenThread Border Router on `jupiter`, fully integrated with the existing native `services.home-assistant` + `services.matter-server` stack so Matter-over-Thread devices commission through the dongle.
**Architecture:** One NixOS module file (`modules/environments/home-assistant/default.nix`) is edited to import the `services.openthread-border-router` module from `nixos-unstable` (not yet in 25.11 stable), enable it against the ZBT-2's `/dev/serial/by-id/...` path, and add HA's `otbr` + `thread` extra components. A small companion change in `machines/configuration.nix` passes `self` via `specialArgs` so that the import can reference the flake inputs. The previous ZHA-direction commit on this branch is reverted first. The dongle is one-time-flashed from Zigbee NCP firmware to OpenThread RCP firmware via `universal-silabs-flasher` outside the NixOS lifecycle (per design decision: option B, CLI-only).
**Tech Stack:** Nix flakes (flake-parts), NixOS 25.11 stable + nixos-unstable, `services.openthread-border-router`, `services.home-assistant`, `services.matter-server`, `python313Packages.universal-silabs-flasher`.
**Spec:** [`docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md`](../specs/2026-05-10-zbt2-thread-otbr-design.md) — read this before starting.
**User feedback rules in force:**
- Never commit to `master`; this branch is `feature/ha-zbt-2-thread`. Final merge happens at the end via PR or operator-driven merge.
- Do not SSH to `jupiter`. All commands targeting jupiter are operator handoffs — present the command, the user runs it and pastes output back.
---
## File Map
| Action | File | Responsibility |
|--------|------|----------------|
| Modify | `modules/environments/home-assistant/default.nix` | Import unstable OTBR module; enable OTBR for the ZBT-2; add `otbr` + `thread` HA components |
| Modify | `machines/configuration.nix` | Pass `self` via `specialArgs` so it's available during NixOS module **imports** evaluation (not just config) |
| Create (auto) | _(no new files)_ | All work fits in the two modules |
The `git revert` of `e8d09f4` automatically un-modifies the home-assistant module (drops `"zha"` and the `dialout` line). No host-level (`machines/jupiter/`) changes.
**Why the flake-level edit is needed:** the existing `_module.args.self = self;` in `machines/configuration.nix:21` makes `self` available in module bodies (option definitions, `config` blocks). It does **not** make `self` available during `imports` evaluation — `_module.args` is resolved from `config`, but `imports` are collected **before** `config` is evaluated, so `self` in `imports` causes an infinite recursion error. Promoting `self` to `specialArgs` short-circuits that and is the conventional fix.
---
## Validation Approach (instead of unit tests)
This is a NixOS configuration change; there's no test framework. We use `nix eval` against `nixosConfigurations.jupiter.config.*` as the equivalent of unit tests — assert option resolution **before** the change (red), then **after** the change (green). Functional / smoke tests happen post-`nixos-rebuild` on jupiter via systemctl, mDNS, and the HA UI.
All `nix eval` commands run on the dev Mac. All `systemctl` / `journalctl` / `nixos-rebuild` commands run on jupiter (operator handoff).
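The red/green pattern can be wrapped in a small helper so each check prints a clear verdict. This is a minimal sketch and not part of the original plan: `check_option` and the `NIX_EVAL` override are hypothetical names, and the comparison is a plain string match against the compact JSON that `nix eval --json` prints.

```bash
#!/usr/bin/env bash
# Sketch: assert that one jupiter option evaluates to an expected JSON literal.
# check_option and NIX_EVAL are hypothetical; NIX_EVAL exists only so the
# helper can be pointed at a stub on a machine without Nix.
check_option() {
  local attr="$1" expected="$2" actual
  # Unquoted on purpose: the default expands to the words `nix eval --json`.
  if ! actual="$(${NIX_EVAL:-nix eval --json} ".#nixosConfigurations.jupiter.config.${attr}" 2>/dev/null)"; then
    actual="<eval error>"
  fi
  if [ "$actual" = "$expected" ]; then
    echo "PASS ${attr}"
  else
    echo "FAIL ${attr}: expected ${expected}, got ${actual}"
    return 1
  fi
}
```

For example, `check_option services.home-assistant.extraComponents '["matter","mobile_app","zha"]'` would be the "red" check before Task 1's revert, and the same call without `"zha"` the "green" check afterwards.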
---
### Task 1: Revert the prior ZHA commit
**Files:**
- Modify: `modules/environments/home-assistant/default.nix` (via `git revert`)
- [ ] **Step 1: Verify pre-state**
On dev Mac, in the repo root:
```bash
git log --oneline -3
```
Expected: `dbeda27` (design spec) on top of `e8d09f4` (the ZHA commit) on top of `098e632`.
Also confirm current `extraComponents` includes `"zha"`:
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app","zha"]`
- [ ] **Step 2: Revert**
```bash
git revert --no-edit e8d09f4
```
Expected: revert commit created cleanly (no merge conflicts), single file changed.
- [ ] **Step 3: Verify post-state**
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app"]` (`zha` is gone).
```bash
nix eval --json .#nixosConfigurations.jupiter.config.users.users.hass.extraGroups
```
Expected: `[]` (`dialout` is gone).
```bash
git log --oneline -4
```
Expected: revert commit on top of `dbeda27` on top of `e8d09f4`.
(No explicit `git commit` step — `git revert` produced its own commit.)
---
### Task 2: Wire the unstable OTBR module import (still disabled)
This task gets the module into scope so options become available, but leaves `services.openthread-border-router.enable = false` (the default). The point is to confirm the import path works before adding device-specific config.
**Files:**
- Modify: `machines/configuration.nix` (add `specialArgs = { inherit self; };` to each `nixosSystem` call)
- Modify: `modules/environments/home-assistant/default.nix`
- [ ] **Step 1: Write the failing eval check**
On dev Mac:
```bash
nix eval --json .#nixosConfigurations.jupiter.options.services.openthread-border-router.enable.description 2>&1 | head -3
```
Expected: error containing `attribute 'openthread-border-router' missing` or similar — the option doesn't exist yet because the module isn't imported.
- [ ] **Step 1a: Promote `self` to `specialArgs` in `machines/configuration.nix`**
`self` must be reachable during `imports` evaluation (not just `config` evaluation). The existing `_module.args.self = self;` only covers `config`-time access. Edit each `nixosSystem` call (`jupiter` and `mibook`) to add `specialArgs`.
Current shape (lines 50-56 and 57-63):
```nix
jupiter = nixosSystem {
system = "x86_64-linux";
modules = defaultModules ++ [
# nixos-hardware.nixosModules.bmax-b7-power
./jupiter/configuration.nix
];
};
mibook = nixosSystem {
system = "x86_64-linux";
modules = defaultModules ++ [
# nixos-hardware.nixosModules.mibook
./mibook/configuration.nix
];
};
```
Add `specialArgs = { inherit self; };` to each:
```nix
jupiter = nixosSystem {
system = "x86_64-linux";
specialArgs = { inherit self; };
modules = defaultModules ++ [
# nixos-hardware.nixosModules.bmax-b7-power
./jupiter/configuration.nix
];
};
mibook = nixosSystem {
system = "x86_64-linux";
specialArgs = { inherit self; };
modules = defaultModules ++ [
# nixos-hardware.nixosModules.mibook
./mibook/configuration.nix
];
};
```
- [ ] **Step 2: Add `self` to the module's argument list and add the `imports` block**
Current header (`modules/environments/home-assistant/default.nix` lines 1-11):
```nix
# manages home automations
{
config,
lib,
pkgs,
...
}:
let
cfg = config.my.profiles.home-assistant;
hostName = config.networking.hostName;
in
```
Replace lines 1-11 with:
```nix
# manages home automations
{
config,
lib,
pkgs,
self,
...
}:
let
cfg = config.my.profiles.home-assistant;
hostName = config.networking.hostName;
in
```
Then, immediately after the opening brace of the attribute set body (line 13 of the modified file, since adding `self,` pushes the body down by one line; i.e. at the top of the body, before `options.my.profiles.home-assistant`), add:
```nix
imports = [
# services.openthread-border-router isn't in nixos-25.11; pull from
# nixpkgs-unstable. Package comes from the existing unstable overlay.
"${self.inputs.nixpkgs-unstable}/nixos/modules/services/home-automation/openthread-border-router.nix"
];
```
- [ ] **Step 3: Re-run the eval check**
```bash
nix eval --json .#nixosConfigurations.jupiter.options.services.openthread-border-router.enable.description 2>&1 | head -3
```
Expected: a JSON string describing the option (e.g. `"Whether to enable the OpenThread Border Router."`).
- [ ] **Step 4: Verify the service is currently disabled**
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.enable
```
Expected: `false`.
- [ ] **Step 5: Verify whole config still evaluates**
```bash
nix eval .#nixosConfigurations.jupiter.config.system.build.toplevel.drvPath
```
Expected: a `/nix/store/...drv` path. Pre-existing trace warnings (the `*.service ordered after network-online.target` ones) are fine; no errors.
- [ ] **Step 6: Commit**
```bash
git add machines/configuration.nix modules/environments/home-assistant/default.nix
git commit -m "$(cat <<'EOF'
feat(home-assistant): import openthread-border-router module from unstable
Pulls the services.openthread-border-router NixOS module directly from
nixpkgs-unstable since it isn't in 25.11 yet. Service stays disabled
in this commit; configuration follows.
Also promotes `self` from `_module.args` to `specialArgs` in
machines/configuration.nix, since `imports` are evaluated before
`config` and so can't reach `_module.args.self`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
EOF
)"
```
---
### Task 3: Operator handoff — get the ZBT-2 device path from jupiter
This task has no code. It collects the runtime parameter (USB serial number) that Task 4 needs.
**Files:** _(none)_
- [ ] **Step 1: Hand off**
Tell the operator:
> "Plug the ZBT-2 into a USB-2 port on jupiter (it's still on stock Zigbee firmware — that's fine for this step). Then run `ls -l /dev/serial/by-id/` on jupiter and paste the full output back. We're after the line that contains `Nabu_Casa_Home_Assistant_Connect_ZBT-2`."
- [ ] **Step 2: Wait for the operator's pasted output**
Expected shape: a line like
`lrwxrwxrwx 1 root root 13 May 10 14:30 usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial-string>-if00 -> ../../ttyACM0`
- [ ] **Step 3: Record the by-id path**
Capture the value `/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial-string>-if00` for use in Task 4. Use the **by-id** path (not `/dev/ttyACM0`) so USB renumbering can't break OTBR.
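Extracting the path from the pasted listing by hand is easy to get wrong (a dropped `-if00` suffix, for instance). The following sketch pulls it out mechanically; `zbt2_by_id_path` is a hypothetical helper name, and it assumes the listing has the `ls -l` shape shown in Step 2.

```bash
#!/usr/bin/env bash
# Sketch: read `ls -l /dev/serial/by-id/` output on stdin and print the full
# by-id path of every ZBT-2 entry. Exits non-zero if no entry is found.
# zbt2_by_id_path is a hypothetical helper, not part of the plan.
zbt2_by_id_path() {
  awk '
    /Nabu_Casa_Home_Assistant_Connect_ZBT-2/ {
      # The symlink name is one whitespace-separated field; find it by prefix.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2/) {
          print "/dev/serial/by-id/" $i
          found = 1
        }
      }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

The operator's pasted output can be piped straight in, for example from a saved file: `zbt2_by_id_path < pasted-output.txt`.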
---
### Task 4: Enable OTBR + add HA otbr/thread components
**Files:**
- Modify: `modules/environments/home-assistant/default.nix`
- [ ] **Step 1: Write the failing eval checks**
On dev Mac:
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.enable
```
Expected: `false` (still disabled from Task 2).
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app"]` — no `otbr`, no `thread` yet.
- [ ] **Step 2: Add `"otbr"` and `"thread"` to `extraComponents`**
In `modules/environments/home-assistant/default.nix`, locate the `extraComponents` list (currently `[ "matter" "mobile_app" ]`) and replace it with:
```nix
extraComponents = [
"matter"
"mobile_app"
"otbr"
"thread"
];
```
- [ ] **Step 3: Add the `services.openthread-border-router` block**
In the same file, **after** the `services.home-assistant.config = { ... };` block and **before** `my.homepage.services`, add:
```nix
services.openthread-border-router = {
enable = true;
package = pkgs.unstable.openthread-border-router;
openFirewall = true;
backboneInterfaces = [ "enp3s0" ];
radio.device = "<PASTE-BY-ID-PATH-FROM-TASK-3>";
};
```
Replace `<PASTE-BY-ID-PATH-FROM-TASK-3>` with the literal string captured in Task 3 step 3 (e.g. `"/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_AB12CD34-if00"`).
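Because the placeholder is a syntactically valid Nix string, a forgotten substitution would still evaluate. A guard like the one below can catch it before the commit; this is a sketch, and `assert_no_placeholder` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: fail if the Task 3 placeholder is still present in a file.
# assert_no_placeholder is a hypothetical helper, not part of the plan.
assert_no_placeholder() {
  local file="$1"
  # -F: match the placeholder literally; -n: show the offending line number.
  if grep -nF 'PASTE-BY-ID-PATH-FROM-TASK-3' "$file" >&2; then
    echo "placeholder still present in ${file}" >&2
    return 1
  fi
}
```

Run it as `assert_no_placeholder modules/environments/home-assistant/default.nix` before moving on to the green eval checks.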
- [ ] **Step 4: Run the green eval checks**
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents
```
Expected: `["matter","mobile_app","otbr","thread"]`.
```bash
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.enable
```
Expected: `true`.
```bash
nix eval --raw .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url
```
Expected: a string like `spinel+hdlc+uart:///dev/serial/by-id/usb-Nabu_Casa_..._ZBT-2_<serial>-if00?uart-baudrate=115200` (the module composes this from `radio.device` automatically).
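The URL check can also be made mechanical. The sketch below only verifies the `spinel+hdlc+uart://` scheme and the device path, since the exact query parameters may differ between module versions; `radio_url_uses_device` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a radio URL is a Spinel/HDLC UART URL for the given device.
# radio_url_uses_device is a hypothetical helper. Query parameters such as
# uart-baudrate are deliberately not checked.
radio_url_uses_device() {
  local url="$1" device="$2"
  local prefix="spinel+hdlc+uart://${device}"
  case "$url" in
    # Either the bare URL or the URL followed by a query string.
    "$prefix"|"$prefix"\?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical call passes the `nix eval --raw` output as the first argument and the by-id path recorded in Task 3 as the second.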
- [ ] **Step 5: Full eval — system derivation must build**
```bash
nix eval .#nixosConfigurations.jupiter.config.system.build.toplevel.drvPath
```
Expected: a `/nix/store/...drv` path with no eval errors.
- [ ] **Step 6: `nix flake check` for good measure**
```bash
nix flake check
```
Expected: no errors. (Same pre-existing trace warnings as before are acceptable.)
- [ ] **Step 7: Commit**
```bash
git add modules/environments/home-assistant/default.nix
git commit -m "$(cat <<'EOF'
feat(home-assistant): enable OTBR for ZBT-2 + add HA otbr/thread components
Brings up otbr-agent against the ZBT-2 over Spinel/UART, opens the
REST API on :8081, and wires HA's otbr + thread integrations so
Matter-over-Thread devices can commission through the existing
matter-server.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
EOF
)"
```
---
### Task 5: Operator handoff — flash OpenThread RCP firmware on the dongle
The dongle is currently running Zigbee NCP firmware and won't speak Spinel until reflashed. This must happen **before** Task 6's rebuild (otherwise `otbr-agent` will try to talk to a Zigbee-firmware dongle and fail).
**Files:** _(none on dev Mac)_
- [ ] **Step 1: Hand off — fetch firmware**
Tell the operator:
> "On any machine with a browser: download the latest **ZBT-2 OpenThread RCP** `.gbl` from <https://github.com/NabuCasa/silabs-firmware-builder/releases>. The asset name will look like `ot-rcp-zbt-2-<version>.gbl`. Get it onto jupiter — `scp` it over, or just `curl` from jupiter's shell. Confirm by running `ls ~/ot-rcp-zbt-2-*.gbl` on jupiter and pasting the result."
- [ ] **Step 2: Wait for confirmation**
Expected: a single matching path, e.g. `/home/finn/ot-rcp-zbt-2-2025.10.0.gbl`.
- [ ] **Step 3: Hand off — flash**
Tell the operator:
> "On jupiter, run (substituting the actual by-id path from Task 3 and the actual `.gbl` filename):
>
> ```bash
> nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
> universal-silabs-flasher \
> --device /dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-if00 \
> flash --firmware ~/ot-rcp-zbt-2-<version>.gbl
> ```
>
> Paste the full output. Expected duration: ~30 seconds. The tool detects the running firmware, drops the dongle into bootloader mode, writes the `.gbl`, and reboots back to RCP."
- [ ] **Step 4: Verify the flash succeeded**
Expected output ends with something like `Firmware update complete` (or equivalent success message). If the tool reports CRC failure / partial write — re-run; the bootloader stays addressable.
If the command is rejected because the installed version uses a different subcommand syntax (universal-silabs-flasher's CLI has changed across versions), have the operator run `universal-silabs-flasher --help` and adapt the invocation. The plan assumes the `flash --firmware <path>` form, which has been stable since 1.0.x.
---
### Task 6: Operator handoff — `nixos-rebuild switch` on jupiter
**Files:** _(none on dev Mac)_
- [ ] **Step 1: Push the branch so jupiter can fetch it**
On dev Mac:
```bash
git push -u origin feature/ha-zbt-2-thread
```
(If the operator pulls via a different mechanism — local checkout, fileshare — adapt accordingly. The standard pattern in this repo is `git pull` on jupiter.)
- [ ] **Step 2: Hand off — pull + rebuild**
Tell the operator:
> "On jupiter:
>
> ```bash
> cd ~/development/nixos # or wherever the flake lives on jupiter
> git fetch origin
> git checkout feature/ha-zbt-2-thread
> sudo nixos-rebuild switch --flake .#jupiter
> ```
>
> Paste the tail of the output (everything from the first `building ...` line onward). Expected: build completes, switch to the new generation, no errors."
- [ ] **Step 3: Verify the switch succeeded**
If the operator's pasted output includes `error:` or the switch failed mid-activation, **stop here**. Common failure: option name mismatch with whatever version of nixos-unstable is locked in the flake. Fix on dev Mac, push, ask operator to pull + rebuild again.
If the rebuild succeeded, proceed to Task 7.
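A long rebuild log is easy to skim past. As a first pass, the pasted output can be scanned for `error:` lines with a sketch like this; `rebuild_log_ok` is a hypothetical helper name, and it does not replace reading the tail of the log for activation failures.

```bash
#!/usr/bin/env bash
# Sketch: read pasted nixos-rebuild output on stdin; fail if any line
# contains `error:`, printing the offending lines (with numbers) to stderr.
# rebuild_log_ok is a hypothetical helper, not part of the plan.
rebuild_log_ok() {
  if grep -n 'error:' >&2; then
    return 1
  fi
}
```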
---
### Task 7: Operator handoff — service-level verification on jupiter
**Files:** _(none)_
- [ ] **Step 1: Hand off — service health**
Tell the operator:
> "On jupiter, run each command and paste output:
>
> ```bash
> systemctl status otbr-agent.service --no-pager
> journalctl -u otbr-agent.service -n 50 --no-pager
> ip link show wpan0
> ```"
- [ ] **Step 2: Verify**
Expected:
- `systemctl status` reports `active (running)`.
- `journalctl` shows OTBR startup messages, no repeated restart loops.
- `ip link show wpan0` shows the interface exists; state DOWN is correct (HA hasn't formed a network yet).
If `otbr-agent` is in a restart loop with `Failed to open device`, the device path does not match. Re-check Task 3's path.
- [ ] **Step 3: Hand off — mDNS publication**
Tell the operator:
> "On jupiter:
>
> ```bash
> avahi-browse -r -t _meshcop._udp
> ```"
Expected: one entry whose hostname matches jupiter, advertising port 8081.
If empty: `backboneInterfaces` is wrong. Ask the operator to run `ip link show` on jupiter and paste the output; pick the actual primary LAN interface, update `backboneInterfaces`, and have the operator rebuild again.
- [ ] **Step 4: Hand off — REST API reachability**
Tell the operator:
> "On jupiter:
>
> ```bash
> curl -s http://127.0.0.1:8081/node/state
> ```"
Expected: a JSON state string, most likely `"disabled"` (HA hasn't formed a network yet).
If connection refused: OTBR isn't actually listening — re-check `journalctl`.
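The state string maps onto the standard OpenThread device roles, so the operator's pasted value can be interpreted with a small lookup. This is a sketch; `describe_node_state` is a hypothetical helper name, and the wording of the descriptions is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: interpret the JSON string returned by /node/state.
# describe_node_state is a hypothetical helper; the role names are the
# standard OpenThread device roles.
describe_node_state() {
  local state
  # Strip the surrounding JSON quotes, e.g. "disabled" -> disabled.
  state="$(printf '%s' "$1" | tr -d '"')"
  case "$state" in
    disabled) echo "Thread stack is off; no network formed yet" ;;
    detached) echo "stack is up but not attached to a network" ;;
    child|router|leader) echo "attached to a Thread network as ${state}" ;;
    *) echo "unrecognised state: ${state}"; return 1 ;;
  esac
}
```

Before Task 8 the expected result is the `disabled` branch; after HA forms the network, one of the attached roles should appear instead.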
---
### Task 8: Operator handoff — HA UI smoke test
**Files:** _(none)_
- [ ] **Step 1: Hand off — confirm discovery**
Tell the operator:
> "Open `http://jupiter:8123` in a browser. Go to **Settings → Devices & Services**. Within ~30s of the rebuild, you should see **'Open Thread Border Router'** under 'Discovered'. Click **Configure**. Let HA form a new Thread network (or import existing dataset if you have one). Tell me when that's done — and paste any errors if it doesn't work."
- [ ] **Step 2: Wait for confirmation**
Expected: HA reports the Thread network is formed; the OTBR integration appears under 'Configured'.
If discovery doesn't happen: cross-check with Task 7 step 3 (`avahi-browse`). HA reads from the system's avahi cache.
- [ ] **Step 3: Hand off — Matter-over-Thread pairing**
Tell the operator:
> "Pick one Matter-over-Thread device. Use the HA Companion app, scan its Matter QR code, and follow the prompts. Tell me when it's paired, or paste any errors. Pairing should complete in 30-90s."
- [ ] **Step 4: Wait for confirmation**
Expected: device appears under both Matter and Thread integrations in HA, and is controllable from the dashboard.
If pairing times out: see "Failure modes" table in the spec — most likely Thread mesh prefix isn't routed back to LAN. Operator runs `nft list ruleset` and `ip -6 route` on jupiter; debug from there.
---
### Task 9: Merge to master
**Files:** _(none)_
- [ ] **Step 1: Final branch state**
On dev Mac:
```bash
git log --oneline master..feature/ha-zbt-2-thread
```
Expected commits, listed here oldest to newest (`git log` itself prints them newest first):
1. `e8d09f4` — original ZHA commit
2. `dbeda27` — design spec
3. `<revert hash>` — Revert "feat(home-assistant): enable ZHA for ZBT-2 Zigbee dongle"
4. `<task-2 hash>` — feat(home-assistant): import openthread-border-router module from unstable
5. `<task-4 hash>` — feat(home-assistant): enable OTBR for ZBT-2 + add HA otbr/thread components
That's a fine history to merge as-is (the ZHA→revert pair is honest about the pivot).
- [ ] **Step 2: Hand off — merge**
The user runs the merge themselves (per repo policy: never commit to master without explicit consent). Tell the operator:
> "If the smoke tests in Task 8 worked, merge with:
>
> ```bash
> git switch master
> git merge --no-ff feature/ha-zbt-2-thread
> git push origin master
> ```
>
> Or open a merge request / PR if you prefer review first."
- [ ] **Step 3: Optional cleanup**
After merge:
```bash
git branch -d feature/ha-zbt-2-thread
git push origin --delete feature/ha-zbt-2-thread
```
---
## Self-Review
**Spec coverage:**
- Goals (4 bullets) → Tasks 2 (OTBR module wiring), 4 (OTBR enable + HA components), 5 (firmware flash), 8 (Matter-over-Thread smoke test) ✓
- Non-goals → respected; no multipan, no auto-flash, no fallback paths ✓
- Architecture diagram → Task 4 produces the wiring shown; Tasks 6-8 verify it ✓
- File changes (one module) → Tasks 1, 2, 4 ✓
- Reverts of prior ZHA commit → Task 1 ✓
- Operator workflow steps 0-7 → Tasks 1, 2, 3, 4, 5, 6, 7, 8 ✓
- Verification (eval-only / service-level / functional) → Tasks 2/4/6/7/8 ✓
- Failure-mode table → referenced in Tasks 6, 7, 8 for triage ✓
**Placeholder scan:**
- `<PASTE-BY-ID-PATH-FROM-TASK-3>` in Task 4 step 3 is intentional — it's a runtime parameter the operator fills in, captured in Task 3.
- `<serial>`, `<version>` in shell commands are intentional placeholders for operator substitution.
- No "TBD", "TODO", "implement later", or vague "handle errors" steps.
**Type / name consistency:**
- `services.openthread-border-router` used consistently (matches the unstable module's option path).
- `pkgs.unstable.openthread-border-router` matches the overlay (`machines/configuration.nix:11`).
- `extraComponents` strings (`"otbr"`, `"thread"`) match HA Core integration names.
- `radio.device` to `radio.url` relationship documented (module composes `url` from `device`).