nixos/docs/superpowers/specs/2026-05-10-zbt2-thread-otbr-design.md
marthsincemelee dbeda276e1 docs(home-assistant): design spec for ZBT-2 Thread + OTBR setup
Captures the architecture, operator workflow, and verification for
running the Connect ZBT-2 as an OpenThread Border Router on jupiter
(via nixos-unstable's services.openthread-border-router module),
with HA's otbr + thread integrations driving the Thread network
and the existing matter-server consuming credentials for
Matter-over-Thread device commissioning.

Supersedes the ZHA-direction commit on this branch (e8d09f4),
which will be reverted at the start of implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 15:29:21 +02:00


ZBT-2 as a Thread Border Router for Home Assistant on jupiter

Date: 2026-05-10
Branch: feature/ha-zbt-2-thread
Status: Design, pending implementation plan

Context

Home Assistant on jupiter already runs natively (services.home-assistant) with the Matter integration and services.matter-server enabled, but has no Zigbee or Thread radio. The user has acquired a Home Assistant Connect ZBT-2 (Nabu Casa's Silicon Labs EFR32MG24-based USB Zigbee/Thread radio).

The user wants the dongle running as an OpenThread Border Router (OTBR) — Thread only, not Zigbee — so Matter-over-Thread devices can be onboarded through the existing HA Matter integration.

A previous iteration of this work shipped zha enablement on the same branch (commit e8d09f4). That commit will be reverted as part of implementation; this design supersedes it.

Goals

  • Bring up otbr-agent on jupiter against the ZBT-2.
  • Have Home Assistant auto-discover the OTBR via mDNS and use its REST API to manage the Thread network.
  • Have services.matter-server (already enabled) consume Thread credentials from HA so Matter-over-Thread devices commission through the ZBT-2.
  • One-time, manual firmware flash from Zigbee NCP to OpenThread RCP via universal-silabs-flasher (option B from brainstorming — no HA-driven update flow).

Non-goals

  • Multipan / multiprotocol (Zigbee + Thread on one radio). Out of scope; the dongle will be Thread-only.
  • Falling back to ZHA if Thread misbehaves. Thread-only by choice; if it fails the response is to debug, not to dual-stack.
  • HA-UI-driven firmware updates. The HAOS "Silicon Labs Multiprotocol" add-on workflow doesn't translate to native NixOS without faking a supervisor; the user explicitly accepted CLI-only flashing.
  • Thread network credential backups. HA owns the dataset; standard HA backup hygiene (separate concern) covers it.

Architecture

                ┌────────────────────────── jupiter (NixOS) ──────────────────────────┐
                │                                                                      │
   ZBT-2 USB ──►│  /dev/serial/by-id/usb-Nabu_Casa_..._ZBT-2_<serial>-...             │
                │            │                                                         │
                │            │  spinel+hdlc+uart, 115200 baud                          │
                │            ▼                                                         │
                │      ┌───────────────┐  REST :8081 (loopback) ┌──────────────────┐   │
                │      │  otbr-agent   │ ◄─────────────────────►│ home-assistant   │   │
                │      │  (systemd)    │                         │ + matter-server │   │
                │      │  wpan0 ───────┼── advertises via ─┐     │ extraComponents:│   │
                │      └───────────────┘  avahi (_meshcop) │     │  matter,        │   │
                │                                          ▼     │  mobile_app,    │   │
                │                       enp3s0 (LAN — backbone)  │  otbr, thread   │   │
                │                                                └──────────────────┘   │
                └────────────────────────────────────┬──────────────────────────────────┘
                                                     │
                                          home LAN ◄─┘
                                          (Matter-over-Thread devices join here)

Components

  1. The radio. ZBT-2, USB-attached, running OpenThread RCP firmware after a one-time flash.
  2. otbr-agent (systemd). Managed by the unstable services.openthread-border-router NixOS module imported via inputs.nixpkgs-unstable. Owns wpan0, talks Spinel to the dongle, exposes the OTBR REST API on 127.0.0.1:8081, advertises _meshcop._udp over enp3s0 via avahi.
  3. Home Assistant (already running). Gains the otbr and thread extra components. Discovers OTBR via mDNS, drives the REST API, supplies Thread operational datasets to matter-server during Matter commissioning.

Data flows

  • OTBR ↔ ZBT-2: Spinel-over-HDLC over UART. Built automatically by the module from radio.device as spinel+hdlc+uart://<device>?uart-baudrate=115200.
  • HA ↔ OTBR: mDNS discovery (_meshcop._udp) → REST calls to 127.0.0.1:8081 for network management.
  • Matter commissioning: HA scans QR → matter-server does BLE commissioning → asks HA for Thread dataset → HA fetches from OTBR → ships to device → device joins Thread mesh through the ZBT-2.

HA never opens the serial port directly; matter-server never talks to OTBR directly. HA brokers between them — that's why all four extra components are needed.
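
The radio URL from the first data flow can be reproduced outside the module to see roughly what otbr-agent is handed. This is a minimal sketch that assumes only the URL shape quoted above; the function name radio_url and the example device path are hypothetical, and the module's own string-building may differ in detail.

```shell
# Hypothetical helper: compose a Spinel radio URL in the shape quoted above.
# $1 = serial device path, $2 = UART baud rate (defaults to 115200, as in this spec).
radio_url() {
  printf 'spinel+hdlc+uart://%s?uart-baudrate=%s\n' "$1" "${2:-115200}"
}

# Example with a placeholder device path (not the real by-id path):
radio_url /dev/ttyACM0
# prints: spinel+hdlc+uart:///dev/ttyACM0?uart-baudrate=115200
```

Comparing this string against the evaluated radio.url option is a cheap way to spot a wrong device path before deploy.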

NixOS-side changes

All changes live in modules/environments/home-assistant/default.nix. No host-level changes in machines/jupiter/ (the existing profile activation handles that), no flake-level changes (the existing _module.args.self = self; wiring is sufficient).

Edited module sketch

{ config, lib, pkgs, self, ... }:
let
  cfg = config.my.profiles.home-assistant;
  hostName = config.networking.hostName;
in
{
  imports = [
    # OTBR module isn't in 25.11 yet; use unstable's directly. Package
    # comes from the existing `unstable` overlay.
    "${self.inputs.nixpkgs-unstable}/nixos/modules/services/home-automation/openthread-border-router.nix"
  ];

  options.my.profiles.home-assistant.enable = lib.mkEnableOption "Home Automation";

  config = lib.mkIf cfg.enable {
    services.matter-server.enable = true;

    services.home-assistant = {
      enable = true;
      openFirewall = true;
      extraComponents = [
        "matter"
        "mobile_app"
        "otbr"
        "thread"
      ];
    };

    services.home-assistant.config = {
      name = "Home - Rechberg";
      unit_system = "metric";
      mobile_app = { };
    };

    services.openthread-border-router = {
      enable = true;
      package = pkgs.unstable.openthread-border-router;
      openFirewall = true;
      backboneInterfaces = [ "enp3s0" ];   # verify with `ip link` post-deploy
      radio.device = "/dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-...";
      # web.enable left default (off) — HA UI is the management surface
    };

    my.homepage.services = [
      {
        group = "Services";
        name = "Home Assistant";
        description = "Home automation";
        href = "http://${hostName}:8123";
        icon = "si-homeassistant";
      }
    ];
  };
}

Reverts of the prior ZHA commit

Drop both lines from commit e8d09f4:

  • "zha" from extraComponents (replaced by "otbr" + "thread").
  • users.users.hass.extraGroups = [ "dialout" ]; no longer needed, because otbr-agent runs as root and owns the device directly, and HA never opens the serial port itself.

Done by git revert e8d09f4 at the start of implementation, before applying the new diff.

Decisions captured

  • No universal-silabs-flasher in environment.systemPackages. Flashing is a once-or-twice-a-year operation; nix shell nixpkgs#python313Packages.universal-silabs-flasher is sufficient when needed and avoids a perma-dep on a tool that's idle most of the time.
  • No firmware pinning in the flake. Consistent with option B (CLI-only manual flashing). The user fetches the .gbl from https://github.com/NabuCasa/silabs-firmware-builder/releases at update time.
  • backboneInterfaces = [ "enp3s0" ] as a starting value (per machines/jupiter/hardware-configuration.nix:64). To be verified against ip link after first deploy; correctable in a follow-up commit if the actual primary interface differs.

Operator workflow

The user runs every command themselves; nothing is run over SSH from the dev session.

Step 0 — branch hygiene (dev Mac)

git switch feature/ha-zbt-2-thread          # already renamed
git revert --no-edit e8d09f4                # drops ZHA + dialout commit

Step 1 — apply the module changes (dev Mac)

Edit modules/environments/home-assistant/default.nix per the sketch above. Leave <serial> as a placeholder; fill after Step 3.

Step 2 — eval-only sanity check (dev Mac)

nix flake check

or, to evaluate only the jupiter configuration,

nixos-rebuild dry-build --flake .#jupiter

Catches: bad import path, option typos, version skew between unstable and stable.

Step 3 — plug ZBT-2 into jupiter (still on stock Zigbee firmware)

On jupiter:

ls -l /dev/serial/by-id/

Then on dev Mac: copy the full usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-... path into radio.device, commit on the feature branch.
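
If several serial adapters are attached, picking the ZBT-2 entry out of the listing can be scripted. The sketch below assumes the by-id name contains the literal string ZBT-2, as in the path pattern used throughout this spec; the function name find_zbt2 is hypothetical.

```shell
# Hypothetical helper: print by-id entries whose name mentions the ZBT-2.
# $1 = directory to search (defaults to /dev/serial/by-id).
# Returns 0 if at least one entry was found, 1 otherwise.
find_zbt2() {
  dir="${1:-/dev/serial/by-id}"
  found=1
  for path in "$dir"/*ZBT-2*; do
    # An unmatched glob stays literal, so only report entries that exist.
    if [ -e "$path" ]; then
      printf '%s\n' "$path"
      found=0
    fi
  done
  return "$found"
}

# Usage on jupiter: find_zbt2
```

The printed path is what goes into radio.device; a non-zero exit status means the dongle was not enumerated under that name.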

Step 4 — flash OpenThread RCP firmware (one-time, on jupiter)

nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
  universal-silabs-flasher \
    --device /dev/serial/by-id/usb-Nabu_Casa_Home_Assistant_Connect_ZBT-2_<serial>-... \
    flash --firmware ~/ot-rcp-zbt-2-<version>.gbl

Firmware download: latest ZBT-2 OpenThread RCP .gbl from https://github.com/NabuCasa/silabs-firmware-builder/releases.

OTBR isn't running yet at this point, so there's no contention on the device.

Step 5 — rebuild (on jupiter)

sudo nixos-rebuild switch --flake .#jupiter

Brings up otbr-agent.service, opens TCP/8081, loads otbr + thread integrations in HA.

Step 6 — confirm HA discovered it

  • http://jupiter:8123 → Settings → Devices & Services → "Open Thread Border Router" appears as auto-discovered within ~30 s.
  • Click "Configure", form a new Thread network (or import an existing dataset).
  • "Matter" integration page now shows Thread credentials available.

Step 7 — Matter-over-Thread smoke test

Pair one Matter-over-Thread device end-to-end via the HA Companion app. Pairing should complete in 30 to 90 s. If it does, merge feature/ha-zbt-2-thread into master.

Future updates

Same procedure as Step 4, except that otbr-agent now holds the serial port: stop otbr-agent.service, run the flasher with the new .gbl, then start the service again.
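
The stop, flash, start sequence can be sketched as a small wrapper. This is an illustration, not part of the design: the function name reflash_zbt2 is hypothetical, the flasher invocation is copied from Step 4, and leaving the service stopped after a failed flash is an assumption made here so that a retry does not have to fight otbr-agent for the port.

```shell
# Hypothetical wrapper around the Step 4 command for later firmware updates.
# $1 = by-id device path, $2 = path to the new .gbl file.
reflash_zbt2() {
  device="$1"
  firmware="$2"

  # Release the serial port before the flasher opens it.
  sudo systemctl stop otbr-agent.service || return 1

  # Same invocation as Step 4. On failure, leave the service stopped
  # so the flash can be retried without contention on the device.
  nix shell nixpkgs#python313Packages.universal-silabs-flasher -c \
    universal-silabs-flasher --device "$device" flash --firmware "$firmware" \
    || return 1

  # Only hand the radio back to otbr-agent after a successful flash.
  sudo systemctl start otbr-agent.service
}

# Usage on jupiter: reflash_zbt2 <by-id device path> <path to .gbl>
```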

Failure modes

| Symptom | Likely cause | Mitigation |
| --- | --- | --- |
| otbr-agent.service fails: "Failed to open device" | Dongle unplugged or radio.device path stale (e.g. after replacement) | Module sets Restart = "on-failure"; check systemctl status otbr-agent, re-check /dev/serial/by-id/, update path. |
| OTBR up but HA never discovers it | mDNS not propagating on enp3s0 (most often: backboneInterfaces wrong) | avahi-browse -r _meshcop._udp should show one entry. If not: ip link, fix backboneInterfaces, rebuild. |
| HA shows OTBR but Matter pairing times out | Thread mesh prefix not routed to LAN, or matter-server can't reach the device's IPv6 ULA | nft list ruleset should show OTBR's forwarding rules; ip -6 route should include the Thread mesh prefix. |
| Dongle stuck after a half-completed flash | Flasher interrupted mid-write | Re-run the flash; bootloader stays addressable even if RCP firmware is corrupt. The tool detects bootloader-mode automatically. |
| nixos-rebuild fails: "option services.openthread-border-router does not exist" | Unstable module import path wrong / not in scope | Caught by Step 2 (eval-only). Fix before deploy. |

Verification

Eval-only (dev Mac, before deploy)

nix flake check
nix eval --json .#nixosConfigurations.jupiter.config.services.openthread-border-router.radio.url
nix eval --json .#nixosConfigurations.jupiter.config.services.home-assistant.extraComponents

Expected: flake check passes; radio.url is a spinel+hdlc+uart://... string built from the by-id path; extraComponents includes "otbr" and "thread".
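
The extraComponents expectation can be checked mechanically rather than by eye. A minimal sketch, assuming the nix eval --json output is a flat JSON array of strings; the function name has_component is hypothetical, and the match is a plain substring test rather than real JSON parsing.

```shell
# Hypothetical helper: succeed if a JSON array of strings contains the given name.
# $1 = JSON text (e.g. output of `nix eval --json ...extraComponents`), $2 = component name.
has_component() {
  case "$1" in
    *"\"$2\""*) return 0 ;;   # the quoted name appears somewhere in the JSON text
    *)          return 1 ;;
  esac
}

# Example with a literal array in the shape the eval is expected to print:
components='["matter","mobile_app","otbr","thread"]'
has_component "$components" otbr && has_component "$components" thread && echo "otbr and thread present"
```

On the dev Mac the first argument would be the captured output of the nix eval command above instead of the literal array.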

Service-level (jupiter, after rebuild)

systemctl status otbr-agent.service
journalctl -u otbr-agent.service -n 50 --no-pager
ip link show wpan0
avahi-browse -r -t _meshcop._udp
curl -s http://127.0.0.1:8081/node/state

Expected: service active; wpan0 exists (DOWN until HA forms a network — correct); one _meshcop._udp entry; REST returns a JSON state string.
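
The state string returned by the REST call can be turned into a short verdict. This sketch assumes the body is a bare JSON string naming one of the standard OpenThread device roles (disabled, detached, child, router, leader); the function name describe_node_state and the wording of the messages are hypothetical.

```shell
# Hypothetical helper: classify the body returned by GET /node/state.
# $1 = response body, assumed to be a JSON string such as "leader".
describe_node_state() {
  # Strip the surrounding JSON quotes and any whitespace.
  state=$(printf '%s' "$1" | tr -d '" \n\r')
  case "$state" in
    disabled)            echo "Thread stack not started" ;;
    detached)            echo "Thread stack started, not attached to a network" ;;
    child|router|leader) echo "attached to a Thread network as $state" ;;
    *)                   echo "unexpected state: $state"; return 1 ;;
  esac
}

# Usage on jupiter:
#   describe_node_state "$(curl -s http://127.0.0.1:8081/node/state)"
```

Before HA forms a network, an unattached state is consistent with wpan0 being DOWN as noted above; an attached role is expected only after Step 6.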

Functional (HA UI)

  • "Open Thread Border Router" appears under auto-discovered integrations.
  • Forming a Thread network from the integration UI succeeds.
  • Pairing one Matter-over-Thread device end-to-end succeeds.

Open questions / risks

  • Unstable module interface. The services.openthread-border-router module is in nixos-unstable and its options may change shape before landing in 26.05. If options are renamed, the eval-only step catches it before deploy. Acceptable risk; we can pin the unstable input revision if churn becomes annoying.
  • Backbone interface name. enp3s0 is a best guess from hardware-configuration.nix:64's commented-out line. Definitive answer comes from ip link on the actual host. Trivial to correct if wrong.
  • First-flash chicken-and-egg. Deferred to nix shell rather than baked into the system, because the dongle must be flashed before otbr-agent claims it. This is documented in Step 4.