Network setup and sysupgrade

Started from the bottom now we here, started from the bottom now my whole team here

Twelve posts deep. Cache flushes, MIPS trampolines, XOR verification, UART monitoring, failsafe timing. This post covers the last mile—once the kernel is booted and failsafe is confirmed, how do we talk to the device over the network and flash the permanent sysupgrade image? No more PRACC, no more cache paranoia. Just TCP/IP and a couple of file transfers.

configure_nic(): Setting Up the Host Side

The MR18 in failsafe comes up at 192.168.1.1. Our host needs to be on the same subnet. That’s the whole function:

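import shlex
import subprocess

HOST_NIC = "enx6c1ff71fee83"    # USB Ethernet adapter on the host
HOST_IP = "192.168.1.2/24"      # same /24 as the MR18's failsafe address

def configure_nic():
    cmds = [
        f"ip addr flush dev {HOST_NIC}",           # drop leftover addresses
        f"ip addr add {HOST_IP} dev {HOST_NIC}",   # assign the host address
        f"ip link set {HOST_NIC} up",              # bring the link up
    ]
    for c in cmds:
        subprocess.run(shlex.split(c), check=True)  # raise on any non-zero exit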

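HOST_NIC is enx6c1ff71fee83, my specific USB Ethernet adapter. HOST_IP is 192.168.1.2/24. Flush existing addresses, add ours, bring the interface up. The flush matters because Linux happily stacks multiple addresses on one interface: whatever was left over from a previous session or a DHCP lease stays put next to ours, and the kernel may pick one of those as the source address. Packets go to the shadow realm. Ask me how many times I debugged that. The flush also keeps re-runs clean: re-adding the exact same address doesn't stack, it fails with EEXIST, and check=True turns that into a crash.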
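If you want the script to fail loudly when someone edits HOST_IP onto the wrong subnet, Python's standard ipaddress module can check it before any packets fly. A minimal sketch; check_host_address() is a hypothetical helper, not something mr18_flash.py is known to contain:

```python
import ipaddress

HOST_IP = "192.168.1.2/24"   # host address with prefix, as passed to `ip addr add`
DEVICE_IP = "192.168.1.1"    # MR18 failsafe address

def check_host_address(host_cidr: str, device_ip: str) -> bool:
    """Hypothetical helper: True if host_cidr shares device_ip's subnet
    without reusing the device's own address."""
    host = ipaddress.ip_interface(host_cidr)   # e.g. 192.168.1.2 on 192.168.1.0/24
    device = ipaddress.ip_address(device_ip)
    return device in host.network and host.ip != device

if __name__ == "__main__":
    print(check_host_address(HOST_IP, DEVICE_IP))           # True
    print(check_host_address("192.168.0.2/24", DEVICE_IP))  # False: wrong subnet
```

The second condition catches the nastier mistake: giving the host 192.168.1.1 itself, which turns every probe into a conversation with your own NIC.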

wait_for_openwrt(): Polling Until Something Answers

This function blocks until the MR18 is reachable, using three detection methods because no single one works in every boot stage:

import time

def wait_for_openwrt(timeout=720):
    target = "192.168.1.1"
    start = time.time()                           # reference point for the timeout
    while time.time() - start < timeout:
        arp_up  = _nmap_arp_alive(target)        # L2: ARP scan
        icmp_up = _icmp_alive(target)             # L3: ICMP ping
        tcp_up  = (_tcp_alive(target, 23) or
                   _tcp_alive(target, 80))        # L4: telnet or LuCI

ARP scan (nmap -sn -PR) detects the device as soon as its interface is up with 192.168.1.1 assigned, well before telnetd or the web server is listening. The TCP probe hits port 23 (telnet/failsafe) and port 80 (LuCI/normal boot). Each pass prints a status line, and the function returns True once the device answers:

[*] ARP: up | ICMP: -- | TCP: --
[*] ARP: up | ICMP: up | TCP: 23/open
[+] OpenWrt is reachable!
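The post doesn't show _tcp_alive(), but an L4 probe like it can be as small as a connect-and-close with a short timeout. A sketch under that assumption; tcp_alive() and its 1-second default are hypothetical:

```python
import socket

def tcp_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical L4 probe: True if a TCP handshake to host:port completes."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # ConnectionRefusedError, socket.timeout, ... all subclass OSError
        return False
```

Called the same way as in the loop above: tcp_alive(target, 23) or tcp_alive(target, 80). A refused connection comes back fast, so the only slow case is a host that doesn't answer at all, which is capped by the timeout.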

The clever part: remember the _failsafe_active event from the UART thread? If that’s set, the serial console has seen the failsafe shell prompt—the device IS running. So if we hit the primary timeout but UART says it’s alive, we extend by 120 seconds:

        if _failsafe_active.is_set():
            print("[*] UART confirms device alive, extending timeout 120s")
            timeout = time.time() - start + 120

Throwing away a whole flash attempt because Ethernet is slow, while UART proves the device is alive, would be stupid. If we time out and _failsafe_active is NOT set, something is genuinely wrong and we bail with manual flash instructions.
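The extend-once rule fits in a small self-contained polling loop. This is a sketch, not the script's implementation: probe stands in for the ARP/ICMP/TCP checks, alive stands in for _failsafe_active (a threading.Event), and the 2-second poll interval is an assumed value. It uses time.monotonic() so a wall-clock jump can't shorten or stretch the wait, and the extension fires at most once, so a device stuck in failsafe can't keep the loop alive forever:

```python
import threading
import time
from typing import Callable

def poll_with_extension(probe: Callable[[], bool], timeout: float,
                        alive: threading.Event, extension: float = 120.0,
                        interval: float = 2.0) -> bool:
    """Hypothetical helper: poll probe() until it succeeds or time runs out.

    If the deadline passes while `alive` is set, extend it once by `extension`.
    """
    deadline = time.monotonic() + timeout
    extended = False
    while True:
        if probe():
            return True
        now = time.monotonic()
        if now >= deadline:
            if alive.is_set() and not extended:
                print(f"[*] UART confirms device alive, extending timeout {extension:.0f}s")
                deadline = now + extension
                extended = True
            else:
                return False   # no UART evidence, or already extended once
        time.sleep(interval)
```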

do_sysupgrade_telnet(): The Raw Fallback

Raw telnet to port 23, firmware piped over netcat. No SSH, no auth: failsafe doesn't have any of that. The recv_until() helper handles telnet IAC (0xFF) option negotiations inline. When the remote sends DO, we respond WONT; when it sends WILL, we respond DONT. Reject everything. BusyBox telnetd doesn't care; it works fine with zero negotiated options. I wrote this instead of using telnetlib because that module is deprecated (since Python 3.11, and removed outright in 3.13), and I didn't want a third-party dependency for 20 lines of socket code.
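The refuse-everything negotiation is only a few lines once the byte values are pinned down (RFC 854: IAC=255, DONT=254, DO=253, WONT=252, WILL=251). A minimal sketch of that filtering step, not the script's recv_until(); strip_iac() is hypothetical, and it assumes each chunk holds whole IAC sequences and that the server never starts subnegotiation (SB), which holds when every option is refused:

```python
# Telnet command bytes from RFC 854
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251

def strip_iac(chunk: bytes) -> tuple[bytes, bytes]:
    """Hypothetical helper: split a received chunk into (payload, reply).

    DO x   -> reply IAC WONT x  (we won't enable option x)
    WILL x -> reply IAC DONT x  (please don't enable option x)
    DONT/WONT need no reply; IAC IAC is an escaped literal 0xFF byte;
    any other two-byte command (NOP, GA, ...) is dropped.
    """
    data, reply = bytearray(), bytearray()
    i = 0
    while i < len(chunk):
        if chunk[i] != IAC:
            data.append(chunk[i])
            i += 1
            continue
        if i + 1 >= len(chunk):             # lone trailing IAC: assumed not to happen
            break
        cmd = chunk[i + 1]
        if cmd == IAC:                      # escaped data byte 0xFF
            data.append(IAC)
            i += 2
        elif cmd in (DO, DONT, WILL, WONT):
            if i + 2 >= len(chunk):         # option byte cut off: assumed not to happen
                break
            opt = chunk[i + 2]
            if cmd == DO:
                reply += bytes([IAC, WONT, opt])
            elif cmd == WILL:
                reply += bytes([IAC, DONT, opt])
            i += 3
        else:                               # other 2-byte command: ignore
            i += 2
    return bytes(data), bytes(reply)
```

In a read loop, the reply bytes go straight back with sock.sendall(reply) before the payload is searched for the prompt.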

The tsh() helper sends a command and waits for #:

def tsh(sock, cmd):
    sock.sendall((cmd + "\n").encode())  # send shell command as bytes, add newline to execute it
    return recv_until(sock, b"#")        # read back all output until we see the shell prompt again
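recv_until() itself isn't shown either. The core read-until-marker loop is roughly this; read_until() is a hypothetical stand-in with the IAC filtering left out, and it returns whatever it has if the peer closes first, which is what a rebooting device looks like from the socket's side:

```python
import socket

def read_until(sock: socket.socket, marker: bytes, bufsize: int = 4096) -> bytes:
    """Hypothetical helper: read until `marker` shows up or the peer closes."""
    buf = bytearray()
    while marker not in buf:
        chunk = sock.recv(bufsize)   # blocks; honours any timeout set on sock
        if not chunk:                # empty read: the remote closed the connection
            break
        buf += chunk
    return bytes(buf)
```

Matching on a bare "#" is crude (command output can contain one), but it's the same pragmatic trade the post makes with tsh().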

The actual sysupgrade is dead simple—start a netcat listener on the MR18, pipe the image from the host, run sysupgrade -n (no config preserve, fresh install):

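import socket
import subprocess
import time

def do_sysupgrade_telnet():
    sock = socket.create_connection(("192.168.1.1", 23), timeout=10)
    recv_until(sock, b"#")                                # swallow the banner up to the first prompt
    tsh(sock, "nc -l -p 9000 > /tmp/sysupgrade.bin &")    # listener on the MR18, backgrounded
    time.sleep(1)                                         # give the listener a moment to bind
    with open(SYSUPGRADE, "rb") as img:                   # host side: stream the image, no shell quoting
        subprocess.run(["nc", "-w", "30", "192.168.1.1", "9000"],
                       stdin=img, check=True, timeout=120)
    tsh(sock, "sysupgrade -n /tmp/sysupgrade.bin")        # -n: fresh install, no config kept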

Janky as hell. But it works when SSH doesn’t, and that’s the whole point of a fallback.
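The host half of that transfer doesn't strictly need nc, and netcat flavours disagree on flags and on whether they close the connection at end of input (OpenBSD nc wants -N, traditional nc wants -q). A sketch of a pure-Python sender, assuming the MR18-side listener from the function above is already running; send_file() is a hypothetical helper, not part of mr18_flash.py:

```python
import socket

def send_file(host: str, port: int, path: str, timeout: float = 30.0) -> int:
    """Hypothetical helper: stream the file at `path` to host:port.

    Returns the number of bytes sent.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            open(path, "rb") as f:
        sent = sock.sendfile(f)          # uses os.sendfile where available
        sock.shutdown(socket.SHUT_WR)    # explicit EOF so the listener's nc exits
    return sent
```

The explicit shutdown is the point: the device-side nc only finishes writing /tmp/sysupgrade.bin once it sees EOF.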

do_sysupgrade(): Primary Path

The primary method uses SCP and SSH like a civilized person:

import subprocess

def do_sysupgrade():
    ssh_opts = ["-o", "StrictHostKeyChecking=no", "-o", "UserKnownHostsFile=/dev/null"]
    target = "root@192.168.1.1"
    try:
        subprocess.run(["scp", *ssh_opts, SYSUPGRADE, f"{target}:/tmp/sysupgrade.bin"],
                       check=True, timeout=120)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        print("[!] SCP failed, falling back to telnet method")
        do_sysupgrade_telnet()
        return
    # sysupgrade reboots the device and drops the SSH session, so ssh usually
    # exits non-zero even when the flash worked; don't treat that as failure
    subprocess.run(["ssh", *ssh_opts, target, "sysupgrade -n /tmp/sysupgrade.bin"],
                   timeout=300)

StrictHostKeyChecking=no and UserKnownHostsFile=/dev/null because every initramfs boot generates new host keys. If I let SSH verify, it screams “REMOTE HOST IDENTIFICATION HAS CHANGED!!!!” every damn time. Yeah no shit, I just reflashed the thing. If SCP fails—which it does in failsafe mode since Dropbear isn’t running—we fall through to the telnet method.

main(): The Final Flow

After load_and_run() returns and the kernel is booting, here’s the endgame:

    _lout = {}
    ok = load_and_run(ocd, _lout)
    configure_nic()
    tn.close()                        # done with JTAG forever

    if not wait_for_openwrt(300):
        if _failsafe_active.is_set():
            if not wait_for_openwrt(120):
                print("[!] Network unreachable. Manual flash required.")
                return
        else:
            print("[!] Device not responding. Check wiring.")
            return

    do_sysupgrade()

Configure the NIC, close JTAG (the CPU is running Linux now), wait for network, flash. The finally block guarantees cleanup:
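The nested ifs in that excerpt reduce to three outcomes. Pulling them into a pure function makes each branch testable without hardware on the bench. This is a hypothetical refactor, not the script's code: wait stands in for wait_for_openwrt and failsafe_seen for _failsafe_active.is_set:

```python
from typing import Callable

def reach_device(wait: Callable[[int], bool],
                 failsafe_seen: Callable[[], bool]) -> str:
    """Hypothetical refactor of main()'s decision tree.

    Returns "flash", "manual" (UART alive, network dead), or "wiring".
    """
    if wait(300):
        return "flash"
    if not failsafe_seen():
        return "wiring"                      # no network and no failsafe prompt
    return "flash" if wait(120) else "manual"
```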

    finally:
        if "_repl" in _lout:
            _lout["_repl"].kill()     # triggers PSU safe state

No matter what—success, failure, exception, Ctrl-C—we kill the REPL thread. That triggers the safe state handler that powers down the bench supply. I learned this the hard way when an unhandled exception left the PSU on overnight and I came back to a very warm access point sitting on my desk.
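The same guarantee can be packaged as a context manager, so any future entry point gets the safe state for free. A sketch only: power_off is a hypothetical callback standing in for whatever _repl.kill() ultimately triggers:

```python
from contextlib import contextmanager
from typing import Callable, Iterator

@contextmanager
def safe_state(power_off: Callable[[], None]) -> Iterator[None]:
    """Hypothetical wrapper: run the body, then call power_off() however it exits."""
    try:
        yield
    finally:
        power_off()   # runs on normal exit, exceptions, and KeyboardInterrupt (Ctrl-C)
```

Usage would look like with safe_state(psu_off): run_flash(), with both names hypothetical. Like any finally block, it can't help against SIGKILL or the host itself losing power.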

That’s It

From the EJTAG timing attack to cache coherency to hand-encoded MIPS to UART failsafe detection to network polling to sysupgrade. The whole mr18_flash.py pipeline, 1300 lines of Python, one command: sudo python3 mr18_flash.py. Walk away, come back in 5 minutes, OpenWrt is installed.

1300 lines of Python for a $15 access point. It works though.