9 - MR18 Deep Dive: The Timing Attack

Catching a CPU in the dark with a 1.5 second flashlight

I been up for a long time, I ain’t get no sleep for it

Post 3 gave the high-level overview of the timing attack—power on, wait 1.5 seconds, halt the CPU before Cisco’s kernel murders the JTAG interface. This post rips open mr18_flash.py and walks through every halt strategy and the main retry loop in detail. None of this is elegant. It’s the kind of code you write at 2am when you’ve already power-cycled an access point forty times and you’re starting to take it personally.

If you haven’t read posts 3 through 6 yet, the short version: the MR18’s JTAG interface goes dead about 2 seconds after power-on, once Cisco’s kernel takes over the pins. We have to halt the CPU inside that window or we lose our only way in. The script automates the entire race: power cycling, OpenOCD startup, and multiple halt strategies fired in rapid succession.

The Halt Helpers

There are five functions involved in stopping the CPU. They’re layered like increasingly desperate attempts to get someone’s attention.

try_halt_highlevel() is the polite approach. It sends OpenOCD’s halt command over the telnet interface and checks if the response contains "halted" or "debug mode". This works when OpenOCD has a clean connection and the TAP is behaving. When it doesn’t work, it returns False and we escalate.
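A minimal sketch of that check, not the actual code from mr18_flash.py. It assumes a hypothetical send_cmd callable that sends one command line to OpenOCD’s telnet interface and returns whatever text came back; how that callable gets built is left out here.

```python
from typing import Callable


def try_halt_highlevel(send_cmd: Callable[[str], str]) -> bool:
    """Ask OpenOCD to halt the target and report whether it says it did.

    send_cmd is a hypothetical helper (an assumption for this sketch):
    it sends one command to OpenOCD and returns the response text.
    """
    response = send_cmd("halt").lower()
    # The two substrings the post says the script looks for
    return "halted" in response or "debug mode" in response


# Canned stand-in for a real OpenOCD session, for illustration only
def fake_openocd(cmd: str) -> str:
    return "target halted" if cmd == "halt" else ""


print(try_halt_highlevel(fake_openocd))
```

Lower-casing the response before matching is my choice for the sketch; it just makes the substring test less fussy about capitalization.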

try_halt_ejtag() is the raw approach. Forget OpenOCD’s abstractions: this one talks directly to the EJTAG control register. It does an irscan to select the EJTAG control register (IR = 0x0a), then a drscan to write 0x9008 into it. That value sets three bits: PROBEN (enable processor access), JTAGBRK (request a debug break), and BIT3 (the spec says “just always set this,” real helpful documentation, guys). Then it polls the register up to 10 times, sleeping 50ms between reads, and checks for the BRKST bit to confirm the CPU actually stopped.

Why 0x9008 specifically? Let me break it down:

Bit 15 (PROBEN, 0x8000): enables the PRACC debug channel so we can actually do things once the CPU halts
Bit 12 (JTAGBRK, 0x1000): the actual “please stop” request to the CPU
Bit 3 (BIT3, 0x0008): the EJTAG spec says this must always be written as 1. No explanation given. Classic.
OR those together: 0x8000 | 0x1000 | 0x0008 = 0x9008
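The same arithmetic as a few lines of Python, in case you want to check it yourself. The bit names come from the breakdown above; ECR_HALT_REQUEST is just a label I made up for this sketch.

```python
PROBEN = 1 << 15   # 0x8000
JTAGBRK = 1 << 12  # 0x1000
BIT3 = 1 << 3      # 0x0008

# ECR_HALT_REQUEST is a made-up name for the combined value
ECR_HALT_REQUEST = PROBEN | JTAGBRK | BIT3

print(hex(ECR_HALT_REQUEST))  # 0x9008
```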

Here’s the fun part: the AR9344 sometimes clears PROBEN on its own. Just… decides it doesn’t want to be debugged anymore. The Nandloader might be reconfiguring the debug unit, or there’s some hardware-level reset happening—I never figured out exactly why. So try_halt_ejtag() checks for this on every poll iteration and re-asserts the full 0x9008 value if PROBEN drops. You’d think writing a register would be a one-and-done operation but no, you have to babysit it.
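Roughly what that babysitting looks like, as a sketch rather than the real function. read_ecr and write_ecr are hypothetical callables standing in for the irscan/drscan plumbing, and the BRKST mask is an assumption on my part (bit 3 here); check it against your EJTAG reference before trusting it.

```python
import time
from typing import Callable

PROBEN = 0x8000
ECR_HALT_REQUEST = 0x9008
BRKST = 0x0008  # ASSUMPTION: position of the break-status bit


def poll_for_break(
    read_ecr: Callable[[], int],          # hypothetical: returns the control register
    write_ecr: Callable[[int], None],     # hypothetical: writes the control register
    polls: int = 10,
    delay_s: float = 0.05,
    brkst_mask: int = BRKST,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Request a debug break, then poll until BRKST shows up or we run out of polls."""
    write_ecr(ECR_HALT_REQUEST)
    for _ in range(polls):
        value = read_ecr()
        if value & brkst_mask:
            return True
        if not (value & PROBEN):
            # PROBEN got cleared behind our back: re-assert the full value
            write_ecr(ECR_HALT_REQUEST)
        sleep(delay_s)
    return False
```

Passing sleep in as a parameter is only there so the loop can be exercised against a fake register without actually waiting 50ms per poll.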

init_tap() is the setup step. It runs jtag arp_init to scan the chain and ar9344.cpu arp_examine to make OpenOCD actually look at the target. Without this, OpenOCD knows a TAP exists but hasn’t bothered to figure out what state it’s in. Calling examine forces it to read the IDCODE and set up the debug interface. You’d think init would do this automatically but OpenOCD has opinions about when things should happen and those opinions are wrong.

try_halt_once() is the quick-and-dirty version. It sends halt, then wait_halt 300 (wait up to 300ms for the halt to take effect), and checks the response. This is for the tight inner loop where we’re firing halt attempts as fast as possible and don’t want to waste time on the full EJTAG song and dance unless the simple approach fails.

examine_and_halt() combines everything. It calls init_tap() first to make sure the TAP is scanned and examined, tries try_halt_highlevel(), and if that fails, falls back to try_halt_ejtag(). This is the “throw everything at the wall” function. It’s used during the initial connection phase before the tight halt loop takes over.
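The layering is easier to see stripped down to its control flow. This is a sketch of the fallback order only; the three arguments are hypothetical zero-argument callables standing in for init_tap(), try_halt_highlevel(), and try_halt_ejtag(), each already bound to its telnet connection.

```python
from typing import Callable


def examine_and_halt(
    init_tap: Callable[[], None],
    try_highlevel: Callable[[], bool],
    try_ejtag: Callable[[], bool],
) -> bool:
    """Scan and examine the TAP, then try the polite halt before the raw one."""
    init_tap()
    if try_highlevel():
        return True
    # Only reached when the high-level halt did not report success
    return try_ejtag()
```

The raw EJTAG path never runs if the polite halt already worked, which keeps the common case fast.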

Why so many functions for what’s conceptually one operation (“stop the CPU”)? Because JTAG is flaky as hell and different failure modes require different recovery strategies. Sometimes the TAP needs re-examination. Sometimes OpenOCD’s high-level command works fine. Sometimes you have to bypass OpenOCD’s target layer entirely and drive the EJTAG control register yourself with raw scans. Having all these layers means the script can adapt to whatever state the debug interface happens to be in at any given moment.

The main() Timing Attack Loop

This is the heart of the script. The entire point of mr18_flash.py is this loop. Everything else—the binary loading, the checksums, the cache flushes—only matters if this part succeeds.

import time

MAX_ATTEMPTS = 6

for attempt in range(1, MAX_ATTEMPTS + 1):
    print(f"\n--- Attempt {attempt}/{MAX_ATTEMPTS} ---")

    # 1. Kill any leftover OpenOCD
    kill_openocd()

    # 2. Full power cycle
    psu("psu chan off")
    time.sleep(2.5)          # caps discharge
    psu("psu chan on")
    time.sleep(1.5)          # Nandloader alive, JTAG window open

    # 3. Start OpenOCD with live TAP scan
    start_openocd()

    # 4. Connect telnet to OpenOCD
    tn = connect_telnet()
    if not tn:
        continue

    # 5. Tight halt loop -- 1 second, alternating strategies
    halted = False
    t0 = time.monotonic()    # monotonic clock: immune to wall-clock adjustments
    while time.monotonic() - t0 < 1.0:
        if try_halt_once(tn):
            halted = True
            break
        if try_halt_ejtag(tn):
            halted = True
            break
        time.sleep(0.02)       # 20ms between attempts

    if halted:
        print("HALTED!")
        break
    else:
        print("missed the window, retrying...")
        tn.close()

Six chances. That’s it. If we can’t catch the CPU in six power cycles, something is fundamentally wrong: bad wiring, dead TAP, or the ESP-Prog decided to take the day off. Six felt right. Enough retries to handle normal variance, not so many that you sit there for five minutes watching failures pile up before admitting defeat. I originally had it at 10 but realized that if it hasn’t worked in 6 tries, attempts 7 through 10 aren’t going to magically fix whatever’s wrong.

Let me walk through each step.

Kill OpenOCD: Any previous OpenOCD instance still running will hold the USB interface open. The FT2232H can only have one master. Kill it or the new instance won’t be able to claim the device.

Power off + 2.5 second sleep: The MR18’s board has decoupling capacitors that hold charge for a surprisingly long time. If you power cycle too fast, the SoC never fully resets. It just kind of browns out and comes back in some weird half-initialized state where the TAP doesn’t respond correctly. 2.5 seconds is enough for everything to drain to zero. I found this number by reducing it until things broke and then adding a margin. Science.

Power on + 1.5 second sleep: This is the sweet spot. After power is applied, the AR9344’s internal ROM boots, hands off to the Nandloader, and the Nandloader initializes DDR RAM and starts loading the Cisco kernel. At about 1.5 seconds, the Nandloader is alive, DDR is initialized, and—critically—the JTAG TAP is active and scannable. Wait too short and the TAP isn’t ready. Wait too long and the Cisco kernel has already reconfigured GPIO and killed TDO. I didn’t calculate 1.5 seconds from any datasheet. I tried 0.5, 1.0, 1.5, 2.0, and 2.5. Anything below 1.0 was too early (OpenOCD couldn’t scan the chain). Anything above 2.0 was too late (JTAG already dead). 1.5 was the Goldilocks number. Empirical. Not math, just “try it and see.”
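If you wanted to automate that sweep instead of doing it by hand like I did, it could look something like this. It’s a sketch under assumptions: attempt_with_delay is a hypothetical callable that runs one full power-cycle-and-halt attempt with the given delay and returns whether the halt landed, and the three-trials-per-delay default is made up. Delays are in integer milliseconds to keep floating point out of it.

```python
from typing import Callable, Dict, List, Optional


def sweep_delays(
    attempt_with_delay: Callable[[int], bool],  # hypothetical: one full attempt
    delays_ms: List[int],
    trials: int = 3,                            # assumption, not from the script
) -> Dict[int, int]:
    """Return {delay_ms: number of successful halts} for each candidate delay."""
    return {
        d: sum(1 for _ in range(trials) if attempt_with_delay(d))
        for d in delays_ms
    }


def pick_delay(results: Dict[int, int]) -> Optional[int]:
    """Pick the middle of the delays that worked at least once."""
    working = sorted(d for d, wins in results.items() if wins > 0)
    return working[len(working) // 2] if working else None


# Made-up stand-in for the hardware: pretend the window is 1.0 s to 2.0 s
def fake_attempt(delay_ms: int) -> bool:
    return 1000 <= delay_ms <= 2000


results = sweep_delays(fake_attempt, [500, 1000, 1500, 2000, 2500])
print(pick_delay(results))  # 1500
```

Picking the middle of the working range rather than the first value that works is the whole point: it leaves margin on both sides for run-to-run jitter.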

Start OpenOCD: This has to happen AFTER power-on, not before. I made this mistake early on—starting OpenOCD first seemed logical, have the debugger ready and waiting. But OpenOCD runs its init sequence immediately, tries to scan the chain while all the lines are floating, sees nothing, and enters an error state it never recovers from. Starting it after the device is alive means the first scan finds a real TAP.
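Since OpenOCD is launched inside the window, the telnet connection has to wait for its server to come up without burning more time than necessary. Here’s a sketch of that waiting step, not the script’s actual connect_telnet(). It assumes OpenOCD is left on its default telnet port (4444); the function name, timeouts, and the injectable connect parameter are all mine.

```python
import socket
import time
from typing import Callable

OPENOCD_TELNET_PORT = 4444  # OpenOCD's default telnet port


def wait_for_port(
    host: str = "127.0.0.1",
    port: int = OPENOCD_TELNET_PORT,
    timeout_s: float = 1.0,       # assumption: how long we can afford to wait
    interval_s: float = 0.02,     # assumption: retry spacing
    connect: Callable = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as something accepts a TCP connection on host:port."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Probe connection only; the real session is opened afterwards
            connect((host, port), timeout=0.2).close()
            return True
        except OSError:
            # Refused or timed out: the server is not listening yet
            sleep(interval_s)
    return False
```

A False return here maps onto the `if not tn: continue` branch in the loop: give up on this power cycle and go around again.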

The tight halt loop: One full second of alternating try_halt_once() and try_halt_ejtag() with 20ms sleeps between. At 20ms per iteration that’s a ceiling of roughly 50 passes per power-on window; the real count is lower, because each helper can block on its own timeout (up to 300ms for wait_halt, up to ten 50ms polls in the EJTAG path) before the loop comes back around. The alternation matters: sometimes the high-level halt works on the first try, sometimes it takes the raw EJTAG register write to actually stop the core. Throwing both at it maximizes the odds of catching the CPU in the ~0.5 seconds remaining in our window.
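To put rough bounds on how many passes that loop can really make, here’s a back-of-the-envelope sketch using only the timeouts quoted in this post. It ignores command round-trip time and treats the EJTAG path as ten full 50ms polls, so take both ends as approximations. Everything is in integer milliseconds.

```python
WINDOW_MS = 1000           # tight halt loop duration
LOOP_SLEEP_MS = 20         # sleep at the bottom of each pass
WAIT_HALT_MS = 300         # try_halt_once() timeout
EJTAG_POLLS = 10           # try_halt_ejtag() poll count
EJTAG_POLL_SLEEP_MS = 50   # sleep per poll


def passes(window_ms: int, per_pass_ms: int) -> int:
    """Count loop passes: a new pass starts whenever elapsed < window_ms."""
    count, elapsed = 0, 0
    while elapsed < window_ms:
        count += 1
        elapsed += per_pass_ms
    return count


# Best case: both helpers return instantly, only the 20ms sleep costs time
best = passes(WINDOW_MS, LOOP_SLEEP_MS)

# Worst case: both helpers run out their full timeouts on every pass
worst_pass_ms = WAIT_HALT_MS + EJTAG_POLLS * EJTAG_POLL_SLEEP_MS + LOOP_SLEEP_MS
worst = passes(WINDOW_MS, worst_pass_ms)

print(best, worst_pass_ms, worst)  # 50 820 2
```

So the true number sits somewhere between 2 and 50 passes per window, depending on how quickly OpenOCD reports each failure.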

The 20ms sleep: Without it, we’d spam the telnet connection faster than OpenOCD can process commands. The responses pile up, the socket buffer fills, and everything falls apart. 20ms gives OpenOCD enough time to actually execute each command and respond before we fire the next one.

“HALTED!”

Empirically, this works on the first or second power cycle about 90% of the time. Occasionally the third. I’ve never seen it take more than four. The failures are almost always USB latency—the host takes an extra 200ms to enumerate the FT2232H after OpenOCD starts, and that 200ms is the difference between catching the window and missing it.

That first HALTED! meant we caught the CPU mid-Nandloader. JTAG was ours and Cisco’s kernel never got a chance to ruin our day.

What Happens After the Halt

Once the CPU is stopped, main() calls load_and_run()—which handles the entire binary loading pipeline covered in posts 4 and 5: D-cache flush (evict Cisco’s dirty cache lines), load_image (6.9 MB over PRACC at ~97 KB/s), per-chunk CPU-executed XOR verification, rewrite any bad chunks, final checksum, and the launch trampoline that jumps to the lzma-loader entry point.

load_and_run() returns a dict (_out) containing references to the UART monitoring thread—the same thread that watches for the failsafe prompt and sends f\n at the right moment. The main function needs those references to coordinate everything that follows: triggering failsafe mode, setting up the network, and eventually flashing the sysupgrade image.

After load_and_run() returns, main() calls configure_nic() to set the host Ethernet interface to 192.168.1.2/24, closes the JTAG telnet connection (we’re done with debug mode forever at this point—the CPU is running Linux now, not sitting in a debug halt), and calls wait_for_openwrt() which blocks until the MR18 responds to pings on 192.168.1.1.
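For the curious, a blocking wait like that can be as simple as pinging in a loop. This is a sketch, not the script’s wait_for_openwrt(): the function names, the timeout (the post only says it blocks), and the one-second spacing are my assumptions, and ping_once assumes Linux iputils ping flags.

```python
import subprocess
import time
from typing import Callable


def ping_once(host: str) -> bool:
    """Send one ICMP echo with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_host(
    host: str = "192.168.1.1",
    timeout_s: float = 300.0,     # assumption: give up after five minutes
    interval_s: float = 1.0,      # assumption: spacing between pings
    ping: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until the host answers a ping, or until timeout_s runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ping(host):
            return True
        sleep(interval_s)
    return False
```

Adding a timeout means a board that never comes up produces a clear failure instead of a script that hangs forever.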

The whole flow from HALTED! to a pingable OpenWrt instance takes about 3-4 minutes. Most of that is the 70-second load_image transfer and the 90-second kernel boot. The timing attack itself—the hard part—is over in under 10 seconds if it works on the first try. All that effort, all those halt strategies, all that empirical timing work, for 10 seconds of actual execution. Worth it.
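Here’s the time budget added up from the numbers quoted in this post, as a quick sanity check. One assumption: I’m treating the units as decimal (1 MB = 1000 KB), which is a guess about how the transfer rate was reported.

```python
# Numbers quoted in this post
DISCHARGE_S = 2.5      # power-off sleep
BOOT_DELAY_S = 1.5     # power-on sleep
HALT_LOOP_S = 1.0      # tight halt loop, if it runs its full duration
IMAGE_MB = 6.9
RATE_KB_PER_S = 97
KERNEL_BOOT_S = 90

# ASSUMPTION: decimal units (1 MB = 1000 KB)
transfer_s = IMAGE_MB * 1000 / RATE_KB_PER_S
attack_sleeps_s = DISCHARGE_S + BOOT_DELAY_S + HALT_LOOP_S
accounted_s = attack_sleeps_s + transfer_s + KERNEL_BOOT_S

print(f"fixed sleeps in one attempt: {attack_sleeps_s:.1f} s")
print(f"load_image transfer: {transfer_s:.0f} s")
print(f"accounted for so far: {accounted_s / 60:.1f} min")
```

The transfer works out to about 71 seconds, and the itemized pieces cover a bit under 3 minutes. The rest of the 3-4 minutes is everything not itemized here, like OpenOCD startup, the cache flush, and chunk verification.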

The Empirical Nature of All of This

I want to be really clear about something: almost none of the timing values in this script came from datasheets or calculations. The 2.5 second discharge time? Tried shorter, it broke. The 1.5 second boot delay? Swept from 0.5 to 2.5 in 0.5 second increments. The 20ms halt loop sleep? Started at 100ms (too slow, missed windows), went to 10ms (overwhelmed OpenOCD), settled on 20ms. The 1-second halt window duration? Gut feeling that turned out to be right. The 300ms timeout in try_halt_once()? Long enough to not give up too early, short enough to get multiple attempts in before the window closes. Every single number was found the same way—try it, see what happens, adjust.

You could argue this is bad engineering. A “real” engineer would read the AR9344 technical reference manual, find the boot sequence timing diagram, calculate the exact window, and derive optimal values from first principles. And sure, maybe. But the TRM doesn’t document how long the Meraki Nandloader takes to initialize DDR. It doesn’t tell you when Cisco’s Linux kernel reconfigures TDO. Those are firmware-level behaviors that depend on what Meraki decided to put in their NAND image, and no datasheet covers that.

This is embedded systems debugging in its purest form. You have a hypothesis, you test it, you adjust. The AR9344 datasheet doesn’t have a section titled “how long after power-on until the JTAG TAP is scannable but the kernel hasn’t killed it yet.” That number only exists in the real world, on this specific board, with this specific firmware version. You find it by trying.

And honestly? That’s the part I love about this shit. The code isn’t clever. The timing values are just numbers I found by being stubborn. But it works—reliably, repeatedly, across dozens of flash cycles during development. Sometimes the best engineering is just being too stubborn to stop power-cycling.