Five approaches to pressing a button (and a ghost from Cisco)

I know you’re somewhere out there, somewhere far away

Okay so the kernel booted. Like actually booted. I saw Linux version 6.6.73 scroll across the UART console. The CPU was running, the scheduler was active, idle task spinning. I set up my host Ethernet at 192.168.1.2/24, pointed my browser at 192.168.1.1, and… nothing. Ping? Nothing. Telnet? Nothing. ARP? Nothing.

But I was getting SOMETHING on the wire. I fired up tcpdump on my host and saw Meraki management frames with ethertype 0x0642—Cisco’s proprietary cloud registration protocol—and DHCP discover packets from a Meraki client. This was not OpenWrt. My OpenWrt kernel was running, but the device was behaving like a Cisco AP.
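If you want to watch for these frames without tcpdump, here is a minimal Python sketch for a Linux host. It assumes a raw `AF_PACKET` socket (Linux-only, needs root or CAP_NET_RAW) and only matches the EtherType 0x0642 mentioned above. `ethertype`, `sniff_meraki`, and the interface name are placeholders, not part of the original tooling.

```python
import socket
import struct

MERAKI_ETHERTYPE = 0x0642  # value observed on the wire in this post
ETH_P_ALL = 0x0003         # Linux: ask an AF_PACKET socket for every protocol


def ethertype(frame: bytes) -> int:
    """EtherType field of an untagged Ethernet II frame (bytes 12-13)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (etype,) = struct.unpack("!H", frame[12:14])
    return etype


def sniff_meraki(ifname: str, count: int = 10) -> None:
    """Print the source MAC of the first `count` frames with the Meraki EtherType.

    Linux-only (AF_PACKET); needs root or CAP_NET_RAW.
    """
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    try:
        sock.bind((ifname, 0))
        seen = 0
        while seen < count:
            frame = sock.recv(65535)
            if len(frame) >= 14 and ethertype(frame) == MERAKI_ETHERTYPE:
                print("Meraki frame from", frame[6:12].hex(":"))
                seen += 1
    finally:
        sock.close()
```

Running `sniff_meraki("eth0")` as root (with your own host NIC name) prints one line per matching frame. The parser deliberately ignores 802.1Q tags to keep the sketch short.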

The Ghost in the NAND

Here’s what happened: OpenWrt’s preinit sequence includes a step where it searches for an overlay filesystem to mount on top of the initramfs root. This is normally a good thing—it’s how OpenWrt persists configuration across reboots. But the MR18’s NAND flash still contains the original Cisco/Meraki filesystem. OpenWrt’s preinit found it, said “oh nice, a valid overlay,” and mounted it.

Once the Meraki filesystem was overlaid, Cisco’s /etc/init.d/, /usr/bin/, and all their service configs shadowed our initramfs versions. OpenWrt’s init proceeded but was now running Meraki daemons—cloud management, DHCP client for cloud connectivity, the whole phone-home stack. My OpenWrt kernel was just a vehicle for Cisco’s userspace.

The fix is failsafe mode. When OpenWrt boots into failsafe, it skips the overlay mount entirely. No NAND filesystem, just the initramfs root built into the kernel image. Static IP 192.168.1.1, telnet on port 23, no authentication. Clean environment, no Cisco ghosts.

The problem is getting INTO failsafe mode. And oh boy did I try.

Approach 1: GPIO Manipulation via JTAG (Bugs 16-20)

OpenWrt’s preinit polls GPIO17 (the reset button pin) to check if someone is holding it down. If GPIO17 reads LOW for long enough, failsafe mode activates. My brilliant plan: halt the CPU via JTAG, write the GPIO registers to drive GPIO17 LOW, resume the CPU, repeat in a loop through the preinit window.

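halt
set oe [read_memory 0xb8040000 32 1]       ;# GPIO_OE: on this GPIO block a 0 bit means output
mww 0xb8040000 [expr {$oe & ~(1 << 17)}]   ;# make GPIO17 an output
mww 0xb8040010 0x00020000                  ;# GPIO_CLEAR: drive GPIO17 LOW
resume
sleep 1500                                 ;# OpenOCD's sleep takes milliseconds
# repeat for 25 seconds...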
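The post doesn't show how the loop itself was scripted. One way to drive it from Python is sketched here, under assumptions: OpenOCD is running with its Tcl RPC server on the default port 6666, where each command and each reply is terminated by the byte 0x1a; the GPIO addresses are the ones in the OpenOCD snippet; and on this GPIO block a cleared OE bit means "output". `OpenOCDTcl`, `hammer_commands`, and `hammer` are hypothetical names.

```python
import socket
import time

TERMINATOR = b"\x1a"          # OpenOCD Tcl RPC: every command and reply ends with 0x1a

GPIO_OE = 0xB8040000          # assumption: a 0 bit here makes the pin an output
GPIO_CLEAR = 0xB8040010       # writing a 1 bit drives that pin LOW
GPIO17 = 1 << 17


class OpenOCDTcl:
    """Tiny client for OpenOCD's Tcl RPC server (default port 6666)."""

    def __init__(self, host="127.0.0.1", port=6666):
        self.sock = socket.create_connection((host, port))

    def cmd(self, line: str) -> str:
        self.sock.sendall(line.encode() + TERMINATOR)
        reply = b""
        while not reply.endswith(TERMINATOR):
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("OpenOCD closed the Tcl RPC connection")
            reply += chunk
        return reply[:-1].decode(errors="replace")


def hammer_commands() -> list:
    """One halt / write / resume iteration, mirroring the OpenOCD snippet."""
    return [
        "halt",
        f"mww 0x{GPIO_OE:08x} [expr {{[read_memory 0x{GPIO_OE:08x} 32 1] & ~0x{GPIO17:x}}}]",
        f"mww 0x{GPIO_CLEAR:08x} 0x{GPIO17:08x}",
        "resume",
    ]


def hammer(ocd: OpenOCDTcl, duration_s=25.0, period_s=1.5) -> None:
    """Repeat the iteration until duration_s has elapsed."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for line in hammer_commands():
            ocd.cmd(line)  # each call waits for OpenOCD's reply before sending the next
        time.sleep(period_s)
```

`hammer(OpenOCDTcl())` would replay the loop for 25 seconds. Waiting for each reply at least serializes halt, write, and resume. But as the rest of this section explains, no amount of scripting wins against the hardware here.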

This didn’t work for approximately five different reasons, each more infuriating than the last.

Bug 16 — Wrong timing: I started the GPIO writes too late. The preinit failsafe check window had already closed by the time my script got around to toggling GPIO17. The kernel had already decided “no failsafe, mount the overlay” and moved on.

Bug 17 — Missing resume: I halted the CPU to write the GPIO registers, wrote them, and then… forgot to resume. The CPU was still halted. The kernel couldn’t advance to preinit because the CPU was stuck in debug mode. This one was embarrassing. In my defense I was debugging at like 3am.

Bug 18 — Silent failures: OpenOCD’s mdw and mww commands (memory read/write) require the target to be halted. If the CPU is running, the access simply doesn’t happen, and nothing in my script surfaced it: no error, no warning, no write. So half my GPIO writes weren’t even happening because I’d already resumed the CPU from a previous iteration.

Bug 19 — External pull-ups: Even when the timing was right, the resume was there, and the writes were actually happening… GPIO17 still read HIGH. The MR18 board has pull-up resistors on the GPIO17 line to keep the reset button reading “not pressed” by default. The QCA9557’s GPIO output driver couldn’t sink enough current to pull the voltage below the logic LOW threshold. The pull-ups won.

Bug 20 — The kill shot: It wasn’t just passive pull-ups. The MR18 has a dedicated reset supervisor IC—an active CMOS chip that drives the GPIO17 line HIGH with a push-pull output capable of sourcing 10 to 50 milliamps. The SoC’s GPIO can barely manage 2-4 mA. I was trying to have a small dog pull against a truck. The reset supervisor always wins. No amount of register manipulation from software can overcome a dedicated IC that owns that signal line.

Five bugs. Five different reasons the same approach failed. The GPIO method was fundamentally broken on this hardware, and I probably should have realized that after bug 19 but I’m stubborn and a completionist.

Approach 2: Manual Button Press (Bug 21)

Okay fine, can’t drive GPIO17 from software. What about physically pressing the reset button? That’s literally what the button is for. My script could prompt me to press it during the preinit window.

Problem: by the time the script finished the JTAG hammer loop (which was eating up time trying to write GPIO registers), printed the prompt to my terminal, and I reacted… the failsafe window on the device had closed 10 seconds earlier. The timing margin for a human to react was negative. Not “difficult,” but literally, mathematically impossible given the sequence of events.

Approach 3: ESP-Prog EN Pin (Bugs 22-23)

Remember the ESP-Prog’s UART connector? It has an EN pin that’s driven by the FT2232H’s RTS line through an NPN transistor—the same auto-reset circuit used by esptool.py for flashing ESP32 boards. If I wire this EN pin to the MR18’s reset button pad, I can simulate a button press from software:

ser.rts = True   # NPN conducts -> EN pulled LOW -> GPIO17 LOW (button "pressed")
ser.rts = False  # NPN off      -> EN released   -> pull-up wins -> GPIO17 HIGH

This is actually electrically sound—the NPN transistor can sink way more current than the reset supervisor sources, because we’re connecting to the button pad (downstream of the supervisor). Two issues though:

Bug 22 — Resistor on the wrong side: I put a 100 ohm series resistor between the NPN collector and the GPIO17 net as “contention protection.” The resistor was on the wrong side of the signal path. The voltage drop happened between the NPN and GPIO17, not between the supervisor and GPIO17, so GPIO17 stayed at 3.3V. Removed the resistor, wired directly. Fixed.

Bug 23 — Timing, again: I asserted EN at t=12 seconds after launching the lzma-loader. The lzma-loader takes about 13 seconds just to decompress the kernel. I was asserting the “button press” before the kernel had even started, let alone before preinit opened its failsafe check window. EN was asserted during decompression and released before preinit even ran.

Fix: assert EN at t=2 seconds (early enough to not miss anything) and hold it LOW for 40 seconds (long enough to blanket the entire range from decompression through kernel init through preinit and beyond). Brute force timing—no matter when the window opens, EN is already asserted.
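Here is a minimal sketch of that brute-force window using pyserial's `rts`/`dtr` properties. Assumption: the ESP-Prog uses the usual esptool-style two-transistor auto-reset circuit, in which EN is pulled LOW only while RTS is asserted and DTR is not. pyserial normally asserts both lines when it opens a port, so the sketch deasserts DTR first. `hold_en_low` and its parameters are hypothetical names.

```python
import threading
import time


def hold_en_low(ser, start_s=2.0, hold_s=40.0, t0=None, stop=None):
    """Hold the ESP-Prog EN line LOW from t0+start_s to t0+start_s+hold_s.

    `ser` is an open pyserial Serial (anything with writable `rts`/`dtr`).
    `t0` is a time.monotonic() timestamp, e.g. when the lzma-loader was launched.
    """
    if t0 is None:
        t0 = time.monotonic()
    if stop is None:
        stop = threading.Event()

    # Wait for the start time; bail out without touching the lines if stopped early.
    if stop.wait(max(0.0, t0 + start_s - time.monotonic())):
        return

    # Assumption: esptool-style auto-reset circuit, where EN goes LOW only
    # with RTS asserted and DTR deasserted.
    ser.dtr = False
    ser.rts = True       # NPN conducts -> EN LOW -> reset button "pressed"
    try:
        # Hold for the whole window (or until someone sets `stop`).
        stop.wait(max(0.0, t0 + start_s + hold_s - time.monotonic()))
    finally:
        ser.rts = False  # NPN off -> EN released -> pull-up returns GPIO17 HIGH
```

In use, you would capture `t0 = time.monotonic()` right when the lzma-loader starts, then run `hold_en_low(ser, t0=t0)` in a daemon `threading.Thread` so the console reader keeps running alongside it. Holding the line for a fixed 40 seconds trades precision for coverage, which is the point of the brute-force approach.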

Approach 4: Just Send the Letter ‘f’ (The Correct Answer)

After all of that (eight bugs, three approaches, GPIO register wars, hardware limitations, timing nightmares), the actual working solution was hilariously simple.

OpenWrt’s preinit doesn’t only check the hardware reset button. It also prints this on the serial console:

Press the [f] key and hit [enter] to enter failsafe mode

And then it waits for input on the console. If it sees f\n, failsafe mode activates, and that console is the same serial connection I was already using for output.

I set up a background thread in my Python script that watches every line of UART output. When it sees the failsafe prompt, it immediately sends f\n. Self-synchronizing—no timing guesses needed. The script watches for the prompt, then responds. Works every time.
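A minimal sketch of such a watcher follows. It assumes a pyserial `Serial` opened with a read timeout, so `readline()` returns periodically instead of blocking forever, and it matches the exact prompt text quoted above. `watch_console` and its arguments are hypothetical names, not the author's actual script.

```python
import re
import threading

# Prompt printed by OpenWrt's preinit, as quoted in the post.
FAILSAFE_PROMPT = re.compile(rb"Press the \[f\] key and hit \[enter\] to enter failsafe mode")


def watch_console(ser, on_line=print, stop=None):
    """Echo UART output and answer the failsafe prompt with 'f' + Enter, once.

    Returns True if the prompt was answered before `stop` was set.
    """
    if stop is None:
        stop = threading.Event()
    answered = False
    while not stop.is_set():
        line = ser.readline()          # b"" when the port's read timeout expires
        if not line:
            continue
        on_line(line.decode(errors="replace").rstrip())
        if not answered and FAILSAFE_PROMPT.search(line):
            ser.write(b"f\n")          # event-driven: respond the moment the prompt shows up
            ser.flush()
            answered = True
    return answered
```

Running it in a daemon thread (`threading.Thread(target=watch_console, args=(ser,), daemon=True)`) keeps the console visible while it waits. Because the reply is keyed to the prompt itself, decompression time and boot jitter stop mattering.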

Belt and Suspenders

The final implementation uses BOTH: the UART f key (primary, event-driven, reliable) and the EN pin held LOW for 40 seconds (backup, brute force, catches edge cases). A background thread handles everything:

  1. At t=2s after launching the lzma-loader: assert EN (GPIO17 LOW via NPN transistor)
  2. Continuously read UART console output, print it for monitoring
  3. When “Press the [f] key” appears: send f\n immediately
  4. When the failsafe shell prompt appears (/#): send a watchdog kicker, configure eth0, start telnetd
  5. At t=42s: release EN

The watchdog kicker is important—the QCA9557 has a hardware watchdog that fires at ~90 seconds if nothing feeds it. Initramfs failsafe mode doesn’t always start the watchdog feeder. Without it the device just reboots in the middle of the sysupgrade transfer and you’re back to square one.
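Step 4 can be sketched as a short list of shell lines written to the UART once the `/#` prompt appears. This rests on a few assumptions. The watchdog is exposed as `/dev/watchdog` and treats any write as a keepalive, which is the standard Linux watchdog device interface. Nothing else is holding that device open. The 5-second feed interval is an arbitrary choice well inside the ~90 s timeout. The network and telnetd lines are the ones the post uses. `FAILSAFE_SETUP` and `run_failsafe_setup` are hypothetical names.

```python
import time

# Lines typed into the failsafe shell once "/#" appears (step 4).
# Assumption: /dev/watchdog is free and any write to it counts as a keepalive.
# Writing "1" (not the magic "V") means closing the device never disarms it.
FAILSAFE_SETUP = [
    "while true; do echo 1 > /dev/watchdog; sleep 5; done &",  # watchdog kicker
    "ifconfig eth0 192.168.1.1 netmask 255.255.255.0 up",
    "telnetd -l /bin/sh &",
]


def run_failsafe_setup(ser, commands=FAILSAFE_SETUP, gap_s=0.3):
    """Write each command plus Enter to the UART, pausing so the shell keeps up."""
    for cmd in commands:
        ser.write(cmd.encode() + b"\n")
        ser.flush()
        time.sleep(gap_s)
```

The kicker goes first on purpose: everything after it, including a long sysupgrade transfer, then runs with the watchdog already being fed.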

After the failsafe shell comes up:

BusyBox v1.36.1 (OpenWrt) built-in shell
 --- failsafe ---
================= FAILSAFE MODE active =================
/#

The script configures the network (ifconfig eth0 192.168.1.1 netmask 255.255.255.0 up) and starts telnetd (telnetd -l /bin/sh &). At this point… wait. I haven’t mentioned the Ethernet problem yet.

Ethernet: TX Works, RX Doesn’t

Failsafe mode is up. I ping 192.168.1.1. No response. But tcpdump on my host shows ARP requests coming FROM the MR18. It can send packets, but it can’t receive them. The rx_packets counter on eth0 stays at zero while tx_packets climbs normally. Every incoming frame triggers FCS (Frame Check Sequence) errors.
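You can check that symptom from the host over the same UART using the standard per-interface counters in `/sys/class/net/eth0/statistics/`. Here is a minimal sketch. Which counter the Ethernet driver actually increments for FCS failures is an assumption, so the sketch reads both `rx_errors` and `rx_crc_errors`. `STATS_CMD`, `parse_stats`, and `looks_like_rx_fcs_problem` are hypothetical names.

```python
import re

STAT_NAMES = ("rx_packets", "tx_packets", "rx_errors", "rx_crc_errors")

# Shell line to type into the device; prints one "name value" line per counter.
STATS_CMD = (
    "for s in " + " ".join(STAT_NAMES) + "; do "
    "echo $s $(cat /sys/class/net/eth0/statistics/$s); done"
)

_STAT_LINE = re.compile(r"^(%s) (\d+)\s*$" % "|".join(STAT_NAMES), re.MULTILINE)


def parse_stats(console_text: str) -> dict:
    """Extract 'name value' lines from captured console output."""
    return {name: int(value) for name, value in _STAT_LINE.findall(console_text)}


def looks_like_rx_fcs_problem(stats: dict) -> bool:
    """TX counting up, RX stuck at zero, receive errors piling up."""
    rx_errs = max(stats.get("rx_errors", 0), stats.get("rx_crc_errors", 0))
    return stats.get("tx_packets", 0) > 0 and stats.get("rx_packets", 0) == 0 and rx_errs > 0
```

The regex anchors on the start of a line, so the echoed command itself (which begins with the shell prompt) is never mistaken for a counter.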

This is the AR8035 PHY bug, and it’s the last boss of this entire project. But that’s for the next post, along with the 20-minute UART file transfer and the ending of this increasingly ridiculous saga.