6 - Installing OpenWrt on a Meraki MR18
The last mile: a PHY bug, a 20-minute serial transfer, and victory
We made it, yeah, we made it
Last post ended with failsafe mode running but Ethernet RX completely broken. TX works—the MR18 sends ARP requests just fine—but it can’t receive a single packet. Every incoming frame triggers FCS errors. rx_packets is zero. Ping doesn’t work because ARP replies from the host never reach the device.
This is the final boss.
RGMII RX Clock Delay: A 2 Nanosecond Problem
The QCA9557 SoC connects to the AR8035 Ethernet PHY over an RGMII bus—Reduced Gigabit Media Independent Interface. At 1 Gbps, RGMII clocks data on both the rising and falling edges of a 125 MHz clock. The spec requires a 2 nanosecond delay between the clock and data signals so the MAC (on the SoC) can sample the data at the right point in each cycle.
The AR8035 has an internal programmable delay line for this. On the RX path (PHY-to-MAC direction), this delay is controlled by bit 15 of debug register 0x00. When this bit is cleared—which is the default—the delay line is disabled. The RX clock arrives aligned with the data edges. The MAC samples at transition points and gets garbage.
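The timing argument can be checked with some quick arithmetic. This is only a back-of-the-envelope sketch using the nominal numbers quoted above; the variable names are mine and it ignores board trace skew and PHY tolerances.

```python
# Back-of-the-envelope RGMII timing at 1 Gbps (nominal values only).
CLOCK_HZ = 125_000_000          # RGMII clock at gigabit speed
period_ns = 1e9 / CLOCK_HZ      # one full clock cycle
bit_window_ns = period_ns / 2   # data changes on both clock edges (DDR)
delay_ns = 2.0                  # nominal internal delay quoted above

# With no delay the clock edge coincides with the data transition (offset 0).
# With the delay, the edge lands this far into the data-valid window:
offset_fraction = delay_ns / bit_window_ns

print(f"clock period: {period_ns:.1f} ns")
print(f"data window per edge: {bit_window_ns:.1f} ns")
print(f"sampling point with delay: {offset_fraction:.0%} into the window")
```

So the 2 ns delay moves the sampling edge from the transition itself to roughly the middle of each 4 ns data window, which is where the MAC wants it.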
Why isn’t this already configured? Because OpenWrt’s device tree for the MR18 says:
phy-mode = "rgmii";
When it should say:
phy-mode = "rgmii-rxid";
That -rxid suffix tells the at803x kernel PHY driver to enable the RX internal delay during initialization. Without it, the driver leaves everything at hardware defaults. One missing string in a device tree source file, and RX is completely broken.
The “correct” upstream fix is a one-line device tree change. But I can’t recompile the kernel right now—I’m in failsafe mode with no network, no package manager, and a read-only initramfs root. I need to fix this from the inside.
Bare-Metal MIPS: Writing a PHY Fix Without libc
The fix is to write MDIO register 0x1D with the debug register address (0x00), then write MDIO register 0x1E with the new value (bit 15 set). That’s two register writes. Simple, right? On a normal Linux system I’d use ethtool or phytool or even a quick Python script. But failsafe mode is minimal—there’s no ethtool, no Python, barely any tools at all.
So I wrote a standalone C program that uses raw Linux syscalls. No libc, no dynamic linking, nothing. Just sys_socket, sys_ioctl, sys_write, and sys_exit. The MIPS O32 ABI has some fun quirks—errors are signaled through the $a3 register being set to 1, not through a negative return value like on x86. And you HAVE to declare temporary registers as clobbers in your inline assembly or GCC will keep live values in $t0-$t9 across syscalls and the kernel will happily overwrite them.
The program:
- Creates an AF_INET socket (needed for the MDIO ioctl)
- Queries the PHY address via SIOCGMIIPHY (it’s 3 on this board)
- Reads PHY ID registers 0x02 and 0x03 to verify it’s actually an AR8035 (0x004d, 0xd072)
- Disables hibernation mode via debug register 0x0B (defensive, was a no-op on my board)
- Enables RGMII RX clock delay via debug register 0x00 bit 15 (the actual fix)
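The last step, the two-register debug access, can be modeled on the host without any hardware. The sketch below is not the C program; FakePhy, mdio_read, mdio_write and the starting register value are all made up for illustration, and the read-modify-write (rather than writing a fixed value) is my assumption about how to avoid clobbering the other bits.

```python
# Simulated MDIO access; FakePhy is a stand-in for the real ioctl path.
DEBUG_ADDR_REG = 0x1D   # MDIO register that selects a debug register
DEBUG_DATA_REG = 0x1E   # MDIO register that reads/writes the selected one
RX_DELAY_BIT = 1 << 15  # bit 15 of debug register 0x00


class FakePhy:
    """Toy model of the two-step debug register window (hypothetical)."""

    def __init__(self):
        self.debug = {0x00: 0x02EE}  # arbitrary starting value, bit 15 clear
        self.selected = 0

    def mdio_write(self, reg, value):
        if reg == DEBUG_ADDR_REG:
            self.selected = value
        elif reg == DEBUG_DATA_REG:
            self.debug[self.selected] = value & 0xFFFF

    def mdio_read(self, reg):
        if reg == DEBUG_DATA_REG:
            return self.debug.get(self.selected, 0)
        return 0


def enable_rx_delay(phy):
    """Select debug register 0x00, then set bit 15 and keep the other bits."""
    phy.mdio_write(DEBUG_ADDR_REG, 0x00)
    current = phy.mdio_read(DEBUG_DATA_REG)
    new_value = current | RX_DELAY_BIT
    phy.mdio_write(DEBUG_DATA_REG, new_value)
    return new_value


phy = FakePhy()
new_value = enable_rx_delay(phy)
print(hex(new_value))
```

On the real device the same sequence goes through the MII ioctls on the socket instead of a Python object.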
Compiled with mips-linux-gnu-gcc using every bare-metal flag in the book: -nostdlib, -nostartfiles, -msoft-float (the QCA9557 has no FPU, so any floating-point instruction is an illegal instruction trap), -mno-abicalls, -fno-pic, and a custom _start in assembly that sets up the global pointer and stack before calling into C. The resulting binary is 5,592 bytes. Tiny.
Wait, but how do I get a 5.5 KB binary onto a device that can’t receive network packets?
UART: The Only Channel That Works
The serial console—the same 115200 baud UART connection I’ve been using for console output this whole time—is the only working bidirectional channel. TX and RX both work on UART because it doesn’t go through the PHY. So I built a transfer protocol over serial.
You can’t send raw binary over a serial console because the shell and tty layer interpret control characters, do CR/LF conversion, and generally mangle anything that isn’t printable ASCII. The solution is dumb but it works: hex-encode every byte (so 0xFF becomes the two ASCII characters "ff"), send it as text, and decode on the other end.
The decoder runs on the MR18 as a busybox awk one-liner:
awk 'BEGIN{h="0123456789abcdef"}{for(i=1;i<=length($0);i+=2) printf "%c",(index(h,tolower(substr($0,i,1)))-1)*16+(index(h,tolower(substr($0,i+1,1)))-1)}' > /tmp/ar8035-fix
It reads hex pairs from stdin, converts each pair to a byte, and writes the binary output to a file. My Python script (send_binary.py) hex-encodes the ar8035-fix binary in 512-byte chunks and writes them to the serial port. At 115200 baud with 10 bits on the wire per character, the link carries about 11,520 characters/sec, which works out to about 5,760 bytes/sec of effective binary throughput after the 2x hex encoding overhead. The 5.5 KB binary transfers in under 2 seconds.
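The sending side is simple enough to sketch. This is not the actual send_binary.py, just a minimal version of the idea; hex_lines, send and decode are names I made up, and an io.BytesIO stands in for the serial port so it runs without hardware.

```python
import io

CHUNK_SIZE = 512  # binary bytes per chunk, as described above


def hex_lines(data, chunk_size=CHUNK_SIZE):
    """Yield one line of lowercase hex text per chunk of binary data."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size].hex() + "\n"


def send(data, port):
    """Write hex-encoded chunks to any object with a write(bytes) method."""
    sent = 0
    for line in hex_lines(data):
        sent += port.write(line.encode("ascii"))
    return sent


def decode(text):
    """Host-side mirror of the awk decoder: hex pairs per line back to bytes."""
    return b"".join(bytes.fromhex(line) for line in text.splitlines())


payload = bytes(range(256)) * 4   # 1024 bytes of test data
fake_port = io.BytesIO()          # stands in for the serial port
send(payload, fake_port)
roundtrip = decode(fake_port.getvalue().decode("ascii"))
print(roundtrip == payload)
```

Each line carries a whole number of bytes (an even number of hex characters), which matters because the awk decoder processes pairs line by line.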
chmod +x /tmp/ar8035-fix
/tmp/ar8035-fix
The program runs, writes the MDIO registers, and immediately… rx_packets starts incrementing. Ping works. SSH works. The MR18 has bidirectional Ethernet.
The 20-Minute Sysupgrade
Now I need to flash the sysupgrade image to NAND. The sysupgrade image is about 7 MB. Over UART at 5,760 bytes/sec that’s… roughly 20 minutes.
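Here is the arithmetic behind that estimate, as a rough sketch. It assumes 10 bits on the wire per character (8N1 framing) and ignores the newline at the end of each chunk and any pauses; the 7340032-byte figure is the image size reported by the transfer log.

```python
BAUD = 115200
BITS_PER_CHAR_ON_WIRE = 10   # assumes 8N1 framing: start + 8 data + stop
HEX_EXPANSION = 2            # each binary byte becomes two ASCII characters

wire_chars_per_sec = BAUD / BITS_PER_CHAR_ON_WIRE
binary_bytes_per_sec = wire_chars_per_sec / HEX_EXPANSION


def transfer_seconds(size_bytes):
    """Best-case time to move size_bytes of binary data as hex text."""
    return size_bytes / binary_bytes_per_sec


small = transfer_seconds(5592)      # the ar8035-fix binary
image = transfer_seconds(7340032)   # the sysupgrade image

print(f"effective throughput: {binary_bytes_per_sec:.0f} bytes/sec")
print(f"ar8035-fix: {small:.1f} s")
print(f"sysupgrade image: {image / 60:.1f} min")
```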
Twenty. Minutes. Of watching hex scroll across a serial terminal.
I used the same hex-over-UART protocol but with more safeguards (uart_transfer.py). Before committing to a 20-minute transfer I run a pre-test: send 32 known bytes through the whole pipeline (hex encode, awk decode, write to file, verify MD5). If the pre-test fails, abort immediately instead of wasting 20 minutes on a broken channel.
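The pre-test idea looks roughly like this. It is a sketch, not the code from uart_transfer.py: pretest and the two channel functions are hypothetical, and the "channel" here is a plain Python function standing in for the serial link plus the awk decoder on the device.

```python
import hashlib


def pretest(roundtrip, size=32):
    """Push known bytes through the pipeline and compare MD5 digests."""
    probe = bytes(range(size))               # fixed, known pattern
    expected = hashlib.md5(probe).hexdigest()
    received = roundtrip(probe.hex())        # bytes the far end decoded
    return hashlib.md5(received).hexdigest() == expected


def good_channel(hex_text):
    """Stand-in for a healthy link: decodes exactly what was sent."""
    return bytes.fromhex(hex_text)


def lossy_channel(hex_text):
    """Stand-in for a broken link that drops the first byte."""
    return bytes.fromhex(hex_text[2:])


ok = pretest(good_channel)
broken = pretest(lossy_channel)
print(ok, broken)
```

On the real device the digest comes from running md5sum on the decoded file, but the comparison is the same: if the 32-byte probe doesn't match, don't start the big transfer.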
The full transfer also needs an echo drainer—a background thread that continuously reads and discards the serial echo. The shell echoes every character back, so the 14 MB of hex data (7 MB binary doubled) produces 14 MB of echo flowing back to the host. Without active draining, the host’s serial receive buffer fills up, flow control stalls everything, and the transfer hangs.
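A drainer like that can be sketched with a plain thread. This is an illustration under assumptions, not the real script: EchoDrainer and LoopbackPort are made-up names, and LoopbackPort is a fake port that echoes every write back, standing in for the serial device so the example runs without hardware.

```python
import threading
import time


class LoopbackPort:
    """Fake serial port (hypothetical): every write is echoed back."""

    def __init__(self):
        self._buf = bytearray()
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:
            self._buf += data
        return len(data)

    def read(self, n):
        with self._lock:
            chunk = bytes(self._buf[:n])
            del self._buf[:n]
        return chunk

    def pending(self):
        with self._lock:
            return len(self._buf)


class EchoDrainer:
    """Background thread that reads and throws away whatever comes back."""

    def __init__(self, port, read_size=4096):
        self.port = port
        self.read_size = read_size
        self.discarded = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            data = self.port.read(self.read_size)
            if data:
                self.discarded += len(data)
            else:
                time.sleep(0.001)   # nothing to read, don't spin

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()


port = LoopbackPort()
with EchoDrainer(port) as drainer:
    for _ in range(100):
        port.write(b"ab" * 512)          # 1024 characters of fake hex
    deadline = time.time() + 2
    while port.pending() and time.time() < deadline:
        time.sleep(0.001)                # let the drainer catch up

print(drainer.discarded)
```

The sender never has to care about the echo; it just writes, and the drainer keeps the receive side empty.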
After 20 minutes:
Transfer complete: 7340032 bytes
Verifying MD5... 53e272bed2041616068c6958fe28a197 /tmp/fw.bin OK
Running sysupgrade -n /tmp/fw.bin...
The -n flag tells sysupgrade to not preserve any configuration—wipe everything and start fresh. The device reboots. 90 seconds pass. The longest 90 seconds of my life.
BusyBox v1.36.1 (OpenWrt) built-in shell
_______ ________ __
| |.-----.-----.-----.| | | |.----.| |_
| - || _ | -__| || | | || _|| _|
|_______|| __|_____|__|__||________||__| |____|
|__|
-------------------------------------------------
OpenWrt 25.12.0, r0+28088-eca...
-------------------------------------------------
root@OpenWrt:/#
IT BOOTED FROM NAND. Not initramfs. Not with a Meraki overlay. Actual OpenWrt, flashed to NAND, booting on its own.
But wait—the AR8035 PHY fix doesn’t persist across reboots. The PHY resets to its default register values on every power cycle. So I installed a hotplug script:
#!/bin/sh
# /etc/hotplug.d/iface/10-ar8035-fix
[ "$ACTION" = "ifup" ] && [ "$INTERFACE" = "lan" ] && /usr/bin/ar8035-fix
This fires every time the LAN interface comes up and re-applies the RGMII RX clock delay fix. Copy the ar8035-fix binary to /usr/bin/, install the hotplug script, and Ethernet RX works on every boot. Automatic, no manual intervention needed.
23 Bugs Later
Let me put this in perspective. From the first power-on to a working OpenWrt installation, I hit 23 distinct bugs. Not 23 instances of the same problem—23 unique, categorically different issues:
- 6 JTAG/PRACC bugs (protocol errors, timing, state machine issues)
- 4 verification bugs (phantom errors, XOR cancellation, encoding mistakes)
- 4 failsafe trigger bugs (timing, GPIO, hardware contention)
- 3 cache coherency bugs (the KSEG0/KSEG1 nightmare)
- 3 hardware/electrical bugs (pull-ups, reset supervisor, resistor placement)
- 2 toolchain bugs (wrong binary target, compiler flags)
- 1 boot bug (NAND overlay mounting Cisco’s filesystem)
Categories that my degree directly helped with: JTAG stuff (embedded systems), cache coherency (computer architecture), MIPS encoding (computer architecture), bare-metal C (systems programming), circuit debugging (embedded systems lab). Categories that nothing prepared me for: the PRACC bit-flip rate just being a fundamental property of the universe that you can’t fix, only work around. And the reset supervisor IC being an active driver instead of a passive pull-up—that one you just have to find by running into the wall.
Was It Worth It?
I spent my entire Spring Break and a random weekend on a $15 access point from a secondhand store. I could have just bought a TP-Link for $30 on Amazon and been done in 10 minutes.
But where’s the fun in that? I got to exploit a JTAG timing window, write hand-encoded MIPS assembly, debug a MIPS D-cache coherency problem that most people only encounter in textbooks, build a hex-over-UART transfer protocol, write bare-metal C with raw syscalls for a PHY register fix, and fight a reset supervisor IC that was determined to keep me out.
The MR18 is now running OpenWrt 25.12.0 with working WiFi (both 2.4 GHz and 5 GHz) and bidirectional Gigabit Ethernet, and it’s sitting there doing exactly what a $30 TP-Link would do. Except it has a better story. And I learned more in one Spring Break than in most semesters.
That shitty ahh ESP-Prog and that little MR18 were the most fun I’ve had in a while. If you made it this far—thanks for reading. My grammar still sucks but at least the AP works now.