Debugging Asymmetric Routing II

07 Mar 2026

While trying to take my own advice to bind ports to host and forward, I inadvertently reintroduced asymmetric routing to the network.

This time I was able to fix it without a network redesign since the chaos was isolated to a single box.

Here are my notes on the diagnosis and solution.

Diagnosis

Inexplicable MQTT errors:

Home Assistant/zigbee2mqtt: z2m: MQTT error: Keepalive timeout
Mosquitto server: Client XXXX has exceeded timeout, disconnecting
mosquitto_sub - no errors
Router: many packets matching the Default deny / state violation rule

Proof

This was easy. I just adjusted the tcpdump command from debugging asymmetric routing to monitor MQTT traffic. It was very obvious that replies were being sent out on the wrong interface, and this was breaking routing.
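
For reference, a capture along these lines shows the problem. This is a minimal sketch rather than the exact command from the earlier post: it assumes the broker listens on the standard unencrypted MQTT port 1883, and watch_mqtt is a name made up for illustration.

```shell
#!/bin/sh
# Assumption: the broker uses the standard unencrypted MQTT port.
MQTT_PORT=1883

# Capture MQTT packets on every interface at once (needs root).
# -n skips name resolution, -e prints link-level headers.
watch_mqtt() {
    tcpdump -n -e -i any "tcp port ${MQTT_PORT}"
}

# Usage (as root): watch_mqtt
```

Recent tcpdump versions print the interface name and direction for each packet when capturing on any, so a request arriving on one bridge and its reply leaving on another should stand out.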

We can see why the OS does this by inspecting the routing table:

root@atlas:/home/geoff# ip route show
default via 10.100.0.1 dev br0 proto dhcp src 10.100.254.246 metric 1004 
default via 172.16.0.1 dev br1 proto dhcp src 172.16.0.244 metric 1006 
default via 10.110.0.1 dev br110 proto dhcp src 10.110.94.63 metric 1010 
10.89.0.0/24 dev podman1 proto kernel scope link src 10.89.0.1 
10.100.0.0/16 dev br0 proto dhcp scope link src 10.100.254.246 metric 1004 
10.110.0.0/16 dev br110 proto dhcp scope link src 10.110.94.63 metric 1010 
172.16.0.0/16 dev br1 proto dhcp scope link src 172.16.0.244 metric 1006 
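
A quick way to spot this state is to list which interfaces contribute a default route. The filter below is a minimal sketch; default_route_devs is a hypothetical helper name, not part of iproute2.

```shell
#!/bin/sh
# Print the device of every default route found in `ip route show`
# output supplied on stdin, one device per line.
default_route_devs() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }'
}

# Usage on a live box: ip route show | default_route_devs
```

Fed the table above, it prints br0, br1 and br110. More than one line of output means more than one interface is offering a way out of the box.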

And these all come from dhcpcd running on several interfaces, as defined in /etc/network/interfaces (Debian Trixie):

...
# os bridge
auto br0
iface br0
    bridge-ports enp1s0
    use dhcp

# tagged vlan bridges
auto br1
iface br1
    bridge-ports enp1s0.1
    use dhcp

auto br110
iface br110
    bridge-ports enp1s0.110
    use dhcp

Solution

Turns out dhcpcd will always add routes for new interfaces, so to avoid mayhem we need to clean them up ourselves with some little shell script hooks, like this:

/etc/dhcpcd.exit-hook

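#!/bin/sh
# logs: journalctl -u networking

# if DHCP gives us a new address, we could check with $old_ip_address == $new_ip_address
# but this happens basically 0% of the time so just reboot until we run out of IP addresses
# for now/forever if a changed address breaks things (traefik)

# run every hook script; each one inherits dhcpcd's variables
# ($interface, $new_ip_address, ...) through the environment
for script in /etc/dhcpcd.exit-hook.d/*.sh ; do
    # skip non-executable files and the unexpanded glob when the dir is empty
    [ -x "$script" ] || continue
    echo "dhcpcd.exit-hook run script: ${script}"
    "$script"
done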

/etc/dhcpcd.exit-hook.d/br1_isolate.sh

#!/bin/bash

# only when we actually have an address
if [ "$interface" = "br1" ] && [ -n "$new_ip_address" ]; then
    echo "delete routes to avoid asymmetric routing - br1"
    ip route del default dev br1
    ip route del 172.16.0.0/16 dev br1
fi

/etc/dhcpcd.exit-hook.d/br110_isolate.sh

#!/bin/bash

# only when we actually have an address
if [ "$interface" = "br110" ] && [ -n "$new_ip_address" ]; then
    echo "delete routes to avoid asymmetric routing - br110"
    ip route del default dev br110
    ip route del 10.110.0.0/16 dev br110
fi
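
The two per-interface scripts differ only in names, so they could be folded into one. The version below is a sketch under assumptions, not what is deployed here: the file name isolate.sh, the ISOLATED_IFACES list and the should_isolate helper are all hypothetical, and it swaps the two explicit deletes for ip route flush dev, which removes every main-table route on that device rather than just the default and subnet routes.

```shell
#!/bin/sh
# Hypothetical combined hook: /etc/dhcpcd.exit-hook.d/isolate.sh
# Assumption: these interfaces must never contribute routes on the host.
ISOLATED_IFACES="br1 br110"

# Succeed when the interface is in the isolated list AND a lease address
# is present (same conditions as the per-interface scripts).
should_isolate() {
    iface=$1
    addr=$2
    [ -n "$addr" ] || return 1
    for candidate in $ISOLATED_IFACES; do
        [ "$candidate" = "$iface" ] && return 0
    done
    return 1
}

# $interface and $new_ip_address are supplied by dhcpcd; both are empty
# when the script is run by hand, so nothing is touched in that case.
if should_isolate "${interface:-}" "${new_ip_address:-}"; then
    echo "delete routes to avoid asymmetric routing - ${interface}"
    # Flushes all routes on this device (main table by default).
    ip route flush dev "$interface"
fi
```

Adding another isolated interface then means editing one list instead of copying a script.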

After restarting networking, the routing table looks perfect:

default via 10.100.0.1 dev br0 proto dhcp src 10.100.254.246 metric 1025 
10.89.0.0/24 dev podman1 proto kernel scope link src 10.89.0.1 
10.100.0.0/16 dev br0 proto dhcp scope link src 10.100.254.246 metric 1025

And MQTT works in Home Assistant again!

Root of the problem

This whole problem was caused by enabling a bunch of interfaces that grant access to different VLANs. As mentioned in the previous debugging post, VLANs are for traffic separation, and if you have something essentially bridging the VLANs as above, it defeats the whole point of the exercise.

So what was I trying to do here? I was trying to expose a bunch of services like Nexus and MQTT to the whole network and run Prometheus in the management VLAN.

The problem is the main host is not officially on the management VLAN: it has an interface set up with an IP address and that’s it. Deleting the routes so it can’t reach anything effectively blocks me from using docker/podman bridge networking to run Prometheus bound to the management VLAN without a lot of extra/brittle work, since podman expects the host to participate in networking.

MACVLAN and IPVLAN cause problems with DHCP, which I already looked at extensively, which leaves… VMs.

In the end, the solution is simple and bulletproof, if a little clunky: just bind the interface into a VM directly and podman containers can just use the default network. Ironically, this also means the interface no longer needs an IP address on the host and, you guessed it, removing the use dhcp from /etc/network/interfaces would have also prevented the asymmetric routing problem.
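
As a sketch of that last point, and assuming the rest of the stanza stays as shown earlier, the br110 definition without an address on the host would look something like this (untested in this form; attaching the bridge to the VM is configured elsewhere):

```
# tagged vlan bridge, no address on the host
auto br110
iface br110
    bridge-ports enp1s0.110
```

With no DHCP client on the bridge, dhcpcd never runs for it, so there are no routes to clean up and no exit hook is needed for that interface.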

What a weekend.

geoffwilliams@home:~$