I have 4 devices on my network that randomly lose connectivity. The outages are never very long (usually only a couple of seconds; the longest dropout was a couple of minutes), and the devices always reconnect without any direct intervention.
This week, the same issue has started to happen to more devices. While the outages are still brief and self-healing, one of the outages was very ill-timed, so now it's starting to attract more attention.
A few key facts about the network:
1. The affected devices are all connected to the same subnet and VLAN.
2. The affected devices are at a distant corner of the network - 7 hops away from the main internet gateway.
3. The affected devices are connected to the main network through a wireless bridge.
I know that the prime suspect would be the wireless bridge, but I checked it and it didn't log any outages during the times the devices dropped. Furthermore, the outages don't affect all the devices that are using the wireless bridge.
The first 4 devices are all connected to the same switch, which led me to suspect a problem with that switch. It has had other problems, such as the management IP randomly becoming unavailable, but I haven't been able to narrow down or replicate the error in a controlled environment. And now devices connected to other switches are being affected by the same issue.
So, in brief, here's the signal chain:
ISP - Firewall - Switch - Switch - Switch - Wireless AP - Client Bridge - Switch - Switch - Problem devices
Most of those connections are copper Cat5e or Cat6, except for the links between the first three switches, which are multimode fiber, and the wireless bridge, which is, well, wireless (5 GHz 802.11n, Ubiquiti NanoBridge M5 on both sides).
Yes, I recognize that it's not ideal to have that many hops. I'm going to be running some new fiber this year to address that, but still it's a simple enough setup that there aren't that many places things should be able to go wrong. It seems to me that a damaged cable would cause more persistent problems.
Anyway, I'll be doing some more monitoring and I'll share any new insights I stumble across. If I were going to roll up my sleeves and start doing some packet captures, where should I start?
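For the monitoring side, one rough approach is to ping every hop along the chain once per interval and log the first hop that stops answering, which should at least show which segment to capture on. The sketch below assumes Linux `ping` flags (`-c` for count, `-W` for timeout in seconds); the hop names and addresses are placeholders from the documentation range, not my real ones, and `first_failing_hop` and `monitor` are just illustrative helper names.

```python
import subprocess
import time
from datetime import datetime

# Hypothetical names and addresses (documentation range), listed in path
# order from the gateway outward. Replace with the real management IPs.
HOPS = [
    ("firewall", "192.0.2.1"),
    ("core-switch", "192.0.2.2"),
    ("wireless-ap", "192.0.2.5"),
    ("client-bridge", "192.0.2.6"),
    ("edge-switch", "192.0.2.8"),
    ("problem-device", "192.0.2.50"),
]


def ping_once(address, timeout_s=1):
    """Send one ICMP echo; return True if a reply arrived (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failing_hop(results):
    """Given (name, reachable) pairs in path order, return the first unreachable name, or None."""
    for name, reachable in results:
        if not reachable:
            return name
    return None


def monitor(hops, rounds, interval_s=1.0, probe=ping_once):
    """Probe every hop once per round; log and collect (timestamp, first failing hop) events."""
    events = []
    for _ in range(rounds):
        results = [(name, probe(address)) for name, address in hops]
        failed = first_failing_hop(results)
        if failed is not None:
            stamp = datetime.now().isoformat(timespec="seconds")
            events.append((stamp, failed))
            print(f"{stamp} first unreachable hop: {failed}")
        time.sleep(interval_s)
    return events
```

Calling `monitor(HOPS, rounds=3600)` would run for roughly an hour at the default one-second interval. One caveat: some switches deprioritize ICMP to their management address, so a single missed reply from a mid-path switch is weaker evidence than a run of misses that lines up with an end-device dropout.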