Saturday, February 24, 2024

Cracking the Code: Navigating Network Nightmares with Home Assistant, LibreNMS, and Wireshark

Introduction:

Hey there! Ever found yourself scratching your head over a quirky network issue that just won't quit? I recently had my fair share of head-scratching moments when network broadcast storms started causing serious disruptions in our switches and wireless access points. In this post, I'm breaking down the steps I took to unravel this persistent network mystery.

Setting the Scene:

Picture this: network switches and access points dropping off the radar every few nights. To get to the bottom of it, I needed to gather some intel without losing sleep, especially as these problems seemed to occur in the middle of the night. First move? Configuring Home Assistant's Ping integration to ping a crucial switch and send a notification to my phone when things went south. It became my silent night watchman, marking the time when the broadcast storm kicked in.
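For anyone who wants the same "night watchman" idea outside Home Assistant, here is a minimal Python sketch. To be clear, this is not how the Ping integration works internally; it just shells out to the Linux `ping` command and records the moment a host flips between up and down. The function names, the 30-second interval, and any address you pass in are my own placeholders.

```python
import subprocess
import time
from datetime import datetime


def is_reachable(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request with the system ping command (Linux flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def detect_transitions(samples):
    """Given (timestamp, reachable) pairs, return the points where the state changed."""
    transitions = []
    previous = None
    for timestamp, reachable in samples:
        if previous is not None and reachable != previous:
            transitions.append((timestamp, "up" if reachable else "down"))
        previous = reachable
    return transitions


def watch(host: str, interval_s: int = 30) -> None:
    """Poll forever, printing a timestamped line whenever the host goes up or down."""
    previous = None
    while True:
        reachable = is_reachable(host)
        if previous is not None and reachable != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} is {'up' if reachable else 'down'}")
        previous = reachable
        time.sleep(interval_s)
```

Calling `watch()` with the switch's address and leaving it running overnight gives you the same kind of timestamp trail, minus the phone notification.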

The Quest for Network Clarity:

LibreNMS, a network monitoring system, was my next stop. Although it did show spikes in network bandwidth, the specifics were elusive. The SNMP data it collected lacked the juicy details needed for a deep investigation.

Enter rsyslogd:

To beef up my data game, I brought in rsyslogd, running in a Linux container on a Proxmox server. It was configured to capture logs from the admin VLAN, where the switches and access points live. This did capture a lot of data, but drowning in logs was not my idea of fun, and it was too hard to find the root cause in all that noise.
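One way to make a pile of syslog output less overwhelming is to count messages per host per minute and look for the spikes. The sketch below is only an illustration: it assumes the traditional syslog line layout (`Mon DD HH:MM:SS host message`), which may not match your rsyslog template, and the function name is made up.

```python
from collections import Counter


def messages_per_minute(lines):
    """Count syslog lines per (minute, host), assuming 'Mon DD HH:MM:SS host msg'."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 4)
        if len(parts) < 4:
            continue  # skip lines that do not fit the assumed layout
        month, day, clock, host = parts[:4]
        minute = clock[:5]  # keep HH:MM, drop the seconds
        counts[(f"{month} {day} {minute}", host)] += 1
    return counts
```

Feeding it the lines of a log file and calling `.most_common(10)` on the result shows which devices were noisiest, and when.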

Wireshark to the Rescue:

Since these broadcast storms were persistent, there was plenty of time to look at the actual traffic. The logical next step was Wireshark. Installed on another Linux container, it quietly recorded network traffic in 10-minute capture files. The ring buffer was configured to hold more than 24 hours' worth of these 10-minute files. With this setup, I waited until the problem happened again.
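The ring buffer sizing is simple arithmetic: 24 hours of 10-minute files is 24 × 60 / 10 = 144 files, so anything above that covers a full day. The sketch below works that out and builds a matching command line for `dumpcap`, the capture tool that ships with Wireshark. I'm assuming `dumpcap` here; the interface name, output path, and 26-hour retention in the usage note are placeholders, not my actual settings.

```python
import math


def ring_buffer_files(retention_hours: float, slice_minutes: int) -> int:
    """Number of capture files needed to cover the retention window."""
    return math.ceil(retention_hours * 60 / slice_minutes)


def dumpcap_command(interface, output_path, slice_minutes, retention_hours):
    """Build a dumpcap argument list that rotates files by time in a ring buffer."""
    files = ring_buffer_files(retention_hours, slice_minutes)
    return [
        "dumpcap",
        "-i", interface,                           # capture interface
        "-b", f"duration:{slice_minutes * 60}",    # start a new file every N seconds
        "-b", f"files:{files}",                    # keep only the newest N files
        "-w", output_path,                         # base name for the capture files
    ]
```

For example, `dumpcap_command("eth0", "/var/captures/storm.pcapng", 10, 26)` asks for 156 files of 600 seconds each, which leaves a couple of hours of margin beyond a full day.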

The Culprit Unveiled:

Guess what? One of our Chromebooks was causing the ruckus. It was connected to a USB-C dock with an Ethernet jack, which was plugged into one of our switches. Disabling the Chromebook's Wi-Fi, since it was on Ethernet anyway, put the broadcast storm to bed. Crisis averted.
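If you are staring at a capture and wondering who the loudest device is, counting broadcast frames per source MAC address is one way to narrow it down. This is a sketch of that idea rather than a record of the exact steps I took. It assumes you have first exported the source MACs with something like `tshark -r capture.pcapng -Y "eth.dst == ff:ff:ff:ff:ff:ff" -T fields -e eth.src`, and the function name is hypothetical. A storm made of multicast traffic would need a different display filter.

```python
from collections import Counter


def top_broadcasters(mac_lines, limit=5):
    """Rank source MAC addresses by how many broadcast frames they sent.

    mac_lines: an iterable of strings, one source MAC per line, as exported
    by the tshark command described above (an assumed workflow).
    """
    counts = Counter(
        line.strip().lower() for line in mac_lines if line.strip()
    )
    return counts.most_common(limit)
```

The MAC at the top of the list can then be matched against the DHCP leases or the switch's client table to find the physical device.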

Root Causes:

So, why the hiccup? Well, the Chromebook apparently couldn't handle having both its Wi-Fi and Ethernet adapters active at once, leading to a bit of a network panic. Also, UniFi's RSTP implementation seemed a bit overwhelmed, especially when dealing with a flood that spanned both the wired and wireless sides of the network.

Conclusion:

In the world of network troubleshooting, there's no need to be overwhelmed: all it takes is a dash of persistence and a tech toolkit. Home Assistant, LibreNMS, rsyslogd, and Wireshark played their parts, and now we're coasting through a week of smooth sailing. The Proxmox containers are still around, ready to lend a hand when the next tech puzzle comes knocking.

