So you want to monitor your network, you don’t have a lot of time to learn how to set up Nagios, and you have no budget for either consultants or off-the-shelf software. What do you do? One of your options is GroundWork Open Source, or GWOS. GWOS is a group of scripts that wraps Nagios open source monitoring, Cacti, MRTG, and some tools GroundWork developed on its own in a pretty simple GUI. As a super jump start, I set up, and highly recommend, the VM that you can get from their website. This VM makes quick work of getting the software up and running. If you are planning on monitoring more than a few hundred devices, the VM solution will likely not work optimally. That isn’t a reflection on virtualization technologies or GroundWork, but a reality that current storage and hardware have difficulty writing large quantities of small data points to disk efficiently. I set this up with about twenty hosts, seven of them over a WAN/VPN connection to simulate a remote office. Here are Joe’s and my impressions of the whole process.
One of the things we look for in a solution like this is what it shows us as a map of the network, and how well it tells us what should be monitored. The first thing that impressed us about GroundWork was its use of Nmap (nmap.org) to identify which OS, ports, and applications were in use on the machines on the network. The auto discovery found all devices and was able to identify the OS of all but one machine. It then went the next step and configured the machines that had SSH but not SNMP so that we could monitor them over SSH. The one machine it could not identify, or set up tests and monitors for, was my 24-port network switch, which does not look like any particular OS on its web interface. That was the only miss in this part of the tool, and it is really minor. Discovery did not map anything out, though. All of the devices initially looked like they were directly connected to the monitoring server. This was easy to fix by setting up associations that identify parent and child devices. A parent device is one that one or more other devices depend on. Our network switch is a parent to a VMware server, which is in turn a parent to the VMs it hosts. The interface makes what is a tedious process in Nagios faster and more efficient. Once you set all of the associations, the maps draw themselves into a clear, easy-to-understand picture of where your dependencies are. These associations do more than create informative maps. They also tell the alerting parts of GroundWork when to ignore false alarms caused by events like downed switches or internet links.
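To make the parent/child idea concrete, here is a minimal Python sketch of how Nagios-style parent logic can decide which outages deserve an alert. It is an illustration under my own assumptions, not GroundWork’s or Nagios’s actual code: the host names, the `PARENTS` map, and both function names are made up for this example. The rule it applies is the simplified one described above: a failing host whose parents are all failing too is treated as unreachable rather than down, so only the root cause alerts.

```python
# Hypothetical parent map modeled on the network described above:
# switch -> VMware server -> VMs. A host with no entry is assumed to
# be directly connected to the monitoring server.
PARENTS = {
    "vmware-server": ["switch"],
    "vm1": ["vmware-server"],
    "vm2": ["vmware-server"],
}


def classify_down_hosts(parents, down):
    """Label each failing host as "DOWN" or "UNREACHABLE".

    parents: dict mapping a host name to a list of its parent host names.
    down:    set of host names whose checks are currently failing.
    """
    states = {}
    for host in down:
        host_parents = parents.get(host, [])
        # If every parent is failing as well, the host is probably just
        # cut off behind the real outage, so do not blame the host itself.
        if host_parents and all(p in down for p in host_parents):
            states[host] = "UNREACHABLE"
        else:
            states[host] = "DOWN"
    return states


def hosts_to_alert(states):
    """Return the sorted hosts that should page someone (root causes only)."""
    return sorted(h for h, s in states.items() if s == "DOWN")


if __name__ == "__main__":
    # The switch dies, so every check behind it fails too.
    failing = {"switch", "vmware-server", "vm1", "vm2"}
    print(hosts_to_alert(classify_down_hosts(PARENTS, failing)))
```

With the whole branch failing, only the switch ends up on the alert list. If just `vm1` fails while its parent is still up, `vm1` itself is reported as down.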
Once we had the basics working, we started on the alerting. Unfortunately, things like my Droid and iPod regularly go on and off my network, so the first day or two of working with the alerting was painful, though that was more my own fault than the software’s. Once that was all sorted out, things started to hum. The dependency checking worked as expected. When I dropped the VPN link between Joe and me, the only alert we received was for the firewall that handled the link. None of Joe’s devices alerted. Once the link was restored, the checks restarted and everything was updated.
All in all, it acted and reacted as expected. The only real issues were related to the UI and a need for better testing. Joe attempted to name a device Joe’s Desktop through the interface. GroundWork accepted the illegal ’ character until he tried to save the device. At that point all of the information about the device disappeared. We tried to delete the device so we could re-add it, using several different places in the UI, and none of them seemed to work. While looking for something else, I found a third or fourth place to delete devices, which actually worked and let us save. There are also some minor user interface glitches that, while annoying, are not show stoppers, such as tabs that do nothing when clicked on some screens but work on others. These are minor annoyances, not major issues.
If you are looking for a nice tool that is easy to use and free (unless you want to purchase support from them), this should be on your short list of systems to review. I would suggest purchasing support, where available, for any of our recommended and reviewed systems for at least the first year, to get issues like these corrected. The virtual machine they offer is perfect for setting up a quick proof of concept. New users should expect at least 8-16 hours of effort to get the system to the level where it is presenting useful alerts and data. If you plan on monitoring a large number of devices and software products with this tool, my recommendation would be to run it on bare metal. The problem with any monitoring solution is the amount of data being written to or read from the database. So go out, download it, and give it a try. The faster you get monitoring, the faster you and your admins will be able to get a good night’s sleep.