3 monitoring programs in one????

Well, not really.  This article does a good job of explaining the differences between Nagios, Icinga and Opsview.  At the core, these are all Nagios though.  Missing from this set is Groundwork Opensource, which is also based on Nagios and which we reviewed last year.  If you haven’t looked at the original or these two alternatives, give the article a look.  If you are evaluating network monitoring/datacenter monitoring solutions, also check out Zenoss, which we reviewed last year as well.  While inspired by Nagios, it is a completely new product.  But as a testament to how awesome Nagios is, Zenoss can still use any Nagios plugins that you just can’t live without.
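Part of the reason Nagios plugins travel so well between these tools is that the plugin convention is tiny: a plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).  Here is a minimal sketch of a disk check written to that convention.  The function names and the 80/90 percent thresholds are our own assumptions for illustration, and wiring the script into any particular tool is not shown.

```python
"""Minimal Nagios-style disk check (illustrative sketch, not an official plugin)."""
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_code, status_line).

    The warn/crit defaults are assumptions chosen for this example.
    """
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/"):
    """Measure disk usage for path and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # A check that cannot run reports UNKNOWN rather than guessing.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total)


def main(argv):
    """Print the status line and return the exit code for the caller."""
    code, line = check_disk(argv[1] if len(argv) > 1 else "/")
    print(line)
    return code
```

To run it as a real plugin, you would end the script with `sys.exit(main(sys.argv))` so that the monitoring system sees the exit code.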

Zenoss how the big four should do monitoring…

The biggest benefit to both open source producers and consumers is the community.  The Zenoss community is its greatest strength, as we learned in podcast number 21 back on 3/13/2010.  The tool is being used by some large corporate customers right alongside an army of small businesses.  If you do not have a staff experienced with Zenoss, or a large enough staff to properly roll it out, the company can support any size organization, for a fair fee of course.  As opposed to Groundwork, which is based on Nagios, Zenoss is a completely distinct product.  Zenoss is developed by a blended company that delivers an open source and free-to-use Core product.  Zenoss also offers additional support features through its Enterprise version, for an additional fee.

So how was Zenoss to use?  Well, if you actually read the documentation and watch the videos, the tool is straightforward, relatively easy to use and quick to get up and running.  After the normal initial learning curve with the UI, you can start to really get to the meat and value of what the product has to offer.  Your mileage may vary, but I started to get the hang of it after watching the videos and spending about fifteen or twenty unfocused hours on it.  As has been my experience with most software, the more you know about this type of software, the easier it will be for you to get up to speed.

Let me state this again: watching the videos helped immensely, so at least start there if you do not want to read the manuals either before or while you are getting this set up.  The UI for Zenoss was the hardest item for me to learn.  While chatting with Matt and Mark from Zenoss, they assured us they understood it was a problem and that we should expect big changes in this area within the next few releases.  Once you have decoded how to work in the app, it really starts to make sense.  I could start to see the logic in what started out as chaos.

Once you have enough data to work with (I started with just a few days’ worth), the tool starts to get interesting.  Creating custom reports and alerts is so easy that I could easily see people ending up with report overload.  For reports, you tell the tool the server or group of servers you want to report on.  Then you tell it which of the available metrics you want to report on and how to lay out the report.  The tool is all Ajax/web GUI based, it works smoothly, and it really is just that easy.

One of the neatest features in Zenoss is the way it handles alerting.  As a user, you have the option to set up your own alerts.  Alerts can also be set up for groups, as in most systems.  Why is that neat?  I have been in very few IT shops where the team members I worked with didn’t each have their own pet systems or applications.  Allowing each of them to set up the extra alerting they want, on a one-by-one basis, is one of the many signs that experienced operational engineers built this system.  There are other little things that support personnel will pick up on that just make you stop and say “WOW, someone really thought of that feature.”  These little differences do not seem like a lot as individual items, but collectively they are what you will quickly learn to love about this tool.

The next big thing with Zenoss is what they call ZenPaks.  ZenPaks are groups of scripts and small applications that add functionality, like a plug-in in Firefox.  This is where the strength of the community really comes in.  I am running an ESXi server at home on a Core i7 machine I built.  While I love the server, VMware has intentionally encumbered several of the features that normal ESX has.  One of those is in the area of monitoring.  VMware intentionally built the system with no SNMP-based agent built in.  With most systems, this means you are just out of luck for checking anything other than whether the machine is up and reachable.  With Zenoss, there is very likely a ZenPak for that.  If a ZenPak does not already exist, there is a group of people in the community who love challenges and are eager to help you create one.  This level of support is really helping the Zenoss team and community to set themselves apart.

So what didn’t I like about the product?  The UI takes serious effort to master.  The tutorials and hours of videos are a tremendous help while the Zenoss team works to make it more intuitive.  The other issue is the limited support for using SSH.  It is another area we were assured is being addressed, but it took me considerable effort to figure out the first time I tried.  By contrast, SNMP-based discovery worked perfectly, assuming that all of your machines are using the same read and write keys or user name and password.  The last minor issue is that several of the services I have running on my test machines were either misidentified, causing a failure after discovery, or missing completely.  This is easy to fix for small environments of fewer than 50 servers, and it won’t take you a long time to correct.  One feature I missed that would help is an import feature as a way to add systems to your installation.

Once you have this tool up and running, you really do start forgiving the pain it put you through to get there.  Being able to create reports quickly and use the event correlation features starts to pay off right away.  The ZenPaks will help you keep things monitored without having to write something custom.  All in all, this is definitely a solid, scalable and flexible system for monitoring.  I suggest that you download the VM and give it a try.

Eps. 22 – Interview with Tara and Simon from Groundwork OpenSource

Running Time: 47:41

Episode found here

In this episode Brian and Joe do their second interview with two folks from GWOS (GroundWork OpenSource).  Tara Spaulding, Chief Marketing Officer, and Simon Bennett, Senior Director of Product Management, sat down with us this week and discussed all things GWOS.  No news this week, so enjoy the interview.

Contact Us:

linuxinstall – on Twitter.com and Identi.ca

E-Mail us podcast@linuxinstall.net

Call us by going to https://linuxinstall.net/podcast/ and clicking on the google voice button on that page.

Watch for our review of the product in a couple of weeks.


Groundwork OpenSource the little monitoring engine that could…

So you want to monitor your network, you don’t have a lot of time to learn how to set up Nagios, and you have no budget for either consultants or off-the-shelf software.  What do you do?  One of your options is to use GroundWork Open Source, or GWOS.  GWOS is a group of scripts that wrap the Nagios open source monitoring tool, Cacti, MRTG and some other tools they have developed on their own with a pretty simple GUI.  As a super jump start, I set up, and highly recommend, the VM that you can get from their website.  This VM makes quick work of the setup portion of getting the software up and running.  If you are planning on monitoring more than a few hundred devices, this VM solution will likely not work optimally.  That isn’t a reflection on virtualization technologies or Groundwork, but a reality that current storage and hardware technologies have difficulty writing large quantities of small data points to disk efficiently.  I set this up with about twenty hosts, seven of them over a WAN/VPN connection to simulate a remote office.  Here are Joe’s and my impressions of the whole process.

One of the things we look for in a solution like this is what it shows us in the way of a network map, as well as how well it informs us of what should be monitored.  The first thing that impressed us about Groundwork was its use of the Nmap program (nmap.org) to identify what OS, ports and applications were in use on the machines on the network.  The auto discovery found all devices, and was able to identify all but one machine’s OS.  It then went the next step and configured the machines that had SSH but not SNMP so that we could monitor them over SSH.  The machine it did not identify the OS of, or set up tests/monitors for, was my 24 port network switch, which does not appear to look like any OS on its web interface.  This was the only miss in this part of the tool, and it is really minor.  The discovery did not map anything out, however.  All of the devices initially looked like they were directly connected to the monitoring server.  This was easy to fix by setting up some associations that identify parent and child devices.  A parent device is one that one or more other devices depend on.  Our network switch is a parent device to a VMware server, which is in turn a parent to the VMs it hosts.  The interface makes what is a tedious process in Nagios faster and more efficient.  Once you set all of the associations, the maps draw themselves into a clear and easy to understand drawing of where your dependencies are.  These associations do more than create informative maps; they also tell the alerting parts of Groundwork when to ignore false alarms caused by events like downed switches or internet links.
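To make the parent/child idea concrete, here is a small sketch of the suppression logic as we understand it: a down device only raises an alert if none of the devices it depends on are also down.  This is our own simplified illustration, not Groundwork’s or Nagios’s code.  The function name and host names are hypothetical, and each device is assumed to have at most one parent (Nagios itself allows several, and reports such hosts as unreachable rather than down).

```python
def alerting_hosts(down, parents):
    """Return the down hosts that should actually raise an alert.

    down:    set of host names currently failing their checks
    parents: dict mapping each host to its parent host (or None)

    A host is suppressed when any ancestor is also down, because the
    ancestor's outage already explains why the host is unreachable.
    """
    def has_down_ancestor(host):
        seen = set()  # guards against accidental loops in the parent map
        current = parents.get(host)
        while current is not None and current not in seen:
            if current in down:
                return True
            seen.add(current)
            current = parents.get(current)
        return False

    return sorted(host for host in down if not has_down_ancestor(host))


# Hypothetical topology mirroring the one described above.
parents = {
    "switch": None,
    "vmware-server": "switch",
    "vm-web": "vmware-server",
    "vm-db": "vmware-server",
}
```

With this map, a dead switch that takes the VMware server and a VM down with it produces a single alert for `switch`, while a VM failing on its own still alerts normally.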

So once we got the basics working, we started trying to get the alerting working.  Unfortunately, things like my Droid and iPod regularly go on and off my network.  So the first day or two of working with the alerting was painful, but that was more my own fault than the software’s.  Once that was all sorted out, things started to hum.  The dependency checking worked as expected.  When I dropped the VPN link between Joe and me, the only alert we received was for the firewall that handled the link.  None of Joe’s devices alerted.  Once the link was restored, the checks restarted and everything was updated.

All in all, it acted and reacted as expected.  The only real issues were related to the UI and a need for better testing.  Joe attempted to name a device Joe’s Desktop through the interface.  Groundwork accepted the illegal ’ character until he tried to save the device.  At that point all of the information about the device disappeared.  We attempted to delete the device, so we could re-add it, in multiple places in the UI, and none of them seemed to work.  While looking for something else, I found a 3rd or 4th place to delete devices, which actually worked and let us save.  There are some minor user interface glitches that, while annoying, are not show stopping: things like tabs that do not work when clicked on some screens but do on others.  These are just minor annoyances and not major issues.

If you are looking for a nice tool that is easy to use and free, unless you want to purchase support from them, this should be on your short list of systems to review.  I would probably suggest purchasing support for any of our recommended and reviewed systems, if available, for at least the first year to get these issues corrected.  The virtual machine they offer is perfect for setting up a quick proof of concept.  New users of the system should expect at least 8-16 hours of effort to get the machines to the level where they are presenting useful alerts and data.  If you plan on measuring a large number of devices and software products with this tool, running the system on bare metal would be my recommendation.  The problem with any monitoring solution is the amount of data being written to or read from the database.  So go out, download it and give it a try.  The faster you get monitoring, the faster you and your admins will be able to get a good night’s sleep.

Pre-Built Virtual Machines will Let you JumpStart your next project…

Are you thinking about using Joomla, Tomcat, or a traditional LAMP stack?  Do you need it up literally tomorrow so you can start developing a solution?  Don’t worry, there are solutions out there to get you up and running with a base configuration in less time than it will take you to read this article.  This article on itworld.com does a great job of covering two solution providers.  Getting servers up can’t be any easier than with the solutions provided by both Jumpbox and Turnkey Linux.  Our experience so far has been that these pre-built servers provide an awesome jump start and save hours and days of reading documentation about how to set up the software.  Very often the control panels that these two companies include can help even teams of people with minimal Linux skills to get systems up and running.  All of these machines should run in VMware, VirtualBox, Parallels and any other technology that supports the virtual disk/system standards.  Our upcoming articles about Zenoss and Groundwork wouldn’t have been possible if I had installed the systems from scratch.  The use of their virtual machines let me get each system up and running in about an hour.  I didn’t need to set up MySQL, Apache, or anything else; I just downloaded it and started it up on my VMware server host.  So the next time you are planning a project, check out these great companies and the solutions they offer before you spend too much time trying to figure stuff out before you know you have chosen the right software.

Are you monitoring your servers and Network?

In the last CTO-Brief we discussed building and managing a large number of servers.  The general response we received on reddit, LinkedIn, Twitter, and in e-mail was that the article was informative but overlooked monitoring.  Let me assure you that we did not leave monitoring out by accident.  We thought it was too large a topic for one article.  Everyone who criticized us was absolutely right in saying that once you build it, you then have to monitor it.  The reasons you need to monitor are pretty simple.  Here is our list of the top five reasons for monitoring:

  1. Keeping Customers Happy – You cannot fix what you do not know is broken.  Unless you are monitoring, you will have to rely on customers to tell you when something is down.  When you do have an outage, being able to tell your customers that you are already aware and working on the problem builds their confidence in your abilities to administer the systems.
  2. Proving that you are an AWESOME administrator and/or Administration Team – I have had more than one Director of Operations tell me that we need to “tell the story” of how good we are.  Unless you can demonstrate with data and confidence that you are meeting the Service Level expectations of your customers, there really is no story to tell.
  3. Getting a restful night’s sleep after a major release or update to your systems – If you are monitoring and trust those systems to do their jobs, then sleeping is easy the night of a big deployment or upgrade.
  4. Performance Management – Knowing when to buy that next system, or when to shut down a server or two, is better shown with data than without.  Getting new machines approved is far easier when you can show managers a graph of how the use of a system is growing and needs to be scaled to the next level.  If your plans include a migration to a virtual infrastructure, monitoring lets you easily pick off the first candidates for virtualization.  The machines with the least used CPUs and memory can be the ones to set your sights on.
  5. Troubleshooting Application Issues – Both performance management and troubleshooting benefit from being able to see what was going on when the problems occurred.  Looking at a set of pretty graphs can save hours of hunting for errors in logs and keep you from running down the wrong path, which gets you to a speedier resolution.
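As a rough sketch of the virtualization point in reason 4, here is one way to rank machines by how lightly they are used.  The host names, sample readings and scoring rule (the larger of average CPU and average memory utilization) are all assumptions made up for this example; real numbers would come from whatever monitoring tool you run.

```python
from statistics import mean


def virtualization_candidates(samples, limit=2):
    """Rank hosts from least to most utilized and return the first few.

    samples: dict mapping host name to a list of (cpu_percent, mem_percent)
             readings collected over time.
    A host is scored by the larger of its average CPU and average memory
    use, so a machine has to be quiet on both to rank as a candidate.
    """
    scores = {}
    for host, readings in samples.items():
        if not readings:
            continue  # no data yet, so no basis for a decision
        avg_cpu = mean(cpu for cpu, _ in readings)
        avg_mem = mean(mem for _, mem in readings)
        scores[host] = max(avg_cpu, avg_mem)
    # Tie-break on host name so the ordering is stable.
    return sorted(scores, key=lambda host: (scores[host], host))[:limit]


# Made-up readings for four hypothetical servers.
samples = {
    "web01": [(70, 60), (80, 65)],
    "build01": [(5, 20), (7, 22)],
    "dns01": [(2, 10), (4, 12)],
    "db01": [(40, 85), (45, 90)],
}
```

Averages hide peaks, so in practice you would also want to look at the busiest periods before moving a machine.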

So now we know why to monitor.  Next we need to know what to monitor.  To do that, we need to know what our goals and priorities are for monitoring.  The goals for monitoring do not tell you much about which tools to use, but they do tell you how far you need to go.  For instance, if all you want to monitor is whether a server is up and functional, your monitoring needs are far smaller than if you want to monitor down to the application level.

The many open source options in the area of monitoring generally gather information in one of two ways.  The first is by use of the Simple Network Management Protocol, or SNMP.  The second is with a software agent, which is usually proprietary to the monitoring software.  The more advanced systems can sometimes take a hybrid approach of both.  There are advantages to both approaches.  SNMP consumes very few resources and is supported by nearly every network device and operating system.  If not configured properly, though, it can be extremely insecure.  Where security is concerned, agents are not guaranteed to be any better.  What they do offer is tighter integration between the client and hosts.  One drawback to an agent can be the additional system resources that it consumes, but this depends on the agent in question.

In our next article we will delve deeper into one monitoring project called Nagios, which is the base of several other pieces of monitoring software.  Nagios is a wonderful open source project that is amazingly feature complete.  Some of the most useful features are system templating, hours of operation for alerting, outage windows, escalation paths and reporting.  The big complaint with it, though, is how painful it is to configure.  It is not overly complicated, but setup can be very tedious.  To address this, several different projects have created web based user interfaces that abstract the configuration into an easy to use system of templates and other tools to make life with Nagios as close to perfection as possible.  These tools generally incorporate other tools with painful configuration files, like MRTG and Cacti, for performance and usage graphing.  Both of these reporting packages are awesome projects that we have used on numerous projects to show off all kinds of facts about system performance and usage.
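To give a feel for why the configuration gets tedious, and why generating it is tempting, here is a small sketch that renders Nagios host definitions from a short inventory.  The `define host` block with the `use`, `host_name`, `address` and `parents` directives is standard Nagios object syntax.  The `render_host` helper, the `generic-host` template name and the inventory entries are assumptions made up for illustration.

```python
def render_host(host_name, address, template="generic-host", parents=()):
    """Render one Nagios host object definition as text.

    `use` inherits settings from a host template, which is what keeps the
    per-host definition short.  The default template name is an assumption;
    substitute whatever template your installation defines.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {host_name}",
        f"    address    {address}",
    ]
    if parents:
        # Parents drive the network map and the suppression of alerts for
        # hosts that sit behind a device that is down.
        lines.append(f"    parents    {','.join(parents)}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical inventory: (name, address, parents).
inventory = [
    ("core-switch", "192.0.2.1", ()),
    ("vmware01", "192.0.2.10", ("core-switch",)),
    ("web01", "192.0.2.20", ("vmware01",)),
]

config_text = "\n\n".join(render_host(*entry) for entry in inventory)
```

Writing `config_text` to a file in your Nagios configuration directory is left out here, since paths differ between installations.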

In the future we plan to review Zenoss, GroundWork, and Hyperic HQ.  We know this is not a complete list, but we think it is a pretty good start.  Is there one you think we are crazy to leave off?  If so, please let us know in the Comments.