Managing Large Numbers of Linux Systems

So you have seen the power and stability of Linux and are ready to get your feet wet with the little Penguin.  Your management is sold and has started buying more and more Linux servers.  How do you manage and control this growth?  Where should you focus your efforts first?  Do you focus on building servers fast, or is managing your configurations the most important task at hand?

In our opinion, your end goal should be to build, manage and monitor all of your servers with an automated process via a series of scripts and applications.  To decide the order in which to accomplish this, you need to determine why you are growing.  If you are growing because development efforts on Linux are in full force, you will probably want to focus on building servers fast.  If you are growing because your production servers are getting large amounts of traffic, then you should probably focus on both building servers and managing your configurations first.

How do you build a server really fast?
On the free side of things, we recommend the Red Hat created installer, Anaconda, and its Kickstart feature.  Kickstart lets you write a text file that describes almost everything about a system.  When the installer is invoked with that file, it creates a complete system with all of the packages you want installed and configured.  Fedora and the other Red Hat derived distributions support Kickstart natively, and Ubuntu (which uses the Debian package system) has offered Kickstart compatibility in its installer.  Other RPM-based distributions such as openSUSE and Mandriva provide comparable automated installers of their own (AutoYaST, in the case of openSUSE).  (More detailed information can be found in the Anaconda and Kickstart documentation.)  If you have a system you want to clone or use as a base system, let Anaconda produce the Kickstart configuration file for you: the installer normally leaves a Kickstart file describing the finished install in the root user's home directory (normally called anaconda-ks.cfg).  If you take this file and change the machine-specific information, like the host name and IP address, you can create a new system.  Combine that with either a PXE boot setup or an installer command-line argument that points to the configuration file's location, and the new machine will be set up for you.  Normally you will set up a few templates of key system types: for instance, one Kickstart file for web servers, one for database servers and one for desktops.
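
As a rough illustration of the template idea, here is a minimal Python sketch that fills machine-specific values into a Kickstart fragment.  The template text, the role names, the package lists and the render_kickstart function are all hypothetical and heavily trimmed; a real anaconda-ks.cfg contains many more directives, and you would start from the file your own installer generated.

```python
from string import Template

# Hypothetical, trimmed-down Kickstart template. A real anaconda-ks.cfg
# contains many more directives (partitioning, root password, and so on).
KS_TEMPLATE = Template("""\
# Kickstart for role: $role
network --bootproto=static --ip=$ip --netmask=$netmask --gateway=$gateway --hostname=$hostname
%packages
$packages
%end
""")

# Assumed package lists per system type; adjust to your own templates.
ROLE_PACKAGES = {
    "web": ["httpd"],
    "db": ["postgresql-server"],
}


def render_kickstart(role, hostname, ip,
                     netmask="255.255.255.0", gateway="192.0.2.1"):
    """Return Kickstart text for one machine of the given role."""
    packages = "\n".join(ROLE_PACKAGES[role])
    return KS_TEMPLATE.substitute(
        role=role, hostname=hostname, ip=ip,
        netmask=netmask, gateway=gateway, packages=packages,
    )


if __name__ == "__main__":
    # Example values only; the addresses come from a documentation range.
    print(render_kickstart("web", "web01.example.com", "192.0.2.10"))
```

Keeping one template per system type and generating the per-machine file at build time means the host name and IP address never have to be edited by hand.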

If you prefer to use disk images, similar to the old Ghost program from Norton (Symantec), then take a look at the Clonezilla project.  This project started in the educational arena and is used by a fair number of K-12 schools and colleges.  It has the advantage of being able to manage both Linux and Windows images.  The install speed is similar to Anaconda's, and Clonezilla also has OS plug-ins that allow you to configure each system with its unique information.  If you happen to be using VMware, it has built-in cloning and templating that work in a very similar way, with the same limitations.  The main downside to this system, and to any of the other disk cloning systems, is that to update a piece of software you must rebuild the image and then re-clone the entire system.  With Anaconda, by contrast, as long as the packages in the package repository are up to date, the system will be built with them.  This means no additional steps are required to bring the system up to the latest patches after building.

On the paid side of the equation, the product that seems to be leading the pack is Novell's ZENworks.  It can use snapshots (images) or do an Anaconda-derived install.  It will allow you to manage the packages and configurations on both SUSE and Red Hat Linux machines.  The software can also set up and manage DHCP and PXE boot servers.  Together, these two server types allow you to place a system on your network, assign the new machine to a template type and grouping, and have the server built from scratch when it boots, with no assistance from a person after switching on the power.  The software works well and is easy to configure and use.  An agent runs on each machine so you can manage its configuration after the install.  This agent can be configured to alert on most of the common system problems, like low disk space and high CPU load.  In this role, it works best as a feeder system into a more robust logging and alerting system.
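
To make the agent-style checks concrete, here is a small Python sketch of a low-disk-space and high-load test.  It is not how the ZENworks agent is implemented; the function names and the thresholds (10% free space, a load of 2.0 per CPU) are assumptions chosen only for illustration.

```python
import os
import shutil


def evaluate(free_fraction, load_1min, cpu_count,
             min_free=0.10, max_load_per_cpu=2.0):
    """Return a list of alert strings; the thresholds are assumed defaults."""
    alerts = []
    if free_fraction < min_free:
        alerts.append("low disk space: %.0f%% free" % (free_fraction * 100))
    if load_1min > max_load_per_cpu * cpu_count:
        alerts.append("high CPU load: %.2f" % load_1min)
    return alerts


def check_local(path="/"):
    """Collect the local numbers and evaluate them (Unix-like systems only)."""
    usage = shutil.disk_usage(path)      # named tuple: total, used, free
    load_1min = os.getloadavg()[0]       # 1-minute load average
    return evaluate(usage.free / usage.total, load_1min, os.cpu_count() or 1)


if __name__ == "__main__":
    # A real agent would forward these to a logging or alerting system.
    for alert in check_local():
        print(alert)
```

Separating the threshold logic from the data collection makes it easy to feed the same checks into whatever central alerting system you already run.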

How do I keep all of my servers' configurations complete and consistent?
On the paid side, I believe the best choice is the Novell ZENworks product.  Several others exist, but the cost per machine is much steeper and they generally do not offer any additional features.  Several companies have gone so far as to simply package one of the two free configuration tools described below and rebrand it as their own.

On the free side, the two leaders for configuration file management are CFEngine and Puppet.  Both offer a framework of files, the flexibility to automate nearly any task, and agents that audit each system and verify that everything stays consistent after the initial install.  If they are so similar, then what is the difference?  The main difference is the syntax of the input, or configuration, files.  Having played with both formats, I found the Puppet team's software much easier to work with and faster at getting to the point of configuring systems.  Both have tutorials and both work well once they are configured.  Both can also be set up so that a central server defines what the configuration should look like, and each system is then observed, validated and, if needed, corrected against that definition.  Once you have the software set up, it will quickly become both your auditors' dream and your savior.  Being able to show the auditors that a changed file will not simply stay changed will make even the grumpiest of them at least a little happier.  This type of system builds a tremendous level of confidence within your development and management ranks.
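
The observe, validate and correct cycle described above can be sketched in a few lines of Python.  This is only an illustration of the idea, not how CFEngine or Puppet is implemented; the converge function and the sample file are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def converge(path, desired, enforce=True):
    """Compare a file with its desired content.

    Returns "ok" if the file already matches, "drifted" if it differs and
    enforce is False (audit only), or "corrected" after rewriting it.
    """
    target = Path(path)
    current = target.read_text() if target.exists() else None
    if current == desired:
        return "ok"
    if not enforce:
        return "drifted"
    target.write_text(desired)
    return "corrected"


if __name__ == "__main__":
    # Demonstrate against a scratch file rather than a real system file.
    with tempfile.TemporaryDirectory() as scratch:
        motd = os.path.join(scratch, "motd")
        print(converge(motd, "Authorized use only\n", enforce=False))
        print(converge(motd, "Authorized use only\n"))
        print(converge(motd, "Authorized use only\n"))
```

Running the same check on a schedule is what keeps a hand-edited file from staying changed: the next pass either reports the drift or puts the file back.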

How long does it take to set all this up?

That really depends on the choices you make and your knowledge of the tools.  People new to systems like this will generally take a day or two to get the software installed and make a first attempt at building a server.  Getting to the state of complete management of all systems takes time and will depend on where you are in the system life cycle.  Time spent thinking things through when you are starting out will pay for itself in weeks or months, depending on the rate at which you are building.  Keeping it current after that is generally simple.

Conclusion
Managing your systems with these tools and some simple scripts reduces staffing needs in the long run while simultaneously increasing stability and consistency.  The bulk of what you spend on these systems will go into the initial setup and configuration.  Once the majority of the servers are incorporated into the system, the number of changes will drop tremendously.  Even a server count as low as ten is more than enough to get a fast return on investment.

Give us your feedback in the comments by answering any of the following questions:
So what's your favorite system management tool?
Why do you prefer it?
What did we miss?



Brian Wagner

Brian started working with *nix while a student at Kent State University in the early '90s. In 1995, as an E-Mail Administrator for Caliber Technology (now part of FedEx), he was tasked with administering Sendmail on both Slackware Linux and Solaris systems. His first home install of Linux was MkLinux DR1 in 1996 on his 60 MHz PowerMac. Since then Brian has been working and consulting on Linux and its uses in the enterprise to support everything from e-mail, firewalls, and web and file serving to custom cluster and grid solutions. Brian has had the opportunity to work in both Fortune 500 companies and small two-person organizations. This has given him unique insight into the different challenges that businesses of every size face.