Senior systems administrators on any platform know that automation is the single fastest way to improve the effectiveness of their team. Scripts provide stability and repeatability, and they reduce the time spent on frequently repeated tasks. Done correctly, automation makes everything more stable and manageable.
However, scripts for managing systems can be a double-edged sword. On one hand, they make a team highly efficient. They can help junior admins perform far above their experience level and free senior admins to investigate more difficult problems. On the other hand, they can lead to a loss of knowledge: the knowledge it took to create the scripts becomes locked inside them. So how do you strike the proper balance? How can you keep the knowledge fresh in everyone’s mind while still automating? What steps can be taken to avoid knowledge erosion and, worse, the brain drain or vacuum that is left when people leave?
The first thing to remember is that no single measure answers all of these questions. Here we offer some tips and ideas that we have found to be useful and effective. This is a short list, and we hope that it will inspire you to think about what might work for you and your company.
The first item is well-documented scripts and procedures. Taking five minutes to write up what you were thinking when you wrote the script can save you days trying to figure it out later. As object-oriented scripting languages like Python, Ruby and Perl take hold, it becomes easier to break complex scripts down into smaller, more digestible chunks. These chunks, in keeping with the core ideas behind Linux, should do one thing and do it well. The names of the functions should describe what they do. For instance, a function called createNewSSHKeys should probably create new SSH keys. This, combined with an explanation inside the function of what you were trying to do, will help you and others manage your scripts. When you get really good at this way of thinking, people should be able to take your function calls and write a manual procedure that could replace your automation. If that is your goal, then it makes sense to start with a well-documented procedure that you can compare against when you’re done scripting. It is unlikely that every procedure step will match a function or series of function calls, but getting everything close does count.
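As a minimal sketch of this idea in Python, the example below splits the createNewSSHKeys example into two small functions whose names and docstrings say what they do and what the manual equivalent would be. The key type, the empty passphrase, and the stated reason for creating the keys are assumptions made for illustration, not a recommendation.

```python
import subprocess
from pathlib import Path


def build_ssh_keygen_command(key_path: Path, comment: str) -> list:
    """Return the ssh-keygen command that creates an Ed25519 key pair.

    Manual equivalent:
        ssh-keygen -t ed25519 -f <key_path> -N "" -C <comment>
    The empty passphrase (-N "") is an assumption for unattended use.
    """
    return ["ssh-keygen", "-t", "ed25519", "-f", str(key_path),
            "-N", "", "-C", comment]


def create_new_ssh_keys(key_path: Path, comment: str) -> Path:
    """Create a new SSH key pair and return the path of the public key.

    Why (hypothetical): rebuilt hosts get fresh keys instead of
    inheriting the keys of the machine they replace.
    """
    # Make sure the target directory exists before ssh-keygen writes to it.
    key_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ssh-keygen fails.
    subprocess.run(build_ssh_keygen_command(key_path, comment), check=True)
    # ssh-keygen writes the public key next to the private key, with ".pub" appended.
    return key_path.with_name(key_path.name + ".pub")
```

Someone reading only the function names and docstrings could write the manual procedure from them: build the command, make sure the directory exists, run ssh-keygen, and note where the public key ends up.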
As much as self-documenting scripts help, documenting the configuration files for your scripts can also keep things fresh in people’s minds. At the very least, if done correctly, it gives them a breadcrumb trail to follow to check that what they think is being set is actually set. We recently began testing Puppet, a tool that automates the management of server configuration files and other admin-related tasks. The configuration files for Puppet are a great example. They allow you to use a combination of intelligent names and comments to tell the person reading the file what will be changed. They also include a description of where to look to verify that the changes are being made correctly. This means that I don’t need to know Ruby, the language Puppet is written in, to figure out what it’s going to do or how. The configuration file itself tells me everything I need to know. When you write your own scripts, the time it takes to go this far may not be warranted. So at the very least, make sure that you have comments that tell people what the settings in the file mean or where to look for the output they produce.
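For a home-grown script, the same habit might look like the sketch below. It is written in Python with the standard configparser module and is not Puppet syntax; the log-rotation script, its setting names, and the paths are all hypothetical. Each setting carries a comment saying what it means and where to look to verify it.

```python
import configparser

# A documented configuration file for a hypothetical log-rotation script.
# It is kept in a string here so the example is self-contained.
DOCUMENTED_CONFIG = """
[rotation]
# Number of days to keep rotated logs before they are deleted.
# Verify: the oldest file in archive_dir should be no older than this.
keep_days = 14

# Directory that receives rotated logs.
# Verify: list this directory after the script has run.
archive_dir = /var/log/archive
"""


def load_rotation_settings(config_text: str) -> dict:
    """Parse the [rotation] section and return its settings as a dict."""
    parser = configparser.ConfigParser()
    # Lines starting with '#' are treated as comments and ignored by the parser.
    parser.read_string(config_text)
    section = parser["rotation"]
    return {
        "keep_days": section.getint("keep_days"),
        "archive_dir": section["archive_dir"],
    }


if __name__ == "__main__":
    print(load_rotation_settings(DOCUMENTED_CONFIG))
```

The comments cost nothing at run time, yet they give the next reader the breadcrumb trail: what each value controls and where to check that it took effect.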
Try to keep everyone’s skills sharp so they are ready to slice through problems as they arise. This also means internal training. One of the things we have participated in on a regular basis is a short, one-hour refresher put on by the subject matter expert (SME) for each of the technologies we use. Doing this accomplishes a few different things at once. It helps the SME keep their documentation current. It gives the SME an opportunity to share changes they want to make or have made in the environment. Finally, it gives everyone supporting the environment a chance to ask questions about the technology when there is no pressure. When possible, annual reviews of each area that a team supports go a long way towards elevating the team’s ability to be as productive as possible.
While you can never completely prevent brain drain when a team member leaves, the steps above, if done correctly, can go a long way. Having been the people transitioned to more than once, we have found that the better these steps are followed, the better we feel about taking on the responsibility. Another side effect of these approaches, and others along the same lines, is that they allow people to migrate from one SME area to another. This helps people stay fresh and keeps them from becoming bored and complacent. The more driven your team is to solve business problems, the more profitable you will be.