I work at a dot com, and part of our team's responsibility is to maintain the production web application and server farm. Our department was created only recently, and we now have a huge amount of catching up to do: patching servers and implementing monitoring and backups.
To start on this monster we've broken it down into phases. As part of our first phase, we are reinstalling the OS on several servers to update them from old Red Hat 8 (not Fedora 8) installs. Because this is a webapp, the servers need to run Apache and PHP. The modules that need to be compiled into these programs are documented, and an old build process for compiling them is documented.
As sys admins, what do you guys out there expect to have documented, and what should you be documenting? Since both the build process and the documentation need to be updated, what is the best way to go about laying out the items that need to be done? Should defining the steps be part of the sys admin's job, or part of the technical manager's job? Is this part of what qualifies someone as a "senior unix engineer" vs a junior engineer? What standard would you want to be held to when your performance on a project like this is evaluated, if it would affect your performance review?
Edit: The application is under continuous development. A majority of it was written in PHP4 and continues to run on PHP4; however, newer code running as a web service runs on PHP5. So the same boxes have both a PHP4 and a PHP5 installation. The modules required for each build are documented. The sysadmin has that doc.
If it's a unique problem, how can you tell whether the difficulty lies with the person or with the problem itself?
You should be documenting everything that would be required to get your department running if half your people are killed/fired/etc. If you needed to rebuild the department with new admins, they should be able to get things running again at a new location with your documentation.
In practice...hee! Yeah, right. You're lucky if the docs are kept up to date if they're even created in most places.
If you're managing the monster tasks perhaps you need to just meet up with your admins and ask how things are going and what's been tried. If in this three weeks he's been tasked with just this problem and it's not getting solved, is it because he's not working on it? What has he tried to rectify the issue?
You can't micromanage the issue or he'll probably start fighting you on it. A sysadmin needs enough freedom to work without feeling like he's being scrutinized at every step. But if the project or task is really far behind, then you have a legitimate concern. Find out from him if there's something he needs in order to get the job done, or what the problem is that he is having difficulty overcoming.
Good book: Managing Humans by Michael Lopp.
Performance should be based on how well IT issues are addressed to meet the needs of users, along with maintenance of the servers and infrastructure issues. You can't possibly reduce the issue down to "solving X issues a day" or "writing X lines of code" to measure each employee.
Maybe you can get input from others on the team to get some feedback on how each other is doing or what major needs are. Good techies want to work with good techies. They don't want to work with people that are "happy and nice" but incompetent. They'll work with a grumpy curmudgeon who hates being in the room with them if it means that everything works well and the curmudgeon knows his stuff.
Old Stuff (Legacy) Can be Hard:
If I read correctly, you have old builds of software and are trying to get them running on recent OS builds. Red Hat 8 is 7 years old now, so I would say the application should be updated too (maybe these modules haven't been updated since then). So it sounds like a difficult mess, as you say.
Documenting and Expectations:
It depends, but you really should lay out what you expect in general. Make everything you want very clear. Then you should be able to trust the admin to follow through with that and update you if they can't for some reason. You can check in with them, and make sure they are doing this stuff. System administration is odd in that it varies greatly from position to position, so it might take some time to get them to understand what you expect from them.
My Recommendation, Communicate!:
I think we can't tell you if these are hard problems or not. Developers should not be that far off from system administrators, so if you are having issues, get a developer you trust to sit down with the admin and help him solve these problems. That developer should be able to give some feedback.
Regarding updating Everything:
Some thoughts that may or may not be useful:
I'd say that if your sysadmin can't get a custom OS installation completed after 3 weeks, either he/she is incompetent or else you're somehow confusing him/her, thus resulting in endless delays. In the scenario that you described, a basic/foundation workflow should be: management and/or deployment team comes up with a list of requirements and dependencies. The requirements would include timeframe, scalability, fault tolerance, robustness, availability thresholds, etc. Dependencies would cover what applications need to run on the server and, optionally, what software is required to support those applications. The sysadmin could possibly handle the latter unless you had very specific, known needs regarding software and software versions. Either way, it should all be documented, with approval processes in place so that the "guy down the hall" can't make changes behind people's backs and end up messing with the sysadmin's workflow and expectations. Once all the information is given to the sysadmin, he/she should be able to provide a more or less solid time estimate.
From what you've said, it sounds like this person isn't even testing the builds to see if everything works. In an ideal environment, a set of test scripts would be in place so that a build can be verified as correct or not by running said scripts. They would verify not only functionality but also whether or not the right software versions have been included (this includes system and application libraries). In larger environments, it's not uncommon to have an entire team devoted to performance testing, as well, so that once a server and its installed apps have been deployed, you can be sure that it will function and scale as well as, if not better than, in a lab or staging environment. That's another thing: a staging environment is key. You could have policies in place that require that servers transition from a lab environment to a staging environment and finally to a production environment.
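As a rough illustration of the kind of test script described above, here is a minimal Python sketch that checks a build against a documented module list. It assumes the installed modules are available as text in the format printed by `php -m` (section headers in square brackets, one module name per line). The sample output and the required list below are hypothetical stand-ins, not the asker's actual requirements.

```python
def parse_php_modules(php_m_output):
    """Return the set of lower-cased module names listed in `php -m` style output."""
    modules = set()
    for line in php_m_output.splitlines():
        name = line.strip()
        # Skip blank lines and section headers such as "[PHP Modules]".
        if not name or name.startswith("["):
            continue
        modules.add(name.lower())
    return modules


def missing_modules(required, php_m_output):
    """Return the required modules that the build does not report, sorted."""
    installed = parse_php_modules(php_m_output)
    return sorted(m for m in required if m.lower() not in installed)


# Hypothetical sample output; a real check would capture this from the php binary.
SAMPLE_OUTPUT = """[PHP Modules]
ctype
mysql
pcre
session
xml

[Zend Modules]
"""

# Hypothetical requirement list, standing in for the documented module list.
REQUIRED = ["mysql", "gd", "xml", "curl"]

if __name__ == "__main__":
    print(missing_modules(REQUIRED, SAMPLE_OUTPUT))
```

With both a PHP4 and a PHP5 build on the same box, the same check could be run once per binary against that build's own list. An empty result would mean every documented module is present; it says nothing about whether the modules actually work, so functional tests would still be needed.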
I don't mind if a sysadmin takes time to carefully study things so that when a server is put into production, it works perfectly. I used to know a guy who did that. It wasn't that he was incompetent; rather, he was aware of the seriousness of failed deployments, and so he took a little extra time to make 100% sure that everything was kosher. His reputation so far is nearly impeccable, and I'd recommend him to any system administration team. However, repeated slip-ups on trivial tasks should raise orange (not yet red) flags. A basic sysadmin should know his operating systems and commonly used application libraries, so that when it comes time to build a system, there are very few questions in his/her mind about which OS to use and which libraries and applications to deploy. As far as a custom server build for a set of custom applications, it would take me about 1-2 days to get the base installation and configuration (plus performance tweaks, hardening, etc.) completed. After that, it would depend on what needs to get installed. The greater the number of software requirements, the more time it's going to take to build, install and test, and maybe that's what's holding up your sysadmin. I can't say that for sure, though, since you didn't provide enough information.
I hope that helps.
Michael
Great answers above. I would especially like to stress this point from Bart's post:
This is absolutely vital for some business practices and it should be a requirement, not an option. What if "the only one who knows vital system XYZ" quits on you, or has to be fired? People are people -- these things do happen. Document major systems and processes, any special requirements, caveats, which servers are responsible for what. That's at least the basics -- most decent admins will figure out the smaller details as part of their job.
However, as echoed above, in 'real life' you would indeed be lucky to have these documents created, much less current and correct. IMO it's worth it to pull an admin off the project and task him with catching up on its documentation, if that is possible.
Hope things work out.
The guy is probably freaking out because it sounds like your IT environment is a nightmare, based on your brief explanation of how things work.
I'd be willing to bet a nickel that the instructions that your SA is getting from the devs/business unit-type people are terrible as well. Get somebody to sit between the people submitting requests and the guy doing the work. Let them reject the requests that don't make sense and then document what's being done.
Einstein said: "Insanity is doing the same thing over and over again and expecting different results"
I've done a lot of sysadmin work for startups, and I have to say that old documentation is worse than no documentation. I can't count the times I looked over the existing system documentation for some idea of how things are stitched together only to find that the system has been totally re-architected.
This situation usually arises when a sysadmin leaves the company and their last task is to document the systems. With one foot out the door, the quality of the information generated is often poor. And if the sysadmin isn't replaced right away (the usual case), the systems are usually managed by the least competent and/or most junior developer (since he has the time). Which means the systems can get out of sync, undocumented, and -- in the worst case -- vary from machine to machine (a real pain with a cluster of web apps where one is different from the others).
I abhor wiki-syntax, but I like system documentation to reside in a wiki so I at least have a time-stamp and a name of who documented what and when. A MediaWiki installation is trivial to set up and perfect for system stuff.
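One way to catch machines that have drifted apart is to compare a small inventory per host. The following is a minimal sketch, assuming you have already collected a mapping of software name to version string for each server by whatever means you prefer. The host names and version numbers are made up for illustration.

```python
def find_drift(inventory):
    """Map each item to its per-host values when the hosts disagree.

    `inventory` maps host name -> {item name -> version string}.
    An item missing on a host is reported with the value None.
    """
    all_items = set()
    for items in inventory.values():
        all_items.update(items)

    drift = {}
    for item in sorted(all_items):
        per_host = {host: items.get(item) for host, items in inventory.items()}
        # More than one distinct value means at least one host differs.
        if len(set(per_host.values())) > 1:
            drift[item] = per_host
    return drift


# Hypothetical inventory; real data would come from the servers themselves.
INVENTORY = {
    "web1": {"apache": "2.2.3", "php5": "5.2.6"},
    "web2": {"apache": "2.2.3", "php5": "5.2.6"},
    "web3": {"apache": "2.2.3", "php5": "5.1.6"},
}

if __name__ == "__main__":
    for item, per_host in find_drift(INVENTORY).items():
        print(item, per_host)
```

Pasting the report into the wiki page for the cluster would give the next admin a dated record of which box is the odd one out.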
As for how good your sr. sysadmin is, it's hard to say. A lot of us suck and a lot of us fade into the background just doing our job. And we all have our bad days.
Not long ago I spent an insane amount of time (like days) trying to get Ganglia to compile on a 64-bit machine only to find it was a bug in the linking. I'm sure I looked like a complete idiot to those folks...
Most sr. sysadmins are pretty good coders, in my experience. Figuring out compile options to get things to work shouldn't be a problem, unless it's something non-obvious. It sounds like your sysadmin has everything needed to do the job, but the devil is in the details.
My advice -- be direct and ask what the problem is. And check out the "Managing Humans" book someone else suggested -- it's very good.