After months of neglect, e-mail flame wars and management battles, our sysadmin was fired and handed "the server credentials" over to me. Those credentials consist of a root password and nothing else: no procedures, no documentation, no tips, nothing.
My question is: assuming he left booby traps behind, how do I gracefully take over the servers with as little downtime as possible?
Here are the details:
- one production server located in a server farm in the basement; probably Ubuntu Server 9.x with grsec patches (rumours I heard the last time I asked the admin)
- one internal server that holds all the internal documentation, file repository, wikis, etc. Again Ubuntu Server, a few years old.
Assume both servers are patched and up to date, so I'd rather not try to hack my way in unless there's a good reason (i.e. one that can be explained to upper management).
The production server hosts a few websites (standard Apache-PHP-MySQL), an LDAP server, a Zimbra e-mail suite/server and, as far as I can tell, a few VMware virtual machines. No idea what's happening in there. Probably one of them is the LDAP master, but that's a wild guess.
The internal server runs an internal wiki/CMS, an LDAP slave that replicates the credentials from the production server, a few more VMware virtual machines, and the backups.
I could just go to the server farm's admins, point at the server, tell them 'sudo shut down that server, please', log in in single-user mode and have my way with it. Same for the internal server. Still, that would mean downtime, upper management getting upset, the old sysadmin firing back with 'see? you can't do my job' and other nuisances, and most importantly it would cost me potentially a few weeks of unpaid time.
On the other end of the spectrum, I could just log in as root and inch through the servers, trying to piece together what's happening, with all the risk of triggering whatever surprises were left behind.
I am looking for something in the middle: keep everything running as it is while I work out what is happening and how, and most importantly without triggering any booby traps left behind.
What are your suggestions?
So far I've thought about 'practicing' on the internal server: disconnecting it from the network, rebooting from a live CD, dumping the root file system onto a USB drive, and loading it into a disconnected, isolated virtual machine to understand the former sysadmin's way of thinking (à la 'know your enemy'). I could pull the same feat with the production server, but a full dump would get noticed. Perhaps I can just log in as root, check the crontabs, check .profile for any commands that get launched on login, dump the lastlog, and whatever else comes to mind.
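In shell terms, my rough plan for the offline inspection would be something like this (device names and mount points are just placeholders, not what's actually on the box):
mkdir -p /mnt/oldroot
mount -o ro /dev/sdb1 /mnt/oldroot          # dumped root fs, attached to the analysis VM
cat /mnt/oldroot/etc/crontab
ls -l /mnt/oldroot/etc/cron.d /mnt/oldroot/var/spool/cron/crontabs
cat /mnt/oldroot/root/.profile /mnt/oldroot/root/.bashrc /mnt/oldroot/etc/rc.local
last -f /mnt/oldroot/var/log/wtmp | head    # recent logins recorded on the old system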
And that's why I'm here. Any hint, no matter how small, would be greatly appreciated.
Time is also an issue: the triggers could go off in a few hours, or in a few weeks. Feels like one of those bad Hollywood movies, doesn't it?
As others have said, this looks like a lose-lose situation. Been there, done that, it's no fun at all :(
It will cost a lot of time regardless of how you redeploy if you don't have documentation. There's no need to think of backdoors; IMHO, if you don't have documentation, a rolling migration is the only way to reach a sane state that will deliver value for the company. (And starting at the end of your question: of course you can't just take the servers down and let the installer do its magic.)
General process, per service: learn it, document it, stand up a clean replacement, migrate, and then
rm -rf $service
(sounds harsh, but what I mean is: decommission the old service; see the sketch at the end of this answer). What did you gain? Documentation, and a setup you actually understand and can trust.
Why do you need to get it signed off by management? Because a rolling migration costs a lot of time, and they have to be willing to pay for it. Oh, and present the overall plan to them before you start, with some estimates about what will happen in the worst and best case.
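For a single service, say one of the Apache sites, that decommission step could look roughly like this (package names and paths are assumptions; on an Ubuntu of that vintage the init scripts live under /etc/init.d):
/etc/init.d/apache2 stop                                                # stop it once the new box has taken over
tar czf /root/apache2-legacy-$(date +%F).tar.gz /etc/apache2 /var/www   # keep the old config around
update-rc.d -f apache2 remove                                           # make sure it doesn't come back on reboot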
First of all, if you're going to invest extra time in this, I'd advise you to actually get paid for it. Judging from your words, you seem to have accepted unpaid overtime as a fact; it shouldn't be that way, in my opinion, and especially not when you're in such a pinch because of someone else's fault (be it management, the old sysadmin, or probably a combination of both).
Take the servers down and boot into single-user mode (init=/bin/sh, or appending '1' to the kernel line in GRUB) to check for commands that run on root's login. Downtime is necessary here; make it clear to management that there's no choice but some downtime if they want to be sure they get to keep their data.
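A minimal sketch of what to look at once you have that single-user shell, assuming stock locations for the shell start-up files:
cat /root/.profile /root/.bashrc /root/.bash_profile 2>/dev/null   # files run when root logs in interactively
cat /etc/profile /etc/bash.bashrc
ls -l /etc/profile.d/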
Afterwards, look over all the cron jobs, even if they look legit. Also perform full backups as soon as possible, even if this means downtime. You can turn the full backups into running VMs if you want.
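To enumerate the cron jobs, something along these lines should cover the usual places:
for u in $(cut -d: -f1 /etc/passwd); do
    echo "== crontab for $u =="
    crontab -l -u "$u" 2>/dev/null
done
cat /etc/crontab
ls -l /etc/cron.d /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly
atq    # don't forget one-shot at(1) jobs, if at is installed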
Then, if you can get your hands on new servers or capable VMs, I would actually migrate the services one by one to new, clean environments. You can do this in several stages so as to minimize perceived downtime. You'll gain much-needed in-depth knowledge of the services while restoring your confidence in the base systems.
In the meantime you can check for rootkits using tools such as chkrootkit. Run Nessus against the servers to look for security holes that the old admin might use.
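Running chkrootkit is straightforward; a quick sweep could look like this (it's in the Ubuntu repositories):
apt-get install chkrootkit
chkrootkit 2>&1 | tee /root/chkrootkit-$(date +%F).log   # keep the output for the report to management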
Edit: I guess I didn't address the "gracefully" part of your question as well as I could. The first step (going into single-user mode to check for login traps) can probably be skipped: the old sysadmin giving you the root password and setting up the login to do a
rm -rf /
would be pretty much the same as deleting all the files himself, so there's probably no point in doing that. As for the backup part: try using an rsync-based solution so you can do most of the initial backup online and minimize downtime.
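A sketch of the rsync approach, with hypothetical host and path names; the first pass runs while the box is live, and only the final pass needs a quiet window:
rsync -aAXH --numeric-ids \
    --exclude=/proc/ --exclude=/sys/ --exclude=/dev/ --exclude=/tmp/ \
    / backuphost:/backups/production/
# re-run the same command during the agreed downtime window; only the changes since
# the first pass get transferred, so the window stays short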
Do you have reason to believe that the previous admin left something bad behind, or do you just watch a lot of movies?
I'm not asking to be facetious; I'm trying to get an idea of what sort of threat you think is there and how probable it is. If you think the chances really are very high that some sort of seriously disruptive problem exists, then I'd suggest treating this as if it were a successful network intrusion.
In any case, your bosses don't want the disruption of downtime while you deal with this. Consider what their attitude is to planned downtime to tidy the systems up versus unplanned downtime if there is a fault in the system (whether a real fault or a rogue admin), and whether that attitude is realistic given your own assessment of the probability that you really have a problem here.
Whatever else you do, consider the following:
Take an image of the systems right now, before you do anything else. In fact, take two: put one aside and don't touch it again until you know what, if anything, is happening with your systems; this is your record of how things were when you took over.
Restore the second set of images to some virtual machines and use those to probe what is going on. If you're worried about things being triggered after a certain date, set the date forward a year or so in the virtual machine.
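A rough sketch of both steps, assuming the disk is /dev/sda and you have a box with enough space to hold the images (QEMU used here purely as an example hypervisor):
# two copies in one pass: one to keep untouched, one to work on
dd if=/dev/sda bs=1M | ssh backuphost 'tee /images/prod-sda.keep.img > /images/prod-sda.work.img'
# boot the working copy with no network and the clock about a year ahead
qemu-system-x86_64 -m 1024 -drive file=prod-sda.work.img,format=raw \
    -net none -rtc base=2012-06-01T00:00:00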
I'd invest time in learning what apps run on those servers. Once you know what is what, you can install a new server at any time. If you feel there may be a backdoor somewhere, it's a good idea to boot into single-user mode, or to put a firewall between the servers and the external net.
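As a stop-gap firewall on the box itself, a default-deny INPUT policy that only allows what the business needs could look like this (the ports listed are just examples; add the allow rules before flipping the policy so you don't lock yourself out over SSH):
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT    # your own SSH access
iptables -A INPUT -p tcp --dport 80 -j ACCEPT    # the hosted websites
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -P INPUT DROP                           # everything else gets dropped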
You are getting paranoid about security, and there is no need to be (I say this because you talk about booby traps). Go through the list of installed software. See what services are running (netstat, ps, etc.) and check the cron jobs. Disable the previous sysadmin's user account without deleting it (easily done by pointing the shell to nologin). Go through the log files. With these steps, and your knowledge of the company's needs (from which you can guess what the servers are used for), I think you should be able to maintain them without any major goof-ups.
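A quick inventory-and-lockout sketch; the username 'oldadmin' is obviously a placeholder:
dpkg -l > /root/packages-$(date +%F).txt          # installed software
netstat -tulpn > /root/listening-$(date +%F).txt  # listening services and the processes behind them
ps auxf > /root/processes-$(date +%F).txt         # full process tree
usermod -s /usr/sbin/nologin oldadmin             # point the shell to nologin instead of deleting the account
passwd -l oldadmin                                # and lock the password for good measure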