Symptoms: our mail server, a dedicated Fedora server running Zimbra, stopped working so completely that we couldn't even log in over SSH. Soft reboots through our web admin console got it working again for 6-8 hours, after which it crashed again. Setting the Zimbra services and the whole server to restart automatically didn't help.
These symptoms lasted for a couple of months. The machine is now working mostly correctly again.
The only suspicious thing I found is that during these outages the server couldn't find a route to itself, even though our DNS was still up and able to resolve its names. Googling for similar symptoms provided little help, and our ISP was even less friendly.
The thing is, I have little idea where to begin looking for the cause of these breakdowns so that I can prevent them. Where should I start?
I suppose it could have something to do with the machine "not being able to find itself". I make a practice of putting the machine's own IP address (next to its hostname) in /etc/hosts, so that whether a particular service is bound to 127.0.0.1 or to the external address, the machine can still connect to that service by name.
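If you want to check that quickly, here is a rough Python sketch that looks for a non-loopback entry for a given name in hosts-file text. The function names (parse_hosts, has_non_loopback_entry, check_local_hosts) are made up for this example, and it assumes the usual "address name [aliases...]" layout with # comments.

```python
import socket


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its list of addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), []).append(addr)
    return table


def has_non_loopback_entry(text, hostname):
    """True if hostname maps to at least one address outside 127.0.0.0/8 and ::1."""
    addrs = parse_hosts(text).get(hostname.lower(), [])
    return any(not a.startswith("127.") and a != "::1" for a in addrs)


def check_local_hosts(path="/etc/hosts"):
    """Check this machine's own hostname against its hosts file."""
    with open(path) as fh:
        return has_non_loopback_entry(fh.read(), socket.gethostname())
```

Calling check_local_hosts() on the server itself returns False when the hostname is missing from /etc/hosts or only points at loopback. It only inspects the file, so it says nothing about what DNS returns.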
I agree: you have provided very little information and a very broad problem description.
If EVERYTHING stopped working, then perhaps it was a NIC failure or a routing problem?
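To gather some evidence for the routing theory, one option is to record the default route periodically (from cron, say) so that the next outage leaves a trace. The Python sketch below parses text in the format of Linux's /proc/net/route. The function name default_gateways is hypothetical, and the byte-order handling assumes a little-endian machine such as x86.

```python
import socket
import struct


def default_gateways(route_text):
    """Return (interface, gateway_ip) pairs for default routes in /proc/net/route text."""
    found = []
    for line in route_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest, gateway = fields[0], fields[1], fields[2]
        flags = int(fields[3], 16)
        # Destination 00000000 is the default route; flag 0x2 is RTF_GATEWAY.
        if dest == "00000000" and flags & 0x2:
            # Assumes a little-endian host: the hex value is byte-swapped.
            ip = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            found.append((iface, ip))
    return found
```

Feeding it the contents of /proc/net/route and appending the result to a log with a timestamp would show whether the default route disappears before the machine becomes unreachable. An empty list means no default gateway was present at that moment.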
It is hard to give any conclusive answer based on the little you have provided. It seems that what you need is a system administrator. Hire one.