Question
When changing network configurations remotely, is there a way for the networking to attempt to use a different configuration file in the case of a failure?
Background - tldr;
I've been searching around, but I'm not really seeing any references to doing something like passing a file to ifup. Saying that gave me the idea to check the ifup man page, but regardless, I can't test it right now.
Our server has been moved to the datacenter, while I am working here in a different city. Networking is not my strong suit, and after installing I wanted to bond the two NICs together to improve throughput. In doing so I lost connectivity, because the network interface failed to initialize.
When setting up the bond, I created the following in /etc/sysconfig/network-scripts:
- bond0: taking eth0 and eth1
- eth0: set up to join the bond
- eth1: set up to join the bond
- eth1:1: an alias, thinking I could bind an IP to it just in case I got locked out again...
Unfortunately this didn't work, and the only person qualified to go to the datacenter and do the support is my boss. Not a good situation. (And I had tested it twice on a virtual server just to make sure I wouldn't lose connectivity.)
Now we have it bonded, but as far as I can tell there is no way to set up a "just in case" configuration...
Today I needed to bridge the connection for the VM inside the server... and lo and behold, I lost connectivity again. That's the second trip my boss will make to the datacenter this month. :facepalm:
There's got to be a way to have the networking fall back to a completely different set of config files (a failsafe, if you will) when the interface isn't detected as up. For example, after a failed networking attempt, a cron job running every five minutes could re-establish the network connection using the failsafe configuration whenever the network is down.
I wish I had access to a Linux box right now, but I usually check the network by running the service network restart command. Is there a way to give it a failsafe, so that if the network is not detected, it tries a different failsafe configuration, and keeps trying until it is up?
tl;dr: Go with OOB, look at config management, or you'll need to build your own solution.
I'm not familiar with anything prebuilt in Linux-land to do this sort of thing; IPMI/ILOM/OOB (out-of-band management) is usually the way to go. Not only would you have remote console access to the host, but you can also (usually) check the status of the hardware, issue remote reboots if it's hard locked, and so on.
If OOB is not an option, you could consider setting up a cron job that checks various scenarios, determines whether your host is in an unreachable state, and performs tasks to try to recover it.
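As a rough sketch of that cron-job idea (and of the question's "try a different configuration until it's up"), here is some Python. It assumes Linux iputils ping syntax (-c for the packet count, -W for a timeout in seconds). The apply_config hook is hypothetical and not provided: on a RHEL-style box it might copy a saved set of ifcfg files into /etc/sysconfig/network-scripts and restart networking. The command runner is passed in so the logic can be exercised without touching a real interface.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes a command (argv list) and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_quietly(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return the exit status."""
    return subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def reachable(targets: Sequence[str], runner: Runner = run_quietly) -> bool:
    """True if at least one target answers a single ping within 2 seconds."""
    return any(runner(["ping", "-c", "1", "-W", "2", t]) == 0 for t in targets)


def try_configs_in_order(
    candidates: Sequence[str],
    apply_config: Callable[[str], None],
    is_up: Callable[[], bool],
) -> Optional[str]:
    """Apply each candidate configuration until is_up() reports success.

    Returns the name of the configuration that worked, or None if all failed.
    apply_config is a hypothetical hook (e.g. copy saved ifcfg files into
    place and restart networking); it is not implemented here.
    """
    for name in candidates:
        apply_config(name)
        if is_up():
            return name
    return None
```

Cron would run a small wrapper every five minutes, e.g. `*/5 * * * * /usr/bin/python3 /usr/local/sbin/netcheck.py` (the path is hypothetical), which calls reachable() and only falls back to try_configs_in_order() when it returns False. Checking more than one target (say, the gateway plus an outside address) gives stronger evidence than a single ping, and a real version would probably wait a little after each restart before testing.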
There are big risks with this, of course. You have to consider a lot of different scenarios. Say you want to make sure you can hit your gateway IP address, but your gateway goes away briefly: you don't want your host reconfiguring its interface if the problem isn't with your box but with something upstream.
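One way to avoid reacting to a brief upstream blip is to act only after several consecutive failed checks. Because cron starts a fresh process on every run, the failure count has to be stored between runs; this minimal sketch uses a small JSON file whose location is a hypothetical choice, and a default threshold of three that you would tune to your own tolerance.

```python
import json
from pathlib import Path


def record_check(state_file: Path, reachable: bool, threshold: int = 3) -> bool:
    """Record one check result and say whether recovery should be attempted.

    The consecutive-failure count lives in state_file (hypothetical path),
    because each cron run is a new process. Any success resets the count, so
    recovery is only suggested after `threshold` failures in a row.
    """
    try:
        failures = json.loads(state_file.read_text())["consecutive_failures"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        # Missing or unreadable state: start counting from zero.
        failures = 0
    failures = 0 if reachable else failures + 1
    state_file.write_text(json.dumps({"consecutive_failures": failures}))
    return failures >= threshold
```

With a five-minute cron interval and threshold=3, the outage has to persist across three consecutive checks (roughly ten minutes or more) before recovery is attempted.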
There's also the option of configuration management, which you could configure to restore your local machine to an expected state (or verify that it's in an expected state) every hour, etc. You'd have to configure those applications to use a local copy of the configuration files rather than trying to talk to a remote server, but it is possible. This may be a little excessive depending on how many systems you are managing (and if it's more than five, I highly suggest looking into config management in general; it will save you a LOT of time).
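At its core, the "restore to an expected state" part of config management is a comparison between known-good files and what is live. This is not how any particular config management tool is implemented; it's a minimal Python sketch of the idea, assuming you keep golden copies of your ifcfg files in a local directory of your choosing. It only reports drift unless restore=True, and copying files back does not apply them: networking would still need to be restarted afterwards.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_drifted(golden_dir: Path, live_dir: Path, restore: bool = False) -> List[str]:
    """Compare each known-good file with its live counterpart.

    Returns, in sorted order, the names of files that are missing from
    live_dir or differ from the golden copy. If restore is True, those files
    are copied back from golden_dir (copy2 also copies permission bits and
    timestamps). Extra files that exist only in live_dir are ignored.
    """
    drifted = []
    for golden in sorted(golden_dir.iterdir()):
        if not golden.is_file():
            continue
        live = live_dir / golden.name
        if not live.is_file() or sha256_of(live) != sha256_of(golden):
            drifted.append(golden.name)
            if restore:
                shutil.copy2(golden, live)
    return drifted
```

Ignoring live-only files is a deliberate simplification: a new ifcfg-br0 added while experimenting would not be flagged, so a stricter version might also report those.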
If you do feel like you want to go the route of having some script on the box monitor for changes, I highly recommend you run it in a dry-run mode for quite some time. This way you could have it log whenever it thinks it would have needed to reconfigure the network interface, allowing you to debug/test/sanity check the functionality before you put it into service.
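A dry-run switch can be as simple as routing every recovery action through one function that either runs the command or only logs what it would have run. A minimal sketch; the logger name is hypothetical, and shlex.join requires Python 3.8 or later.

```python
import logging
import shlex
import subprocess
from typing import List, Sequence

log = logging.getLogger("netfailsafe")  # hypothetical logger name


def execute(commands: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Run each recovery command, or only log it when dry_run is True.

    Returns the commands as shell-quoted strings so a caller can see what
    was (or would have been) done. Defaults to dry-run on purpose. When
    actually running, check=True makes the first failing command raise
    CalledProcessError, so later steps are skipped.
    """
    rendered = []
    for cmd in commands:
        text = shlex.join(cmd)
        rendered.append(text)
        if dry_run:
            log.info("DRY RUN, would run: %s", text)
        else:
            log.info("running: %s", text)
            subprocess.run(list(cmd), check=True)
    return rendered
```

To keep a reviewable record, configure logging once at startup, e.g. logging.basicConfig(filename="/var/log/netfailsafe.log", level=logging.INFO) (the path is hypothetical), and only switch dry_run to False once the log shows it would have acted exactly when it should have.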
Even better, you could have a second or third interface (since you're wanting to bond) cabled up on your host and either never touch that interface's configuration or have your script only try to restore service using that interface. That way, if the script goes haywire, it's not mucking with interfaces it thinks are bad, just the third interface, which you use only for this purpose.
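Restricting the script to that one interface can be enforced in code rather than by convention. This sketch only builds iproute2 commands (ip link set, ip addr replace, ip route replace) for a single reserved interface and refuses any other; eth2, the address, and the gateway are hypothetical placeholders. Note that replacing the default route sends traffic out the spare NIC, which is what you want during a recovery but worth being aware of.

```python
from typing import List

# Hypothetical: the spare NIC that is cabled up but never bonded or bridged.
RESERVED_INTERFACE = "eth2"


def recovery_commands(interface: str, cidr: str, gateway: str) -> List[List[str]]:
    """Build iproute2 commands that bring up only the reserved interface.

    Raises ValueError for any other interface, so a misbehaving script can't
    touch the bond or the bridge. The commands are returned, not executed.
    """
    if interface != RESERVED_INTERFACE:
        raise ValueError(f"refusing to touch {interface}; only {RESERVED_INTERFACE} is allowed")
    return [
        ["ip", "link", "set", "dev", interface, "up"],
        ["ip", "addr", "replace", cidr, "dev", interface],
        ["ip", "route", "replace", "default", "via", gateway, "dev", interface],
    ]
```

For example, recovery_commands("eth2", "192.0.2.50/24", "192.0.2.1") yields three commands, while asking for "bond0" raises ValueError instead of touching the bond.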