I am looking for a job management solution for local processes that usually run for several weeks. At the moment I use Jenkins, but the server cannot be restarted (for example, for security updates) and there is no redundancy. If one server goes offline, all of its jobs should be rebalanced to the servers that are still online. It is okay to simply start the script again with the same parameters, but it should be possible to disable this behavior. It should also be easy to add and remove servers.
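To make the behavior I have in mind more concrete, here is a rough Python sketch. It is only an illustration of the requirement, not an existing tool: the names `Job`, `restart_on_failure`, and `rebalance` are made up, and "least-loaded server" is just one assumed way to choose a target.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # Hypothetical job record; field names are illustrative only.
    name: str
    params: List[str]
    server: Optional[str]            # server the job currently runs on
    restart_on_failure: bool = True  # per-job switch to disable rescheduling


def rebalance(jobs: List[Job], online: List[str]) -> List[Job]:
    """Move jobs from offline servers to the least-loaded online server."""
    # Count how many jobs each online server already runs.
    load = {server: 0 for server in online}
    for job in jobs:
        if job.server in load:
            load[job.server] += 1

    for job in jobs:
        if job.server in load:
            continue  # its server is still online, nothing to do
        if not job.restart_on_failure or not load:
            job.server = None  # leave it stopped
            continue
        # Restart with the same parameters on the least-loaded server.
        target = min(load, key=load.get)
        job.server = target
        load[target] += 1
    return jobs


jobs = [
    Job("train", ["--seed", "1"], server="a"),
    Job("crawl", ["--depth", "3"], server="b"),
    Job("export", [], server="b", restart_on_failure=False),
]
rebalance(jobs, online=["a", "c"])  # server "b" went offline
```

In this sketch, adding or removing a server just means changing the `online` list and calling `rebalance` again.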
I don't need a full solution for everything, but I have searched for software like this and did not really find what I was looking for. I would appreciate any hints (including search keywords) pointing in the right direction. So far I have basically only found CI software, but I want a solution that tolerates server failures.