My requirements:
1) must be able to continue operating after node failure
2) logs must be recoverable after node failure (no data loss)
3) must be able to scale out
4) must be transactional: when a message is logged, I need a guarantee that it is persisted to disk
This is similar to a previous question of mine, but I just realized the importance of the transactional feature. This is for a medical app; we cannot afford to lose any log messages.
Thanks!!
Reliable forwarding with rsyslog will get you started. The rsyslog docs also have pages describing how to write the data to a database and how to scale writing to a database.
Note that this setup does not specifically handle automatic failover between multiple log servers. I personally didn't worry about that, because each client sending log data queues it on its own end until the logging server is back up, and I had monitors in place to notify me when the logging server went down.
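To illustrate the client-side queueing idea (this is not how rsyslog implements it, just a minimal Python sketch of the same principle): each message is appended to a local spool file and fsynced before the call returns, and a separate step tries to forward the spooled messages, keeping whatever could not be delivered. The names `durable_append`, `forward_spool`, and the `send` callback are hypothetical, and the sketch assumes single-line messages and a single writer process.

```python
import os


def durable_append(path, message):
    """Append one single-line message and fsync it before returning."""
    data = message.rstrip("\n").encode("utf-8") + b"\n"
    with open(path, "ab") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push it to disk


def forward_spool(path, send):
    """Try to deliver spooled messages in order; keep undelivered ones.

    `send` is a hypothetical callback that raises OSError when the
    logging server is unreachable. Returns the number delivered.
    """
    if not os.path.exists(path):
        return 0
    with open(path, "rb") as f:
        pending = f.read().splitlines()

    sent = 0
    for line in pending:
        try:
            send(line.decode("utf-8"))
        except OSError:
            break              # server is down: stop and retry later
        sent += 1

    # Rewrite the spool with only the undelivered messages, then swap
    # it into place atomically so a crash never leaves a partial file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        for line in pending[sent:]:
            f.write(line + b"\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    return sent
```

Note that this gives at-least-once delivery: if the client crashes after sending but before rewriting the spool, some messages will be sent again, so the receiving side would need to tolerate duplicates.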
If you already have a DB system with appropriate failover and high-availability set up, you could set up two logging servers and use a heartbeat system (perhaps linux-ha) to automatically take over the IP of the live logging server.
From what you describe, what you want is continuous computing. There are two kinds of software on the Windows platform that can provide what you are looking for. I'm not too familiar with any transactional logging applications, but with either of the HA/FT solutions below you could use just about any of them (as long as it runs on Windows).
Neverfail is an HA solution that protects your application from data loss. In the event of a server outage, failover between the two servers is seamless (it does not require human interaction) and the passive server takes over operations; how cleanly depends on how much data was still in memory on the active server and not yet written to disk. This would give you close to 99.99% uptime.
Marathon is similar to Neverfail, but it adds component-level protection. With its FT feature, if a failure hits your server, such as a disk failure or even a network failure, your application keeps running, so there is no data loss or interruption to business.