I've been trying to understand exactly how Heartbeat works. I understand that when one server dies, it switches to the backup. But for me, it also switches when the primary has a large increase in workload, and it doesn't always switch at the same workload level. There doesn't seem to be much information on the web about how it works. The best I've found is this article.
How does Heartbeat determine when to switch to the secondary, and how does it determine when to switch back to the primary? Is this an editable setting, and can I force it to switch between one and the other? Sometimes after Heartbeat switches to the secondary, it takes a few days, and I've even seen two weeks, before it switches back to the primary. This is well after traffic on the primary has gone down.
I'm currently using BlueOnyx, and my Heartbeat settings are:
Auto Failback: on
Keepalive: 1 second
Warntime: 10 seconds
Deadtime: 20 seconds
Initdead: 30 seconds
Normally Heartbeat fails over if all heartbeat lines, ping nodes and ping groups are down (or if Heartbeat thinks they are down).
In your setup this will happen after 20 seconds (your Deadtime setting) with no response from any of these methods.
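To make the timing concrete, here is a minimal Python sketch of that decision using the values from the question. It is not Heartbeat's actual code: the names `Timings`, `peer_state` and `should_fail_over`, the path labels, and the exact boundary handling (`>=`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """Timing values taken from the question (all in seconds)."""
    keepalive: float = 1.0   # interval between heartbeats
    warntime: float = 10.0   # silence after which a "late heartbeat" warning is logged
    deadtime: float = 20.0   # silence after which the peer is considered dead


def peer_state(silence: float, t: Timings = Timings()) -> str:
    """Classify one monitored path by the seconds since it last responded."""
    if silence >= t.deadtime:
        return "dead"
    if silence >= t.warntime:
        return "late"
    return "alive"


def should_fail_over(silences: dict, t: Timings = Timings()) -> bool:
    """Fail over only when every monitored path (hypothetical labels) looks dead."""
    return bool(silences) and all(
        peer_state(s, t) == "dead" for s in silences.values()
    )


if __name__ == "__main__":
    # One path still answers, so no failover in this sketch.
    print(should_fail_over({"eth1": 25.0, "ping-node": 3.0}))
    # Every path has been silent for longer than deadtime.
    print(should_fail_over({"eth1": 25.0, "ping-node": 21.0}))
```

In this model the outcome depends only on how long each path has been silent, not on workload as such. That would be consistent with failovers that do not occur at a fixed workload level, if heavy load on the primary is what delays its heartbeats past Deadtime.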
I cannot answer your question about auto-failback, since I always turn it off to avoid ping-ponging between the nodes.
If there is a failover, I have to investigate, remove the reason for that failover and then fail back manually (during a planned downtime).