We have an OpenStack Pike system (6 servers) set up via kolla/docker containers. We recently had to restart the system due to a planned power outage, and we were successful in getting it back up and running. A couple of months later, any disk activity on one of the servers in the cluster started running into problems. We could not create any new VMs on it, nor could we delete or rebuild any instances located on this server. After researching it, we found via the GUI (System->System Information->Block Storage Services) that this server's cinder-volume and cinder-backup services were stuck in the DOWN state, yet the corresponding docker containers appeared to be up and running fine. Each container's logs had no errors and the services appeared to be operating properly. We tried restarting the cinder-volume and cinder-backup containers on that server, but this did not change the GUI report, which continued to show them in the DOWN state.
Does anyone have any suggestions as to what we can do to correct this state? All the research I have done on OpenStack and Cinder has turned up nothing that seems relevant. Any suggestions are welcome. Thanks.
Today we had a chance to bring the servers down and attempt to fix the issue. We noticed that 3 of the 6 servers were using a different NTP server for setting the system time. As soon as we updated all 6 servers to use the same NTP server, the Cinder volume services changed back to the UP state and we were good to go. The lesson learned here is that all servers must be synchronized to the same NTP time source, since any time drift between servers can cause errors like the Cinder services being reported as DOWN even though their containers are running.
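For anyone wondering why a clock difference would show up as a DOWN service, here is a rough sketch of the idea. It is a simplified model, not Cinder's actual code: it assumes each service stamps its heartbeat using its own host's clock, and that the checker compares that timestamp against its own clock using a threshold like Cinder's `service_down_time` option (60 seconds by default, as far as we know). The function names `is_up` and `simulate` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold, modelled on Cinder's service_down_time option.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Treat a service as up if its last heartbeat is within down_time
    of the checker's clock, in either direction."""
    return abs(now - last_heartbeat) <= down_time


def simulate(skew_seconds):
    """Model a healthy service whose host clock is off by skew_seconds.

    Positive skew means the service host's clock is behind the checker's.
    The service has just sent a heartbeat, so only the skew matters.
    """
    checker_now = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # The heartbeat is stamped with the (skewed) clock of the service host.
    heartbeat = checker_now - timedelta(seconds=skew_seconds)
    return is_up(heartbeat, checker_now)


if __name__ == "__main__":
    for skew in (0, 5, 120, -120):
        state = "UP" if simulate(skew) else "DOWN"
        print(f"clock skew {skew:>5}s -> {state}")
```

In this model the service is perfectly healthy and reporting on time, yet once the skew exceeds the threshold it looks DOWN to the checker. That would match what we saw: containers running, clean logs, and restarts making no difference until the clocks agreed.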