Some version info:
Operating system is Ubuntu 11.10, on EC2, kernel is 3.0.0-16-virtual and the application info is:
Version: 8.3.11 (api:88)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by buildd@allspice, 2011-07-05 19:51:07
I am also getting some strange errors in dmesg (shown below), and no replication is happening. I have made my first node primary, and it is showing:
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----s ext3
My secondary node is showing:
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
m:res cs ro ds p mounted fstype
0:r0 StandAlone Secondary/Unknown Inconsistent/DUnknown r----s
/proc/drbd on the master shows:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----s
ns:0 nr:0 dw:4 dr:1073 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
/proc/drbd on the slave shows that nothing is being transferred:
version: 8.3.11 (api:88/proto:86-96)
srcversion: DA5A13F16DE6553FC7CE9B2
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----s
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
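For readers comparing the two nodes, the fields that matter in the /proc/drbd output above are cs (connection state), ro (roles), ds (disk states) and oos (out-of-sync data, in KiB). The following is a minimal sketch of pulling those fields out with Python; the function name `parse_proc_drbd` is made up for this illustration and is not part of DRBD. It also cross-checks the oos figure against the bitmap size reported in dmesg further down, assuming DRBD's usual granularity of one bitmap bit per 4 KiB block.

```python
import re

def parse_proc_drbd(text):
    """Return cs, ro, ds and oos (KiB) for the first device found in the text."""
    status = {}
    for key in ("cs", "ro", "ds"):
        m = re.search(r"\b%s:(\S+)" % key, text)
        if m:
            status[key] = m.group(1)
    m = re.search(r"\boos:(\d+)", text)
    if m:
        status["oos_kib"] = int(m.group(1))
    return status

# Device lines copied from the secondary node's /proc/drbd output above.
sample = """\
 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:262135964
"""

status = parse_proc_drbd(sample)
print(status)
print("out of sync: %.1f GiB" % (status["oos_kib"] / 1024 ** 2))

# Assumption: one bitmap bit covers 4 KiB, so oos / 4 should equal the
# "bits=65533991" value that dmesg reports for the resync bitmap.
print("bitmap bits:", status["oos_kib"] // 4)
```

Read this way, the whole 250 GB device is marked out of sync and cs is StandAlone on both sides, which means the nodes are not even trying to talk to each other.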
Here is my config...
resource r0 {
protocol C;
startup {
wfc-timeout 15;
degr-wfc-timeout 60;
}
net {
cram-hmac-alg sha1;
shared-secret "test123";
}
on drbd01 {
device /dev/drbd0;
disk /dev/xvdm;
address 23.XX.XX.XX:7788; # blocked out ip
meta-disk internal;
}
on drbd02 {
device /dev/drbd0;
disk /dev/xvdm;
address 184.XX.XX.XX:7788; #blocked out ip
meta-disk internal;
}
}
I have run the following on the master:
sudo drbdadm -- --overwrite-data-of-peer primary all
There is no firewall between the systems.
Here is the dmesg with some errors:
[2285172.969955] drbd: initialized. Version: 8.3.11 (api:88/proto:86-96)
[2285172.969960] drbd: srcversion: DA5A13F16DE6553FC7CE9B2
[2285172.969962] drbd: registered as block device major 147
[2285172.969965] drbd: minor_table @ 0xffff88000276ea00
[2285173.000952] block drbd0: Starting worker thread (from drbdsetup [1300])
[2285173.003971] block drbd0: disk( Diskless -> Attaching )
[2285173.006150] block drbd0: No usable activity log found.
[2285173.006154] block drbd0: Method to ensure write ordering: flush
[2285173.006158] block drbd0: max BIO size = 4096
[2285173.006165] block drbd0: drbd_bm_resize called with capacity == 524271928
[2285173.008512] block drbd0: resync bitmap: bits=65533991 words=1023969 pages=2000
[2285173.008518] block drbd0: size = 250 GB (262135964 KB)
[2285173.079566] block drbd0: bitmap READ of 2000 pages took 17 jiffies
[2285173.081189] block drbd0: recounting of set bits took additional 1 jiffies
[2285173.081194] block drbd0: 250 GB (65533991 bits) marked out-of-sync by on disk bit-map.
[2285173.081203] block drbd0: Suspended AL updates
[2285173.081210] block drbd0: disk( Attaching -> UpToDate )
[2285173.081214] block drbd0: attached to UUIDs 1C1291D39584C1D1:0000000000000004:0000000000000000:0000000000000000
[2285173.095016] block drbd0: conn( StandAlone -> Unconnected )
[2285173.095046] block drbd0: Starting receiver thread (from drbd0_worker [1301])
[2285173.099297] block drbd0: receiver (re)started
[2285173.099304] block drbd0: conn( Unconnected -> WFConnection )
[2285173.099330] block drbd0: bind before connect failed, err = -99
[2285173.099346] block drbd0: conn( WFConnection -> Disconnecting )
[2285173.295788] block drbd0: Discarding network configuration.
[2285173.295815] block drbd0: Connection closed
[2285173.295826] block drbd0: conn( Disconnecting -> StandAlone )
[2285173.295840] block drbd0: receiver terminated
[2285173.295844] block drbd0: Terminating drbd0_receiver
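The key line in the log above is "bind before connect failed, err = -99". Kernel messages report errors as negated errno values, and on Linux errno 99 is EADDRNOTAVAIL ("Cannot assign requested address"). Here is a small sketch for decoding such a line; `explain_kernel_err` is a hypothetical helper written for this post, and the symbolic name it prints depends on the platform the script runs on.

```python
import errno
import os
import re

def explain_kernel_err(line):
    """Extract 'err = -N' from a log line; return (N, symbolic name, message)."""
    m = re.search(r"err = -(\d+)", line)
    if m is None:
        return None
    code = int(m.group(1))
    # errno.errorcode maps numbers to names for the platform running this script.
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

line = "block drbd0: bind before connect failed, err = -99"
print(explain_kernel_err(line))
```

In other words, DRBD tried to bind its socket to the address given in the config and the kernel refused because that address is not configured on any local interface.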
Edit:
Reading some other similar issues, it was suggested to do a 'drbdadm dump all', so I figured it couldn't hurt.
ubuntu@drbd01:~$ drbdadm dump all
/etc/drbd.conf:19: in resource r0, on drbd01:
IP 23.XX.XX.XX not found on this host.
and on slave:
root@drbd02:~# drbdadm dump all
/etc/drbd.conf:25: in resource r0, on drbd02:
IP 184.XX.XX.XX not found on this host.
Strange that it doesn't find its own IP. However, this is an Amazon EC2 system using an elastic IP. Here is the ifconfig output for both nodes:
master:
ubuntu@drbd01:~$ ifconfig
eth0 Link encap:Ethernet HWaddr 22:00:0a:1c:27:11
inet addr:10.28.39.17 Bcast:10.28.39.63 Mask:255.255.255.192
inet6 addr: fe80::2000:aff:fe1c:2711/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1569 errors:0 dropped:0 overruns:0 frame:0
TX packets:1169 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:124409 (124.4 KB) TX bytes:213601 (213.6 KB)
Interrupt:26
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
slave:
root@drbd02:~# ifconfig
eth0 Link encap:Ethernet HWaddr 12:31:3f:00:14:9d
inet addr:10.160.27.107 Bcast:10.160.27.255 Mask:255.255.254.0
inet6 addr: fe80::1031:3fff:fe00:149d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:915 errors:0 dropped:0 overruns:0 frame:0
TX packets:774 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:75381 (75.3 KB) TX bytes:109673 (109.6 KB)
Interrupt:26
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
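The ifconfig output above lists only private 10.x addresses and loopback on each node, and the elastic IP does not appear on any interface. That would be consistent with the "not found on this host" message from drbdadm. The sketch below is a simplified stand-in for that kind of check, not drbdadm's actual logic: it collects the `inet addr:` values from ifconfig text and tests whether a configured address is among them. The function names and the 203.0.113.10 address (a documentation-range placeholder for the masked elastic IP) are assumptions for illustration.

```python
import ipaddress
import re

def local_ipv4_addresses(ifconfig_text):
    """Collect every 'inet addr:' value from old-style ifconfig output."""
    found = re.findall(r"inet addr:(\d+\.\d+\.\d+\.\d+)", ifconfig_text)
    return [ipaddress.ip_address(a) for a in found]

def is_local_address(configured, ifconfig_text):
    """True if the configured address is assigned to a local interface."""
    return ipaddress.ip_address(configured) in local_ipv4_addresses(ifconfig_text)

# Trimmed copy of the master's ifconfig output shown above.
sample = """\
eth0 Link encap:Ethernet HWaddr 22:00:0a:1c:27:11
     inet addr:10.28.39.17 Bcast:10.28.39.63 Mask:255.255.255.192
lo   Link encap:Local Loopback
     inet addr:127.0.0.1 Mask:255.0.0.0
"""

print(is_local_address("10.28.39.17", sample))    # private address on eth0
print(is_local_address("203.0.113.10", sample))   # placeholder for the elastic IP
```

Only an address that passes this kind of check can be bound by a local socket, so the `address` lines in the resource config need to name something that actually shows up in ifconfig on that node.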
You actually did not need to run
sudo drbdadm -- --overwrite-data-of-peer primary all
as long as /dev/drbd
You should have done the following instead:
Step01)
sudo service mysql stop
on DRBD Primary, so that additional changes do not pile up for DRBD to sync.
Step02)
sudo drbdadm connect all
on DRBD Secondary.
Step03)
sudo cat /proc/drbd
on DRBD Secondary, to make sure the connection state is WFConnection.
Step04)
sudo drbdadm connect all
on DRBD Primary.
Step05)
sudo cat /proc/drbd
on DRBD Primary, to make sure the connection state is SyncSource (the Secondary should show SyncTarget).
Step06)
sudo service mysql start
on DRBD Primary, so MySQL can come back up. The sync will continue in the background; you do not have to wait for DRBD to be fully synced in Step 05.
CAVEAT
DRBD should not be used over a geographic distance. I work with setups that have DRBD pairs connected via crossover cables over 192.168.x.x.
Try the following:
On primary node
On secondary node