This problem belongs to a KVM node on which the VMs get their storage via LVM. So each VM has it's own Logical Volume. Every night some VMs are backed up (snapshot - dd [..] | ssh [..]
- nothing special). However, last night this somehow fucked up the LVM system. 2-3 minutes after the second backup started, the kernel began to log "hung tasks" - in short it reported three qemu-kvm processes as hanging and the dd process. At least one of the VMs (which is a managed server, so monitored by us) went down - to be more precisely: It was still running, but the services did not answer anymore. VNC showed hung tasks within the VM. After a hard reset (and a migration - see below) the VM was fine, but the dd
process never terminated (kill -9
does nothing) and commands like lvdisplay
don't work anymore - they just give out nothing. The lvmetad
also can't be restarted and every process which belongs to LVM can't be killed. They just hang in disk state forever, while the node generally runs fine. The VM which went down had to be migrated to another node as virsh shutdown
also didn't work anymore - "device or resource busy". But the other VMs also keep doing their job.
We had this on another node a few weeks ago, where the "snapshotted" VM also went down, performed a kernel upgrade from 4.4 to 4.9 (as we had to reboot the machine anyway) and didn't see a problem like this again. But as the node which showed the problem today has got an uptime of two months, this doesn't say that this is really fixed. So - can anyone see more in this logs than we do? It would be greatly appreciated. Thanks for reading!
Apr 28 00:37:15 vnode19 kernel: INFO: task qemu-kvm:32970 blocked for more than 120 seconds.
Apr 28 00:37:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:37:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:37:15 vnode19 kernel: qemu-kvm D ffff88734767f908 0 32970 1 0x00000080
Apr 28 00:37:15 vnode19 kernel: ffff88734767f908 ffff880166d65900 ffff887048ef0000 ffff887347680000
Apr 28 00:37:15 vnode19 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff88492b5b8a00
Apr 28 00:37:15 vnode19 kernel: ffff88734767f920 ffffffff816b2425 ffff887f7f116cc0 ffff88734767f9d0
Apr 28 00:37:15 vnode19 kernel: Call Trace:
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5137>] schedule_timeout+0x237/0x2d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81309826>] ? generic_make_request+0x106/0x1d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b1b96>] io_schedule_timeout+0xa6/0x110
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124b537>] do_blockdev_direct_IO+0xca7/0x2d20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124d5f3>] __blockdev_direct_IO+0x43/0x50
Apr 28 00:37:15 vnode19 kernel: [<ffffffff812479d8>] blkdev_direct_IO+0x58/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190a3d>] generic_file_direct_write+0xad/0x170
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190bc2>] __generic_file_write_iter+0xc2/0x1e0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247cd0>] blkdev_write_iter+0x90/0x130
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247c40>] ? bd_unlink_disk_holder+0xe0/0xe0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120d8a1>] do_readv_writev+0x1f1/0x2b0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff811308cf>] ? __audit_syscall_entry+0xaf/0x100
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120d9e9>] vfs_writev+0x39/0x50
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120e9e8>] SyS_pwritev+0xb8/0xe0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:37:15 vnode19 kernel: INFO: task qemu-kvm:33655 blocked for more than 120 seconds.
Apr 28 00:37:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:37:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:37:15 vnode19 kernel: qemu-kvm D ffff886a1dd23908 0 33655 1 0x00000080
Apr 28 00:37:15 vnode19 kernel: ffff886a1dd23908 ffff8875c6e442c0 ffff88582127ac80 ffff886a1dd24000
Apr 28 00:37:15 vnode19 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff886d0d021e00
Apr 28 00:37:15 vnode19 kernel: ffff886a1dd23920 ffffffff816b2425 ffff887f7f496cc0 ffff886a1dd239d0
Apr 28 00:37:15 vnode19 kernel: Call Trace:
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5137>] schedule_timeout+0x237/0x2d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81309826>] ? generic_make_request+0x106/0x1d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b1b96>] io_schedule_timeout+0xa6/0x110
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124b537>] do_blockdev_direct_IO+0xca7/0x2d20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124d5f3>] __blockdev_direct_IO+0x43/0x50
Apr 28 00:37:15 vnode19 kernel: [<ffffffff812479d8>] blkdev_direct_IO+0x58/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190a3d>] generic_file_direct_write+0xad/0x170
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190bc2>] __generic_file_write_iter+0xc2/0x1e0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247cd0>] blkdev_write_iter+0x90/0x130
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247c40>] ? bd_unlink_disk_holder+0xe0/0xe0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120d8a1>] do_readv_writev+0x1f1/0x2b0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff811308cf>] ? __audit_syscall_entry+0xaf/0x100
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120d9e9>] vfs_writev+0x39/0x50
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120e9e8>] SyS_pwritev+0xb8/0xe0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:37:15 vnode19 kernel: INFO: task qemu-kvm:33661 blocked for more than 120 seconds.
Apr 28 00:37:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:37:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:37:15 vnode19 kernel: qemu-kvm D ffff8855341f3728 0 33661 1 0x00000080
Apr 28 00:37:15 vnode19 kernel: ffff8855341f3728 ffff880166d642c0 ffff886916a4c2c0 ffff8855341f4000
Apr 28 00:37:15 vnode19 kernel: ffff880d40fc8c18 ffff880d40fc8c00 ffffffff00000000 fffffffe00000001
Apr 28 00:37:15 vnode19 kernel: ffff8855341f3740 ffffffff816b2425 ffff886916a4c2c0 ffff8855341f37d0
Apr 28 00:37:15 vnode19 kernel: Call Trace:
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b4c05>] rwsem_down_write_failed+0x1f5/0x320
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81343233>] call_rwsem_down_write_failed+0x13/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b44ad>] ? down_write+0x2d/0x40
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa06dfdfe>] __origin_write+0x6e/0x210 [dm_snapshot]
Apr 28 00:37:15 vnode19 kernel: [<ffffffff811918ae>] ? mempool_alloc+0x6e/0x170
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa06e0007>] do_origin.isra.14+0x67/0x90 [dm_snapshot]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa06e0092>] origin_map+0x62/0x80 [dm_snapshot]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04acf8a>] __map_bio+0x3a/0x110 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04ae73f>] __split_and_process_bio+0x24f/0x3f0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04ae94a>] dm_make_request+0x6a/0xd0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81309826>] generic_make_request+0x106/0x1d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81309967>] submit_bio+0x77/0x150
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81300deb>] ? bio_alloc_bioset+0x1ab/0x2d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124ccb7>] do_blockdev_direct_IO+0x2427/0x2d20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124d5f3>] __blockdev_direct_IO+0x43/0x50
Apr 28 00:37:15 vnode19 kernel: [<ffffffff812479d8>] blkdev_direct_IO+0x58/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190a3d>] generic_file_direct_write+0xad/0x170
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190bc2>] __generic_file_write_iter+0xc2/0x1e0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247cd0>] blkdev_write_iter+0x90/0x130
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120cf59>] __vfs_write+0xc9/0x110
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120d5b2>] vfs_write+0xa2/0x1a0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120e537>] SyS_pwrite64+0x87/0xb0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:37:15 vnode19 kernel: INFO: task dmeventd:33781 blocked for more than 120 seconds.
Apr 28 00:37:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:37:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:37:15 vnode19 kernel: dmeventd D ffff8803493b7af8 0 33781 1 0x00000080
Apr 28 00:37:15 vnode19 kernel: ffff8803493b7af8 ffff880166da1640 ffff880b15a50000 ffff8803493b8000
Apr 28 00:37:15 vnode19 kernel: ffff880d40fc8c18 ffff880d40fc8c00 ffffffff00000000 fffffffe00000001
Apr 28 00:37:15 vnode19 kernel: ffff8803493b7b10 ffffffff816b2425 ffff880b15a50000 ffff8803493b7b98
Apr 28 00:37:15 vnode19 kernel: Call Trace:
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b4c05>] rwsem_down_write_failed+0x1f5/0x320
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81343233>] call_rwsem_down_write_failed+0x13/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b44ad>] ? down_write+0x2d/0x40
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa06df172>] snapshot_status+0x82/0x1a0 [dm_snapshot]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04b51a6>] retrieve_status+0xa6/0x1b0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04b6363>] table_status+0x63/0xa0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04b6300>] ? dm_get_live_or_inactive_table.isra.3+0x30/0x30 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04b6015>] ctl_ioctl+0x255/0x4d0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81095806>] ? __dequeue_signal+0x106/0x1b0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81095a1b>] ? recalc_sigpending+0x1b/0x50
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04b62a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81220872>] do_vfs_ioctl+0x2d2/0x4b0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff811308cf>] ? __audit_syscall_entry+0xaf/0x100
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81220ac9>] SyS_ioctl+0x79/0x90
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:37:15 vnode19 kernel: INFO: task dd:33790 blocked for more than 120 seconds.
Apr 28 00:37:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:37:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:37:15 vnode19 kernel: dd D ffff885238e1f828 0 33790 33746 0x00000080
Apr 28 00:37:15 vnode19 kernel: ffff885238e1f828 ffff883f77ce42c0 ffff884a64088000 ffff885238e20000
Apr 28 00:37:15 vnode19 kernel: ffff880d40fc8c18 ffff880d40fc8c00 ffffffff00000000 fffffffe00000001
Apr 28 00:37:15 vnode19 kernel: ffff885238e1f840 ffffffff816b2425 ffff884a64088000 ffff885238e1f8d0
Apr 28 00:37:15 vnode19 kernel: Call Trace:
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b4c05>] rwsem_down_write_failed+0x1f5/0x320
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81343233>] call_rwsem_down_write_failed+0x13/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b44ad>] ? down_write+0x2d/0x40
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa06e0d32>] snapshot_map+0x62/0x390 [dm_snapshot]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04acf8a>] __map_bio+0x3a/0x110 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04ae73f>] __split_and_process_bio+0x24f/0x3f0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04ae94a>] dm_make_request+0x6a/0xd0 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81309826>] generic_make_request+0x106/0x1d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81309967>] submit_bio+0x77/0x150
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124d6ba>] mpage_bio_submit+0x2a/0x40
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8124e0b0>] mpage_readpages+0x130/0x160
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff811e0428>] ? alloc_pages_current+0x88/0x120
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247add>] blkdev_readpages+0x1d/0x20
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8119bfbc>] __do_page_cache_readahead+0x19c/0x220
Apr 28 00:37:15 vnode19 kernel: [<ffffffff810b4c39>] ? try_to_wake_up+0x49/0x3d0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8119c175>] ondemand_readahead+0x135/0x260
Apr 28 00:37:15 vnode19 kernel: [<ffffffffa04ae0aa>] ? dm_any_congested+0x4a/0x50 [dm_mod]
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8119c30c>] page_cache_async_readahead+0x6c/0x70
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81190748>] generic_file_read_iter+0x438/0x680
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81215e79>] ? pipe_write+0x3d9/0x430
Apr 28 00:37:15 vnode19 kernel: [<ffffffff81247da7>] blkdev_read_iter+0x37/0x40
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120ce56>] __vfs_read+0xc6/0x100
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120d45f>] vfs_read+0x7f/0x130
Apr 28 00:37:15 vnode19 kernel: [<ffffffff8120e2d5>] SyS_read+0x55/0xc0
Apr 28 00:37:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:39:15 vnode19 kernel: INFO: task qemu-kvm:32970 blocked for more than 120 seconds.
Apr 28 00:39:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:39:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:39:15 vnode19 kernel: qemu-kvm D ffff88734767f908 0 32970 1 0x00000080
Apr 28 00:39:15 vnode19 kernel: ffff88734767f908 ffff880166d65900 ffff887048ef0000 ffff887347680000
Apr 28 00:39:15 vnode19 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff88492b5b8a00
Apr 28 00:39:15 vnode19 kernel: ffff88734767f920 ffffffff816b2425 ffff887f7f116cc0 ffff88734767f9d0
Apr 28 00:39:15 vnode19 kernel: Call Trace:
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5137>] schedule_timeout+0x237/0x2d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81309826>] ? generic_make_request+0x106/0x1d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b1b96>] io_schedule_timeout+0xa6/0x110
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124b537>] do_blockdev_direct_IO+0xca7/0x2d20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124d5f3>] __blockdev_direct_IO+0x43/0x50
Apr 28 00:39:15 vnode19 kernel: [<ffffffff812479d8>] blkdev_direct_IO+0x58/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190a3d>] generic_file_direct_write+0xad/0x170
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190bc2>] __generic_file_write_iter+0xc2/0x1e0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247cd0>] blkdev_write_iter+0x90/0x130
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247c40>] ? bd_unlink_disk_holder+0xe0/0xe0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120d8a1>] do_readv_writev+0x1f1/0x2b0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff811308cf>] ? __audit_syscall_entry+0xaf/0x100
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120d9e9>] vfs_writev+0x39/0x50
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120e9e8>] SyS_pwritev+0xb8/0xe0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:39:15 vnode19 kernel: INFO: task qemu-kvm:33655 blocked for more than 120 seconds.
Apr 28 00:39:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:39:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:39:15 vnode19 kernel: qemu-kvm D ffff886a1dd23908 0 33655 1 0x00000080
Apr 28 00:39:15 vnode19 kernel: ffff886a1dd23908 ffff8875c6e442c0 ffff88582127ac80 ffff886a1dd24000
Apr 28 00:39:15 vnode19 kernel: 0000000000000000 7fffffffffffffff 0000000000000000 ffff886d0d021e00
Apr 28 00:39:15 vnode19 kernel: ffff886a1dd23920 ffffffff816b2425 ffff887f7f496cc0 ffff886a1dd239d0
Apr 28 00:39:15 vnode19 kernel: Call Trace:
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5137>] schedule_timeout+0x237/0x2d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81309826>] ? generic_make_request+0x106/0x1d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b1b96>] io_schedule_timeout+0xa6/0x110
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124b537>] do_blockdev_direct_IO+0xca7/0x2d20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124d5f3>] __blockdev_direct_IO+0x43/0x50
Apr 28 00:39:15 vnode19 kernel: [<ffffffff812479d8>] blkdev_direct_IO+0x58/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190a3d>] generic_file_direct_write+0xad/0x170
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190bc2>] __generic_file_write_iter+0xc2/0x1e0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247cd0>] blkdev_write_iter+0x90/0x130
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247c40>] ? bd_unlink_disk_holder+0xe0/0xe0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120d8a1>] do_readv_writev+0x1f1/0x2b0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff811308cf>] ? __audit_syscall_entry+0xaf/0x100
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120d9e9>] vfs_writev+0x39/0x50
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120e9e8>] SyS_pwritev+0xb8/0xe0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:39:15 vnode19 kernel: INFO: task qemu-kvm:33661 blocked for more than 120 seconds.
Apr 28 00:39:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:39:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:39:15 vnode19 kernel: qemu-kvm D ffff8855341f3728 0 33661 1 0x00000080
Apr 28 00:39:15 vnode19 kernel: ffff8855341f3728 ffff880166d642c0 ffff886916a4c2c0 ffff8855341f4000
Apr 28 00:39:15 vnode19 kernel: ffff880d40fc8c18 ffff880d40fc8c00 ffffffff00000000 fffffffe00000001
Apr 28 00:39:15 vnode19 kernel: ffff8855341f3740 ffffffff816b2425 ffff886916a4c2c0 ffff8855341f37d0
Apr 28 00:39:15 vnode19 kernel: Call Trace:
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b4c05>] rwsem_down_write_failed+0x1f5/0x320
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81343233>] call_rwsem_down_write_failed+0x13/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b44ad>] ? down_write+0x2d/0x40
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa06dfdfe>] __origin_write+0x6e/0x210 [dm_snapshot]
Apr 28 00:39:15 vnode19 kernel: [<ffffffff811918ae>] ? mempool_alloc+0x6e/0x170
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa06e0007>] do_origin.isra.14+0x67/0x90 [dm_snapshot]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa06e0092>] origin_map+0x62/0x80 [dm_snapshot]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04acf8a>] __map_bio+0x3a/0x110 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04ae73f>] __split_and_process_bio+0x24f/0x3f0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04ae94a>] dm_make_request+0x6a/0xd0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81309826>] generic_make_request+0x106/0x1d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81309967>] submit_bio+0x77/0x150
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81300deb>] ? bio_alloc_bioset+0x1ab/0x2d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124ccb7>] do_blockdev_direct_IO+0x2427/0x2d20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124d5f3>] __blockdev_direct_IO+0x43/0x50
Apr 28 00:39:15 vnode19 kernel: [<ffffffff812479d8>] blkdev_direct_IO+0x58/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190a3d>] generic_file_direct_write+0xad/0x170
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190bc2>] __generic_file_write_iter+0xc2/0x1e0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247cd0>] blkdev_write_iter+0x90/0x130
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120cf59>] __vfs_write+0xc9/0x110
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120d5b2>] vfs_write+0xa2/0x1a0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120e537>] SyS_pwrite64+0x87/0xb0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:39:15 vnode19 kernel: INFO: task dmeventd:33781 blocked for more than 120 seconds.
Apr 28 00:39:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:39:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:39:15 vnode19 kernel: dmeventd D ffff8803493b7af8 0 33781 1 0x00000080
Apr 28 00:39:15 vnode19 kernel: ffff8803493b7af8 ffff880166da1640 ffff880b15a50000 ffff8803493b8000
Apr 28 00:39:15 vnode19 kernel: ffff880d40fc8c18 ffff880d40fc8c00 ffffffff00000000 fffffffe00000001
Apr 28 00:39:15 vnode19 kernel: ffff8803493b7b10 ffffffff816b2425 ffff880b15a50000 ffff8803493b7b98
Apr 28 00:39:15 vnode19 kernel: Call Trace:
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b4c05>] rwsem_down_write_failed+0x1f5/0x320
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81343233>] call_rwsem_down_write_failed+0x13/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b44ad>] ? down_write+0x2d/0x40
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa06df172>] snapshot_status+0x82/0x1a0 [dm_snapshot]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04b51a6>] retrieve_status+0xa6/0x1b0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04b6363>] table_status+0x63/0xa0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04b6300>] ? dm_get_live_or_inactive_table.isra.3+0x30/0x30 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04b6015>] ctl_ioctl+0x255/0x4d0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81095806>] ? __dequeue_signal+0x106/0x1b0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81095a1b>] ? recalc_sigpending+0x1b/0x50
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04b62a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81220872>] do_vfs_ioctl+0x2d2/0x4b0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff811308cf>] ? __audit_syscall_entry+0xaf/0x100
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81220ac9>] SyS_ioctl+0x79/0x90
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 00:39:15 vnode19 kernel: INFO: task dd:33790 blocked for more than 120 seconds.
Apr 28 00:39:15 vnode19 kernel: Not tainted 4.4.51 #1
Apr 28 00:39:15 vnode19 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:39:15 vnode19 kernel: dd D ffff885238e1f828 0 33790 33746 0x00000080
Apr 28 00:39:15 vnode19 kernel: ffff885238e1f828 ffff883f77ce42c0 ffff884a64088000 ffff885238e20000
Apr 28 00:39:15 vnode19 kernel: ffff880d40fc8c18 ffff880d40fc8c00 ffffffff00000000 fffffffe00000001
Apr 28 00:39:15 vnode19 kernel: ffff885238e1f840 ffffffff816b2425 ffff884a64088000 ffff885238e1f8d0
Apr 28 00:39:15 vnode19 kernel: Call Trace:
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b2425>] schedule+0x35/0x80
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b4c05>] rwsem_down_write_failed+0x1f5/0x320
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81343233>] call_rwsem_down_write_failed+0x13/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b44ad>] ? down_write+0x2d/0x40
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa06e0d32>] snapshot_map+0x62/0x390 [dm_snapshot]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04acf8a>] __map_bio+0x3a/0x110 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04ae73f>] __split_and_process_bio+0x24f/0x3f0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04ae94a>] dm_make_request+0x6a/0xd0 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81309826>] generic_make_request+0x106/0x1d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81309967>] submit_bio+0x77/0x150
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124d6ba>] mpage_bio_submit+0x2a/0x40
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8124e0b0>] mpage_readpages+0x130/0x160
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247290>] ? I_BDEV+0x20/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff811e0428>] ? alloc_pages_current+0x88/0x120
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247add>] blkdev_readpages+0x1d/0x20
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8119bfbc>] __do_page_cache_readahead+0x19c/0x220
Apr 28 00:39:15 vnode19 kernel: [<ffffffff810b4c39>] ? try_to_wake_up+0x49/0x3d0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8119c175>] ondemand_readahead+0x135/0x260
Apr 28 00:39:15 vnode19 kernel: [<ffffffffa04ae0aa>] ? dm_any_congested+0x4a/0x50 [dm_mod]
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8119c30c>] page_cache_async_readahead+0x6c/0x70
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81190748>] generic_file_read_iter+0x438/0x680
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81215e79>] ? pipe_write+0x3d9/0x430
Apr 28 00:39:15 vnode19 kernel: [<ffffffff81247da7>] blkdev_read_iter+0x37/0x40
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120ce56>] __vfs_read+0xc6/0x100
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120d45f>] vfs_read+0x7f/0x130
Apr 28 00:39:15 vnode19 kernel: [<ffffffff8120e2d5>] SyS_read+0x55/0xc0
Apr 28 00:39:15 vnode19 kernel: [<ffffffff816b5fee>] entry_SYSCALL_64_fastpath+0x12/0x71
I assume you've ruled out actual physical disk problems.
I also assume you're making sure the host and none of the VMs have overlapping VG names. That could cause craziness like you're describing.
What you're seeing sounds like "uninterruptible sleep" where the box thinks it's waiting for IO, and nothing can change that. Kill -9 won't even do it. I used to see this with tape backups. I've seen it more recently while doing stupid things, like mounting a VMs LVM on the host and forgetting to unmount it while running the VM. That's always fun.
The most useful tool I've found for a situation like you're describing is
dmsetup
. It lets you manually un-fuck LVM. I don't know if this will get you out of an uninterruptible sleep situation.Another possibility is that you're using slow disks, and something really does take longer than 120 seconds.
I use disk files ala qemu-img rather than LVM. I used to use LVM like you're describing in Xen, but never had any problems that I didn't obviously cause myself.
-Dylan