Got a disturbing crash while performing 'apachectl stop'. General system:
$ uname -a
Linux www.example.com 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -d
Description: Ubuntu 14.04 LTS
Lots of spare capacity in disk, memory, CPU. This is an Amazon EC2 cloud instance, running today at 1PM 5/7/2014, region us-east-1a, medium-size instance with 3.7GB mem/2CPU. My other instances on the same VPC and same region were fine.
I read elsewhere that in today's kernels, you don't get a crash like this unless the hardware is failing. Seems unlikely that Amazon would have faulty hardware in the cloud?? Or am I being pollyannish?
Anyway, the dump from dmesg
(the system continued to operate by serving webpages and talking to the database, but new processes hung instantly, such as ps
and ssh
):
[27917995.400499] general protection fault: 0000 [#1] SMP [27917995.400515] Modules linked in: isofs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd [27917995.400537] CPU: 0 PID: 1672 Comm: apache2 Not tainted 3.13.0-24-generic #46-Ubuntu [27917995.400545] task: ffff8800020117f0 ti: ffff88005f012000 task.ti: ffff88005f012000 [27917995.400551] RIP: e030:[] [] devpts_kill_index+0x13/0x60 [27917995.400564] RSP: e02b:ffff88005f013d58 EFLAGS: 00010286 [27917995.400568] RAX: dc73af5e3df7dcab RBX: ffff880003f30400 RCX: 0000000181000079 [27917995.400574] RDX: 00000000ffffffff RSI: 0000000000000002 RDI: ffff8800aab76ff8 [27917995.400579] RBP: ffff88005f013d68 R08: 0000000000000000 R09: 0000000000000001 [27917995.400583] R10: ffffea0003a01180 R11: ffffffff8144a320 R12: 0000000000000002 [27917995.400588] R13: ffff8800e87a8001 R14: 0000000000000002 R15: 0000000000000001 [27917995.400598] FS: 00007f8d8b320780(0000) GS:ffff8800ef600000(0000) knlGS:0000000000000000 [27917995.400605] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [27917995.400610] CR2: 00007f8d79aea7e0 CR3: 0000000001c0e000 CR4: 0000000000002660 [27917995.400616] Stack: [27917995.400619] ffff880003f30400 ffff880003f30800 ffff88005f013d78 ffffffff8144caa8 [27917995.400628] ffff88005f013d90 ffffffff81440e47 ffff880003f30400 ffff88005f013e38 [27917995.400636] ffffffff81443159 ffff880003f30610 ffff880003f30628 ffff880003f30630 [27917995.400645] Call Trace: [27917995.400656] [] pty_unix98_shutdown+0x18/0x20 [27917995.400662] [] release_tty+0x37/0x140 [27917995.400668] [] tty_release+0x4b9/0x600 [27917995.400678] [] __fput+0xe4/0x260 [27917995.400684] [] ____fput+0xe/0x10 [27917995.400693] [] task_work_run+0xc4/0xe0 [27917995.400701] [] do_exit+0x2ab/0xa50 [27917995.400708] [] ? vtime_account_user+0x54/0x60 [27917995.400717] [] ? context_tracking_user_exit+0x4f/0xc0 [27917995.400723] [] do_group_exit+0x3f/0xa0 [27917995.400729] [] SyS_exit_group+0x14/0x20 [27917995.400738] [] tracesys+0xe1/0xe6 [27917995.400742] Code: 0f 1f 84 00 00 00 00 00 48 83 c4 08 b8 fb ff ff ff 5b 41 5c 5d c3 66 90 66 66 66 66 90 55 48 89 e5 41 54 41 89 f4 53 48 8b 47 28 81 78 58 d1 1c 00 00 74 0b 48 8b 05 44 bf d7 00 48 8b 40 08 [27917995.400796] RIP [] devpts_kill_index+0x13/0x60 [27917995.400803] RSP [27917995.400811] ---[ end trace 5b24303912015285 ]--- [27917995.400815] Fixing recursive fault but reboot is needed!
The cloud is made up of hardware still so a hardware fault is entirely possible. If you suspect a hardware issues just stop and restart the instance. This should put you on a new host.