Earlier this morning, store.exe fuzzled up in one way or another, which necessitated a restart of our Exchange server. It came back online with no errors or problems, all the transaction logs replayed successfully, and all the stores mounted as normal. To me, it was just one of those random crashes; however, our consultant suspects it was caused by corruption in one of the stores. Perhaps he's correct, since he has far more experience than me, but that's not the point.
To fix the suspected corruption, he's planning to run an ESEUTIL defrag (via PerfectDisk), which he claims will also fix any errors present.
From what I understand, defrag, verify, and repair are 3 separate actions, and a defrag does not imply any kind of integrity check. Is this correct? Are there any dangers of running a straight-up defrag on a database that might be corrupt?
Edit:
Here's the first error in the event log, which indicated the start of the problems we were having. Anyone know what it might indicate?
Event Type: Error
Event Source: Microsoft Exchange Server
Event Category: None
Event ID: 1000
Date: 11/23/2011
Time: 8:15:47 AM
User: N/A
Computer: SERVER
Description:
Faulting application exsp.dll, version 6.5.7638.1, stamp 430e735b, faulting module kernel32.dll, version 5.2.3790.4480, stamp 49c51f0a, debug? 0, fault address 0x0000bef7.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 00 70 00 70 00 6c 00 A.p.p.l.
0008: 69 00 63 00 61 00 74 00 i.c.a.t.
0010: 69 00 6f 00 6e 00 20 00 i.o.n. .
0018: 46 00 61 00 69 00 6c 00 F.a.i.l.
0020: 75 00 72 00 65 00 20 00 u.r.e. .
0028: 20 00 65 00 78 00 73 00 .e.x.s.
0030: 70 00 2e 00 64 00 6c 00 p...d.l.
0038: 6c 00 20 00 36 00 2e 00 l. .6...
0040: 35 00 2e 00 37 00 36 00 5...7.6.
0048: 33 00 38 00 2e 00 31 00 3.8...1.
0050: 20 00 34 00 33 00 30 00 .4.3.0.
0058: 65 00 37 00 33 00 35 00 e.7.3.5.
0060: 62 00 20 00 69 00 6e 00 b. .i.n.
0068: 20 00 6b 00 65 00 72 00 .k.e.r.
0070: 6e 00 65 00 6c 00 33 00 n.e.l.3.
0078: 32 00 2e 00 64 00 6c 00 2...d.l.
0080: 6c 00 20 00 35 00 2e 00 l. .5...
0088: 32 00 2e 00 33 00 37 00 2...3.7.
0090: 39 00 30 00 2e 00 34 00 9.0...4.
0098: 34 00 38 00 30 00 20 00 4.8.0. .
00a0: 34 00 39 00 63 00 35 00 4.9.c.5.
00a8: 31 00 66 00 30 00 61 00 1.f.0.a.
00b0: 20 00 66 00 44 00 65 00 .f.D.e.
00b8: 62 00 75 00 67 00 20 00 b.u.g. .
00c0: 30 00 20 00 61 00 74 00 0. .a.t.
00c8: 20 00 6f 00 66 00 66 00 .o.f.f.
00d0: 73 00 65 00 74 00 20 00 s.e.t. .
00d8: 30 00 30 00 30 00 30 00 0.0.0.0.
00e0: 62 00 65 00 66 00 37 00 b.e.f.7.
00e8: 0d 00 0a 00 ....
An offline defragmentation using eseutil will fail if it encounters page-level corruption in the database. You'd have to use the /p option (rePair) to discard corrupt pages.
Corruption of data at a logical level (think damage to the "data" in the database, not the "structure" of the database) cannot be repaired by eseutil. The isinteg tool can find logical inconsistencies in the database in versions of Exchange up to 2007. In Exchange 2010, isinteg was replaced with the New-MailboxRepairRequest cmdlet (more details are available on the Exchange Team blog).
Having said all that, I'm concerned about your consultant's advice. Are you seeing events in the Application Event log from ESE or Exchange-related services that indicate any database corruption? In general, Exchange doesn't "randomly crash", and a problem with a hardware driver or the hardware itself seems to be a more likely cause than a problem with Exchange. Further, since the logs replayed without issue, I find it a bit unlikely that you're taking page-level corruption.
Finally, if you are taking page-level corruption, just cleaning that corruption up isn't a solution. You need to find the root cause (faulty hardware, etc.) that's causing the corruption and eliminate it. Doing anything else is just exposing you to continued risk.
The offline defragmentation isn't, by itself, a major risk. You must, however, take a full backup immediately after the offline defragmentation completes, because none of the prior incremental and differential backups can be restored against it (the new database is just that: a brand new database). Obviously, your server will be inaccessible to users during the defragmentation period, too.
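To make the distinction between the eseutil modes concrete, here is a minimal Python sketch that only builds the command line for each action and does not run anything. The switches (/g, /k, /d, /p) are the standard eseutil mode switches; the database path and the helper name eseutil_command are hypothetical and used purely for illustration. Treat it as a reminder that these are separate operations, not as a procedure to follow blindly.

```python
# Hypothetical database path for illustration only; substitute your own .edb file.
DB_PATH = r"D:\Exchsrvr\mdbdata\priv1.edb"

# Each eseutil mode is a separate operation; a defrag does not imply a repair.
ESEUTIL_MODES = {
    "integrity": "/g",  # read-only integrity check of the database
    "checksum": "/k",   # verify page checksums
    "defrag": "/d",     # offline defragmentation (builds a new database file)
    "repair": "/p",     # repair; discards corrupt pages, so data can be lost
}


def eseutil_command(action, db_path=DB_PATH):
    """Return the argument list for one eseutil action (nothing is executed)."""
    return ["eseutil", ESEUTIL_MODES[action], db_path]


if __name__ == "__main__":
    for action in ("integrity", "checksum", "defrag", "repair"):
        print(action.ljust(10), " ".join(eseutil_command(action)))
```

All of these need the store dismounted, and I'd want a known-good backup in hand before going anywhere near /p.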
I'd be researching what happened this morning in detail and coming to a root cause conclusion (or at least a very likely hypothesis) before I started spending money "fixing" it.
I had a recent case where an Exchange Server 2003 machine wouldn't take VSS snapshots and reported various JET errors during attempted backups. I opted to spin up a new Exchange installation on another machine, move all the user mailboxes over to the new machine, then delete the problematic database on the original server and allow a new one to be created. After we did some stress testing and verified that the original server was functioning properly we moved all the mailboxes back. I'd prefer that strategy in the situation you're describing (if I had sufficient Event Log messages that indicated real "corruption" in the original Exchange Server computer's mailbox database).
Edit:
The entry you posted above is a fault in the Exchange provider for Microsoft Search (which makes full-text indexes of Exchange databases). I'd be interested to see more of what happened afterward, as well as any events immediately preceding this one from the System Event Log. Did you have a low disk space condition on any of the server computer's volumes?
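As an aside, the Data block in that event carries no extra information: it is just the description text stored as UTF-16LE. A minimal Python sketch for decoding such a dump, assuming each line has the "offset: hex bytes preview" layout shown in the event (the function name decode_event_data is my own):

```python
import re

HEX_BYTE = re.compile(r"[0-9a-fA-F]{2}")


def decode_event_data(dump):
    """Decode an Event Viewer 'Data' hex dump (offset: bytes preview) as UTF-16LE."""
    raw = bytearray()
    for line in dump.splitlines():
        # Drop the "0000:" offset column; lines without a colon yield nothing.
        _, _, rest = line.partition(":")
        # At most 8 byte columns per line; stop at the ASCII preview column.
        for token in rest.split()[:8]:
            if not HEX_BYTE.fullmatch(token):
                break
            raw.append(int(token, 16))
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    sample = """0000: 41 00 70 00 70 00 6c 00 A.p.p.l.
0008: 69 00 63 00 61 00 74 00 i.c.a.t.
0010: 69 00 6f 00 6e 00 20 00 i.o.n. ."""
    print(repr(decode_event_data(sample)))
```

Run over the full dump, this should give back the same "Application Failure ..." text that appears in the Description field.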