I'm looking for a way to flush all pending writes to disk in Windows, then buffer all future writes until a command gives the go-ahead.
I'd like to flush SQL Server writes and Windows system writes, then buffer both.
To be clear about what I'm doing:
I use Windows 2008 R2 and SQL Server 2008 R2 on EC2. I run an hourly snapshot on the drive these reside on. When nothing critical is changing, these snapshots come out just fine - but every now and again I get a bad snapshot. Worst-case scenario, in the event of drive failure (EBS failure technically), if I have 3 bad hourly snapshots I've lost 4 hours of data.
The snapshots already solve the fast, differential backup, and fast, easy restore - so all I'm looking for is a way to flush everything so it's in a consistent state on disk, then suspend all writes until the snapshot is complete. I'm happy to write some code for a service I call into to make this happen, but I need to know what APIs/commands I need to write against in order to get the task done.
I'm aware that I could create a separate volume which I continually run Windows Backup to and then snapshot that, but that significantly lengthens the backup process and feels like a hack. I know Windows and SQL Server are both very good at buffering writes, so this seems like something I should be able to accomplish in-place.
Ideas?
An option for you is to leverage Shadow Copy in Windows. The Shadow Copy process itself drains any write-buffers before committing its own snapshot. Schedule it to take a snapshot a minute or two before the EC2 snapshot. That way when the EC2 snapshot fires you have a recent, consistent copy already in the system.
I can't answer how to lock the disk so Windows doesn't write, or how to force the flush, so this will be a half answer. But there is one relevant piece of information your question doesn't mention:
The EC2 snapshot is a point-in-time capture, so from the disk's point of view it is effectively instantaneous. You don't have to wait for it to finish! Once you start it, you can start using your disk again, and the snapshot will still contain the data as it was when the snapshot was requested, not the current state. This works because once the volume is tagged for snapshotting, all NEW data written to the disk is queued in a kind of overlay disk, and the data that existed at the moment you asked for the snapshot is preserved.
That should make your other problem easier. All you need to do is flush to disk so everything is consistent, pause for a couple of seconds while you confirm that the snapshot API returned a success code (so you know the process kicked in), and then go back to writing.
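To make that sequence concrete, here is a rough Python sketch. It assumes a boto3-style EC2 client (boto3's `create_snapshot(VolumeId=..., Description=...)` returns a dict containing `SnapshotId`). The `flush` and `resume` callables are hypothetical placeholders for whatever mechanism you end up using to quiesce and release writes; they are not real Windows or SQL Server APIs.

```python
import time


def snapshot_after_flush(ec2_client, volume_id, flush, resume, settle_seconds=2.0):
    """Flush writes, start an EBS snapshot, confirm it was accepted, resume writes.

    ec2_client: assumed to expose a boto3-style create_snapshot() method.
    flush, resume: hypothetical callables supplied by the caller.
    """
    flush()  # get everything on disk into a consistent state first
    try:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description="hourly consistent snapshot",
        )
        snapshot_id = response.get("SnapshotId")
        if not snapshot_id:
            raise RuntimeError("snapshot request did not return a SnapshotId")
        # Short pause so the snapshot has definitely started before writes resume.
        time.sleep(settle_seconds)
        return snapshot_id
    finally:
        # Always let writes continue, even if the snapshot request failed.
        resume()
```

The `finally` block is the important design choice: if the API call fails, writes are still released rather than left suspended.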
I'm going to add a separate answer to address some of your comments. VSS is a good way to go if you're trying to quiesce the data. In fact, MSFT DPM uses that subsystem to great effect with SQL from what I understand. Of course they had dedicated teams to hash things out.
VSS won't screw SQL up, but it can (and has) interfered with SQL's native backup methods (see my answer to a question here for some more insight). You need to be very aware of what you are trying to accomplish, what VSS is doing, and whether or not the two are coinciding.
You can write something to call the VSS APIs natively (don't ask me how, I'm not a developer), or you can use something like vshadow.exe to make and manipulate shadow copies for you. Vshadow.exe is available in the MSFT SDK; make sure you get the right version for your OS.
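If you go the vshadow.exe route, a small wrapper might look like the Python sketch below. Treat it as an assumption-laden example: the `-p` (persistent shadow copy) and `-exec=` (run a command once the shadow copy exists) flags are what I recall from the SDK sample tool, so check them against `vshadow.exe -?` for your version. The function names and the `runner` parameter are made up for illustration.

```python
import subprocess


def build_vshadow_command(volume, exec_script=None, vshadow_path="vshadow.exe"):
    """Build the argument list for a vshadow.exe run (flags are assumptions)."""
    cmd = [vshadow_path, "-p"]  # -p: persistent shadow copy (verify for your SDK)
    if exec_script:
        # -exec: command vshadow runs after the shadow copy has been created
        cmd.append("-exec=" + exec_script)
    cmd.append(volume)
    return cmd


def create_shadow_copy(volume, exec_script=None, runner=subprocess.run):
    """Run vshadow.exe and fail loudly if it reports an error."""
    cmd = build_vshadow_command(volume, exec_script)
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("vshadow failed with exit code %d" % result.returncode)
    return result.stdout
```

For example, `create_shadow_copy("C:", exec_script=r"C:\scripts\ec2-snap.cmd")` would pass a (hypothetical) script that triggers the EC2 snapshot.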
Depending on your SQL recovery model, SQL will react differently when VSS is called. It's been a while since I've dealt with the nitty gritty of SQL and VSS, but from what I remember if you have Simple Recovery Model set, the logs will truncate when VSS is called. In Full and Bulk-Logged, this will not be the case and you'll need to do something to manage the size of the logs.
And you probably know this, but test, test, test. Your snapshots are only good if you can actually restore from them.
I'm not an EC2 expert by any means but, outside of EC2, if you want to create consistent backups using storage snapshots you have to quiesce the filesystem before the backup is taken.
The SQL Writer Service must be running (see http://msdn.microsoft.com/en-us/library/ms175536.aspx for more information on the SQL Writer Service), and then the backup would use the VSS COM APIs to freeze I/O, take the snapshot, then thaw I/O again (lots of detail here: http://msdn.microsoft.com/en-us/library/aa384589%28v=vs.85%29.aspx).
SAN vendors generally integrate calls to the VSS API into their snapshot routine and this would be done via an agent running on the server.
I can't answer whether EC2 actually does this call for you, or whether it is something you would need to write yourself (use the VSS APIs to freeze IO, then call the EC2 API to take the snapshot), but hopefully my answer gives you a few pointers.
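If you do end up writing it yourself, the overall shape is freeze, snapshot, thaw, with the thaw guaranteed to run. A minimal Python sketch of that shape follows; `freeze` and `thaw` are hypothetical callables standing in for whatever wraps the VSS calls, and nothing here touches the real VSS COM interfaces.

```python
from contextlib import contextmanager


@contextmanager
def frozen_io(freeze, thaw):
    """Hold I/O frozen for the duration of the with-block, then always thaw.

    freeze, thaw: hypothetical callables wrapping the real VSS freeze/thaw steps.
    """
    freeze()
    try:
        yield
    finally:
        # Runs even if the snapshot call inside the block raises.
        thaw()


def consistent_snapshot(freeze, thaw, take_snapshot):
    """Freeze I/O, take the storage snapshot, thaw, and return the snapshot result."""
    with frozen_io(freeze, thaw):
        return take_snapshot()
```

Keep the work inside the `with` block as short as possible, since writers are blocked for as long as I/O stays frozen.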
AutomatiCloud does exactly what you are looking for. It has an optional VSS-agent that uses the VSS snapshot-provider delivered by MS-SQL, MS-Exchange or others.
During backup it freezes/redirects I/O on the disk(s), triggers the EC2 snapshot and then allows I/O again.
The whole process is controlled from an easy-to-use Windows GUI, without any need for scripting.