Just wondering what kind of processes you guys have for transferring production data offsite for analysis purposes?
I have read some of the literature, but I am not sure about the overheads involved or the setup process for each option.
- Replication: with the production server acting as both distributor and publisher. I would use a push subscription that updates the database on my analysis server weekly
- SSIS: No experience but seems to be the 'standard' way?
- Manual: Use Windows Task Scheduler to zip up my backup file and open an FTP connection to my analysis server
- Mirroring: No experience but used more for high availability?
FYI, I am using SQL Server 2008 on Windows Server 2008 R2. My DB is about 36 GB. I am only interested in the data.
Other methods welcome!
*EDIT: Additional information: I don't mind the database being read-only, and I am looking to run this process once a week.
You could look at log shipping. It's very similar to your 'manual' approach, but it uses built-in SQL Server components to do the work and operates on transaction log backups.
A great explanation and comparison to DB mirroring can be seen at http://blogs.technet.com/josebda/archive/2009/04/02/sql-server-2008-log-shipping.aspx
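If it helps, here's a rough manual equivalent of what log shipping automates, assuming the analysis copy was first seeded from a full backup restored WITH NORECOVERY or STANDBY (the database name and paths below are placeholders, not anything specific to your setup):

```sql
-- On the production server: back up the transaction log
-- (the database must be in FULL or BULK_LOGGED recovery).
BACKUP LOG [ProdDB]
    TO DISK = N'\\AnalysisServer\LogShip\ProdDB_log.trn' WITH INIT;

-- On the analysis server: apply the log backup. STANDBY keeps the
-- database readable (read-only) between restores, which suits analysis;
-- NORECOVERY would leave it inaccessible instead.
RESTORE LOG [ProdDB]
    FROM DISK = N'E:\LogShip\ProdDB_log.trn'
    WITH STANDBY = N'E:\LogShip\ProdDB_undo.dat';
```

Log shipping just runs that backup/copy/restore loop for you on a schedule via Agent jobs.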
As a mere suggestion: you say your DB is 36 GB. Depending on the compression and speed of transfer, you might find it easier to dump it on a USB drive and snail mail it. Furthermore, some FTP servers will never allow files larger than 4 GB. Incremental pushes might be your way, but again consider the worst case: what happens if your data changes faster than your DB can push it to the replicator?
I have a client who needed this exact thing. They couldn't accept the DB being inaccessible while the logs restore, so log shipping was out. Replication was not possible because their schema would have been too much of a hassle. Mirroring wouldn't work because a mirror is only readable through snapshots, and they needed a writable database... but those writes did not have to go back to the main server.
I ended up writing a maintenance plan and a few scripts that do a full backup of the database in question (120 GB), use xp_cmdshell to copy the file to a network share on the other SQL Server, and then execute a job remotely (you can set which server to execute on in the plan step) to restore the database.
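Pieced together, the steps look roughly like this. This is just a sketch: the server names, paths, and logical file names are placeholders, and xp_cmdshell has to be enabled via sp_configure first.

```sql
-- Step 1 (production): full backup of the database.
BACKUP DATABASE [ProdDB]
    TO DISK = N'D:\Backups\ProdDB_full.bak'
    WITH INIT;

-- Step 2 (production): copy the file to a share on the analysis server.
EXEC master..xp_cmdshell
    'copy /Y D:\Backups\ProdDB_full.bak \\AnalysisServer\Backups\';

-- Step 3 (analysis server, run as a remote job step): restore over the old copy.
RESTORE DATABASE [ProdDB]
    FROM DISK = N'E:\Backups\ProdDB_full.bak'
    WITH REPLACE,
         MOVE N'ProdDB'     TO N'E:\Data\ProdDB.mdf',
         MOVE N'ProdDB_log' TO N'E:\Data\ProdDB_log.ldf';
```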
You may want to check out Tara Kizer's isp_Restore script which can help you with this. http://weblogs.sqlteam.com/tarad/archive/2005/11/08/8262.aspx I ended up writing my own script because hers didn't do exactly what I needed, but it should at least get you started.
Log shipping is great for analysis, but be prepared for the high overhead of the initial backup and transfer of the data to the other server/instance. Log shipping also implies a delay between live data and analysis data. After the initial seed, you're only moving transaction log backups, which are typically much smaller.
In our environment we log ship nightly, because management runs reports when they come into the office in the morning, and those reports typically require a significant amount of pivoting.
We had a 4 GB database where we infrequently needed a backup for analyzing issues. The database was too big to back up during the day while production was running, so our solution was to run a backup at night when the production system was down and to keep the most recent backups (I think we kept one week's worth).
One of our requirements was that the database needed to be shipped to different departments depending on the issue. Another benefit was that we could make the backup files available to project managers, who did not have direct access to any SQL Server and who were in charge of sending the backups to the appropriate recipients. They didn't need to involve admins to get the backups.
The daily backups were scheduled using the standard SQL Server tools.
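For what it's worth, the nightly job step behind a schedule like this can be as simple as a date-stamped full backup; the database name, share, and retention approach here are made up for illustration:

```sql
-- Nightly full backup with a YYYYMMDD-stamped file name, run as a
-- SQL Server Agent job step. Files older than the retention window
-- (a week, in our case) can be removed by a maintenance cleanup task.
DECLARE @file nvarchar(260) =
    N'\\BackupShare\Nightly\MyDB_'
    + CONVERT(nchar(8), GETDATE(), 112) + N'.bak';
BACKUP DATABASE [MyDB] TO DISK = @file WITH INIT;
```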
Assuming your target analysis database is on the same platform as your production database, the easiest solution is to take a weekly incremental backup (a differential backup, in SQL Server terms) and apply it to your analysis database. If your database isn't growing very quickly, an incremental file should be transmittable in a reasonable period of time.
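A minimal sketch of that weekly cycle on SQL Server might look like the following (database and path names are assumptions). One wrinkle: once a database has been recovered, it can't accept another differential, so the base full backup is re-restored WITH NORECOVERY each week before the latest differential is applied.

```sql
-- One-time, and kept on the analysis side: establish the full base.
BACKUP DATABASE [ProdDB] TO DISK = N'D:\Xfer\ProdDB_full.bak' WITH INIT;

-- Weekly on production: a differential captures only the changes since
-- the full, so the file to transfer stays small if churn is modest.
BACKUP DATABASE [ProdDB] TO DISK = N'D:\Xfer\ProdDB_diff.bak'
    WITH DIFFERENTIAL, INIT;

-- Weekly on the analysis server: re-restore the base WITH NORECOVERY,
-- then bring the database online with the latest differential.
RESTORE DATABASE [ProdDB] FROM DISK = N'E:\Xfer\ProdDB_full.bak'
    WITH NORECOVERY, REPLACE;
RESTORE DATABASE [ProdDB] FROM DISK = N'E:\Xfer\ProdDB_diff.bak'
    WITH RECOVERY;
```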
If you are transmitting to a different database platform than the one you run in production, an ETL tool like Pentaho Data Integration might be a better fit. An ETL tool is an even better solution if you only care about some of the tables rather than all of them.