I have several tables with tens of millions of rows, each with multiple indexes. When dumping and reloading the data, the reload takes quite a while, whether I re-insert with the indexes in place or insert without indexes and then re-index.
I understand that an index gives the database engine a quick way to locate a record on disk, and since a dump and reload can change where the data lives on disk, the index could potentially be invalidated. However, it still seems like you should be able to optimize a table by locating all of its records contiguously and sequentially on disk, then just have the index refer to a start location plus an offset. Has anyone anywhere done any work on this type of thing?
Desired functionality: I could dump the data raw, load it into a sequential, uninterrupted disk area, then just load in the indexes and tell the engine to update them with the new table start position. After googling around on this, I see nothing...
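For context, the dump-and-reload I mean is the usual mysqldump round trip, something like this (database and table names are just placeholders):

    # dump one big table (names are placeholders)
    mysqldump --opt mydb bigtable > bigtable.sql

    # reload it -- the slow part, since every index is rebuilt
    # (or built afterwards) as the rows are re-inserted
    mysql mydb < bigtable.sql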
MySQL stores the raw data and index files in the data directory; there are usually three files per table (for MyISAM tables: a .frm, a .MYD and a .MYI file), each named after the table, all contained within a folder bearing the name of the database itself. These files can be copied elsewhere for your backups (and can be compressed and/or encrypted, based on your needs) just like any other file. HOWEVER, you must either stop your database or acquire a read lock on all tables first -- if you copy a table's files while data is being written to it, your backups will likely be corrupted and unusable! (Alternatively, stop MySQL or acquire the locks, take a filesystem snapshot if you're lucky enough to be on LVM, then restart the server / release the locks and take the backup from the snapshot.)
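A minimal sketch of the backup copy, assuming MyISAM tables, a default data directory of /var/lib/mysql, and placeholder database/table names:

    -- session 1 (mysql client): block writes and flush tables to disk
    FLUSH TABLES WITH READ LOCK;

    -- session 2 (a shell), while the lock is still held in session 1:
    --   cp -a /var/lib/mysql/mydb/bigtable.* /backups/mydb/

    -- session 1 again, once the copy has finished:
    UNLOCK TABLES;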
When restoring from backup with these files:

- Stop the MySQL server (or at least take a write lock on the affected tables).
- Copy the table files back into the database's folder inside the data directory.
- Make sure the restored files are owned by the MySQL user (e.g. chown -R mysql:mysql).
- Start the server again (or release the locks).
I've used this procedure many times to back up and restore databases on MySQL versions 3, 4, and 5 with no issues. It's probably a good idea to run a check on each restored table just in case, however.
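For that post-restore check, something along these lines works (the database name is a placeholder):

    # check every table in the restored database
    mysqlcheck -u root -p --check mydb

    # or, from the mysql client, per table:
    #   CHECK TABLE bigtable;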
Important: This procedure will only work if you restore to the same major version you backed up from! That is, if you make a copy of a MySQL 5 database in this manner, you must restore it to MySQL 5. Going up (and only up) by one version (e.g. copy from 4, restore to 5) should work as well, but may require updating the file structure and thus may not be any faster or better than dumping them the way you already do.
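If you do go up a version this way, the bundled upgrade checker can flag and fix file-format issues; a sketch, assuming a MySQL 5.0+ destination server:

    # check all databases for incompatibilities with the new server
    # version and repair what it can
    mysql_upgrade -u root -p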
You may want to look into disk snapshots... and/or bulk transactions.
With most Linux distributions you can use the volume manager (LVM, for example) to create a snapshot of the current filesystem... and then backup/copy/move/do-whatever with the raw database files. The only downside to this method is that the MySQL server needs to be offline for a brief moment while the snapshot is created, which ensures you snapshot a database that has been fully committed to disk. Once you're done copying from the snapshot... you can simply delete it and move on (no need to restart MySQL or anything else). A restore would simply be a matter of copying/moving the raw data files back into the correct data directory and firing MySQL up.
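A rough outline of that snapshot dance, assuming the data directory sits on a logical volume at /dev/vg0/mysql (all device names, paths and sizes below are placeholders):

    # stop mysql just long enough to take the snapshot
    /etc/init.d/mysql stop
    lvcreate --snapshot --size 2G --name mysql_snap /dev/vg0/mysql
    /etc/init.d/mysql start

    # copy the raw files out of the snapshot at your leisure
    mkdir -p /mnt/mysql_snap
    mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
    cp -a /mnt/mysql_snap/mydb /backups/
    umount /mnt/mysql_snap

    # finished with the snapshot -- drop it
    lvremove -f /dev/vg0/mysql_snap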
You can alternatively use "bulk transaction" commands with MySQL in order to bypass the per-row consistency checks and so on. With most bulk loads there will still be some period of lag while the indexes are rebuilt, but the advantage of this method is that there is only one big consistency check at the end rather than one for every record being inserted.
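As a sketch, the bulk-load settings usually look something like this (table and file names are placeholders; DISABLE KEYS applies to MyISAM, while the SET flags matter most for InnoDB):

    SET autocommit = 0;
    SET unique_checks = 0;
    SET foreign_key_checks = 0;

    ALTER TABLE bigtable DISABLE KEYS;   -- defer non-unique index builds (MyISAM)
    LOAD DATA INFILE '/tmp/bigtable.txt' INTO TABLE bigtable;
    ALTER TABLE bigtable ENABLE KEYS;    -- rebuild those indexes in one pass

    COMMIT;
    SET unique_checks = 1;
    SET foreign_key_checks = 1;
    SET autocommit = 1;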
When all is said and done, there really is no great method for dealing with databases over the 2 million record mark. This is where you begin to see the flaws in a relational database and start looking at NoSQL alternatives. (And yes, I know there are plenty of arguments pushing both ways on this subject.)