I have a fairly large dataset that I need to import into MySQL. I'd like to do so as efficiently as possible, since I am going to need to do this repeatedly. Are there tricks for this? Is there some approach that is more efficient than extended INSERT statements?
To address the questions in the comment: the original format is XML, though I already have a transformer to put it into CSV format. I'm working with MySQL 5.1.x and the target tables use InnoDB. There are relationships between the data, but I do not have any constraints, and in most cases the data isn't clean enough to impose them.
Try mysqlimport; you can import directly from CSV files.
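A rough sketch of the invocation, assuming a hypothetical database mydb and file /path/to/data.csv (mysqlimport loads each file into the table named after the file's base name, so data.csv goes into a table called data; adjust the delimiters and credentials to your setup):

# mydb, someuser, and the file path are placeholders
mysqlimport --local -u someuser -p \
  --fields-terminated-by=',' --fields-optionally-enclosed-by='"' \
  --lines-terminated-by='\n' \
  mydb /path/to/data.csv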
I would read the MySQL manual. I was thinking about INSERT AFTER, but LOAD DATA INFILE looks interesting ("20 times faster" than INSERT statements, according to the manual)...
There's an open-source ETL tool called Pentaho Data Integration that has a MySQL bulk load option (under the Experimental folder in the Design view). You can also quickly load data in a platform-agnostic manner using its CSV Input and Table Output steps. If you need to do any processing on the data before you load it, you can encode that logic in the same tool.
Use
/*!40000 ALTER TABLE tbl_name DISABLE KEYS */;
and /*!40000 ALTER TABLE tbl_name ENABLE KEYS */;
around the INSERT statements for each table. This ensures that non-unique indexes are built after all rows are present, rather than being updated piece-by-piece as each row is inserted.
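A minimal sketch of the full pattern, assuming a hypothetical table my_table with columns id and name (the /*!40000 ... */ comment syntax just makes the statements a no-op on very old servers; note that DISABLE KEYS only affects MyISAM's non-unique indexes and is ignored by InnoDB):

/*!40000 ALTER TABLE my_table DISABLE KEYS */;
-- my_table, id, and name are placeholders for your own schema
INSERT INTO my_table (id, name) VALUES
  (1, 'alpha'),
  (2, 'beta'),
  (3, 'gamma');
/*!40000 ALTER TABLE my_table ENABLE KEYS */;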
Even a bulk "INSERT ..." shouldn't take too long. I took a fresh JIRA database export of about 200 MB and loaded it into a virgin database in approximately 30 seconds.
You want to use the LOAD DATA statement. If you don't have filesystem access on the database server, add the LOCAL option so the file is read from the client and you can load the data from wherever you put it. LOAD DATA sucks in data at prodigious speeds on my Dell T5600: about a gigabyte every 2-3 minutes. I just finished an 8 GB load of some 37 million records in 1h13m.
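A sketch of what that looks like for a comma-separated file, assuming a hypothetical table my_table and file /tmp/data.csv (adjust the field and line terminators, the IGNORE 1 LINES header skip, and the column list to match your CSV):

-- my_table, /tmp/data.csv, and the column names are placeholders
LOAD DATA LOCAL INFILE '/tmp/data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, created_at);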