I have an assignment to build a development virtual machine template.
I need to add to the server MySQL databases from each of my company's products, so that new developers will be able to develop for each of the products.
Most databases are smaller than 1GB.
But I have a few databases which are huge (around 160 GB). Since I'm limited in the size of the template I'm trying to create, it's not reasonable to include a 160 GB DB in it.
Therefore, I'm trying to find the right method to extract, say, last month's values from such a huge database and add them to the server, so that a developer will be able to "feel" what it's like to work on this database.
Is it possible to do such a thing and how is it done? Thanks!
Edit:
Unfortunately, there is no option for a separate central DB server to hold all the development databases. There is also no requirement to update the data regularly; I just need to present the same data as on the production servers (from an arbitrary time period) as a sandbox environment for new developers.
If I understood correctly, you are planning to make a separate DB copy for each development environment.
While this might be feasible with small databases, it won't work that well with big DBs. So unless you have a really good reason to set up a separate DB for each environment, it might be better to consider having a single copy of the development database and pointing all development environments at it.
This approach will allow you to refresh the development DB with the latest data periodically, and if someone messes it up you can just refresh it again.
Also imagine the situation where your developers start working on some new project which requires new tables to be created. If you have a single copy of the development DB, you (or the devs) will need to create those tables and fill them with test data just once. Now imagine the devs realize that the initial table structure is not optimal and needs to be changed. Again, this will need to be done on a single DB, as opposed to possibly tens of environments.
This is the approach I have seen used for big projects time and time again, and most of the time it works pretty well.
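The periodic refresh mentioned above could be scripted roughly as follows. All host and database names here are placeholders I made up, and the commands are only printed (a dry run), since the real servers aren't known:

```shell
#!/bin/sh
# Sketch of a periodic refresh of a single shared development DB.
# PROD_HOST, DEV_HOST and DB are assumed placeholder names.
PROD_HOST="prod-db.example.com"
DEV_HOST="dev-db.example.com"
DB="appdb"
DUMP="/tmp/${DB}.sql"

# --single-transaction takes a consistent snapshot of InnoDB tables
# without locking production for the duration of the dump.
DUMP_CMD="mysqldump -h $PROD_HOST --single-transaction $DB"
LOAD_CMD="mysql -h $DEV_HOST $DB"

# Dry run: print the commands. In a real cron job you would run:
#   $DUMP_CMD > "$DUMP" && $LOAD_CMD < "$DUMP"
echo "$DUMP_CMD > $DUMP"
echo "$LOAD_CMD < $DUMP"
```

Dropped into cron, this gives every developer the same data and a one-command way to undo any damage.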
This is extremely dependent on the kind of data in the database. In some cases it might be as easy as filtering every table on a datetime column, while in other cases it's impossible to separate the data because of its structure. In the end, it will likely be a mix and very hard to get right.
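If every table really does carry such a timestamp column, the easy case could look like this sketch. The database name, table names and the `created_at` column are assumptions for illustration, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Hypothetical: dump only the last month of rows from each table,
# assuming every table has a 'created_at' datetime column.
DB="bigdb"
WHERE="created_at > NOW() - INTERVAL 1 MONTH"

for TABLE in orders order_items audit_log; do
  # Dry run: print each per-table dump command instead of running it.
  echo "mysqldump $DB $TABLE --where=\"$WHERE\""
done
```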
We recently had a customer who wanted to extract the last 30 days of a database. If ALL tables have the same attribute on which you can filter by datetime, you can do a
mysqldump --where='datetimefield > "2014-06-28"' db_name
but he wanted to mix different tables with old and new data, so this was no solution for him. It might be one for you, though.
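One way around the mixed case is to run `mysqldump` per table, dumping the small tables whole and date-filtering only the huge ones. The table names below are made up, the cutoff date is the one from the example above, and the commands are printed as a dry run:

```shell
#!/bin/sh
# Sketch: full dump for small reference tables, date-filtered dump
# for the one huge table. All table names are illustrative.
DB="proddb"
SMALL_TABLES="users products settings"
BIG_TABLE="events"
CUTOFF="2014-06-28"

# Small tables: dump everything in one go.
echo "mysqldump $DB $SMALL_TABLES"
# Huge table: keep only recent rows via --where.
echo "mysqldump $DB $BIG_TABLE --where='datetimefield > \"$CUTOFF\"'"
```

Note that filtering this way can leave dangling foreign-key references (e.g. old rows referenced by recent ones), so the resulting dump may need foreign-key checks disabled while loading.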