The setup
We have a Debian Linux server set up with MySQL v5.1.73 (InnoDB storage engine) and puppet-dashboard version 1.2.23. As you probably guessed, puppet-dashboard is using MySQL as its backend.
Also, it shouldn't be relevant, but this is a VMware virtual machine on vSphere 5.5.
The problem
The problem is that, despite the number of puppet nodes and the run frequency staying relatively constant, the disk space used by MySQL keeps increasing at a disturbing rate, to the point where it is now becoming an issue.
The following graph illustrates the issue.
We have put in place the two cron jobs that should allow disk space to be freed. Both run daily:
- rake RAILS_ENV=production db:raw:optimize
- rake RAILS_ENV=production reports:prune:orphaned upto=3 unit=mon
The drops you can see in the graph are the cron jobs running and eating up even more space while trying to free some.
MySQL binary logs are not enabled. 95% of the disk space used on this server is located in /var/lib/mysql/dashboard_production, the directory where the MySQL data for the dashboard database is stored.
We have had this issue before with a different application (Zabbix monitoring) and had to dump the DB and re-import in order to free up space. This was a very painful process and not a very elegant solution but it ultimately worked.
Is there any way we can reclaim this disk space? What can we do to stop this behavior?
Edit 1
We are indeed using InnoDB and we are not using the configuration directive "innodb_file_per_table".
As requested by Felix, the output of the command is the following:
+----------------------+-------------------+-------------+
| table_schema | table_name | data_length |
+----------------------+-------------------+-------------+
| dashboard_production | resource_statuses | 39730544640 |
| dashboard_production | metrics | 643825664 |
| dashboard_production | report_logs | 448675840 |
| dashboard_production | timeline_events | 65634304 |
| dashboard_production | reports | 50937856 |
| dashboard_production | resource_events | 38338560 |
| glpidb | glpi_crontasklogs | 21204608 |
| ocsweb | softwares | 8912896 |
| ocsweb | deploy | 5044208 |
| phpipam | logs | 1269584 |
+----------------------+-------------------+-------------+
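For reference, the exact command isn't reproduced here, but a query along these lines against information_schema (reconstructed from the output columns) returns that listing:

SELECT table_schema, table_name, data_length
FROM information_schema.tables
ORDER BY data_length DESC
LIMIT 10;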
Also, I will be trying the reports:prune task without the "orphaned" option as mentioned, as well as the other alternatives, and will keep this question updated.
Edit 2
I ran the reports:prune rake task and, despite it deleting 230,000 reports, the database kept on eating more space... I will therefore move on to the other options.
The solution
After deleting two thirds of the entries in the database, only 200MB of disk space was freed, which makes no sense. We ended up dumping the contents and re-importing them, taking care to enable "innodb_file_per_table".
We will just have to wait and see whether this fixes the issue in the long term, but it seems to be the case for the moment.
I found this article, which seems to address the issue pretty well:
http://ximunix.blogspot.co.uk/2014/01/howto-cleanup-puppet-reports-and-db.html
posted by Ximena Cardinali
The short story is: start deleting reports in small batches and then reclaim the space from MySQL.
HOWTO Cleanup Puppet Reports and DB
If the database for Puppet Dashboard is using several GB and getting larger every day, this is a way to get some of the space back.
There are two rake jobs you should be running every day as part of regular maintenance for Puppet Dashboard.
You can change the RAILS_ENV and the number of days (day), weeks (wk), months (mon), etc. to match your system and its needs.
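Presumably these are the same two jobs already listed in the question; the two-week retention window here is illustrative:

cd /path/to/puppet-dashboard
rake RAILS_ENV=production reports:prune upto=2 unit=wk
rake RAILS_ENV=production db:raw:optimize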
Stop incoming reports:
cd /path/to/puppet-dashboard
env RAILS_ENV=production script/delayed_job -p dashboard -m stop
Start deleting reports in small batches:
Keep working your way in towards the length of time you want to keep reports for. The reason for this is that InnoDB tables have poor performance when deleting more than 10k rows at a time. If you try to delete a few hundred thousand rows at once, it will time out and you'll have to break it up into smaller deletes anyway. Also, the Ruby rake process will probably use all your RAM and will likely get killed off by the kernel before it finishes. A progression like the one below should work for most people, but if you have many months of data you may want to start with a month or two of your earliest records. In our case, we are keeping just 2 weeks of reports (14 days).
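A sketch of such a progression (the exact windows are illustrative; tighten them step by step towards your target retention):

cd /path/to/puppet-dashboard
rake RAILS_ENV=production reports:prune upto=6 unit=mon
rake RAILS_ENV=production reports:prune upto=4 unit=mon
rake RAILS_ENV=production reports:prune upto=2 unit=mon
rake RAILS_ENV=production reports:prune upto=1 unit=mon
rake RAILS_ENV=production reports:prune upto=2 unit=wk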
There are two methods to reclaim space, depending on how MySQL was configured. Run the command below to determine whether "innodb_file_per_table" is enabled; it should be set to "ON" if it is. NOTE: I recommend enabling innodb_file_per_table on your MySQL for cases like this one.
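The standard check, runnable from the shell:

mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_file_per_table';"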
You can also do a listing of the database directory to see if there are large per-table data files. The file most likely to be large is resource_statuses.ibd.
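For example, using the data directory mentioned earlier in the question, sorted by size so the biggest files come first:

ls -lhS /var/lib/mysql/dashboard_production/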
If MySQL was configured with innodb_file_per_table and your Dashboard DB shows that your data is in large per-table files, do the following:
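The blog's exact command isn't reproduced here, but on InnoDB the rebuild is done with OPTIMIZE TABLE; run it for each oversized table, starting with resource_statuses:

mysql -u root -p -e "OPTIMIZE TABLE dashboard_production.resource_statuses;"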
This will create a new table based on the current data and copy it into place. If you do a listing while this is in progress, you should see something like this:
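An illustrative listing (the names and sizes here are made up; the temporary #sql-* file is the rebuilt copy being written):

-rw-rw---- 1 mysql mysql  15G Jan 20 10:42 resource_statuses.ibd
-rw-rw---- 1 mysql mysql 1.1G Jan 20 10:47 #sql-3a8f_2.ibd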
And when it finishes, it'll copy the tmp file into place. In this case we went from 15GB to 708MB.
If your system was not configured with innodb_file_per_table, or all the current data resides in a large ibdata file, the only way to reclaim space is to wipe the entire installation and reimport all the data. The overall method should be something like this: first configure innodb_file_per_table, dump all the databases, then stop MySQL, delete /var/lib/mysql, run mysql_install_db to create /var/lib/mysql again, start MySQL, and finally reimport the data. There will be no need for the optimize steps, because the import writes the data compactly from the start.
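A sketch of that procedure as shell commands (paths, credentials and the service name are assumptions for your system; make sure you have a good dump before deleting anything):

mysqldump -u root -p --all-databases > /root/all-databases.sql
service mysql stop
# enable innodb_file_per_table under the [mysqld] section of my.cnf before re-initializing
rm -rf /var/lib/mysql
mysql_install_db
service mysql start
mysql -u root -p < /root/all-databases.sql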
Finally, restart the delayed_job:
cd /path/to/puppet-dashboard
env RAILS_ENV=production script/delayed_job -p dashboard -n 2 -m start
Daily Reports Cleanup and DB Maintenance:
For a daily reports cleanup, you can create a simple BASH script that searches for the reports in /var/lib/puppet/reports by time (mtime +14 in our case), removes them, and then cleans up the DB with (upto=2 unit=wk), and set it in your crontab. An example of such a script can be:
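A sketch under the assumptions above (the puppet-dashboard path is a placeholder):

#!/bin/bash
# Remove on-disk YAML reports older than 14 days
find /var/lib/puppet/reports -type f -name '*.yaml' -mtime +14 -delete
# Prune the matching reports from the dashboard database
cd /path/to/puppet-dashboard || exit 1
rake RAILS_ENV=production reports:prune upto=2 unit=wk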