Running CentOS 7 (various versions).
The rpm DB on my servers keeps getting corrupted. It seems like every few weeks I have to do an rpm rebuild on a server or two.
Where should I look to see what could be the culprit? I know how to fix this when it happens, but how can I identify if this is a specific package I'm installing or if something is triggering this?
There is an endless string of bugs in which the BDB environment gets corrupted. Some of them were bugs in BDB itself (several found in just the last couple of years) that have been patched in the Fedora/RHEL libdb but not in upstream BDB 5.x. I don't know about 6.x, but there you run into the licensing side. This is a well-known issue that has no permanent solution.
Root Cause:
If rpm or yum does not exit cleanly, its lock files (__db.001 - __db.005) are left behind in /var/lib/rpm. It is possible to see the PID of the process that left the files. The problem tends to be that no logging or auditing is configured to show what actually killed that process. The most common cause is an automation tool that times out and abruptly ends the process without letting rpm clear the lock files.
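To help correlate a corruption event with whatever killed rpm or yum, a small check like the one below could be run from monitoring. It is only a sketch under a few assumptions: the environment files are named __db.* and live in /var/lib/rpm, the host exposes a Linux-style /proc, and the function names (find_env_files, running_package_tools, report) are made up for this example. Finding the files does not prove corruption on its own; the point is the timestamps, which can be compared against the automation tool's logs.

```python
import os
import time


def find_env_files(dbpath="/var/lib/rpm"):
    """Return (path, mtime) pairs for BDB environment files (__db.*) in dbpath."""
    if not os.path.isdir(dbpath):
        return []
    names = sorted(n for n in os.listdir(dbpath) if n.startswith("__db."))
    paths = [os.path.join(dbpath, n) for n in names]
    return [(p, os.path.getmtime(p)) for p in paths]


def running_package_tools(names=("rpm", "yum"), proc="/proc"):
    """Return (pid, name) pairs for processes whose command name is in names."""
    found = []
    if not os.path.isdir(proc):
        return found
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "comm")) as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # process exited in the meantime, or is not readable
        if comm in names:
            found.append((int(entry), comm))
    return sorted(found)


def report(dbpath="/var/lib/rpm", proc="/proc"):
    """Build human-readable lines describing the environment files found."""
    files = find_env_files(dbpath)
    procs = running_package_tools(proc=proc)
    lines = []
    for path, mtime in files:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        lines.append("%s last modified %s" % (path, stamp))
    if files and not procs:
        # Assumption: no rpm/yum process running means nobody owns these files.
        lines.append("environment files present, no rpm/yum running (possibly stale)")
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Run periodically, the modification times give a window in which to search the automation tool's logs for a job that timed out or was killed.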
One possible workaround is to force the use of a private environment. That also means practically no locking, but at least queries will not corrupt anything (although a query could return garbage if it runs in the middle of a write operation). This is what happens when you run queries as a non-privileged user. Since you can control permissions with sandboxing, you can achieve the same effect by disallowing the open of /var/lib/rpm/.dbenv.lock, which causes rpm to fall back to a private environment, meaning it won't open, much less write to, those __db.* files.
The developers' position is that this won't be fixed completely.
They suggest using the dcrpm utility.
You can download it from its Git repository. The official guide is available in the same place.
Here is what you need to do for installation:
After the installation you can run the tool and add it to cron:
Unfortunately, the installation always failed for me on CentOS 7 because the Python dependencies never installed properly.
This happened even though psutil installed successfully. Other people have reported that dcrpm worked well for them, so give it a try.
I ended up using another official solution, from Red Hat (for RHEL 7).