Occasionally, we run out of rollback segment space causing the query/update to fail. What are the best practices that I can expect from an Oracle DBA to ensure that this is monitored pro-actively and the appropriate measures taken before trouble occurs?
What version of Oracle are you using? Are you actually using manual rollback segments? Or are you using automatic UNDO management? If you're using a moderately recent version of Oracle (9i or later), you really ought to be using automatic UNDO management.
Assuming you're using automatic UNDO management, you really just need to know how much UNDO your database generates per unit of time and how long the longest query you need to execute should run (or the furthest back that a flashback query needs to go). You then just need to set the
UNDO_RETENTION
to whatever the longest query runtime should be and size theUNDO
tablespace to hold all the undo generated during that interval (though the specifics depend on the particular Oracle version).From there, it's relatively easy for the DBA to monitor changes in the rate of UNDO retention. It's possible but potentially harder for the DBA to monitor changes in the longest running "real" query, however, because the production database somewhat frequently encounters the dreaded runaway query where someone tries to get a report and accidentally does a Cartesian join or some other hideously costly operation that takes a long time to run. It may be impossible for the DBA to tell whether there are "real" long-running queries that are taking longer or whether the long-running queries are really just mistakes. It's generally easier for the application developers to collaborate with the DBAs to determine when queries are supposed to be running longer. This may be something the developers just know (i.e. because you're deploying a new report that you expect to run for a particularly long time) or something that developers log somewhere and review with the DBAs periodically.