When doing data warehouse bulk data loads into a table for which I don't require logging, my process is typically (sketched below):
- truncate table
- drop indexes
- insert /*+ append */
- re-create indexes without logging
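
A minimal sketch of that sequence, assuming a hypothetical target table BIG_FACT with one index BIG_FACT_IX1, loaded from a staging table STAGE_FACT (all names are placeholders):

TRUNCATE TABLE big_fact;                                        -- 1. truncate
DROP INDEX big_fact_ix1;                                        -- 2. drop indexes
INSERT /*+ APPEND */ INTO big_fact                              -- 3. direct-path insert
SELECT * FROM stage_fact;
COMMIT;
CREATE INDEX big_fact_ix1 ON big_fact (fact_date) NOLOGGING;    -- 4. re-create without logging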
This works fine, but dropping and re-creating indexes can cause problems, so I was wondering whether folks thought this step was necessary. I understood that from 10g onwards this was optimised so that the indexes are rebuilt after the data load itself.
Anyone have any experience of this?
You say that dropping and re-creating indexes can create problems. Another way to do it is to mark the indexes unusable and then rebuild them. This way their definition (name, columns, tablespace etc.) is never "lost", even if some step fails.
So, in step 2 loop over all the indexes of the table and run:
ALTER INDEX my_index_name UNUSABLE;
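
For example, a small PL/SQL loop over the data dictionary can do this for every index at once (MY_TABLE is a placeholder for the target table):

BEGIN
  -- mark every index on the target table unusable before the load
  FOR ix IN (SELECT index_name
               FROM user_indexes
              WHERE table_name = 'MY_TABLE') LOOP
    EXECUTE IMMEDIATE 'ALTER INDEX ' || ix.index_name || ' UNUSABLE';
  END LOOP;
END;
/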
Then in step 4, loop over the indexes and rebuild them. You can specify nologging, the desired parallel degree, index compression and so on:
ALTER INDEX my_index_name REBUILD NOLOGGING PARALLEL 16;
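
And the matching rebuild loop, again with MY_TABLE as a placeholder:

BEGIN
  -- rebuild every index on the target table after the load
  FOR ix IN (SELECT index_name
               FROM user_indexes
              WHERE table_name = 'MY_TABLE') LOOP
    EXECUTE IMMEDIATE 'ALTER INDEX ' || ix.index_name
                      || ' REBUILD NOLOGGING PARALLEL 16';
  END LOOP;
END;
/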
It is more efficient to build indexes in one clean sweep than "row by row". In the first case, Oracle does one full table scan, sorts the data and writes the index. If the index is maintained during the bulk load, it gets updated continuously throughout the load, so Oracle has to keep updating its leaf blocks all over the place, doing more work and I/O. Why? Because each index block gets updated several times at different points during the load.
BTW - hopefully you have a "step 5": collect statistics on the table and indexes (in parallel).
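
Something along these lines, for example (table name and parallel degree are just placeholders):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'MY_TABLE',
    degree  => 16,        -- gather in parallel
    cascade => TRUE);     -- also gather statistics on the indexes
END;
/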
We do it almost the same way, except that we bulk load into temporary staging tables and then transfer the data to the actual table. I think creating indexes after the data load is a pretty normal ETL step and shouldn't cause any problems. It takes some time, but loading with the indexes in place would be much slower. We've never run into any issues creating indexes on already-populated tables.
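
A rough sketch of that pattern, with made-up table and index names:

-- raw data has already been bulk loaded into stage_orders (e.g. via SQL*Loader or an external table)
INSERT /*+ APPEND */ INTO orders
SELECT * FROM stage_orders;
COMMIT;

-- indexes are created only after the target table is populated
CREATE INDEX orders_ix1 ON orders (order_date) NOLOGGING PARALLEL 8;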