Even after reading the Wikipedia article on Data Scrubbing, I am still not clear on what Data Scrubbing really means when the term is applied to databases.
Is it a formal engineering principle with a predefined way of performing it? If so, what keywords should I search for?
-- or --
Is it a general, loose term for simply cleaning up inconsistent data in a database?
What IS Data Scrubbing?
In a database context, it's the correction of data that is consistent with the schema but erroneous at a higher level, e.g. invalid credit card numbers and SSNs, duplicate records, format mismatches, and so on.
It is a general, loose term that acquires a specific meaning only in the context of a particular case.
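To make "consistent with the schema but erroneous at a higher level" concrete, here is a minimal sketch of one such check: a card number can fit a text column perfectly well and still fail the Luhn checksum that payment card numbers are expected to satisfy. The function names and the `(record_id, card_number)` row shape are assumptions made up for illustration, not part of any particular database or library.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces, dashes, and any other non-digit formatting.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_invalid_cards(rows):
    """Hypothetical helper: rows is an iterable of (record_id, card_number)."""
    return [record_id for record_id, card in rows if not luhn_valid(card)]
```

A scrubbing job would typically only flag the returned ids for review, since a failed checksum says the value is wrong but not what the right value is.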
I have written "Data Scrubbing" routines that periodically check for and fix database problems that are impractical to catch in real time (i.e. by checking for errors, inconsistencies, or duplicates as the data is entered). A scrubbing routine can target specific types of errors, such as verifying that a zip code entry matches the city/state, or looking for variations of a customer's name at the same address (a duplicate customer).
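As a rough sketch of the duplicate-customer case, the snippet below groups customers by normalized address and flags pairs whose names are close variations of each other. The `(customer_id, name, address)` tuple shape, the `likely_duplicates` name, and the 0.8 similarity threshold are all assumptions for illustration; a real routine would tune the threshold and probably use a proper address normalizer.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def likely_duplicates(customers, threshold=0.8):
    """Return (id_a, id_b) pairs sharing an address with similar names.

    customers: iterable of (customer_id, name, address) tuples (assumed shape).
    """
    by_address = defaultdict(list)
    for customer_id, name, address in customers:
        by_address[normalize(address)].append((customer_id, normalize(name)))

    pairs = []
    for group in by_address.values():
        # Only compare names within the same address group.
        for (id_a, name_a), (id_b, name_b) in combinations(group, 2):
            if SequenceMatcher(None, name_a, name_b).ratio() >= threshold:
                pairs.append((id_a, id_b))
    return pairs
```

Comparing only within an address group keeps the number of name comparisons small, which matters when the routine runs over a whole customer table. The pairs it returns are candidates for a merge, not proof of duplication.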
Sometimes when a database is de-normalized (for performance reasons), a scrubbing routine can check the database during "off-peak" times to make sure the data remains consistent.
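For the de-normalized case, an off-peak job might recompute a cached value from its source rows and repair any that have drifted. The sketch below uses Python's standard `sqlite3` module; the `orders`/`order_items` tables and the `cached_total`, `quantity`, and `unit_price` columns are a hypothetical schema invented for the example, with amounts stored as integer cents.

```python
import sqlite3


def scrub_order_totals(conn: sqlite3.Connection) -> list:
    """Repair orders whose cached total disagrees with their line items.

    Assumes a hypothetical schema:
      orders(id, cached_total) and order_items(order_id, quantity, unit_price).
    Returns the ids of the orders that were corrected.
    """
    mismatched = conn.execute(
        """
        SELECT o.id, COALESCE(SUM(i.quantity * i.unit_price), 0)
        FROM orders AS o
        LEFT JOIN order_items AS i ON i.order_id = o.id
        GROUP BY o.id, o.cached_total
        HAVING o.cached_total <> COALESCE(SUM(i.quantity * i.unit_price), 0)
        ORDER BY o.id
        """
    ).fetchall()

    # Apply all corrections in one transaction; commits on success.
    with conn:
        conn.executemany(
            "UPDATE orders SET cached_total = ? WHERE id = ?",
            [(actual, order_id) for order_id, actual in mismatched],
        )
    return [order_id for order_id, _ in mismatched]
```

Returning the corrected ids makes it easy to log what the job changed, which helps in tracking down whatever code path let the cached value drift in the first place.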