Data cleansing is the review of incoming data sources to improve data quality in the context of the business objectives of a Jabatix implementation. It is an important part of Extract-Transform-Load (ETL) projects, and of the Transform step in particular.
Jabatix Server provides preconfigured tools that make data cleansing easy and efficient. Additional functions can readily be implemented in Cantor scripts.
Typical methods used in data cleansing:
- Select one destination format for numeric fields, and convert all sources that use a different format to this agreed destination format
- Interpret mixed-format and text fields according to a “translation” reference, making all incoming data conform to the agreed standard format; in some cases, a form of interpolation can be used to fill gaps
- Remove unreadable or incorrect data that cannot be corrected
- Remove duplicates within and across data sources
- Normalize date/time values to a single time zone
- Request the data provider to send corrections for unusable data