Data cleansing is the single largest issue facing any data warehouse, business intelligence, or business performance management project, in both scope and difficulty. Many projects have failed to meet expectations because of data problems. Data issues fall into two broad groups: erroneous data in source systems and data integrity constraint violations. The former usually results from data entry mistakes or faulty code; the latter often stems from application or database design. In either case, “bad” data has to be identified before it can be fixed.

Typical transactional source systems are built on relational database technology, introduced in the early 1970s, which strives to minimize redundant stored data by splitting information across multiple related tables. Relational databases brought with them two concepts: “integrity constraints” and the “five levels of normalization”. Integrity constraints ensure that data stored in one table (such as an invoice) always makes a valid reference to another table (such as a valid customer stored in a second table). The five levels of normalization (the normal forms) measure the degree of data redundancy, with level 5 carrying the least duplicated information. Database designers typically aim for third normal form but regularly drop to second normal form to improve database performance.

Data problems often appear when an integrity constraint is violated, creating inconsistencies between tables within a single operational system, or when inconsistent data is stored in two or more separate systems. These problems mean that data must be cleaned before it can be loaded successfully into virtually all business intelligence tools.

This is where the ISIS Data Array comes in, as it is not built on a relational database. ISIS takes a copy of any table from any source system and immediately denormalizes all data.
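To make the integrity-constraint problem concrete, here is a minimal sketch in plain Python of what denormalizing an invoice table against a customer table looks like, and how the join exposes invoices that reference a customer that does not exist. The table contents, column names, and the `denormalize` helper are all hypothetical and chosen for illustration; this is a generic technique, not a description of how ISIS does it internally.

```python
# Hypothetical sample data: table and column names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
invoices = [
    {"invoice_id": 100, "customer_id": 1, "amount": 250.0},
    {"invoice_id": 101, "customer_id": 3, "amount": 90.0},  # customer 3 does not exist
    {"invoice_id": 102, "customer_id": 2, "amount": 410.0},
]


def denormalize(invoices, customers):
    """Join each invoice to its customer.

    Returns (flat_rows, orphans): flat_rows are invoices merged with the
    customer name; orphans are invoices whose customer_id has no match,
    i.e. rows that violate the referential integrity constraint.
    """
    by_id = {c["customer_id"]: c for c in customers}
    flat_rows, orphans = [], []
    for inv in invoices:
        cust = by_id.get(inv["customer_id"])
        if cust is None:
            orphans.append(inv)
        else:
            flat_rows.append({**inv, "customer_name": cust["name"]})
    return flat_rows, orphans


flat_rows, orphans = denormalize(invoices, customers)
print("orphaned invoice ids:", [o["invoice_id"] for o in orphans])
```

In a relational database with an enforced foreign key, the orphaned invoice could not have been stored in the first place; the check above matters when the constraint was never declared or when the two tables come from separate systems.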
The denormalized data is mapped into the Data Array, where relations can then be developed. Any data that does not follow a relation is immediately identified via the ISIS Site Maps. Further analysis is done via ISIS statistical calculations, which find and segment the data that falls outside its established “norm”. As such, ISIS can be an effective first step before data cleaning in large data warehousing and business intelligence projects.
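The idea of segmenting data that falls outside an established “norm” can be sketched with a simple z-score test. The source does not say which statistical calculations ISIS uses, so the z-score rule, the `split_by_norm` function, the threshold of 2.0, and the sample amounts below are all assumptions made for illustration.

```python
from statistics import mean, stdev


def split_by_norm(values, z_threshold=2.0):
    """Split values into (normal, outliers) using a z-score cutoff.

    A value is treated as an outlier when it lies more than z_threshold
    sample standard deviations from the mean. This is an assumed stand-in
    for whatever "norm" a real tool establishes.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values identical: nothing can fall outside the norm.
        return list(values), []
    normal, outliers = [], []
    for v in values:
        if abs(v - mu) / sigma > z_threshold:
            outliers.append(v)
        else:
            normal.append(v)
    return normal, outliers


# Hypothetical invoice amounts with one suspicious entry.
amounts = [250.0, 245.0, 260.0, 255.0, 248.0, 252.0, 251.0, 249.0, 9999.0]
normal, outliers = split_by_norm(amounts)
print("outside the norm:", outliers)
```

A z-score cutoff is easy to reason about but is itself distorted by extreme values, so a production check would more likely rely on a robust measure such as the median and interquartile range.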
Copyright © 2010 ISIS Solutions Inc.