Data warehousing – Concept

In the present day competitive & technology centric information age, business enterprises are realising that the enterprise data is valuable resource just like other resources of the enterprise such as man, machine, raw materials & money. The great value of data lies in the fact that enterprises could manipulate this data and create valuable information, which determines the enterprise’s workflow, management & its stability and sustainability in the competitive business environment.

We shall make the concept of Data warehousing clear in simple words. Consider the old OLTPs and other aging database systems as having a database like a storehouse where the data is dumped Eventhough there are separate shelves in this storehouse to store the data, the storehouse is not structured and the shelves themselves are not structured to store massive and distributed data (from other remote OLTPs) in structured manner so that required data can be searched and retrieved quickly and efficiently. As a result when we enter this storehouse to search for a data item (especially in case of analysis queries that require searching and linking of massive current, historic & distributed data), we have to search blindly & tediously inside the storehouse. We may get the data but with unwanted and less optimized searches through the data dumped in the storehouse. We may not get the data if the search is for a historic data which has already being removed from the OLTP database or if the search is for a data item which is not in the OLTP database but in a distributed remote database.

Now while applying Data warehousing, we first structure the storehouse as a warehouse which has specific shelves with specific names and further structure the shelves themselves with named partitions. We then apply other structuring and refining methods to construct the data warehouse and the related system architecture o that the Data warehouse is now able to store bulk of current, historic & distributed data in pre-determined and easily accessible warehouse partitions. The result is that –

  • We are able to store valuable enterprise data (both current and historic in bulk amounts in the structured warehouse partitions
  • Data is structured
  • Such a structuring of the data warehouse will enable us to search and retrieve the intelligent data as information, quickly and efficiently, while performing query analysis.
  • Data is integrated: Integration aspect of data warehouse is that data found within the data warehouse is integrated always with no exceptions. The integration is in consistent naming conventions, in consistent measurement of variables, in consistent encoding structures, in consistent physical attributes of data, location transparency of the enterprise data via distributed approach and so forth. The idea is that the data needs to be stored in data- warehouse in a singular, globally acceptable fashion even when the underlying operational systems (eg. OLPTs) store the data differently, so that the end user at the data warehouse front end is fed with integrated information.
  • Distributed OLTP data is centrally stored: The Data warehouse is capable of centrally storing the data loaded into it from the distributed routine OLTP systems of the enterprise. The advantage is that all relevant enterprise data is available at the data warehouse for centralized data analysis & decision making.
  • Data is non-volatile: This is the most important feature of a data warehouse. It means that the data in the data warehouse is never changed manually by a user It also means that the data is maintained non-volatile in the data warehouse for a specified historic period (may be for 15 or 20 years according to the storage capacity of the data warehouse). The data in data warehouse gets loaded in bulk amounts, daily during the off hours, from the operational systems with the intention to be stored in the data warehouse for the specified historic periods, The data is never deleted or changed by a user. But however we set proper updation routines to reflect any updations to those data values handled by the operational systems and which have already been loaded into the data warehouse. This is the basic concept of Data warehousing. We usually call the data warehouse database as the Data warehouse and the overall system architecture as the Data warehousing system.