3.4.2 Data warehousing

Transaction processing systems contain large quantities of data that can be analysed to improve the organisation’s performance. Past trends can be examined, the current state of the organisation’s finances can be analysed and information can be used as evidence to assist decision makers.

Such analysis can be performed on the operational data or on a data warehouse.


A data warehouse is a database that collects information from different data sources. 

Data gathered in real-time transactions can be used for analysis in an efficient manner if it is stored in a data warehouse. 

A data warehouse provides data that is consolidated, subject-oriented, historical and read-only:

  • Consolidated
    • Data is organised using consistent naming conventions, measurements, attributes and semantics. 
    • Organisations can use similar data in different formats. 
    • For example, true or false data can be represented as one/zero, on/off, true/false, or positive/negative. 
    • Data in the data warehouse is stored in a single, acceptable format.
    • Data warehousing allows data from across the organisation to be effectively used in a consistent manner.
  • Subject-oriented. 
    • A large amount of data is stored across an organisation. 
    • Some of this data is irrelevant for executive reporting and makes querying the data difficult. 
    • A data warehouse organises only the key business information from operational sources so that it is available for analysis.
  • Historical. 
    • Real-time transaction processing systems represent the current value at any moment in time for various aspects of the business, such as the stock inventory. 
    • They do not show the inventory at some time in the past.
    • Querying the stock inventory a moment later may return a different response.
    • However, data stored in a data warehouse is accurate for a specific moment in time, as it represents historical information and cannot change. 
    • The data warehouse stores a series of snapshots of an organisation’s operational data generated over a long period of time.
  • Read-only. 
    • After data has been moved to the data warehouse successfully, it does not change unless the data was incorrect. 
    • The data stored in a data warehouse represents a particular point in time; it must never be updated. 
    • Deletes, inserts, and updates are not applicable in a data warehouse. 
    • The only operations that occur in a data warehouse are loading and querying data.


Data Warehousing - An Overview

Data Warehousing - An Overview


Benefits of a Data Warehouse

Benefits of a Data Warehouse




Advantages of data warehouses include:
  • Old transaction data can be purged from the operational system and archived within the data warehouse. 
    • This improves the performance of the operational system, as less data needs to be examined during transaction processing. 
  • Analysis processes performed on the data warehouse do not degrade performance of the operational system. 
    • Data warehouses are generally maintained on their own hardware and software; hence they have no effect on the performance of the operational systems.
  • A data warehouse includes historical transaction data, often over 10 or more years. 
    • Systems change completely and are regularly upgraded, however data warehouses are designed such that all data is stored using a similar format. 
    • This common format greatly simplifies analysis processes.
  • Data warehouses are snapshot copies of the real data. 
    • This data does not and should not change. 
    • Therefore analysis processes can proceed more efficiently. 
      • There is no need to be concerned with record locks, ACID properties and data integrity issues.
  • Data warehouses centralise data from within the entire organisation. 
    • Commonly this includes customer, sales, employee, payroll, production, marketing and any other data created within an organisation. 
    • Having all such data in a central repository means analysis can take place across the entire organisation.
  • As a data warehouse is completely separate to the operational data it can be organised differently to the operational data. 
    • For instance indexes can be created on particular fields to improve the performance of analysis processes without risk of degrading the operational system’s performance.


Comments