Data Warehousing Methodology

Steps to Data Warehousing and Business Intelligence Success

David Haertzen David Haertzen, Principal Enterprise Architect

This tutorial article will show you the steps needed to develop and deploy a data warehouse.  In addition it will show differences between DW Methodology and Traditional IT Methodologies.



The Data Warehousing Methodology is organized into the following phases:

  • Initiation : Evaluating Readiness and Opportunities
  • Analysis : Analysis and Requirements Determination
  • Design : Data Warehouse and Data Mart Models (Star Schema/Multidimensional Model)
  • Design : Technical Architecture
  • Design : Obtain Data Warehouse Inputs
  • Construct : Data Load
  • Construct : Presentation/Analysis Tools
  • QA : Test
  • Rollout : Deploy in Production
  • Iterate : Make Incremental Changes

Differences between DW Methodology and Traditional IT Methodology

Developing data warehouses is definitely different than developing other IT systems and so requires a different methodology.

Data Warehousing Methodology: 

  • Use of data is exploratory and less predictable
  • Multidimensional Modeling
  • Focus is on loading and presenting data

Traditional IT Methodology: 

  • Automated processes are repeated and predictable
  • ERD Data Modeling
  • Focus is on rapid on-line updating of data

Data warehousing is not simply creating a set of reports that are run periodically.  It involves questions that may lead to initially unpredicted places.

Initiation: Evaluating Readiness and Opportunities

First it is important to know if your organization is ready for data warehousing.  If there are business drivers, an appropriate project sponsor and an organizational culture that includes use of data for decisions and cooperation between the business and IT, then the organization is ready.

Business drivers are the business reasons for pursuing data warehousing or any other business objective.  Examples of business drivers include

  • Competitive advantage or survival
  • High potential revenue gain (need new markets, products, etc.)
  • High potential cost savings

The Project Sponsor is key to the success of the project.  Is the following true for your project sponsor?

  • Sponsor is aware of business drivers / benefits / savings
  • Sponsor has clout / authority / influence
  • Sponsor is part of the business / not IT only person
  • Sponsor has time to participate
  • Sponsor is knowledgeable about IT and DW

The organization needs a culture where:

  • Business and IT work together
  • Business uses data to make decisions and seeks more data

In addition, source data and information technology must be in place.  If your organization wants cross selling data in its data warehouse, it needs an operational system that tracks that information which can in turn be provided to the data warehouse.

Analysis: Requirements Definition

Requirements describe the needed solution in business terms.  In the analysis phase detailed Requirements for Data Warehousing are defined. 

 Requirements Diagram

Design: Technical Architecture

 Data warehousing Technical Architecture is the high level design for the data warehousing system.  It includes technical specifications for:

  • Metadata Management
  • Input Sources
  • Extracting
  • Middleware
  • Physical Storage and Operation
  • Database Management System
  • Mapping, Transforming, Enriching, and Loading
  • Communicating
  • Analyzing and Presenting
  • Managing, Operating, and Securing

One of the deliverables of this effort will be the Data Warehousing Technical Architecture Diagrams:

Technical Architecture

Design: Data Warehouse Modeling

The data warehouse  Data Models are visual representations of the databases that make up data warehousing system.  These models are useful for bridging from business requirements to the physical system that carries out those requirements.

This tutorial provides a mini-course in data modeling that shows approaches for each of the databases needed:

  • Data Mart : Dimensional / Star Schema Data Model
  • Data Warehouse : Entity Relationship Diagram
  • Staging : Entity Relationship Diagram

Orders Star Schema

Construct: Obtain Data Warehouse Inputs

A data warehouse system is only as good as its Input.  In this phase we select that data that will be included in the data warehousing system.  To learn how good the data is we use data profiling and data assessment.

Construct: Extract, Transform and Load (ETL)

In the ETL - Extract Transform Load phase we build Extracts which pull data from the data sources and Transforms which modify the data so that it is ready to be loaded into the data warehousing system. Then we build loads that place the data into the appropriate databases.

Construct : Presentation/Analysis Tools

In the phase where Construction is done for Presentation and Analysis, the groundwork is put in place for BI - Business Intelligence.  Metadata must defined so that data warehousing users can perform analytical functions such as:

  • Query from Multiple Dimensions
  • Drill Down - explore greater detail
  • Roll Up - summarize
  • Pivot - change query driections

QA : Test

To successfully implement a data warehouse, an organization must be confident that the analytics produced include the proper data.  The focus of the data warehouse Testing the Data Warehouse Phase is to ensure that the data in the data warehouse is correctly loaded from the source system(s).

Rollout : Deploy in Production

The Rollout the Data Warehouse phase includes:

  • Data and Application Readiness
  • Installation in the Production Environment 
  • Training

Iterate : Make Incremental Changes

After the data warehouse in put into production, our work has just begun.  In the Sustaining Data Warehousing and Business Intelligence Phase, the data warehouse is operated and incrementally improved. 

 

Copyright© 1999-2015, First Place Software, Inc.