Infogoal Logo
GOAL DIRECTED LEARNING
Master DW

Data and Analytics Tutorial

Data and Analytics Overview
Under Construction

Data and Analytics Success

Data and Analytics Strategy
Project Management
Data Analytics Methodology
Quick Wins
Data Science Methodology

Requirements

BI Requirements Workshop

Architecture and Design

Architecture Patterns
Technical Architecture
Data Attributes
Data Modeling Basics
Dimensional Data Models

Enterprise Information Management

Data Governance
Metadata
Data Quality

Data Stores and Structures

Data Sources
Database Choices
Big Data
Atomic Warehouse
Dimensional Warehouse
Logical Data Warehouse
Data Lake
Operational Datastore (ODS)
Data Vault
Data Science Sandbox
Flat Files Data
Graph Databases
Time Series Data

Data Integration

Data Pipeline
Change Data Capture
Extract Transform Load
ETL Tool Selection
Data Warehoouse Automation
Data Wrangling
Data Science Workflow

BI and Data Visualization

BI - Business Intelligence
Data Viaulization

Data Science

Statistics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics

Test and Deploy

Testing
Security Architecture
Desaster Recovery
Rollout
Sustaining DW/BI

Data Warehouse Modeling Techniques

Business Intelligence and Data Warehousing Data Models are Key to Database Design

A data model is a graphical view of data created for analysis and design purposes. Data modeling includes designing data warehouse databases in detail, it follows principles and patterns established in Architecture for Data Warehousing and Business Intelligence.

If you need to understand this subject from the beginning check the article, Data Modeling Basics to learn key terms and concepts.

Data warehouse modeling includes:

  • Top Down / Requirements Driven Approach
  • Fact Tables and Dimension Tables
  • Multidimensional Model/Star Schema
  • Support Roll Up, Drill Down, and Pivot Analysis
  • Time Phased / Temporal Data
  • Operational Logical and Physical Data Models
  • Normalization and Denormalization
  • Model Granularity : Level of Detail

Areas that require specialized patterns are:

  • Data Mart / Frontroom - uses dimension modeling - the ROLAP star schema or the MOLAP cube
  • Data warehouse / Backroom - uses normalized ERD
  • Staging / Landing Area - looks like source system

Data Modeling Tools

Data modeling involves visualizing data through use of graphical tools, so you will want to obtain a data modeling software package or use graphical capabilities in existing software. See the Data Management Center Data Modeling Directory for a list of data modeling tools and other resources.

We have used open source software to develop examples for this article so that readers will able to learn on their own without licensing fees. Data models have been developed using DBDesigner4 from FabForce.Net. The target database is MySQL from Oracle Corporation.

Data Modeling for Business Intelligence

It is best to organize data to best meet the needs of its users.  Business intelligence commonly performs analytic operations on data such as:

  • Query by multiple criteria
  • "Slice and dice"
  • Drill Down
  • Roll Up

The "Dimensional Data Model" otherwise known as the "Star Schema" was developed by Ralph Kimball in the 1980s to support these business needs.  This approach has stood the test of time and is the recommended way to organize data for business query and analysis.

The two major table types of the Star Schema are the Fact and the Dimension.  The Fact contains quantitative measurements while the Dimension contains classification information.  Each Fact is surrounded by the Dimensions that provide context to it, given the appearance of a star.

Star Schema Badge

The Order Fact with dimensions is a classic example.  In this case the Order Fact measurers order quantity and currency amount.  Dimensions of Calendar Date, Product, Customer, Geo Location and Sales Organization put the Order Fact into context.

Star Schema Orders

 

This star schema supports looking orders like a cube, enabling slicing and dicing by customer, time and product.

Data Mart Cube

Surrogate Keys Improve Data Mart Efficiency and Performance

Surrogate keys, typical stored as integers, improve efficiency and increase performance.  Joins between facts and dimensions are faster with integers.  Indexes on integers are compact and provide rapid access.

Facts - the Data Mart Measuring Stick

Facts contain quantitative measurements.  They focus on the answering the questions: how much and how many.

The grain is a determinant of the level of detail of the data mart fact.  A fact can be fine grained and represent a single event or transaction or it can be course grained and aggregate measurements over a period of time.

 

Dimensions Put Data Mart Facts in Context

Dimensions enable business intelligence users to analyze data using simple queries.  They focus on questions of: who, when, where and what.  Typical dimensions include:

  • Time period / calendar
  • Product
  • Customer
  • Household
  • Market Segment
  • Geographic Area

The primary key of a dimension should a surrogate key and is typically an integer.  This primary key is then related to facts to put the facts into context.  Much of the data in a dimension is descriptive and stored character format.  It often contains both code and expanded values such as territory_code and territory_name to simplify and speed up query.

Time Dimensions are an important part of almost every dimensional model.  We recommend that you establish Time Dimension tables rather hard coded date logic.
Use the finest grain - probably daily and account for days, weeks, quarters, seasons, holidays, etc.

Time Dimension Data Model

 

Avoid Data Mart Snowflakes

Dimensions are directly related to facts to enable simple and rapid query.  The snowflake is an extension to a dimension intended to reduce storage and duplication.  It has the undesirable side effect of complicating and slowing queries. 

Bridge Tables Implement Data Mart Hierarchies

The bridge table supports a many to many relationship between facts and dimensions.  For example, a bridge table could show the percentage commission split between multiple sales reps (dimension) and a sale (fact).

Data Modeling Slowly Changing Dimensions

At times, dimension data must change and that must be handled in the data mart.  Ralph Kimball has identified the following slowly changing dimension (SCD) types that are widely recognized in data mart design:

 

SCD Type 1

Data is overwritten and prior data is not retained.

SCD Type 2 A new row with the changed data.
SCD Type 3

Update attributes with in the dimension row.  For example, we could maintain both current customer status code and prior customer status code.




Advertisements

Advertisements:
 


Infogoal.com is organized to help you gain mastery.
Examples may be simplified to facilitate learning.
Content is reviewed for errors but is not warranted to be 100% correct.
In order to use this site, you must read and agree to the terms of use, privacy policy and cookie policy.
Copyright 2006-2020 by Infogoal, LLC. All Rights Reserved.

Infogoal Logo