Infogoal Logo
GOAL DIRECTED LEARNING
Master DW

Data and Analytics Tutorial

Data and Analytics Overview
Under Construction

Data and Analytics Success

Data and Analytics Strategy
Project Management
Data Analytics Methodology
Quick Wins
Data Science Methodology

Requirements

BI Requirements Workshop

Architecture and Design

Architecture Patterns
Technical Architecture
Data Attributes
Data Modeling Basics
Dimensional Data Models

Enterprise Information Management

Data Governance
Metadata
Data Quality

Data Stores and Structures

Data Sources
Database Choices
Big Data
Atomic Warehouse
Dimensional Warehouse
Logical Data Warehouse
Data Lake
Operational Datastore (ODS)
Data Vault
Data Science Sandbox
Flat Files Data
Graph Databases
Time Series Data

Data Integration

Data Pipeline
Change Data Capture
Extract Transform Load
ETL Tool Selection
Data Warehoouse Automation
Data Wrangling
Data Science Workflow

BI and Data Visualization

BI - Business Intelligence
Data Viaulization

Data Science

Statistics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics

Test and Deploy

Testing
Security Architecture
Desaster Recovery
Rollout
Sustaining DW/BI

Data Analytics Architecture Patterns

Data Architecture is a complex topic which may seem to be difficult to organize and learn. What can be done to enable an order approach? Part of the answer is to start a high level of detail (Level 0) and progress to a finer level of detail (Level 0):

  • Level 0: Database Symbol - a single disk symbol with a title of the database is used for high level diagrams. (This is an architecture level.)
  • Level 1: Ingest and Expose - a rectange showing input and output connectors. (This is an architecture level.)
  • Level 2: Inner Zones - major zones of functionality within a database
  • Level 3: Nested Inner Zones - zones within inner zones – a lower level of detail. (This is an architecture level.)
  • Level 4: Information Domains - information topics in scope for the database
  • Level 5: Logical Data Model and Procedure Signature - data model rendered as Entities, Attributes and Relationships plus description of procedures that includes procedure name and parameters. (This is a design level.)
  • Level 6: Physical Data Model and Procedure Specifications - data model rendered and tables, columns and foreign keys plus specification of procedures that includes detailed logic. (This is a design level.)

Another part of the answer is to apply consistent design patterns and principles to each database and/or database system as appropriate. This will result in fewer architectural components and enable team members to work on multiple databases. Databases and their associated applications tend to include the following architectural elements ("inner zones"):

  • Ingest: zone where data is input to the database.
  • Core: zone where subject content is stored. This is central focus of the database.
  • Expose: zone where data is made available outside of the database.
  • Process: zone where database processes are tracked and controlled.
  • Archive: zone where history data is stored.
  • Metadata: zone where data describing database content and structure are stored. Glossaries and Data Lineage are examples of data managed here.
  • Notify: zone where logs of events are stored.
What follows is a description of each architectural level along with a description of the inner zones. The Atomic Warehouse database is used an example.

Level 0: Database Symbol

The Level 0 diagram shows each database as a disk symbol which is the commonly used method of show databases on diagrams. The following diagram shows a single database.

Atomic Data Warehouse - Level 0

Level 0 diagrams often depict data flows between databases.

Data Flow - Level 0

Level 1: Ingest and Expose

The Level 1 diagram shows the paths data follows as input to and output from the database. In current data architecture terminology data is ingested - acquired from outside the database and it is exposed - provided to consumers outside of the database. Data flows from one database or systems Expose to the next system Ingest.

Atomic Data Warehouse - Level 1

These are examples of Ingest methods:

  • Services/APIs: a defined interface for interacting with a system. The external system calls the API to send data to the target database. APIs tend to be implemented using programming languages external to the database such as Java and Python. Message payloads are often rendered in XML or JSON.
  • Procedures: a procedural method for interfacting with a database. An external system calls a database stored procedure to provide data to the target database.
  • Views / Virtual: the target database obtains data through an exposed view of the source database or through a exposed view of a data virtualization layer such as: Dremio, Presto or Denodo.
  • Landing Zone: data is copied from a data source to an set of designated Ingest tables. The Landing Zone may be populated through tools such as: Extract Transform Load (ETL), Change Data Capture (CDC), Data Warehouse Automation (DWA) and webscraping.
  • Subscribe: data is acquired through a Publish-Subscribe (Pub-Sub) messaging system. Data might enter via a Message Business or output from a Data Pipe like Kafka.

These are examples of Expose methods which tend to match up to Ingest methods:

  • Services/APIs: a defined interface for interacting with a system. The external system calls the API to send data to the target database. APIs tend to be implemented using programming languages external to the database such as Java and Python. Message payloads are often rendered in XML or JSON.
  • Procedures: a procedural method for interfacting with a database. An external system calls a database stored procedure to provide data to the target database.
  • Views / Virtual: the target database obtains data through an exposed view of the source database or through a exposed view of a data virtualization layer such as: Dremio, Presto or Denodo. This might be thought of as a Virtual Drop Zone.
  • Drop Zone: data is copied from the core to a set of designated Expose tables. The Drop Zone may be populated through tools such as: Extract Transform Load (ETL), stored procedures, external programming languages such as Python.
  • Publish: data is exposed through a Publish-Subscribe (Pub-Sub) messaging system. Data might exit via a Message Business or output to a Data Pipe like Kafka.

Level 1 diagrams often depict data flows between databases where detail concerning Ingest or Expose of data is desired.

Data Flow - Level 0

Level 2: Inner Zones

Level 2 reveals the internal structure of the database at zone level. These zones are often included in the internal structures of databases:

  • Ingest: zone where data is input to the database. It is best practice to limit database inputs to the Ingest Zone.
  • Core: zone where subject content is stored. This is central focus of the database. It is best practice to isolate the Core Zone from external processes and databases.
  • Expose: zone where data is made available outside of the database. It is best practice to limit database outputs to the Expose zone.
  • Process: zone where database processes are tracked and controlled. It is best practice to use the same Process Zone schema across databases.
  • Archive: zone where history data is stored in Raw Immutable form. This means that data is stored in the Ingest form and not is not altered which makes for an effective audit trail.
  • Metadata: zone where data describing database content and structure are stored. Glossaries and Data Lineage are examples of data managed here. It is best practice to use the same Metadata Zone schema across databases or to share a Metadata repository across the enterprise.
  • Notify: zone where logs of events are stored. It is best practice to use the same Process Zone schema across databases. Notifications may be sent to a centralized Notifications System or Database.

Atomic Data Warehouse - Level 2

Level 3: Nested Inner Zones

Level 3 drills into the inner structure to show internal design patterns. For example, the Atomic Warehouse Core tends to have these structures or similar structures with different names:

  • Object: tables that are identified through business keys. The Object contains enough information to be identified. The primary key of the object is inherited by the Tie and Properties tables. Similar tables in other approaches include: header, master and hub (Data Vault).
  • Tie: tables that associate one or more Object tables. Similar tables in other approaches include: association, relationship and link (Data Vault).
  • Properties: tables that contain data elements that describe an Object or Tie.
  • Guide: tables that enable efficient and effective data access such as: hierarchy navigation, supertype / subtype and use conditions.
  • Reference: tables that contains static lookup data such as: calendards, currencies, countries and transaction codes.

Atomic Data Warehouse - Level 3

Level 4: Information Domain Model

Level 4 describes the Information Domains within the database.

Atomic Data Warehouse - Level 4

Level 5: Logical Data Model and Procedure Signature

Level 5 describes the solution in terms of Logical Data Models and Procedure External Description. Level 5 is more Design Level than Architectural Level. A Logical Data Model is a detailed data model that describes a solution using business terms. The Logical Data Model:

  • Is created on a detailed level using entities, relationships, and attributes
  • Uses Entity-Relationship Diagramming (ERD)
  • Is used during requirements and design phases
  • Is indpendent of a specific database or technology.

The Level 5 Procedure Description provides the following information about each Procedure (such as stored procedure or exposed service method.

  • Procedure Name: descriptive name or title for the procedure.
  • Description / Purpose: short narrative that describes the procedure.
  • Inputs: list of input parameter descriptive names.
  • Outputs: list of output descriptive names.

Atomic Data Warehouse - Level 5

Level 6: Physical Data Model and Procedure Specifications

Level 6 describes the solution in terms of Physical Data Models, Data Mapping Specifications and Procedure Specifications. Level 6 is more Design Level than Architectural Level. A Physical Data Model is a detailed data model that describes a solution in technical for a specific database. The Physical Data Model is derived from the Logical Data Model:

  • Tables are derived from Entities.
  • Foreign keys are derived from Relations.
  • Columns are derived from Logical Data Model Attributes.
  • Uses Entity-Relationship Diagramming (ERD)
  • Is used during design and build project phases
  • Is specific to a particular database, including optimization and constraints.

Data Mapping Specifications identify sources and targets of data at the attribute level. The specification which may be developed using a graphic tool includes:

  • Data Source: at the database, message and table level.
  • Data Target: at the database, message and table level.
  • High Level Actions: copy, filter, validate, edit and enrich for example.
  • Source Attributes: at the column and field level.
  • Target Attributes: at the column and field level.
  • Detail Level Actions: manipulate, sum and encode for example.

The Level 6 Procedure Specification provides the following information about each Procedure (such as stored procedure or exposed service method.

  • Procedure Name: technical name or title for the procedure.
  • Description / Purpose: short narrative that describes the procedure.
  • Inputs: list of input parameter technical names with datatypes
  • Outputs: list of output technical names with datatypes
  • Logic: description of procedure logic including actions such as: query, filter, validate, aggregate and calculate.

Atomic Data Warehouse - Level 6


Advertisements

Advertisements:
 


Infogoal.com is organized to help you gain mastery.
Examples may be simplified to facilitate learning.
Content is reviewed for errors but is not warranted to be 100% correct.
In order to use this site, you must read and agree to the terms of use, privacy policy and cookie policy.
Copyright 2006-2020 by Infogoal, LLC. All Rights Reserved.

Infogoal Logo