Infogoal Logo
GOAL DIRECTED LEARNING
Master DW

Data and Analytics Tutorial

Data and Analytics Overview
Under Construction

Data and Analytics Success

Data and Analytics Strategy
Project Management
Data Analytics Methodology
Quick Wins
Data Science Methodology

Requirements

BI Requirements Workshop

Architecture and Design

Architecture Patterns
Technical Architecture
Data Attributes
Data Modeling Basics
Dimensional Data Models

Enterprise Information Management

Data Governance
Metadata
Data Quality

Data Stores and Structures

Data Sources
Database Choices
Big Data
Atomic Warehouse
Dimensional Warehouse
Logical Data Warehouse
Data Lake
Operational Datastore (ODS)
Data Vault
Data Science Sandbox
Flat Files Data
Graph Databases
Time Series Data

Data Integration

Data Pipeline
Change Data Capture
Extract Transform Load
ETL Tool Selection
Data Warehoouse Automation
Data Wrangling
Data Science Workflow

BI and Data Visualization

BI - Business Intelligence
Data Viaulization

Data Science

Statistics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics

Test and Deploy

Testing
Security Architecture
Desaster Recovery
Rollout
Sustaining DW/BI

ETL Tool Selection for Data Warehousing

Learn How To Select ETL Tools Your Data Warehouse

The selection of tools for ETL is key to the successful. At a high level there is a choice between programmer written applications and off the shelf tools. There is also the selection of Commercial Off The Shelf (COTS) versus Open Source.  There is also the option of using "Tools At Hand".

Consideration factors in tool selection include:

ETL tool functions include:

Programmer Written ETL Applications

Programmer written applications may be useful because they provide capabilities beyond standard tools. They are good for complex logic that off the shelf solutions may not provide. In addition, existing development staff is often skilled with development languages such as: SQL, COBOL, C#, Java and PERL. The following chart shows some of the pros and cons of this approach.

Factor

Pro

Con

Budget Additional software licensing fees are avoided. More development labor may be required.
Metadata Metadata is not automatically included.
Performance Standard languages have low overhead. Standard languages may not support parallelism in processing like COTS ETL tools.
Skill Sets Current staff tends to have skills in language use. Data warehousing requires specialized techniques for populating facts and dimensions that staff must learn.
Solution Complexity New tools are not introduced to the environment. Total solution may consist of many "moving parts" which must be managed and maintained.
Vendor Relations Standard languages are available from multiple vendors, avoiding vendor lock in. Vendor support is not available.




 

Data Warehousing ETL Tools At Hand

Your organization may already have tools which could be used to perform ETL functions.  Desktop tools like Microsoft Access and Excel include data movement functionality.  Database software like Microsoft SQL Server and Oracle include ETL functionality competitive with COTS Dedicated ETL Tools.

Vendor

Software

Description

Microsoft Access Provides an import and export capability for both flat files and databases.  Jobs can be built using macros.  Data can be transformed using SQL and Visual Basic scripting language.  Suitable for small scale systems or proof of value demonstrations.
Microsoft SSIS SSIS is included as part of SQL Server 2005 and 2008.  It has features competitive with COTS Dedicated ETL Tools.  Many file types and database types are supported.

Fast load utilities are included with many databases: 

Vendor

Software

Description

IBM DB2 Bulk Provides for rapid loading of IBM DB2 databases.
Microsoft SQL Server BCP Provides for rapid loading of Microsoft SQL Server databases.
Oracle Oracle Data Pump Provides for rapid loading and unloading of Oracle databases.
Sybase BCP Provides for rapid loading of SYBASE Adaptive Server databases.







COTS Dedicated ETL Tools

If you are part of a large organization with a big budget, Commercial Off the Shelf (COTS) software is probably the way to go for the bulk of ETL work.  Here are some toolsets that you should consider:

Vendor

Software

Description

IBM DataStage Multiplatform software enables visual specification of ETL.  Integrates with suite of products including data quality and metadata management.  Supports numerous data sources, including IBM mainframe legacy databases.
Informatica PowerCenter Flexible software supports numerous data sources and targets.  Design through point and click visual approach.  Many transformations are built in.  Supports both batch and real time.  Large product family with many options and components.
SAS ETL Studio Graphical software enables development of ETL applications.  Over 300 transformation types along with numerous source and target types.




 Open Source ETL

Open Source ETL tools have been steadily improving and are in use by many organizations, large and small.  The tools are graphical and metadata driven.  They tend to be a bit simpler than the COTS Dedicated ETL Tools.  The base systems are available without licensing fees.  Support and other services are available for a fee.

Vendor

Software

Description

Jitterbit Jitterbit Graphical systems supports real-time and batch integration.  Strong support for web services and XML.  User and vendor community supplies JitterPaks, specialized interfaces to systems such as salesforce.com and SugarCRM.
Pentaho Data Integrator Visual development of ETL.  Supports many database and file formats.  Integrates with other products in Pentaho suite.
Talend Open Studio Combines graphical design with a metadata driven approach.  This product and approach are very flexible.

Advertisements

Advertisements:
 


Infogoal.com is organized to help you gain mastery.
Examples may be simplified to facilitate learning.
Content is reviewed for errors but is not warranted to be 100% correct.
In order to use this site, you must read and agree to the terms of use, privacy policy and cookie policy.
Copyright 2006-2020 by Infogoal, LLC. All Rights Reserved.

Infogoal Logo