ETL Tool Selection Extract Transform Load

ETL Tool Selection for Data Warehousing

Learn How To Select ETL Tools Your Data Warehouse

David Haertzen David Haertzen, Principal Enterprise Architect




The selection of tools for ETL is key to the successful. At a high level there is a choice between programmer written applications and off the shelf tools. There is also the selection of Commercial Off The Shelf (COTS) versus Open Source.  There is also the option of using "Tools At Hand".

Consideration factors in tool selection include:

  • Business requirements
  • Security requirements
  • Data latency requirements
  • Complexity
  • Data volume
  • Staff skill sets
  • Standards
  • Existing tools
  • Technology of data sources and data targets
  • Data quality level of data sources
  • Budget
  • Schedule


ETL tool functions include:

  • Extract Utilities
  • DB Utilities
  • Extract Manager
  • Controls How and When Extracts Are Run
  • Records Activity
  • Reports on Activity
  • Extract Builder
  • Metadata Manager
  • Code Generator

Programmer Written ETL Applications

Programmer written applications may be useful because they provide capabilities beyond standard tools. They are good for complex logic that off the shelf solutions may not provide. In addition, existing development staff is often skilled with development languages such as: SQL, COBOL, C#, Java and PERL. The following chart shows some of the pros and cons of this approach.

Factor

Pro

Con

Budget Additional software licensing fees are avoided. More development labor may be required.
Metadata Metadata is not automatically included.
Performance Standard languages have low overhead. Standard languages may not support parallelism in processing like COTS ETL tools.
Skill Sets Current staff tends to have skills in language use. Data warehousing requires specialized techniques for populating facts and dimensions that staff must learn.
Solution Complexity New tools are not introduced to the environment. Total solution may consist of many "moving parts" which must be managed and maintained.
Vendor Relations Standard languages are available from multiple vendors, avoiding vendor lock in. Vendor support is not available.




 

Data Warehousing ETL Tools At Hand

Your organization may already have tools which could be used to perform ETL functions.  Desktop tools like Microsoft Access and Excel include data movement functionality.  Database software like Microsoft SQL Server and Oracle include ETL functionality competitive with COTS Dedicated ETL Tools.

Vendor

Software

Description

Microsoft Access Provides an import and export capability for both flat files and databases.  Jobs can be built using macros.  Data can be transformed using SQL and Visual Basic scripting language.  Suitable for small scale systems or proof of value demonstrations.
Microsoft SSIS SSIS is included as part of SQL Server 2005 and 2008.  It has features competitive with COTS Dedicated ETL Tools.  Many file types and database types are supported.

Fast load utilities are included with many databases: 

Vendor

Software

Description

IBM DB2 Bulk Provides for rapid loading of IBM DB2 databases.
Microsoft SQL Server BCP Provides for rapid loading of Microsoft SQL Server databases.
Oracle Oracle Data Pump Provides for rapid loading and unloading of Oracle databases.
Sybase BCP Provides for rapid loading of SYBASE Adaptive Server databases.







COTS Dedicated ETL Tools

If you are part of a large organization with a big budget, Commercial Off the Shelf (COTS) software is probably the way to go for the bulk of ETL work.  Here are some toolsets that you should consider:

Vendor

Software

Description

IBM DataStage Multiplatform software enables visual specification of ETL.  Integrates with suite of products including data quality and metadata management.  Supports numerous data sources, including IBM mainframe legacy databases.
Informatica PowerCenter Flexible software supports numerous data sources and targets.  Design through point and click visual approach.  Many transformations are built in.  Supports both batch and real time.  Large product family with many options and components.
SAS ETL Studio Graphical software enables development of ETL applications.  Over 300 transformation types along with numerous source and target types.




 Open Source ETL

Open Source ETL tools have been steadily improving and are in use by many organizations, large and small.  The tools are graphical and metadata driven.  They tend to be a bit simpler than the COTS Dedicated ETL Tools.  The base systems are available without licensing fees.  Support and other services are available for a fee.

Vendor

Software

Description

Jitterbit Jitterbit Graphical systems supports real-time and batch integration.  Strong support for web services and XML.  User and vendor community supplies JitterPaks, specialized interfaces to systems such as salesforce.com and SugarCRM.
Pentaho Data Integrator Visual development of ETL.  Supports many database and file formats.  Integrates with other products in Pentaho suite.
Talend Open Studio Combines graphical design with a metadata driven approach.  This product and approach are very flexible.
Copyright© 1999-2015, First Place Software, Inc.