The selection of tools for ETL is key to the successful. At a high level there is a choice between programmer written applications and off the shelf tools. There is also the selection of Commercial Off The Shelf (COTS) versus Open Source. There is also the option of using "Tools At Hand".
Consideration factors in tool selection include:
ETL tool functions include:
Programmer written applications may be useful because they provide capabilities beyond standard tools. They are good for complex logic that off the shelf solutions may not provide. In addition, existing development staff is often skilled with development languages such as: SQL, COBOL, C#, Java and PERL. The following chart shows some of the pros and cons of this approach.
Factor |
Pro |
Con |
Budget | Additional software licensing fees are avoided. | More development labor may be required. |
Metadata | Metadata is not automatically included. | |
Performance | Standard languages have low overhead. | Standard languages may not support parallelism in processing like COTS ETL tools. |
Skill Sets | Current staff tends to have skills in language use. | Data warehousing requires specialized techniques for populating facts and dimensions that staff must learn. |
Solution Complexity | New tools are not introduced to the environment. | Total solution may consist of many "moving parts" which must be managed and maintained. |
Vendor Relations | Standard languages are available from multiple vendors, avoiding vendor lock in. | Vendor support is not available. |
Your organization may already have tools which could be used to perform ETL functions. Desktop tools like Microsoft Access and Excel include data movement functionality. Database software like Microsoft SQL Server and Oracle include ETL functionality competitive with COTS Dedicated ETL Tools.
Vendor |
Software |
Description |
Microsoft | Access | Provides an import and export capability for both flat files and databases. Jobs can be built using macros. Data can be transformed using SQL and Visual Basic scripting language. Suitable for small scale systems or proof of value demonstrations. |
Microsoft | SSIS | SSIS is included as part of SQL Server 2005 and 2008. It has features competitive with COTS Dedicated ETL Tools. Many file types and database types are supported. |
Vendor |
Software |
Description |
IBM | DB2 Bulk | Provides for rapid loading of IBM DB2 databases. |
Microsoft | SQL Server BCP | Provides for rapid loading of Microsoft SQL Server databases. |
Oracle | Oracle Data Pump | Provides for rapid loading and unloading of Oracle databases. |
Sybase | BCP | Provides for rapid loading of SYBASE Adaptive Server databases. |
If you are part of a large organization with a big budget, Commercial Off the Shelf (COTS) software is probably the way to go for the bulk of ETL work. Here are some toolsets that you should consider:
Vendor |
Software |
Description |
IBM | DataStage | Multiplatform software enables visual specification of ETL. Integrates with suite of products including data quality and metadata management. Supports numerous data sources, including IBM mainframe legacy databases. |
Informatica | PowerCenter | Flexible software supports numerous data sources and targets. Design through point and click visual approach. Many transformations are built in. Supports both batch and real time. Large product family with many options and components. |
SAS | ETL Studio | Graphical software enables development of ETL applications. Over 300 transformation types along with numerous source and target types. |
Open Source ETL tools have been steadily improving and are in use by many organizations, large and small. The tools are graphical and metadata driven. They tend to be a bit simpler than the COTS Dedicated ETL Tools. The base systems are available without licensing fees. Support and other services are available for a fee.
Vendor |
Software |
Description |
Jitterbit | Jitterbit | Graphical systems supports real-time and batch integration. Strong support for web services and XML. User and vendor community supplies JitterPaks, specialized interfaces to systems such as salesforce.com and SugarCRM. |
Pentaho | Data Integrator | Visual development of ETL. Supports many database and file formats. Integrates with other products in Pentaho suite. |
Talend | Open Studio | Combines graphical design with a metadata driven approach. This product and approach are very flexible. |
Infogoal.com is organized to help you gain mastery.
Examples may be simplified to facilitate learning.
Content is reviewed for errors but is not warranted to be 100% correct.
In order to use this site, you must read and agree to the
terms of use, privacy policy and cookie policy.
Copyright 2006-2020 by Infogoal, LLC. All Rights Reserved.