Technical architecture is all about making the right choices for the data and analytics effort. This article will help you set the foundation for a successful Data Analytics Solution.
According to IEEE standard 1471-2000, "Software architecture is the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution".
Data Analytics technical architecture includes:
This is a large topic, so there are many references to supporting data analytics articles on this and other websites.
At another level, data warehousing architecture builds on the classic system pattern: input, process and output:
The recommendation "Begin with the end in mind" is very true for Data Analytics. The end that we have in mind is a system that satisfies both functional and non-functional requirements, that is, a system that does what it is supposed to do.
Functional requirements (business requirements) are needs identified by the business relating to data and business processes. For example, we may be looking for a system that provides information about customers, territories and products and that supports business processes of selling and customer support. See the article, Requirements for Data Analytics, for guidance on how to gather and organize business requirements.
Non-functional requirements are needs relating to performance and IT-chosen practices. Performance includes issues such as required system availability and recoverability, and it depends on the volume of data and the number of users expected for the Data Analytics Solution. IT-chosen practices include selected technologies (the "tech stack") and standards.
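An availability requirement is easier to discuss once it is turned into a downtime budget. The following is a minimal sketch of that arithmetic; the function name and the sample targets are illustrative assumptions, not figures from any particular project.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years for simplicity


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Hypothetical targets, used only to show how quickly the budget shrinks.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% availability allows about "
          f"{allowed_downtime_hours(target):.2f} hours of downtime per year")
```

A target agreed with the business in these terms can then be checked against backup, recovery and maintenance windows.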
Data Analytics architectural principles, largely complementing enterprise architecture, set a framework for decision making. One set of architectural principles is the "ilities":
Some additional principles might include:
A critical question for Data Analytics efforts is how to obtain the resources that make up the Data Analytics Solution. There are three options: re-use existing resources, buy new resources, or build new resources.
Re-using existing resources can often save money and deliver a superior and more maintainable solution. If we buy or build new components every time that there is a new project, then the portfolio of resources will soon become bloated and expensive to maintain. Re-use can have drawbacks, however: existing resources may not meet current functional or non-functional requirements.
Buying a new Data Analytics resource can save time and money over building a resource. Buying is a good choice when products that meet a large percentage of requirements are available for less than the cost of building. Purchased software may, for example, have more features and fewer problems than home-grown software.
Building a resource can be a good answer when there are no existing resources to re-use and purchased resources that meet requirements are not available for a reasonable price. Building a solution or part of a solution can result in a competitive advantage where your organization has a capability that is not readily duplicated by competitors. Cost is also a factor. Purchased software often has a per-user or per-computer charge, while in-house developed software can be made available to internal users without additional licensing fees.
In general, we recommend "Re-use before buy and buy before build". Some combination is likely. Create a list of needed resources and specify the type of sourcing for each item.
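The sourcing list can be kept as simple structured data. The sketch below applies the "re-use before buy and buy before build" rule to a few made-up resources; the field names, costs and resource names are all hypothetical, and a real decision would weigh more factors than these.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A needed component of the solution (all fields are illustrative)."""
    name: str
    reusable_asset_fits: bool  # an existing resource meets the requirements
    product_fits: bool         # a purchasable product meets most requirements
    buy_cost: float            # estimated cost to buy
    build_cost: float          # estimated cost to build


def choose_sourcing(resource: Resource) -> str:
    """Apply 're-use before buy and buy before build' to one resource."""
    if resource.reusable_asset_fits:
        return "re-use"
    if resource.product_fits and resource.buy_cost < resource.build_cost:
        return "buy"
    return "build"


# Hypothetical list of needed resources.
needed = [
    Resource("ETL tool", True, True, 50_000, 200_000),
    Resource("BI reporting tool", False, True, 30_000, 150_000),
    Resource("Pricing engine", False, False, 0, 80_000),
]

sourcing_plan = {r.name: choose_sourcing(r) for r in needed}
for name, sourcing in sourcing_plan.items():
    print(f"{name}: {sourcing}")
```

Keeping the rule in one function makes it easy to review the plan again when costs or available products change.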
Metadata is often defined as "data about data". In practice, Data Analytics metadata is any data that describes or controls the system that is not procedural programming code. Examples of metadata include:
Defining data once through metadata and then re-using those data definitions can save much development and support time while resulting in more consistent Data Analytics Solutions.
Metadata is typically created in tools such as the data modeling tool and the ETL tool. It may then be stored in a metadata repository that manages and coordinates this information.
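As an illustration of defining data once and re-using the definition, the sketch below keeps column metadata in a plain Python list and derives both a table definition and a data dictionary from it. The table, the column names and the helper functions are assumptions made for this example; real projects would normally hold this metadata in a modeling tool or repository.

```python
import sqlite3

# Hypothetical column metadata, defined once and re-used below.
CUSTOMER_COLUMNS = [
    {"name": "customer_id", "type": "INTEGER", "nullable": False,
     "description": "Surrogate key"},
    {"name": "customer_name", "type": "TEXT", "nullable": False,
     "description": "Legal name"},
    {"name": "territory_code", "type": "TEXT", "nullable": True,
     "description": "Sales territory"},
]


def build_create_table(table, columns):
    """Generate a CREATE TABLE statement from the column metadata."""
    parts = []
    for col in columns:
        null_clause = "" if col["nullable"] else " NOT NULL"
        parts.append(f'{col["name"]} {col["type"]}{null_clause}')
    return f"CREATE TABLE {table} ({', '.join(parts)})"


def build_data_dictionary(columns):
    """Generate a name-to-description lookup from the same metadata."""
    return {col["name"]: col["description"] for col in columns}


ddl = build_create_table("customer", CUSTOMER_COLUMNS)
data_dictionary = build_data_dictionary(CUSTOMER_COLUMNS)

# Confirm the generated statement is valid by running it in memory.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
print(ddl)
```

Because both outputs come from one definition, a change to a column is made in a single place.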
See the article, Metadata for Data Analytics, for further insights.
The information provided by the Data Analytics Solution is only as good as its inputs. Finding, understanding, selecting and improving data sources are critical to the success of any Data Analytics project.
Leading causes of Data Analytics project failure are a poor understanding of data sources and poor data quality in those sources. We recommend using data profiling tools to gain an understanding of the data and data cleansing approaches to improve its quality.
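To make the idea of profiling and cleansing concrete, here is a minimal sketch in plain Python. It is not a substitute for a data profiling tool; the sample rows, the column name and the cleansing rule are invented for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def cleanse_territory(value):
    """Example cleansing rule: trim and upper-case text codes."""
    return value.strip().upper() if isinstance(value, str) else value


# Hypothetical source rows with inconsistent and missing codes.
sample_rows = [
    {"customer_id": 1, "territory": "EAST"},
    {"customer_id": 2, "territory": "east"},
    {"customer_id": 3, "territory": None},
    {"customer_id": 4, "territory": "EAST"},
]

before = profile_column(sample_rows, "territory")
cleansed_rows = [
    {**row, "territory": cleanse_territory(row["territory"])}
    for row in sample_rows
]
after = profile_column(cleansed_rows, "territory")
print(before)
print(after)
```

Comparing the profile before and after cleansing shows whether a rule actually reduced the inconsistency it targeted.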
The article Data Sources for Data Analytics provides further information.
The Data Analytics extract process pulls data out of data sources so that it is available for later transformation and loading into the Data Analytics and other databases. Architectural choices include the extract tool and the timing of extracts.
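One common timing choice is an incremental extract that pulls only rows changed since the previous run. The sketch below shows the idea with an in-memory SQLite table; the `orders` table, its `updated_at` column and the function name are assumptions for this example, and a real source may need a different way of detecting changes.

```python
import sqlite3


def extract_changed_rows(conn, last_extracted_at):
    """Return rows changed after the given timestamp, plus the new high-water mark."""
    cursor = conn.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted_at,),
    )
    rows = cursor.fetchall()
    # Keep the old mark when nothing changed, otherwise move it forward.
    new_watermark = rows[-1][2] if rows else last_extracted_at
    return rows, new_watermark


# Hypothetical source table with ISO-8601 timestamps stored as text.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, amount REAL, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2020-01-01T08:00:00"),
        (2, 25.5, "2020-01-02T09:30:00"),
        (3, 40.0, "2020-01-03T11:15:00"),
    ],
)

rows, watermark = extract_changed_rows(source, "2020-01-01T23:59:59")
print(rows)
print(watermark)
```

The returned high-water mark would be saved and passed to the next run, so each extract picks up where the last one stopped.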
Data Analytics extracts are further explained in the article ETL - Extract Transform Load for Data Analytics.
The choice of where and how to store the data for the Data Analytics Solution is a critical architectural question. Part of the issue is Data Analytics "style":
The basic Data Analytics system calls for the creation of the following types of databases:
The article, Database Choices for Data Analytics Solutions, describes each of these alternatives in detail.
A data model is a graphical view of data created for analysis and design purposes. While architecture does not include designing Data Analytics databases in detail, it does include defining principles and patterns for modeling specialized parts of the Data Analytics system.
Areas that require specialized patterns are:
In addition to these specialized patterns, the architecture should include other pattern descriptions for:
These patterns are described in greater detail in the article Data Models for Data Warehousing.
After data has been extracted and the physical storage areas created, it is time to pump the data through the Data Analytics Solution: from data sources to staging, to the atomic data warehouse, to the dimensional data warehouse, to the BI query and finally to the business user.
These key activities are needed to support this process:
The article ETL - Extract Transform Load for Data Warehousing and Business Intelligence provides further information on this subject.
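The flow from data sources through staging, the atomic data warehouse and the dimensional data warehouse can be sketched as a chain of small functions. This is a deliberately simplified illustration: the function names, the sample rows and the use of in-memory lists in place of databases are all assumptions, and the "dimensional" step here is just a summary by product.

```python
def load_staging(source_rows):
    """Copy source rows unchanged into staging."""
    return [dict(row) for row in source_rows]


def load_atomic(staging_rows):
    """Standardize values and drop rows that fail a basic check."""
    atomic = []
    for row in staging_rows:
        if row["amount"] is None:
            continue  # reject rows with a missing amount
        atomic.append({
            "product": row["product"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return atomic


def load_dimensional(atomic_rows):
    """Summarize atomic rows by product for query use."""
    totals = {}
    for row in atomic_rows:
        totals[row["product"]] = totals.get(row["product"], 0.0) + row["amount"]
    return totals


# Hypothetical source rows with inconsistent formatting.
source_rows = [
    {"product": " widget ", "amount": "10.50"},
    {"product": "Widget", "amount": "4.50"},
    {"product": "gadget", "amount": None},
    {"product": "Gadget", "amount": "7"},
]

sales_by_product = load_dimensional(load_atomic(load_staging(source_rows)))
print(sales_by_product)
```

Keeping each stage as a separate step mirrors the separate storage areas, so a problem can be traced to the stage where it was introduced.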
Business Intelligence is the part of the data warehousing system where business users analyze the data and prepare presentations. The Data Analytics architecture must provide for the needs of the business people who will access the system.
Business people are likely to act like farmers who harvest a crop of known information or explorers who are seeking new patterns. Both types of access must be supported.
Data warehouse architecture includes the selection and use of the following types of tools:
The article BI - Business Intelligence supplies further information about each tool category.
To achieve benefits from business intelligence, it is important to manage the Data Analytics Solution to make sure that it continues to provide value and avoids the risk of loss. This includes activities like:
The management of data warehousing is further described in the article Operations.
Data warehousing architecture helps to answer critical questions:
Roadmaps identify actions that must be taken to achieve the desired future state. In addition, roadmaps specify intermediate future states that must be passed through to achieve the future state.
Copyright 2006-2020 by Infogoal, LLC. All Rights Reserved.