Data Science Sandbox Tutorial

Making data science breakthroughs with traditional databases, such as application databases or data warehouses, has its challenges. Application data must be protected from harm caused by experimentation. Individual workstations may lack scale and offer little potential for team collaboration. Spreadsheets also lack scale, suit only simpler problems and have limited shareability. The problems to be solved are often complex and require large volumes of data and computing power. The Data Science Sandbox addresses these challenges.

Data Science Sandbox Architecture

What is a Data Science Sandbox?

The Data Science Sandbox is an environment specifically designed for data science and analytics. It gives data scientists and analysts a protected, shared environment where models can be built and experiments conducted without harm to application databases. The Data Science Sandbox often has these characteristics and features:

  • Scalability: can grow and shrink to accommodate the volume of data and computation needed. Cloud environments provide powerful scalability.
  • Shareable Code and Models: a combination of a source code repository and shareable code snippets enables model management. Code sharing via Python Notebooks and Zeppelin Notebooks is a best practice.
  • Data Science Platforms and Languages: provide the data scientist with a base for developing solutions. Platforms such as Alteryx, KNIME and RapidMiner can provide a graphical interface. Programming languages such as Python, R, Scala, C++ and Julia provide detailed control. Julia is reported to execute 10x to 30x faster than Python and R, although the actual gain depends on the workload.
  • Data Science Libraries: provide prebuilt solutions to data science challenges. Python core libraries like NumPy and pandas are a given, and data science libraries like scikit-learn, TensorFlow and Keras provide a solution framework.
  • Data Protection: provides security for at-risk data such as customers' personal information. Data protection measures can include: user authorization, user authentication, firewalls, data encryption and data obfuscation. In many cases, sensitive data can be removed before loading into the Data Science Sandbox environment.
  • Data Engineering: supplies the data environment: datastores and data pipelines. This function is performed by Data Engineers.
  • Data Wrangling: enables data to be "massaged" at a detail level. It includes: data cleansing, filtering and organizing. This function is performed by Data Scientists.
  • Flexible Data Access: data can be reached through data virtualization, a data lake or data import.
  • On Demand: enables new projects and research efforts to start quickly.
  • Play and Experimentation: enables creative juices to flow for the development of innovative solutions. Data Scientists can quickly develop and test hypotheses through experimentation.
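
The Data Protection and Data Wrangling characteristics above can be illustrated with a small sketch. It is not a prescribed implementation: the column names (customer_id, email, region, amount), the salt value and the function names are all hypothetical, chosen only for illustration. The sketch removes one sensitive field, obfuscates another with a salted hash, and then cleanses, filters and organizes the rows before they would be loaded into the sandbox.

```python
import hashlib

# Hypothetical column names, assumed only for this illustration.
SENSITIVE_DROP = {"email"}        # removed before loading to the sandbox
SENSITIVE_HASH = {"customer_id"}  # obfuscated with a salted hash


def obfuscate(value, salt):
    """Return a shortened salted SHA-256 digest in place of the raw value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def protect(row, salt):
    """Data protection: drop or obfuscate the sensitive fields of one row."""
    safe = {}
    for key, value in row.items():
        if key in SENSITIVE_DROP:
            continue  # sensitive data is removed entirely
        safe[key] = obfuscate(value, salt) if key in SENSITIVE_HASH else value
    return safe


def wrangle(rows):
    """Data wrangling: cleanse, filter and organize the rows."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # filter: skip rows with a missing amount
        cleaned.append({
            **row,
            "region": row["region"].strip().upper(),  # cleanse: normalize text
            "amount": float(amount),                   # cleanse: convert type
        })
    # organize: largest amounts first
    return sorted(cleaned, key=lambda r: r["amount"], reverse=True)


def load_to_sandbox(rows, salt):
    """Protect every row first, then wrangle the protected rows."""
    return wrangle([protect(row, salt) for row in rows])


# Made-up sample rows standing in for an application database extract.
raw = [
    {"customer_id": "C001", "email": "a@example.com", "region": " east ", "amount": "120.50"},
    {"customer_id": "C002", "email": "b@example.com", "region": "West", "amount": ""},
    {"customer_id": "C003", "email": "c@example.com", "region": "west", "amount": " 300 "},
]

if __name__ == "__main__":
    for row in load_to_sandbox(raw, salt="demo-salt"):
        print(row)
```

A salted hash keeps rows for the same customer linkable inside the sandbox without exposing the original identifier. It is a simple form of obfuscation rather than full anonymization, so it would normally be combined with the other protection measures listed above.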



 


Copyright 2006-2020 by Infogoal, LLC. All Rights Reserved.
