
Data Analytics and Visualization
Course Code- CSC601
Module I- Introduction to Data Analytics and Life Cycle (5 Hr, CO1)
Data Analytics Lifecycle Overview: Key Roles for a Successful Analytics Project, Background and
Overview of Data Analytics Lifecycle
Phase 1: Discovery: Learning the Business Domain, Resources, Framing the Problem, Identifying
Key Stakeholders, Interviewing the Analytics Sponsor, Developing Initial Hypotheses, Identifying
Potential Data Sources
Phase 2: Data Preparation: Preparing the Analytic Sandbox, Performing ETLT, Learning About the
Data, Data Conditioning, Survey and Visualize, Common Tools for the Data Preparation Phase
Phase 3: Model Planning: Data Exploration and Variable Selection, Model Selection, Common
Tools for the Model Planning Phase
Phase 4: Model Building: Common Tools for the Model Building Phase
Phase 5: Communicate Results
Phase 6: Operationalize
Introduction to Data Analytics-
Analytics is the discovery and communication of meaningful patterns in data. Especially valuable
in areas rich with recorded information, analytics relies on the simultaneous application of
statistics, computer programming, and operations research to quantify performance. Analytics often
favors data visualization to communicate insight.
Firms commonly apply analytics to business data to describe, predict, and improve business
performance. Areas within analytics include predictive analytics, enterprise decision
management, etc. Since analytics can require extensive computation (because of big data), the
algorithms and software used for analytics harness the most current methods in computer science.
In a nutshell, analytics is the scientific process of transforming data into insight for making better
decisions. The goal of data analytics is to get actionable insights resulting in smarter decisions
and better business outcomes.
It is critical to design and build a data warehouse or Business Intelligence (BI) architecture that
provides a flexible, multi-faceted analytical ecosystem, optimized for efficient ingestion and
analysis of large and diverse data sets.
There are four types of data analytics:

1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Predictive Analytics

Predictive analytics turns data into valuable, actionable information. Predictive
analytics uses data to determine the probable outcome of an event or the likelihood
of a situation occurring.
Predictive analytics draws on a variety of statistical techniques from modeling,
machine learning, data mining, and game theory that analyze current and
historical facts to make predictions about future events. Techniques commonly used
for predictive analytics are listed below (a minimal sketch follows the list):

● Linear Regression
● Time series analysis and forecasting
● Data Mining
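
As a minimal, hypothetical sketch of predictive analytics, the fragment below fits a linear regression on made-up historical monthly sales and predicts the next month. It assumes Python with scikit-learn is available, and all figures are invented for illustration.

# Hypothetical example: predict next month's sales from a historical trend.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.array([[1], [2], [3], [4], [5], [6]])   # past months (feature)
sales = np.array([120, 135, 150, 160, 178, 190])    # observed sales (target)

model = LinearRegression()
model.fit(months, sales)                            # learn the historical trend

next_month = np.array([[7]])
print("Predicted sales for month 7:", model.predict(next_month)[0])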
Descriptive Analytics

Descriptive analytics looks at data and analyzes past events for insight into how to approach future
events. It looks at past performance and understands that performance by mining historical data to
determine the causes of past success or failure. Almost all management reporting, such as sales,
marketing, operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is often used to classify customers or
prospects into groups. Unlike a predictive model that focuses on predicting the behavior of a single
customer, descriptive analytics identifies many different relationships between customers and products.
Common examples of descriptive analytics are company reports that provide historical reviews,
such as the items below (a brief sketch follows the list):

● Data Queries
● Reports
● Descriptive Statistics
● Data dashboard
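
As an illustrative sketch of descriptive statistics and report-style queries, the snippet below summarizes a small, made-up sales table with pandas; the column names and figures are assumptions for illustration only.

# Hypothetical example: summarize historical sales with descriptive statistics.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "South", "North", "South"],
    "revenue": [120.0, 95.5, 140.2, 101.3],
})

print(sales["revenue"].describe())                 # count, mean, std, min, quartiles, max
print(sales.groupby("region")["revenue"].sum())    # revenue per region, a typical report query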
Prescriptive Analytics

Prescriptive analytics automatically synthesizes big data, mathematical sciences, business
rules, and machine learning to make a prediction and then suggests decision options to take
advantage of the prediction.

Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions that
benefit from the predictions and showing the decision maker the implications of each decision
option. Prescriptive analytics not only anticipates what will happen and when it will happen but
also why it will happen. Further, prescriptive analytics can suggest decision options on how
to take advantage of a future opportunity or mitigate a future risk, and illustrate the implications
of each decision option.

For example, prescriptive analytics can benefit healthcare strategic planning by using
analytics to leverage operational and usage data combined with data on external factors such
as economic data, population demographics, etc.
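
Since prescriptive analytics typically rests on optimization, the fragment below is a hedged, hypothetical sketch: it uses SciPy's linear programming solver to choose production quantities that maximize profit under a shared capacity constraint. All coefficients are invented for illustration, and SciPy is assumed to be installed.

# Hypothetical example: pick quantities x1, x2 to maximize profit 3*x1 + 5*x2
# subject to one capacity constraint. linprog minimizes, so profits are negated.
from scipy.optimize import linprog

c = [-3, -5]                 # negated profit per unit of x1 and x2
A_ub = [[1, 2]]              # capacity used per unit of x1 and x2
b_ub = [100]                 # total capacity available
bounds = [(0, None), (0, None)]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("Recommended quantities:", result.x)
print("Maximum profit:", -result.fun)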
Diagnostic Analytics

In this analysis, we generally use historical data over other data to answer a
question or solve a problem. We try to find dependencies and patterns
in the historical data of the particular problem.
For example, companies go for this analysis because it gives great insight into a
problem and lets them keep detailed information at their disposal; otherwise, data
collection would have to be repeated for every problem, which would be very time-consuming.
Common techniques used for diagnostic analytics are listed below (a short correlation sketch follows the list):

● Data discovery
● Data mining
● Correlations
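
As a minimal illustration of the correlation step in diagnostic analytics, the snippet below computes a correlation matrix over a made-up dataset of marketing spend and sales; the column names and values are assumptions for illustration.

# Hypothetical example: look for dependencies between marketing spend and sales.
import pandas as pd

data = pd.DataFrame({
    "ad_spend":   [10, 15, 20, 25, 30],
    "discount":   [5, 3, 4, 2, 1],
    "units_sold": [100, 130, 160, 180, 210],
})

# A strong correlation between ad_spend and units_sold would point to a likely driver.
print(data.corr())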
Key Roles for a Successful Analytics Project-

Seven roles are as follows-

1. Business User
2. Project Sponsor
3. Project Manager
4. Business Intelligence Analyst
5. Database Administrator (DBA)
6. Data Engineer
7. Data Scientist
1. Business User
● The business user is the one who understands the main domain area of the project and
typically benefits directly from the results.
● This user advises and consults the team working on the project about the value
of the results obtained and how the outputs will be used in operations.
● A business manager, line manager, or deep subject matter expert in the project
domain usually fulfills this role.

2. Project Sponsor
● The Project Sponsor is the one responsible for initiating the project. The Project
Sponsor provides the actual requirements for the project and presents the core
business issue.
● He generally provides the funding and measures the degree of value from the final
outputs of the team working on the project.
● This person sets the priorities for the project and shapes the desired outputs.
3. Project Manager:
● This person ensures that the key milestones and objectives of the project are met on
time and with the expected quality.

4. Business Intelligence Analyst :


● The Business Intelligence Analyst provides business domain expertise based on
a detailed and deep understanding of the data, key performance indicators
(KPIs), key metrics, and business intelligence from a reporting point of view.
● This person generally creates dashboards and reports and knows about the data
feeds and sources.
5. Database Administrator (DBA):
● The DBA provisions and configures the database environment to support the analytics
needs of the team working on the project.
● His responsibilities may include providing access to key databases or
tables and making sure that the appropriate security levels are in place
for the data repositories.
6. Data Engineer:
● The data engineer has deep technical skills to assist with tuning SQL queries for data
management and data extraction, and provides support for data ingestion into the analytic
sandbox.
● The data engineer works closely with the data scientist to help shape data into the correct
form for analysis.

7. Data Scientist:
● The data scientist provides subject matter expertise for analytical techniques,
data modelling, and applying valid analytical techniques to given business issues.
● He ensures overall analytical objectives are met.
● Data scientists design and execute analytical methods and approaches with the data
available for the project.
Data Analytics Lifecycle
Phase 1- Discovery
● The data science team learns and investigates the problem.
● Develop context and understanding.
● Come to know about the data sources needed and available for the project.
● The team formulates initial hypotheses that can later be tested with data.

In Phase 1, the team learns the business domain, including relevant history such as whether
the organization or business unit has attempted similar projects in the past from which they
can learn. The team assesses the resources available to support the project in terms of
people, technology, time, and data. Important activities in this phase include framing the
business problem as an analytics challenge that can be addressed in subsequent phases and
formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 1: Discovery
2.2.1 Learning the Business Domain

2.2.2 Resources

2.2.3 Framing the Problem

2.2.4 Identifying Key Stakeholders

2.2.5 Interviewing the Analytics Sponsor

2.2.6 Developing Initial Hypotheses

2.2.7 Identifying Potential Data Sources


Phase 2- Data preparation:

● Steps to explore, preprocess, and condition data prior to modeling and analysis.
● It requires the presence of an analytic sandbox; the team executes extract, load, and transform
(ELT) operations to get data into the sandbox.
● Data preparation tasks are likely to be performed multiple times and not in a predefined
order.
● Several tools commonly used for this phase are Hadoop, Alpine Miner, OpenRefine,
etc.
Phase 2 requires the presence of an analytic sandbox, in which the team can work with data
and perform analytics for the duration of the project. The team needs to execute extract,
load, and transform (ELT) or extract, transform and load (ETL) to get data into the sandbox.
The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the
ETLT process so the team can work with it and analyze it. In this phase, the team also needs
to familiarize itself with the data thoroughly and take steps to condition the data.
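
As a hedged, minimal sketch of the ETLT idea in Python with pandas, the fragment below extracts raw records from a source export, loads them into the sandbox essentially untouched, and then transforms a working copy for analysis; the file names, sandbox path, and column names are assumptions for illustration.

# Hypothetical ETLT sketch: Extract raw data, Load it into the sandbox as-is,
# then Transform a working copy for analysis. File and column names are assumed.
import pandas as pd

raw = pd.read_csv("sales_raw.csv")                    # Extract from a source-system export
raw.to_csv("sandbox/sales_raw.csv", index=False)      # Load the untouched data into the sandbox

clean = raw.dropna(subset=["order_date"]).copy()      # Transform: drop records missing a date
clean["order_date"] = pd.to_datetime(clean["order_date"])
clean["revenue"] = clean["quantity"] * clean["unit_price"]
clean.to_csv("sandbox/sales_clean.csv", index=False)  # Conditioned data ready for analysis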
Phase 2: Data Preparation

2.3.1 Preparing the Analytic Sandbox

2.3.2 Performing ETLT

2.3.3 Learning About the Data

2.3.4 Data Conditioning

2.3.5 Survey and Visualize

2.3.6 Common Tools for the Data Preparation Phase


Phase 3-Model planning:

● The team explores data to learn about the relationships between variables and
subsequently selects key variables and the most suitable models.
● In this phase, the data science team develops data sets for training, testing, and
production purposes.
● The team builds and executes models based on the work done in the model planning
phase.
● Several tools commonly used for this phase are MATLAB and Statistica.
Phase 3 is model planning, where the team determines the methods, techniques, and
workflow it intends to follow for the subsequent model building phase. The team explores
the data to learn about the relationships between variables and subsequently selects key
variables and the most suitable models.
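
As a small, hypothetical sketch of the data exploration and variable selection step, the snippet below inspects correlations with the target and splits the data into training and test sets; the column names and values are assumptions, and scikit-learn is assumed to be available.

# Hypothetical model-planning sketch: explore variable relationships, pick
# candidate predictors, and set aside data for training and testing.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "ad_spend":   [10, 15, 20, 25, 30, 35, 40, 45],
    "discount":   [5, 3, 4, 2, 1, 2, 3, 1],
    "units_sold": [100, 130, 160, 180, 210, 230, 260, 290],
})

print(df.corr()["units_sold"])                       # inspect relationships with the target

X = df[["ad_spend", "discount"]]                     # candidate predictor variables
y = df["units_sold"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)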
Phase 3: Model Planning

2.4.1 Data Exploration and Variable Selection

2.4.2 Model Selection

2.4.3 Common Tools for the Model Planning Phase


Phase 4-Model building:

● Team develops datasets for testing, training, and production purposes.


● The team also considers whether its existing tools will suffice for running the models or if
it needs a more robust environment for executing the models.
● Free or open-source tools – R and PL/R, Octave, WEKA.
● Commercial tools – MATLAB, Statistica.
In Phase 4, the team develops datasets for testing, training, and production purposes. In
addition, in this phase the team builds and executes models based on the work done in
the model planning phase. The team also considers whether its existing tools will suffice
for running the models, or if it will need a more robust environment for executing models
and workflows (for example, fast hardware and parallel processing, if applicable).
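
Continuing the same hypothetical data as the model-planning sketch above, a minimal model-building fragment might fit a regression on the training split and evaluate it on the held-out test split; again, the column names and figures are invented and scikit-learn is assumed.

# Hypothetical model-building sketch: fit a model on the training split and
# evaluate it on the held-out test split. Data and column names are made up.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "ad_spend":   [10, 15, 20, 25, 30, 35, 40, 45],
    "discount":   [5, 3, 4, 2, 1, 2, 3, 1],
    "units_sold": [100, 130, 160, 180, 210, 230, 260, 290],
})
X_train, X_test, y_train, y_test = train_test_split(
    df[["ad_spend", "discount"]], df["units_sold"], test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)     # build the model on training data
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))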
Phase 5-Communicate results:

● After executing the model, the team needs to compare the outcomes of modeling to the criteria
established for success and failure.
● The team considers how best to articulate findings and outcomes to various team
members and stakeholders, taking into account caveats and assumptions.
● The team should identify key findings, quantify business value, and develop a
narrative to summarize and convey findings to stakeholders.
In Phase 5, the team, in collaboration with major stakeholders, determines if the
results of the project are a success or a failure based on the criteria developed in
Phase 1. The team should identify key findings, quantify the business value, and
develop a narrative to summarize and convey findings to stakeholders.
Phase 6-Operationalize:

● The team communicates the benefits of the project more broadly and sets up a pilot project to
deploy the work in a controlled way before broadening it to a full enterprise of users.
● This approach enables the team to learn about the performance and related constraints of
the model in a production environment on a small scale, and to make adjustments before
full deployment.
● The team delivers final reports, briefings, and code.
● Free or open source tools – Octave, WEKA, SQL, MADlib.
In Phase 6, the team delivers final reports, briefings, code, and technical documents. In
addition, the team may run a pilot project to implement the models in a production
environment.
