DATA ANALYTICS Syllabus 3 Units

SYLLABUS
UNIT - I : Data Management : Design data architecture and manage the data for
analysis, Understanding various sources of data like Sensors/Signals/GPS etc., Data
Management, Data Quality (Noise, Outliers, Missing Values, Duplicate Data) and data
processing and processing.
UNIT - II : Data Analytics : Introduction to analytics, Introduction to tools and
environment, Benefits and limitations of data analytics, Data analytics life
cycle, Applications of data analytics & applications of business, Databases &
types/fundamentals of data analytics, Types of data & variables, Data modelling
techniques, Missing imputations etc., Need for business modelling.
UNIT - III : Data Visualizations : Pixel-oriented visualization techniques, Geometric
projection visualization techniques, Icon-based visualization techniques, Hierarchical
visualization techniques, Visualizing complex data and relations.
*********
1. What Is Data Management?

Data management is the practice of collecting, keeping, and using data securely, efficiently, and cost-
effectively. The goal of data management is to help people, organizations, and connected things optimize the
use of data within the bounds of policy and regulation so that they can make decisions and take actions that
maximize the benefit to the organization. A robust data management strategy is becoming more important than
ever as organizations increasingly rely on intangible assets to create value.
Design data Architecture and manage the data for analysis :

Data architecture design is a set of standards, composed of policies, rules, and models, that
governs what type of data is collected, where it is collected from, how the collected data is
arranged and stored, and how it is used and secured in systems and data warehouses for
further analysis.
Data is one of the essential pillars of enterprise architecture through which it succeeds in the
execution of business strategy.
Data architecture design is important for creating a vision of the interactions between data
systems. For example, if a data architect wants to implement data integration, two systems will
need to interact, and the data architecture provides the model of how data flows between them
during that process.
Data architecture also describes the types of data structures used to manage data, and it provides
an easy way to preprocess data. The data architecture is formed from three essential models,
which are then combined:
• Conceptual model – A business model that uses the Entity Relationship (ER) model to describe
entities, their attributes, and the relationships between them.
• Logical model – A model in which the problem is represented in logical form, such as rows
and columns of data, classes, XML tags, and other DBMS techniques.
• Physical model – The physical model holds the database design, such as which type of database
technology is suitable for the architecture.
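As a rough illustration of how the three models relate, the sketch below takes a hypothetical
Customer/Order relationship (conceptual), expresses it as tables and columns (logical), and
realizes it in an in-memory SQLite database through Python's built-in sqlite3 module (physical).
The entity names, columns, sample rows, and the choice of SQLite are assumptions made for this
example only.

```python
import sqlite3

# Conceptual model (assumed example): one Customer places many Orders.
# Logical model: two tables linked by a foreign key.
LOGICAL_SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
"""

# Physical model: the schema realized in a concrete technology (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript(LOGICAL_SCHEMA)
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0)],
)

# Retrieve data through the relationship named in the conceptual model.
total_by_customer = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN customer_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    """
).fetchall()
print(total_by_customer)
```

Swapping SQLite for another database engine would change only the physical model; the conceptual
and logical models stay the same.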
A data architect is responsible for the design, creation, management, and deployment of the data
architecture and defines how data is to be stored and retrieved; other decisions are made by
internal bodies.
Factors that influence Data Architecture :
The main influences on data architecture are business policies, business requirements, the
technology in use, economics, and data processing needs.
• Business requirements – These include factors such as the expansion of the business, the
performance of system access, data management, transaction management, and making use of
raw data by converting it into image files and records and then storing it in data warehouses.
Data warehouses are the main means of storing business transactions.
• Business policies – The policies are rules that describe how data is to be processed.
These policies are made by internal organizational bodies and by government agencies.
• Technology in use – This includes drawing on previously completed data architecture
designs and using existing licensed software purchases and database technology.
• Business economics – Economic factors such as business growth and loss, interest rates,
loans, the condition of the market, and the overall cost also affect the architecture design.
• Data processing needs – These include factors such as mining of the data, large continuous
transactions, database management, and other data preprocessing needs.
--------------------------------------------------------------------------------------------------------------------------------------
2. Understanding various sources of data like Sensors/Signals/GPS etc.

Data is the backbone of any data analysis work done in the research process. Data is a collection of
unorganized facts and numbers from different sources. The sources of data can be different depending on
what the research needs. Data analysis and interpretation are based solely on gathering different kinds of
data from their sources. Researchers or analysts do the work of data collection to collect information.
What are the sources of data?

In short, the sources of data are physical or digital places where information is stored in a data table, data
object, or some other storage format.
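Data can be gathered from two places: internal and external sources. This text calls information
collected from internal sources “primary data” and information gathered from outside references
“secondary data.” More precisely, primary data is collected first-hand by the researcher for the
study at hand, while secondary data was originally collected by someone else.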
For data analysis, it all must be collected through primary or secondary research. A data source is a pool of
statistical facts and non-statistical facts that a researcher or analyst can use to do more work on their
research.
There are mainly two kinds of data sources:
• Statistical
• Census
Researchers use both data sources a lot in their work. The data is collected from these using
either primary or secondary research methods.
Types of data sources

1. Statistical data sources
Statistical data sources are surveys and other statistical reports used for official purposes. Here,
people are asked several questions, which can be either qualitative or quantitative. Qualitative
data sources don’t use numbers, while quantitative data do.
The data sampling method uses both kinds of statistical data. Usually, a sample survey is used to
do a statistical survey. In this method, sample data is collected and then analyzed using
statistical tools and techniques. The surveys can also be done using the questionnaire method.
2. Census data sources

According to this method, the data are taken from the census report that was published earlier.
It’s the opposite of statistical surveys. The Census method closely examines all parts of the
population during the research process. Here, the data is collected over a certain amount of
time, called the reference time. The researchers do their research at a particular time and then
analyze it to conclude.
Census is done in the country for official purposes. The respondents are asked questions, which
they answer. This interaction can take place in person or over the phone. However, the census is
a source of data that takes a lot of time and effort because it involves the whole population.
Additional sources of data

In addition to the above data sources, other origins are also considered when collecting data.
These are what they are:
1. Internal sources of data
Internal data references are things like reports and records that are published within the
organization.
Internal data references are used to do primary research on a given topic. As a researcher, you
can go to internal sources to get information, which makes the work of the study easier.
Some of the different internal data are accounting resources, sales force reports, internal
experts, and miscellaneous reports.
2. External sources of data

When data is collected from outside the organization, its origin is called an external data
source. Such sources are, in every way, outside the company, and a researcher can collect data
from them as well.
Data from external origins is harder to gather because it is much more varied and there can
be many sources. External data can be put into the following groups:
• Government publications
Researchers can get a massive amount of information from government sources. Also, you can
get much of this information for free on the Internet.
• Non-government publications
Researchers can also find industry-related information in non-government publications. The
only problem with non-government publications is that their data may sometimes be biased.
• Syndicate services
Some companies offer syndicate services. As part of this, they collect and organize the same
marketing information for all their clients. Surveys, mail diary panels, electronic services,
wholesalers, industrial firms, retailers, etc., are ways they get information from households.
3. Experimental sources of data

In this data source, the information comes from related experiments and related tools. The
researcher experiments to get all of the information they need.
Researchers can find out about the different ways that experiments can be set up. The four
most common ways to run an experiment are:
• CRD – Completely randomized design - A Completely Randomized Design is a simple
experimental design used in data analytics. It is based on randomization and replication:
treatments are assigned to experimental units entirely at random. It is mostly used to
compare treatments.
• RBD – Randomized block design - Randomized Block Design is an experimental design
that divides the experimental units into small, relatively homogeneous groups called blocks.
Treatments are assigned at random within each block, and the results are analyzed using the
analysis of variance (ANOVA) technique. RBD originated in the agricultural sector.
• LSD – Latin square design - Latin Square Design is an experimental design similar to CRD
and RBD, but it blocks in two directions, rows and columns. It comprises an N x N square with
the same number of rows, columns, and letters (treatments), where each letter appears exactly
once in every row and every column. So, the differences are easy to find, and the experiment
is less likely to go wrong. A Latin square design is something like a Sudoku puzzle.
• FD – Factorial designs - Factorial design is an experimental design in which two or more
factors, each with a set of possible values (levels), are studied together. Trials are run for
the combinations of factor levels, so the effect of each factor and of their interactions can
be estimated.
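The randomization behind a CRD and the row/column property of a Latin square can be sketched in a
few lines. This is a minimal illustration, not a full design tool: the cyclic-shift construction,
the function names, the plot labels, and the fixed seed are all assumptions made for the example.

```python
import random

def latin_square(treatments):
    """Build an N x N Latin square by cyclically shifting the treatment list."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """Check that every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

def completely_randomized_design(units, treatments, seed=0):
    """Assign treatments to units at random, with equal replication (CRD)."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    assignment = list(treatments) * reps
    random.Random(seed).shuffle(assignment)
    return dict(zip(units, assignment))

square = latin_square(["A", "B", "C", "D"])
plan = completely_randomized_design([f"plot{i}" for i in range(1, 9)], ["T1", "T2"])
for row in square:
    print(" ".join(row))
```

In practice the rows and columns of the square would also be shuffled before use; the cyclic
version is kept here because it makes the once-per-row, once-per-column property easy to see.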
Conclusion
“Data source” is a broad term. Simply put, a data source is a physical or digital
place where the data in question is kept in a data table, data object, or another storage format.
============================================================================
3. Data Management, Data Quality (Noise, Outliers, Missing Values, Duplicate Data)
Examples of data quality problems:
• Noise and outliers
• Missing values
• Duplicate data
• Wrong data
These problems are considered in more detail below.
Noisy data
For objects, noise is considered an extraneous object.
For attributes, noise refers to modification of original values.
• Examples: distortion of a person’s voice when talking on a poor phone connection and “snow”
on a television screen
• We can talk about the signal-to-noise ratio (SNR).
The left image, two sine waves without noise, has a high SNR; the right image, the same two
waves combined with noise, has a low SNR.
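SNR is commonly expressed in decibels as 10 * log10(signal power / noise power), so a signal with
no noise has an unbounded SNR and adding noise lowers it. The sketch below computes this for two
superimposed sine waves; the frequencies, noise levels, and seed are assumed values chosen only
to make the effect visible.

```python
import math
import random

def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(signal power / noise power)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        return math.inf  # no noise at all
    return 10 * math.log10(signal_power / noise_power)

rng = random.Random(42)
t = [i / 1000 for i in range(1000)]
# Two superimposed sine waves (frequencies are assumed for the example).
clean = [math.sin(2 * math.pi * 5 * x) + math.sin(2 * math.pi * 12 * x) for x in t]
mild_noise = [rng.gauss(0, 0.1) for _ in t]
heavy_noise = [rng.gauss(0, 1.0) for _ in t]
noisy = [c + n for c, n in zip(clean, heavy_noise)]  # what would actually be recorded

snr_clean = snr_db(clean, [0.0] * len(t))
snr_mild = snr_db(clean, mild_noise)
snr_heavy = snr_db(clean, heavy_noise)
print(f"mild noise: {snr_mild:.1f} dB, heavy noise: {snr_heavy:.1f} dB")
```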
Origins of noise
• outliers -- values seemingly out of the normal range of data
• duplicate records -- good database design should minimize this (use DISTINCT on SQL
retrievals)
• incorrect attribute values -- again, good database design and integrity constraints should
minimize this
• numeric-only attributes -- deal with rogue strings or characters where numbers should be
• null handling for attributes (nulls = missing values)
Outliers -- be careful!
Outliers are data objects with characteristics that are considerably different from those of
most of the other data objects in the data set.
Case 1: Outliers are noise that interferes with data analysis
Case 2: Recognizing outliers can be the goal of our analysis
• Credit card fraud
• Intrusion detection
Causes for case 1?
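One common rule of thumb for spotting values "out of the normal range" is the interquartile-range
(IQR) rule. The notes do not prescribe a particular method, so the rule, the multiplier k = 1.5,
and the spending figures below are assumptions used purely for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily card spend; 950 is the suspicious transaction.
spend = [42, 38, 51, 47, 40, 45, 39, 950, 44, 48]
flagged = iqr_outliers(spend)
print(flagged)
```

Whether a flagged value is noise to be removed (case 1) or the very thing being looked for, such
as possible fraud (case 2), depends on the goal of the analysis.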
Missing Data Handling

Missing values have many causes: malfunctioning equipment, changes in experimental design,
collation of different data sources, or a measurement that is not possible. People may not wish
to supply information, or the information may not be applicable (children don't have an annual
income). Common ways of handling them:
• Discard records with missing values
• For ordinal or continuous data, replace missing values with the attribute mean
• Substitute with a value from a similar instance
• Ignore missing values, i.e., just proceed and let the tools deal with them
• Treat missing values as equal (all share the same missing-value code)
• Treat missing values as unequal values
BUT... missing (null) values may have significance in themselves (e.g. a missing test in a
medical examination; a missing death date means the person is still alive!)
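Three of the strategies listed above (discarding records, mean substitution, and borrowing from a
similar instance) can be sketched as follows. The records, field names, and the choice of "same
city" as the similarity criterion are hypothetical.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "city": "Pune", "income": 30000},
    {"age": 32, "city": "Delhi", "income": None},
    {"age": None, "city": "Pune", "income": 45000},
    {"age": 41, "city": "Delhi", "income": 52000},
]

def discard_incomplete(rows):
    """Strategy 1: drop every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows, field):
    """Strategy 2: replace missing numeric values with the attribute mean."""
    mean = statistics.mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: mean if r[field] is None else r[field]} for r in rows]

def impute_from_similar(rows, field, match_on):
    """Strategy 3: copy the value from the first record that agrees on `match_on`."""
    filled = []
    for r in rows:
        if r[field] is None:
            donor = next(
                (d for d in rows if d[field] is not None and d[match_on] == r[match_on]),
                None,
            )
            r = {**r, field: donor[field] if donor else None}
        filled.append(r)
    return filled

complete = discard_incomplete(records)
mean_filled = impute_mean(records, "income")
donor_filled = impute_from_similar(records, "income", "city")
```

Each function returns new records rather than changing the originals, so the strategies can be
compared side by side on the same data.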
Missing Completely at Random (MCAR)
• Missingness of a value is independent of attributes
• Fill in values based on the attribute as suggested above (e.g. attribute mean)
• Analysis may be unbiased overall
Missing at Random (MAR)
• Missingness is related to other variables
• Fill in values based on other values (e.g., from similar instances)
• Almost always produces a bias in the analysis
Missing Not at Random (MNAR)
• Missingness is related to unobserved measurements
• Informative or non-ignorable missingness
It is not possible to tell which situation applies from the data alone. You need to know the
context, application field, data collection process, etc.
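A small simulation can make the difference between the mechanisms concrete. In the sketch below,
values removed completely at random leave the mean roughly unchanged, while values withheld
because they are large (MNAR) pull the observed mean down. The synthetic data, the 30% and 80%
probabilities, the threshold of 700, and the seed are all assumptions.

```python
import random
import statistics

rng = random.Random(7)
incomes = list(range(1, 1001))  # synthetic "true" values, mean 500.5
true_mean = statistics.mean(incomes)

# MCAR: every value has the same 30% chance of being missing.
mcar_observed = [x for x in incomes if rng.random() >= 0.3]

# MNAR: high values (above 700) are withheld 80% of the time,
# so missingness depends on the unobserved value itself.
mnar_observed = [x for x in incomes if not (x > 700 and rng.random() < 0.8)]

mcar_mean = statistics.mean(mcar_observed)
mnar_mean = statistics.mean(mnar_observed)
print(f"true={true_mean:.1f}  MCAR={mcar_mean:.1f}  MNAR={mnar_mean:.1f}")
```

Looking only at the observed values, nothing reveals that the second data set is biased, which is
why knowledge of the collection process is needed.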
Inaccurate values
Issues and considerations
• Data may not have been collected for mining purposes
• Errors and omissions may not affect the original purpose of the data (e.g. age of customer)
• Typographical errors in nominal attributes => values need to be checked for consistency
• Typographical and measurement errors in numeric attributes => outliers need to be
identified
• Errors may be deliberate (e.g. wrong postal codes, birthdates)
• Other problems: duplicates, stale data
Duplicate Data
A data set may include data objects that are duplicates, or almost duplicates, of one another.
This is a major issue when merging data from multiple, heterogeneous sources.
• Example: the same person with multiple email addresses
When should duplicate data not be removed?
We will address this further in the later sections on similarity and dissimilarity in the chapter.
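The "same person with multiple email addresses" case can be sketched as a merge on a normalized
key. Matching on name plus phone number is an assumption made for this example; real record
linkage usually needs fuzzier matching, and all names and addresses below are made up.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(text.lower().split())

def merge_duplicates(people):
    """Group records that share a normalized (name, phone) key and pool their emails."""
    merged = {}
    for person in people:
        key = (normalize(person["name"]), person["phone"])
        entry = merged.setdefault(
            key, {"name": person["name"], "phone": person["phone"], "emails": []}
        )
        if person["email"] not in entry["emails"]:
            entry["emails"].append(person["email"])
    return list(merged.values())

# Hypothetical customer list merged from two sources.
people = [
    {"name": "Ravi Kumar", "phone": "98765", "email": "ravi@example.com"},
    {"name": "ravi  kumar", "phone": "98765", "email": "rk@example.org"},
    {"name": "Meena Rao", "phone": "91234", "email": "meena@example.com"},
]
unique_people = merge_duplicates(people)
print(unique_people)
```

Merging keeps both email addresses instead of deleting one record, which matters when the
"duplicates" carry different but valid information.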
Data Preprocessing
Aggregation - combining two or more attributes (or objects) into a single attribute (or
object)
Sampling - the main technique employed for data set reduction (reduce number of rows)
Dimensionality Reduction - identify "important" variables
Feature subset selection - remove redundant or irrelevant attributes
Feature creation - new attributes that can capture the important information in a data set
much more efficiently than the original attributes
Discretization and Binarization
Attribute Transformation - a function that maps the entire set of values of a given
attribute to a new set of replacement values such that each old value can be identified with
one of the new values
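Several of the preprocessing operations named above are simple enough to show directly. The
functions below are minimal sketches of aggregation, sampling, discretization, binarization, and
one attribute transformation (min-max scaling); the function names, interval edges, labels, and
data are assumptions for illustration.

```python
import random
from collections import defaultdict

def aggregate_by(rows, key, value):
    """Aggregation: combine many objects into one per group (here, a sum)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

def simple_random_sample(rows, k, seed=0):
    """Sampling: keep k rows chosen without replacement."""
    return random.Random(seed).sample(rows, k)

def discretize(value, edges, labels):
    """Discretization: map a continuous value to the label of its interval."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

def binarize(value, threshold):
    """Binarization: 1 if the value reaches the threshold, else 0."""
    return 1 if value >= threshold else 0

def min_max_scale(values):
    """Attribute transformation: map values linearly onto [0, 1].

    Assumes the values are not all identical (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

sales = [
    {"store": "A", "amount": 120.0},
    {"store": "A", "amount": 80.0},
    {"store": "B", "amount": 200.0},
]
ages = [5, 17, 30, 64, 80]
store_totals = aggregate_by(sales, "store", "amount")
age_groups = [discretize(a, edges=[18, 65], labels=["child", "adult", "senior"]) for a in ages]
scaled_ages = min_max_scale(ages)
```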
4. Data processing and processing:

What is data processing in data analytics?
Data processing occurs when data is collected and translated into usable information.
It is usually performed by a data scientist or a team of data scientists, and it must be
done correctly so as not to negatively affect the end product, or data output.
Data processing starts with data in its raw form and converts it into a more readable
format (graphs, documents, etc.), giving it the form and context necessary to be
interpreted by computers and utilized by employees throughout an organization.
Six stages of data processing

1. Data collection
Collecting data is the first step in data processing. Data is pulled from available sources,
including data lakes and data warehouses. It is important that the data sources available are
trustworthy and well-built so the data collected (and later used as information) is of the
highest possible quality.
2. Data preparation
Once the data is collected, it then enters the data preparation stage. Data preparation, often
referred to as “pre-processing” is the stage at which raw data is cleaned up and organized
for the following stage of data processing. During preparation, raw data is diligently checked
for any errors. The purpose of this step is to eliminate bad data (redundant, incomplete, or
incorrect data) and begin to create high-quality data for the best business intelligence.
3. Data input
The clean data is then entered into its destination (perhaps a CRM like Salesforce or a data
warehouse like Redshift), and translated into a language that it can understand. Data input is
the first stage in which raw data begins to take the form of usable information.
4. Processing
During this stage, the data inputted to the computer in the previous stage is actually
processed for interpretation. Processing is done using machine learning algorithms, though
the process itself may vary slightly depending on the source of data being processed (data
lakes, social networks, connected devices etc.) and its intended use (examining advertising
patterns, medical diagnosis from connected devices, determining customer needs, etc.).
5. Data output/interpretation
The output/interpretation stage is the stage at which data finally becomes usable by people who
are not data scientists. It is translated, readable, and often presented in the form of graphs,
videos, images, plain text, etc. Members of the company or institution can now begin to
self-serve the data for their own data analytics projects.
6. Data storage
The final stage of data processing is storage. After all of the data is processed, it is then
stored for future use. While some information may be put to use immediately, much of it
will serve a purpose later on. Plus, properly stored data is a necessity for compliance with
data protection legislation like GDPR. When data is properly stored, it can be quickly and
easily accessed by members of the organization when needed.
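The six stages can be traced end to end on a toy example. In this sketch a list of strings stands
in for the raw source, a dictionary stands in for the warehouse, and a simple average stands in
for the processing step; all of these, along with the function names and readings, are
assumptions rather than a real pipeline.

```python
import json
import statistics

RAW_SOURCE = ["23.5", "24.1", "", "bad", "22.8", "24.1"]  # hypothetical sensor readings

def collect():
    """1. Collection: pull raw values from the source."""
    return list(RAW_SOURCE)

def prepare(raw):
    """2. Preparation: drop incomplete or incorrect readings and exact duplicates."""
    cleaned = []
    for item in raw:
        try:
            value = float(item)
        except ValueError:
            continue  # incomplete ("") or incorrect ("bad") data
        if value not in cleaned:
            cleaned.append(value)
    return cleaned

def load(values):
    """3. Input: place the clean data into its destination (a dict stands in for a warehouse)."""
    return {"readings": values}

def process(store):
    """4. Processing: derive information from the stored data."""
    readings = store["readings"]
    return {"count": len(readings), "mean": round(statistics.mean(readings), 2)}

def interpret(summary):
    """5. Output: present the result in a readable form."""
    return f"{summary['count']} valid readings, average {summary['mean']}"

def archive(summary):
    """6. Storage: serialize the result for later use."""
    return json.dumps(summary)

summary = process(load(prepare(collect())))
report = interpret(summary)
stored = archive(summary)
print(report)
```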
*******
UNIT - II
Data Analytics :
What is data analytics?
Introduction to analytics :
Analytics as a field covers the process of identifying an analytics problem in context,
identifying sources and acquiring data, and preparing data for analysis to address the problem.
Data analytics is the process of turning raw data into meaningful, actionable insights. You can think of it as a
form of business intelligence, used to solve specific problems and challenges within an organization. It’s all
about finding patterns in a dataset which can tell you something useful and relevant about a particular area
of the business—how certain customer groups behave, for example, or why sales dipped during a given time
period.
A data analyst takes the raw data and analyzes it to draw out useful insights. They then present these
insights in the form of visualizations, such as graphs and charts, so that stakeholders can understand and act
upon them. The kinds of insights gleaned from the data depend on the type of analysis performed.
There are four main types of analysis used by data experts:
• Descriptive
• Diagnostic
• Predictive
• Prescriptive
Descriptive analytics looks at what happened in the past, while diagnostic analytics looks at why it might
have happened. Predictive and prescriptive analytics consider what is likely to happen in the future and,
based on these predictions, what the best course of action might be.
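The four types can be lined up on one tiny data set. The sketch below uses made-up monthly
advertising spend and sales figures that happen to lie on a straight line, so the arithmetic is
easy to follow; the data, the use of correlation for the diagnostic step, and the least-squares
line for the predictive step are illustrative assumptions.

```python
import math
import statistics

# Hypothetical monthly figures: advertising spend and resulting sales (same units).
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

# Descriptive: what happened?
average_sales = statistics.mean(sales)

# Diagnostic: why might it have happened? (Pearson correlation with ad spend)
mx, my = statistics.mean(ad_spend), statistics.mean(sales)
sxy = sum((x - mx) * (y - my) for x, y in zip(ad_spend, sales))
sxx = sum((x - mx) ** 2 for x in ad_spend)
syy = sum((y - my) ** 2 for y in sales)
correlation = sxy / math.sqrt(sxx * syy)

# Predictive: what is likely to happen? (least-squares line, evaluated at spend = 6)
slope = sxy / sxx
intercept = my - slope * mx
predicted_sales = intercept + slope * 6

# Prescriptive: what should we do? (spend needed to reach a sales target)
target = 30
required_spend = (target - intercept) / slope
```

A strong correlation does not by itself establish that spend causes sales; the diagnostic step
here only points to a candidate explanation.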
In all, data analytics helps you to make sense of the past and to predict future trends and behaviors. So,
rather than basing your decisions and strategies on guesswork, you’re making informed choices based on
what the data is telling you. With a data-driven approach, businesses and organizations are able to develop
a much deeper understanding of their audience, their industry, and their company as a whole—and, as a
result, are much better equipped to make decisions, plan ahead, and compete in their chosen market.
Ways to Use Data Analytics:

1. Improved Decision Making: Data analytics reduces guesswork and manual tasks, be it in choosing
the right content, planning marketing campaigns, or developing products. Organizations can use
the insights they gain from data analytics to make informed decisions, leading to better outcomes
and customer satisfaction.
2. Better Customer Service: Data analytics allows you to tailor customer service according to their needs. It
also provides personalization and builds stronger relationships with customers. Analyzed data can reveal
information about customers’ interests, concerns, and more. It helps you give better recommendations for
products and services.
3. Efficient Operations: With the help of data analytics, you can streamline your processes, save
money, and boost production. With an improved understanding of what your audience wants, you
spend less time creating ads and content that aren’t in line with your audience’s interests.
4. Effective Marketing: Data analytics gives you valuable insights into how your campaigns are performing.
This helps in fine-tuning them for optimal outcomes. Additionally, you can also find potential customers who
are most likely to interact with a campaign and convert into leads.
Data Analytics Tools
The tools below can be used to carry out data analytics. The list covers seven tools, including a
couple of programming languages, that can help you perform analytics better.
Fig: Data Analytics for Beginners - Tools used
1. Python: Python is an object-oriented open-source programming language. It supports a range of
libraries for data manipulation, data visualization, and data modeling.
2. R: R is an open-source programming language majorly used for numerical and statistical analysis. It
provides a range of libraries for data analysis and visualization.
3. Tableau: It is a simplified data visualization and analytics tool. This helps you create a variety of
visualizations to present the data interactively, build reports, and dashboards to showcase insights and
trends.
4. Power BI: Power BI is a business intelligence tool that has easy drag-and-drop functionality.
It supports multiple data sources and offers features for presenting data in a visually appealing
way. Power BI also supports features that help you ask questions of your data and get immediate
insights.
5. QlikView: QlikView offers interactive analytics with in-memory storage technology to analyze vast
volumes of data and use data discoveries to support decision making. It provides social data discovery and
interactive guided analytics. It can manipulate colossal data sets instantly with accuracy.
6. Apache Spark: Apache Spark is an open-source data analytics engine that processes data in real-time
and carries out sophisticated analytics using SQL queries and machine learning algorithms.
7. SAS: SAS is a statistical analysis software that can help you perform analytics, visualize data, write SQL
queries, perform statistical analysis, and build machine learning models to make future predictions.
What is a data analytics environment?
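A data analytics environment is the setting in which analysis is carried out: the tools described
above together with the data and computing resources they run on. The term should not be confused
with environmental data analysis, in which an analyst examines key factual information about air,
water, soil, ice, and more. Those analyses are then compiled and used in other scientific fields
to create an overall picture of pollution, climate change, public health, and more.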
Benefits and Limitations of Data Analytics :
Data analytics is the process of examining and analysing datasets to draw conclusions about
the information they hold. The data analytics techniques help uncover the patterns from raw
data and derive valuable insights from it. Data analytics helps businesses get real-time
insights about sales, marketing, finance, product development, and more. It allows teams
within businesses to collaborate and achieve better results. It is useful for businesses to
analyse past business performance and optimize future business processes. Analytics helps
businesses gain a competitive advantage.
Data analytics has several advantages and limitations; the top five of each are described
below. By being aware of them, organizations can take actions to leverage the advantages and
modify their way of working to overcome the limitations.
Advantages
• Data analytics helps an organization make better decisions
Often, decisions within organizations are made on gut feel rather than on facts and
data. One of the reasons for this could be lack of access to quality data that
can help with better decision making. Analytics can help with transforming the data
that is available into valuable information for executives so that better decisions can
be made. This can be a source of competitive advantage if fewer poor decisions are
made, since poor decisions can have a negative impact on a number of areas including
company growth and profitability.
• Increase the efficiency of the work
Analytics can help analyse large amounts of data quickly and display it in a structured
manner to help achieve specific organizational goals. It encourages a culture of
efficiency and teamwork by allowing managers to share the insights from the
analytics results with employees. The gaps and improvement areas within a company
become evident, and actions can be taken to increase the overall efficiency of the
workplace, thereby increasing productivity.
• Analytics keeps you updated on changes in customer behaviour
In today’s world, customers have a lot of choices. If organizations are not tuned to
customer desires and expectations, they can soon find themselves in a downward
spiral. Customers tend to change their minds as they are continuously exposed to new
information in this era of digitization. With vast amounts of customer data, it is
practically impossible for organizations to make sense of all the changes in customer
perception data without using the power of analytics. Analytics gives you insights into
how your target market thinks and whether there is any change. Hence, being aware of shifts
in customer behaviour can provide a decisive advantage to companies so that they can
react faster to market changes.
• Personalization of products and services
Gone are the days where a company could sell a standard set of products and services
to customers. Customers crave products and services that can meet their individual
needs. Analytics can help companies keep track of what kind of service, product, or
content is preferred by the customer and then show the recommendations based on
their preferences. For example, in social media, we usually see what we like to see, all
of this is made possible due to the data collection and analytics that companies do.
Data analytics can help provide targeted services to customers based on their
individual requirements.
• Improving quality of products and services
Data analytics can help with enhancing the user experience by detecting and
correcting errors or avoiding non-value-added tasks. For example, self-learning
systems can use data to understand the way customers are interacting with the tools
and make appropriate changes to improve user experience. In addition, data analytics
can help with automated data cleansing and improving the quality of data,
consequently benefiting both customers and organizations.
Limitations
• Lack of alignment within teams
There is often a lack of alignment between different teams or departments within an
organization. Data analytics may be done by a select set of team members, and the
analysis may be shared with a limited set of executives. However, the insights
generated by these teams are either of little value or have limited impact
on organizational metrics. This could be due to a “silos” way of working, with each
team only using its existing processes, disconnected from other departments. The
analytics team should be focussed on answering the right questions for the business,
and the results generated by data analytics teams need to be properly communicated
to the right employees to drive the right set of actions and behaviours so that they can
have a positive impact on the organization.
• Lack of commitment and patience
Analytics solutions are not difficult to implement; however, they are costly, and the
ROI is not immediate. Especially if existing data is not available, it may take time to
put processes and procedures in place to start collecting the data. By nature,
analytics models improve in accuracy over time and require dedication to implement the
solution. Since business users do not see results immediately, they sometimes lose
interest, which results in loss of trust, and the models fail. When an organization
decides to implement data analytics methods, there needs to be a feedback loop and
mechanism in place to understand what is working and what is not, and corrective
actions are required to fix things that are broken. Without this closed-loop system,
senior management may decide that analytics is not working or not valuable and
may abandon the entire exercise.
• Low quality of data
One of the biggest limitations of data analytics is lack of access to quality data. It is
possible that companies already have access to a lot of data, but the question is whether
they have the right data that they need. A top-down approach is required, where the
business questions that need to be answered are identified first and the data
required to answer these questions is then determined. In some cases, data may
have been collected for historical reasons and may not be suitable to answer the questions
that we ask today. At other times, even though we are collecting data on the right
metrics, the quality of the data collection may be poor. There can be
instances where adequate data is not available or is missing for proper analytics to be
done. As they say, garbage in, garbage out. If the data quality is poor, the decisions
made by using this data are also going to be poor. Hence, actions must be taken to fix
the quality of the data before it can be effectively used within organizations.
• Privacy concerns
Sometimes, data collection might breach the privacy of customers, as their
information, such as purchases, online transactions, and subscriptions, is available to
the companies whose services they are using. Some companies might exchange those
datasets with other companies for mutual benefit. Certain data collected can also be
used against a person, country, or community. Organizations need to be cautious of
what sort of data they are collecting from customers and ensure the security and
confidentiality of the data. Only the data required for the analysis needs to be
captured and if there is sensitive data, it needs to be anonymized so that sensitive
data is protected. Data breaches can cause customers to lose trust in the organizations
which may result in a negative impact on the organization.
 Complexity & Bias
Some of the analytics tools developed by companies behave like a black box.
What is inside the black box is not clear, and the logic the system uses to learn from data
and create a model is not readily evident. For example, consider a neural network model that
learns from various scenarios to decide who should be given a loan and who should be
rejected. Such tools may be easy to use, but the logic behind their decisions
may not be clear to anyone within the company. If companies are not careful and a
poor-quality data set is used to train the model, there may be hidden biases in the
decisions made by these systems. These biases may not be readily evident, and organizations
may be breaking the law by discriminating on the basis of race, gender, age, etc.
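One simple way to look for hidden bias in a black box model is to compare its decisions across groups. The sketch below is only an illustration: the group labels and decisions are made up, and a gap in approval rates is a signal to investigate further, not proof of discrimination.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of approved applications per group.

    `decisions` is an iterable of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, is_approved in decisions:
        totals[group] += 1
        if is_approved:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Ratio of the lowest to the highest approval rate (1.0 means equal)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Made-up model outputs for two hypothetical applicant groups.
sample = [("A", True), ("A", True), ("A", True), ("A", False),
          ("B", True), ("B", False), ("B", False), ("B", False)]
rates = approval_rates(sample)
print(rates, rate_ratio(rates))
```

A ratio far below 1.0 suggests that the training data and the model should be reviewed before the system is used for real decisions.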
Data Analytics Life cycle

Introduction
This section gives an overview of the Data Analytics Lifecycle in big data and data science: what it is,
why it is essential, what its different phases involve, and finally a worked example.
1. What Is Data Analytics Lifecycle?
In today’s digital-first world, data is of immense importance. It undergoes various stages throughout its life,
during its creation, testing, processing, consumption, and reuse. Data Analytics Lifecycle maps out these
stages for professionals working on data analytics projects. These phases are arranged in a circular structure
that forms a Data Analytics Lifecycle. Each step has its significance and characteristics.
Why is Data Analytics Lifecycle Essential?
The Data Analytics Lifecycle is designed for large big data projects. The cycle is iterative, which lets it
reflect how real projects actually proceed. A step-by-step approach is needed to organize the activities and
tasks involved in gathering, processing, analyzing, and reusing data. Data analysis itself is the process of
modifying, processing, and cleaning raw data to obtain useful, significant information that supports
business decision-making.
2. Importance of Data Analytics Lifecycle
Data Analytics Lifecycle defines the roadmap of how data is generated, collected, processed, used, and
analyzed to achieve business goals. It offers a systematic way to manage data for converting it into
information that can be used to fulfill organizational and project goals. The process provides the direction
and methods to extract information from the data and proceed in the right direction to accomplish business
goals.
Data professionals use the lifecycle’s circular form to proceed with data analytics in either a forward or
backward direction. Based on the newly received insights, they can decide whether to proceed with their
existing research or scrap it and redo the complete analysis. The Data Analytics lifecycle guides them
throughout this process.
3. Data Analytics Lifecycle Phases
There is no fixed structure for the phases of the data analytics life cycle, so these steps are not
uniform across projects. Some data professionals follow additional steps, while others
skip some stages altogether or work on different phases simultaneously.
This guide covers the fundamental phases, which are likely to
be present in most data analytics projects. The Data Analytics lifecycle primarily consists of 6
phases.
Phase 1: Data Discovery and Formation
This phase is all about defining the data’s purpose and how to achieve it by the end of the data analytics
lifecycle. The stage consists of identifying critical objectives a business is trying to discover by mapping out
the data. During this process, the team learns about the business domain and checks whether the business
unit or organization has worked on similar projects to refer to any learnings.
The team also evaluates the available technology, people, data, and time in this phase. For example, the team can use
Excel while dealing with a small dataset. However, heavier tasks demand more robust tools for data
preparation and exploration. In such scenarios, the team will need to use Python, R, Tableau Desktop or
Tableau Prep, and other data-cleaning tools.
This phase’s critical activities include framing the business problem, formulating initial hypotheses to test,
and beginning to learn the data.
Phase 2: Data Preparation and Processing
In this phase, the experts’ focus shifts from business requirements to information requirements. One of the
essential aspects of this phase is ensuring data availability for processing. The stage encompasses collecting,
processing, and cleansing the accumulated data.
In the initial stage of this phase, the team gathers the data it needs from across the business
ecosystem. Various data collection methods are used for this purpose, such as:
o Data Entry – Collecting recent data using manual data entry techniques or digital systems within the
organization
o Data Acquisition – Gathering data from external sources
o Signal Reception – Capturing data from digital devices, including the Internet of Things and control
systems.
Phase 3: Design a Model
This phase requires an analytic sandbox in which the team can work with data and perform
analytics throughout the project. The team can load data into the sandbox in several ways.
o Extract, Transform, Load (ETL) – It transforms the data based on a set of business rules before loading it
into the sandbox.
o Extract, Load, Transform (ELT) – It loads the data into the sandbox and then transforms it based on a set
of business rules.
o Extract, Transform, Load, Transform (ETLT) – It’s the combination of ETL and ELT and has two
transformation levels.
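The difference between ETL and ELT is the order of the load and transform steps. The sketch below illustrates this with plain Python lists and a dictionary standing in for the sandbox. The source rows and the business rules (tidying store names, converting sales to integers) are assumptions made for the example.

```python
def extract():
    """Stand-in for reading raw rows from a source system."""
    return [{"store": " north ", "sales": "120"},
            {"store": "SOUTH", "sales": "80"}]


def transform(rows):
    """Apply illustrative business rules: tidy names, convert sales to int."""
    return [{"store": r["store"].strip().lower(), "sales": int(r["sales"])}
            for r in rows]


def etl(sandbox):
    # Transform first, so only cleaned rows ever reach the sandbox.
    sandbox["sales"] = transform(extract())


def elt(sandbox):
    # Load the raw rows as-is, then transform them inside the sandbox.
    sandbox["sales_raw"] = extract()
    sandbox["sales"] = transform(sandbox["sales_raw"])


etl_box, elt_box = {}, {}
etl(etl_box)
elt(elt_box)
print(etl_box["sales"] == elt_box["sales"])  # same cleaned rows either way
print("sales_raw" in etl_box, "sales_raw" in elt_box)
```

Both routes end with the same cleaned table. With ELT the raw rows also remain in the sandbox, which is useful when the business rules are likely to change. ETLT would apply a first, light transformation before loading and a second one afterwards.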
The team identifies variables for categorizing data, and identifies and corrects data errors. Data errors can
include missing data, illogical values, duplicates, and spelling errors. For example, the team
may fill a missing value with the average score of its category. This allows processing to continue
without discarding records, while limiting the distortion of the data.
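Two of the cleaning steps mentioned here, removing duplicates and filling missing values with a category average, can be sketched as follows. The records and field names are made up for illustration, and mean imputation is only one of several possible strategies.

```python
from statistics import mean


def drop_duplicates(rows):
    """Remove exact duplicate records while keeping the original order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_by_category(rows, category_key, value_key):
    """Fill missing values (None) with the mean of the row's category."""
    observed = {}
    for row in rows:
        if row[value_key] is not None:
            observed.setdefault(row[category_key], []).append(row[value_key])
    means = {cat: mean(vals) for cat, vals in observed.items()}
    # A category with no observed values has no mean, so its gaps stay None.
    return [
        {**row, value_key: means.get(row[category_key])}
        if row[value_key] is None else dict(row)
        for row in rows
    ]


# Made-up records: one missing score and one exact duplicate.
rows = [
    {"id": 1, "category": "toys", "score": 4.0},
    {"id": 2, "category": "toys", "score": None},
    {"id": 3, "category": "toys", "score": 2.0},
    {"id": 4, "category": "books", "score": 5.0},
    {"id": 4, "category": "books", "score": 5.0},
]
cleaned = impute_by_category(drop_duplicates(rows), "category", "score")
print(cleaned)
```

Duplicates are removed before imputation so that a repeated record does not pull the category average towards its own value.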
After cleaning the data, the team determines the techniques, methods, and workflow for building a model in
the next phase. The team explores the data, identifies relations between data points to select the key
variables, and eventually devises a suitable model.
Phase 4: Model Building
The team develops training, testing, and production datasets in this phase. Further, the team builds and
executes models as planned during the model design phase (Phase 3). They test the models against the data
to find answers to the given objectives. They use various statistical modeling methods, such as regression
techniques, decision trees, random forest modeling, and neural networks, and perform a trial run to
determine whether the model fits the datasets.
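The split into training and testing datasets and a trial run of a simple regression model can be sketched as below. This is a minimal illustration using synthetic data and a hand-written least-squares fit. A real project would normally use a statistics or machine learning library, and the function names here are assumptions for the example.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    """Average squared gap between the observed and predicted y values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


# Synthetic data: y is roughly 2x + 1 with a small alternating disturbance.
data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
print(slope, intercept, mean_squared_error(test, slope, intercept))
```

The model is fitted on the training set only, and the error is measured on the held-out test set, which indicates how well the model generalizes to data it has not seen.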
Phase 5: Result Communication and Publication
This phase aims to determine whether the project results are a success or a failure and to start collaborating
with major stakeholders. The team identifies the key findings of its analysis, measures the associated
business value, and creates a summarized narrative to convey the results to the stakeholders.
Phase 6: Measuring of Effectiveness
In this final phase, the team presents an in-depth report with code, briefings, key findings, and technical
documents and papers to the stakeholders. Besides this, the data is moved to a live environment and
monitored to measure the analysis’s effectiveness. If the findings are in line with the objective, the results
and reports are finalized. On the other hand, if they deviate from the set intent, the team moves backward
in the lifecycle to any previous phase to change the input and get a different outcome.
4. Data Analytics Lifecycle Example
Consider an example of a retail store chain that wants to optimize its products’ prices to boost its revenue.
The store chain has thousands of products over hundreds of outlets, making it a highly complex scenario.
Once you identify the store chain’s objective, you find the data you need, prepare it, and go through the
Data Analytics lifecycle process.
You observe different types of customers, such as ordinary customers and customers like contractors who
buy in bulk. You suspect that treating these customer types differently may lead to the solution.
However, you don’t have enough information about it and need to discuss this with the client team.
In this case, you need to agree on a definition of each customer type, find the data, and conduct hypothesis
testing to check whether customer type affects the model results. Once you are satisfied with the
model results, you can deploy the model, integrate it into the business, and roll out the
prices you consider optimal across the outlets of the store chain.
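The hypothesis-testing step in this example can be sketched with a permutation test, which checks whether two customer types differ in average basket size. The basket sizes below are made up for illustration, and the permutation test is one possible choice of test, not necessarily the one a given team would use.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(group_a)]) - mean(pooled[len(group_a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Made-up basket sizes (items per purchase) for the two customer types.
contractors = [48, 55, 40, 62, 51, 45]
ordinary = [12, 15, 9, 14, 11, 13, 10, 12]
observed, p_value = permutation_test(contractors, ordinary)
print(observed, p_value)
```

A small p-value suggests that the difference between the two customer types is unlikely to be due to chance, which would support modeling them separately.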
Three beginner-friendly Data Analytics projects for students
Internet Movie Database (IMDb)
The IMDb data extraction project is a great one for beginners. You can compile information about popular
TV series, movie reviews and trivia, details about various stars, and more. The fact that IMDb’s data
is presented consistently across all of its pages makes the work much simpler.
Job portals
This is one of the best Data Analytics projects for students. Job portals often provide standard data types,
and many beginners like scraping data from them. There are also a lot of online tutorials that will walk you
through the process. Compile information on the jobs, employers, salaries, locations, necessary skills, and
other details. The potential for later visualization is enormous, such as plotting skill sets against
salaries.
E-commerce sites
Another common project is to scrape information about products and prices from online stores. Extract
product details for Bluetooth speakers, or gather ratings and costs for different computers and tablets. Once
more, this is scalable and relatively easy to implement. This means that once you feel confident with the
techniques, you can move on to a product with more reviews.
Conclusion
The Data Analytics lifecycle’s circular process consists of 6 primary stages that dictate how information is
created, collected, processed, used, and analyzed. Mapping out business objectives and striving towards
achieving them will guide you through the rest of the stages.
Data Analytics Applications
Fig: Various applications of data analytics
Data analytics is used in almost every sector of business. Let’s discuss a few of them:
1. Retail: Data analytics helps retailers understand their customer needs and buying habits to predict trends,
recommend new products, and boost their business.
They optimize the supply chain and retail operations at every step of the customer journey.
2. Healthcare: Healthcare industries analyze patient data to provide lifesaving diagnoses and treatment
options. Data analytics helps in discovering new drug development methods as well.
3. Manufacturing: Using data analytics, manufacturing sectors can discover new cost-saving opportunities.
They can solve complex supply chain issues, labor constraints, and equipment breakdowns.
4. Banking sector: Banking and financial institutions use analytics to find out probable loan defaulters and
the customer churn rate. It also helps in detecting fraudulent transactions immediately.
5. Logistics: Logistics companies use data analytics to develop new business models and optimize routes.
This, in turn, ensures that the delivery reaches on time in a cost-efficient manner.
Those were a few of the applications involving data analytics. They show how data analytics is applied to
grow a business and serve its customers better.
Applications of Business
What is Business Analytics?

Every single thing in the contemporary world is data-driven. In all companies, whether small-scale startups
or multinational organizations, information is the most vital element. But what is information without
analysis? Nothing!
Analyzing data to map out relevant trends and information is extremely important. There
is a vast amount of raw data in the world, but it amounts to nothing if we cannot make sense of it. That’s
where Business Analytics comes into play.
The process of utilizing statistical tools and procedures to analyze and examine data relevant to businesses is
known as business analytics. It mainly makes use of the following methods:
 Analytical Modelling
 Predictive Analysis
 Numerical Analysis
The main steps in the Process of Business Analytics are to understand the data, structure the problem
statement, come up with various strategies using statistical models, and then organize favorable ideas to
reach an optimal solution.
With the use of business analytics, organizations can increase their productivity and efficiency by gaining
insightful information and making strategic business decisions.
Advantages of Business Analytics
Business Analytics provides an in-depth knowledge of the organization’s data. This in turn helps in
understanding the present circumstances as well as in predicting future events and trends.
Some major benefits of Business Analytics are:
 Improving customer service.

 Understanding the data better.
 Improving the organization’s capability to prevent or predict fraud.
 Provides a competitive edge.
Applications of Business Analytics
The world has witnessed significant changes and tremendous growth in organizations that have adopted the
concepts and principles of business analytics. Business analytics can be applied to a wide range of industries.
The agriculture industry, medical industry, manufacturing and development industry, human resources,
finance industry, and numerous other fields use business analytics to help businesses grow and keep their
audience happy.
Here are some examples of business analytics to show how it is used in different industries.
 FINANCE
o Business Analytics assists financial managers in managing their finances optimally and then
taking relevant measures. Implementing business analytics in various sectors of finance (such
as investment banking and budgeting) can prove to be highly fruitful for the finance industry.
o It helps in building future strategies for a new product by observing similar products and
methodologies.
o In addition to this, business analytics can also be used to predict future loan defaulters.
 HUMAN RESOURCES MANAGEMENT (HRM)

o Human Resource Management is the process or practice of managing, hiring, organizing,
training, and directing people in an organization in a strategic manner. Human Resources (or
HR) professionals use business analytics in several ways.
o It helps them in analyzing large amounts of data to understand employees’ needs and
grievances and therefore assist them accordingly.
o Business analytics can be used by HR in determining the right candidates, the expected
salaries, and the prevailing retention rates in the industry.
o Moreover, HR professionals can leverage business analytics to forecast the trajectory of the
organization and thus efficiently design appropriate training and development programs for
trainees or employees.
 PRODUCTION AND INVENTORY MANAGEMENT

o Management is a key element in every organization. It aims to enhance the profits and
productivity of an organization all the while trying to reduce overall costs.
o Business Analytics serves as a great tool for management and manufacturing. It is involved in
every phase of product development. It supports analyzing the inventory measures and
designing business solutions that are most suitable for products.
o It can help determine the costs and gauge the expected sales of products. This way the
organizations can adapt to the latest styles and opportunities in the industry.
o Hence, business analytics stands as a boon for the diverse sectors of management, be it
inventory management or product management.
 CUSTOMER RELATIONSHIP MANAGEMENT (CRM)

o Customer Relationship Management or CRM is the process of building and managing the
organization’s relationships as well as interactions with customers.
o Business analytics can be used in customer relationship management to understand the
customer base better and therefore, implement corresponding strategies. This helps
significantly drive sales and amplifies the organization’s profits.
o Customers’ purchasing patterns, needs, buying behaviors, issues, feedback, and all the other
indicators can be obtained and analyzed through business analytics methodologies. These
indicators can then be used to foster long-lasting and loyal relationships between clients and
the organization.
 MARKETING
o Marketing, when combined with business analytics can prove to be one of the best strategies
an organization can implement.
o Business analytics helps the organization to know its users, their needs, behaviors, and
purchasing styles to design and modify suitable plans and schemes.
o Sales can be optimized and user experience can be enhanced. Business analytics can help
marketers know their target audience and their interest.
o It can also be used to evaluate and determine how well a product or a marketing strategy is
performing in the market. Considering these factors, organizations can modify their strategies
and implement better planning.
Types of Business Analytics
 Descriptive Analytics
Descriptive analytics uses the existing data of an organization to understand what events have occurred in
the past, analyze the current situation, and follow its trajectory. It helps the organization perform a SWOT
analysis, thereby improving its performance. It also makes relevant information
and patterns more accessible to stakeholders and managers.
Descriptive analytics predominantly answers the question: What has happened in the past?
 Diagnostic Analytics
Diagnostic analytics is used to distinguish which factors influence or contribute to current events and
product performance in the market. If an organization loses sales over a period of time, diagnostic
analytics can help it analyze the reasons behind the loss and prevent such situations in the future.
Similarly, the organization can easily and efficiently track the reasons and factors contributing to the success
of a particular product and implement the same strategies in the future for other campaigns.
 Predictive Analytics
Predictive analytics is used to forecast the likelihood of future events. It helps the organization
develop new statistical models and design better techniques based on the information gathered during
descriptive analytics. It detects future industry trends along with their likely outcomes.
 Prescriptive Analytics
Prescriptive analytics suggests all the favorable actions along with their corresponding outcomes based on
specific events. It helps to analyze all the information gathered and then provides the best solution that
should be pursued. It also recommends a specific action plan to be taken to reach the desired results.
Data Analytics Fundamentals