
Maharana Pratap Group of Institutions, Mandhana, Kanpur

(Approved By AICTE, New Delhi And Affiliated To AKTU, Lucknow)

Digital Notes
[Department of Computer Science &

Branch : Computer Science
Semester :V
Subject Name : Data Analytics
Subject Code : KCS-051
Lecture No. /Topic : Introduction to Data Analytics

Prepared by : Anand Prakash Dwivedi

Copyright, Confidential, Maharana Pratap Group
In computing, data is information that has been translated into a form that is
efficient for movement or processing. Data can exist in a variety of forms: as
numbers or text on pieces of paper, as bits and bytes stored in electronic memory,
or as facts stored in a person's mind.

Analytics is the discovery, interpretation, and communication of meaningful patterns
in data, and the application of those patterns towards effective decision making. Analytics is
an encompassing and multidimensional field that uses mathematics, statistics,
predictive modeling and machine learning techniques to find meaningful patterns
and knowledge in recorded data.
What is Data Analytics?

• Data analysis is a process of inspecting, cleansing, transforming, and modeling data.
• Data analytics refers to the qualitative and quantitative techniques and processes
used to enhance productivity and business gain.

Why Data Analytics

Data Analytics is needed in Business to Consumer (B2C) applications.
Organisations collect data from customers, businesses, the economy
and practical experience. After gathering, the data is processed
and categorised as per the requirement, and analysis is done
to study purchase patterns and so on.
Types of Data Analytics
The main goal of big data analytics is to help organizations make smarter
decisions for better business outcomes.
With data in hand, you can begin doing analytics.
• But where do you begin?
• And which type of analytics is most appropriate for your big data
Looking at all the analytic options can be a daunting task. However, luckily these
analytic options can be categorized at a high level into three distinct types.

➢ Descriptive Analytics,
➢ Predictive Analytics,
➢ Prescriptive Analytics
Descriptive Analytics
• Descriptive analytics uses data aggregation and data mining to provide
insight into the past and answer:
– “What has happened in the business?”

• Descriptive analysis, or descriptive statistics, does exactly what the name implies: it
“describes”, or summarizes, raw data and makes it something that is interpretable by
humans.
• The past refers to any point of time that an event has occurred, whether it is
one minute ago, or one year ago.

• Descriptive analytics are useful because they allow us to learn from past
behaviors, and understand how they might influence future outcomes.

• The main objective of descriptive analytics is to find out the reasons behind previous
success or failure.

• The vast majority of the statistics we use fall into this category.
• Common examples of descriptive analytics are reports that provide
historical insights regarding the company’s production, financials,
operations, sales, inventory and customers.
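
As a minimal sketch of descriptive analytics, the snippet below aggregates a small list of past sales records into per-region totals and averages. The records, the field names and the `summarize_sales` helper are illustrative assumptions, not data or code from any real company.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical sales records (illustrative data only).
SALES = [
    {"region": "North", "month": "Jan", "amount": 120.0},
    {"region": "North", "month": "Feb", "amount": 80.0},
    {"region": "South", "month": "Jan", "amount": 200.0},
    {"region": "South", "month": "Feb", "amount": 100.0},
]


def summarize_sales(records):
    """Group past sales by region and report total, average and count."""
    by_region = defaultdict(list)
    for row in records:
        by_region[row["region"]].append(row["amount"])
    return {
        region: {"total": sum(amounts), "average": mean(amounts), "count": len(amounts)}
        for region, amounts in by_region.items()
    }


if __name__ == "__main__":
    for region, stats in summarize_sales(SALES).items():
        print(region, stats)
```

The summary only describes what has already happened; it makes no claim about future sales.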

Predictive Analytics
• Predictive analytics uses statistical models and forecasting
techniques to understand the future and answer:
– “What could happen?”
• These analytics are about understanding the future.
• Predictive analytics provide estimates about the likelihood of a future
outcome. It is important to remember that no statistical algorithm
can “predict” the future with 100% certainty, because the foundation of
predictive analytics is based on probabilities.
• Companies use these statistics to forecast what might happen in
the future.
• These statistics try to take the data that you have, and fill in the missing data
with best guesses.
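
One simple way to produce such a "best guess" is to fit a straight-line trend to past observations and extend it. The sketch below uses ordinary least squares on equally spaced observations. The function names `fit_trend` and `forecast` are illustrative assumptions, and a straight line is only one of many possible models.

```python
def fit_trend(values):
    """Least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(values, steps_ahead=1):
    """Extend the fitted trend line steps_ahead periods past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


if __name__ == "__main__":
    monthly_sales = [10, 12, 14, 16]  # hypothetical figures
    print(forecast(monthly_sales))
```

The result is an estimate under the assumption that the existing trend continues, not a certainty.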
Predictive analytics can be further categorized as –

• Predictive Modelling – What will happen next, if ...?
• Root Cause Analysis – Why did this actually happen?
• Data Mining – Identifying correlated data.
• Forecasting – What if the existing trends continue?
• Monte-Carlo Simulation – What could happen?
• Pattern Identification and Alerts – When should an action be invoked to
correct a process?
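
To illustrate the Monte-Carlo entry in the list above, the following sketch simulates many possible months for a hypothetical product and summarizes the spread of outcomes. The demand distribution, price, costs and the `simulate_monthly_profit` name are all assumptions chosen only for illustration.

```python
import random


def simulate_monthly_profit(trials=10_000, seed=42):
    """Simulate profit under uncertain demand and unit cost (assumed distributions)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    price = 20.0               # assumed selling price per unit
    fixed_cost = 5000.0        # assumed fixed monthly cost
    profits = []
    for _ in range(trials):
        units = max(rng.gauss(1000, 150), 0)   # assumed demand: normal, mean 1000
        unit_cost = rng.uniform(11.0, 14.0)    # assumed cost: uniform between 11 and 14
        profits.append(units * (price - unit_cost) - fixed_cost)
    profits.sort()
    return {
        "mean": sum(profits) / trials,
        "p05": profits[int(0.05 * trials)],    # 5th percentile outcome
        "p95": profits[int(0.95 * trials)],    # 95th percentile outcome
        "prob_loss": sum(p < 0 for p in profits) / trials,
    }


if __name__ == "__main__":
    print(simulate_monthly_profit())
```

Instead of a single forecast, the simulation answers "what could happen?" with a range of outcomes and an estimated probability of making a loss.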
Sentiment analysis is a common kind of predictive analytics. The
learning model takes plain text as input, and the output of the
model is a sentiment score that helps determine whether the sentiment is
positive, negative or neutral.
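
The input/output contract described above (text in, score out) can be sketched as follows. Note that this is not a learning model: a tiny hand-written word list stands in for a trained model, and the word lists, the threshold and the function names are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicons; a real system would learn these from labelled text.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive words - negative words) / all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def label(text, threshold=0.05):
    """Map the score to positive, negative or neutral."""
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["I love this great phone", "Terrible battery and slow screen"]:
        print(sentence, "->", label(sentence))
```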
Prescriptive Analytics
• Prescriptive analytics uses optimization and simulation
algorithms to advise on possible outcomes and answer:
– “What should we do?”
• The relatively new field of prescriptive analytics allows users to “prescribe”
a number of different possible actions and guides them towards a solution. In
a nutshell, these analytics are all about providing advice.
• Prescriptive analytics is the next step after predictive analytics: it adds the
ability to influence future outcomes.
• Prescriptive analytics is an advanced analytics concept based on,
– Optimization that helps achieve the best outcomes.
– Stochastic optimization that helps understand how to achieve the best
outcome and identify data uncertainties to make better decisions.
• Prescriptive analytics is a combination of data, mathematical models and
various business rules. The data for prescriptive analytics can be both internal
(within the organization) and external (like social media data).
• Prescriptive analytics can be used in healthcare to enhance drug development,
find the right patients for clinical trials, etc.
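
A minimal sketch of the optimization idea behind prescriptive analytics: given business rules (resource limits) and a model of profit, search for the action that gives the best outcome. The two products, their profits and their resource needs are hypothetical, and exhaustive search is used only because the example is tiny. Real problems would normally use a dedicated optimization solver.

```python
from itertools import product


def best_plan(machine_hours=40, labour_hours=30):
    """Find how many units of products A and B to make for maximum profit.

    Assumed figures: A earns 30 and needs 2 machine hours + 1 labour hour;
    B earns 20 and needs 1 machine hour + 1 labour hour.
    Returns (profit, units_of_a, units_of_b).
    """
    best = (0, 0, 0)
    for a, b in product(range(machine_hours + 1), repeat=2):
        within_machine_limit = 2 * a + b <= machine_hours
        within_labour_limit = a + b <= labour_hours
        if within_machine_limit and within_labour_limit:
            profit = 30 * a + 20 * b
            if profit > best[0]:
                best = (profit, a, b)
    return best


if __name__ == "__main__":
    profit, a, b = best_plan()
    print(f"Make {a} of A and {b} of B for a profit of {profit}")
```

The output is a recommended action rather than a description or a forecast, which is what distinguishes this category from the previous two.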
The process of Data Analysis
Analysis refers to breaking a whole into its separate components for
individual examination. Data analysis is a process for obtaining raw
data and converting it into information useful for decision-making by users.
There are several phases that can be distinguished: data requirements,
data collection, data processing, data cleaning, exploratory data analysis,
modeling and algorithms, data product, and communication.
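
Two of the phases above, data cleaning and exploratory data analysis, can be sketched on a toy data set. The raw values, the valid range and the helper names `clean` and `explore` are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical raw survey answers for "age", as they might arrive from collection.
RAW = ["23", " 25 ", "", "n/a", "25", "31", "-4", "29"]


def clean(raw_values, low=0, high=120):
    """Data cleaning: drop blanks, non-numeric entries and out-of-range values."""
    cleaned = []
    for item in raw_values:
        try:
            value = int(item.strip())
        except ValueError:
            continue  # not a number, e.g. "" or "n/a"
        if low <= value <= high:
            cleaned.append(value)
    return cleaned


def explore(values):
    """Exploratory data analysis: a few summary statistics."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


if __name__ == "__main__":
    ages = clean(RAW)
    print(ages)
    print(explore(ages))
```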
Scope of Data Analytics
Data analytics has a bright future, and many professionals and students are interested
in a career in the field. Anyone who likes to work with numbers, thinks
logically, can understand figures and can turn them into actionable
insights has a good future in this field. Proper training in the tools of data
analytics is required to begin with. Since it is a course that requires
effort to learn and get certified in, there is always a dearth of qualified professionals.
Because it is also a relatively new field, the demand for such professionals is greater
than the current supply. Higher demand also means higher salaries.
Importance of Data Analytics
● Predict customer trends and behaviours
● Analyse, interpret and deliver data in meaningful ways
● Increase business productivity
● Drive effective decision-making
Hiding within those mounds of data is knowledge
that could change the life of a patient or change the world.

once could conclude with its big data expertise, to raise warning for flu in the U.S.
by analyzing the queries having a ‘flu theme’ well before the conventional public health
Business Analytics
Business Analytics involves business planning / making business insights/
arriving at solutions for business problems using the information and statistics
from relevant/ associated data sources by applying different tools and

Business Analytics [4]

• The data associated with an analytics problem can come from social networks
(Facebook, Twitter), relevant databases, or spreadsheets. These data are identified,
gathered and organized, and then subjected to analysis using tools and techniques.

• The tools and techniques can be statistical models, machine learning concepts, etc.
They serve descriptive analytics, predictive analytics,
discovery analytics and/or prescriptive analytics. These analytics generate
the statistics and other information that eventually lead to relevant solutions.
Applications of Business Analytics

Personalized marketing
Many shopping companies use Big Data Analytics for personalized marketing to keep their
customers happy.
Mobile Advertising [6]
The Big Data analytic engine of a shopping company knows the personalized
needs of its customers from their shopping history. When offers come up on products of
interest near the place where a customer currently is, the customer is informed
on their mobile phone.
The Big Data source associated with the customer’s geographical position is also used here.
Data Analytics-Advantages in Manufacturing Industry
Big Data Analytics can improve the functioning of any associated organization.
For example, in a manufacturing unit, Data Analytics can improve the following:
• Procurement- to find efficient and cost-effective suppliers.
• Product Development- to make innovative designs based on demand.
• Manufacturing- to find problems that can come up in the machines and affect the quality of the product.
• Distribution- to enhance supply and increase inventory based on demand factors such as weather, holidays, economy etc.
• Marketing- to understand customer behaviors for personalized marketing.
• Price management- to manage prices based on related conditions.
Medical Analytics is used for:
i) Precision Medicine- It involves analyzing a person's genetic data, environment and
day-to-day activities to predict health problems so that prescriptions can
be followed to prevent diseases.
In case a disease is diagnosed, it provides personalized
treatment targeting an individual, with the correct
composition of medicine in the correct dose.
Precision Medicine [19]
Data sources to be analyzed: [12]
1. Sensor data- from digital biomedical equipment, Fitness
2. Organizational data- data from biotechnology databases
maintained by public organizations like the National
Center for Biotechnology Information, and the knowledge
databases Gene Ontology and Unified Medical Language System.
National Center for Biotechnology Information [12]
3. People data- self-reported data from health apps, which record various human body
measurements like blood sugar, pressure, heart rate, oxygen saturation level etc., or from
social networks, which help identify actual changes that have happened in the body at some time.
These data sources, when integrated, can help increase the lifetime and well-being of a human being.
Data Analytics on Bio-informatics data
Very large genomic data sets are analyzed for personalized treatment,
personalized/precision medicines, and better health profiles of many genetic
diseases like diabetes, arthritis, breast cancer, heart diseases etc., by analyzing
all the big data that constitutes the components of a disease, like genomic data,
metabolites, tissues, ecosystems etc.
Millions of genomes of the order of many exabytes are to be sequenced for the
In fact, prediction of the probable attack of a particular hereditary disease is
under proposal.
Smart Data Analytics[12]

Very large volumes of smart data from the sensor networks of smart
projects, viz. smart cities, smart homes etc., are analyzed for pollution
control, security (preventing thefts and homicide), energy
conservation, traffic maintenance, disaster management and many more.

Smart city signals to Sensor network [13]

Data Analytics on Spatial Data

Very large volumes of spatial data from GPS, radar, lidar and aerial sources are used for
identifying, visualizing and analyzing patterns of an area with a specific condition or
characteristic for:
•Tracking movements of vehicles between destinations,
•Public Safety,
•Emergency management,
•Climate analysis etc.

Eg: Economic analysis is done based on the different attributes in the vehicle
movement patterns: taxi id, distance travelled, fare etc...
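
As a small sketch of the taxi example above, the following code computes the average fare per kilometre for each taxi from trip records. The trip data, the field names and the `fare_per_km` helper are hypothetical and chosen only to mirror the attributes mentioned (taxi id, distance travelled, fare).

```python
from collections import defaultdict

# Hypothetical trip records with the attributes named in the example.
TRIPS = [
    {"taxi_id": "T1", "distance_km": 5.0, "fare": 100.0},
    {"taxi_id": "T1", "distance_km": 10.0, "fare": 170.0},
    {"taxi_id": "T2", "distance_km": 4.0, "fare": 100.0},
]


def fare_per_km(trips):
    """Return total fare divided by total distance for each taxi."""
    totals = defaultdict(lambda: [0.0, 0.0])  # taxi_id -> [distance, fare]
    for trip in trips:
        totals[trip["taxi_id"]][0] += trip["distance_km"]
        totals[trip["taxi_id"]][1] += trip["fare"]
    return {
        taxi: fare / distance
        for taxi, (distance, fare) in totals.items()
        if distance > 0  # skip taxis with no recorded distance
    }


if __name__ == "__main__":
    print(fare_per_km(TRIPS))
```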

Data Analytics on Geo-spatial Data

Big Data Analytics-Possibilities in Financial Services
In Finance, Data Analytics can improve the following aspects:
•Credit Scoring- to find people with highest credit worthiness.

Credit Score [10] Fraud detection [11]

•Fraud Detection- to predict fraudulent transactions and customers in the
financial services industry and to formulate strategies to prevent or minimize
damages from it [11].

•Claims analysis- to find positive and negative factors that affect the claim features [12], [13].
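
One very simple way to surface candidate fraudulent transactions is to flag amounts that lie far from the typical value. The sketch below uses a z-score rule. The threshold, the sample amounts and the `flag_unusual` name are assumptions, and production fraud detection would typically combine many more signals than the amount alone.

```python
from statistics import mean, stdev


def flag_unusual(amounts, z_threshold=2.0):
    """Return the amounts lying more than z_threshold standard deviations from the mean."""
    if len(amounts) < 2:
        return []  # a standard deviation needs at least two values
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > z_threshold]


if __name__ == "__main__":
    card_payments = [20, 25, 22, 19, 24, 21, 23, 500]  # hypothetical amounts
    print(flag_unusual(card_payments))
```

A flagged amount is only a candidate for review, not proof of fraud.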

Fraud detection [12]

What’s driving Big Data to Analytics

- Optimizations and predictive analytics

- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time

- Ad-hoc querying and reporting

- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

Structuring Big Data
• In simple terms, structuring is arranging the available data in such a manner that it
becomes easy to study, analyze, and draw conclusions from.
• Why is structuring required?

In daily life, you may have come across questions like:

‒ How do I use to my advantage the vast amount of data and information I
come across?
‒ Which news articles should I read of the thousands I come across?
‒ How do I choose a book of the millions available on my favourite sites or
‒ How do I keep myself updated about new events, sports, inventions, and
discoveries taking place across the globe?
Types of Data
• Data comes from multiple sources, such as databases, ERP
systems, weblogs, chat history, and GPS maps, and so varies in format. But
primarily data is obtained from the following types of data sources.
• Internal sources: Organisational or enterprise data
– CRM, ERP, OLTP, products and sales data ...
(Structured data)
• External sources: Social data
– Business partners, Internet, Government, Data suppliers ...
(Unstructured or unorganised data)
• On the basis of the data received from the
sources mentioned, big data comprises:
– Structured Data
– Unstructured Data
– Semi-structured Data

BIG DATA = Structured Data + Unstructured Data + Semi-structured Data
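
The difference between the three kinds of data shows up in how a program has to read them. In the sketch below, the sample strings and helper names are illustrative assumptions: CSV stands in for structured data, JSON for semi-structured data, and a free-text sentence for unstructured data.

```python
import csv
import io
import json
import re

# Structured: fixed fields in every record (hypothetical table).
STRUCTURED = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"

# Semi-structured: self-describing keys, nesting, optional fields.
SEMI_STRUCTURED = '{"id": 3, "name": "Meena", "tags": ["new"], "address": {"city": "Kanpur"}}'

# Unstructured: free text with no predefined fields.
UNSTRUCTURED = "Ravi called on Monday and said the delivery to Kanpur was late."


def read_structured(text):
    """Every row has the same columns, so it maps directly to records."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """The keys in the data itself describe its structure."""
    return json.loads(text)


def find_mentions(text, keywords):
    """With free text we can only search for patterns, here whole-word matches."""
    words = set(re.findall(r"[A-Za-z]+", text))
    return sorted(k for k in keywords if k in words)


if __name__ == "__main__":
    print(read_structured(STRUCTURED))
    print(read_semi_structured(SEMI_STRUCTURED)["address"]["city"])
    print(find_mentions(UNSTRUCTURED, ["Kanpur", "Delhi"]))
```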

Structured Data
• It can be defined as data that has a defined repeating pattern.
• This pattern makes it easier for any program to sort, read, and process the data.
• Processing structured data is much faster and easier than processing
data without any specific repeating pattern.
• Is organised data in a prescribed format.
• Is stored in tabular form.
• Is the data that resides in fixed fields within a record or file.
• Is formatted data whose entities and their attributes are properly
• Is used in query and report against predetermined data types.
• Sources: DBMS/RDBMS, Flat files, Multidimensional databases, Legacy
Unstructured Data
• It is a set of data that might or might not have any logical or repeating patterns.

• Typically consists of metadata, i.e., the additional information related to data.

• Inconsistent data (files, social media websites, satellites, etc.)
• Data in different formats (e-mails, text, audio, video or images).
• Sources: Social media, Mobile Data, Text both internal & external to an
Semi-Structured Data
• Semi-structured data, which has a schema-less or self-describing structure, is a form of
structured data that contains tags or markup elements to separate elements and
generate hierarchies of records and fields in the given data.

• In other words, data is stored inconsistently in rows and columns of a database.

• Sources: File systems such as Web data in the form of cookies, Data exchange
Big Data Challenges & Characteristics

Volume: The size of the Data

What is Big Data
Terabytes to 10s of petabytes

What is not Big Data
A few gigabytes

Wikipedia corpus with history ca. 10 TByte
Wikimedia commons ca. 23 TByte
Google search index ca. 46 Gigawebpages
YouTube per year 76 PByte (2012)
Velocity: Data Volume per Time

What is Big Data
30 KiB to 30 GiB per second (902 GiB/year to 902 PiB/year)

What is not Big Data
A never changing data set

LHC (CERN) with all experiments about 25 GB/s
Square Kilometre Array 700 TB/s (in 2018)
50k Google searches per second
Facebook 30 billion content pieces shared per month
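
The per-second and per-year figures quoted for velocity are related by simple arithmetic, which the short sketch below reproduces. A 365-day year and binary units (1 KiB = 2^10 bytes, 1 GiB = 2^30 bytes, 1 PiB = 2^50 bytes) are assumed, and the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

KIB = 2**10
GIB = 2**30
PIB = 2**50


def bytes_per_year(bytes_per_second):
    """Convert a sustained data rate in bytes/second to bytes/year."""
    return bytes_per_second * SECONDS_PER_YEAR


if __name__ == "__main__":
    low = bytes_per_year(30 * KIB) / GIB   # 30 KiB/s expressed in GiB per year
    high = bytes_per_year(30 * GIB) / PIB  # 30 GiB/s expressed in PiB per year
    print(f"30 KiB/s is about {low:.0f} GiB/year")
    print(f"30 GiB/s is about {high:.0f} PiB/year")
```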

Data Sources

Enterprise data
Serves business objectives, well defined
Customer information
Transactions, e.g. purchases

Experimental/Observational data (EOD)
Created by machines from sensors/devices
Trading systems, satellites
Microscopes, video streams, smart meters

Social media
Created by humans
Messages, posts, blogs, wikis
Veracity: Trustworthiness of Data

What is Big Data
Data involves some uncertainty and ambiguities
Mistakes can be introduced by humans and machines
People sharing accounts
Liking something today, disliking it tomorrow
Wrong system timestamps

Data Quality is vital!
Analytics and conclusions rely on good data quality
Garbage data + perfect model => garbage results
Perfect data + garbage model => garbage results
GIGO paradigm: Garbage In – Garbage Out
Value of Data
What is Big Data
Raw data of Big Data is of low value
For example, single observations
Analytics and theory about the data increases the value
Analytics transform big data into smart data!

Types of Data Analytics and Value of Data

1 Descriptive analytics (describe)
“What happened?”
2 Diagnostic analytics
“Why did this happen, what went wrong?”
3 Predictive analytics (predict)
“What will happen?”
4 Prescriptive analytics (recommend)
“What should we do and why?”
The level of insight and value of data increases from step 1 to 4
The Value of Data (alternative view)
Big Data Analytics Value Chain

There are many visualizations of the processing and value chain [8]
Reporting vs. Analytics

Reporting
▪ Lists
▪ Invoices, Orders
▪ Information from a specific point in time
▪ What is in your Salesforce database right now?
▪ Simple data relationships
▪ Which Opportunities did we win?
▪ Which Customers bought shoes?

Analytics
▪ Crosstabs, pivot tables
▪ Slice and dice, e.g. “this by that”
▪ Key Performance Indicators
▪ Analysis of trends over time
▪ How does your Salesforce data change over time?
▪ Complex data relationships
▪ How does the application of Chatter affect my win rate on Opportunities?
▪ How many Customers bought specific brands of shoes in each region this year, compared to last year?
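
The contrast can be sketched with the shoe example: a report simply lists who bought shoes, while an analytic query slices the same orders by region and brand and compares years. The order records, years and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical shoe orders: (customer, region, brand, year).
ORDERS = [
    ("Asha", "North", "BrandA", 2023),
    ("Ravi", "North", "BrandA", 2024),
    ("Meena", "North", "BrandA", 2024),
    ("John", "South", "BrandB", 2023),
    ("Sara", "South", "BrandB", 2024),
]


def report_customers(orders):
    """Reporting: a plain list answering 'Which customers bought shoes?'."""
    return sorted({customer for customer, _, _, _ in orders})


def year_over_year(orders, this_year, last_year):
    """Analytics: orders per (region, brand) this year minus last year."""
    counts = Counter((region, brand, year) for _, region, brand, year in orders)
    groups = sorted({(region, brand) for _, region, brand, _ in orders})
    return {
        (region, brand): counts[(region, brand, this_year)] - counts[(region, brand, last_year)]
        for region, brand in groups
    }


if __name__ == "__main__":
    print(report_customers(ORDERS))
    print(year_over_year(ORDERS, 2024, 2023))
```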
❑ Phase 1: Discovery
• Learn the business domain, including relevant history, such as whether the
organization or business unit has attempted similar projects in the past, from
which you can learn.
• Assess the resources you will have to support the project, in terms of people,
technology, time, and data.
• Frame the business problem as an analytic challenge that can be addressed
in subsequent phases.
• Formulate initial hypotheses (IH) to test and begin learning the data.
❑ Phase 2: Data Preparation
• Prepare an analytic sandbox in which you can perform ELT and ETL to get data into
the sandbox, and begin transforming the data so you can work with it and analyze it.
• Familiarize yourself with the data thoroughly and take steps to condition the data.
❑ Phase 3: Model Planning
• Determine the methods, techniques and workflow you intend to follow for the
model scoring.
• Explore the data to learn about the relationships between variables, and
subsequently select key variables and the models you are likely to use.
❑ Phase 4: Model Building
• Develop data sets for testing, training, and production purposes.
• Get the best environment you can for executing models and workflows, including
fast hardware and parallel processing.
❑ Phase 5: Communicate Results
• Determine if you succeeded or failed, based on the criteria you developed in the
Discovery phase, in collaboration with your stakeholders.
• Identify your key findings, quantify the business value and develop a narrative to
summarize your findings and convey them to stakeholders.
❑ Phase 6: Operationalize
• Deliver final reports, briefings, code, and technical documents.
• Run a pilot project, and implement your models in a production environment.
• It is critical to ensure that once you have run the models and produced findings, you
frame these results in a way that is appropriate for the audience that engaged you
for the work, in a manner that demonstrates clear value.
• If you perform a technically accurate analysis, but cannot translate the results into a
language that speaks to the audience, people will not see the value and much of
your time will have been wasted.
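
The Phase 4 step "develop data sets for testing, training" is commonly done by randomly splitting the prepared data. The sketch below shows one simple way to do this. The 80/20 ratio, the fixed seed and the `split_dataset` name are assumptions, not something the phases above prescribe.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (training_rows, testing_rows)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_dataset(range(10))
    print("train:", train)
    print("test:", test)
```

Keeping the testing rows out of model building gives a fairer check of whether the model meets the success criteria set in the Discovery phase.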
