Data Analytics
MSBTE K SCHEME
BY STUDY TECH
CHAPTER NO 1
• Unit - I Introduction to Data Analytics
1.1 Data Analytics: An Overview, Importance of Data Analytics
1.2 Types of Data Analytics: Descriptive Analysis, Diagnostic Analysis,
Predictive Analysis, Prescriptive Analysis, Visual Analytics
1.3 Life cycle of Data Analytics, Quality and Quantity of data,
Measurement
1.4 Data Types, Measure of central tendency, Measures of dispersion
1.5 Sampling Funnel, Central Limit Theorem, Confidence Interval,
Sampling Variation
1.1 Data Analytics: An Overview,
Importance of Data Analytics
• Data Analytics is the process of collecting, organizing and studying
data to find useful information, understand what’s happening and
make better decisions.
• In simple words, it helps people and businesses learn from data:
what worked in the past, what is happening now and what might
happen in the future.
IMPORTANCE OF DATA ANALYTICS
• Data analytics is used in many fields like banking, farming, shopping,
government and more. It helps in many ways:
• Helps in Decision Making: It gives clear facts and patterns from data
which help people make smarter choices.
• Helps in Problem Solving: It points out what's going wrong and why,
making it easier to fix problems.
• Helps Identify Opportunities: It shows trends and new chances for
growth that might not be obvious.
• Improved Efficiency: It helps reduce waste, saves time and makes work
smoother by finding better ways to do things.
• Process of Data Analytics
• Data analysts, data scientists and data engineers together create data
pipelines, which help to set up the model and do further analysis. Data
analytics can be done in the following steps:
• Data Collection: Data collection is the first step, where raw information is
gathered from different places like websites, apps, surveys or machines.
• Data Cleansing: Once the data is collected, it usually contains
mistakes like wrong entries, missing values or repeated rows. These are
fixed or removed before analysis.
• Data Analysis and Data Interpretation: After cleaning, the data is
studied using tools like Excel, Python, R or SQL.
• Data Visualization: Data visualization is the process of creating a visual
representation of data using plots, charts and graphs, which helps
to analyze the patterns.
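• 👉 Example in Python (a small sketch, not the only way to do it): the rows, the field names `customer` and `spend`, and the `clean` helper are made up for illustration. It shows the cleansing step removing a missing value, a wrong entry and a repeated row before a simple analysis step.

```python
# Hypothetical raw data collected from a survey; None marks a missing value.
raw_rows = [
    {"customer": "A", "spend": 1200},
    {"customer": "B", "spend": None},   # missing value
    {"customer": "A", "spend": 1200},   # repeated row
    {"customer": "C", "spend": -50},    # wrong entry (assumed: spend cannot be negative)
    {"customer": "D", "spend": 900},
]


def clean(rows):
    """Drop rows with missing or negative spend, and drop repeated rows."""
    seen = set()
    cleaned = []
    for row in rows:
        spend = row["spend"]
        if spend is None or spend < 0:
            continue  # skip missing values and wrong entries
        key = (row["customer"], spend)
        if key in seen:
            continue  # skip repeated rows
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Data cleansing, then a very simple analysis step (the average spend).
clean_rows = clean(raw_rows)
average_spend = sum(r["spend"] for r in clean_rows) / len(clean_rows)

print(clean_rows)
print(average_spend)
```

Only customers A and D remain after cleaning, so the average is taken over two rows.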
1.2 Types of Data Analytics: Descriptive Analysis,
Diagnostic Analysis, Predictive Analysis, Prescriptive
Analysis, Visual Analytics
• Descriptive Data Analytics : Descriptive data analytics helps to summarize and understand
past data. It shows what has happened by using tables, charts and averages. Companies
use it to compare results, find strengths and weaknesses and spot any unusual patterns.
• Diagnostic Data Analytics: Diagnostic data analytics looks at why something happened in
the past. It uses tools like correlation, regression or comparison to find the cause of a
problem. This helps companies understand the reason behind a drop in sales or a sudden
change in performance.
• Predictive Data Analytics: Predictive data analytics is used to guess what might happen in
the future. It looks at current and past data to find patterns and make forecasts.
Businesses use it to predict things like customer behavior, future sales or possible risks.
• Prescriptive Data Analytics: Prescriptive data analytics helps to choose the best action or
solution. It looks at different options and suggests what should be done next. Companies
use it for things like loan approval, pricing decisions and managing machines or schedules.
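• 👉 Example in Python (a rough sketch with made-up monthly sales): the mean is a descriptive summary, a straight-line trend gives a simple predictive forecast, and a stock rule gives a prescriptive suggestion. The straight-line fit and the `stock_on_hand` rule are assumptions chosen for illustration, not the only methods used in practice.

```python
# Made-up sales for months 1 to 4.
months = [1, 2, 3, 4]
sales = [100, 110, 120, 130]

# Descriptive: what happened? Summarize past sales with the average.
average_sales = sum(sales) / len(sales)

# Predictive: what might happen? Fit a straight line by least squares.
mean_x = sum(months) / len(months)
mean_y = average_sales
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
forecast_month_5 = intercept + slope * 5

# Prescriptive: what should we do? A hypothetical stock rule.
stock_on_hand = 125
action = "order more stock" if forecast_month_5 > stock_on_hand else "no action"

print(average_sales, forecast_month_5, action)
```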
• Visual Analytics
• It means studying data using pictures, charts, and graphs instead of
just numbers or text.
• The goal is to make data easy to see and understand.
• We use tools like bar charts, pie charts, line graphs, heat maps,
dashboards etc.
Life Cycle of Data Analytics
• This is the step-by-step process of doing data analytics.
• Data Collection → Gather data (from surveys, sensors, databases, etc.).
• Data Cleaning → Remove wrong, missing, or duplicate data.
• Data Storage → Keep data in databases or files.
• Data Analysis → Study data using methods (qualitative, quantitative, or
visual).
• Data Visualization → Show results in charts/graphs for easy understanding.
• Decision Making → Use the results to take correct actions.
1.3 Life cycle of Data Analytics, Quality
and Quantity of data, Measurement
• 1. Qualitative Data Analytics (Words)
• No numbers. Uses words, pictures, feelings, stories.
• Example methods:
• Narrative → Study stories/interviews.
• Content → Study spoken/written words or behavior.
• Grounded theory → Watch what happens and then explain it.
• 👉 Example: Reading students’ essays to find common ideas like “happy,”
“stress,” or “fun.”
• 2. Quantitative Data Analytics (Numbers)
• All about numbers. Uses maths and statistics.
• Example methods:
• Hypothesis testing → Checking whether the data supports a claim or not.
• Sampling → Studying a small group to know about a big group. The sample
size is how many items are in the small group.
• Average/mean → Add all numbers, divide by how many numbers.
• 👉 Example: Collecting marks of students and finding the average marks.
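• 👉 Example in Python (a minimal sketch): the essays, the theme words and the marks are invented for illustration. The first part counts how often each theme word appears (a very simple form of content analysis), and the second part finds the average marks.

```python
import re
from collections import Counter

# Qualitative: count theme words in made-up student essays.
essays = [
    "School trips are fun and I feel happy with my friends.",
    "Exams give me stress, but sports day is fun.",
    "I am happy when projects are fun, less stress then.",
]
themes = {"happy", "stress", "fun"}

words = []
for essay in essays:
    # Lowercase the text and keep only alphabetic words.
    words.extend(re.findall(r"[a-z]+", essay.lower()))
theme_counts = Counter(w for w in words if w in themes)

# Quantitative: average of made-up student marks.
marks = [60, 70, 80, 90]
average_marks = sum(marks) / len(marks)

print(theme_counts)
print(average_marks)
```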
Measurement in Data Analytics
• Measurement means turning information into a form that can be
studied (numbers or categories).
• Example:
• Height = measured in cm/meters.
• Satisfaction = measured as “Happy, Neutral, Sad.”
• Good measurement = easier to analyze.
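• 👉 Example in Python (a small sketch): the codes 1, 2, 3 for “Sad, Neutral, Happy” are an assumed scale chosen for illustration, and the responses and heights are made up. It shows how answers in words become categories that can be counted, and how heights are kept in one unit.

```python
from collections import Counter

# Assumed ordered scale for satisfaction (hypothetical coding).
SATISFACTION_CODES = {"Sad": 1, "Neutral": 2, "Happy": 3}

responses = ["Happy", "Neutral", "Happy", "Sad", "Happy"]

# Turn each answer into its code so it can be sorted or compared.
codes = [SATISFACTION_CODES[r] for r in responses]

# Count how many people gave each answer.
answer_counts = Counter(responses)

# Heights measured in cm, converted to meters so all values share one unit.
heights_cm = [150, 162.5, 171]
heights_m = [h / 100 for h in heights_cm]

print(codes)
print(answer_counts)
print(heights_m)
```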
1.4 Data Types, Measure of central tendency, Measures of
dispersion
• 1. Data Types
• Qualitative (Words/Labels) → Data in form of names or categories.
👉 Example: Colors (Red, Blue), Gender (Male, Female).
• Quantitative (Numbers) → Data in numbers.
• Discrete → Countable (like 5 books, 20 students).
• Continuous → Measurable (like 5.5 feet height, 60.3 kg weight).
• 2. Measure of Central Tendency (Middle Value of Data)
• These tell us the typical or center value in data:
• Mean (Average) → Add all numbers ÷ how many numbers.
👉 Example: (10+20+30)/3 = 20.
• Median → Middle number when arranged in order.
👉 Example: 2, 4, 6 → median = 4.
• Mode → Number that comes most often.
👉 Example: 2, 2, 3, 4 → mode = 2.
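• 👉 Example in Python (a short sketch using the standard `statistics` module): it repeats the three examples above. The extra list `[2, 4, 6, 8]` is added only to show that with an even count of numbers, the median is the average of the two middle numbers.

```python
import statistics

mean_value = statistics.mean([10, 20, 30])      # (10 + 20 + 30) / 3
median_value = statistics.median([2, 4, 6])     # middle number
mode_value = statistics.mode([2, 2, 3, 4])      # most frequent number

# Even count: the median is the average of the two middle numbers (4 and 6).
median_even = statistics.median([2, 4, 6, 8])

print(mean_value, median_value, mode_value, median_even)
```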
• 3. Measures of Dispersion (Spread of Data)
• These tell us how spread out the numbers are:
• Range → Largest number – Smallest number.
👉 Example: 90 – 30 = 60.
• Variance → Shows how much values differ from the mean (the average of
the squared differences from the mean).
• Standard Deviation (SD) → Square root of the variance, so it is in the
same units as the data (lower SD = numbers close together, higher SD =
numbers spread out).
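• 👉 Example in Python (a small sketch with made-up marks): it computes the range, the variance and the standard deviation with the standard `statistics` module. It assumes the list is the whole population; for a sample, `statistics.variance` divides by n – 1 instead of n.

```python
import statistics

marks = [30, 50, 70, 90]  # made-up marks

# Range: largest number minus smallest number.
value_range = max(marks) - min(marks)

# Population variance: average squared difference from the mean (mean is 60).
population_variance = statistics.pvariance(marks)

# Standard deviation: square root of the variance, same units as the marks.
population_sd = statistics.pstdev(marks)

# Sample variance divides by (n - 1) instead of n.
sample_variance = statistics.variance(marks)

print(value_range, population_variance, population_sd, sample_variance)
```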
• ✅ Super Short Memory Tip:
• Data Types → Words or Numbers.
• Central Tendency → Mean, Median, Mode (middle value).
• Dispersion → Range, Variance, SD (spread of values).
1.5 Sampling Funnel, Central Limit Theorem,
Confidence Interval, Sampling Variation
• 1. Sampling Funnel (in Data Analytics)
• In analytics, we often deal with huge data (millions of rows).
• It’s often not practical to analyze everything → so we filter step by
step, like a funnel, to pick the right sample.
• 👉 Example in Data Analytics:
From all customers → select only active users → then select only
mobile app users → then select 1000 users for analysis.
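• 👉 Example in Python (a sketch of the funnel above): the customer records, the `active` and `platform` fields and the way they are filled in are all invented for illustration. Each step narrows the data, and the last step picks 1000 users at random.

```python
import random

rng = random.Random(42)  # fixed seed so the sample can be repeated

# Hypothetical customer table: 60,000 customers.
customers = [
    {"id": i, "active": i % 2 == 0, "platform": "mobile" if i % 3 == 0 else "web"}
    for i in range(60_000)
]

# Funnel step 1: keep only active users.
active_users = [c for c in customers if c["active"]]

# Funnel step 2: keep only mobile app users.
mobile_users = [c for c in active_users if c["platform"] == "mobile"]

# Funnel step 3: pick 1000 users at random for analysis.
sample = rng.sample(mobile_users, 1000)

print(len(customers), len(active_users), len(mobile_users), len(sample))
```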
• 2. Central Limit Theorem (CLT in Data Analytics)
• When we take many samples of data and calculate the average of each
sample, those averages form an approximately normal distribution (bell
curve), as long as each sample is large enough.
• This is very useful because many statistical tests in analytics assume a
normal distribution.
• 👉 Example: If we take many samples of “daily sales data” from a store, their
averages will form a bell curve even if the actual sales data is uneven.
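• 👉 Example in Python (a simulation sketch, not a proof): the “daily sales” are made-up values drawn from a skewed (exponential) distribution with mean 100, and the sample size of 50 and the 2000 repeats are arbitrary choices. The sample averages come out centered on the population average, much less spread out, and much more symmetric than the raw data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run can be repeated

# Made-up skewed "daily sales": many small values, a few very large ones.
population = [rng.expovariate(1 / 100) for _ in range(50_000)]

# Take many samples of 50 days each and record the average of each sample.
sample_means = [
    statistics.fmean(rng.sample(population, 50)) for _ in range(2_000)
]

population_mean = statistics.fmean(population)
mean_of_sample_means = statistics.fmean(sample_means)

# In a symmetric bell curve the mean and the median are close together.
raw_gap = abs(population_mean - statistics.median(population))
means_gap = abs(mean_of_sample_means - statistics.median(sample_means))

print(population_mean, mean_of_sample_means)
print(statistics.stdev(population), statistics.stdev(sample_means))
print(raw_gap, means_gap)
```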
• 3. Confidence Interval (in Data Analytics)
• In analytics, we don’t just say “the average is X,” we also give a range to
show reliability.
• Confidence interval = the range where the true population value is likely
to lie.
• 👉 Example: From a sample of customers, average monthly spending = ₹1200.
• 95% confidence interval = ₹1150 to ₹1250.
• Meaning → we are 95% confident the true average is in that range (if we
repeated the sampling many times, about 95% of such ranges would contain
the true average).
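• 👉 Example in Python (a sketch under stated assumptions): the spending values are simulated, not real data, and the interval uses the normal approximation mean ± z × SD / √n, which is reasonable for a large sample like the 400 customers here. For small samples a t-based interval is the usual choice instead.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run can be repeated

# Simulated monthly spending (in rupees) for a sample of 400 customers.
spending = [rng.gauss(1200, 500) for _ in range(400)]

n = len(spending)
sample_mean = statistics.fmean(spending)
sample_sd = statistics.stdev(spending)

# z value for 95% confidence (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

# Margin of error = z * standard error of the mean.
margin = z * sample_sd / math.sqrt(n)
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"mean = {sample_mean:.0f}, 95% CI = {ci_low:.0f} to {ci_high:.0f}")
```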
• 4. Sampling Variation (in Data Analytics)
• Different samples of data will give slightly different results.
• This is normal and expected in analytics.
• 👉 Example:
• Sample 1 of 500 users → average app usage = 40 mins.
• Sample 2 of 500 users → average app usage = 42 mins.
• This small difference is called sampling variation.
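• 👉 Example in Python (a simulation sketch): the app usage times are made-up values, not real measurements. Two random samples of 500 users are taken from the same population, and their averages come out close but not equal.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run can be repeated

# Made-up app usage (minutes per day) for 20,000 users.
population = [rng.gauss(41, 10) for _ in range(20_000)]

# Two different random samples of 500 users from the same population.
sample_1 = rng.sample(population, 500)
sample_2 = rng.sample(population, 500)

mean_1 = statistics.fmean(sample_1)
mean_2 = statistics.fmean(sample_2)

# The small gap between the two averages is the sampling variation.
difference = abs(mean_1 - mean_2)

print(mean_1, mean_2, difference)
```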
• ✅ Summary in Data Analytics Terms:
• Sampling Funnel → Step-by-step filtering to get the right data sample.
• CLT → Averages of many samples are approximately normal (bell curve) →
supports many statistical tests.
• Confidence Interval → A range that likely contains the true value, at a
stated confidence level (for example 95%).
• Sampling Variation → Small changes in results due to different samples.