Pritha Guha
MANAGERIAL DECISION - 1 (QTMD1G21-1)
TEACHING AND GRADING
PRITHA GUHA (EMAIL: PRITHA@XLRI.AC.IN)
TEXTBOOK: A FIRST COURSE IN PROBABILITY BY SHELDON ROSS (9TH EDITION), PEARSON
GRADING: QUIZ-1 (30%), QUIZ-2 (30%), END-TERM (40%)
PROBABILITY
SUMMARISING DATA
WHAT IS DATA?
Facts and figures collected, analysed and summarised for presentation and interpretation.
Existing Sources
• Data Repositories:
• Kaggle (https://www.kaggle.com/),
• UCI (https://archive.ics.uci.edu/ml/index.php)
Find the right data.
Use the appropriate statistical tools.
Clearly communicate the numerical information into written language.
TWO BRANCHES OF STATISTICS
Population: consists of all items of interest.
Sample: a subset of the population.
The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative.
AN EXAMPLE
What are the variables here? What type of variables are those?
ID  gender  race/ethnicity  parental level of education  lunch         test preparation course  math score  reading score  writing score
1   female  group B         bachelor's degree            standard      none                     72          72             74
2   female  group C         some college                 standard      completed                69          90             88
3   female  group B         master's degree              standard      none                     90          95             93
4   male    group A         associate's degree           free/reduced  none                     47          57             44
5   male    group C         some college                 standard      none                     76          78             75
6   female  group B         associate's degree           standard      none                     71          83             78
7   female  group B         some college                 standard      completed                88          95             92
8   male    group B         some college                 free/reduced  none                     40          43             39
9   male    group D         high school                  free/reduced  completed                64          64             67
10  female  group B         high school                  free/reduced  none                     38          60             50
11  male    group C         associate's degree           standard      none                     58          54             52
12  male    group D         associate's degree           standard      none                     40          52             43
13  female  group B         high school                  standard      none                     65          81             73
14  male    group A         some college                 standard      completed                78          72             70
15  female  group A         master's degree              standard      none                     50          53             58
16  female  group C         some high school             standard      none                     69          75             78
17  male    group C         high school                  standard      none                     88          89             86
How can we extract the most prominent
features of the data?
A frequency distribution groups data into intervals called classes and records the number of observations that fall into each class.
How to construct?
• Decide the number of non-overlapping classes.
• The classes are exhaustive.
• Determine the width of each class: take equal width for all classes.
• approx. class width = (Max - Min) / no. of classes
• Determine the class limits: each data point should be in exactly one class; no more, no less.
Relative Frequency: A relative frequency distribution identifies the proportion or fraction of values that fall into each class.
Class    Frequency  Relative Frequency
20-30    7          0.007
30-40    19         0.019
40-50    70         0.070
50-60    178        0.178
60-70    238        0.238
70-80    252        0.252
80-90    173        0.173
90-100   62         0.062
Total    1000       1.0
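As a hedged illustration of the construction steps above, the sketch below builds a frequency and relative frequency table in R for the 17 math scores from the example table. The choice of 7 classes, the rounding of the class width up to 10, and all variable names are assumptions made for illustration; they are not taken from the slides.

```r
# Math scores of the 17 students in the example table
math <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88)

# Approximate class width = (Max - Min) / no. of classes
k <- 7                                        # assumed number of classes
approx_width <- (max(math) - min(math)) / k   # (90 - 38) / 7, roughly 7.43

# Round up to a convenient equal width of 10 and fix the class limits
breaks <- seq(30, 100, by = 10)

# right = FALSE gives classes [30,40), [40,50), ..., so a score on a class
# limit belongs to exactly one class; include.lowest = TRUE closes the last
# class so that a score of 100 would still be counted
classes <- cut(math, breaks = breaks, right = FALSE, include.lowest = TRUE)

freq <- table(classes)         # number of observations in each class
rel_freq <- prop.table(freq)   # proportion of observations in each class

print(data.frame(Class = names(freq),
                 Frequency = as.vector(freq),
                 RelativeFrequency = round(as.vector(rel_freq), 3)))
```

Because every score falls in exactly one class, the frequencies add up to 17 and the relative frequencies add up to 1.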
HISTOGRAM
$\text{Density of the class} = \dfrac{\text{Relative frequency of the class}}{\text{Class width of the class}}$
• Along the x-axis, plot the classes.
• Along the y-axis, plot the density of the data in
that class.
• The vertical scale on such a histogram is
called a density scale.
• Interesting property of a density histogram:
• The area of each rectangle is the relative
frequency of the corresponding class.
• As the sum of relative frequencies must be
1.0 (except for roundoff), the total area of
all rectangles in a density histogram is 1.
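The area property can be checked numerically. The following is a minimal sketch, assuming the 17 math scores from the example table and classes of width 10 (both choices are illustrative). It computes the density of each class by hand, compares it with the density that hist() reports, and confirms that the rectangle areas sum to 1.

```r
# Math scores of the 17 students in the example table
math <- c(72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88)
breaks <- seq(30, 100, by = 10)   # assumed class limits

# plot = FALSE returns the histogram object without drawing it
h <- hist(math, breaks = breaks, right = FALSE, plot = FALSE)

width    <- diff(h$breaks)               # class widths (all 10 here)
rel_freq <- h$counts / sum(h$counts)     # relative frequency of each class
density  <- rel_freq / width             # density = relative frequency / width

area       <- density * width            # area of each rectangle
total_area <- sum(area)                  # should be 1

print(all.equal(density, h$density))     # hand computation agrees with hist()
print(total_area)

# freq = FALSE draws the histogram on the density scale
hist(math, breaks = breaks, right = FALSE, freq = FALSE,
     main = "Density histogram of math score", xlab = "Math score")
```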
NUMERICAL SUMMARY
Measures of central tendency, Measures of dispersion
MEASURES OF CENTRAL TENDENCY
A measure of central tendency represents the centre or middle of the data.
Mean
Median
Quartiles
Mode
MEAN
• The sample mean of observations $x_1, x_2, x_3, \ldots, x_n$:
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
MEDIAN
• Resistant to extreme values, easy to describe
• Not as mathematically tractable as the mean; the data must be sorted to calculate it
DEVIATION AND VARIANCE
• The variance is a measure of variability that utilizes all the data.
• The variance is useful in comparing the variability of two or more variables.
• Standard deviation is measured in the same units as the data, making it more easily interpreted than the variance.
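The slides do not state the formulas, so the following is a sketch using the standard sample definitions, with the $n - 1$ divisor (the same convention that R's var() and sd() follow). Here $\bar{x}$ denotes the sample mean of $x_1, \ldots, x_n$, and the three data values in the worked line are hypothetical.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}

% Worked example with hypothetical data 2, 4, 6 (so \bar{x} = 4, n = 3):
s^2 = \frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3-1} = \frac{8}{2} = 4,
\qquad
s = 2
```

The variance is in squared units of the data, while the standard deviation is back in the original units, which is why it is easier to interpret.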
A LITTLE BIT OF R…
WHAT IS R?
R is a language and environment for statistical computing and graphics.
R is available as Free Software.
R provides a wide variety of statistical and graphical techniques and is highly extensible (active community of developers).
DOWNLOADING R AND RSTUDIO
R download: https://cran.rstudio.com/
RStudio download: https://www.rstudio.com/products/rstudio/download/#download
Download the free desktop version of RStudio.
After installing R and RStudio, launch RStudio from your computer's applications folder.
[Figure: RStudio window, with the Code Editor and the Workspace and History panes labelled]
IN R
y = c(90, 70, 80, 90, 120, 140, 110, 100, 130, 1200)
How will you enter the data
127, 132, 138, 141, 144, 146, 152, 154, 162, 171, 177, 192, 241
in R?
COMPUTATIONS IN R
x = c(90, 70, 80, 90, 120, 140, 110, 100, 130)
mean(x)
[1] 103.3333
y = c(90, 70, 80, 90, 120, 140, 110, 100, 130, 1200)
mean(y)
[1] 213
sort(x)
[1]  70  80  90  90 100 110 120 130 140
median(x)
[1] 100
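The single value 1200 roughly doubles the mean. As a small sketch of why the median is described as resistant to extreme values, the same comparison can be run for the median. The names shift_mean and shift_median are illustrative only.

```r
x <- c(90, 70, 80, 90, 120, 140, 110, 100, 130)
y <- c(x, 1200)          # same data with one extreme value appended

mean(x)                  # 103.3333
mean(y)                  # 213
median(x)                # 100
median(y)                # 105: average of the two middle values, 100 and 110

# How far each measure moves when the extreme value is added
shift_mean   <- mean(y) - mean(x)
shift_median <- median(y) - median(x)
print(c(mean = shift_mean, median = shift_median))
```

The mean moves by more than 100 while the median moves by 5.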
COMPUTATIONS IN R
data1 = c(127, 132, 138, 141, 144, 146, 152, 154, 162, 171, 177, 192, 241)
quantile(data1)
  0%  25%  50%  75% 100%
 127  141  152  171  241
quantile(data1, 0.9)
90%
189
range(data1)
[1] 127 241
diff(range(data1))
[1] 114
IQR(data1)
[1] 30
var(data1)
[1] 939.0256
sd(data1)
[1] 30.64353
BOXPLOT IN R
boxplot(data1)
FIVE NUMBER SUMMARY IN R
summary(data1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
127.0 141.0 152.0 159.8 171.0 241.0
fivenum(data1)
[1] 127 141 152 171 241
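By default, boxplot() extends the whiskers to the most extreme data points lying within 1.5 times the IQR of the box and draws anything beyond that as an individual point. The sketch below applies that rule to data1 by hand. The names lower_fence and upper_fence are illustrative. Note that boxplot() works from the hinges returned by fivenum(); for this data set they coincide with the quartiles.

```r
data1 <- c(127, 132, 138, 141, 144, 146, 152, 154, 162, 171, 177, 192, 241)

q1  <- unname(quantile(data1, 0.25))   # 141
q3  <- unname(quantile(data1, 0.75))   # 171
iqr <- IQR(data1)                      # 30

lower_fence <- q1 - 1.5 * iqr          # 96
upper_fence <- q3 + 1.5 * iqr          # 216

# Points outside the fences are flagged as potential outliers
outliers <- data1[data1 < lower_fence | data1 > upper_fence]
print(outliers)                        # 241

# The same points as identified by the boxplot machinery itself
print(boxplot.stats(data1)$out)
```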
READING A DATA FILE IN R
Read the StudentsPerformance.csv file
SPData = read.csv(file.choose(), header = TRUE)
What is the IQR of the reading score?
IQR(SPData$reading.score)
[1] 20
What would the boxplot for reading score look like?
Are there any outliers?
boxplot(SPData$reading.score)
boxplot(SPData$reading.score, main = "Boxplot of reading score", ylab =
"Reading score (out of total marks 100)", col = "gold")
break.SPData = c(10,20,30,40,50,60,70,80,90,100)
Relative Frequency Table
PropFreqTab.ReadingScore = prop.table(FreqTab.ReadingScore)
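The slides use FreqTab.ReadingScore without showing how it was built. One plausible construction, offered as an assumption rather than as the original code, bins the reading scores with cut() using break.SPData and counts them with table(). To keep the sketch runnable without the CSV file, a stand-in vector holding the 17 reading scores from the example table replaces SPData$reading.score, and the name Class.ReadingScore is hypothetical.

```r
# Stand-in for SPData$reading.score: reading scores of the 17 students in the
# example table (the real column comes from StudentsPerformance.csv)
reading.score = c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89)

break.SPData = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# Assumed step: assign each score to a class.
# By default cut() uses classes of the form (a, b], e.g. (50,60]
Class.ReadingScore = cut(reading.score, breaks = break.SPData)

# Frequency table, then relative frequency table
FreqTab.ReadingScore = table(Class.ReadingScore)
PropFreqTab.ReadingScore = prop.table(FreqTab.ReadingScore)

print(FreqTab.ReadingScore)
print(round(PropFreqTab.ReadingScore, 3))
```

With the full data set, replacing reading.score by SPData$reading.score should give the corresponding table for all students, under the same assumption about how the classes are closed.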
HOME WORK
Toss a coin 1, 2, 3, 4, …, 20 times and note down the number of heads.
Compute the relative frequency.
Plot the relative frequency and send it to me!
Number of Tosses  Frequency of Heads  Relative Frequency
1                 1                   1
2                 1                   0.5
3                 3                   1
4                 1                   0.25
5                 3                   0.6
6                 1                   0.166666667
7                 3                   0.428571429
8                 5                   0.625
9                 5                   0.555555556
10                6                   0.6
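The same experiment can also be simulated. The sketch below is one possible way to do it, not the expected homework answer: it assumes a fair coin (prob = 0.5), uses an arbitrary seed for reproducibility, and treats each row as a fresh set of tosses, as in the table above.

```r
set.seed(1)   # arbitrary seed, only so that the run is reproducible

tosses <- 1:20                       # toss the coin 1, 2, ..., 20 times

# For each entry n of tosses, draw the number of heads in n fair tosses
heads <- rbinom(length(tosses), size = tosses, prob = 0.5)

rel_freq <- heads / tosses           # relative frequency of heads

result <- data.frame(tosses, heads, rel_freq)
print(result)

# Plot the relative frequency against the number of tosses,
# with a dashed reference line at 0.5
plot(tosses, rel_freq, type = "b", ylim = c(0, 1),
     xlab = "Number of tosses", ylab = "Relative frequency of heads",
     main = "Relative frequency of heads")
abline(h = 0.5, lty = 2)
```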