1. Introduction to Statistics and Data Analysis
2. Probability
3. Random Variables and Probability Distributions
4. Mathematical Expectation
5. Some Discrete Probability Distributions
6. Some Continuous Probability Distributions
Introduction to Statistics and Data Analysis
Chapter Outline
Example: Data Samples in Tabular Form
The Dot Plot: Another Representation of the Tabulated Data
Fundamental Relationship between Probability and Inferential Statistics
The sample, together with inferential statistics, allows us to draw conclusions about the population.
Data Classification
Data are classified as:
• Qualitative (categorical).
• Quantitative (numerical; can be ranked), which is either:
  – Discrete (countable), e.g., the number of children in a family.
  – Continuous (non-countable, measurable), e.g., the height of a student, from 175 to 180.
Qualitative (Categorical) Data: Frequency Distribution
Quantitative: Discrete or Continuous (Continued)
Example:
Given the following data: 4, 7, 9, 3, 0
First step (sorting the data): 0, 3, 4, 7, 9
Class boundaries    Frequency
 99.5 - 104.5          10
104.5 - 109.5          12
109.5 - 114.5           8
114.5 - 119.5           6
119.5 - 124.5           4
124.5 - 129.5           7
129.5 - 134.5           3
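As a rough sketch, the grouped table above can be stored in Python as (lower boundary, upper boundary, frequency) triples and drawn as a crude text histogram, with one `*` per observation. The names `classes` and `text_histogram` are illustrative only. The boundaries and frequencies are copied from the table.

```python
# Grouped frequency table: (lower class boundary, upper class boundary, frequency)
classes = [
    (99.5, 104.5, 10),
    (104.5, 109.5, 12),
    (109.5, 114.5, 8),
    (114.5, 119.5, 6),
    (119.5, 124.5, 4),
    (124.5, 129.5, 7),
    (129.5, 134.5, 3),
]


def text_histogram(table):
    """Return one line per class: the boundaries followed by one '*' per observation."""
    lines = []
    for lower, upper, freq in table:
        lines.append(f"{lower:6.1f} - {upper:6.1f} | {'*' * freq}")
    return lines


# Total number of observations summarized by the table
total = sum(freq for _, _, freq in classes)

if __name__ == "__main__":
    for line in text_histogram(classes):
        print(line)
    print("Total frequency:", total)
```

The boundaries end in .5, so no observation can fall exactly on a boundary. Each observation therefore lands in exactly one class.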
Measures of Location (Central Tendency)
• The observations often tend to be concentrated around the center of the data.
• Common measures of location are the mean, the mode, and the median.
• These measures are regarded as representative (or typical) values of the data. They give a quantitative measure of where the center of the sample lies.
Notations:
1) The mean (denoted by $\bar{x}$)
2) The median (denoted by $\tilde{x}$)
Sample Mean
Example:
Given the following data: 4, 9, 6, 12, 19, 16
The sample size is n = 6 (since there are 6 observations).
Note: There is no need to sort the data, since the mean is computed by adding all the observations.
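The sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Below is a minimal Python sketch for the data above; the function name `sample_mean` is our own. It also checks that sorting the observations does not change the result.

```python
def sample_mean(data):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(data)
    return sum(data) / n


data = [4, 9, 6, 12, 19, 16]
print(sample_mean(data))          # 66 / 6 = 11.0
print(sample_mean(sorted(data)))  # same value: the order of the data does not matter
```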
Example
Suppose that the following sample represents the ages (in years) of 3 men:
Sample Mean as a Centroid of the With-Nitrogen Stem Weight
Median
Mode
Measures of Variability or Dispersion
Sample Variance
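Solution: n = 5

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 \text{ years}$

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(10 - 34.2)^2 + (21 - 34.2)^2 + (33 - 34.2)^2 + (53 - 34.2)^2 + (54 - 34.2)^2}{5 - 1} = \frac{1506.8}{4} = 376.7 \text{ (years)}^2$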
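The same calculation can be sketched in Python using the definition $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$. The helper `sample_variance` is our own illustrative function. It is cross-checked against `statistics.variance` from the standard library, which also uses the n − 1 divisor.

```python
import statistics


def sample_variance(data):
    """Sample variance s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


ages = [10, 21, 33, 53, 54]
print(sum(ages) / len(ages))      # sample mean, about 34.2 years
print(sample_variance(ages))      # about 376.7 (years)^2, up to floating-point rounding
print(statistics.variance(ages))  # standard-library cross-check
```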
4
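A sample of 10 students scored the following grades: 40, 42, 35, 54, 57, 54, 46, 42, 54, 57.
(i) Find the sample mean, mode and median.
(ii) Compute the standard deviation.
Solution:
(i) Listing the scores in order: 35, 40, 42, 42, 46, 54, 54, 54, 57, 57
Mean: $\bar{x} = \frac{35 + 40 + 42 + 42 + 46 + 54 + 54 + 54 + 57 + 57}{10} = \frac{481}{10} = 48.1$
Mode = 54 (it occurs three times)
Median: since n = 10 is even, the median is the average of the 5th and 6th ordered values, $\tilde{x} = \frac{46 + 54}{2} = 50$
(ii) $s = \sqrt{\frac{1}{9}\left[(35 - 48.1)^2 + (40 - 48.1)^2 + \cdots + (57 - 48.1)^2\right]} = \sqrt{\frac{578.9}{9}} \approx 8.02$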
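As a cross-check, Python's standard `statistics` module computes all four quantities directly. `statistics.stdev` is the sample standard deviation, with divisor n − 1, which matches part (ii).

```python
import statistics

grades = [40, 42, 35, 54, 57, 54, 46, 42, 54, 57]

mean = statistics.mean(grades)      # 481 / 10
mode = statistics.mode(grades)      # most frequent value
median = statistics.median(grades)  # average of the 5th and 6th ordered values (n is even)
s = statistics.stdev(grades)        # sample standard deviation, divisor n - 1

print(mean, mode, median, round(s, 2))
```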
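Example 3: Using a Table (Alternative Method)
$x_i$                    2    5    6    8    9    Total
$x_i - \bar{x}$         -4   -1    0    2    3    0 (= sum of residuals)
$(x_i - \bar{x})^2$     16    1    0    4    9    30
Now, with n = 5 and $\bar{x} = 30/5 = 6$,
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{30}{4}} = \sqrt{7.5} \approx 2.74$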
Example 3 (Continued)
On the previous slide, we found that the standard deviation is $s = \sqrt{7.5} \approx 2.74$.
Therefore, the variance is $s^2 = 7.5$.
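The table method translates almost line for line into Python. The sketch below builds the residual row and the squared-residual row, confirms that the residuals sum to zero, and computes $s^2$ and $s$. The variable names are illustrative.

```python
import math

x = [2, 5, 6, 8, 9]
n = len(x)
mean = sum(x) / n                       # 30 / 5 = 6.0

deviations = [xi - mean for xi in x]    # residuals x_i - x_bar: -4, -1, 0, 2, 3
squares = [d ** 2 for d in deviations]  # squared residuals: 16, 1, 0, 4, 9

variance = sum(squares) / (n - 1)       # 30 / 4 = 7.5
std_dev = math.sqrt(variance)           # about 2.74

print(deviations, sum(deviations))      # the residuals always sum to zero
print(squares, sum(squares))
print(variance, round(std_dev, 2))
```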
More Measures of Dispersion
• The range is the numerical difference between the largest and the smallest values in a batch of data:
range = max − min
• The lower quartile, denoted by Q1, is the median of the lower half of the batch of data (the median of the values below the median of the data set).
• The upper quartile, denoted by Q3, is the median of the upper half of the batch of data (the median of the values above the median of the data set).
• The inter-quartile range is defined as Q3 − Q1.
• A box-plot is a diagram consisting of a box and whiskers. It displays the median, the quartiles, and the maximum and minimum values of a batch of data, as shown below.
• Box-plots are also used to compare two sets of data. In this case, two box-plots are drawn on an appropriate common scale.
[Figure: box-plot labelled with min, Q1, median, Q3 and max]
Example 1
• Q1 is the median of the values below the median 11.5, that is, the median of the values 4, 5, 6, 6, 7, 11. Since n = 6 (even),
$Q_1 = \frac{x_{n/2} + x_{n/2+1}}{2} = \frac{x_3 + x_4}{2} = \frac{6 + 6}{2} = 6$
Example 1 (Continued)
• Q3 is the median of the values above the median 11.5, that is, the median of the values 12, 14, 16, 20, 22, 29. Since n = 6 (even),
$Q_3 = \frac{x_{n/2} + x_{n/2+1}}{2} = \frac{x_3 + x_4}{2} = \frac{16 + 20}{2} = 18$
Below is the box-plot for this batch of data:
[Box-plot: min = 4, Q1 = 6, median = 11.5, Q3 = 18, max = 29]
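Below is a minimal Python sketch of the quartile rule used in Example 1, where Q1 and Q3 are the medians of the values below and above the median. The data set is assumed to be the twelve values listed in the two halves above. Since n = 12, the median 11.5 falls between 11 and 12. The helpers `median` and `five_number_summary` are our own. Other textbooks and software packages use slightly different quartile conventions, so their quartiles may differ from these.

```python
def median(values):
    """Middle value (odd n) or average of the two middle values (even n)."""
    v = sorted(values)
    n = len(v)
    mid = n // 2
    if n % 2 == 1:
        return v[mid]
    return (v[mid - 1] + v[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with Q1/Q3 as medians of the lower/upper halves."""
    v = sorted(values)
    n = len(v)
    half = n // 2
    lower = v[:half]             # values below the median
    upper = v[half + n % 2:]     # values above the median (middle value skipped when n is odd)
    return v[0], median(lower), median(v), median(upper), v[-1]


data = [4, 5, 6, 6, 7, 11, 12, 14, 16, 20, 22, 29]
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx)                # 4 6.0 11.5 18.0 29
print("range =", mx - mn, "IQR =", q3 - q1)
```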
Example 2
The table below gives the gross weekly earnings (in pounds), including overtime, of 20 actors working in a theatre (9 women and 11 men):
Women 221 272 334 361 372 399 415 456 510
Men 258 315 333 353 398 420 435 462 495 523 587
Example 2 (Continued)
[Figure: box-plots of the women's and men's earnings on a common scale]
Example 2 (Continued)
From the box-plots it is clear that the men’s earnings are higher than the women’s: all five values marked on the box-plots are higher for the men than for the women.
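The comparison can be checked numerically with a short Python sketch that computes both five-number summaries. It assumes the same quartile rule as in Example 1: Q1 and Q3 are the medians of the values below and above the median, and the middle value is excluded when n is odd. The helper `five_number_summary` is our own; `statistics.median` is from the standard library.

```python
from statistics import median

women = [221, 272, 334, 361, 372, 399, 415, 456, 510]
men = [258, 315, 333, 353, 398, 420, 435, 462, 495, 523, 587]


def five_number_summary(values):
    """(min, Q1, median, Q3, max); Q1 and Q3 are medians of the values below/above the median."""
    v = sorted(values)
    half = len(v) // 2
    lower = v[:half]
    upper = v[half + len(v) % 2:]  # skip the middle value when n is odd
    return v[0], median(lower), median(v), median(upper), v[-1]


w = five_number_summary(women)
m = five_number_summary(men)
print("women:", w)
print("men:  ", m)
print("men higher on all five values:", all(mi > wi for mi, wi in zip(m, w)))
```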
Example 3: Nicotine Data – Outliers or Extreme Values
The box-plot below is a representation of the data in the table above. Note the dots on the two sides of the plot, to the left of the lower whisker and to the right of the upper whisker. These dots are called “outliers” or “extreme values”.
Statistical Modeling: The Histogram
Statistical Modeling: The Histogram – Example: Battery Life (in Years)
Probability Distribution Function Corresponding to the Histogram of the Battery Life Example
Skewness of Data Distribution
• The distribution in (a) is said to be “right skewed” because it has a longer tail on the right side.
• The distribution in (b) is symmetric.
• The distribution in (c) is said to be “left skewed” because it has a longer tail on the left side.
Statistical Modeling: Scatter-Plot
Examples of Scatter Plots: Death Rates vs. Alcohol Consumption