
Course : STAT6171 – Basic Statistics

Numerical Descriptive
Measures

Session 3
Learning Objectives

1. Describe the basic concepts of descriptive and inferential statistics.
2. Calculate the statistical measurements related to descriptive and inferential statistics.
3. Use Microsoft Excel to do data analysis.

Bina Nusantara University 2


CENTRAL TENDENCY



Mean (average); Excel: =AVERAGE(…)

Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$



Excel: =SUM(…)/COUNT(…), where COUNT(…) gives n, the number of values (for example, the number of days observed).



Nutritional data about a sample of seven breakfast cereals (stored in Cereals)
includes the number of calories per serving (ranked: 80, 100, 100, 110, 130, 190, 200).

Compute the mean number of calories in these breakfast cereals.
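As a quick check, here is a minimal Python sketch of the same calculation. It uses the seven calorie values that appear in ranked form later in this session and mirrors the Excel formula =SUM(…)/COUNT(…). Python is simply our choice of tool here, not part of the course material.

```python
# Calories per serving for the seven cereals (the ranked values
# used later in this session).
calories = [80, 100, 100, 110, 130, 190, 200]

# Sample mean: the sum of the values divided by the number of values n,
# the same calculation as =SUM(...)/COUNT(...) in Excel.
n = len(calories)
mean = sum(calories) / n

print(f"n = {n}, mean = {mean}")  # n = 7, mean = 130.0
```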



The median is the middle value in an ordered array of data that has been
ranked from smallest to largest. Half the values are smaller than or equal
to the median, and half the values are larger than or equal to the median.
The median is not affected by extreme values, so you can use the
median when extreme values are present.
Requirement: the data must be ranked (sorted) first.



You compute the median by following one of two rules:
• Rule 1 If the data set contains an odd number of values, the
median is the measurement associated with the middle-ranked
value.
• Rule 2 If the data set contains an even number of values, the
median is the measurement associated with the average of the two
middle-ranked values.
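The two rules can be sketched in Python as follows. The function name `median` is our own illustrative choice. The cereal values are the ones ranked later in this session, and the four-value list is hypothetical, included only to exercise Rule 2.

```python
def median(values):
    """Median by the two rules above (the input need not be pre-sorted)."""
    ranked = sorted(values)          # the data must be ranked first
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Rule 1: odd n -> the middle-ranked value, rank (n + 1) / 2
        return ranked[mid]
    # Rule 2: even n -> average of the two middle-ranked values
    return (ranked[mid - 1] + ranked[mid]) / 2


calories = [80, 100, 100, 110, 130, 190, 200]
print(median(calories))             # 110 (4th of 7 ranked values)
print(median([80, 100, 110, 130]))  # 105.0 (hypothetical even-sized data)
```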



Nutritional data about a sample of seven breakfast cereals (stored in
Cereals) includes the number of calories per serving (see Example 3.1
on page 122). Compute the median number of calories in breakfast
cereals.



Excel: sort the data, or use =MODE(…)

The mode is the value in a set of data that appears most frequently.

A systems manager in charge of a company’s network keeps track of the
number of server failures that occur in a day. Determine the mode for the
following data, which represent the number of server failures per day for the
past two weeks:

Mode = the most frequent value.
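The server-failure data did not survive extraction, so this minimal sketch applies the mode to the cereal calorie values used elsewhere in this session. The helper name `modes` is hypothetical. It returns every value tied for the highest count, because a data set can have more than one mode. If every value appears exactly once, this sketch returns all of them, and you would arguably report that there is no mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (there may be several)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


calories = [80, 100, 100, 110, 130, 190, 200]
print(modes(calories))  # [100]
```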
The geometric mean measures the rate of change
of a variable over time.
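The formula for this slide did not survive extraction. The standard definition of the geometric mean of n positive values $X_1, \ldots, X_n$ is shown below, on the assumption that this is what the slide displayed:

```latex
X_G = \left( X_1 \times X_2 \times \cdots \times X_n \right)^{1/n}
```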



The geometric mean rate of return measures the mean percentage return
of an investment per time period. Equation (3.4) defines the geometric
mean rate of return.

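$R_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$ (3.4)

where $R_i$ is the rate of return in time period i and n is the number of time periods.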



$R_G = [(1 + 0.370) \times (1 + 0.035)]^{1/2} - 1 = 0.1908$ (Excel: =((1+37.0%)*(1+3.5%))^(1/2)-1)

The percentage change in the Russell 2000 Index of the stock prices
of 2,000 small companies was 37.0% in 2013 and 3.5% in 2014.
Compute the geometric rate of return.

The geometric mean rate of return in the Russell 2000 Index for the two years is 19.08% per year.
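The Russell 2000 calculation can be reproduced with a short Python sketch of Equation (3.4). The returns are entered as decimals (37.0% becomes 0.370). Note that the product of the growth factors must be taken before the 1/n root is applied.

```python
import math

# Annual percentage changes of the Russell 2000 Index, as decimals.
returns = [0.370, 0.035]  # 2013, 2014

# Equation (3.4): R_G = [(1 + R_1)(1 + R_2)...(1 + R_n)]^(1/n) - 1
n = len(returns)
growth = math.prod(1 + r for r in returns)  # product of growth factors
r_g = growth ** (1 / n) - 1

print(f"R_G = {r_g:.2%}")  # R_G = 19.08%
```

For comparison, the arithmetic mean of the two returns is 20.25%, which overstates the compounded annual rate.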



VARIATION AND SHAPE



The range is the simplest numerical descriptive
measure of variation in a set of data.

Range = $X_{largest} - X_{smallest}$ (3.5), i.e., the highest value minus the lowest value.


If the data are not ranked, use =MAX(…)-MIN(…) in Excel.

Nutritional data about a sample of seven breakfast cereals (stored in
Cereals) includes the number of calories per serving (see Example 3.1
on page 122). Compute the range of the number of calories for the
cereals.

Solution Ranked from smallest to largest, the calories for the seven cereals
are
80 100 100 110 130 190 200

Therefore, using Equation (3.5), the range = 200 - 80 = 120. The largest
difference in the number of calories between any two cereals is 120.
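Because the range only needs the largest and smallest values, the data do not have to be ranked first. The Python sketch below deliberately uses an arbitrary unsorted order of the same seven values, which mirrors =MAX(…)-MIN(…) in Excel.

```python
# Same seven calorie values, in an arbitrary (unsorted) order.
calories = [130, 80, 190, 100, 200, 110, 100]

# Range = largest value - smallest value; no sorting required.
data_range = max(calories) - min(calories)
print(data_range)  # 120
```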



The sample variance is the sum of the squared differences
around the mean divided by the sample size minus one.



The sample standard deviation is the square root of
the sum of the squared differences around the mean
divided by the sample size minus one.



• Step 1 Calculate the difference between each value and the
mean.
• Step 2 Square each difference.
• Step 3 Sum the squared differences.
• Step 4 Divide this total by n - 1 to compute the sample
variance.
• Step 5 Take the square root of the sample variance to
compute the sample standard deviation.
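The five steps map one-to-one onto the following Python sketch, applied here to the cereal calorie data. Each intermediate result is kept in its own variable so that it can be compared with a hand calculation.

```python
import math

calories = [80, 100, 100, 110, 130, 190, 200]
n = len(calories)
mean = sum(calories) / n  # 130.0

# Step 1: difference between each value and the mean
diffs = [x - mean for x in calories]
# Step 2: square each difference
squared = [d ** 2 for d in diffs]
# Step 3: sum the squared differences
ss = sum(squared)
# Step 4: divide by n - 1 to get the sample variance
variance = ss / (n - 1)
# Step 5: square root of the variance gives the sample standard deviation
std_dev = math.sqrt(variance)

print(ss, variance, round(std_dev, 4))  # 13200.0 2200.0 46.9042
```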



Excel: take =SQRT(…) of the sample variance to get the sample standard deviation.



Nutritional data about a sample of seven breakfast cereals (stored in Cereals)
includes the number of calories per serving (see Example 3.1 on page
122). Compute the variance and standard deviation of the calories in the
cereals.



Excel: n = COUNT(…), mean = AVERAGE(…)

The standard deviation of 46.9042 indicates that the calories in the cereals cluster within
±46.9042 around the mean of 130 (i.e., between $\bar{X} - 1S = 83.0958$ and $\bar{X} + 1S =
176.9042$). In fact, 57.1% (four out of seven) of the calorie values lie within this interval.



The coefficient of variation is equal to the standard
deviation divided by the mean, multiplied by 100%.
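A minimal Python sketch of the coefficient of variation for the cereal calories follows. It uses the standard library's `statistics.stdev`, which divides by n - 1, as the sample standard deviation does. The sugar data are not reproduced in this section, so only the calorie CV is shown.

```python
from statistics import mean, stdev

calories = [80, 100, 100, 110, 130, 190, 200]

# CV = (S / X_bar) * 100%, a unit-free measure of relative variation.
cv = stdev(calories) / mean(calories) * 100
print(f"CV = {cv:.2f}%")  # CV = 36.08%
```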



• Which varies more from cereal to cereal—the number of calories or the amount
of sugar (in grams)?
Because calories and the amount of sugar have different units of
measurement, you need to compare the relative variability in the two
measurements.

For calories, using the mean and standard deviation computed in Examples 3.1 and 3.6 on
pages 122 and 128, the coefficient of variation is CV = (46.9042 / 130) × 100% = 36.08%.



Level of variation (relative variability)



The Z score of a value is the difference between that value and the mean, divided
by the standard deviation. A Z score of 0 indicates that the value is the same as the
mean. If a Z score is a positive or negative number, it indicates whether the value
is above or below the mean and by how many standard deviations.

Z scores help identify outliers, the values that seem excessively different from
most of the rest of the values (see Section 1.4). Values that are very different from
the mean will have either very small (negative) Z scores or very large (positive) Z
scores. As a general rule, a Z score that is less than -3.0 or greater than +3.0
indicates an outlier value.



The Z score for a value is equal to the difference between the value
and the mean, divided by the standard deviation:

$Z = \frac{X - \bar{X}}{S}$ (3.9)



Nutritional data about a sample of seven breakfast cereals (stored in Cereals)
includes the number of calories per serving (see Example 3.1 on page 122).
Compute the Z scores of the calories in breakfast cereals.
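The Z scores follow directly from Equation (3.9). This Python sketch uses the cereal data and flags any value whose Z score falls beyond ±3.0, the rule of thumb given above for outliers.

```python
from statistics import mean, stdev

calories = [80, 100, 100, 110, 130, 190, 200]
x_bar = mean(calories)   # 130
s = stdev(calories)      # sample standard deviation, about 46.9042

# Equation (3.9): Z = (X - X_bar) / S
z_scores = [(x - x_bar) / s for x in calories]

for x, z in zip(calories, z_scores):
    flag = "outlier" if abs(z) > 3.0 else ""
    print(f"{x:>4}  Z = {z:7.4f}  {flag}")
```

None of the Z scores lies beyond ±3.0. The largest is about 1.49, for 200 calories, so by this rule there are no outliers.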



Skewness measures the extent to which the data values are not
symmetrical around the mean. The three possibilities are:

• Mean < median: negative, or left-skewed distribution
• Mean = median: symmetrical distribution (zero skewness)
• Mean > median: positive, or right-skewed distribution

(Figure labels, left to right: mostly good; normal; mostly poor.)
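The mean-versus-median comparison above can be sketched in Python. The `sample_skewness` helper uses the adjusted formula n / ((n - 1)(n - 2)) × Σ((X - X̄)/S)³. We assume this matches Excel's SKEW function, but treat that correspondence as an assumption rather than a guarantee.

```python
from statistics import mean, median, stdev


def sample_skewness(values):
    """Adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    x_bar, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)


calories = [80, 100, 100, 110, 130, 190, 200]
x_bar, med = mean(calories), median(calories)  # mean 130 > median 110

if x_bar > med:
    shape = "right-skewed (positive)"
elif x_bar < med:
    shape = "left-skewed (negative)"
else:
    shape = "symmetrical"

print(shape, round(sample_skewness(calories), 4))
```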


• Kurtosis measures the peakedness of the curve of the distribution—that
is, how sharply the curve rises approaching the center of the distribution.
• A distribution that has a sharper-rising center peak than the peak of a
normal distribution has positive kurtosis, a kurtosis value that is greater
than zero, and is called leptokurtic.
• A distribution that has a slower-rising (flatter) center peak than the peak
of a normal distribution has negative kurtosis, a kurtosis value that is
less than zero, and is called platykurtic.
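A minimal Python sketch of sample excess kurtosis follows. In this measure, 0 corresponds to the normal distribution. The adjusted formula is the one we believe Excel's KURT uses, which is an assumption. The `describe` helper is hypothetical and simply maps the sign of the result to the terms above.

```python
from statistics import mean, stdev


def excess_kurtosis(values):
    """Adjusted sample excess kurtosis (assumed to match Excel's KURT)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four values")
    x_bar, s = mean(values), stdev(values)
    fourth = sum(((x - x_bar) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def describe(k):
    """Classify the peak relative to a normal distribution."""
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "same peakedness as a normal distribution"


calories = [80, 100, 100, 110, 130, 190, 200]
k = excess_kurtosis(calories)
print(round(k, 2), describe(k))
```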



LEARNING THE BASICS

3.1 The following set of data is from a sample of n = 5:

7 4 9 8 2
a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and
coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.



LEARNING THE BASICS



APPLYING THE CONCEPTS



EXPLORING NUMERICAL DATA



Quartiles split a set of data into four equal parts.

• The first quartile, Q1, divides the smallest 25.0% of the values
from the other 75.0% that are larger.
• The second quartile, Q2, is the median – 50.0% of the values
are smaller than the median and 50% are larger.
• The third quartile, Q3, divides the smallest 75.0% of the
values from the largest 25.0%.



Q2 is the same as the median.



Rules for Calculating the Quartiles from a Set of Ranked Values

Rule 1
If the ranked value is a whole number, the quartile is equal to the
measurement that corresponds to that ranked value. For example, if
the sample size n = 7, the first quartile, Q1, is equal to the
measurement associated with the (7 + 1) / 4 = second ranked
value.



Rules for Calculating the Quartiles from a
Set of Ranked Values
Rule 2
If the ranked value is a fractional half (2.5, 4.5, etc.), the quartile is
equal to the measurement that corresponds to the average of the
measurements corresponding to the two ranked values involved.
For example, if the sample size n = 9, the first quartile, Q1, is equal
to the (9 + 1) / 4 = 2.5 ranked value, halfway between the
second ranked value and the third ranked value.



Rules for Calculating the Quartiles from a Set
of Ranked Values

Rule 3
If the ranked value is neither a whole number nor a fractional half, you
round the result to the nearest integer and select the measurement
corresponding to that ranked value. For example, if the sample size n =
10, the first quartile, Q1, is equal to the (10 + 1) / 4 = 2.75 ranked
value. Round 2.75 to 3 and use the third ranked value.
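The three rules can be sketched in Python as follows. The sketch assumes the ranked position q(n + 1)/4 for Q1 (q = 1) and Q3 (q = 3), as in the examples above. The function name `quartile` is our own. Excel's =QUARTILE (like QUARTILE.INC) interpolates between ranked values rather than applying these rounding rules, so it can give different answers. For the cereal data it should return 160 for Q3 (halfway between 130 and 190) rather than 190.

```python
import math


def quartile(values, q):
    """Return Q1 (q=1) or Q3 (q=3) using the ranked-value rules above.

    The 1-based ranked position is q * (n + 1) / 4.
    """
    if q not in (1, 3):
        raise ValueError("q must be 1 or 3")
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    pos = q * (n + 1) / 4
    whole = math.floor(pos)
    frac = pos - whole
    if frac == 0:
        # Rule 1: whole-number position -> that ranked value
        return ranked[whole - 1]
    if frac == 0.5:
        # Rule 2: fractional half -> average of the two neighbouring ranks
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Rule 3: any other fraction (.25 or .75) -> round to the nearest rank
    return ranked[round(pos) - 1]


calories = [80, 100, 100, 110, 130, 190, 200]
q1, q3 = quartile(calories, 1), quartile(calories, 3)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")  # Q1 = 100, Q3 = 190, IQR = 90
```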



Excel: =QUARTILE(…;1) for Q1 and =QUARTILE(…;3) for Q3

Nutritional data about a sample of seven breakfast cereals (stored in Cereals)
includes the number of calories per serving (see Example 3.1 on page 122).
Compute the first quartile (Q1) and third quartile (Q3) of the number of calories for
the cereals.
Ranked from smallest to largest, the calorie values for the seven cereals are as
follows:

80 100 100 110 130 190 200

Q1 = 100 (second ranked value), Q2 = median = 110 (fourth ranked value), Q3 = 190 (sixth ranked value)



The interquartile range (also called midspread) is the
difference between the third and first quartiles in a set
of data:

Interquartile range = Q3 - Q1



Nutritional data about a sample of seven breakfast cereals (stored in Cereals )
includes the number of calories per serving (see Example 3.1 on page 122).
Compute the interquartile range of the number of calories in cereals.
Solution
Ranked from smallest to largest, the number of calories for the seven cereals are
as follows:

80 100 100 110 130 190 200

Using Equation (3.12) and the earlier results from Example 3.11 on page 138, Q1
= 100 and Q3 = 190:
Interquartile range = 190 - 100 = 90

Therefore, the interquartile range of the number of calories in cereals is 90 calories.
The five-number summary for a variable consists of the smallest value
(Xsmallest), the first quartile, the median, the third quartile, and the largest
value (Xlargest).



Nutritional data about a sample of seven breakfast cereals (stored in Cereals)
includes the number of calories per serving (see Example 3.1 on page 122).
Compute the five-number summary of the number of calories in cereals.

X smallest 80
Q1 100
Median 110
Q3 190
X largest 200



The boxplot uses a five-number summary to visualize the shape of the
distribution for a variable. Figure 3.4 contains a boxplot for the sample
of 10 times to get ready in the morning.

Figure 3.4 Boxplot for the getting-ready times



In the More Descriptive Choices scenario, you are interested in comparing the
past performance of the growth and value funds from a sample of 407 funds.
One measure of past performance is the one-year return percentage variable.
Construct the boxplots for this variable for the growth and value funds.



Figure 3.6 demonstrates the relationship between the boxplot and the
density curve for four different types of distributions. The area under
each density curve is split into quartiles corresponding to the five-
number summary for the boxplot.



APPLYING THE CONCEPTS

The file CD Rate contains the yields for one-year CDs and five-
year CDs, for 5 banks in the United States, as of April 15, 2015.

For each type of account:


a. Compute the first quartile (Q1), the third quartile (Q3), and
the interquartile range.
b. List the five-number summary.
c. Construct a boxplot and describe its shape.



Numerical Descriptive
Measures for a Population



The population mean is the sum of the values in the population
divided by the population size, N. This parameter, represented by the
Greek lowercase letter mu, μ, serves as a measure of central tendency.
Equation (3.13) defines the population mean:

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (3.13)



The population variance and the population standard deviation parameters
measure variation in a population. The population variance is the sum of
the squared differences around the population mean divided by the
population size, N, and the population standard deviation is the square
root of the population variance.
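The only computational difference between the population and sample versions is the divisor: N for a population and n - 1 for a sample. This corresponds to Excel's VAR.P versus VAR.S. The Python sketch below treats the seven cereal values as if they were an entire population, purely for illustration.

```python
import math


def population_variance(values):
    """Sum of squared differences around mu, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared differences around X_bar, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [80, 100, 100, 110, 130, 190, 200]  # treated as a whole population here
sigma2 = population_variance(data)
sigma = math.sqrt(sigma2)                  # population standard deviation
print(round(sigma2, 4), round(sigma, 4), sample_variance(data))
```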



The empirical rule states that for population data from a symmetric
mound-shaped distribution such as the normal distribution, the
following are true:
• Approximately 68% of the values are within ±1 standard
deviation from the mean.
• Approximately 95% of the values are within ±2 standard
deviations from the mean.
• Approximately 99.7% of the values are within ±3 standard
deviations from the mean.



Empirical rule
A population of 2-liter bottles of cola is known to have a mean fill-weight of
2.06 liters and a standard deviation of 0.02 liter. The population is known to
be bell-shaped. Describe the distribution of fill-weights. Is it very likely that a
bottle will contain less than 2 liters of cola?

Using the empirical rule, you can see that approximately 68% of the bottles
will contain between 2.04 and 2.08 liters, approximately 95% will contain
between 2.02 and 2.10 liters, and approximately 99.7% will contain between
2.00 and 2.12 liters. Therefore, it is highly unlikely that a bottle will contain
less than 2 liters.
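The intervals in this example are simply μ ± kσ for k = 1, 2, 3. The Python sketch below recomputes them from the stated population values (μ = 2.06 liters, σ = 0.02 liter). The percentages are the empirical rule's approximations for bell-shaped data, not something the code derives.

```python
# Population values from the cola example, in liters.
mu, sigma = 2.06, 0.02

# Approximate coverage stated by the empirical rule for bell-shaped data.
coverage = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {}
for k, share in coverage.items():
    low, high = mu - k * sigma, mu + k * sigma
    intervals[k] = (low, high)
    print(f"about {share} of bottles between {low:.2f} and {high:.2f} liters")
```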



For heavily skewed sets of data and data sets that do not appear to be
normally distributed, you should use Chebyshev’s theorem instead of
the empirical rule. Chebyshev’s theorem (see reference 2) states that
for any data set, regardless of shape, the percentage of values that are
found within distances of k standard deviations from the mean must be
at least

$\left(1 - \frac{1}{k^2}\right) \times 100\%$

You can use this rule for any value of k greater than 1.



Not normal (Chebyshev's theorem) vs. normal (empirical rule)



As in Example 3.15, a population of 2-liter bottles of cola is known to have
a mean fill-weight of 2.06 liters and a standard deviation of 0.02 liter.
However, the shape of the population is unknown, and you cannot assume
that it is bell-shaped. Describe the distribution of fill-weights. Is it very likely
that a bottle will contain less than 2 liters of cola?
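Chebyshev's bound, (1 - 1/k²) × 100%, can be evaluated for this example with a short Python sketch. The helper name `chebyshev_min_share` is hypothetical. It applies the theorem as stated above, for k > 1 and any distribution shape.

```python
def chebyshev_min_share(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


mu, sigma = 2.06, 0.02  # liters, from the example above
for k in (2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    share = chebyshev_min_share(k)
    print(f"at least {share:.2%} of bottles between {low:.2f} and {high:.2f} liters")
```

Because Chebyshev's theorem only guarantees at least 88.89% of bottles between 2.00 and 2.12 liters, up to 11.11% could fall outside that interval. Without the bell-shaped assumption, you therefore cannot conclude that a bottle containing less than 2 liters is highly unlikely.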



APPLYING THE CONCEPTS



The Covariance and the
Coefficient of Correlation



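The covariance measures the strength of the linear relationship
between two numerical variables (X and Y), so it always examines two variables
at once. Equation (3.16) defines the sample covariance, and Example 3.17
illustrates its use:

$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$ (3.16)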
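The NBA revenue and value data are not reproduced in this section, so the following Python sketch of the sample covariance uses small hypothetical paired data. The function name `sample_covariance` is our own.

```python
def sample_covariance(x, y):
    """Sample covariance: sum((X_i - X_bar)(Y_i - Y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with n >= 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


# Hypothetical paired observations (not the NBA data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))  # 1.5
```

A positive covariance indicates that X and Y tend to move in the same direction. Its size depends on the units of X and Y, which is why the coefficient of correlation is used to judge relative strength.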



NBA team revenue and NBA values were used to construct a scatter
plot that showed the relationship between those two variables. Now,
you want to measure the association between the annual revenue
and value of a team by determining the sample covariance.



• The coefficient of correlation measures the relative
strength of a linear relationship between two numerical
variables.
• The values of the coefficient of correlation range from -1
for a perfect negative correlation to +1 for a perfect positive
correlation.
• Perfect in this case means that if the points were plotted on
a scatter plot, all the points could be connected with a
straight line.
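The sample coefficient of correlation is the covariance divided by the product of the two sample standard deviations, r = cov(X, Y) / (S_X S_Y). The Python sketch below uses hypothetical data. It also checks the perfect cases described above: points on a rising straight line give +1, and points on a falling straight line give -1.

```python
import math


def correlation(x, y):
    """Sample coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with n >= 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)


x = [1, 2, 3, 4, 5]                              # hypothetical X values
r_mixed = correlation(x, [2, 4, 5, 4, 5])        # hypothetical Y values
r_pos = correlation(x, [2 * v + 1 for v in x])   # points on a rising line
r_neg = correlation(x, [10 - v for v in x])      # points on a falling line
print(round(r_mixed, 4), round(r_pos, 4), round(r_neg, 4))  # 0.7746 1.0 -1.0
```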



When dealing with population data for two numerical variables, the
Greek letter ρ (rho) is used as the symbol for the coefficient of
correlation. Figure 3.8 illustrates three different types of association
between two variables.

[Figure 3.8: scatter plots, left to right: negative relationship; no relationship; positive relationship]


Enable the add-in: File ➔ Options ➔ Add-ins ➔ Analysis ToolPak

Then: Data ➔ Data Analysis ➔ Descriptive Statistics



APPLYING THE CONCEPTS



APPLYING THE CONCEPTS

College football is big business, with coaches’ pay in millions of dollars.


The file College Football contains the 2013 total athletic department
revenue and 2014 football coaches’ total pay for 108 schools. (Data
extracted from usat.ly/1uws90v and usat.ly/1DOl0v3.)
a. Compute the covariance.
b. Compute the coefficient of correlation.
c. Based on (a) and (b), what conclusions can you reach about the
relationship between coaches’ total pay and athletic department
revenues?
Excel Guide



Analysis ToolPak Use Descriptive Statistics.
For the example, open to the DATA worksheet of the Times
workbook and:
1. Select Data ➔ Data Analysis.
2. In the Data Analysis dialog box, select Descriptive Statistics from the
Analysis Tools list and then click OK.

In the Descriptive Statistics dialog box (shown below):
1. Enter A1:A11 as the Input Range.
Click Columns and check Labels in first
row.
2. Click New Worksheet Ply and check
Summary statistics, Kth Largest, and
Kth Smallest.
3. Click OK.



1. Input your data.
2. Choose Data Analysis – Descriptive Statistics – OK.



Descriptive Statistics



Coefficient of Correlation



1. Input your data.
2. Choose Insert - Scatter



You can add a trendline by right-clicking one of the data points
(the “dots”), choosing Add Trendline, and then choosing
the line type.



To compute the correlation coefficient:
Choose Data Analysis – Correlation - OK



Enter the information needed to compute the
coefficient of correlation.



To compute the covariance:
Choose Data Analysis – Covariance - OK



Enter the information needed to compute the
covariance.



David M. Levine, David F. Stephan, Kathryn A. Szabat. Statistics for
Managers Using Microsoft Excel, 8th ed. Pearson.
Thank You

