
NCP_1_MBA_2020_2022

By Subhodip Pal (MBA 19-21)

STATISTICS
 Statistics is a form of mathematical analysis that uses quantified
models, representations and synopses for a given set of
experimental data or real-life studies.
 Statistics studies methodologies to gather, review, analyze and
draw conclusions from data.

 Some statistical measures include the following:


1. Mean
2. Regression analysis
3. Skewness
4. Kurtosis
5. Variance
6. Analysis of variance



BUSINESS STATISTICS
 Business statistics takes the data analysis tools from
elementary statistics and applies them to business.
 For example, estimating the probability of a defect
coming off a factory line, or seeing where sales are
headed in the future.



ANALYTICS
 Analytics is the scientific process of discovering and
communicating the meaningful patterns which can be
found in data.
 It is concerned with turning raw data into insight for
making better decisions.
 Analytics relies on the application of statistics,
computer programming, and operations research in
order to quantify data and gain insight into its
meaning.
 It is especially useful in areas which record a lot of data
or information.
BUSINESS ANALYTICS
 Business analytics (BA) is the iterative, methodical
exploration of an organization's data, with an emphasis on
statistical analysis.
 Business analytics is used by companies that are committed
to making data-driven decisions.
 Data-driven companies treat their data as a corporate asset
and actively look for ways to turn it into a competitive
advantage.
 Successful business analytics depends on data quality,
skilled analysts who understand the technologies and the
business, and an organizational commitment to using data
to gain insights that inform business decisions.



VARIABLE, ELEMENT, DATA SET,
OBSERVATION
 Elements are the entities on which data are collected.
 A variable is a characteristic of interest for the
elements.
 The set of measurements obtained for a particular
element is called an observation.
 A data set with n elements contains n observations.
 The total number of data values in a complete data set
is the number of elements multiplied by the number
of variables.



Data, Data Sets, Elements, Variables, and
Observations

Company        Stock Exchange   Annual Sales ($M)   Earnings per share ($)
Dataram        NQ               73.10               0.86
EnergySouth    N                74.00               1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40              0.33
Psychemedics   N                17.60               0.13

The company names are the elements, the three columns to their right are
the variables, each row is an observation, and the table as a whole is
the data set.
EVOLUTION OF BUSINESS ANALYTICS



BUSINESS ANALYTICS FRAMEWORK





Scales of Measurement
 Scales of measurement include
 Nominal
 Ordinal
 Interval
 Ratio
 The scale determines the amount of information
contained in the data.
 The scale indicates the data summarization and
statistical analyses that are most appropriate.



Scales of Measurement
Nominal scale
 Data are labels or names used to identify an attribute of the element.
 A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by the school in which they are
enrolled using a nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1
denotes Business, 2 denotes Humanities, 3 denotes Education, and so
on).



Scales of Measurement
Ordinal scale
 The data have the properties of nominal data and the order
or rank of the data is meaningful.
 A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by their class standing
using a nonnumeric label such as Freshman, Sophomore,
Junior, or Senior.
Alternatively, a numeric code could be used for the class
standing variable (e.g. 1 denotes Freshman, 2 denotes
Sophomore, and so on).



Scales of Measurement
Interval scale
 The data have the properties of ordinal data, and the
interval between observations is expressed in terms of
a fixed unit of measure.
 Interval data are always numeric.

Example
Melissa has an SAT score of 1985, while Kevin has an SAT
score of 1880. Melissa scored 105 points more than
Kevin.



Scales of Measurement
Ratio scale
 Data have all the properties of interval data and the ratio
of two values is meaningful.
 Ratio data are always numerical.
 A zero value is included in the scale and means that
none of the quantity is present.

Example:
The price of a book at a retail store is $200, while the price of
the same book sold online is $100. The ratio property
shows that the retail store charges twice the online price.



Quantitative Data
 Quantitative data indicate how many or how much.
 Quantitative data are always numeric.
 Ordinary arithmetic operations are meaningful for
quantitative data.
 Examples include:
I. How many people took part?
II. How much did it cost?
III. How long did it run for?
IV. What was the average attendance at each programme session?



Categorical Data
 Labels or names are used to identify an attribute of
each element
 Often referred to as qualitative data
 Use either the nominal or ordinal scale of
measurement
 Can be either numeric or nonnumeric



Qualitative Data
 Qualitative data is information about qualities; it
cannot be counted.
 That is, it’s information about how people feel about
something.
 Examples include:
I. Sharing what people like about a programme.
II. How they think it could be improved.
III. What difference it has made to their lives.
IV. Whether they would recommend the programme to
others.



Cross-Sectional Data
Cross-sectional data are collected at the same or
approximately the same point in time.

Example
Data detailing the number of building permits issued in
November 2013 in each of the counties of Ohio.



Time Series Data
Time series data are collected over several time periods.

Example
Data detailing the number of building permits issued in
Lucas County, Ohio in each of the last 36 months.

Graphs of time series data help analysts
 understand what happened in the past,
 identify any trends over time, and
 project future levels for the time series.
Longitudinal Data
 Longitudinal data, sometimes referred to as panel
data, track the same sample at different points in time.
 The sample can consist of individuals, households,
establishments, and so on.
 In contrast, repeated cross-sectional data, which also
provides long-term data, gives the same survey to
different samples over time.



WHAT IS PRIMARY DATA?
 Primary data is the kind of data that is collected
directly from the data source without going through
any existing sources. It is mostly collected specially for
a research project and may be shared publicly to be
used for other research.
 A common example of primary data is the data
collected by organizations during market research,
product research, and competitive analysis. This data
is collected directly from its original source, which in
most cases is the existing and potential customers.



PROS – Primary Data
 Primary data is specific to the needs of the researcher at the
moment of data collection. The researcher is able to control the
kind of data that is being collected.
 It tends to be more accurate than secondary data. Because it has not
passed through another party, it is less exposed to that party's personal
bias and its authenticity is easier to trust.
 The researcher owns the data collected through
primary research. He or she may choose to make it available
publicly, patent it, or even sell it.
 Primary data is usually up to date because it is collected in real
time rather than drawn from old sources.
 The researcher has full control over the data collected through
primary research and can decide which design, method, and data
analysis techniques to use.



CONS – Primary Data
 Primary data is very expensive compared to secondary
data. Therefore, it might be difficult to collect primary
data.
 It is time-consuming.
 It may not be feasible to collect primary data in some
cases due to its complexity and required commitment.



What is Secondary Data?
 Secondary data is the data that has been collected in
the past by someone else but made available for others
to use.
 It was usually primary data once, but becomes
secondary when used by a third party.
 For example, when writing a research thesis,
researchers need to consult past work done in the
field and add its findings to the literature review. Other
material, such as definitions and theorems, is also
secondary data; when added to the thesis, it must be
properly referenced and cited.



PROS – Secondary Data
 Secondary data is easily accessible compared to primary
data. Secondary data is available on different platforms that
can be accessed by the researcher.
 Secondary data is very affordable. It costs little or
nothing to acquire, and is sometimes given out
for free.
 The time spent on collecting secondary data is usually very
little compared to that of primary data.
 Secondary data makes it possible to carry out longitudinal
studies without having to wait for a long time to draw
conclusions.
 It helps to generate new insights into existing primary data.
CONS – Secondary Data
 Secondary data may not be authentic and reliable. A
researcher may need to further verify the data
collected from the available sources.
 Researchers may have to deal with irrelevant data
before finally finding the required data.
 Some of the data may be exaggerated due to the personal
bias of the data source.
 Secondary data sources are sometimes outdated
with no new data to replace the old ones.



POPULATION vs SAMPLE
 The main difference between a population and sample
has to do with how observations are assigned to the
data set.
 A population includes all of the elements from a set
of data.
 A sample consists of one or more observations drawn
from the population.



 A measurable characteristic of a population, such
as a mean or standard deviation, is called a
parameter; but a measurable characteristic of a
sample is called a statistic.
 We will see in future lessons that the mean of a
population is denoted by the symbol μ; but the
mean of a sample is denoted by the symbol $\bar{x}$.



Descriptive Statistics
 Most of the statistical information in newspapers,
magazines, company reports, and other publications
consists of data that are summarized and presented in
a form that is easy to understand.
 Such summaries of data, which may be tabular,
graphical, or numerical, are referred to as descriptive
statistics.



Descriptive Statistics: Tabular and Graphical
Displays
 Summarizing Data for a Categorical Variable
 Categorical data use labels or names to identify categories
of like items.
 Summarizing Data for a Quantitative Variable
 Quantitative data are numerical values that indicate how
much or how many.



Summarizing Categorical Data
 Frequency Distribution
 Relative Frequency Distribution
 Percent Frequency Distribution
 Bar Chart
 Pie Chart



Frequency Distribution

• A frequency distribution is a tabular summary of data
showing the number (frequency) of observations in
each of several non-overlapping categories or classes.
• The objective is to provide insights about the data that
cannot be quickly obtained by looking only at the
original data.



Frequency Distribution
Example: Marada Inn
 Guests staying at Marada Inn were asked to rate the quality
of their accommodations as being excellent, above average,
average, below average, or poor.
 The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average



Frequency Distribution
• Example: Marada Inn
Rating           Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total            20



Relative Frequency Distribution
• The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
$$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n}$$

• A relative frequency distribution is a tabular
summary of data showing the relative frequency for
each class.



Percent Frequency Distribution
• The percent frequency of a class is the relative
frequency multiplied by 100.
• A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.



Relative Frequency and Percent Frequency
Distributions
• Example: Marada Inn

Rating           Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                  5
Total            1.00                 100
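
The three Marada Inn distributions can be reproduced from the raw ratings with a short script. This is a minimal sketch, not part of the original slides; the variable names (`ratings`, `scale`, `frequency`, `relative`, `percent`) are illustrative assumptions.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Display order of the rating scale, from worst to best.
scale = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
n = len(ratings)

frequency = {r: counts[r] for r in scale}          # number of observations
relative = {r: counts[r] / n for r in scale}       # frequency / n
percent = {r: 100 * counts[r] / n for r in scale}  # relative frequency x 100

for r in scale:
    print(f"{r:<14} {frequency[r]:>3} {relative[r]:>6.2f} {percent[r]:>5.0f}")
```

Each printed row holds the rating, its frequency, its relative frequency, and its percent frequency.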


Bar Chart
• A bar chart is a graphical display for depicting
qualitative data.
• On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the vertical
axis).
• Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
• The bars are separated to emphasize the fact that each
class is a separate category.



Bar Chart
[Figure: bar chart titled "Marada Inn Quality Ratings". Horizontal axis: Quality Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, scaled from 1 to 10.]



Pie Chart
• The pie chart is a commonly used graphical display
for presenting relative frequency and percent
frequency distributions for categorical data.
• First draw a circle; then use the relative frequencies to
subdivide the circle into sectors that correspond to the
relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) = 90
degrees of the circle.
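
The sector arithmetic can be checked for every class at once. The sketch below is illustrative only; it uses the Marada Inn relative frequencies, and the names `relative_frequency` and `angles` are assumptions.

```python
# Relative frequencies of the Marada Inn ratings.
relative_frequency = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class takes a share of the 360 degrees equal to its relative frequency.
angles = {label: rf * 360 for label, rf in relative_frequency.items()}

for label, angle in angles.items():
    print(f"{label:<14} {angle:6.1f} degrees")
```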



Pie Chart
[Figure: pie chart titled "Marada Inn Quality Ratings". Sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]



Summarizing Quantitative Data
 Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Dot Plot
• Histogram
• Cumulative Distributions
• Stem-and-Leaf Display



Frequency Distribution
 Guidelines for Determining the Width of Each Class
• Use classes of equal width.
• Approximate class width:
$$\text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$$
• Making the classes the same width reduces the
chance of inappropriate interpretations.



Frequency Distribution
• Example: Hudson Auto Repair
Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73



Frequency Distribution
• Example: Hudson Auto Repair
If we choose six classes:
Approximate Class Width = (109 − 52)/6 = 9.5 ≈ 10
(109 and 52 are the largest and smallest values in the sample.)

Parts Cost ($) Frequency


50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
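
The class-width guideline and the resulting counts can be verified directly from the 50 parts costs. This is a sketch under stated assumptions: the first class is taken to start at 50 and the width is rounded up to 10, matching the table; the names `costs`, `width`, and `lower` are illustrative.

```python
# Parts cost ($) for the 50 tune-ups in the Hudson Auto Repair sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6

width = 10  # approximate width rounded up to a convenient value
lower = 50  # assumed lower limit of the first class

# Count how many costs fall in each class: 50-59, 60-69, ..., 100-109.
counts = [0] * num_classes
for c in costs:
    counts[(c - lower) // width] += 1

for i, freq in enumerate(counts):
    start = lower + i * width
    print(f"{start}-{start + width - 1}: {freq}")
```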



Dot Plot
• One of the simplest graphical summaries of data is a dot
plot.
• A horizontal axis shows the range of data values.
• Then each data value is represented by a dot placed
above the axis.



Dot Plot
• Example: Hudson Auto Repair

[Figure: dot plot of the 50 parts costs. Horizontal axis: Tune-up Parts Cost ($), from 50 to 110.]



Histogram
• Another common graphical display of quantitative
data is a histogram.
• The variable of interest is placed on the horizontal
axis.
• A rectangle is drawn above each class interval with its
height corresponding to the interval’s frequency,
relative frequency, or percent frequency.
• Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.



Histogram
• Example: Hudson Auto Repair
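[Figure: histogram titled "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), with one rectangle for each of the six classes of the frequency distribution. Vertical axis: Frequency, scaled from 2 to 18.]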



Scatter Diagram and Trendline
• A scatter diagram is a graphical presentation of the
relationship between two quantitative variables.
• One variable is shown on the horizontal axis and the
other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the
overall relationship between the variables.
• A trendline provides an approximation of the
relationship.



Scatter Diagram
 A Positive Relationship



Scatter Diagram
 A Negative Relationship



Scatter Diagram
 No Apparent Relationship



Scatter Diagram
 Example: Panthers Football Team
The Panthers football team is interested in investigating
the relationship, if any, between interceptions made and
points scored.
x = Number of Interceptions    y = Number of Points Scored
1                              14
3                              24
2                              18
1                              17
3                              30
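
The slides do not say how a trendline is fitted. One common choice is the least-squares line, and the sketch below assumes that method for the Panthers data. The names `slope` and `intercept` are illustrative.

```python
# Panthers data: interceptions made (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (an assumed fitting method).
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope agrees with the positive relationship visible in the scatter diagram: more interceptions go with more points scored.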



Scatter Diagram and Trendline
[Figure: scatter diagram with trendline. Horizontal axis: x = Number of Interceptions, from 0 to 4. Vertical axis: y = Number of Points Scored, from 0 to 35.]



Side-by-Side Bar Chart
• A side-by-side bar chart is a graphical display for
depicting multiple bar charts on the same display.

• Each cluster of bars represents one value of the first
variable.
• Each bar within a cluster represents one value of the
second variable.



Side-by-Side Bar Chart

[Figure: side-by-side bar chart. Vertical axis: Frequency, scaled from 2 to 20.]



Stacked Bar Chart
• A stacked bar chart is another way to display and
compare two variables on the same display.
• It is a bar chart in which each bar is broken into
rectangular segments of a different color.
• If percentage frequencies are displayed, all bars will be
of the same height (or length), extending to the 100%
mark.



Stacked Bar Chart
[Figure: stacked bar chart. Vertical axis: Frequency, scaled from 4 to 40.]



Data Dashboards
• A data dashboard is a widely used data visualization tool.
• It organizes and presents key performance indicators
(KPIs) used to monitor an organization or process.
• It provides timely summary information that is easy to
read, understand, and interpret.
• Some additional guidelines include . . .
• Minimize the need for screen scrolling.
• Avoid unnecessary use of color or 3D displays.
• Use borders between charts to improve readability.



Data Dashboard Example



Tabular and Graphical Displays
Data

Categorical Data
  Tabular Displays
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Percent Freq. Distribution
    • Crosstabulation
  Graphical Displays
    • Bar Chart
    • Pie Chart
    • Side-by-Side Bar Chart
    • Stacked Bar Chart

Quantitative Data
  Tabular Displays
    • Frequency Dist.
    • Rel. Freq. Dist.
    • % Freq. Dist.
    • Cum. Freq. Dist.
    • Cum. Rel. Freq. Dist.
    • Cum. % Freq. Dist.
    • Crosstabulation
  Graphical Displays
    • Dot Plot
    • Histogram
    • Scatter Diagram



Measures of Location
• Mean
• Median
• Mode
• Weighted Mean
• Geometric Mean
• Percentiles
• Quartiles



Mean
• Perhaps the most important measure of location is the
mean.
• The mean provides a measure of central location.

 The mean of a data set is the average of all the data
values.



Sample Mean $\bar{x}$
$$\bar{x} = \frac{\sum x_i}{n}$$
where: $\sum x_i$ = sum of the values of the n observations
n = number of observations in the sample



Population Mean $\mu$
$$\mu = \frac{\sum x_i}{N}$$
where: $\sum x_i$ = sum of the values of the N observations
N = number of observations in the population



Median
• The median of a data set is the value in the middle
when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is
the preferred measure of central location.
• The median is the measure of location most often
reported for annual income and property value data.
• A few extremely large incomes or property values can
inflate the mean.



Median
• For an odd number of observations:

26 18 27 12 14 27 19    (7 observations)

12 14 18 19 26 27 27    (in ascending order)

The median is the middle value. Median = 19



Median
• For an even number of observations:

26 18 27 12 14 27 30 19    (8 observations)

12 14 18 19 26 27 27 30    (in ascending order)

The median is the average of the two middle values.
Median = (19 + 26)/2 = 22.5
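
Both median cases can be confirmed with Python's standard `statistics` module, which sorts the data and averages the two middle values when the count is even. A minimal sketch; the list names are illustrative.

```python
import statistics

odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

median_odd = statistics.median(odd_sample)    # the single middle value
median_even = statistics.median(even_sample)  # mean of the two middle values

print(median_odd, median_even)
```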



Mode
• The mode of a data set is the value that occurs with
greatest frequency.
• The greatest frequency can occur at two or more
different values.
• If the data have exactly two modes, the data are
bimodal.
• If the data have more than two modes, the data are
multimodal.



Mode
• Example: Apartment Rents
550 occurred most frequently (7 times)
Mode = 550

525 530 530 535 535 535 535 535 540 540
540 540 540 545 545 545 545 545 550 550
550 550 550 550 550 560 560 560 565 565
565 570 570 572 575 575 575 580 580 580
580 585 590 590 590 600 600 600 600 610
610 615 625 625 625 635 649 650 670 670
675 675 680 690 700 700 700 700 715 715

Note: Data is in ascending order.
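
The mode of the 70 apartment rents, along with their mean, can be checked with the standard library. A sketch only; `statistics.multimode` returns every value tied for the greatest frequency, so it also covers bimodal and multimodal data.

```python
import statistics

# The 70 apartment rents, in ascending order.
rents = [
    525, 530, 530, 535, 535, 535, 535, 535, 540, 540,
    540, 540, 540, 545, 545, 545, 545, 545, 550, 550,
    550, 550, 550, 550, 550, 560, 560, 560, 565, 565,
    565, 570, 570, 572, 575, 575, 575, 580, 580, 580,
    580, 585, 590, 590, 590, 600, 600, 600, 600, 610,
    610, 615, 625, 625, 625, 635, 649, 650, 670, 670,
    675, 675, 680, 690, 700, 700, 700, 700, 715, 715,
]

modes = statistics.multimode(rents)  # list of all most-frequent values
mode_count = rents.count(modes[0])   # how often the mode occurs
mean_rent = statistics.mean(rents)   # sum of the values divided by n

print(modes, mode_count, mean_rent)
```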



Percentiles
• A percentile provides information about how the data
are spread over the interval from the smallest value to
the largest value.
• Admission test scores for colleges and universities are
frequently reported in terms of percentiles.
 The pth percentile of a data set is a value such that at least p
percent of the items take on this value or less and at least (100 -
p) percent of the items take on this value or more.



Percentiles
• Arrange the data in ascending order.
• Compute Lp, the location of the pth percentile.

Lp = (p/100)(n + 1)
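
The location formula can be turned into a small function. The slide gives only the location Lp; the sketch below assumes that a fractional location is resolved by linear interpolation between the two neighbouring ordered values, and that locations outside 1..n are clamped. The function name `percentile` and the sample data (the eight observations from the median example) are illustrative.

```python
def percentile(data, p):
    """Return the pth percentile using the location Lp = (p/100)(n + 1)."""
    values = sorted(data)                    # arrange in ascending order
    n = len(values)
    location = (p / 100) * (n + 1)           # 1-based position Lp
    location = min(max(location, 1), n)      # clamp to valid positions (assumption)
    lower = int(location)                    # whole part of the position
    fraction = location - lower              # fractional part of the position
    if lower >= n:
        return float(values[-1])
    # Interpolate between the values at positions `lower` and `lower + 1`.
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


sample = [26, 18, 27, 12, 14, 27, 30, 19]
p25 = percentile(sample, 25)
p50 = percentile(sample, 50)
p75 = percentile(sample, 75)
print(p25, p50, p75)
```

Under these assumptions the 50th percentile coincides with the median of the same eight values.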



PERCENTILE
 If all you are interested in is where you stand compared to
the rest of the herd, you need a statistic that reports
relative standing, and that statistic is called a percentile.



Measures of Variability
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation



Range
• The range of a data set is the difference between the
largest and smallest data value.
Range = Largest value – Smallest value

• It is the simplest measure of variability.


• It is very sensitive to the smallest and largest data values.

Range
• Example: Apartment Rents
Range = largest value - smallest value
Range = 715 - 525 = 190

525 530 530 535 535 535 535 535 540 540
540 540 540 545 545 545 545 545 550 550
550 550 550 550 550 560 560 560 565 565
565 570 570 572 575 575 575 580 580 580
580 585 590 590 590 600 600 600 600 610
610 615 625 625 625 635 649 650 670 670
675 675 680 690 700 700 700 700 715 715



Interquartile Range
• The interquartile range of a data set is the difference
between the third quartile and the first quartile.

• It is the range for the middle 50% of the data.

• It overcomes the sensitivity to extreme data values.


INTERQUARTILE RANGE
 The interquartile range (IQR) is a measure of variability,
based on dividing a data set into quartiles.
 Quartiles divide a rank-ordered data set into four equal
parts. The values that divide each part are called the first,
second, and third quartiles; and they are denoted by Q1,
Q2, and Q3, respectively.
 Q1 is the "middle" value in the first half of the rank-
ordered data set.
 Q2 is the median value in the set.
 Q3 is the "middle" value in the second half of the rank-
ordered data set.
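
The "middle of each half" description of the quartiles can be sketched as follows for the apartment rents. One assumption is labeled in the code: when the number of observations is odd, the overall median is left out of both halves. The variable names are illustrative.

```python
import statistics

# The 70 apartment rents.
rents = [
    525, 530, 530, 535, 535, 535, 535, 535, 540, 540,
    540, 540, 540, 545, 545, 545, 545, 545, 550, 550,
    550, 550, 550, 550, 550, 560, 560, 560, 565, 565,
    565, 570, 570, 572, 575, 575, 575, 580, 580, 580,
    580, 585, 590, 590, 590, 600, 600, 600, 600, 610,
    610, 615, 625, 625, 625, 635, 649, 650, 670, 670,
    675, 675, 680, 690, 700, 700, 700, 700, 715, 715,
]

values = sorted(rents)  # rank-order the data
n = len(values)

# Split into halves; for odd n the middle value is excluded (assumption).
lower_half = values[: n // 2]
upper_half = values[(n + 1) // 2:]

q1 = statistics.median(lower_half)  # "middle" of the first half
q2 = statistics.median(values)      # median of the whole set
q3 = statistics.median(upper_half)  # "middle" of the second half

iqr = q3 - q1                         # range of the middle 50% of the data
data_range = values[-1] - values[0]   # largest value minus smallest value

print(q1, q2, q3, iqr, data_range)
```

Other quartile conventions exist and can give slightly different values on small data sets.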



BOX Plot
 Limits are located (not drawn) using the
interquartile range (IQR).
 Data outside these limits are considered outliers.
 The location of each outlier is shown with the
symbol



IQR FOR EVEN NUMBER OF DATA SET

IQR FOR ODD NUMBER OF DATA SET



Variance
• The variance is a measure of variability that utilizes
all the data.
• It is based on the difference between the value of each observation
and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
• The variance is useful in comparing the variability of
two or more variables.



Variance
• The variance is the average of the squared deviations
between each data value and the mean.
• The variance is computed as follows:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample}$$
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population}$$



Standard Deviation
• The standard deviation of a data set is the positive
square root of the variance.
• It is measured in the same units as the data, making
it more easily interpreted than the variance.



Standard Deviation
• The standard deviation is computed as follows:

$$s = \sqrt{s^2} \quad \text{for a sample}$$
$$\sigma = \sqrt{\sigma^2} \quad \text{for a population}$$
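
A worked sample calculation makes the n − 1 divisor concrete. The five values below are hypothetical and chosen only so that the arithmetic comes out in whole numbers; the variable names are illustrative.

```python
import math

# Hypothetical sample of five observations (not from the slides).
sample = [46, 54, 42, 46, 32]

n = len(sample)
x_bar = sum(sample) / n  # sample mean

# Squared deviation of each observation from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in sample]

sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1 for a sample
sample_std_dev = math.sqrt(sample_variance)          # positive square root

print(x_bar, sample_variance, sample_std_dev)
```

For a population, the same sum of squared deviations would be divided by N instead of n − 1.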



z-Scores
• The z-score is often called the standardized value.

$$z_i = \frac{x_i - \bar{x}}{s}$$

• It denotes the number of standard deviations a data value
$x_i$ is from the mean.



z-Scores
• An observation’s z-score is a measure of the relative
location of the observation in a data set.
• A data value less than the sample mean will have a z-
score less than zero.
• A data value greater than the sample mean will have a
z-score greater than zero.
• A data value equal to the sample mean will have a z-
score of zero.



z-Scores
• Example: Apartment Rents
 z-Score of Smallest Value (525)
$$z_i = \frac{x_i - \bar{x}}{s} = \frac{525 - 590.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
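
The standardized values can be recomputed from the rents themselves. This sketch uses `statistics.mean` and `statistics.stdev` (the sample standard deviation, with the n − 1 divisor); the names `z_scores`, `x_bar`, and `s` are illustrative.

```python
import statistics

# The 70 apartment rents, in ascending order.
rents = [
    525, 530, 530, 535, 535, 535, 535, 535, 540, 540,
    540, 540, 540, 545, 545, 545, 545, 545, 550, 550,
    550, 550, 550, 550, 550, 560, 560, 560, 565, 565,
    565, 570, 570, 572, 575, 575, 575, 580, 580, 580,
    580, 585, 590, 590, 590, 600, 600, 600, 600, 610,
    610, 615, 625, 625, 625, 635, 649, 650, 670, 670,
    675, 675, 680, 690, 700, 700, 700, 700, 715, 715,
]

x_bar = statistics.mean(rents)  # sample mean
s = statistics.stdev(rents)     # sample standard deviation (n - 1 divisor)

# z-score: number of standard deviations each rent lies from the mean.
z_scores = [(x - x_bar) / s for x in rents]

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"smallest z = {z_scores[0]:.2f}, largest z = {z_scores[-1]:.2f}")
```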



Measures of Association Between Two
Variables
• Thus far we have examined numerical methods used
to summarize the data for one variable at a time.
• Often a manager or decision maker is interested in
the relationship between two variables.
• Two descriptive measures of the relationship
between two variables are covariance and correlation
coefficient.



Covariance
• The covariance is a measure of the linear association
between two variables.
• Positive values indicate a positive relationship.
• Negative values indicate a negative relationship.



Covariance
• The covariance is computed as follows:

For samples:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

For populations:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$



Correlation Coefficient
• Correlation is a measure of linear association and not
necessarily causation.
• Just because two variables are highly correlated, it
does not mean that one variable is the cause of the
other.



Correlation Coefficient
• The correlation coefficient is computed as follows:

For samples:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

For populations:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$



Correlation Coefficient
• The coefficient can take on values between -1 and +1.
• Values near -1 indicate a strong negative linear
relationship.
• Values near +1 indicate a strong positive linear
relationship.
• The closer the correlation is to zero, the weaker the
relationship.



Covariance and Correlation Coefficient
• Example: Golfing Study
A golfer is interested in investigating the relationship, if any,
between driving distance and 18-hole score.
Average Driving Distance (yds.)    Average 18-Hole Score
277.6                              69
259.5                              71
269.1                              70
267.0                              70
255.6                              71
272.9                              69



Covariance and Correlation Coefficient
• Example: Golfing Study

x           y       (x_i - \bar{x})   (y_i - \bar{y})   (x_i - \bar{x})(y_i - \bar{y})
277.6       69      10.65             -1.0              -10.65
259.5       71      -7.45             1.0               -7.45
269.1       70      2.15              0                 0
267.0       70      0.05              0                 0
255.6       71      -11.35            1.0               -11.35
272.9       69      5.95              -1.0              -5.95
Average     267.0   70.0                                Total -35.40
Std. Dev.   8.2192  .8944



Covariance and Correlation Coefficient
• Example: Golfing Study
• Sample Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

• Sample Correlation Coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
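
The golfing figures can be reproduced step by step from the six observations. A minimal sketch using only the sample formulas; the variable names (`s_xy`, `s_x`, `s_y`, `r_xy`) mirror the symbols but are otherwise illustrative.

```python
import math

# Golfing study: average driving distance (x, yards) and average 18-hole score (y).
x = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
y = [69, 71, 70, 70, 71, 69]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of cross-products of deviations, divided by n - 1.
cross_products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
s_xy = sum(cross_products) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, r_xy = {r_xy:.4f}")
```

The strongly negative coefficient says that longer average drives go with lower (better) scores in this sample; it does not by itself show that one causes the other.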



Probability
• Probability is a numerical measure of the
likelihood that an event will occur.
• Probability values are always assigned on a scale
from 0 to 1.
• A probability near zero indicates an event is quite
unlikely to occur.
• A probability near one indicates an event is almost
certain to occur.



Probability as a Numerical Measure
of the Likelihood of Occurrence

Increasing likelihood of occurrence, from 0 to 1:

Probability 0:   The event is very unlikely to occur.
Probability .5:  The occurrence of the event is just as likely as it is unlikely.
Probability 1:   The event is almost certain to occur.



Random Experiment and Its Sample Space
Experiment              Experiment Outcomes
Toss a coin             Head, tail
Inspect a part          Defective, non-defective
Conduct a sales call    Purchase, no purchase
Roll a die              1, 2, 3, 4, 5, 6
Play a football game    Win, lose, tie



Union of Two Events
• The union of events A and B is the event containing
all sample points that are in A or B or both.
• The union of events A and B is denoted by A ∪ B.

[Figure: Venn diagram showing Event A and Event B within Sample Space S.]



Intersection of Two Events
• The intersection of events A and B is the set of all
sample points that are in both A and B.
• The intersection of events A and B is denoted by A ∩ B.

[Figure: Venn diagram showing Event A and Event B within Sample Space S, with the intersection of A and B marked.]
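
Events are sets of sample points, so union and intersection map directly onto Python's set operators. The two events below (an even roll, and a roll greater than 3, for a single die) are hypothetical examples, not taken from the slides.

```python
# Sample space for rolling one die.
sample_space = {1, 2, 3, 4, 5, 6}

# Hypothetical events defined on that sample space.
A = {x for x in sample_space if x % 2 == 0}  # the roll is even
B = {x for x in sample_space if x > 3}       # the roll is greater than 3

union = A | B         # sample points in A or B or both
intersection = A & B  # sample points in both A and B

print(sorted(union), sorted(intersection))
```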



Discrete Probability Distributions
• Random Variables
• Developing Discrete Probability Distributions
• Expected Value and Variance
• Binomial Probability Distribution
• Poisson Probability Distribution
• Hypergeometric Probability Distribution



Random Variables
• A random variable is a numerical description of the
outcome of an experiment.
• A discrete random variable may assume either a
finite number of values or an infinite sequence of
values.
• A continuous random variable may assume any
numerical value in an interval or collection of
intervals.



Discrete Probability Distributions
• The probability distribution is defined by a
probability function, denoted by f(x), that provides
the probability for each value of the random variable.
• The required conditions for a discrete probability
function are:

f(x) ≥ 0 and ∑f(x) = 1



Discrete Probability Distributions

Using past data on TV sales, a tabular representation of the probability distribution for sales was developed.

Units Sold   Number of Days     x    f(x)
0            80                 0    .40 (= 80/200)
1            50                 1    .25
2            40                 2    .20
3            10                 3    .05
4            20                 4    .10
Total        200                     1.00
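
As a rough sketch, the table can be rebuilt from the day counts by dividing each count by the total number of days, and the two required conditions can then be checked. The variable names are illustrative assumptions.

```python
# Number of days on which 0, 1, 2, 3, or 4 TVs were sold (from the table).
days_by_units = {0: 80, 1: 50, 2: 40, 3: 10, 4: 20}

total_days = sum(days_by_units.values())  # 200

# Relative frequency of each value serves as its probability f(x).
f = {x: days / total_days for x, days in days_by_units.items()}

# Required conditions for a discrete probability function.
nonnegative = all(p >= 0 for p in f.values())
sums_to_one = abs(sum(f.values()) - 1.0) < 1e-9

print(f)
print(nonnegative, sums_to_one)  # True True
```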



Discrete Probability Distributions
• The discrete uniform probability distribution is the
simplest example of a discrete probability
distribution given by a formula.
• The discrete uniform probability function is
f(x) = 1/n
where: n = the number of values the
random variable may assume
• The values of the random variable are equally
likely.
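
A minimal illustration, assuming a fair six-sided die (an assumption made here, not stated in the slides): each of the n = 6 values gets probability 1/n. Exact fractions are used so the probabilities sum to exactly 1.

```python
from fractions import Fraction

# Values the random variable may assume: faces of a die assumed to be fair.
values = [1, 2, 3, 4, 5, 6]
n = len(values)

# Discrete uniform probability function: f(x) = 1/n for every value.
f = {x: Fraction(1, n) for x in values}

print(f[3])             # 1/6
print(sum(f.values()))  # 1
```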



Expected Value
• The expected value, or mean, of a random variable
is a measure of its central location.
$E(x) = \mu = \sum x f(x)$
• The expected value is a weighted average of the
values the random variable may assume. The
weights are the probabilities.
• The expected value does not have to be a value the
random variable can assume.



Variance and Standard Deviation
• The variance summarizes the variability in the
values of a random variable.
$\mathrm{Var}(x) = \sigma^2 = \sum (x - \mu)^2 f(x)$
• The variance is a weighted average of the squared
deviations of a random variable from its mean. The
weights are the probabilities.
• The standard deviation, $\sigma$, is defined as the positive
square root of the variance.



Expected Value

x f(x) xf(x)
0 .40 .00
1 .25 .25
2 .20 .40
3 .05 .15
4 .10 .40
E(x) = 1.20 = expected number of TVs sold in a day
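
The same weighted average can be computed in a few lines. This is a sketch using the TV sales probabilities from the table; the variable names are illustrative.

```python
# Probability distribution of daily TV sales: value x -> f(x).
f = {0: 0.40, 1: 0.25, 2: 0.20, 3: 0.05, 4: 0.10}

# Expected value: each value weighted by its probability.
expected_value = sum(x * p for x, p in f.items())

print(round(expected_value, 2))  # 1.2
```

Note that 1.2 is not a value the random variable can take, which matches the remark that the expected value need not be an attainable value.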



Variance

x    $x - \mu$    $(x - \mu)^2$    f(x)    $(x - \mu)^2 f(x)$
0    -1.2         1.44             .40     .576
1    -0.2         0.04             .25     .010
2     0.8         0.64             .20     .128
3     1.8         3.24             .05     .162
4     2.8         7.84             .10     .784
Variance of daily sales = $\sigma^2$ = 1.660
Standard deviation of daily sales = 1.2884 TVs
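
A short sketch that reproduces the variance table: it computes the mean first, then the probability-weighted squared deviations, and finally the square root. Names are illustrative.

```python
import math

# Probability distribution of daily TV sales: value x -> f(x).
f = {0: 0.40, 1: 0.25, 2: 0.20, 3: 0.05, 4: 0.10}

mu = sum(x * p for x, p in f.items())  # expected value, 1.2

# Variance: weighted average of squared deviations from the mean.
variance = sum((x - mu) ** 2 * p for x, p in f.items())

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 3), round(std_dev, 4))  # 1.66 1.2884
```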



Binomial Probability Distribution
• Four Properties of a Binomial Experiment
1. The experiment consists of a sequence of n identical trials.
2. Two outcomes, success and failure, are
possible on each trial.
3. The probability of a success, denoted by p, does
not change from trial to trial. (This is referred to as
the stationarity assumption.)
4. The trials are independent.



Binomial Probability Distribution
• Our interest is in the number of successes
occurring in the n trials.
• Let x denote the number of successes occurring in
the n trials.



Binomial Probability Distribution
• Binomial Probability Function
$$f(x) = \frac{n!}{x!(n - x)!} p^x (1 - p)^{(n - x)}$$
where:
x = the number of successes
p = the probability of a success on one trial
n = the number of trials
f(x) = the probability of x successes in n trials
n! = n(n – 1)(n – 2) . . . (2)(1)



Binomial Probability Distribution
• Example: Evans Electronics
Evans Electronics is concerned about a low retention rate for its employees. In recent years, management has seen a turnover of 10% of the hourly employees annually.

Thus, for any hourly employee chosen at random, management estimates a probability of 0.1 that the person will not be with the company next year.

Choosing 3 hourly employees at random, what is the probability that exactly 1 of them will leave the company this year?



Binomial Probability Distribution
• Example: Evans Electronics
Using the probability function:
Let: p = .10, n = 3, x = 1

$$f(x) = \frac{n!}{x!(n - x)!} p^x (1 - p)^{(n - x)}$$
$$f(1) = \frac{3!}{1!(3 - 1)!} (0.1)^1 (0.9)^2 = .243$$
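
As a sketch, the binomial probability function can be written with Python's built-in `math.comb` for the factorial term. The function name `binomial_pmf` is an illustrative choice; the inputs are the Evans Electronics values.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # comb(n, x) equals n! / (x! (n - x)!).
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.10  # 3 employees, each leaving with probability 0.1

# Full distribution of the number of employees who leave.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

print(round(distribution[1], 3))  # 0.243
```

Summing `distribution.values()` gives 1 (up to floating-point rounding), as required of any discrete probability function.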



Binomial Probability Distribution
• Expected Value
$E(x) = \mu = np$

• Variance
$\mathrm{Var}(x) = \sigma^2 = np(1 - p)$

• Standard Deviation
$\sigma = \sqrt{np(1 - p)}$



Binomial Probability Distribution
• Example: Evans Electronics
• Expected Value
E(x) = np = 3(.1) = .3 employees out of 3

• Variance
Var(x) = np(1 – p) = 3(.1)(.9) = .27

• Standard Deviation
$\sigma = \sqrt{3(.1)(.9)} = .52$ employees
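
The shortcut formulas can be cross-checked against the general definitions of expected value and variance. The sketch below does both for the Evans Electronics values; the helper name `binomial_pmf` is an illustrative choice.

```python
import math
from math import comb

n, p = 3, 0.10

# Shortcut formulas for a binomial random variable.
mean = n * p
variance = n * p * (1 - p)
std_dev = math.sqrt(variance)

# Cross-check using the general definitions over the full distribution.
def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

mean_check = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance_check = sum(
    (x - mean_check) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1)
)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 0.3 0.27 0.52
```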



Poisson Probability Distribution
• A Poisson distributed random variable is often
useful in estimating the number of occurrences
over a specified interval of time or space.
• It is a discrete random variable that may assume
an infinite sequence of values (x = 0, 1, 2, . . . ).



Poisson Probability Distribution
• Poisson Probability Function
$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
where:
x = the number of occurrences in an interval
f(x) = the probability of x occurrences in an interval
$\mu$ = mean number of occurrences in an interval
e = 2.71828
x! = x(x – 1)(x – 2) . . . (2)(1)



Poisson Probability Distribution
• Example: Mercy Hospital
Patients arrive at the emergency room of Mercy
Hospital at the average rate of 6 per hour on
weekend evenings.

What is the probability of 4 arrivals in 30 minutes on a weekend evening?



Poisson Probability Distribution
• Example: Mercy Hospital
Using the probability function:
$\mu$ = 6/hour = 3/half-hour, x = 4
$$f(4) = \frac{3^4 (2.71828)^{-3}}{4!} = .1680$$
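
A minimal sketch of the same calculation, including the step of rescaling the hourly rate to the 30-minute interval. The function name `poisson_pmf` and the variable names are illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the interval mean is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

rate_per_hour = 6     # average arrivals per hour
interval_hours = 0.5  # the question asks about a 30-minute interval

# The mean must match the interval in question: 6 * 0.5 = 3 arrivals.
mu = rate_per_hour * interval_hours

probability = poisson_pmf(4, mu)
print(round(probability, 4))  # 0.168
```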



Continuous Probability Distributions
• Uniform Probability Distribution
• Normal Probability Distribution
• Exponential Probability Distribution

Figure: sketches of the uniform, normal, and exponential curves, each plotting f(x) against x.


Continuous Probability Distributions
• A continuous random variable can assume any value in
an interval on the real line or in a collection of
intervals.
• It is not possible to talk about the probability of the
random variable assuming a particular value.

• Instead, we talk about the probability of the random
variable assuming a value within a given interval.



Normal Probability Distribution
• The normal probability distribution is the most important
distribution for describing a continuous random variable.

• It is widely used in statistical inference.



Normal Probability Distribution
• Characteristics
The entire family of normal probability distributions
is defined by its mean $\mu$ and its standard deviation $\sigma$.

Figure: normal curve over x, labeled with the mean $\mu$ and the standard deviation $\sigma$.



Normal Probability Distribution
• Characteristics
The highest point on the normal curve is at the
mean, which is also the median and mode.



Normal Probability Distribution
• Characteristics
The mean can be any numerical value: negative, zero, or positive.

Figure: normal curves along the x axis, with -10, 0, and 25 marked as example means.



Normal Probability Distribution
• Characteristics
The standard deviation determines the width of the
curve: larger values result in wider, flatter curves.

Figure: two normal curves, one with $\sigma = 15$ and one with $\sigma = 25$.



Normal Probability Distribution
• Characteristics
Probabilities for the normal random variable are
given by areas under the curve. The total area under
the curve is 1 (.5 to the left of the mean and
.5 to the right).

Figure: normal curve over x with an area of .5 on each side of the mean.



Normal Probability Distribution
• Characteristics (basis for the empirical rule)

68.26% of values of a normal random variable are within +/- 1 standard deviation of its mean.

95.44% of values of a normal random variable are within +/- 2 standard deviations of its mean.

99.72% of values of a normal random variable are within +/- 3 standard deviations of its mean.
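
These percentages can be checked numerically. The sketch below uses the standard identity that, for a normal random variable, the probability of falling within k standard deviations of the mean is erf(k/√2). The slide's figures appear to come from a four-decimal table, so a direct calculation can differ from them by about 0.01 percentage points.

```python
import math

def within_k_sd(k):
    """Probability that a normal random variable lies within k standard
    deviations of its mean (the same for every mean and standard deviation)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```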



Normal Probability Distribution
• Characteristics (basis for the empirical rule)
Figure: normal curve over x marking the intervals $\mu \pm 1\sigma$ (68.26%), $\mu \pm 2\sigma$ (95.44%), and $\mu \pm 3\sigma$ (99.72%).



Standard Normal Probability Distribution
• Characteristics
A random variable having a normal distribution with
a mean of 0 and a standard deviation of 1 is said to
have a standard normal probability distribution.



Standard Normal Probability Distribution
• Characteristics
The letter z is used to designate the standard normal
random variable.
Figure: standard normal curve over z, centered at 0 with $\sigma = 1$.



Standard Normal Probability Distribution
• Converting to the Standard Normal Distribution
$$z = \frac{x - \mu}{\sigma}$$
We can think of z as a measure of the number of standard deviations x is from $\mu$.



Standard Normal Probability Distribution
• Example: Pep Zone
Pep Zone sells auto parts and supplies including a
popular multi-grade motor oil. When the stock of
this oil drops to 20 gallons, a replenishment order is
placed.
The store manager is concerned that sales are
being lost due to stockouts while waiting for a
replenishment order.



Standard Normal Probability Distribution
• Solving for the Stockout Probability
Step 1: Convert x to the standard normal distribution, with x = 20 (the reorder point), $\mu$ = 15, and $\sigma$ = 6.

$$z = \frac{x - \mu}{\sigma} = \frac{20 - 15}{6} = .83$$
Step 2: Find the area under the standard normal curve to the left of z = .83.



Standard Normal Probability Distribution
• Cumulative Probability Table for the Standard Normal
Distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .

P(z < .83) = .7967



Standard Normal Probability Distribution
• Solving for the Stockout Probability
Step 3: Compute the area under the standard normal
curve to the right of z = .83.

P(z > .83) = 1 – P(z < .83)
= 1 – .7967
= .2033
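
The three steps can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8+). The mean of 15 and standard deviation of 6 are the values used in Step 1's computation; the variable names are illustrative. Because the table lookup rounds z to two decimals, the unrounded calculation gives a slightly different answer.

```python
from statistics import NormalDist

mu, sigma = 15, 6   # values used in Step 1
reorder_point = 20  # gallons on hand when the order is placed

# Step 1: convert x to a z value.
z = (reorder_point - mu) / sigma  # 0.8333...
z_rounded = round(z, 2)           # 0.83, as used with the table

# Steps 2 and 3: area to the left of z, then its complement.
std_normal = NormalDist()  # mean 0, standard deviation 1
p_stockout_table = 1 - std_normal.cdf(z_rounded)
p_stockout_exact = 1 - std_normal.cdf(z)

print(round(p_stockout_table, 4))  # 0.2033
print(round(p_stockout_exact, 4))
```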



Standard Normal Probability Distribution
• Solving for the Stockout Probability

Figure: standard normal curve over z; the area to the left of z = .83 is .7967 and the area to the right is 1 - .7967 = .2033.

