SUBJECT – MANAGEMENT
SUBJECT CODE – 17, UNIT – VII
CONTENTS
Chapters Titles
1 Fundamentals of Probability
2 Probability Distribution
3 Sampling
4 Hypothesis Test
CHAPTER 1
FUNDAMENTALS OF PROBABILITY
Probability:
The term probability refers to the chance of happening or not happening of an event.
In any statement when we use the word chance it means that there is an element of
uncertainty in that statement. A numerical measure of uncertainty is provided by the theory
of probability.
In the words of Morris Hamburg, “probability measures provide the decision maker
in business and in government with the means for quantifying the uncertainties which
affect his choice of appropriate action.”
Origin:
The theory of probability has its origin in games of chance related to gambling, like
drawing cards from a pack or throwing dice. Jerome Cardan (1501 – 1576), an Italian
mathematician, was the first man to write a book on the subject, entitled “Book on Games of
Chance”, which was published after his death in 1663. The foundation of the theory of
probability was laid by the French mathematicians Blaise Pascal (1623 – 1662) and Pierre de
Fermat.
Terminology:
In order to understand the meaning and concept of probability, we must know
various terms in this context.
Random Experiment:
An experiment is called a random experiment if, when conducted repeatedly under
essentially homogeneous conditions, the result is not unique, i.e., it does not give the same
result every time. The result may be any one of the various possible outcomes.
Sample Space:
The set of all possible outcomes of an experiment is called the sample space of that
experiment and is usually denoted by S. Every outcome (element) of the sample space is
called sample point.
Some random experiments of sample space:
(i) If an unbiased coin is tossed randomly, then there are two possible outcomes for
this experiment, viz., head (H) or tail (T) up. Then the sample space is
S = {H, T}
(ii) When two coins are thrown simultaneously, the sample space is
S = {HH, HT, TH, TT}
(v) When two dice are thrown simultaneously and the sum of the points is noted,
then the sample space is
S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
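The sample spaces above can be enumerated directly; a minimal Python sketch (the variable names are illustrative, not from the text):

```python
from itertools import product

# Sample space for a single coin toss: S = {H, T}
coin = {"H", "T"}

# Sample space for two coins thrown simultaneously
two_coins = set(product(coin, repeat=2))
print(len(two_coins))  # 4 outcomes: HH, HT, TH, TT

# Sample space of the sum when two dice are thrown simultaneously
die = range(1, 7)
sums = {a + b for a, b in product(die, repeat=2)}
print(sorted(sums))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```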
Trial and Event:
Performing of a random experiment is called a ‘trial’ and outcome or outcomes are
termed as ‘events’. For instance, tossing of a coin would be called a trial and the result
(falling head or tail upward) an event.
Types of Events:
Exhaustive Cases:
The total number of possible outcomes of a random experiment is called the exhaustive
cases for the experiment. For instance, in a toss of a single coin, we can get head or tail. Hence
the exhaustive number of cases is 2, because between themselves they exhaust all possible
outcomes of the random experiment.
Mutually Exclusive Events:
Two or more events are said to be mutually exclusive if the occurrence of one precludes the
occurrence of the others, i.e., they cannot happen together.
Symbolically, a set of events E1, E2, ….., En is mutually exclusive if Ei ∩ Ej = ∅ (i ≠ j). This means
the intersection of two events is a null set (∅).
For example, let a dice be thrown once; the event E1 of getting an even number
is E1 = {2, 4, 6}
The event E2 of getting an odd number is
E2 = {1, 3, 5}
Independent Events:
Events are said to be independent if the occurrence of one does not affect the outcome
of any of the others. For instance, the result of the first toss of a coin does not affect the
result of successive tosses at all.
Dependent Events:
If the occurrence of one event affects the happening of the other event, then they
are said to be dependent events. For instance, the probability of drawing a king from a pack
of 52 cards is 4/52, i.e., 1/13. If this card is not replaced before the second draw, the probability of
getting a king again is 3/51, as there are now only 51 cards left and they contain only 3
kings.
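The king example can be checked with exact fractions; a small sketch using the probabilities stated above:

```python
from fractions import Fraction

p_first = Fraction(4, 52)               # 4 kings among 52 cards = 1/13
p_second_given_first = Fraction(3, 51)  # 3 kings left among 51 cards = 1/17

# For dependent events, multiply the first probability by the
# conditional probability of the second.
p_both_kings = p_first * p_second_given_first
print(p_first, p_second_given_first, p_both_kings)  # 1/13 1/17 1/221
```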
Compound Events:
Two events are said to be compound when their occurrences are related to each other. For
example, a dice is thrown once. The sample space S is
S = {1, 2, 3, 4, 5, 6}
Let one event be E1, that is, of getting an even digit uppermost,
i.e., E1 = {2, 4, 6}
Let the other event be E2, that is, of getting a number greater than 4,
i.e., E2 = {5, 6}
Complementary Events:
If E is any subset of the sample space, then its complement, denoted by Ē (read as E-bar),
contains all the elements of the sample space that are not part of E. If S denotes the sample
space, then
Ē = S – E
= all sample elements not in E
Expressions of Probability:
Probability will always be a number between 0 and 1. If an event is certain to happen
its probability would be 1 and if it is certain that the event would not take place, then the
probability of its happening is zero.
The general rule of the happening of an event is that if an event can happen in m
ways and fail to happen in n ways, then the probability (P) of the happening of the event is
given by
P = m / (m + n)
or
P = (number of favourable cases) / (exhaustive number of cases)
Example:
The odds against an event are 2 : 5. Find the probability of its happening.
Solution:
Odds against the event E are b : a, i.e., 2 : 5
P(Ē) = b / (a + b) = 2 / (2 + 5) = 2/7
P(E) + P(Ē) = 1
P(E) = 1 – P(Ē)
= 1 – 2/7 = 5/7
Hence, the probability of happening of the event is 5/7.
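The odds-to-probability conversion can be expressed as a small helper; `prob_from_odds_against` is a hypothetical name, not from the text:

```python
from fractions import Fraction

def prob_from_odds_against(b, a):
    """Odds against an event of b : a give P(event) = a / (a + b)."""
    return Fraction(a, a + b)

p = prob_from_odds_against(2, 5)  # odds against are 2 : 5
print(p)  # 5/7, matching the worked example
```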
Approaches of Probability:
1. Classical Approach:
This approach of defining probability is based on the assumption that all possible
outcomes (finite in number) of an experiment are mutually exclusive and equally likely.
If a random experiment is repeated a finite number of times, out of which ‘a’ outcomes are
in favour of event A, ‘b’ outcomes are not in favour of event A, and all these possible
outcomes are mutually exclusive, collectively exhaustive and equally likely, then the
probability of occurrence of event A is defined as:
P(A) = a / (a + b) = (favourable outcomes) / (total outcomes)
3. Subjective Approach:
The Subjective Approach of calculating probability is always based on the degree of
beliefs, convictions and experience concerning the likelihood of occurrence of a
random event. It is a way to quantify an individual’s beliefs, assessment and judgement
about a random phenomenon.
Probability assigned for the occurrence of an event may be based on just a guess or on
having some idea about the relative frequency of past occurrences of the event. This
approach must be used when either sufficient data are not available or sources of
information giving different results are not known.
………………..
CHAPTER 2
PROBABILITY DISTRIBUTION
Probability Theory is the branch of mathematics concerned with probability, the
analysis of random phenomena.
Probability is a way of assigning every "event" a value between zero and one, with the
requirement that the event made up of all possible results (in our example, the event
{1,2,3,4,5,6}) be assigned a value of one.
The central objects of probability theory are random variables, stochastic processes, and
events.
If an individual coin toss or the roll of a die is considered to be a random event, then if
repeated many times the sequence of random events will exhibit certain patterns, which
can be studied and predicted. Two representative mathematical results describing such
patterns are :
1. Law of large numbers
2. The central limit theorem
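The law of large numbers can be illustrated by simulation; a minimal sketch (the seed and toss counts are arbitrary choices, not from the text):

```python
import random

random.seed(42)

# The relative frequency of heads approaches the true probability 0.5
# as the number of tosses grows (law of large numbers).
freqs = {}
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    freqs[n] = heads / n
    print(n, freqs[n])
```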
The mathematical theory of probability has its roots in attempts to analyze games of
chance by Gerolamo Cardano in the sixteenth century.
Initially, probability theory mainly considered discrete events, and its methods were
mainly combinatorial.
1. Probability Theory deals with events that occur in countable sample spaces.
Examples: Throwing dice, experiments with decks of cards, random walk, and tossing coins.
3. Modern Definition: The modern definition starts with a finite or countable set called the
sample space, which relates to the set of all possible outcomes in the classical sense.
Categorical Distribution, for a single categorical outcome (e.g. yes/no/maybe in
a survey); a generalization of the Bernoulli distribution
Multinomial Distribution, for the number of each type of categorical outcome,
given a fixed number of total outcomes; a generalization of the binomial
distribution
Multivariate Hypergeometric Distribution, similar to the multinomial
distribution, but using sampling without replacement; a generalization of the
hypergeometric distribution
VII) Related to events in a Poisson process (events that occur independently with a
given rate)
I. Poisson Distribution, for the number of occurrences of a Poisson-type event in a
given period of time
II. Exponential Distribution, for the time before the next Poisson-type event
occurs
III. Gamma Distribution, for the time before the next k Poisson-type events occur
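The link between the exponential and Poisson distributions can be checked by simulation: generating exponential waiting times and counting events per unit interval should give counts whose mean equals the rate. A sketch under assumed parameters (rate 3.0, horizon 10,000 time units):

```python
import random

random.seed(0)
rate = 3.0          # average events per unit time
horizon = 10_000.0  # total simulated time

# Build event times by summing exponential inter-arrival gaps, then
# count events per unit window: counts should behave like Poisson(rate).
t, counts = 0.0, []
events_in_window, window_end = 0, 1.0
while True:
    t += random.expovariate(rate)   # exponential waiting time to next event
    if t > horizon:
        break
    while t > window_end:           # flush completed unit windows
        counts.append(events_in_window)
        events_in_window, window_end = 0, window_end + 1.0
    events_in_window += 1

mean_count = sum(counts) / len(counts)
print(round(mean_count, 2))  # close to 3.0 (mean of Poisson(rate) is rate)
```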
…………………..
CHAPTER 3
SAMPLING
Sampling
It is concerned with the selection of a subset of individuals from within the population
to estimate characteristics of the whole population. Each observation measures one or more
properties (such as weight, location, color) of observable bodies distinguished as
independent objects or individuals.
I. Population Definition:
A population can be defined as including all people or items with the
characteristic one wishes to understand.
Because there is very rarely enough time or money to gather information from
everyone or everything in a population, the goal becomes finding a representative
sample (or subset) of that population.
A population often consists of physical objects.
Sampling theory treats the observed population as a sample from a larger
'super-population'.
II. Sampling Frame:
In the most straightforward case, it is possible to identify and measure every single
item in the population and to include any one of them in our sample.
A sampling frame has the property that we can identify every single
element and include any one of them in our sample.
III. Sampling Methods:
Within any of the types of frame identified above, a variety of sampling methods can be
employed, individually or in combination. Factors commonly influencing the choice
between these designs include:
Nature and quality of the frame
Availability of auxiliary information about units on the frame
Accuracy requirements, and the need to measure accuracy
Whether detailed analysis of the sample is expected
Cost/operational concerns
Methods of Sampling:
There are two methods of sampling:
Probability Sampling
Non-Probability Sampling
Probability Sampling:
Every unit in the population has a chance (greater than zero) of being selected
in the sample.
This probability can be accurately determined.
The combination of these traits makes it possible to produce unbiased estimates of
population totals, by weighting sampled units according to their probability of
selection.
When every element in the population has the same probability of selection, this is
known as an 'equal probability of selection' (EPS) design.
Example: we want to estimate the total income of adults living in a given street. We
visit each household in that street, identify all adults living there, and randomly select
one adult from each household. (For example, we can allocate each person a random
number, generated from a uniform distribution between 0 and 1, and select the
person with the highest number in each household). We then interview the selected
person and find their income.
People living on their own are certain to be selected, so we simply add their income
to our estimate of the total. But a person living in a household of two adults has only a
one-in-two chance of selection. To reflect this, when we come to such a household, we
would count the selected person's income twice towards the total. (The person who is
selected from that household can be loosely viewed as also representing the person who
isn't selected.)
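The weighting described above can be sketched in Python. The street data here are invented for illustration; each selected income is weighted by the household size, i.e., by the inverse of its selection probability:

```python
import random

random.seed(7)

# Hypothetical street: each household is a list of adult incomes.
households = [[30_000], [25_000, 40_000], [20_000], [35_000, 15_000, 50_000]]
true_total = sum(sum(h) for h in households)

# One adult is selected per household, so the selection probability is
# 1 / len(h); weighting the income by len(h) makes the estimate unbiased.
estimate = sum(random.choice(h) * len(h) for h in households)
print(true_total, estimate)
```

A single estimate will differ from the true total, but averaged over many repetitions the weighted estimator recovers it exactly.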
from the high end and too few from the low end (or vice versa), leading to an
unrepresentative sample. Selecting (e.g.) every 10th street number along the street
ensures that the sample is spread evenly along the length of the street, representing
all of these districts. (Note that if we always start at house #1 and end at #991, the
sample is slightly biased towards the low end; by randomly selecting the start
between #1 and #10, this bias is eliminated.)
Stratified Sampling:
i. The population frame can be organized by these categories into separate "strata."
ii. Each stratum is then sampled as an independent sub-population, out of which
individual elements can be randomly selected.
iii. Independent strata can enable researchers to draw inferences about specific
subgroups that may be lost in a more generalized random sample.
iv. Stratified sampling method can lead to more efficient statistical estimates.
v. Each stratum is treated as an independent population.
vi. There are, however, some potential drawbacks to using stratified sampling.
vii. Sample selection is costly and complex.
viii. It complicates the design, and potentially reduces the utility of the strata.
ix. Stratified sampling can require a larger sample than other methods.
A stratified sampling approach is most effective when three conditions are met:
i. Variability within strata is minimized
ii. Variability between strata is maximized
iii. The variables upon which the population is stratified are strongly correlated with
the desired dependent variable.
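Stratified selection with proportional allocation can be sketched as follows; the strata and sizes are invented for illustration:

```python
import random

random.seed(1)

# Hypothetical population organized into strata (e.g. by region).
strata = {
    "north": list(range(0, 500)),      # 500 units
    "south": list(range(500, 800)),    # 300 units
    "east":  list(range(800, 1000)),   # 200 units
}
total = sum(len(units) for units in strata.values())
sample_size = 100

sample = []
for name, units in strata.items():
    # Proportional allocation: each stratum contributes in proportion
    # to its share of the population, sampled independently (SRS).
    k = round(sample_size * len(units) / total)
    sample.extend(random.sample(units, k))

print(len(sample))  # 100 in total: 50 + 30 + 20
```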
Cluster Sampling:
Sampling is often clustered by geography, or by time periods.
Clusters can be chosen from a cluster-level frame, with an element-level frame
created only for the selected clusters.
Cluster sampling is commonly implemented as multistage sampling.
It is a complex form of cluster sampling in which two or more levels of units are
embedded one in the other.
The 1st stage consists of constructing the clusters that will be used to sample from.
In the 2nd stage, a sample of primary units is randomly selected from each cluster. In
each of those selected clusters, additional samples of units are selected, and so on.
……………….
CHAPTER 4
HYPOTHESIS TEST
Hypothesis Test
It is a method of statistical inference using data from a scientific study.
A result is called statistically significant if it has been predicted as unlikely to have
occurred by chance alone, according to a pre-determined threshold probability, the
significance level.
The phrase "test of significance" was coined by statistician Ronald Fisher.
These tests are used in determining what outcomes of a study would lead to a
rejection of the null hypothesis for a pre-specified level of significance.
The critical region of a hypothesis test is the set of all outcomes which cause the null
hypothesis to be rejected in favour of the alternative hypothesis.
In the Neyman-Pearson framework (see below), the process of distinguishing
between the null & alternative hypothesis is aided by identifying two conceptual
types of errors (Type I & Type II).
                                    H0 is true (truly not guilty)    H1 is true (truly guilty)
Accept Null Hypothesis (Acquittal)  Right decision                   Wrong decision (Type II Error)
Reject Null Hypothesis (Conviction) Wrong decision (Type I Error)    Right decision
Definition of important terms used:
Statistical Hypothesis: A statement about the parameters describing a
population (not a sample).
Statistic: A value calculated from a sample, often to summarize the sample for
comparison purposes.
Simple Hypothesis: Any hypothesis which specifies the population distribution
completely.
Composite Hypothesis: Any hypothesis which does not specify the population
distribution completely.
Null Hypothesis (H0): A simple hypothesis associated with a contradiction to a
theory one would like to prove.
Alternative Hypothesis (H1): A hypothesis (often composite) associated with a
theory one would like to prove.
Statistical Test: A procedure whose inputs are samples and whose result is a
hypothesis.
Region of Acceptance: The set of values of the test statistic for which we fail to
reject the null hypothesis.
Region of Rejection / Critical Region: The set of values of the test statistic
for which the null hypothesis is rejected.
Critical Value: The threshold value delimiting the regions of acceptance and
rejection for the test statistic.
Power of a test (1 − β)
The test's probability of correctly rejecting the null hypothesis. The complement of the
false negative rate, β. Power is termed sensitivity in biostatistics.
Size / Significance Level of a Test (α): For a simple hypothesis, this is the test's
probability of incorrectly rejecting the null hypothesis: the false positive rate. For a
composite hypothesis it is the upper bound of the probability of rejecting the null
hypothesis over all cases covered by the null hypothesis. The complement of the false
positive rate, (1 − α), is termed specificity in biostatistics.
p-value: The probability, assuming the null hypothesis is true, of observing a result at
least as extreme as the test statistic.
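For a standard-normal test statistic, the p-value just defined can be computed directly from the complementary error function; a minimal sketch (the helper name and the value z = 2.5 are illustrative):

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic:
    P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

# A result is 'statistically significant' when p falls below the
# pre-determined significance level alpha.
z = 2.5
p = two_sided_p_from_z(z)
print(round(p, 4), p < 0.05)  # small p leads to rejecting H0 at alpha = 0.05
```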
Statistical Significance Test: An experimental result was said to be statistically
significant if a sample was sufficiently inconsistent with the (null) hypothesis. The
statistical hypothesis test added mathematical rigor and philosophical consistency to the
concept by making the alternative hypothesis explicit. The term is loosely used to
describe the modern version which is now part of statistical hypothesis testing.
Conservative Test: A test is conservative if, when constructed for a given
nominal significance level, the true probability of incorrectly rejecting the null
hypothesis is never greater than the nominal level.
Exact Test: A test in which the significance level or critical value can be computed
exactly, i.e., without any approximation. A statistical hypothesis test compares a test
statistic to a threshold. The test statistic (the formula found in the table below) is
chosen for optimality: for a fixed level of Type I error rate, use of these statistics
minimizes Type II error rates (equivalent to maximizing power).
Uniformly Most Powerful Test (UMP): A test with the greatest power for all
values of the parameter(s) being tested, contained in the alternative hypothesis.
…………………..
CHAPTER 5
Correlation:
It is a statistical technique that can show whether and how strongly pairs of variables
are related.
It is a statistical measure that indicates the extent to which two or more variables
fluctuate together.
Correlation can be Positive or Negative.
A positive correlation indicates the extent to which those variables increase or
decrease in parallel.
A negative correlation indicates the extent to which one variable increases as the
other decreases.
The main result of a correlation is called the correlation coefficient (or "r").
It ranges from -1.0 to +1.0.
The closer r is to +1 or -1, the more closely the two variables are related.
If r is close to 0, it means there is little or no linear relationship between the variables.
If r is positive, it means that as one variable gets larger the other gets larger. If r is
negative it means that as one gets larger, the other gets smaller.
While correlation coefficients are normally reported as r = (a value between -1 and +1),
squaring them makes them easier to understand. The square of the coefficient (or r squared)
is equal to the percent of the variation in one variable that is related to the variation in the
other. After squaring r, multiply by 100 to express it as a percentage. An r of .5 means 25% of
the variation is related (.5 squared = .25). An r value of .7 means 49% of the variance is
related (.7 squared = .49).
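The computation of r and r squared can be sketched in plain Python; the paired sample data below are invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
# r squared times 100 gives the percent of variation explained.
print(round(r, 3), round(r * r * 100, 1))  # 0.775 60.0
```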
Types of Correlation:
1. Positive correlation occurs when an increase in one variable increases the value
in another.
2. Negative correlation occurs when an increase in one variable decreases the value
of another.
5. Strong Correlation: A correlation is stronger the closer the points are located to one
another on the line.
6. Weak Correlation: A correlation is weaker the farther apart the points are located
to one another on the line.
…………………
CHAPTER 6
Assumptions:
The assumptions underlying a t-test are that:
Z follows a standard normal distribution under the null hypothesis;
s² follows a χ² distribution with p degrees of freedom under the null hypothesis,
where p is a positive constant;
Z and s are independent.
t = (x̄ − μ0) / (s / √n)
where:
x̄ = the sample mean,
s = the sample standard deviation of the sample,
and n = the sample size.
Two sample ‘t’ test: Two samples can be Independent or Dependent samples.
Independent (unpaired) Samples: The independent samples t-test is used when two
separate sets of independent and identically distributed samples are obtained, one from
each of the two populations being compared.
The randomization is not essential here.
This test is only used when both:
the two sample sizes (that is, the number, n, of participants of each group) are equal;
it can be assumed that the two distributions have the same variance.
t = (x̄1 − x̄2) / (s_p · √(2/n))
where
s_p = √((s1² + s2²) / 2) is the grand standard deviation (or pooled standard deviation),
1 = group one, 2 = group two,
and s1², s2² are the unbiased estimators of the variances of the two samples.
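The pooled two-sample statistic for equal group sizes can be sketched as follows; the sample data are invented for illustration:

```python
import math

def pooled_t(sample1, sample2):
    """Two-sample t statistic for equal-size samples, assuming the two
    distributions have the same variance (pooled standard deviation)."""
    n = len(sample1)
    assert n == len(sample2), "this form assumes equal group sizes"
    m1 = sum(sample1) / n
    m2 = sum(sample2) / n
    v1 = sum((x - m1) ** 2 for x in sample1) / (n - 1)  # unbiased variance
    v2 = sum((x - m2) ** 2 for x in sample2) / (n - 1)
    sp = math.sqrt((v1 + v2) / 2)                       # pooled SD
    return (m1 - m2) / (sp * math.sqrt(2 / n))          # df = 2n - 2

t = pooled_t([5.1, 4.9, 5.6, 4.8, 5.3], [4.2, 4.6, 4.0, 4.9, 4.4])
print(round(t, 3))
```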
Paired Samples:
Paired samples t-tests typically consist of a sample of matched pairs of similar units, or one
group of units that has been tested twice (a "repeated measures" t-test).
From the differences we compute the average (X̄D) and standard deviation (sD); the
constant μ0 is non-zero if you want to test whether the average of the difference is
significantly different from μ0. The degrees of freedom used is n − 1.
Each subject has to be tested twice. Because half of the sample now depends on
the other half, the paired version of Student's t-test has only 'n/2 − 1' degrees of
freedom (with 'n' being the total number of observations).
A paired samples t-test based on a "matched-pairs sample" results from an unpaired
sample that is subsequently used to form a paired sample, by using additional
variables that were measured along with the variable of interest.
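The paired statistic is computed on the differences between the two measurements; a minimal sketch with invented before/after data (here n counts pairs, so the degrees of freedom are n − 1):

```python
import math

def paired_t(before, after, mu0=0.0):
    """Paired-samples t statistic on the differences, df = n - 1 pairs."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return (mean_d - mu0) / (sd / math.sqrt(n))

# Same subjects measured twice (a "repeated measures" design).
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [14.0, 16.0, 12.0, 16.0, 14.0]
t = paired_t(before, after)
print(round(t, 3))
```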
…………………..
CHAPTER 7
DATA ANALYSIS AND MANAGEMENT INFORMATION SYSTEM
There are numerous ways in which data analysis procedures are broadly defined. The
following diagram makes it evident.
In multidimensional scaling, brands are shown in a space of attributes in which distance
between the brands represents dissimilarity. An example of multidimensional scaling in
market research would show the manufacturers of single serving coffee in the form of "K-
cups." The different K-cup brands would be arrayed in the multidimensional space by
attributes such as strength of roast, number of flavored and specialty versions,
distribution channels, and packaging options.
Other Methods:
a. Mechanical Devices
b. Projective Techniques
c. Depth Interviews
Observation Method:
Information is sought by way of the investigator's own direct observation, without asking
the respondents.
It is the most commonly used method.
Generally used in studies relating to behavioural sciences.
Systematically planned and recorded.
Its reliability depends on checks and controls.
Subjective bias is eliminated.
Information obtained under this method relates to what is currently happening.
Independent of respondents' willingness to respond.
It is an expensive method.
The information provided by this method is very limited.
………………………………
SAMPLE QUESTIONS
1. The level of significance is the probability of committing the:
(a) Type I error (b) Type II error (c) standard error (d) probable error
Ans. A
(Dec. 2008, Paper-II)
2. A researcher wants to test the significance of the differences in the average performance of
more than two sample groups drawn from a normally distributed population. Which one of
the following hypothesis tests is appropriate?
(1) Chi-square test (2) F-test (3) z-test (4) t-test
Ans. 2
(July 2016, Paper-II)
3. When a researcher wants to test whether two samples can be regarded as drawn from
normal populations having the same variance by using the variance ratio, which one of the
following tests of hypothesis is appropriate?
(1) Z-test (2) t-test (3) Chi-square test (4) F-test
Ans. 4
(July 2016, Paper-III)
4. The collective set of tools and techniques used to develop a quality assurance system when
business processes show variations, is known as
(A) Quality Assurance Process
(B) Quality Management System
(C) Statistical Quality Control
(D) Statistical Process Control
Ans. D
(Dec. 2014, Paper-III)