
Mailam Engineering College Statistics for Management (BA5106)

UNIT 1
PART A
1. Define Probability.
Probability is the chance of occurrence of a certain event, expressed quantitatively.

\[ \text{Probability} = \frac{m}{n} = \frac{\text{Number of favorable cases}}{\text{Total number of outcomes}} \]

2. What is meant by a random trial (experiment)?
Any experiment whose outcome cannot be predicted or determined in advance.
E.g., tossing a coin or throwing a die.
3. Define / what is meant by exhaustive cases?
All the possible outcomes of an experiment are called exhaustive events. E.g., when a die is
thrown, any one of the 6 faces may turn up and therefore there are 6 possible outcomes.
4. What are mutually exclusive and independent events? (Jan 2015)
Mutually Exclusive events:
Two events are said to be mutually exclusive if the occurrence of any one of them excludes the
occurrence of the other in a single experiment.
Eg: If a coin is tossed, the event head (H) and tail (T) are mutually exclusive.
Independent event:
Two or more events are independent if the occurrence of one does not affect the occurrence of the other.
Eg: If a coin is thrown twice, the result of the second throw is not affected by the result of the first throw.
5. Define / what is meant by equally likely events?
Events are said to be equally likely if there is no reason to expect any one in preference to
the others. E.g., in throwing a die, all the 6 faces (1, 2, 3, 4, 5, and 6) are equally likely to occur.
6. Define / what is meant by dependent events?
Two events are said to be dependent if the occurrence or non-occurrence of one event in
a trial affects the occurrence of the other event in the trial.
7. What is probability distribution?
Let X be a discrete random variable which takes the values x1, x2, …, xn such that
P[X = xi] = pi for i = 1, 2, …, n. The function p(x) is called the probability mass function [p.m.f.].
It satisfies the following conditions:

Prepared by A.Anitha – AP/DOMS Page 1



1. P(X = xi) = pi ≥ 0 for every i
2. ∑ P(X = xi) = 1
8. Write the usefulness of Poisson distribution?
The Poisson distribution can be considered to be a good approximation of the binomial
distribution when the number of trials (n) is large and the probability of success (p) is very small
(i.e.) as n → ∞ and p → 0, with λ = np remaining finite. It is given by the function

\[ P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots \]
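
The approximation described above can be checked numerically. The sketch below compares the binomial and Poisson probabilities for one choice of n and p; the values n = 1000 and p = 0.002 are illustrative assumptions, not taken from these notes.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Illustrative values (assumed): n large, p small, lam = n * p
n, p = 1000, 0.002
lam = n * p

# Print the two probabilities side by side for the first few values of x
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
```

The two columns agree closely because n is large and p is small; with a larger p the gap widens.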

9. Define conditional probability. (June 2013)


Let A be any event in the sample space S with P(A) > 0. The probability that an event B
occurs, given that A has already occurred, is called the conditional probability of B given A.
It is denoted by P(B/A) and is given by P(B/A) = P(A ∩ B) / P(A).
10. What is random variable?
A random variable is a variable whose value is unknown or a function that assigns
values to each of an experiment's outcomes.
Eg: In tossing a coin the outcome head may be assigned the value ‘1’ and the outcome of tail
may be assigned the value ‘0’.
11. State Bayes' theorem (rule of inverse probability). (Jan 2013)
Solution:
\[ P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{k=1}^{n} P(A/B_k)\,P(B_k)} \]
where B1, B2, …, Bn are a mutually exclusive and exhaustive set of events.
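
As a small illustration of the formula, the sketch below computes the posterior probabilities P(Bi/A) from the priors P(Bi) and the likelihoods P(A/Bi). The function name and the two-machine example are hypothetical, chosen only to show the arithmetic.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(B_i/A) for each i, given P(B_i) and P(A/B_i)."""
    joint = [pb * pa for pb, pa in zip(priors, likelihoods)]  # P(B_i) * P(A/B_i)
    total = sum(joint)                                        # P(A), the denominator
    return [j / total for j in joint]

# Hypothetical example: machines B1 and B2 make 60% and 40% of the output,
# with defect rates P(A/B1) = 0.02 and P(A/B2) = 0.05 (A = "item is defective").
posterior = bayes_posterior([0.6, 0.4], [0.02, 0.05])
print(posterior)
```

Here a defective item is more likely to have come from B2 even though B2 produces less, because its defect rate is higher.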

12. Probability (Classical Definition):


If an experiment has n mutually exclusive, equally likely and exhaustive cases, out of
which m are favorable to the happening of the event A, then the probability of the happening of
A is denoted by P(A) and is defined as:
\[ P(A) = \frac{\text{No. of cases favorable to } A}{\text{Total (exhaustive) number of cases}} = \frac{m}{n} \]

• Probability of an event which is certain to occur is 1
• Probability of an impossible event is 0
• The probability of occurrence of any event lies between 0 and 1, both inclusive.
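
The classical definition and its three properties can be illustrated with a fair die. This is a minimal sketch; the helper name `classical_probability` is an assumption made for the example.

```python
from fractions import Fraction

def classical_probability(favorable, exhaustive):
    """P(A) = m / n for equally likely, mutually exclusive, exhaustive cases."""
    m, n = len(favorable), len(exhaustive)
    return Fraction(m, n)

die = {1, 2, 3, 4, 5, 6}                           # exhaustive cases of one throw
p_even = classical_probability({2, 4, 6}, die)     # m = 3, n = 6
p_certain = classical_probability(die, die)        # certain event
p_impossible = classical_probability(set(), die)   # impossible event
print(p_even, p_certain, p_impossible)             # prints: 1/2 1 0
```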
13. Define Bayes' theorem.
Let A1, A2, A3, …, An be a set of mutually exclusive and exhaustive events whose union
is the sample space S of an experiment. If E is any arbitrary event associated with
A1, A2, A3, …, An and P(E) ≠ 0, then
\[ P(A_i/E) = \frac{P(A_i)\,P(E/A_i)}{\sum_{k=1}^{n} P(A_k)\,P(E/A_k)} \]
14. Define / what is meant by binomial distribution?
➢ Binomial distribution is a discrete distribution. Consider n independent trials of a random
experiment having only two outcomes, success and failure.
➢ Let the probability of success in any trial be p and that of failure be q (q = 1 - p).
➢ Let X be the random variable denoting the number of successes in n trials.
➢ Then the probability mass function of the binomial random variable X is defined as follows:
\[ P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n \]
where n and p are called the parameters of the binomial distribution, which is simply denoted as
B(n, p).
Mean = np (for an observed frequency distribution, the mean is computed as ∑fx / ∑f, where ∑f = N)
Variance = npq
Standard Deviation = √(npq)
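
The formulas for the mean and variance can be verified directly from the probability mass function. The following sketch uses n = 10 and p = 0.3, which are illustrative values and not from these notes.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3          # illustrative parameters (assumed)
q = 1 - p
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the definition, term by term
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(mean, n * p)                    # both close to np
print(variance, n * p * q)            # both close to npq
print(sqrt(variance), sqrt(n * p * q))
```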
15. What is normal distribution?
The normal distribution is a bell-shaped theoretical distribution that predicts the frequency of
occurrence of chance events. The curve is symmetrical: half of the total area is to the left and the other
half to the right.

16. Find the expected value of Binomial distribution.(May/June 2016)


\[ E[X] = np\,(p + (1 - p))^{n-1} = np \]

The expected value of the binomial distribution B(n, p) is np.
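
The intermediate steps are not shown in the answer above. One standard route, sketched here with q = 1 - p and the identity x·nCx = n·(n-1)C(x-1), is:

```latex
\begin{aligned}
E[X] &= \sum_{x=0}^{n} x \binom{n}{x} p^{x} q^{n-x} \\
     &= np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} q^{(n-1)-(x-1)} \\
     &= np\,(p + q)^{n-1} \\
     &= np \qquad \text{since } p + q = 1 .
\end{aligned}
```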
17. State the theorem of total probability.(Nov/Dec 2018)
The theorem of total probability states: given n mutually exclusive events A1, ..., An
whose probabilities sum to unity, then for any event B
\[ P(B) = \sum_{i=1}^{n} P(B/A_i)\,P(A_i) \]

18. Write the mean and variance of uniform distribution. (Nov/Dec 2018)
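For a continuous uniform distribution on the interval (a, b):
\[ \text{Mean} = \frac{a + b}{2} \]
\[ \text{Variance} = \frac{(b - a)^{2}}{12} \]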
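
These two results can be derived in a few lines. The sketch assumes the continuous uniform distribution on (a, b), whose density is f(x) = 1/(b - a) on that interval.

```latex
\begin{aligned}
E[X]   &= \int_{a}^{b} \frac{x}{b-a}\,dx = \frac{b^{2}-a^{2}}{2(b-a)} = \frac{a+b}{2}, \\
E[X^2] &= \int_{a}^{b} \frac{x^{2}}{b-a}\,dx = \frac{b^{3}-a^{3}}{3(b-a)} = \frac{a^{2}+ab+b^{2}}{3}, \\
\operatorname{Var}(X) &= E[X^2] - \bigl(E[X]\bigr)^{2}
        = \frac{a^{2}+ab+b^{2}}{3} - \frac{(a+b)^{2}}{4}
        = \frac{(b-a)^{2}}{12}.
\end{aligned}
```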
19. What are the different types of variables?( Jan 2015)

i) Numerical variables: There are 2 types


a) Discrete numerical variable
A variable whose values are whole numbers (counts) is called discrete.
Example: The number of items bought by a customer in a supermarket is discrete.
b) Continuous numerical variable
A variable that may contain any value within some range is called continuous.
Example: The time that the customer spends in the supermarket is continuous
ii) Categorical variables
Categorical variables can be further categorized into ordinal and nominal variables.
a)Ordinal categorical variable
A categorical variable whose categories can be meaningfully ordered is called ordinal.
Example, a student's grade in an exam (A, B, C or Fail) is ordinal.
b) Nominal categorical variable
A categorical variable whose categories have no natural order is called nominal. It does not
matter which way the categories are ordered in tabular or graphical displays of the
data -- all orderings are equally meaningful.
Example: a student's religion (Atheist, Christian, Muslim, Hindu) is nominal.

PART-B

1. Characteristics of Binomial Distribution:

1. It is a discrete distribution which gives the theoretical probabilities.


2. It depends on the parameters p or q (the probability of success or failure) and n (the number of
trials). The parameter n is always a positive integer.


3. The distribution is symmetrical if p = q. It is skewed if p ≠ q, although as n becomes
large it becomes approximately symmetrical.
4. The statistics of the binomial distribution are mean = np; variance = npq; and standard
deviation = √(npq).
5. The mode of the binomial distribution is equal to that value of x which has the largest probability (frequency).
6. It can be represented graphically, taking the x-axis to represent the number of successes and the y-axis
to represent the probabilities or frequencies.
7. The shape and the location of a binomial distribution change as p changes for a given n, or as n
changes for a given p.
8. The binomial coefficients nCx are given by Pascal's triangle.

2. Characteristics of Poisson distribution:


1. Poisson distribution is a discrete distribution. It gives theoretical probabilities and theoretical
frequencies of a discrete variable.
2. It depends mainly on the value of the mean m.
3. This distribution is positively skewed, i.e. its longer tail is to the right. With the increase in the value
of the mean m, the distribution shifts to the right and the skewness diminishes.
4. Its arithmetic mean in relative distribution is p and in absolute distribution is np.
5. If n is large and p is small, this distribution gives a close approximation to the binomial distribution.
Since the arithmetic mean of Poisson is the same as that of binomial, the Poisson distribution can be
used instead of binomial if n or p is not known.
6. Poisson distribution has only one parameter, viz., m, the arithmetic mean. Thus the entire
distribution can be determined once the arithmetic mean is known.
7. Poisson distribution is based on the following assumptions:
(i) Statistical independence is assumed, i.e., the occurrence or non-occurrence of an event does not
influence the other events.
(ii) The probability of happening of more than one event in a very small interval is negligible.
(iii) The probability of success for a small space or a short interval of time is proportional to the
space or length of time intervals as the case may be.

3. Properties of a Normal Curve / Normal Distribution:



The normal probability curve with mean µ and standard deviation σ has the following
properties:

The equation of the curve is:
\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})}, \quad -\infty < x < \infty \]
1. It is bell-shaped. The top of the bell is directly above the mean µ.
2. The curve is symmetrical about the line x = µ, and x ranges over -∞ < x < ∞.
3. Mean, mode and median coincide at x = µ as the distribution is symmetrical.
4. It can be shown that it has arithmetic mean = µ and variance = σ².
5. The x-axis is an asymptote to the curve.
6. The points of inflexion of the curve are at x = µ + σ and x = µ - σ, where the curve changes from
concave to convex.
7. The total area under the normal curve is equal to unity and the percentage distribution of area under
the normal curve is given below and is shown also in the figure.
i) about 68% of the area falls between µ-σ and µ+σ.
ii) about 95.5% of the area falls between µ-2σ and µ+2σ.
iii) about 99.7% of the area falls between µ-3σ and µ+3σ.
8. in a normal distribution Q.D:M.D:S.D:: 10:12:15;
Where Q.D=Quartile deviation, M.D=Mean deviation and S.D=Standard deviation.
9. The mean deviation from the mean in a normal distribution is approximately equal to (4/5) of its
standard deviation, i.e. M.D = 0.8σ.
10. The first and third quartiles are given by: Q1 = µ - 0.675σ and Q3 = µ + 0.675σ. Also Q.D = 0.675σ.
11. The maximum ordinate lies at the mean, i.e. at x = µ.
12. The curve of the normal distribution has a single peak, i.e. it is unimodal.
13. The two tails of the curve extend indefinitely and never touch the horizontal line.
14. The mathematical equation is completely determined if µ and σ are known.
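
The area percentages in property 7 and the quartile factor in property 10 can be reproduced with Python's standard library. In this sketch µ = 50 and σ = 10 are arbitrary illustrative values; any other choice gives the same proportions.

```python
from statistics import NormalDist

mu, sigma = 50, 10            # illustrative values (assumed)
dist = NormalDist(mu, sigma)

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))   # 0.6827, 0.9545, 0.9973

# Quartile deviation expressed as a multiple of sigma
q1, q3 = dist.inv_cdf(0.25), dist.inv_cdf(0.75)
quartile_factor = (q3 - q1) / 2 / sigma
print(round(quartile_factor, 4))          # about 0.6745
```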
4. Uses of Normal Distribution:
1. The normal distribution can be used to approximate the Binomial and Poisson distributions.
2. It has extensive use in sampling theory. It helps us to estimate parameter from statistic and to
find confidence limits of the parameter.
3. It has a wide use in testing Statistical Hypothesis and Tests of Significance, in which it is
always assumed that the population from which the samples have been drawn has a normal
distribution.
4. It has significant applications in statistical quality control as the control chart in statistical
quality control is closely related to normal distribution.
5. It can be used for smoothing and graduating a distribution which is not normal, simply by
contracting a normal case.
6. It serves as a guiding instrument in the analysis and interpretation of statistical data.

5. Assumption for the binomial, poisson and normal distributions. Under what conditions can
you approximate a binomial and a poisson distribution as a normal distribution? How will you
translate the distribution parameters into normal distribution parameters? (Jan 2015)
Assumptions for binomial

i) ‘n’ independent trials
ii) Each trial results in one of 2 outcomes
iii) The probability of success ‘p’ remains constant from trial to trial

Assumptions for Poisson

i) n → ∞
ii) p → 0
iii) np = λ remains finite, i.e. p = λ/n

Assumptions for normal

i) Mean = median = mode
ii) The points of inflexion of the normal curve are at µ ± σ

Binomial → Poisson as n → ∞ and p → 0; Poisson → normal as n → ∞.
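
The question also asks how the parameters translate. The standard mapping is µ = np and σ = √(npq) for a binomial distribution, and µ = λ and σ = √λ for a Poisson distribution. The sketch below checks the binomial case numerically; the values n = 100, p = 0.4 and the continuity correction of 0.5 are assumptions added for illustration and are not stated in these notes.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p = 100, 0.4                  # illustrative values (assumed)
mu = n * p                       # normal mean
sigma = sqrt(n * p * (1 - p))    # normal standard deviation
approx = NormalDist(mu, sigma)

k = 45
exact = binomial_cdf(k, n, p)
normal = approx.cdf(k + 0.5)     # continuity correction (an added assumption)
print(exact, normal)
```

The two probabilities are close because n is large and p is not near 0 or 1.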
6. What is random variable? Explain its types
Random variable:
A random variable is a variable whose value is unknown or a function that assigns values to
each of an experiment's outcomes.
Eg: In tossing a coin the outcome head may be assigned the value ‘1’ and the outcome of tail may be
assigned the value ‘0’.


Types of Random variables


➢ We classify random variables based on their probability distribution. They are:
i) Probability distribution (probability mass function) - Discrete Random Variable
ii) Probability density function - Continuous Random Variable
➢ Therefore, we have two types of random variables – Discrete and Continuous.
i) Discrete Random Variables
➢ Discrete random variables take on only a countable number of distinct values. If a random variable
can take only a finite number of distinct values, then it is discrete.
➢ The probability distribution of these variables is a list of probabilities associated with each of its
possible values. It is also called the probability function or the probability mass function.
➢ If a random variable X takes ‘k’ different values, with the probability that X = xi defined as
P(X = xi) = pi, then it must satisfy the following:
• 0 ≤ pi ≤ 1 (for each ‘i’)
• p1 + p2 + p3 + … + pk = 1
Example:
i) Number of members in a family, ii) number of cars sold by a car dealer in one month, iii) number of
defective light bulbs in a box of 10 bulbs, etc.

Example of Discrete Random Variables

You toss a coin 10 times. The random variable X is the number of times you get a ‘tail’. X can only
take values 0, 1, 2, … , 10. Therefore, X is a discrete random variable.

Continuous Random Variables


Continuous random variables take up an infinite number of possible values which are usually in a
given range. Typically, these are measurements like weight, height, the time needed to finish a task,
etc.
EG: i) The depth of drilling to find oil ii) The weight of a truck in a truck-weighing station, iii)The
amount of water in a 12-ounce bottle
Example:
• The life of an individual in a community is a continuous random variable. Let’s say that the
maximum lifespan of an individual in a community is 110 years.
• Therefore, a person can die immediately on birth (where life = 0 years) or after attaining an age
of 110 years. Within this range, death can occur at any age. Therefore, the variable ‘Age’ can take any
value between 0 and 110.
• Hence, continuous random variables do not have a list of specific values since the number of possible
values is infinite. Also, the probability at any specific value is zero. Instead, probability is defined
over an interval of values and represented by the area under a curve.

UNIT-II
1. What is sampling?
➢ “Sampling” basically means selecting people/objects from a “population” in order to test the whole
population for something.
For example, we might want to find out how people are going to vote at the next election.
Obviously we can’t ask everyone in the country, so we take a sample.

2. What is Population? / Define Population. (Jan 2016)

➢ Population in statistics means the whole of the information which comes under the purview
of a statistical investigation.

➢ A population may be finite or infinite according to the number of individuals it contains. E.g.: The
population of the heights of the students in a school.

3. What is mean by Sample? (Jan 2016)


A Part of population selected for a study is called sample. In statistics, a sample is a subset
of a population that is used to represent the entire group as a whole.

4. What is Sample size?


The number of individuals included in the finite sample is called the sample size. It is typically
denoted by n, a positive integer (natural number).

5. Define Parameter and Statistic. (Jan 2016) , (May / June 2016)


Parameter:
Any statistical measure computed from population data is known as a parameter. Population
mean, population median, population variance, population coefficient of variation, etc., are all
parameters.


Statistic:
Any Statistical measure computed from sample data is known as statistic.
6. Define Sampling.
Sampling is the procedure or process of selecting a sample from the population. It is the study of
existing relationship between a population and a sample drawn from the population.
7. What is meant by Statistical estimation?
It helps in estimating an unknown population parameter (such as population mean, median,
mode, standard deviation, kurtosis, etc.) on the basis of a suitable statistic (such as sample mean,
median, mode, variance, etc.) computed from the sample drawn from the parent population.

8. What is mean by Statistical Inference?


Statistical inference means drawing conclusion about some matters on the basis of certain
statistical results.
9. Define Unbiased estimate.( Apr/May 2018)
An estimator is said to be unbiased if its expected value is equal to the population parameter
it estimates, i.e. E(t) = θ.
Example: The mean of the sampling distribution of the sample mean x̄ is equal to the population
mean, i.e. E(x̄) = μ.
10. What is the central limit theorem?(Apr/May 2019)
Explain central limit theorem and its implications.(JAN 2015)
➢ The relationship between the shape of the population distribution and the shape of the
sampling distribution of the mean is called the central limit theorem.

➢ When sampling is done from a population with mean μ and finite standard deviation σ, the
sampling distribution of the sample mean x̄ will tend to a normal distribution with mean μ and
S.D. σ/√n as the sample size becomes large.

For “large enough” n, x̄ ~ N(µ, σ²/n) (or), equivalently, Z = (x̄ - µ)/(σ/√n) ~ N(0, 1).
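
The theorem can be seen in a small simulation. The sketch below draws repeated samples from a uniform(0, 1) population, which is clearly not normal, and compares the mean and standard deviation of the sample means with µ and σ/√n. The population, the sample size n = 30, the number of samples and the seed are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)           # fixed seed so the run is repeatable
n = 30                           # size of each sample
num_samples = 5000               # number of samples drawn

# Uniform(0, 1) population: mu = 1/2, sigma = sqrt(1/12)
mu = 0.5
sigma = sqrt(1 / 12)

sample_means = [mean([rng.random() for _ in range(n)]) for _ in range(num_samples)]

print(mean(sample_means), mu)                  # close to the population mean
print(stdev(sample_means), sigma / sqrt(n))    # close to the standard error
```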

11. What is sampling distribution?(Apr/May 2019)


The probability distribution of the statistic that would be obtained if the number of
samples, each of same size were infinitely large is called its sampling distribution.


12. Distinguish between Statistic and parameter.

Statistic | Parameter
A numerical measure of a sample is called a statistic or sample statistic. | A numerical measure of a population is called a parameter or population parameter.
A sample statistic is used to estimate the population parameter, and it is called an estimator of the parameter. | Population parameters are estimated by sample statistics.

13. What is Standard error of mean?

➢ It is the standard deviation of the sampling distribution of sample means.

➢ It is denoted by σx̄ = σ/√n,

where σ = standard deviation of the population; n = sample size.

14. If Random samples Come from the normal population, what can be said about the sampling
distribution of mean? (Nov/Dec 2016)

If the population is normal, the sampling distribution of the mean (x̄) is also normal for samples of
all sizes.

15. What are the properties of a good estimator? (Jan 2013)


1) Consistency
2) Unbiasedness
3) Efficiency and
4) Sufficiency.
16. What are sampling errors? (May/June 2016)
Error in a statistical analysis arising from the unrepresentativeness of the sample taken is called
sampling error. Sampling depends on chance, and because of this element of chance, sampling
errors occur. Errors in sampling arise primarily due to the following:
1. Faulty selection of the sample
2. Substitution
3. Faulty demarcation of sampling units
4. Variability of population.
18. What are non-sampling errors?
Non-sampling errors creep in due to the human factor, which always varies from
one investigator to another. Non-sampling errors arise from the following reasons:
1. Poor quality of planning
2. Error in response
3. Non-response bias
4. Errors in design of the survey
5. Errors in compilation
6. Publication error
19. What are the types of estimation?
1. Point estimation
2. Interval estimation
20. Define point estimation?
➢ In point estimation, a single statistic is used to provide an estimate of the population
parameter.
➢ The estimate of the population parameter given by a single number is called a point
estimate.
21. Define interval estimation?(Nov/Dec 2018)
➢ Interval estimation is the use of a range of values to estimate a population
parameter.
➢ The interval estimate of a population parameter ‘θ’ is the estimation of the parameter
‘θ’ with the help of an interval (t-s, t+s), where ‘t’ is a sample statistic, (i.e.) t-s ≤ θ ≤ t+s.
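
As an illustration of the interval (t-s, t+s), the sketch below builds an interval estimate for a population mean with t = x̄ and s = z·σ/√n. This particular choice of s assumes a known population σ and a normal sampling distribution; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (t - s, t + s) with t = sample mean and s = z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    s = z * sigma / sqrt(n)
    return sample_mean - s, sample_mean + s

# Hypothetical data: sample mean 52, population S.D. 8, sample size 64
lower, upper = confidence_interval(sample_mean=52.0, sigma=8.0, n=64)
print(round(lower, 2), round(upper, 2))   # about 50.04 and 53.96
```

The point estimate here is the single number 52; the interval estimate is the pair of limits around it.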

22. Distinguish between one tail and two tail tests.


One-tailed test | Two-tailed test
1. A test of a statistical hypothesis in which the rejection region lies on one side only is called a one-tailed test. | 1. When the test of hypothesis is made on the basis of a rejection region represented by both sides of the standard normal curve, it is called a two-tailed test.
2. The rejection region may be on the left or the right side. | 2. The rejection region lies on both sides.


23. Distinguish between point estimation and interval estimation of population parameter. (Jan
2015)
Point estimate | Interval estimate
When a single statistic is used to provide an estimate, it is called a point estimate of the population parameter. | An estimate in which the parameter lies within a range of values (i.e. between two numbers) is called an interval estimate. This is otherwise called a confidence interval.
It is deterministic. | It is probabilistic.

24. Why does sampling introduce errors in research studies? (Jan 2014)
Sampling assumes that a small subset represents the whole population which might not be the
case.
25. Define standard error.(May/ June 2016)
The standard deviation of sampling distribution of a statistic in testing the hypothesis is called as
Standard error of statistic.
26. Write two properties of the sampling distribution of mean when population is normally
distributed. (Jan 2016)
➢ Mean = population mean, i.e. µx̄ = µ
➢ Standard deviation = population standard deviation / square root of sample size, i.e. σx̄ =
σ / √n

27. What is meant by Sampling distribution of Mean?

The sampling distribution of x̄ is the probability distribution of all possible values of the sample
mean x̄. If a population is normal, the sampling distribution of the mean (x̄) is also normal for
samples of all sizes.

28. Define Confidence Co-efficient or Confidence level


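A confidence coefficient, or confidence level, is a measure of the accuracy and
repeatability of a statistical test: it is the proportion (1 - α, for example 95%) of repeated
samples for which the interval estimate would contain the true parameter. ... Together with
the margin of error, a confidence coefficient defines the expected results of subsequent tests.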
29. Give 2 rules for determining the sample Size.
➢ The value of the population σ and the population proportion may be actual or estimated.
➢ The value of the level of significance (zα) and the error (E) must be specified.
30. What is standard error of proportion?
➢ It is computed from the proportions of all possible samples of the same size drawn from a
population.
➢ It is denoted by σp = √(pq/n),
where p = population proportion; q = 1 - p; n = sample size.
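
The two standard-error formulas in this unit, σ/√n for the mean and √(pq/n) for a proportion, can be written as two short functions. The function names and the input values are illustrative assumptions.

```python
from math import sqrt

def standard_error_mean(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def standard_error_proportion(p, n):
    """Standard error of the sample proportion: sqrt(p * q / n), q = 1 - p."""
    q = 1 - p
    return sqrt(p * q / n)

# Illustrative inputs (assumed): sigma = 12 with n = 36; p = 0.5 with n = 100
se_mean = standard_error_mean(12, 36)
se_prop = standard_error_proportion(0.5, 100)
print(se_mean, se_prop)
```

Both quantities shrink as n grows, which is why larger samples give more precise estimates.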

Part B
1. EXPLAIN SAMPLING TECHNIQUES.
DESCRIBE THE PROBABILITY AND NON-PROBABILITY SAMPLING METHODS

INTRODUCTION:

Sampling is a process of selecting a sufficient number of elements from the population. Sample
designs are basically of two types, viz., i) Probability sampling ii) Non-probability sampling

I. PROBABILITY SAMPLING

➢ Probability sampling is also known as 'Random sampling' or 'chance sampling'.

➢ Under this sampling design, every item of the universe has an equal chance of inclusion in the
sample.

TYPES OF PROBABILITY SAMPLING

1) SIMPLE RANDOM SAMPLING:


➢ In this method each unit of the population has an equal chance of being included in the sample.
➢ If ‘N’ is the size of the population and 'n' units are to be drawn in the sample, then there are
NCn possible samples, each of which is equally likely to be selected.
Advantages:
➢ Easy method to use
➢ Equal and independent chance of selection to every element
Disadvantages:
➢ If the sampling frame is large, this method is impracticable.
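
Simple random sampling can be sketched with Python's `random.sample`, which draws without replacement so that every unit has the same chance of selection. The population of 20 numbered units, the sample size of 5 and the seed are assumptions made for the example.

```python
import random
from math import comb

population = list(range(1, 21))   # N = 20 hypothetical unit labels
n = 5                             # sample size

rng = random.Random(42)           # fixed seed so the draw is repeatable
sample = rng.sample(population, n)            # draw n units without replacement
num_possible_samples = comb(len(population), n)   # NCn possible samples

print(sample)
print(num_possible_samples)       # 15504
```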
2) SYSTEMATIC SAMPLING:


➢ In this method every kth element in the population is selected, starting with a randomly chosen
element between 1 and k, where k = N/n is the sampling interval (N = population size, n = sample size).
➢ In this sampling, one unit is selected at random from the universe and the other units are at a
specified interval from the selected unit.
Example: Suppose a researcher wants to conduct a study on the consumption rate of ‘Aavin Milk’ in a
colony of 400 houses, with a sample size of 100. The researcher chooses every 4th house, e.g.
4, 8, 12, 16, and so on.
Advantages:
• Sample easy to select and cost effective
• Suitable sampling frame can be identified easily
Disadvantages:
• Sample may be biased
• Each element does not get equal chance
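
The house example can be sketched in code. The sampling interval is 400 / 100 = 4, and a random start between 1 and 4 fixes the whole sample. The function name and the seed are assumptions; the 400 houses and sample of 100 come from the example above.

```python
import random

def systematic_sample(units, sample_size, rng):
    """Pick every k-th unit after a random start between 1 and k."""
    k = len(units) // sample_size          # sampling interval
    start = rng.randint(1, k)              # random start, 1..k inclusive
    return units[start - 1::k][:sample_size]

houses = list(range(1, 401))               # houses numbered 1..400
sample = systematic_sample(houses, 100, random.Random(1))
print(sample[:5])                          # first few selected house numbers
```

Once the start is chosen, every other selection is determined, which is why the method is easy to carry out but can be biased if the list has a periodic pattern.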

3) STRATIFIED RANDOM SAMPLING:


Stratified random sample is one in which random selection is not done from the
heterogeneous universe as a whole but from different homogeneous parts or 'strata' of a
universe.
Advantages:
• Higher statistical efficiency
• Easy to carry out
Disadvantages:
• Classification error will occur
• Time consuming and expensive
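
A minimal sketch of stratified random sampling is given below. It assumes proportional allocation, where the same fraction is drawn from each stratum; the notes do not specify an allocation rule, and the strata names and sizes are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, units in strata.items():
        size = round(len(units) * fraction)      # proportional allocation (assumed)
        sample[name] = rng.sample(units, size)   # random selection within the stratum
    return sample

# Hypothetical universe split into two homogeneous strata
strata = {
    "first_year": [f"F{i}" for i in range(60)],
    "second_year": [f"S{i}" for i in range(40)],
}
sample = stratified_sample(strata, 0.10, random.Random(7))
print({name: len(units) for name, units in sample.items()})
```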
4) CLUSTER SAMPLING:

➢ In this method, the universe is divided into some recognizable sub-groups which are called
'clusters'. After this a simple random sample of these clusters is drawn, and then all the units
belonging to the selected clusters constitute the sample.


➢ In cluster sampling, heterogeneous elements are grouped into clusters and then the clusters are
chosen randomly.

Example: A researcher wants to know the climatic changes and its effects over people throughout
Tamilnadu. He divided TN into different parts like Northern TN, Southern TN, Eastern TN,
Western TN, Central TN and selects some samples in these areas.

Advantages:
• Easy to implement, cost effective
Disadvantages:
• Imprecise, difficult to compute and interpret results


5) MULTI-STAGE SAMPLING:
➢ This is a modified form of cluster sampling. In this the sample units are selected in two or three
or four stages.
➢ The universe is first divided into first-stage sample units, from which the sample is selected.
The selected first-stage samples are then sub-divided into second-stage units from which another
sample is selected. Third-stage and fourth-stage sampling is done in the same manner if necessary.
6) AREA SAMPLING:
➢ The area sampling design constitutes geographical clusters.
➢ Area sampling is a form of multi-stage sampling. It is more frequently used in those countries
which do not have a satisfactory sampling frame such as a population list.
7) DOUBLE SAMPLING:
➢ A sampling design where initially a sample is used in a study to collect some preliminary
information of interest, and later a sub-sample of the primary sample is used to examine the
matter in more detail, is called double sampling.
Eg: In an organization, the overview of the company is reviewed with 20 samples. Again with the
same 20 samples, the detail of the organization is surveyed.

II. NON- PROBABILITY SAMPLING


Non- probability sampling is that sampling procedure which does not afford any basis for estimating
the probability that each item in the population has been included in the sample.


TYPES OF NON PROBABILITY SAMPLING

The various non- probability sampling methods are:

1) CONVENIENCE SAMPLING:

➢ In this method the researcher chooses the sampling units on the basis of convenience or accessibility.
It is also called accidental sampling because the sample units enter by accident. This is also known as a
sample of the man in the street.
Eg: The MD of a company wants to know about competing products, their features and pricing
strategies, and enquires only with the 5 officers who are readily accessible.

Advantage:
➢ The sample is selected for ease of access, from an immediately known population group, with a good
response rate.
Disadvantage:
➢ Findings cannot be generalised, so the researcher cannot move beyond describing the sample.


2) JUDGMENT SAMPLING:
➢ It involves the choice of subjects who are most advantageously placed or in the best position to provide the
information required.
➢ The research expert identifies the exact sample from which the information obtained will be most
accurate and reliable.
Advantages:
➢ Based on the experienced person’s judgment
Disadvantages:
➢ Cannot measure the representativeness of the sample
3) QUOTA SAMPLING:
• The sample sizes, called quotas, are established for each stratum.
• Field-workers are then instructed to conduct interviews with the designated quotas, with the
identification of individual respondents being left to the field-workers.
• A convenience sample is drawn for each cell until the quota is met (similar to stratified sampling).
Advantages:
• Easy to manage, quick
Disadvantages:
• Dependent on subjective decisions
• Only reflects population in terms of the quota, possibility of bias in selection, no standard error
4) PANEL SAMPLING:
• The initial samples are drawn on a random basis and information from these is collected on a
regular basis. It is a semi-permanent sample where members may be included repetitively for
successive studies.
• Here there is a facility to select and quickly contact such well-balanced samples and to have a
relatively high response rate even by mail.
5) SNOWBALL SAMPLING:
• It is a special non-probability method used when the desired sample characteristic is rare.
• It may be extremely difficult or cost prohibitive to locate respondents in these situations.
• Snowball sampling relies on referrals from initial subjects to generate additional subjects.
Advantages:
• Identifying small, hard-to-reach, uniquely defined target populations
• Useful in qualitative research
Disadvantages:
• Bias can be present
• Limited generalizability

2. What is sample size? Explain the factors to be considered while deciding the sample size.
Sample Size:
The number of individuals included in the finite sample is called the sample size. It is typically
denoted by n, a positive integer (natural number).

Factors to be considered while deciding the sample size:

1. The margin of error :


➢ It is also referred to as the confidence interval, which measures the precision with which an
estimate from a single sample approximates the population value.
➢ For example, in a national voting poll the margin of error might be ±3%. This means that if
60% of the people in a sample favor Mr. Smith, you could be confident that, if you surveyed the
entire population, between 57% (60-3) and 63% (60+3) of the population would favor Mr. Smith.
The margin of error in social science research generally ranges from 3% to 7% and is closely
related to sample size. A margin of error will get narrower as the sample size increases. The
margin of error selected depends on the precision needed to make population estimates from a
sample.
2. The confidence level :
 It is the estimated probability that a population estimate lies within a given margin of error.
 Using the example above, a confidence level of 95% tells you that you can be 95% confident that
between 57% and 63% of the population favors Mr. Smith. Common confidence levels in
social science research include 90%, 95%, and 99%. Confidence levels are also closely related
to sample size. As the confidence level increases, so too does the sample size. A researcher
that chooses a confidence level of 90% will need a smaller sample than a researcher who is
required to be 99% confident that the population estimate lies within the margin of error.
Looking at it another way, with a confidence level of 95%, there is a 5% chance that an
estimate derived from a sample will fall outside the confidence interval of 57% to 63%.
Researchers will choose a higher confidence level in order to reduce the chance of making a
wrong conclusion about the population from the sample estimate. For all samples used in the
MGAP Outcome Evaluation, the confidence level is 95%.
3. The proportion (or percentage) :

 Proportion of a sample that will choose a given answer to a survey question is unknown, but it’s
necessary to estimate this number since it is required for calculating the sample size.
 Most researchers will use a proportion (or percentage) that is considered the most conservative
estimate – that is, that 50% of the sample will provide a given response to a survey question.
This is considered the most conservative estimate because it is associated with the largest sample
size. Smaller sample sizes are needed if the proportion of a sample that will choose a given
answer to a question is estimated at 60% (or 40%) while an even smaller sample size is needed if
the estimated proportion of responses is either 70% (or 30%), 80% (or 20%), or 90% (or 10%).


 Thus, when determining the sample size needed for a given level of accuracy (i.e., given
confidence level and margin of error), the most conservative estimate of 50% should be used
because it is associated with the largest sample size.

3. Explain any methods of sample size determination.


Methods for determining a proper sample size are given for the following two cases:
1. Sample size for estimating a population mean
The confidence interval for estimating a population mean is given by
$\bar{x} \pm z_\alpha \sigma / \sqrt{n}$
(or)
$\bar{x} \pm E$, where $E = z_\alpha \sigma / \sqrt{n}$
i.e., E is the maximum allowable error for the difference between the population mean and the sample
mean. Solving for n,
$\sqrt{n} = z_\alpha \sigma / E$ (or)
$n = z_\alpha^2 \sigma^2 / E^2$
Here both the values of zα and E must be specified. The value of the population σ may be actual or
estimated.
2. Sample size for estimating a population proportion
The confidence interval for estimating a population proportion is given by
$p \pm z_\alpha \sqrt{pq / n}$, where $q = 1 - p$ and $E = z_\alpha \sqrt{pq / n}$
$\sqrt{n} = z_\alpha \sqrt{pq} / E$
$n = z_\alpha^2 pq / E^2$
Here the values of zα and E are predetermined. The value of the proportion p may be actual or estimated
from past experience.
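
The two sample size formulas can be tried out numerically. The following is only a minimal sketch: the function names, σ = 15, E = 2 and the 3% margin of error are illustrative assumptions, and the result is rounded up to the next whole number because n must be an integer.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value z for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, error, confidence=0.95):
    """n = z^2 * sigma^2 / E^2, rounded up (hypothetical helper)."""
    z = z_value(confidence)
    return math.ceil((z * sigma / error) ** 2)


def sample_size_proportion(p, error, confidence=0.95):
    """n = z^2 * p * q / E^2 with q = 1 - p, rounded up (hypothetical helper)."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)


# Illustrative inputs only (not taken from the notes).
n_mean = sample_size_mean(sigma=15, error=2)            # mean, 95% confidence
n_half = sample_size_proportion(p=0.5, error=0.03)      # most conservative p
n_sixty = sample_size_proportion(p=0.6, error=0.03)     # smaller n than p = 0.5
print(n_mean, n_half, n_sixty)
```

With these inputs p = 0.5 gives the largest n, which is why 50% is treated as the most conservative estimate of the proportion.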
4. Differentiate point and interval estimate.
Point Estimate | Interval Estimate
When a single value is used as an estimate of a population parameter, the estimate is called a point estimate. | When the parameter is estimated to lie within a range of values (i.e., between two numbers), the estimate is called an interval estimate. This is otherwise called a confidence interval.
It is deterministic. | It is probabilistic.

UNIT 3


Part A

1. Explain Type I and Type II Error. / What are decision errors?


Difference between Type I and Type II Error.
Two types of errors can result from a hypothesis test.
 Type I error.
i) A Type I error occurs when the researcher rejects a null hypothesis when it is true.
ii) The size of type I error is called as Producer’s Risk.
iii) The probability of committing a Type I error is called the significance level.
iv) This probability is also called alpha, and is often denoted by α.
 Type II error.
i) A Type II error occurs when the researcher fails to reject a null hypothesis that is
false.
ii) The size of Type II Error is called as Consumer’s risk.
iii) The probability of committing a Type II error is called Beta, and is often denoted
by β.
iv) The probability of not committing a Type II error is called the Power of the test.

2. Explain one-tailed and two-tailed tests?

One-tailed test:
 A test of a statistical hypothesis, where the region of rejection is on only one side of
the sampling distribution, is called a one-tailed test.
 The rejection region lies in only one tail of the normal curve, either the positive (right) or the negative (left) side.

Two-tailed test:

 A test of a statistical hypothesis, where the region of rejection is on both sides of the
sampling distribution, is called a two-tailed test.
 The rejection region lies in both tails (right and left) of the normal curve.


3. What is ANOVA?
ANOVA - Analysis of variance. It is a statistical method in which the variation in a set of
observations is divided into distinct components.

4. What is the F-test?


 The F-test is used for comparisons of the components of the total deviation.
 For example, in one-way or single-factor ANOVA, statistical significance is tested by
comparing the F test statistic with the table (critical) value of F.

5. Why is ANOVA helpful?


 ANOVAs are helpful because they possess a certain advantage over a two-sample t-
test.
 Doing multiple two-sample t-tests would result in a largely increased chance of
committing a type I error.
 For this reason, ANOVAs are useful in comparing three or more means.

6. What is Null hypothesis:


 The hypothesis which states that there is no significant difference between assumed and
actual value of the parameter is called as Null hypothesis.
 Null hypothesis is represented by H0

7. Describe level of significance.(June 2013)


 The probability α that a random value of the test statistic belongs to the critical region is
known as the level of significance.
 Level of significance is the size of Type I Error.
8. Define hypothesis.
When we attempt to make decisions about the population on the basis of a sample, we make
assumptions about the nature of the population. Such an assumption, which may or may not be true, is
called a hypothesis.
10. Define critical region or rejection region.
It is the region of the standard normal curve, corresponding to a predetermined level of
significance α, in which the null hypothesis is rejected.
11. What is meant by critical value?
The value of the sample statistic ‘Z’ that separates the region of acceptance from the region of
rejection is called the critical value.


12. What do you mean by decision in testing of hypothesis?


In this step we compare the calculated Z with the critical value Zα at a given level of significance α
and decide as under:
If |Z| > Zα, H0 is rejected.
If |Z| < Zα, H0 is accepted.
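
The decision rule can be sketched in a few lines. This is a hedged illustration for a two-tailed test only; the function name `z_decision` and the two Z values are assumptions made for the example, and the critical value is taken from the standard normal distribution instead of a printed table.

```python
from statistics import NormalDist


def z_decision(z, alpha=0.05):
    """Two-tailed rule: reject H0 when |Z| exceeds the critical value."""
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    decision = "reject H0" if abs(z) > z_critical else "accept H0"
    return z_critical, decision


# Hypothetical calculated Z values at the 5% level of significance.
crit, verdict_a = z_decision(2.30)
_, verdict_b = z_decision(-1.20)
print(round(crit, 2), verdict_a, verdict_b)
```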

13. Form one way ANOVA table.

Source of variance | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between samples | SSC | C-1 | MSC = SSC/(C-1) | F = MSC/MSE
Within samples (error) | SSE | C(R-1) | MSE = SSE/[C(R-1)] |
Total | SST | CR-1 | |
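
The entries of the one-way table can be computed directly from the data. The sketch below assumes C samples (columns) with R observations each, matching the degrees of freedom in the table; the function name and the small data set are made up for illustration.

```python
def one_way_anova(samples):
    """Return SSC, SSE, SST, MSC, MSE and F for C samples of equal size R."""
    c = len(samples)                 # number of samples (columns)
    r = len(samples[0])              # observations per sample (rows)
    all_values = [x for s in samples for x in s]
    grand_mean = sum(all_values) / (c * r)
    means = [sum(s) / r for s in samples]

    ssc = r * sum((m - grand_mean) ** 2 for m in means)             # between samples
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)  # within samples
    sst = sum((x - grand_mean) ** 2 for x in all_values)            # total

    msc = ssc / (c - 1)
    mse = sse / (c * (r - 1))
    return {"SSC": ssc, "SSE": sse, "SST": sst,
            "MSC": msc, "MSE": mse, "F": msc / mse}


# Illustrative data: three samples with three observations each.
table = one_way_anova([[2, 3, 4], [4, 5, 6], [6, 7, 8]])
print(table)
```

The calculated F is then compared with the table value of F for (C-1, C(R-1)) degrees of freedom. Note that SSC + SSE = SST, which is a useful check on hand calculations.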

14. Form of two way ANOVA table.

Source of variance | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between columns | SSC | C-1 | MSC = SSC/(C-1) | F = MSC/MSE
Between rows | SSR | R-1 | MSR = SSR/(R-1) | F = MSR/MSE
Error (residual) | SSE | (C-1)(R-1) | MSE = SSE/[(R-1)(C-1)] |

15. What is the aim of design of experiment? (Jan 2013)


The aim of the design of experiments is to control the extraneous variables and to minimize the
experimental error.
16. When does Z test Apply?


 The Z test applies when there is a need to determine whether two population means are different,
the variance is known and the sample size is large (n ≥ 30).
 The test statistic is assumed to have a normal distribution, and parameters such as the
standard deviation should be known.

17. Give two examples for a hypothesis. (Jan 2015)


i) Null Hypothesis:
Ho: µ1=µ2
The difference between means of two districts is not statistically significant.
ii) Alternative Hypothesis:
H1: µ1≠µ2
The difference between means of two districts is statistically significant.
21. Distinguish between z-test and t-test. (Jan 2015)
Z test | t test
Z test is used when the sample size n ≥ 30, i.e., large sample. | t test is used when the sample size n < 30, i.e., small sample.
Z test is used when parameters such as variance and standard deviation are known. | t test is used when parameters such as variance and standard deviation are unknown.

22. State the assumptions of ‘F’ test.( Apr/May 2019)


 Normality: The values in each group should be normally distributed.
 Independence of Error: The variation of each value around its own group mean, i.e., the error,
should be independent for each value.
 Homogeneity: The variances within each group should be equal for all groups, i.e.,
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_n^2$
23. Write the properties of t distribution.
 The t - distribution ranges from -∞ to ∞
 The t – distribution , like Standard normal distribution is bell shaped and symmetrical around
mean Zero.
 The shape of the t – distribution changes as the number of degrees of freedom changes.
24. Name the basic principles of experimental design.( Apr/May 2018)
According to Professor Ronald A. Fisher , the basic principles of the design of experiments are
 Principles of Replication
 Principles of Randomization
 Principles of Local Control.
25. What is the assumption of t test? (Apr/May 2017)


 The parent population from which the sample is drawn is normal.


 The sample observations are independent.
 The population standard deviation σ is unknown.
26. Define student t –test for difference of mean of two samples.
When the samples are dependent, i.e., paired so that each observation in one sample is
associated with some particular observation in the second sample, we use the paired ‘t’ test
(Student’s ‘t’ test).

27. State the application of t tests.

 To test the hypothesis about population mean.


 To test the hypothesis about the difference between two means.
 To test the hypothesis about the difference between two means with dependent samples.
 To test the hypothesis about observed sample correlation coefficient and sample
regression co-efficient

28. Discuss the test procedure to test hypothesized population proportion using single sample
population.

Procedure:
1. Hypothesis:
Null hypothesis: The two proportions are equal, i.e., P1 = P2.
Alternate hypothesis: The two proportions are unequal, i.e., P1 ≠ P2.
2. Test statistic:
$Z = \frac{p_1 - p_2}{S.E.(p_1 - p_2)}$
$S.E.(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
where q = 1 - p
3. Choose the level of significance, e.g., 5%, 1% or 2%.
4. Find the table value using a one-tailed or two-tailed test.
5. Make the decision by comparing the calculated value with the table value and state
whether the null hypothesis is accepted or rejected.
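
The procedure above can be sketched as follows. The counts (60 out of 200 and 45 out of 300) are hypothetical, the function name is an assumption, and the standard error is the unpooled form given in step 2.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z = (p1 - p2) / S.E.(p1 - p2) with the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return (p1 - p2) / se


# Hypothetical data: 60 successes out of 200 versus 45 out of 300.
z = two_proportion_z(60, 200, 45, 300)
z_table = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed, 5% level
decision = "reject H0" if abs(z) > z_table else "accept H0"
print(round(z, 4), round(z_table, 2), decision)
```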

UNIT 4


1. What is Yates's correction?(Apr/May 2019)

 Yates's correction, also known as Yates's chi-squared test, is used to test independence of
events in a cross table, i.e., a table showing the frequency distribution of variables.
 It is used to test whether a number of observations belonging to different categories validate a null
hypothesis.
 It is a continuity correction made to chi-square values in a 2 × 2 frequency table: 0.5 is
subtracted from each absolute difference |O − E| before squaring.
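
As a hedged illustration, the corrected statistic for a 2 × 2 table can be written as the sum of (|O − E| − 0.5)² / E over the four cells. The function name and the observed frequencies below are assumptions for the example; clamping the corrected difference at zero is a design choice of this sketch.

```python
def chi_square_2x2(table, yates=True):
    """Chi-square for a 2 x 2 table, optionally with Yates's correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Subtract 0.5 from |O - E|, never going below zero.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2


# Hypothetical observed frequencies.
observed = [[10, 20], [30, 40]]
corrected = chi_square_2x2(observed, yates=True)
uncorrected = chi_square_2x2(observed, yates=False)
print(round(corrected, 4), round(uncorrected, 4))
```

The corrected value is always smaller than or equal to the uncorrected one, so the correction makes the test more conservative. Either value is compared with the chi-square table value for 1 degree of freedom.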
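2. Write the formula for Kruskal Wallis test.(Nov/Dec 2018)
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$
where k is the number of samples, n_i is the size of the i-th sample, R_i is the sum of ranks of the
i-th sample and n = n_1 + n_2 + … + n_k.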

3. What are the uses of Chi-square distribution? (Nov/Dec 2018)


State any 2 uses of Chi-square test.(Jan 2018)
 To test whether the population variance equals a hypothesized value.
 To test the ‘goodness of the fit’. It is used to determine whether an actual sample
distribution matches a known theoretical distribution.
 To test the independence of attributes i.e., if a population is known to have two
attributes , then chi-square distribution is used to test whether the two attributes are
associated or independent , based on sample.
 To test the homogeneity of independent estimates of population correlation coefficient.
4. Define Chi-square distribution.(Apr/May 2018)
The Chi Square distribution is the distribution of the sum of squared standard normal
deviates.
If X1, X2, …, Xm are m independent random variables having the standard normal distribution, then
the quantity
$\chi^2 = X_1^2 + X_2^2 + \dots + X_m^2$
follows a Chi-Squared distribution with m degrees of freedom.
Its mean is m, and its variance is 2m.

5. Write formula for Runs test. (Apr/May 2018)

Test statistic: $Z = \frac{V - \mu_V}{\sigma_V}$
$\mu_V = \frac{2 n_1 n_2}{n_1 + n_2} + 1$
$\sigma_V^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$
where n1 and n2 are the numbers of observations of the two kinds and
V = No. of runs.
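
A minimal sketch of the runs test calculation, assuming a sequence made of exactly two kinds of symbols. The function name and the sequence of heads and tails are made up for illustration.

```python
import math


def runs_test(sequence):
    """Return (V, mean, variance, Z) for a sequence of two kinds of symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the sequence must contain exactly two kinds of symbols")
    n1 = sequence.count(symbols[0])
    n2 = len(sequence) - n1

    # A new run starts whenever the symbol changes.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / math.sqrt(variance)
    return runs, mean, variance, z


# Hypothetical sequence of coin tosses.
v, mu_v, var_v, z = runs_test("HHTHTTHHHT")
print(v, mu_v, round(var_v, 4), round(z, 4))
```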

6. Distinguish between Mann –whitney U test and Kruskal Wallis test. (Jan 2018)

Mann-Whitney U test | Kruskal Wallis test
It is used when there are only two populations. | It is used when there are more than two populations.
It is also called the U test. | It is also called the H test or KW test.
It is always a two-tailed test, so the Z table is used. | It uses the Chi-square distribution table.

7. List out the working rules of Mann –whitney U test. (Nov/Dec 2017)
Null Hypothesis:
Ho: µ1=µ2, the two population are identical and
Alternative Hypothesis:
H1: µ1≠ µ2, the two populations are not identical
Test Statistic:
1. Combine all the given samples (from smallest to the largest), and assign ranks to all these
values.
2. Assign the average of ranks if the sample values are same (i.e there are tie score)
3. Find the sum of the ranks for each of the sample. Let us denote these sums by R1 and R2.
Also n1 and n2 are their respective sample sizes.
4. Calculate U-statistic:
$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$ [For sample 1]
(OR)
$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$ [For sample 2]
Now the mean and variance of the sampling distribution of U are
Mean = $\mu_U = \frac{n_1 n_2}{2}$
Variance = $\sigma_U^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
Therefore the standard normal variate of U is
$Z = \frac{U - \mu_U}{\sigma_U}$
5. Level of significance:
 Fix the level of significance, e.g., 5%, 1% etc.
 Find the table value from the Z table.
6. Conclusion:
 Make the decision by comparing the calculated value with the table value and
state whether H0 is accepted or rejected.
 If │Z│ ≤ Zα, we accept H0, and we reject H0 if │Z│ > Zα, where Zα is the tabulated value of Z
for the given level of significance α.
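
The working rules can be sketched as code. This is an illustration under assumptions: the helper names and the two small samples are hypothetical, tied values receive the average of their ranks, and Z is computed from U1 using the normal approximation.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, Z) using the normal approximation."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    mean = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, (u1 - mean) / sigma


# Hypothetical samples.
u1, u2, z = mann_whitney_u([12, 15, 18, 20], [10, 11, 14, 16, 17])
print(u1, u2, round(z, 4))
```

As a check, U1 + U2 always equals n1 × n2.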

8. List any 4 Non –parametric test.


i) Kruskal Wallis test
ii) Mann-Whitney U test
iii) One-sample runs test
iv) Kolmogorov-Smirnov test
v) Sign test
9. Write the importance of Kolmogorov-Smirnov test.(Apr/may 2017)
It is used to test whether there is a significant difference between an observed frequency
distribution and a theoretical frequency distribution.

$D_n = \max |F_e - F_o|$

where Fe and Fo are the expected (theoretical) and observed cumulative relative frequencies.
For table value see Kolmogorov-Smirnov table

10. When we use Mann –whitney U test? (May/June 2016)


Mann –whitney U test is used when there are only two populations.

11. What are the advantages of Non – parametric methods? (May/June 2016)
Advantages:
 It does not require any parameters or the population to be normal
 It is simple and easy to understand
 It is based on few assumptions
Disadvantages:
 They ignore a certain amount of information
 They are not as efficient as parametric tests.


 The non –parametric tests cannot be used to estimate parameters in the population or the
confidence intervals for such parameters.
12. Distinguish between Non-Parametric and Parametric tests.(Jan 2015)

BASIS FOR COMPARISON | PARAMETRIC TEST | NONPARAMETRIC TEST
Meaning | A statistical test in which specific assumptions are made about the population parameter is known as a parametric test. | A statistical test used in the case of non-metric independent variables is called a non-parametric test.
Basis of test statistic | Distribution | Arbitrary
Measurement level | Interval or ratio | Nominal or ordinal
Measure of central tendency | Mean | Median
Information about population | Completely known | Unavailable
Applicability | Variables | Variables and Attributes
Correlation test | Pearson | Spearman

13. What do you mean by Non-parametric test? (Nov/Dec 2014)


A non parametric test (sometimes called a distribution free test) does not assume anything
about the underlying distribution (for example, that the data comes from a normal
distribution). ... It usually means that you know the population data does not have a normal
distribution.
14. Explain Rank sum test. (Nov/Dec 2014)
Mann-Whitney ‘U’ test and Kruskal Wallis test are called rank sum tests because the
tests depend on the ranks of the sample observations. Mann-Whitney test is used when there are
only two populations, whereas Kruskal Wallis test is employed when more than two populations
are involved.

15. Explain the k-W test procedure with appropriate examples.


Kruskal-Wallis test is a non-parametric test, which is used to compare three or more
groups of sample data.
Procedure:
1. Arrange the data of all samples in a single series in ascending order.
2. Assign ranks to them in ascending order; in the case of repeated values, assign ranks to
them by averaging their rank positions.

3. Ranks for the different samples are separated and summed up as R1, R2, R3 etc.

4. To calculate the K-W test statistic, the formula is
$H = \frac{12}{n(n+1)} \sum \frac{R_i^2}{n_i} - 3(n+1)$
where n_i is the size of the i-th sample and n is the total number of observations.

5. Fix the level of significance, e.g., 5%.

6. Find the table value from the chi-square table with k - 1 degrees of freedom, where k is the
number of samples.
7. Make the decision by comparing the calculated value with the table value and
state whether the null hypothesis is accepted or rejected.
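
A small worked example of the procedure, given as a sketch only. The three groups are invented data, the helper names are assumptions, and H is computed with the standard statistic H = 12/(n(n+1)) Σ R_i²/n_i − 3(n+1) without any further correction for ties.

```python
def average_ranks(values):
    """Rank values in ascending order, averaging the ranks of repeated values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, rank sums) for a list of samples."""
    combined = [x for g in groups for x in g]
    ranks = average_ranks(combined)
    n = len(combined)

    rank_sums = []
    start = 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)

    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g)
                                 for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return h, rank_sums


# Hypothetical data: three groups of three observations.
h, rank_sums = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 4), rank_sums)
```

Here H = 7.2 exceeds 5.991, the 5% chi-square table value for 2 degrees of freedom, so the null hypothesis would be rejected for this illustrative data.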
19. What is sign test? Elaborate the steps involved in sign test.(Apr/may 2019)
Sign test:
 The sign test is a non-parametric test that is used to test whether or not two groups are
equally sized.
 The sign test is used when dependent samples are ordered in pairs, where the bivariate
random variables are mutually independent.
 It is based on the direction of the plus and minus signs of the observations, and not on their
numerical magnitude.
 It is also called the binomial sign test, with p = 0.5.
 The sign test is considered a weaker test, because it tests whether the pair value is below or above
the median and it does not measure the pair difference.

Types of sign test:


1. One sample: We set up the hypothesis so that + and – signs are the values of random
variables having equal size.
2. Paired sample: This test is also called an alternative to the paired sample t-test.  This test
uses the + and – signs in paired sample tests or in before-after study. In this test, null
hypothesis is set up so that the sign of + and – are of equal size, or the population means
are equal to the sample mean.
Procedure:
1. Calculate the + and – signs for the given distribution. Put a + sign for a value greater than
the mean value, and put a – sign for a value less than the mean value. Put 0 when the value
is equal to the mean value; observations with 0 are considered ties.
2. Denote the total number of signs by ‘n’ (ignore the zero signs).
3. Sign test for paired data:-
It is based on the direction of the difference within a pair of observations and not on their numerical
magnitude.
When the sign value is less than 5, we use the Binomial (or) Poisson distribution,
$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$, $P(X = x) = \frac{e^{-\alpha} \alpha^{x}}{x!}$
When the sign value is greater than or equal to 5, we use the normal distribution,
$z = \frac{x - \mu}{\sigma}$, $SE(\hat{p}) = \sqrt{\frac{PQ}{n}}$
For table value see one tailed test or two tailed test.

One sample sign test:-


The population sampled is continuous and symmetrical.
If n is less than 30 (n < 30), we use the Binomial (or) Poisson distribution,
$P(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$, $P(X = x) = \frac{e^{-\alpha} \alpha^{x}}{x!}$
If n is greater than or equal to 30 (n ≥ 30), we use the normal distribution,
$Z = \frac{x - nQ}{\sqrt{nQ(1 - Q)}}$
For table value see one tailed test or two tailed test.
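
The paired sign test with the binomial distribution can be sketched as below. The before/after scores are invented, the function name is an assumption, and the two-tailed probability is taken as twice the smaller binomial tail with p = 0.5 (capped at 1).

```python
from math import comb


def sign_test(before, after):
    """Return (plus, minus, n, two-tailed p-value) for paired data."""
    diffs = [a - b for b, a in zip(before, after)]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    n = plus + minus                      # zero differences (ties) are ignored
    k = min(plus, minus)
    # Binomial tail P(X <= k) with p = q = 0.5.
    tail = sum(comb(n, r) for r in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return plus, minus, n, p_value


# Hypothetical scores before and after a training programme.
before = [70, 68, 75, 80, 62, 66, 74, 71, 69]
after = [74, 72, 75, 85, 65, 64, 79, 75, 73]
plus, minus, n, p_value = sign_test(before, after)
print(plus, minus, n, p_value)
```

With 7 plus signs and 1 minus sign out of n = 8, the two-tailed probability is about 0.07, so H0 would not be rejected at the 5% level for this illustrative data.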

UNIT 5
1. What is mean by Correlation analysis?
Correlation analysis is a statistical technique used to describe not only the degree of
relationship between the variables, but also the direction of influences.

2. Define Correlation.(May/June 2016)


Correlation analysis attempts to determine the degree of relationship between two
variables.
According to A.M.Tuttle “Correlation is an analysis of the co-variation between two or
more variables.”

$r_{xy} = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2} \, \sqrt{n \sum Y^2 - (\sum Y)^2}}$
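
As a quick numerical illustration of the correlation coefficient formula, the sketch below uses a small made-up data set; the function name is an assumption. The square of r is also printed, since it is the coefficient of determination.

```python
import math


def pearson_r(x, y):
    """r = [n*Sxy - Sx*Sy] / [sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2)]."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4))
```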

3. What are the methods to study correlation?


 Scatter diagram
 Karl Pearson’s coefficient of correlation or Covariance method
 Spearman’s Rank Correlation method
 Two way frequency method
 Concurrent deviation method

4. Explain Coefficient of Determination.( Apr/May 2019)


It is the square of the coefficient of correlation, i.e., r², where r is the coefficient of correlation.

5. Explain Types of Correlation:


Correlation can be classified in different ways. The three most important ways of
classifying correlation are:
Positive or negative correlation:
 If an increase in one variable causes a proportionate increase in the other variable,
then the variables are said to be positively correlated.
 If an increase in one variable causes a proportionate decrease in the other variable,
then the variables are said to be negatively correlated.
Simple, partial & multiple correlations
 When only two variables are studied it is a problem of simple correlation.
 When three or more variables are studied it is a problem of either multiple or partial
correlation.
Linear and non linear correlation
 If the amount of change in one variable tends to bear a constant ratio to the amount of
change in the other variable then the correlation is said to be linear.
 If the amount of change in one variable does not bear a constant ratio to the amount of
change in the other variable then the correlation is said to be non-linear.

6. Regression analysis:
Regression is the measure of the average relationship between 2 or more variables
in terms of the original units of the data.
E.g., if we know that advertising & sales are correlated, we can find the expected
amount of sales for a given advertising expenditure or the required amount of expenditure
for attaining a given amount of sales.


7. Write the formula for Regression line?

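(a) Regression line of y on x:

$y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$

where $r \frac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X.

(b) Regression line of x on y:

$x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$

where $r \frac{\sigma_x}{\sigma_y}$ is the regression coefficient of X on Y, and

$\bar{x} = \frac{\sum x}{n}$; $\bar{y} = \frac{\sum y}{n}$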

8. Write down the Regression Coefficients.( Nov/Dec 2018)

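Regression coefficient of Y on X

$b_{yx} = r \frac{\sigma_y}{\sigma_x}$ ……………. (1)

Regression coefficient of X on Y

$b_{xy} = r \frac{\sigma_x}{\sigma_y}$ ……………. (2)

From (1) and (2) we get

$r = \pm \sqrt{b_{yx} \times b_{xy}}$

$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$

$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum y^2 - (\sum y)^2}$

where x = X - A, y = Y - B, and A and B are assumed means.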
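
The regression coefficients and the line of y on x can be checked numerically. This is a sketch with invented data and hypothetical function names; r is given the same sign as the regression coefficients.

```python
import math


def regression_coefficients(x, y):
    """Return (b_yx, b_xy, r) using the sum formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    b_yx = numerator / (n * sxx - sx ** 2)
    b_xy = numerator / (n * syy - sy ** 2)
    # r = +/- sqrt(b_yx * b_xy), taking the sign of the coefficients.
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return b_yx, b_xy, r


def predict_y(x_new, x, y):
    """Regression line of y on x: y - y_bar = b_yx * (x - x_bar)."""
    b_yx, _, _ = regression_coefficients(x, y)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return y_bar + b_yx * (x_new - x_bar)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r = regression_coefficients(x, y)
y_at_6 = predict_y(6, x, y)
print(b_yx, b_xy, round(r, 4), round(y_at_6, 2))
```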


9. Difference between Correlation and Regression( Apr/May 2018)

S.No | Basis of comparison | Correlation | Regression
1 | Meaning | Correlation is a statistical measure which determines the co-relationship or association of two variables. | Regression describes how an independent variable is numerically related to the dependent variable.
2 | Usage | To represent the linear relationship between two variables. | To fit a best line and estimate one variable on the basis of another variable.
3 | Objective | To find a numerical value expressing the relationship between variables. | To estimate values of a random variable on the basis of the values of a fixed variable.
4 | Nature of variables | Variables are not designated as dependent and independent. | One variable is dependent and the other variable is independent.
5 | Indicates | Correlation coefficient indicates the extent to which two variables move together. | Regression indicates the impact of a unit change in the known variable on the estimated variable.
6 | Range | Correlation coefficient lies between -1.00 and +1.00. | In regression analysis, if byx > 1 then bxy < 1.
7 | Nature of coefficient | Correlation coefficient is symmetrical and mutual. | Regression coefficient is not symmetrical and mutual.
8 | Scope | Correlation analysis has limited applications. | Regression analysis has wider applications.
9 | Relationship | Correlation is confined to the linear relationship between variables only. | Regression studies linear and non-linear relationships.
10 | Association | Correlation coefficient measures the extent and direction of linear association between 2 variables. | Linear regression allows one variable to be described as a linear function of another variable.

10. Write formula for Least square / linear trend:


Yc = a + bx

$a = \frac{\sum y}{N}$; $b = \frac{\sum xy}{\sum x^2}$ (with x measured from the middle of the period so that ∑x = 0)
Second degree trend / parabolic trend:
Y = a + bx + cx²

When ∑x = 0:

$a = \frac{\sum y - c \sum x^2}{N}$; $b = \frac{\sum xy}{\sum x^2}$; $c = \frac{N \sum x^2 y - (\sum x^2)(\sum y)}{N \sum x^4 - (\sum x^2)^2}$

When ∑x ≠ 0, a, b and c are found by solving the normal equations:
Y = a + bx + cx²
∑y = Na + b∑x + c∑x²
∑xy = a∑x + b∑x² + c∑x³
∑x²y = a∑x² + b∑x³ + c∑x⁴
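
A minimal sketch of fitting the linear trend Yc = a + bx by least squares. The yearly figures are invented, the function name is an assumption, and x is measured from the mean year so that ∑x = 0 and the short formulas a = ∑y/N, b = ∑xy/∑x² apply.

```python
def linear_trend(years, values):
    """Fit Yc = a + b*x with x measured from the mean year (so sum(x) = 0)."""
    n = len(years)
    origin = sum(years) / n
    x = [yr - origin for yr in years]
    a = sum(values) / n
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return a, b, origin


# Hypothetical yearly sales figures.
years = [2015, 2016, 2017, 2018, 2019]
sales = [10, 12, 15, 17, 21]
a, b, origin = linear_trend(years, sales)
forecast_2020 = a + b * (2020 - origin)
print(a, b, round(forecast_2020, 2))
```

The forecast for a future year is obtained by substituting its x value (year minus origin) into the fitted trend line.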

11. How should one forecast by linear regression?


Regression is the study of relationships among variables, a principal purpose of which is
to predict, or estimate the value of one variable from known or assumed values of other variables
related to it.

12. Types of Regression Analysis


Simple Linear Regression: A regression using only one predictor is called a simple
regression.
Multiple Regressions: Where there are two or more predictors, multiple regression
analysis is employed.

13. What is regression analysis used for? (Jan 2015)

Regression analysis shows us how to determine both nature and the strength of relationship
between two variables.
14. What do you interpret if the r=0 and r=-1? (Jan 2011)
r=0 then the variables are uncorrelated.
r=-1 then it is a perfect negative correlation.

15. What do you mean by error variation? (Jan 2011)


Sampling error or estimation error is the amount of inaccuracy in estimating some value that
is caused by observing only a portion of a population rather than the whole population.

16. What is the purpose of correlation analysis?


Correlation analysis shows the extent to which two quantitative variables vary together,
including the strength and direction of the relationship. The strength of the relationship refers to the
extent to which one variable predicts the other.

18. What is nonsense correlation? (Jan 2014)


Two unrelated variables showing a high coefficient of correlation.


19. Briefly explain how a scatter diagram benefits the researcher? (June 2014)
i) Relationship between variables.
ii) Combination of the two variables can be easily identified.
20. Explain the difference between linear and curvilinear relationships.( Jan 2016)
Linear relationships | Curvilinear relationships
A linear relationship has direct proportionality that causes the dependent variable to change when the independent variable changes. | A nonlinear/curvilinear relationship does not have proportionality between the dependent and independent variables (there is no consistent change).
In a linear relationship all the points on the scatter diagram tend to lie near a straight line. | Nonlinear/curvilinear relationships are depicted graphically by anything other than a straight line.

21. Define Rank Correlation. Write down the formula to calculate rank correlation co-
efficient.(Jan 2013),(Nov/Dec 2018)

The correlation coefficient between the ranks xi and yi is called the rank correlation
between the two characteristics A and B for the group of individuals.

When ranks are not repeated:
$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
where d = x - y
For repeated ranks:
$r_s = 1 - \frac{6 \left[ \sum d^2 + \frac{m_1 (m_1^2 - 1)}{12} + \frac{m_2 (m_2^2 - 1)}{12} + \dots \right]}{n(n^2 - 1)}$

where d² is the square of the difference of corresponding ranks, n is the number of pairs
of observations, and m1, m2, … are the numbers of times each repeated rank occurs.
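
For the case where ranks are not repeated, the calculation can be sketched as follows. The two sets of ranks are invented and the function name is an assumption; the sketch does not apply the correction for repeated ranks.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no repeated ranks."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
r_s = spearman_rank_correlation(judge_a, judge_b)
print(round(r_s, 2))
```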

22. Explain standard error of estimate.( Apr/May 2019)

The standard error of estimate is similar to the standard deviation. It is a measure of variation
or scatteredness about the line of regression, whereas the standard deviation measures variation or
scatteredness about the arithmetic mean.

a) Standard Error of estimate Y on X:

It is denoted by Syx. It is derived from the formula
$S_{yx} = \sigma_y \sqrt{1 - r^2}$,
where r is the coefficient of correlation between Y and X.


b) Standard Error of estimate X on Y:


$S_{xy} = \sigma_x \sqrt{1 - r^2}$,
where r is the coefficient of correlation between X and Y.
23. Difference between Coefficient of determination and the coefficient of correlation.

Coefficient of determination | Coefficient of correlation
Coefficient of determination is the R² value. | Coefficient of correlation is the R value.
Coefficient of determination (R-squared) measures explained variation. | Correlation measures the linear relationship between two variables.

24. When Linear regression is used?(Nov/Dec 2017)

Linear regression is a common statistical data analysis technique. It is used to determine
the extent to which there is a linear relationship between a dependent variable and one or more
independent variables.
