Measuring Covariation Between Variables Using Correlation & Regression Analysis

UACOAL41: BUSINESS STATISTICS & OPERATIONS RESEARCH - II

U5COAL41: BUSINESS STATISTICS - II
UNIT - I
Correlation is the analysis of covariation between two or more variables.
Significance / Advantages / Benefits / Importance of Correlation:

• Correlation helps to measure in one figure, the degree of relationship existing between two variables.
• Correlation serves as a basis for the study of regression, which in turn helps in estimation.
• Correlation analysis contributes to the understanding of economic behavior.
• It measures the strength of linear relationship.
• It is a widely used measure to understand the phenomenon of price changes and their causes.
Cautions in using Correlation Analysis

• Correlation is not a complete description of the relationship between two variables. It usually requires the
help of the mean and standard deviation.
• Correlation measures only the strength of the relationship between two variables. It does not describe the
relationship between them.
• Correlation is not suitable to be used as a standalone measure.
Correlation and Causation:

Correlation usually signifies a covariation between two variables, but it does not necessarily mean that the two
variables are causally related. Hence, a personal inspection of the variables using common sense is necessary to
rule out any errors in interpretation of data. A correlation may exist without causation for the following reasons:
• The correlation may be due to pure chance, especially in a small sample.
• Both the correlated variables may be influenced by one or more other variables.
• Both the variables may be mutually influencing each other such that neither can be designated as the cause
and the other as effect.
Types of Correlation:
(i) Based on the direction of change:
• If both the variables are varying in the same direction they have Positive Correlation.
• If the variables are varying in opposite directions they have Negative Correlation.
(ii) Based on the number of variables studied:
• A study of covariation between only two variables is called Simple Correlation.
• A study of covariation between more than two variables is called Multiple Correlation.
• A study of covariation between two variables with the other one or more variables kept as constant is called
Partial Correlation.
(iii) Based on the constancy of rate of change:
• If the amount of change in one variable tends to bear a constant ratio to the amount of change in other
variable, it is called Linear Correlation.
• If the amount of change in one variable does not bear a constant ratio to the amount of change in other
variable, it is called Curvilinear Correlation.
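The types above can be illustrated with a small sketch. The data and the three input correlations passed to `partial_r` are hypothetical, chosen only for illustration; `pearson_r` and `partial_r` are helper names introduced here, and the partial correlation uses the standard first-order formula for two variables with a third held constant.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2, holding 3 constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises as x rises
opposite_direction = [10, 8, 6, 4, 2]  # falls as x rises

r_pos = pearson_r(x, same_direction)       # positive correlation
r_neg = pearson_r(x, opposite_direction)   # negative correlation

# Assumed simple correlations between three variables (not from real data).
r_partial = partial_r(0.8, 0.6, 0.7)
```

In this sketch `r_pos` is positive and `r_neg` is negative, matching the direction-based classification, while `r_partial` is smaller than the simple correlation of 0.8 because part of the covariation is shared with the third variable.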
The difference in period before a cause and effect relationship is established is called ‘Lag’. The difference in
period after a cause and effect relationship is established is called ‘Lead’.
Correlation Coefficient is a numerical value that quantifies the extent and type of correlation existing between
two or more variables.
Properties of Correlation Coefficient(r):

• The correlation coefficient ‘r’ lies between -1 and +1.
• Correlation coefficient is independent of change of scale and origin.
• Coefficient of correlation is the geometric mean of the two regression coefficients, and carries their common sign.
• The degree of relationship between the two variables is symmetric.
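The range, invariance and symmetry properties listed above can be checked numerically. The following is a minimal sketch on hypothetical data; `pearson_r` is a helper name introduced here, and the change of scale is assumed to use positive factors (a negative factor would flip the sign of r).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

r_xy = pearson_r(x, y)

# Change of origin and scale: u = (x - 10) / 2, v = 3y + 5.
u = [(a - 10) / 2 for a in x]
v = [3 * b + 5 for b in y]
r_uv = pearson_r(u, v)

# Symmetry: swapping the roles of the variables leaves r unchanged.
r_yx = pearson_r(y, x)
```

Here `r_xy`, `r_uv` and `r_yx` agree up to floating-point rounding, and the value lies within the interval from -1 to +1.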
Spurious Correlation is a mathematical relationship in which two variables are not causally related to each
other, yet it may be wrongly inferred that they are, due to either coincidence, or the presence of a certain third,
unseen factor.
Regression is the measure of average relationship between two or more variables in terms of the original units
of data.
Regression lines are two intersecting lines representing the average relationship of first variable on second
variable and vice versa.
Regression equations are the algebraic expression of regression lines.
Differences between correlation and regression

• Correlation is a measure of degree of covariability, regression is the study of average relationship between
two variables.
• Correlation is a tool for ascertaining degree of relationship between two variables, and hence cannot
determine the cause and effect. However, in regression, one variable is taken as the dependent and the other
as independent variable.
• Correlation coefficients are symmetric, whereas regression coefficients are not symmetric.
• There may be nonsense correlation between two variables, which is purely due to chance and has no
practical relevance, but there is nothing like nonsense regression.
• Correlation coefficient is independent of change of scale and origin, whereas regression coefficients are
independent of change of origin but not scale.
Uses of Regression:
• Regression analysis provides estimates of values of dependent variable from values of the independent
variable.
• Regression obtains a measure of error involved in using the regression line as a basis for estimation.
• Regression coefficients can be used to compute the correlation coefficient, as coefficient of correlation is the
geometric mean of two regression coefficients.
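The uses above can be sketched in a few lines. The data are hypothetical, and `regression_coefficients` is a helper name introduced here; it returns the coefficient of y on x (b_yx) and of x on y (b_xy). The sketch estimates y from x, recovers r as the geometric mean of the two coefficients with their common sign, and checks how b_yx responds to a change of origin versus a change of scale.

```python
import math

def regression_coefficients(x, y):
    """Return (b_yx, b_xy): slope of y on x and slope of x on y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sxx, sxy / syy

# Hypothetical data for illustration only.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

b_yx, b_xy = regression_coefficients(x, y)

# Regression equation of y on x: y - mean_y = b_yx * (x - mean_x).
estimate_at_12 = mean_y + b_yx * (12 - mean_x)

# r is the geometric mean of the two coefficients, with their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Change of origin (x + 100) leaves b_yx unchanged; change of scale (10x) does not.
b_yx_shifted, _ = regression_coefficients([a + 100 for a in x], y)
b_yx_scaled, _ = regression_coefficients([a * 10 for a in x], y)
```

Note that b_yx and b_xy differ (0.4 and 1.6 for this data), which is the asymmetry of regression coefficients mentioned earlier, while r is a single symmetric figure.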
Limitations of Regression Analysis

• Regression equation is constructed with the assumption that the relationship between variables is linear.
• Regression estimates can be extrapolated only to some extent correctly, beyond which it may not yield
correct values.
• The relationship between two variables may be strongly influenced by other variables that are
lurking in the background.
Standard Error of Estimate is the measure of accuracy of estimates made using regression.
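As a rough illustration, the standard error of estimate can be computed from the deviations of actual values from the values estimated by the regression line of y on x. The data are hypothetical and `standard_error_of_estimate` is a name introduced here. The divisor is an assumption: many business statistics texts divide by n, while others divide by n - 2, so the sketch exposes it as a parameter.

```python
import math

def standard_error_of_estimate(x, y, divisor_adjustment=0):
    """Root of the mean squared deviation of y from its regression estimate.

    divisor_adjustment=0 divides by n; divisor_adjustment=2 divides by n - 2.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b_yx = sxy / sxx
    # Estimated y for each x from the regression line of y on x.
    estimates = [mean_y + b_yx * (a - mean_x) for a in x]
    squared_errors = sum((b - e) ** 2 for b, e in zip(y, estimates))
    return math.sqrt(squared_errors / (n - divisor_adjustment))

# Hypothetical data for illustration only.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

se_n = standard_error_of_estimate(x, y)                                # divisor n
se_n_minus_2 = standard_error_of_estimate(x, y, divisor_adjustment=2)  # divisor n - 2
```

A smaller value indicates that the actual observations lie closer to the regression line, so estimates made from it are more accurate.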
UNIT - II
Index Number is a device for measuring differences in the magnitude of a group of related variables.
Uses of Index Numbers

• They help in framing suitable policies.
• They reveal trends and tendencies.
• They are important in forecasting future economic activity.
• Index numbers are very useful in deflating.
• Index numbers are specialized averages.
• Index numbers measure net change in a group of related variables.
• Index numbers measure the effect of changes over a period of time.
Problems in the construction of index numbers / Steps in the construction of an Index number
1. Purpose of the index
Since every index number is of limited and particular use, failure to indicate its purpose would lead to
confusion and wastage of time with no fruitful results. Selecting purpose acts as a base for the following
steps. Additionally since the scope is tied with the purpose of index, availability of data may necessitate
modification of purpose.
2. Selection of a base period
A Base period / Reference period is the period against which comparisons are made in an index number. A
base period should be a normal one and should not be too distant in the past. A decision should be made with
regard to selection of either a fixed or chain base.
3. Selection of number of items
Depending on the purpose of the index number, number of commodities and their qualities are to be
determined. The larger the number of items, the more representative the index number will be.
4. Price quotations
A decision with regard to the place and persons of sourcing prices is to be made. A decision must also be
taken if wholesale or retail prices are to be used.
5. Choice of an average
Simple arithmetic mean and Geometric mean are widely used for averages.
6. Selection of appropriate weights
The term ‘Weight’ refers to the relative importance of different items in index numbers. Weights can be
quantity weights or value weights. Production figures, Consumption figures, Distribution figures can be
used as weights.
7. Selection of an appropriate formula

An index number can be constructed using four methods
(i) Simple Aggregative index numbers: Total of current period prices is divided by the total of base
period prices.
(ii) Simple Average of Price Relatives index numbers: An average of price relatives is used. A price
relative is the ratio of the price of a single commodity in a given period to its price in base period.
(iii) Weighted Aggregative index numbers: In this method weights are assigned to various items and a
product of price and weight is used.
(iv) Weighted average of Relatives index numbers: In this method, price relatives are multiplied with the
relative weights.
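The four methods above can be sketched as follows. The prices and quantities are hypothetical, and the function names are introduced here for illustration. Two choices are assumptions rather than anything fixed by the text: the weighted aggregative index uses base-period quantities as weights, and the weighted average of relatives uses base-period values (price times quantity) as weights with an arithmetic mean.

```python
def simple_aggregative(p0, p1):
    """Total of current prices divided by total of base prices, times 100."""
    return sum(p1) / sum(p0) * 100

def simple_average_of_relatives(p0, p1):
    """Arithmetic mean of the price relatives (p1 / p0 * 100)."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(relatives) / len(relatives)

def weighted_aggregative(p0, p1, weights):
    """Ratio of weighted current prices to weighted base prices, times 100."""
    current = sum(b * w for b, w in zip(p1, weights))
    base = sum(a * w for a, w in zip(p0, weights))
    return current / base * 100

def weighted_average_of_relatives(p0, p1, weights):
    """Weighted arithmetic mean of the price relatives."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

# Hypothetical data for three commodities.
p0 = [10, 20, 40]   # base period prices
p1 = [12, 25, 44]   # current period prices
q0 = [5, 3, 2]      # base period quantities (assumed weights)
v0 = [p * q for p, q in zip(p0, q0)]  # base period values

index_1 = simple_aggregative(p0, p1)
index_2 = simple_average_of_relatives(p0, p1)
index_3 = weighted_aggregative(p0, p1, q0)
index_4 = weighted_average_of_relatives(p0, p1, v0)
```

With these particular weights, `index_3` and `index_4` coincide, because weighting relatives by base-period values is algebraically the same as the aggregative formula with base-period quantities.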
Time Reversal Test is that the formula for calculation of index number should be such that it will give the
same ratio between one point of comparison and the other, no matter which of the two is taken as the base.
Factor Reversal Test is that the formula for calculation of index number should be such that its product with
the formula of altered factors (price and quantity) will yield the true value ratio.
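Both tests can be checked numerically. The sketch below uses hypothetical prices and quantities and compares two well-known formulas that the text does not name: Laspeyres (base-period weights) and Fisher's ideal index (geometric mean of Laspeyres and Paasche). Indices are expressed as ratios rather than percentages so that the products are easy to read.

```python
import math

def laspeyres(p0, q0, p1, q1):
    """Price index with base-period quantity weights (q1 is unused)."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0))

def paasche(p0, q0, p1, q1):
    """Price index with current-period quantity weights (q0 is unused)."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1))

def fisher(p0, q0, p1, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, q0, p1, q1) * paasche(p0, q0, p1, q1))

def time_reversal_product(formula, p0, q0, p1, q1):
    """P01 * P10; equals 1 when the time reversal test is satisfied."""
    return formula(p0, q0, p1, q1) * formula(p1, q1, p0, q0)

def factor_reversal_product(formula, p0, q0, p1, q1):
    """P01 * Q01; equals the value ratio when the factor reversal test is satisfied."""
    # The quantity index is obtained by interchanging prices and quantities.
    return formula(p0, q0, p1, q1) * formula(q0, p0, q1, p1)

# Hypothetical data for three commodities.
p0, q0 = [10, 20, 40], [5, 3, 2]
p1, q1 = [12, 25, 44], [4, 3, 3]

value_ratio = sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q0))

fisher_time = time_reversal_product(fisher, p0, q0, p1, q1)
fisher_factor = factor_reversal_product(fisher, p0, q0, p1, q1)
laspeyres_time = time_reversal_product(laspeyres, p0, q0, p1, q1)
laspeyres_factor = factor_reversal_product(laspeyres, p0, q0, p1, q1)
```

For this data the Fisher products equal 1 and the value ratio respectively, while the Laspeyres products miss both targets, which illustrates what it means for a formula to pass or fail each test.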
An Index number formed as a result of taking the immediately preceding year as base year is called Chain
Index Number.
The decision to alter the base period due to obsolescence of previous base period is called Base Shifting.
The process of combining two or more index numbers covering different bases into a single series is called
Splicing.
Deflating is a policy decision to make allowances for the effect of changing price values.
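Base shifting and deflating both reduce to simple ratio arithmetic. The sketch below assumes the usual textbook conventions: a shifted index is the old index divided by the old index of the new base year, times 100, and a deflated (real) value is the money value divided by the price index, times 100. The index series and wage figures are hypothetical, and the function names are introduced here.

```python
def shift_base(index, new_base):
    """Re-express an index series with new_base as the base period (= 100)."""
    base_value = index[new_base]
    return {year: value / base_value * 100 for year, value in index.items()}

def deflate(money_values, price_index):
    """Convert money values to real values using a price index."""
    return {year: money_values[year] / price_index[year] * 100
            for year in money_values}

# Hypothetical price index (base 2019 = 100) and money wages.
price_index = {2019: 100, 2020: 110, 2021: 125, 2022: 150}
wages = {2019: 2000, 2020: 2310, 2021: 2500, 2022: 3000}

shifted = shift_base(price_index, 2021)   # same series with 2021 = 100
real_wages = deflate(wages, price_index)  # wages at 2019 prices
```

In this example money wages rise by half between 2019 and 2022, yet the real wage in 2022 is the same as in 2019 because prices rose in the same proportion.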
Consumer Price Index numbers / Cost of Living Index numbers represent the average change over time in
the prices paid by the ultimate consumer, of a specified basket of goods and services.
UNIT - III
Time Series is a set of statistical observations arranged in a chronological order.
Utility of Time Series Analysis

• It helps in understanding past behavior.
• It helps in planning future operations.
• It helps in evaluating current accomplishments.
• It facilitates comparisons.
Components of Time Series

• Trend is the general tendency of any variable to either grow or decline over a long period of time. Trend
does not show short term oscillations but rather shows steady movements. Such movements are attributable
to factors such as population change, technological progress and large scale shifts in consumer tastes.
• Seasonal Variations are those periodic movements which occur regularly every year and have their origin
in the nature of season, climate, customs, traditions, habits, culture etc.
• Cyclical Variations refer to the recurrent long-term variations with periods anywhere between one and twenty
years. They portray consistently recurring rises and declines in activity.
• Irregular (Erratic/Accidental/Random) Variations refer to such variations in business activity which do
not repeat in a definite pattern, occurring because of natural disasters, wars etc.
A Semi Average in Time Series Analysis refers to the average of the values in one of the two equal halves into
which the data, arranged in chronological order, are divided.
Moving Average is a calculation to analyse data points by creating series of averages of successive overlapping
subsets of constant size, of the full data set.
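Semi averages and moving averages can be sketched as below. The series is hypothetical and the function names are introduced here. For a series with an odd number of observations, the sketch follows the common convention of leaving out the middle value so that the two halves are equal; that convention is an assumption, not something stated above.

```python
def semi_averages(values):
    """Averages of the first and second halves of a chronological series.

    With an odd number of values the middle one is left out (assumed convention).
    """
    n = len(values)
    half = n // 2
    first, second = values[:half], values[n - half:]
    return sum(first) / half, sum(second) / half

def moving_average(values, window):
    """Averages of successive overlapping subsets of constant size."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical yearly sales figures, in chronological order.
sales = [10, 12, 14, 16, 18, 20]

first_half_avg, second_half_avg = semi_averages(sales)
three_year_ma = moving_average(sales, 3)
```

Each moving average is conventionally placed against the middle period of its window, so a 3-year moving average on six years of data yields four trend values.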
Method of Least Squares is a mathematical method of determining trend such that (i) the sum of differences
between actual and trend values is zero, (ii) the sum of squared differences between actual and trend values is
least.
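A straight-line trend fitted by least squares illustrates both conditions. The sketch below fits y = a + b·t to a hypothetical series, measuring time t as deviations from the middle of the period so that the sum of t is zero; with that choice a is the mean of y and b is the sum of t·y divided by the sum of t squared. The function name is introduced here.

```python
def least_squares_trend(values):
    """Fit y = a + b*t with t centred on the middle of the series.

    Returns (a, b, trend_values).
    """
    n = len(values)
    # Time deviations from the midpoint, so that sum(t) == 0.
    t = [i - (n - 1) / 2 for i in range(n)]
    a = sum(values) / n
    b = sum(ti * yi for ti, yi in zip(t, values)) / sum(ti ** 2 for ti in t)
    trend = [a + b * ti for ti in t]
    return a, b, trend

# Hypothetical yearly production figures, in chronological order.
production = [12, 15, 14, 18, 21]

a, b, trend = least_squares_trend(production)

# Condition (i): deviations of actual from trend values sum to zero.
sum_of_deviations = sum(y - yt for y, yt in zip(production, trend))
# Condition (ii): this sum of squares is smaller than for any other straight line.
sum_of_squares = sum((y - yt) ** 2 for y, yt in zip(production, trend))
```

Changing `b` slightly and recomputing the sum of squares gives a larger value, which is one informal way to see condition (ii) at work.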
UNIT - IV
Statistical Quality Control (SQC) involves statistical analysis of the inspection data, based on sampling and
principles of normal curve.
Statistical Quality Control (SQC) is a statistical method for determining the extent to which quality goals are
being met without necessarily checking every item produced and for indicating whether or not the variations
which occur are exceeding normal expectations.
Types of Variations
The variation of a quality characteristic can be divided under two heads.
(i) Chance Variations are those variations that result from many minor causes that behave in a random manner.
This type of variation is permissible and indeed inevitable. If the variability in production process is confined to
chance variations alone, the process is said to be in a state of statistical control. Chance variations are difficult to detect.
(ii) Assignable Variations are those variations that may be attributed to special non-random causes. Such
variations can be the result of a change in raw material, a new operation, improper machine setting, broken or
worn out parts, mechanical faults in a plant etc. Assignable variations can be detected and corrected.
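One common way to separate the two types of variation is a control chart for sample means. The sketch below is an illustration under stated assumptions, not a method given in the text: the process mean and standard deviation are taken as known, limits are set at three standard errors on either side of the mean (a conventional choice based on the normal curve), and all figures and function names are hypothetical.

```python
import math

def xbar_control_limits(process_mean, process_sd, subgroup_size, k=3):
    """Lower and upper control limits for subgroup means (k-sigma limits)."""
    margin = k * process_sd / math.sqrt(subgroup_size)
    return process_mean - margin, process_mean + margin

def out_of_control(subgroup_means, lcl, ucl):
    """Positions of subgroup means falling outside the control limits."""
    return [i for i, m in enumerate(subgroup_means) if m < lcl or m > ucl]

# Hypothetical process: mean 50, standard deviation 2, samples of 4 items.
lcl, ucl = xbar_control_limits(process_mean=50, process_sd=2, subgroup_size=4)

# Hypothetical means of five successive samples.
sample_means = [50.2, 49.1, 51.0, 53.6, 48.7]
flagged = out_of_control(sample_means, lcl, ucl)
```

Sample means inside the limits are treated as showing chance variation only, while a flagged sample suggests an assignable cause worth investigating.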
Need for SQC

Chance variations cannot be detected and hence cannot be corrected. But Assignable variations can be detected
and corrected to increase quality. Quality of a product can be controlled by either a 100% inspection or by use
of SQC. Usually SQC is used for the following reasons:
• A full inspection is expensive.
• A full inspection is unreliable because of its routine, resulting in the workers classifying a product wrongly
out of boredom.
• A full inspection is made at the end of the manufacturing process, which defeats the objective of finding a
defective item before production is complete.
