
PROJECT REPORT

ON
REGRESSION
ANALYSIS
[IMBA]

PRESENTED
BY

NAME: D. SRIKANTH

ENROLL NO: 6NI14059


INTRODUCTION
Trevor Bull - Managing Director
Mr. Trevor Bull joined Tata AIG Life as Managing Director in
January 2006. Prior to this, Trevor was Senior Vice President and
General Manager at American International Assurance in Korea.
Tata AIG Life Insurance Company Ltd. and
Tata AIG General Insurance Company Ltd. (collectively "Tata
AIG") are joint venture companies formed by the Tata Group
and American International Group, Inc. (AIG). Tata AIG
combines the power and integrity of the Tata Group with AIG's
international expertise and financial strength. Tata Group holds
a 74 per cent stake in the two insurance ventures, with AIG
holding the remaining 26 per cent stake.

Tata AIG Life Insurance Company Ltd. provides
insurance solutions to individuals and corporates. Tata AIG Life
Insurance Company was licensed to operate in India on February 12,
2001 and started operations in April 2001. Tata AIG Life offers a
broad array of life insurance coverage to both individuals and
groups, providing various types of add-ons and options on basic life
products to give consumers flexibility and choice.

Tata AIG Life Insurance Company offers products
in Ahmedabad, Bangalore, Chandigarh, Chennai, Guwahati,
Hyderabad, Jaipur, Jamshedpur, Jodhpur, Kochi, Kolkata, Mangalore,
Mumbai, New Delhi, Pune, Rajkot, Trichy, Vijayawada and Lucknow.

Objective of the Study


The objective of this study is to
apply the regression analysis method to data from TATA AIG in
the city of Hyderabad.

Questionnaire Development

For the purpose of this study, a structured questionnaire was
developed. At this stage, an exploratory study was carried
out using personal and focus group interviews.

Collection of Data
The above-mentioned questionnaire
was used to collect the primary data. For secondary data,
research papers, journals and magazines were referred to.
Regression analysis

In statistics, regression analysis is a collective name
for techniques for the modeling and analysis of numerical data
consisting of values of a dependent variable (also called response
variable or measurement) and of one or more independent variables
(also known as explanatory variables or predictors). The dependent
variable in the regression equation is modeled as a function of the
independent variables, corresponding parameters ("constants"), and an
error term.

The error term is treated as a random variable. It
represents unexplained variation in the dependent variable. The
parameters are estimated so as to give a "best fit" of the data. Most
commonly the best fit is evaluated by using the least squares method,
but other criteria have also been used.

Regression can be used for prediction (including
forecasting of time-series data), inference, hypothesis testing, and
modeling of causal relationships. These uses of regression rely heavily
on the underlying assumptions being satisfied. Regression analysis has
been criticized as being misused for these purposes in many cases
where the appropriate assumptions cannot be verified to hold. One
factor contributing to the misuse of regression is that it can take
considerably more skill to critique a model than to fit a model.
Underlying assumptions
Classical assumptions for regression
analysis include:

The sample must be representative of the population for the
inference/prediction.

The error is assumed to be a random variable with a mean of zero
conditional on the explanatory variables.

The independent variables are measured without error. If this is not so,
modeling may be done using errors-in-variables model
techniques.

The predictors must be linearly independent, i.e. it must not be
possible to express any predictor as a linear combination of the
others (see multicollinearity).

The errors are uncorrelated, that is, the variance-covariance
matrix of the errors is diagonal and each non-zero element is the
variance of the error.

The variance of the error is constant across observations
(homoscedasticity). If not, weighted least squares or other
methods might be used.

These are sufficient (but not all necessary)
conditions for the least-squares estimator to possess desirable
properties; in particular, these assumptions imply that the parameter
estimates will be unbiased, consistent, and efficient in the class of
linear unbiased estimators. Many of these assumptions may be relaxed
in more advanced treatments.
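As an illustrative check of the zero-mean-error assumption, the sketch below fits a line by ordinary least squares and verifies that the residuals average to zero; the data values are invented for the example and do not come from this study.

```python
import numpy as np

# Invented data for illustration only (not from the study)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])

# Fit y = a + b*x by ordinary least squares
b, a = np.polyfit(x, y, 1)  # degree-1 fit returns (slope, intercept)

# Residuals: observed minus fitted values
residuals = y - (a + b * x)

# With an intercept in the model, OLS forces the sample residual mean to zero
print(abs(residuals.mean()) < 1e-9)  # -> True
```

Note that this only confirms the sample mean of the residuals; the assumption proper concerns the error term of the population model.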

Regression Analysis that involves two variables
is termed bivariate linear Regression Analysis. Regression Analysis
that involves more than two variables is termed Multiple
Regression Analysis.
Bivariate linear Regression Analysis
involves analyzing the straight-line relationship between two continuous
variables. The bivariate linear Regression model can be expressed as:

Y = α + βX

Where,

Y represents the dependent variable

X is the independent variable

α and β are two constants known as the regression coefficients.

β is the slope coefficient; it can be symbolically represented as ΔY/ΔX

α = Yi - βXi

β = (Yi - Yj) / (Xi - Xj)

Least square method

The method of least squares or ordinary least squares
(OLS) is used to solve overdetermined systems. Least squares is
often applied in statistical contexts, particularly regression analysis.

Least squares can be interpreted as a method of fitting data. The best fit
in the least-squares sense is that instance of the model for which the
sum of squared residuals has its least value, a residual being the
difference between an observed value and the value given by the
model. The method was first described by Carl Friedrich Gauss around
1794.[1] Least squares corresponds to the maximum likelihood criterion
if the experimental errors have a normal distribution and can also be
derived as a method of moments estimator. Regression analysis is
available in most statistical software packages.
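A minimal sketch of this idea, using NumPy's least-squares solver on a small made-up overdetermined system (five observations, two unknowns); the numbers are illustrative and not taken from the report:

```python
import numpy as np

# Design matrix [1, x] for an intercept-and-slope model; data are made up
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.2, 4.1, 5.9, 8.1, 9.8])

# np.linalg.lstsq minimizes the sum of squared residuals ||X @ beta - y||^2
beta, residual_ss, rank, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(round(intercept, 2), round(slope, 2))  # -> 0.26 1.92
```

With more equations than unknowns no exact solution exists, so the solver returns the coefficient vector with the smallest sum of squared residuals, exactly the criterion described above.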

The relationship studied here is between the amount spent on advertisement per
month and the number of customers who visited because of the advertisement
given by TATA AIG Life Insurance Co.

The equation for the regression line assumed by least squares is shown below:

Y = a + bX + ei

Where,

Y is the dependent variable

X is the independent variable

a is the Y-intercept

b is the slope of the line

The table below shows the amount spent on advertisement and the
number of customers who visited through the advertisement.

MONTH   AMOUNT SPENT ON ADVERTISING   NO. OF CUSTOMERS VISITED
        (IN CRORES) [X]               (IN 000S) [Y]
JAN     3.6                           9.3
FEB     4.8                           10.2
MAR     2.4                           9.7
APR     7.2                           11.5
MAY     6.9                           12
JUN     8.4                           14.2
JUL     10.7                          18.6
AUG     11.2                          28.4
SEP     6.1                           13.2
OCT     7.9                           10.8
NOV     9.5                           22.7
DEC     5.4                           12.3

The constant b can be calculated using the formula

b = [n ΣXY - ΣX ΣY] / [n ΣX² - (ΣX)²]

Where,

X is the independent variable

Y is the dependent variable

a is calculated as shown below:

a = Ȳ - bX̄

Where,

Ȳ = the mean value of the dependent variable

X̄ = the mean value of the independent variable
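As a cross-check, the sketch below applies these two formulas to the twelve monthly observations from the table above. At full precision b comes out to about 1.769; the report's hand calculation rounds it to 1.768, which makes no material difference downstream.

```python
import numpy as np

# Monthly data from the table above: X = ad spend (crores), Y = customers (000s)
X = np.array([3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4])
Y = np.array([9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3])
n = len(X)

# b = [n*sum(XY) - sum(X)*sum(Y)] / [n*sum(X^2) - (sum(X))^2]
b = (n * (X * Y).sum() - X.sum() * Y.sum()) / (n * (X ** 2).sum() - X.sum() ** 2)

# a = mean(Y) - b * mean(X)
a = Y.mean() - b * X.mean()

print(round(b, 3), round(a, 2))  # -> 1.769 2.01
```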


ei is the error; it is called the residual value.
The criterion for the least squares method is to minimize

Σ (i = 1 to n) ei²

Where

ei = Yi - Ŷi

Yi is the actual value of the dependent variable

Ŷi is the value lying on the estimated regression line.
Let us solve the example previously discussed
using the least squares method.

We need to determine the constants a and b to
develop the regression equation. The calculations
required for determining the constants are shown in the
table below.

X = AMOUNT SPENT ON ADVERTISING (IN CRORES); Y = NO. OF CUSTOMERS VISITED (IN 000S)

X        Y        XY         X²
3.6      9.3      33.48      12.96
4.8      10.2     48.96      23.04
2.4      9.7      23.28      5.76
7.2      11.5     82.8       51.84
6.9      12       82.8       47.61
8.4      14.2     119.28     70.56
10.7     18.6     199.02     114.49
11.2     28.4     318.08     125.44
6.1      13.2     80.52      37.21
7.9      10.8     85.32      62.41
9.5      22.7     215.65     90.25
5.4      12.3     66.42      29.16

ΣX = 84.1   ΣY = 172.9   ΣXY = 1355.61   ΣX² = 670.73

b = [12(1355.61) - (84.1)(172.9)] / [12(670.73) - (84.1)²]

≈ 1.768
The next step is to calculate a.
To calculate the value of a we first need to determine the means
of the variables X and Y:

X̄ = 84.1/12 = 7.0083

Ȳ = 172.9/12 = 14.40

Substituting these values in the equation:

a = 14.40 - (1.768)(7.0083)
  = 14.40 - 12.39
  = 2.01

We now develop the estimated regression equation by
substituting the values of a and b into the equation:

Ŷ = 2.01 + 1.768X

Ŷ represents the estimated value of the dependent variable
for a given value of X.
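Using the estimated equation, a predicted customer count can be read off for any advertising spend. The sketch below wraps it in a small helper; the 5.0-crore input is an illustrative value, not one of the observed months.

```python
# Estimated regression equation from the report: Y-hat = 2.01 + 1.768 * X
def predict_customers(ad_spend_crores: float) -> float:
    """Estimated no. of customers visited (in 000s) for a given ad spend (in crores)."""
    return 2.01 + 1.768 * ad_spend_crores

# Illustrative input: a month with 5.0 crores of ad spend
print(round(predict_customers(5.0), 2))  # -> 10.85
```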
The Strength of Association: R²

R² can be calculated using the following formula:

R² = explained variance / total variance

Total variance = explained variance + unexplained variance

Explained variance = total variance - unexplained variance

Therefore

R² = (total variance - unexplained variance) / total variance

R² = 1 - unexplained variance / total variance

The unexplained variance is given by Σ(Yi - Ŷi)²

The total variance by Σ(Yi - Ȳ)²

R² = 1 - Σ(Yi - Ŷi)² / Σ(Yi - Ȳ)²

X      Y      XY       X²       Ŷ        Y-Ŷ      (Y-Ŷ)²    (Ŷ-Ȳ)²   (Y-Ȳ)²
3.6    9.3    33.48    12.96    8.3748   0.9252   0.8560    36.3030  26.01
4.8    10.2   48.96    23.04    10.4964  -0.2964  0.0879    15.2381  17.64
2.4    9.7    23.28    5.76     6.2532   3.4468   11.8804   66.3704  22.09
7.2    11.5   82.8     51.84    14.7396  -3.2396  10.4950   0.1153   8.41
6.9    12     82.8     47.61    14.2092  -2.2092  4.8806    0.0364   5.76
8.4    14.2   119.28   70.56    16.8612  -2.6612  7.0820    6.0575   0.04
10.7   18.6   199.02   114.49   20.9276  -2.3276  5.4177    42.6096  17.64
11.2   28.4   318.08   125.44   21.8116  6.5884   43.4070   54.9318  196
6.1    13.2   80.52    37.21    12.7948  0.4052   0.1642    2.5767   1.44
7.9    10.8   85.32    62.41    15.9772  -5.1772  26.8034   2.4876   12.96
9.5    22.7   215.65   90.25    18.806   3.894    15.1632   19.4128  68.89
5.4    12.3   66.42    29.16    11.514   0.786    0.6178    8.3290   4.41

ΣX = 84.1   ΣY = 172.9   ΣXY = 1355.61   ΣX² = 670.73
Σ(Y-Ŷ)² = 126.855   Σ(Ŷ-Ȳ)² = 254.468   Σ(Y-Ȳ)² = 381.29
X̄ ≈ 7.0   Ȳ ≈ 14.40

Therefore

R² = 1 - Σ(Yi - Ŷi)² / Σ(Yi - Ȳ)²

= 1 - 126.855/381.29

= 1 - 0.33

= 0.67

= 67%
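The same R² figure can be reproduced in code from the raw observations. This sketch uses the report's rounded coefficients a = 2.01 and b = 1.768; full-precision coefficients give essentially the same value.

```python
import numpy as np

# Monthly data from the report's table
X = np.array([3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4])
Y = np.array([9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3])

a, b = 2.01, 1.768            # coefficients as rounded in the report
Y_hat = a + b * X             # fitted values on the regression line

unexplained = ((Y - Y_hat) ** 2).sum()   # sum of squared residuals
total = ((Y - Y.mean()) ** 2).sum()      # total variation about the mean
r_squared = 1 - unexplained / total

print(round(r_squared, 2))  # -> 0.67
```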
Conclusion

This implies that of the total variation in Y, nearly 67% is
explained by the variation in X.

Hence there is a strong linear relationship between the two
variables.