Three Types of Analysis
We can classify analysis into three types:
1. Univariate, involving a single variable at a time,
2. Bivariate, involving two variables at a time, and
3. Multivariate, involving three or more variables simultaneously.
Application Areas: Correlation
1. The application of correlation analysis is to measure the degree of association between two sets of quantitative data. The correlation coefficient measures this association, and has a value ranging from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation). Correlation and regression are generally performed together.
2. For example: how is the advertising expenditure correlated with other promotional expenditure? Or, how are sales of product A correlated with sales of product B? Or, are daily ice cream sales correlated with daily maximum temperature?
3. Correlation is usually followed by regression analysis in many applications.
4. Correlation does not necessarily mean there is a causal effect: given any two strings of numbers, there will be some correlation among them. It does not imply that one variable is causing a change in another, or is dependent upon another.
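As a minimal sketch, the correlation coefficient described above can be computed directly from two strings of numbers. The ice cream figures below are hypothetical, made up only to illustrate the calculation:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical daily maximum temperatures and daily ice cream sales
temp  = [30, 32, 35, 28, 40, 38, 25]
sales = [200, 220, 260, 190, 310, 295, 160]
r = pearson_r(temp, sales)   # strong positive correlation, close to +1
```

A coefficient near +1 here would mean only that the two series move together, not (per point 4 above) that temperature causes the sales.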
Application Areas: Regression
1. The main objective of regression analysis is to explain the variation in one variable (called the dependent variable), based on the variation in one or more other variables (called the independent variables).
2. The application areas are in ‘explaining’ variations in sales of a product based on advertising expenses, or number of sales people, or number of sales offices, or on all the above variables.
3. If there is only one dependent variable and one independent variable is used to explain the variation in it, the model is known as a simple regression model.
4. If multiple independent variables are used to explain the variation in a dependent variable, it is called a multiple regression model.
5. Even though the form of the regression equation could be either linear or non-linear, we will limit our discussion to linear (straight line) models.
The general regression model (linear) is of the type
Y = b0 + b1x1 + b2x2 + …… + bnxn (or Y = a + b1x1 + b2x2 + …… + bnxn)
where y is the dependent variable; x1, x2, x3 … xn are the independent variables expected to be related to y and expected to explain or predict y; and b1, b2, b3 … bn are the coefficients of the respective independent variables.
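The general model above is simply a weighted sum of the independent variables; a minimal sketch of evaluating it for one observation, with made-up coefficients:

```python
def predict(b0, coeffs, xs):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for one observation."""
    return b0 + sum(b * x for b, x in zip(coeffs, xs))

# Hypothetical two-variable model: b0 = 1.0, b1 = 2.0, b2 = 3.0
y = predict(1.0, [2.0, 3.0], [10, 10])   # 1 + 2*10 + 3*10 = 51.0
```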
Purposes of Regression Analysis
1. To establish the relationship between a dependent variable (outcome) and a set of independent (explanatory) variables
2. To identify the relative importance of the different independent (explanatory) variables for the outcome
3. To make predictions
Steps of Regression Analysis
Step 1: Construct a regression model
Step 2: Estimate the regression and interpret the results
Step 3: Conduct diagnostic analysis of the results
Step 4: Change the original regression model if necessary
Step 5: Make predictions
Data (Input / Output)
1. Input data on y and each of the x variables is required to do a regression analysis. This data is input into a computer package to perform the regression analysis.
2. The output consists of the ‘b’ coefficients for all the independent variables in the model. It also gives the results of a ‘t’ test for the significance of each variable in the model, and the results of the ‘F’ test for the model on the whole.
3. Assuming the model is statistically significant at the desired confidence level (usually 90 or 95%), the coefficient of determination or R² of the model is an important part of the output. The R² value is the percentage (or proportion) of the total variance in ‘y’ explained by all the independent variables in the regression equation.
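R² can be computed directly from the actual and fitted values of ‘y’; a small sketch (the numbers below are made up for illustration):

```python
def r_squared(y_actual, y_predicted):
    """Proportion of the total variance in y explained by the model."""
    mean_y = sum(y_actual) / len(y_actual)
    ss_total = sum((y - mean_y) ** 2 for y in y_actual)
    ss_residual = sum((y - yhat) ** 2 for y, yhat in zip(y_actual, y_predicted))
    return 1 - ss_residual / ss_total

r2 = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])   # 0.98: 98% of variance explained
```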
Requirements for applying Multiple Regression Analysis
1. The variables used (independent and dependent) are assumed to be either interval scaled or ratio scaled.
2. Nominally scaled variables can be used as independent variables in a regression model, with dummy variable coding.
3. If the dependent variable happens to be a nominally scaled one, discriminant analysis should be the technique used instead of regression.
4. Dependent variable: essentially METRIC. Independent variables: metric or dummy.
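Dummy variable coding (point 2 above) replaces a nominally scaled variable with 0/1 indicator columns, one level being dropped as the baseline. A small sketch; the region variable is hypothetical:

```python
def dummy_code(values):
    """0/1 dummy coding of a nominal variable, dropping the first level as baseline."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]

region = ["North", "South", "East", "North"]
coded = dummy_code(region)   # columns are the indicators for North and South; East is baseline
```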
Worked Example: Problem
A manufacturer and marketer of electric motors would like to build a regression model consisting of five or six independent variables to predict sales. Past data has been collected for 15 sales territories, on sales and six different independent variables. Build a regression model and recommend whether or not it should be used by the company.
The data are for a particular year, in different sales territories in which the company operates, and the variables on which data are collected are as follows:
Dependent variable:
Y = sales in Rs. lakhs in the territory
Independent variables:
X1 = market potential in the territory (in Rs. lakhs)
X2 = No. of dealers of the company in the territory
X3 = No. of salespeople in the territory
X4 = Index of competitor activity in the territory on a 5-point scale (1 = low, 5 = high level of activity by competitors)
X5 = No. of service people in the territory
X6 = No. of existing customers in the territory
The data file is given below:
Territory   SALES   POTENTL   DEALERS   PEOPLE   COMPET   SERVICE   CUSTOM
    1          5       25        1         6        5        2        20
    2         60      150       12        30        4        5        50
    3         20       45        5        15        3        2        25
    4         11       30        2        10        3        2        20
    5         45       75       12        20        2        4        30
    6          6       10        3         8        2        3        16
    7         15       29        5        18        4        5        30
    8         22       43        7        16        3        6        40
    9         29       70        4        15        2        5        39
   10          3       40        1         6        5        2         5
   11         16       40        4        11        4        2        17
   12          8       25        2         9        3        3        10
   13         18       32        7        14        3        4        31
   14         23       73       10        10        4        3        43
   15         81      150       15        35        4        7        70
Regression
We will first run the regression model of the following form, by entering all the 6 'x' variables in the model:
Y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6 ………. Equation 1
[or Y = a + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6]
and determine the values of b0, b1, b2, b3, b4, b5 and b6.
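The original run used a statistics package; as a rough equivalent, Equation 1 can be estimated by ordinary least squares with NumPy. This is a sketch, assuming the data table above was transcribed correctly, with np.linalg.lstsq standing in for the package's estimation routine:

```python
import numpy as np

# Columns of the data table: SALES (y) and the six independent variables
sales   = [5, 60, 20, 11, 45, 6, 15, 22, 29, 3, 16, 8, 18, 23, 81]
potentl = [25, 150, 45, 30, 75, 10, 29, 43, 70, 40, 40, 25, 32, 73, 150]
dealers = [1, 12, 5, 2, 12, 3, 5, 7, 4, 1, 4, 2, 7, 10, 15]
people  = [6, 30, 15, 10, 20, 8, 18, 16, 15, 6, 11, 9, 14, 10, 35]
compet  = [5, 4, 3, 3, 2, 2, 4, 3, 2, 5, 4, 3, 3, 4, 4]
service = [2, 5, 2, 2, 4, 3, 5, 6, 5, 2, 2, 3, 4, 3, 7]
custom  = [20, 50, 25, 20, 30, 16, 30, 40, 39, 5, 17, 10, 31, 43, 70]

y = np.array(sales, dtype=float)
# Design matrix: a leading column of ones gives the intercept b0
X = np.column_stack([np.ones(len(y)), potentl, dealers, people, compet, service, custom])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # coef = [b0, b1, ..., b6]
y_hat = X @ coef
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
# If the data match the package run reproduced below, coef should be close to its
# reported 'B' coefficients and r2 close to its reported R-Square
```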
MULTIPLE REGRESSION RESULTS: All independent variables were entered in one block
Dependent Variable:   SALES
Multiple R:           .988531605
Multiple R-Square:    .977194734
Adjusted R-Square:    .960090784
Number of cases:      15
The ANOVA Table
STAT. MULTIPLE REGRESS. Analysis of Variance; Depen. Var: SALES (regdata1.sta)

            Sums of               Mean
Effect      Squares     df      Squares         F      p-level
Regress.   6609.484      6     1101.581    57.133      .000004
Residual    154.249      8       19.281
Total      6763.733

From the analysis of variance table, the last column indicates the p-level to be 0.000004. This indicates that the model is statistically significant at a confidence level of (1-0.000004)*100, or (0.999996)*100, or 99.9996%.
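The F value in the ANOVA table is the ratio of the two mean squares; the arithmetic can be checked directly from the sums of squares reported above:

```python
# Values from the ANOVA table
ss_regression, df_regression = 6609.484, 6   # explained by the 6 independent variables
ss_residual, df_residual = 154.249, 8        # n - k - 1 = 15 - 6 - 1 = 8

ms_regression = ss_regression / df_regression   # mean square = SS / df
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual       # close to the reported 57.133
ss_total = ss_regression + ss_residual          # close to the reported 6763.733
```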
STAT. MULTIPLE REGRESS. Regression Summary for Dependent Variable: SALES
R = .98853160   R² = .97719473   Adjusted R² = .96009078
F(6,8) = 57.133   p < .00000   Std. Error of Estimate: 4.3910   N = 15

For each term the output reports the standardized coefficient (BETA), St. Err. of BETA, the coefficient (B), St. Err. of B, t(8) and p-level. The B column reads:

Intercept   -3.17298
POTENTL      .22685
DEALERS      .81938
PEOPLE      1.09104
COMPET     -1.89270
SERVICE     -.54925
CUSTOM       .06594
The column of the output titled ‘B’ lists all the coefficients for the model. These are:
a (intercept) = -3.17298
b1 = .22685
b2 = .81938
b3 = 1.09104
b4 = -1.89270
b5 = -0.54925
b6 = 0.06594
Substituting these values of a, b1, b2 … b6 in Equation 1, we can write the equation (rounding off all coefficients to 2 decimals) as
Sales = -3.17 + .23 (potential) + .82 (dealers) + 1.09 (salespeople) - 1.89 (competitor activity) - 0.55 (service people) + 0.07 (existing customers) ………. Equation 2
[from Y = a + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6 ………. Equation 1]
The estimated increase in sales for every unit increase or decrease in an independent variable is given by the coefficient of that variable, if all other variables are unchanged. For instance, if 1 more dealer is added, sales are estimated to increase by Rs. 0.82 lakh. Similarly, if the number of salespeople is increased by 1, sales are estimated to increase by Rs. 1.09 lakh.
Now look at the individual variable ‘t’ tests. We find that the coefficient of the variable SERVICE is statistically not significant (p-level 0.735204). Strictly speaking, the coefficient for SERVICE is therefore not to be used in interpreting the regression, as it may lead to wrong conclusions. The SERVICE variable also does not make too much intuitive sense: it says that if we increase the number of service people, sales are estimated to decrease, according to the -0.55 coefficient of the variable "No. of Service People" (SERVICE). Only two variables, potential (POTENTL) and No. of sales people (PEOPLE), are statistically significant at the 90 percent confidence level, since their p-levels are less than 0.10.
Different modes of entering independent variables in the model:
1. Enter (all variables in one block)
2. Forward stepwise regression
3. Backward stepwise regression
4. Stepwise regression
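As an illustration of the forward stepwise mode, here is a greedy sketch that repeatedly enters the predictor giving the largest gain in R². This is a simplified stand-in for a statistics package's entry criterion, which is usually an F-to-enter test rather than a raw R² threshold:

```python
import numpy as np

def fit_r2(X, y):
    """R-squared of an ordinary least squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_stepwise(X, y, min_gain=0.01):
    """Greedily enter the predictor that most improves R-squared."""
    remaining = list(range(X.shape[1]))
    chosen, best_r2 = [], 0.0
    while remaining:
        r2, j = max((fit_r2(X[:, chosen + [j]], y), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break                      # no candidate improves R-squared enough
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return chosen, best_r2

# Demo on synthetic data where y depends only on the first column
X_demo = np.column_stack([np.arange(15.0), np.cos(np.arange(15.0))])
y_demo = 2 * X_demo[:, 0] + 5
chosen, r2 = forward_stepwise(X_demo, y_demo)   # enters column 0, then stops
```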
The final model
Sales = -10.6164 + .2433 (POTENTL) + 1.4244 (PEOPLE) ………… Equation 3
Predictions: If potential in a territory were to be Rs. 50 lakhs, and the territory had 6 salespeople, then expected sales, using the above equation, would be = -10.6164 + .2433(50) + 1.4244(6) = Rs. 10.095 lakhs. Similarly, we could use this model to make predictions regarding sales in any territory for which Potential and No. of Sales People were known.
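The prediction from Equation 3 is plain arithmetic, and can be wrapped in a small helper:

```python
def predict_sales(potentl, people):
    """Expected sales in Rs. lakhs from the final model (Equation 3)."""
    return -10.6164 + 0.2433 * potentl + 1.4244 * people

# Territory with potential of Rs. 50 lakhs and 6 salespeople
expected = predict_sales(50, 6)   # Rs. 10.095 lakhs, as in the text
```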
Recommended usage
1. It is recommended that for serious decision-making, there has to be a-priori knowledge of the variables which are likely to affect y, and only such variables should be used in the regression analysis.
2. For exploratory research, the hit-and-trial approach may be used.
3. It is also recommended that unless the model is itself significant at the desired confidence level (as evidenced by the F test results printed out for the model), the R² value should not be interpreted.
Multicollinearity and how to tackle it
Multicollinearity: interrelationship of the various independent variables. It is essential to verify whether the independent variables are highly correlated with each other. If they are, this may indicate that they are not independent of each other, and we may be able to use only 1 or 2 of them to predict the dependent variable. Independent variables which are highly correlated with each other should not be included in the model together.
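A quick multicollinearity screen is to inspect the pairwise correlations among the independent variables before fitting. A sketch using three of the predictors from the worked example, assuming the data table was transcribed correctly (the 0.8 cutoff is a common rule of thumb, not from the text):

```python
import numpy as np

# Three predictors from the worked example data table
potentl = [25, 150, 45, 30, 75, 10, 29, 43, 70, 40, 40, 25, 32, 73, 150]
dealers = [1, 12, 5, 2, 12, 3, 5, 7, 4, 1, 4, 2, 7, 10, 15]
people  = [6, 30, 15, 10, 20, 8, 18, 16, 15, 6, 11, 9, 14, 10, 35]

names = ["POTENTL", "DEALERS", "PEOPLE"]
corr = np.corrcoef(np.array([potentl, dealers, people], dtype=float))

# Flag variable pairs whose |r| exceeds the screening threshold
high = [(names[i], names[j], round(float(corr[i, j]), 3))
        for i in range(len(names)) for j in range(i + 1, len(names))
        if abs(corr[i, j]) > 0.8]
```

Pairs flagged in `high` are candidates for dropping one of the two variables from the model.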