In this chapter, we will:
• Introduce the concept of correlation for interval-scaled or metric variables
• Highlight the use of regression in marketing research, particularly for sales forecasting based on marketing mix variables
• Provide a worked example to show a typical application of regression analysis in marketing
• Introduce forward and backward stepwise regression using a software package
• Discuss the use of regression in explaining causation and predicting the value of the dependent variable based on a set of independent variables
• Explain how to judge the significance of the regression model through an F-test, how to use R² as a measure of explained variation, and how to use the t-test as a measure of the significance of individual variables in the regression model

APPLICATION AREAS
Correlation analysis can be used to find out whether a variable such as sales is associated with any other variable, such as advertising expenditure. There are virtually no limits to applying correlation analysis to find out whether one variable is associated with another, and how strongly. Whether it makes sense to study a particular variable's correlation with another, and how to interpret it, is a different issue. It is the researcher's responsibility to ensure that the correlation being studied is meaningful. In many applications, correlation analysis is followed by regression analysis, using a software package to perform the regression.

The objective of regression analysis is to explain the variation in one variable (called the dependent variable), based on the variation in one or more other variables (called the independent variables). Typical application areas include explaining the variation in sales of a product based on advertising expenses, number of salespeople, number of dealers, or all of the above. If there is only one dependent variable and one independent variable is used to explain the variation in it, the model is known as a simple regression model. If multiple independent variables are used to explain the variation in a dependent variable, it is called a multiple regression model. Even though the form of the regression equation could be either linear or non-linear, we will limit our discussion to linear models.
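As a minimal illustration of the correlation concept introduced above, the Python sketch below computes the Pearson correlation coefficient r between two metric variables. It also computes the t statistic (with n - 2 degrees of freedom) commonly used to test whether r differs from zero. The function names and the advertising and sales figures are hypothetical and were chosen only for illustration. A package such as SPSS reports the same coefficient together with its two-tailed significance.

```python
import math

# Illustrative sketch; the function names pearson_r and t_for_r are hypothetical.


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length metric variables."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 cases")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: population correlation is zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical monthly figures for one product (not from the case study).
advertising = [10, 12, 15, 17, 20]
sales = [100, 115, 140, 150, 180]
r = pearson_r(advertising, sales)
print(f"r = {r:.4f}, t = {t_for_r(r, len(sales)):.2f}")
```

Consider the case-study correlation between fertilizer and area irrigated (r = .7059 across 15 states). For that value, `t_for_r` gives about 3.59 on 13 degrees of freedom. This is in line with the two-tailed significance of .003 printed for that pair in the correlation matrix.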
As seen from the preceding discussion, the major application of regression in marketing is in the area of sales forecasting, based on some independent (or explanatory) variables. This does not mean that regression analysis is the only technique used in sales forecasting. A variety of quantitative and qualitative methods are used, but regression is one of the better known (and often used) quantitative techniques.

There are basically two approaches to regression:
• A hit-and-trial approach
• A pre-conceived approach

In the hit-and-trial approach, we collect data on a large number of independent variables. We then try to fit a stepwise regression model, entering one variable into the regression equation at a time.

The general regression model (linear) is of the type

$$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_nx_n$$

where y is the dependent variable, and x1, x2, x3, ..., xn are the independent variables expected to be related to y and expected to explain or predict y. The coefficients of the respective independent variables are b1, b2, b3, ..., bn, and a is the constant. All of these will be determined from the input data.

The pre-conceived approach assumes that the researcher knows reasonably well which variables explain y, and that the model is pre-conceived, say, with 3 independent variables x1, x2 and x3. Therefore, not much experimentation is done. The main objective is to find out whether the pre-conceived model is good or not. The equation is of the same form as earlier.

Input data on y and on each of the x variables is required to do a regression analysis. This data is input into a computer package to perform the regression analysis. The output consists of the b coefficients for all the independent variables in the model. The output also gives the results of a t-test for the significance of each variable in the model. It also gives the results of the F-test for the model as a whole, which shows whether the model is statistically significant at the desired confidence level (usually 90 or 95 per cent for typical applications in the marketing area).
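The statement that the b coefficients "will be determined from the input data" can be made concrete with a minimal least-squares sketch in plain Python. The sketch builds the normal equations (X'X)b = X'y, with a leading column of 1s for the constant a, and solves them by Gaussian elimination. This is only an illustration of ordinary least squares, and it assumes a small, well-conditioned data set. Statistical packages use numerically sturdier methods and also report the t- and F-tests. The names `solve` and `fit_ols` and the five data rows are hypothetical.

```python
# Illustrative sketch (hypothetical names); not how statistical packages implement OLS.


def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares estimates [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk.

    rows holds one list of k predictor values per case.
    """
    design = [[1.0] + list(r) for r in rows]  # the leading 1 carries the constant a
    p = len(design[0])
    xtx = [[sum(d[u] * d[v] for d in design) for v in range(p)] for u in range(p)]
    xty = [sum(d[u] * yi for d, yi in zip(design, y)) for u in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 5 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [13, 12, 26, 22, 39]
coef = fit_ols(rows, y)
print(coef)  # coefficients close to [5, 2, 3]
```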
The coefficient of determination, or R², of the model is an indicator of its explanatory power. The R² value is the percentage (or proportion) of the total variance in y explained by all the independent variables in the regression equation.

RECOMMENDED USAGE
The hit-and-trial approach may be used for exploratory research. For decision making, however, there has to be a priori knowledge of the variables which are likely to affect y, and only such variables should be used in the regression analysis. It is also recommended that the R² value should not be interpreted unless the model is itself significant at the desired confidence level (as evidenced by the F-test results printed out for the model).

The variables used (both independent and dependent) are assumed to be either interval scaled or ratio scaled. Nominally scaled variables can also be used as independent variables in a regression model, with dummy variable coding. Please refer to Marketing Research: Methodological Foundations by Churchill, or Research for Marketing Decisions by Green, Tull & Albaum, for details on the use of dummy variables in regression analysis. Our worked example confines itself to metric, interval-scaled variables. If the dependent variable happens to be a nominally scaled one, discriminant analysis should be used instead of regression.

Correlation and regression are best applied together to test whether metric variables are associated with each other, and whether the dependent variable can be explained (or predicted) by some independent variables. In marketing, the dependent variable is usually sales (though not always). The independent variables can be advertising expenditure, number of salespeople, promotional expenditure, and so on. They may also include macroeconomic variables or the sales of related industries (for example, sales of rubber, tyres, cables, etc.).
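The recommendation above, to use nominally scaled independent variables through dummy variable coding, can be sketched as follows. A nominal variable with k categories is turned into k - 1 columns of 0s and 1s. The omitted (baseline) category is represented by all zeros, which avoids perfect collinearity with the constant term. The function name and the sales-region example are hypothetical. See the Churchill or Green, Tull & Albaum texts cited above for the full treatment.

```python
# Illustrative sketch; dummy_code and the region data are hypothetical.


def dummy_code(values, baseline):
    """Return {column_name: 0/1 list}, one dummy per non-baseline category (k - 1 in all).

    Each dummy's regression coefficient is then read as the difference in the
    dependent variable between that category and the baseline category.
    """
    if baseline not in values:
        raise ValueError(f"baseline {baseline!r} does not occur in the data")
    categories = sorted(set(values) - {baseline})
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical sales-region variable for five dealers; "north" is the baseline.
region = ["north", "south", "east", "north", "east"]
dummies = dummy_code(region, baseline="north")
for name, column in dummies.items():
    print(name, column)
```

The dummy columns are then entered alongside the metric variables as ordinary x columns.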
The researcher may or may not already have an idea of which independent variables are good predictors of the dependent variable. If he has a predetermined model (equation) in mind, he can test it for significance through an F-test. If he is not sure, the procedures of stepwise regression can be used to try out assorted variables as independent variables in the regression equation. The computer can then suggest the best ones, that is, those that explain the maximum variance in the dependent variable. The correlation coefficients of all the variables also provide a good indicator, since independent variables that are highly correlated with the dependent variable are the natural candidates. However, if the independent variables are highly correlated among themselves, it may be difficult to determine which one of two or three such variables is better. When such cases occur, one or two strongly inter-correlated variables may have to be removed from the regression model.

The regression equation is judged for its usefulness based on:
1. the overall F-test for the model. If this is significant at, say, the 95 per cent confidence level, it indicates that the model is good overall. This shows up as a p-value of less than .05 on the ANOVA table in the regression output.
2. the individual t-test for each variable, which decides which variables in the model are good explanatory variables of the dependent variable. If this value is significant (p-value less than .05), it indicates that the concerned variable is significant in the model.
3. the R² value of the model, which tells you what percentage of the variation in the dependent variable is explained by all the independent variables in the model.

Simple or multiple linear regression is a useful technique for explanation and prediction, if used carefully with an eye on its limitations.
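The overall F-test and R² in the checklist above are linked by a simple formula. For a model with k independent variables fitted on n cases, F = (R²/k) / ((1 - R²)/(n - k - 1)), with (k, n - k - 1) degrees of freedom. The related adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) discounts R² for the number of variables. The sketch below (the function names are illustrative) applies these formulas to the R² figures reported in the case study later in this chapter. The p-value itself would still come from F tables or the package output.

```python
# Illustrative sketch; f_from_r2 and adjusted_r2 are hypothetical helper names.


def f_from_r2(r2, n, k):
    """Overall F statistic for k predictors and n cases: (R^2/k) / ((1 - R^2)/(n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    """R^2 adjusted for the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Figures reported in the case study: 15 states, all six variables vs. the two-variable model.
full = (0.94616, 15, 6)
reduced = (0.92071, 15, 2)
for label, (r2, n, k) in [("six variables", full), ("two variables", reduced)]:
    print(f"{label}: F = {f_from_r2(r2, n, k):.2f}, adjusted R^2 = {adjusted_r2(r2, n, k):.4f}")
```

Run on these figures, the sketch gives F values of about 23.43 and 69.67. These match the reported 23.43297 and 69.66873, up to the rounding of R². The sketch also shows that the two-variable model's adjusted R² (about 0.9075) slightly exceeds that of the six-variable model (about 0.9058).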
Regression generally should not be used to predict values too far away from the range of values used in building the model (equation). Also, as far as possible, the independent variables in the model should not be highly correlated with one another. Case studies further illustrate the use of regression models for explanation and prediction in marketing. SPSS commands for getting correlation and regression outputs are given below. (We have used a different package called STATISTICA for …)

SPSS COMMANDS FOR CORRELATION AND REGRESSION

Correlation
After the input data has been typed along with variable labels and value labels in an SPSS file, follow these steps to get the output for a correlation problem similar to that described in the chapter on REGRESSION AND CORRELATION in the text:
1. click on ANALYZE at the SPSS menu bar (in older versions of SPSS, click on STATISTICS instead of ANALYZE).
2. click on CORRELATE, followed by BIVARIATE.
3. on the dialogue box which appears, select all the variables for which correlations are required, clicking on the right arrow to transfer them from the variable list on the left. Then select Pearson under the heading Correlation Coefficients, and select 2-tailed under the heading Tests of Significance.
4. click OK to get the matrix of pairwise Pearson correlations among all the variables selected, along with the two-tailed significance of each pairwise correlation.

Regression
After the input data has been typed along with variable labels and value labels in an SPSS file, follow these steps to get the output for a regression problem similar to that described in the chapter on REGRESSION AND CORRELATION in the text:
1. click on ANALYZE at the SPSS menu bar (in older versions of SPSS, click on STATISTICS instead of ANALYZE).
2. click on REGRESSION, followed by LINEAR.
3. in the dialogue box which appears, select a dependent variable by highlighting the appropriate variable in the list of variables on the left side and clicking on the arrow leading to the Dependent box.
4. select the independent variables to be included in the regression model in the same way, transferring them from the left side to the right-side box (called Independent(s)) by clicking on the arrow leading to it.
5. in the same dialogue box, select the METHOD. Choose ENTER as the method if you want all independent variables entered in the model, STEPWISE if you want to use forward stepwise regression, and BACKWARD if you want to use backward stepwise regression.
6. select OPTIONS if you want additional output options, then click CONTINUE.
7. select PLOTS if you want to see plots such as residual plots, then click CONTINUE.
8. click OK in the main dialogue box to get the REGRESSION output.

Note: You can go back to your data file by clicking on it, change the method at Step 5, and get the regression output using another method of your choice in the same way as above.

General: All output files can be saved using the FILE SAVE command and printed using the FILE PRINT command. Input data can also be saved or printed separately with the same commands (FILE SAVE, FILE PRINT) while the cursor is on the input data file.

CASE STUDY
Correlation and Regression

Problem
Monsanto Inc., a biotech major specialising in the development of genetically modified seeds, wanted to know the degree to which the yield of wheat per hectare is influenced by the variety of seeds used, and by a few other variables. For this purpose, it deputed an employee in its Bangalore H.O. to study the effect that six different conditions (independent variables) have on the yield per hectare (dependent variable) for a crop of wheat. The research was conducted by accumulating data from fifteen states in India.

The six independent variables are:
X1 Rainfall (in cms.)
X2 Soil type (1 = low quality, 5 = high quality of soil suitable for wheat)
X3 Quantity of fertilizer (in quintals per sq. kilometer of land)
X4 Percentage of land being irrigated by the state agriculture department
X5 Seed quality (1 = low, 5 = high level of genetically modified quality)
X6 Percentage of automation in the cultivation process

The dependent variable is:
Y Yield per hectare (in quintals)

Input Data
The data set for the yield of the crop in fifteen states is given in tabular form.

ANALYSIS OF OUTPUT
The correlation table shows all the pairwise correlations. The values in the correlation table are standardised and range from -1 to +1, and both positive and negative values occur. Except for rainfall (the RAINFALL column, in cms.), all the variables included to try to explain yield are highly correlated with yield. These are soil type, quantity of fertilizer in quintals/sq. kilometer of land, percentage of land being irrigated by the state agriculture department, seed quality, and percentage of automation in the cultivation process. Rainfall does not appear to be strongly correlated with yield. These correlations are one-to-one correlations of each variable with each of the others.

Prepared by Biswajit Mishra, Imran Ahmed and Vikram Sharma.

Correlation and Regression: Explaining Association and Causation

The correlation table also shows that the independent variables are highly correlated with each other. This indicates that they are not independent of each other, and that only one or two of them may be needed to predict the dependent variable (yield). Regression is helpful in eliminating some of the independent variables, since not all of them are required. Some of them, being correlated with other variables, do not add any value to the regression model.

Regression
A regression model of the following form has been used, with all six X variables entered in the model:

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + b_6X_6$$

and the values of a, b1, b2, b3, b4, b5 and b6 have been estimated.
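The inter-correlations among the independent variables noted above can be summarised with variance inflation factors (VIF). VIF is a standard multicollinearity diagnostic, although the chapter itself does not use it. The VIF for variable j equals 1/(1 - R²j), where R²j comes from regressing xj on the other independent variables. For a set of predictors, it is the j-th diagonal element of the inverse of their correlation matrix. The sketch below applies this to the correlations among Rainfall, Soiltype, Fertilizer, Areairrg and Seedtype reported in the case's correlation matrix. AUTOMATE is left out, so the figures are illustrative rather than those of the fitted six-variable model. The function names are hypothetical.

```python
# Illustrative sketch; inverse_diagonal and vif_pair are hypothetical names.
# VIF is an added diagnostic, not part of the case study's own analysis.


def inverse_diagonal(matrix):
    """Diagonal of the inverse of a square matrix, via Gauss-Jordan elimination."""
    n = len(matrix)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[col])]
    return [m[i][n + i] for i in range(n)]  # right-hand block now holds the inverse


def vif_pair(r):
    """VIF when there are only two predictors with correlation r."""
    return 1 / (1 - r * r)


names = ["Rainfall", "Soiltype", "Fertilizer", "Areairrg", "Seedtype"]
corr = [  # pairwise Pearson correlations from the case's correlation matrix
    [1.0000, 0.0240, -0.0674, -0.0086, -0.1190],
    [0.0240, 1.0000, 0.8468, 0.8103, 0.8959],
    [-0.0674, 0.8468, 1.0000, 0.7059, 0.7580],
    [-0.0086, 0.8103, 0.7059, 1.0000, 0.8151],
    [-0.1190, 0.8959, 0.7580, 0.8151, 1.0000],
]
vifs = dict(zip(names, inverse_diagonal(corr)))
for name, value in vifs.items():
    print(f"{name:<10} VIF = {value:.2f}")
print(f"Soiltype vs Seedtype alone: VIF = {vif_pair(0.8959):.2f}")
```

Even the single pair Soiltype/Seedtype (r = .8959) gives a VIF of about 5.07. With the other variables included, Soiltype's VIF can only be higher. This supports the text's point that some of these variables add little once the others are in the model.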
In the output of the regression model, the values in the "B" column give all the coefficients of the model. They are as follows:
b1 (X1, rainfall) = 0.081260
b2 (X2, soil type) = -0.464805
b3 (X3, quantity of fertilizer) = -0.386654
b4 (X4, percentage of land irrigated) = 0.192612
b5 (X5, seed quality) = 7.661391
b6 (X6, percentage of automation) = 0.276314
a (constant) = 4.467521

These values can be substituted in the above equation to get the value of Y (yield):
Yield = 4.47 + 0.08* (rainfall) - 0.46* (soil type) - 0.39* (quantity of fertilizer) + 0.19* (percentage of land being irrigated) + 7.66* (seed quality) + 0.28* (percentage of automation)

The p-level of the F-test is observed to be 0.000, indicating that the model is statistically significant at a confidence level of 99.99 per cent. We also note the results of the t-tests for the significance of the individual independent variables. At a significance level of 0.10 (confidence level of 90%), only SEED TYPE is significant in the model. The other five independent variables are individually not significant at the 90% confidence level.

Marketing Research: Text and Cases

From the equation we have obtained, the yield of wheat will increase in each of the following cases:
• rainfall increases
• the quantity of fertilizer per sq. km decreases
• seed quality increases
• soil quality decreases
• the percentage of land being irrigated by the state agriculture department increases
• the degree of automation in cultivation increases

The estimated change in yield (in quintals per hectare) for every unit increase in each of these variables is given by the coefficient of that variable. For example, if the degree of automation in cultivation increases by one per cent, with all other variables remaining constant, yield is expected to increase by 0.28 quintal per hectare. One coefficient, that of the SOIL TYPE variable, does not make much intuitive sense: if we increase the quality of soil, yield is expected to decrease by 0.46 quintal per hectare.
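Once the coefficients are known, predicting yield for a state is just the substitution described above. The sketch below stores two equations as plain dictionaries. The first is the six-variable equation, with the unrounded B values from the regression output. The second is the two-variable equation from the stepwise runs described later. The key names, the function name and the input levels are hypothetical, not data for any actual state. As noted earlier in the chapter, predictions should stay within the range of values used to build the model.

```python
# Illustrative sketch; key names, predict_yield and the state levels are hypothetical.

FULL_MODEL = {  # B values from the six-variable regression output
    "constant": 4.467521,
    "rainfall": 0.081260,
    "soil_type": -0.464805,
    "fertilizer": -0.386654,
    "area_irrigated": 0.192612,
    "seed_type": 7.661391,
    "automation": 0.276314,
}
STEPWISE_MODEL = {"constant": 7.77, "seed_type": 6.11, "area_irrigated": 0.31}  # rounded


def predict_yield(model, levels):
    """Predicted yield in quintals per hectare: constant + sum of b_j * x_j.

    Extra keys in levels are ignored, so one dict of levels serves both models.
    """
    predictors = [name for name in model if name != "constant"]
    missing = [name for name in predictors if name not in levels]
    if missing:
        raise ValueError(f"missing levels for: {missing}")
    return model["constant"] + sum(model[name] * levels[name] for name in predictors)


# Hypothetical levels for one state (not taken from the case data).
state = {"rainfall": 20, "soil_type": 3, "fertilizer": 6,
         "area_irrigated": 70, "seed_type": 3, "automation": 40}
print(round(predict_yield(FULL_MODEL, state), 2))
print(round(predict_yield(STEPWISE_MODEL, state), 2))
```

Suppose seed_type changes by one unit, with everything else fixed. The six-variable prediction then shifts by exactly 7.661391, which is the "unit increase" reading of a coefficient given in the text.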
Looking at the individual t-tests, however, we find that the coefficient of the independent variable SOIL TYPE is statistically not significant (significance level 0.8058). Therefore it should not be used in interpreting the regression, as it may lead to wrong conclusions. Only one independent variable, SEED TYPE, is statistically significant at the 90% confidence level, since its Sig T is less than 0.10. One should therefore look at the relationship of yield with this independent variable. Given the levels of X1, X2, X3, X4, X5 and X6 for a particular state, Monsanto Inc. can use the regression model to predict the yield of wheat per hectare of land.

Forward Stepwise Regression
In forward stepwise regression, the algorithm adds one independent variable at a time. It starts with the variable that explains most of the variation in yield (Y) and then adds one more variable. It rechecks the model to see that both variables form a good model, adds a third variable if it still adds to the explanation of Y, and so on. In our output of forward stepwise regression, the regression ends up with 2 of the 6 independent variables remaining in the regression model. The two variables in the model (refer to the FINAL OUTPUT TABLES for the forward stepwise method) are AREAIRRG (percentage of land being irrigated by the state agriculture department) and SEED TYPE (seed quality). The only significant variable (with Sig T < 0.10) at 90% confidence is SEED TYPE (Sig T = 0.0122). However, AREAIRRG is now at Sig T = 0.1012, very close to significance at the 90% confidence level. The F-test shows that the model is highly significant (F = 69.66873 at Sig F = .0000). The R² value for the model is 0.92071, which is very close to that of the six-independent-variable model (0.94616).
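The add-one-at-a-time logic of forward stepwise regression described above can be sketched in plain Python. The same is true of the drop-one-at-a-time logic of backward stepwise regression, which is described next. One major assumption: packages such as SPSS or STATISTICA decide entry and exit using F-to-enter/F-to-remove or significance criteria. This sketch instead uses a simpler, swapped-in rule, a minimum gain (or maximum loss) in R², so that it does not need statistical tables. All names, and the small synthetic data set built so that y depends only on x1 and x2, are hypothetical.

```python
# Illustrative sketch; all names and data are hypothetical.
# Assumption: R^2 gain/loss thresholds stand in for the F-to-enter / F-to-remove
# (significance) tests that statistical packages actually use.


def solve(a, b):
    """Gaussian elimination with partial pivoting for the square system a @ x = b."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfectly collinear predictors")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y, subset):
    """R^2 of an OLS fit of y on the predictor columns in subset, plus a constant."""
    if not subset:
        return 0.0
    design = [[1.0] + [columns[j][i] for j in subset] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(d[u] * d[v] for d in design) for v in range(p)] for u in range(p)]
    xty = [sum(d[u] * yi for d, yi in zip(design, y)) for u in range(p)]
    coef = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    sse = sum((yi - sum(c * v for c, v in zip(coef, d))) ** 2 for d, yi in zip(design, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def forward_select(columns, y, min_gain=0.01):
    """Add the predictor that raises R^2 most, while the gain is at least min_gain."""
    chosen, current = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        best = max(remaining, key=lambda j: r_squared(columns, y, chosen + [j]))
        new_r2 = r_squared(columns, y, chosen + [best])
        if new_r2 - current < min_gain:
            break
        chosen.append(best)
        remaining.remove(best)
        current = new_r2
    return chosen


def backward_eliminate(columns, y, max_loss=0.01):
    """Drop the predictor whose removal costs least R^2, while the loss is at most max_loss."""
    kept = list(range(len(columns)))
    current = r_squared(columns, y, kept)
    while kept:
        drop = max(kept, key=lambda j: r_squared(columns, y, [k for k in kept if k != j]))
        reduced = r_squared(columns, y, [k for k in kept if k != drop])
        if current - reduced > max_loss:
            break
        kept.remove(drop)
        current = reduced
    return kept


# Synthetic data: y = 1 + 2*x1 + x2 exactly; x3 is unrelated noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
x3 = [2, 7, 1, 8, 2, 8, 1, 8]
y = [1 + 2 * a + b for a, b in zip(x1, x2)]
columns = [x1, x2, x3]
print("forward :", forward_select(columns, y))  # indices of kept predictors
print("backward:", backward_eliminate(columns, y))
```

On this data, both procedures keep x1 and x2 (indices 0 and 1) and discard x3. This mirrors how the forward and backward runs in the case arrived at the same two variables. With real data, the two procedures need not agree, and their results depend on the entry and exit thresholds chosen.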
If we decide to use the two-variable model from forward stepwise regression, the equation would be:
Yield = 7.77 + 6.11* (seed type) + 0.31* (areairrg)

Backward Stepwise Regression
Backward stepwise regression was performed on the same set of six independent variables. The model starts with all six variables and then eliminates, one after another, those that do not explain much of the variation in Y. It continues until a preset criterion for the exit of variables is reached, which results in a model with only two independent variables. The remaining variables are SEED TYPE and AREAIRRG, the same as the outcome of forward stepwise regression. So the equation for yield will be the same:
Yield = 7.77 + 6.11* (seed type) + 0.31* (areairrg)

Input Data Table
Yield of wheat and the six independent variables for the fifteen states (columns include Rainfall, Soiltype, Fertilizer, Areairrg, Seedtype).

Correlation Coefficients
(Coefficient / (Cases) / 2-tailed Significance; "." is printed if a coefficient cannot be computed)

           | Rainfall | Soiltype | Fertilizer | Areairrg | Seedtype
Rainfall   | 1.0000   | .0240    | -.0674     | -.0086   | -.1190
           | (15)     | (15)     | (15)       | (15)     | (15)
           | P= .     | P= .932  | P= .811    | P= .976  | P= .673
Soiltype   | .0240    | 1.0000   | .8468      | .8103    | .8959
           | (15)     | (15)     | (15)       | (15)     | (15)
           | P= .932  | P= .     | P= .000    | P= .000  | P= .000
Fertilizer | -.0674   | .8468    | 1.0000     | .7059    | .7580
           | (15)     | (15)     | (15)       | (15)     | (15)
           | P= .811  | P= .000  | P= .       | P= .003  | P= .001
Areairrg   | -.0086   | .8103    | .7059      | 1.0000   | .8151
           | (15)     | (15)     | (15)       | (15)     | (15)
           | P= .976  | P= .000  | P= .003    | P= .     | P= .000
Seedtype   | -.1190   | .8959    | .7580      | .8151    | 1.0000
           | (15)     | (15)     | (15)       | (15)     | (15)
           | P= .673  | P= .000  | P= .001    | P= .000  | P= .

Multiple Regression Output with all Variables (Table 1)

Listwise Deletion of Missing Data
Equation Number 1    Dependent Variable..  YIELD
Block Number 1.  Method: Enter   RAINFALL SOILTYPE FRTILZER AREAIRRG SEEDTYPE AUTOMATE

Variable(s) Entered on Step Number
1.. AUTOMATE
2.. RAINFALL
3.. FRTILZER
4.. SEEDTYPE
5.. SOILTYPE
6.. AREAIRRG

Multiple R            .97271
R Square              .94616
Adjusted R Square     .90579
Standard Error       3.19378

Analysis of Variance
             DF   Sum of Squares   Mean Square
Regression    6       1434.13142     239.02190
Residual      8         81.60191      10.20024

F = 23.43297     Signif F = .0001

Variables in the Equation
Variable        B
RAINFALL        .081260
SOILTYPE       -.464805
FRTILZER       -.386654
AREAIRRG        .192612
SEEDTYPE       7.661391
AUTOMATE        .276314
(Constant)     4.467521

End Block Number 1   All requested variables entered.
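As a cross-check on Table 1, the model-summary figures can be recomputed from the Analysis of Variance block alone. R² = SS regression / (SS regression + SS residual), and Multiple R is its square root. The standard error is the square root of the residual mean square, and F is the ratio of the two mean squares. The sketch below (the function name is hypothetical) does this arithmetic for n = 15 states and k = 6 independent variables.

```python
import math

# Illustrative sketch; summary_from_anova is a hypothetical name.


def summary_from_anova(ss_regression, ss_residual, n, k):
    """Model-summary figures implied by an ANOVA table for k predictors and n cases."""
    df_reg, df_res = k, n - k - 1
    ms_reg = ss_regression / df_reg
    ms_res = ss_residual / df_res
    r2 = ss_regression / (ss_regression + ss_residual)
    return {
        "multiple_r": math.sqrt(r2),
        "r_square": r2,
        "adjusted_r_square": 1 - (1 - r2) * (n - 1) / df_res,
        "standard_error": math.sqrt(ms_res),
        "F": ms_reg / ms_res,
    }


# Sums of squares from the Analysis of Variance block of Table 1.
stats = summary_from_anova(1434.13142, 81.60191, n=15, k=6)
for name, value in stats.items():
    print(f"{name:<18} {value:.5f}")
```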
