
1. OBJECTIVE
This report analyses the Mean Body Mass Index (BMI) of 107 countries around the globe using multiple linear regression. It
examines how Mean BMI relates to five independent variables: “Prevalence of obesity”, “Prevalence of Raised blood
pressure”, “Pure Alcohol Consumption per Person”, “Prevalence of insufficient physical activity” and “Raised fasting
blood glucose”, with the prevalence variables expressed as percentages of the defined population.
2. CONTEXT OF DATA

According to the WHO, the Mean Body Mass Index (BMI) of a country's population should be approximately 21 to 23 kg/m2 to
maintain good health, while an individual should aim for a BMI between 18.5 and 24.9 kg/m2 to reduce a variety of health
risks. This study identifies the relationship between Mean BMI and the factors that have a significant effect on it.
Prevalence of obesity, Prevalence of Raised blood pressure, Pure Alcohol Consumption per Person, Prevalence of insufficient
physical activity and Raised fasting blood glucose, each measured for the defined population, are used as predictors to
estimate the Mean BMI. SPSS is used to evaluate the model.

3. DATA SOURCE AND TRANSFORMATION

A multiple linear regression was performed with 5 independent variables. The data set for the Mean BMI analysis was taken
from “https://www.who.int”. The data set was first cleaned in Excel by replacing NA and blank values with the mean of the
corresponding variable.
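
The cleaning step described above was done in Excel. As a rough illustration only, the same mean-replacement idea could be sketched in Python as follows. The function name and the sample numbers are hypothetical and are not taken from the WHO data set.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries (illustrative numbers only).
alcohol_litres = [6.0, None, 2.0, 10.0, None]
print(impute_with_mean(alcohol_litres))
```

Mean replacement keeps the variable's mean unchanged but reduces its variance, which is worth keeping in mind when many values are missing.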

4. LEVELS OF MEASUREMENT

Variable                                            Level of measurement
Mean BMI (dependent variable)                       Continuous
Pure Alcohol Consumption per Person (litres)        Continuous
Prevalence of insufficient physical activity (%)    Continuous
Prevalence of Raised blood pressure (%)             Continuous
Prevalence of obesity (%)                           Continuous
Raised fasting blood glucose (%)                    Continuous

5. CHECKING ASSUMPTIONS
• Multicollinearity
Each independent variable in the model should show a linear relationship of some strength and direction with the dependent
variable; this relationship is measured by the correlation. The preferred range for the correlation between two variables is
0.3 to 0.7. Fig. 5.1 shows the correlations between the variables in the model and the dependent variable.
Fig. 5.1
The correlations between the dependent variable and “Prevalence of Raised blood pressure per defined population”, “Pure
Alcohol Consumption per Person” and “Prevalence of insufficient physical activity per defined population” are low, all
below 0.3. The correlations of “Prevalence of obesity” and “Raised fasting blood glucose percent of defined population”
with the dependent variable are high, at 0.922 and 0.578 respectively. No bivariate correlation between two independent
variables exceeds 0.7 in this analysis.
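
The screening described above (correlations with the dependent variable below 0.3, and correlations between independent variables above 0.7) can be sketched with NumPy. This is a minimal illustration, not the SPSS procedure used in the report; the function name, the variable names and the toy data are assumptions.

```python
import numpy as np

def screen_correlations(X, y, names, low=0.3, high=0.7):
    """Apply the 0.3 / 0.7 rules of thumb to a predictor matrix X and outcome y.

    Returns (r_with_y, weak, collinear):
      r_with_y  - Pearson r of each predictor with y
      weak      - predictors with |r| below `low`
      collinear - predictor pairs with |r| above `high`
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r_with_y = {name: float(np.corrcoef(X[:, j], y)[0, 1])
                for j, name in enumerate(names)}
    weak = [name for name, r in r_with_y.items() if abs(r) < low]
    r_between = np.corrcoef(X, rowvar=False)  # predictor-by-predictor matrix
    collinear = [(names[i], names[j])
                 for i in range(len(names))
                 for j in range(i + 1, len(names))
                 if abs(r_between[i, j]) > high]
    return r_with_y, weak, collinear

# Toy data (not the WHO data): x2 duplicates x1, x3 is barely related to y.
X = np.column_stack([[1, 2, 3, 4, 5, 6],
                     [2, 4, 6, 8, 10, 12],
                     [1, -1, 1, -1, 1, -1]])
y = [2, 3, 5, 6, 8, 9]
print(screen_correlations(X, y, ["x1", "x2", "x3"]))
```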

The Coefficients table (Fig. 5.2) gives further information about multicollinearity, mainly Tolerance and VIF, which may not
be evident from the correlation table. Tolerance indicates how much of the variability of an independent variable is not
explained by the other independent variables in the model; it is given by 1 - R Squared, where R Squared comes from
regressing that variable on the other independent variables. A low Tolerance value means the variable has a high multiple
correlation with the other independent variables, which indicates multicollinearity. VIF (Variance Inflation Factor) is the
reciprocal of Tolerance, and a value greater than 10 implies multicollinearity.

Fig. 5.2

The Tolerance and VIF values in the table above are within the acceptable range, so there is no multicollinearity among the
independent variables.
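
To make the Tolerance and VIF definitions concrete, the sketch below regresses each independent variable on the others and computes 1 - R Squared and its reciprocal. It is an illustration under stated assumptions: the function name and the toy matrix are hypothetical, and the values in Fig. 5.2 come from SPSS, not from this code.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, vif) arrays, one entry per column of X.

    Each column is regressed on the remaining columns plus an intercept;
    tolerance = 1 - R^2 of that regression and vif = 1 / tolerance.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        tol[j] = ss_res / ss_tot  # same as 1 - R^2
    # A tolerance of exactly 0 (perfect collinearity) would give an infinite VIF.
    return tol, 1.0 / tol

# Toy predictors (not the WHO data): two weakly correlated columns.
X = np.column_stack([[1, 2, 3, 4, 5, 6], [1, -1, 1, -1, 1, -1]])
tol, vif = tolerance_and_vif(X)
print(tol, vif)
```

With only two predictors the VIF reduces to 1 / (1 - r^2), where r is their correlation, which gives a quick way to check the function by hand.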

• Normally Distributed Residual


The normal probability plot in Fig. 5.3 supports the assumption of normally distributed residuals if the plotted points lie
close to the straight diagonal line running from bottom left to top right. Points close to this line indicate no major
deviation from normality.
Fig.5.3
In the plot above there is some deviation from the diagonal line at the top right, but overall the points do not violate the
assumption of normally distributed residuals.
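
The idea behind the normal probability plot can be sketched with the Python standard library: sort the residuals, then compare the observed cumulative proportion of each point with the proportion expected under a normal distribution. This is a simplified sketch. The plotting position (i + 0.5) / n is an assumption and may differ from the formula SPSS uses, and both function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def normal_pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), pstdev(residuals))
    ordered = sorted(residuals)
    # Observed cumulative proportion of the i-th smallest residual (i from 0).
    observed = [(i + 0.5) / n for i in range(n)]
    # Cumulative probability expected under a normal with the same mean and spread.
    expected = [fitted.cdf(r) for r in ordered]
    return observed, expected

def max_pp_deviation(residuals):
    """Largest vertical distance of any point from the diagonal of the P-P plot."""
    observed, expected = normal_pp_points(residuals)
    return max(abs(o - e) for o, e in zip(observed, expected))
```

Plotting `expected` against `observed` gives the diagonal pattern described above; a small maximum deviation corresponds to points that hug the line.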

• Homoscedasticity
In the scatterplot of standardized residuals, the points should be spread roughly in a rectangular shape, with most of them
concentrated around the center. A plot in which the residuals lie more on one side than the other violates the
assumption.

Fig.5.4
The scatterplot also helps to identify outliers, defined as cases whose standardized residual is greater than 3.3 or less
than -3.3. Fig. 5.4 shows a few outliers, but it may not be necessary to remove them.
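
The plus or minus 3.3 rule can be expressed as a short check. The sketch below assumes that a standardized residual is the raw residual divided by the square root of the residual mean square, with n - p - 1 degrees of freedom for p predictors. The function name and the sample residuals are hypothetical.

```python
def flag_large_residuals(residuals, n_predictors, cutoff=3.3):
    """Return the indices of cases whose standardized residual exceeds +/- cutoff."""
    n = len(residuals)
    # Residual mean square with n - p - 1 degrees of freedom.
    mse = sum(e * e for e in residuals) / (n - n_predictors - 1)
    scale = mse ** 0.5
    return [i for i, e in enumerate(residuals) if abs(e / scale) > cutoff]

# Illustrative residuals only: fifty small values and one large one at the end.
example = [0.1, -0.1] * 25 + [5.0]
print(flag_large_residuals(example, n_predictors=5))
```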

Outliers can also be identified from the Mahalanobis Distance reported in the Residuals Statistics table, shown in Fig. 5.5.
The number of outliers can be found by sorting the Mahalanobis Distance column in the data file. Outliers are identified by
comparing each distance with the critical chi-square value, using the number of independent variables as the degrees of
freedom together with the chosen alpha. In this case the degrees of freedom equal 5 and alpha equals 0.05 (a 95% confidence
level). The resulting critical value is 11.070, and 6 cases exceed it, so there are 6 outliers. If possible, these outliers
can be removed to improve the model.
Fig.5.5
The maximum Mahalanobis Distance in Fig. 5.5 is 24.030, which is larger than the threshold value, so we can conclude that
the data set has a few outliers. Further, Cook's Distance in the Residuals Statistics table should not exceed 1, to avoid a
potential problem with influential cases. The Cook's Distance obtained from the table is 0.229, which is within the expected range.
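
The Mahalanobis check described above can be sketched with NumPy. The critical value 11.070 is the one quoted in the text for 5 degrees of freedom and alpha = 0.05. The function names and the random toy data are assumptions, and the distance is computed in squared form from the sample covariance of the predictors, which is assumed here to match what SPSS reports.

```python
import numpy as np

# Critical chi-square value quoted in the text (df = 5, alpha = 0.05).
CRITICAL_CHI2 = 11.070

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the column means."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance (n - 1)
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

def flag_outliers(X, critical=CRITICAL_CHI2):
    """Return (indices of rows above the critical value, all squared distances)."""
    d2 = mahalanobis_sq(X)
    return np.flatnonzero(d2 > critical), d2

# Toy data (not the WHO data): 107 random rows with one deliberately extreme case.
rng = np.random.default_rng(0)
X = rng.normal(size=(107, 5))
X[0] = 10.0
outlier_rows, d2 = flag_outliers(X)
print(outlier_rows, d2.max())
```

Sorting `d2` in descending order mirrors the step of sorting the Mahalanobis Distance column in the data file.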

6. MODEL EVALUATION
The R Square value in the Model Summary indicates how much of the variance in the dependent variable is explained by the
model. In this case, from Fig. 6.1, R Square is 0.881, which means that 88.10% of the variance in Mean BMI is explained by
the multiple regression model.

Fig.6.1
The Adjusted R Square value from the Model Summary is used when dealing with a small sample. It corrects the optimistic
overestimate given by R Square and provides a better estimate of the true population value. The Durbin-Watson statistic is
2.298, which is within the range of 1 to 3, therefore we can conclude that there is no autocorrelation in the residuals of
our multiple linear regression model.
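
Both statistics mentioned above follow from simple formulas, sketched below. The example call assumes n = 107 cases (the number of countries stated in the objective) and p = 5 predictors. The Adjusted R Square shown in Fig. 6.1 is not quoted in the text, so the number computed here is illustrative and not a reproduction of the SPSS output.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Assumed n = 107 and p = 5; gives roughly 0.875 for R^2 = 0.881.
print(round(adjusted_r2(0.881, n=107, p=5), 3))
# Residuals that alternate in sign push the statistic above 2.
print(durbin_watson([1, -1, 1, -1]))
```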

To check the statistical significance of the model, it is necessary to look at the table labelled ANOVA (Fig. 6.2).

The significance value of the multiple regression model is reported as 0.000, which is less than the threshold of 0.005, so
we can conclude that the model, which explains 88.10% of the variance in Mean BMI, is statistically significant.
7. CONCLUSION

The contribution of each independent variable to the prediction of the dependent variable is obtained from the Coefficients
table (Fig. 7.1). For comparison, the standardized coefficients are preferred over the unstandardized ones, because each
value is converted to the same scale.

Fig.7.1
The regression equation can be formed from the values in the Unstandardized Coefficients B column. The contribution of each
independent variable is given by its Beta value. Fig. 7.1 shows that “Prevalence of obesity” contributes the most to the
prediction of the dependent variable, whereas “Prevalence of insufficient physical activity” contributes the least.
“Prevalence of Raised blood pressure” contributes negatively. The Sig. column plays a vital role in the analysis of a
multiple regression model, because it tells us whether each variable makes a statistically significant unique contribution.
This depends on which variables are included in the equation and on the overlap among the independent variables. If the
Sig. value is less than 0.05, the variable contributes significantly to the prediction of Mean BMI. The independent
variables making a significant unique contribution are “Pure Alcohol Consumption per Person”, “Prevalence of Raised blood
pressure”, “Prevalence of obesity” and “Raised fasting blood glucose”. “Prevalence of insufficient physical activity” has a
high Sig. value, possibly because it overlaps with the other independent variables. For a better model we can remove this
variable or try another combination of independent variables. The Part correlation is also called the semipartial
correlation; its square gives the variable's unique contribution to R Square, that is, how much of the variance is uniquely
explained by that variable and how much R Square would drop if the variable were removed from the model. Prevalence of
obesity has a part correlation of 0.436; squaring this gives about 0.19, which means it uniquely explains about 19% of the
variance in Mean BMI.
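
The link between the squared part correlation and the drop in R Square can be checked numerically. The sketch below fits the model with and without one predictor using NumPy least squares and returns the difference in R Square. The function names and the toy data are assumptions, and the code is not a substitute for the SPSS Coefficients table.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def squared_part_correlation(X, y, j):
    """Drop in R^2 when predictor j is removed, i.e. its squared part correlation."""
    X = np.asarray(X, dtype=float)
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)

# Toy data with two uncorrelated predictors (not the WHO data).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 1, -1, -3]  # equals 2 * x1 + 1 * x2
print(squared_part_correlation(X, y, 0), squared_part_correlation(X, y, 1))

# Squaring the part correlation quoted in the text for Prevalence of obesity:
print(round(0.436 ** 2, 3))
```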
The standardized Beta value indicates the number of standard deviations by which the dependent variable changes when the
independent variable changes by one standard deviation. Therefore we can conclude that “Prevalence of obesity” (Beta = 0.708),
as a percentage of the defined population, makes the largest unique contribution to Mean BMI, and that the model with all
5 independent variables explains 88.10 percent of the variance in Mean BMI.
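
The relationship between the unstandardized coefficient B and the standardized Beta can be written as Beta = B * sd(x) / sd(y). A minimal sketch with the standard library follows; the function name and the sample values are hypothetical and are not taken from Fig. 7.1.

```python
from statistics import stdev

def standardized_beta(b, x, y):
    """Convert an unstandardized slope b into a standardized Beta: b * sd(x) / sd(y)."""
    return b * stdev(x) / stdev(y)

# Illustrative values only: y is exactly 2 * x, so the slope is 2 and Beta is 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(standardized_beta(2.0, x, y))
```

Because Beta is expressed in standard deviation units, it can be compared across predictors measured on different scales, such as litres and percentages.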
