
INTRODUCTION TO BUSINESS ANALYTICS

Assignment - 1

Submitted By

George Thomas

P22251
a. Identify the type of data

b. Number of variables

c. Number of Observations

d. Draw a scatter plot between the variables and check for any association between the variables.

e. Find the covariance matrix of the given data set

f. Find the correlation matrix of the given data set.

g. Identify the variables that have a strong association (use the correlation matrix).

h. Draw a box plot for the first two variables.

Answers
a. The data set is time series data.
b. There are 9 variables.
c. There are 9 observations, ranging from 27 February to 7 March.

d.
[Scatter plot: NO - NO2]
[Scatter plot: PM2.5 - PM10]

e.

        PM2.5      PM10       NO         NO2        NOx        NH3        SO2        CO         Ozone
PM2.5   576.464
PM10    761.3284   1029.632
NO      77.30185   86.42169   488.5468
NO2     -72.472    -110.041   -18.4224   51.70144
NOx     23.7926    11.177     384.0396   12.63273   316.3016
NH3     -6.66254   -6.31722   -39.6642   -0.22094   -32.0916   3.4458
SO2     32.83127   46.37587   -22.9215   1.078667   -17.9019   1.463489   20.3108
CO      1.377494   1.803859   -0.14471   -0.04331   -0.13971   -0.00729   0.118859   0.005358
Ozone   -0.88704   -1.27699   -0.25051   0.51087    0.069637   0.002119   0.082719   -0.00068   0.005573
f.

        PM2.5      PM10       NO         NO2        NOx        NH3        SO2        CO         Ozone
PM2.5   1
PM10    0.9882     1
NO      0.145663   0.121851   1
NO2     -0.41979   -0.47694   -0.11592   1
NOx     0.055719   0.019585   0.97695    0.098786   1
NH3     -0.14949   -0.10606   -0.96672   -0.01655   -0.97207   1
SO2     0.303416   0.320692   -0.2301    0.033287   -0.22335   0.174937   1
CO      0.783793   0.767997   -0.08944   -0.0823    -0.10732   -0.05362   0.360302   1
Ozone   -0.4949    -0.5331    -0.15182   0.951746   0.052451   0.015288   0.245867   -0.12381   1
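
As a cross-check, each correlation should equal the corresponding covariance divided by the product of the two standard deviations, which are the square roots of the diagonal entries of the covariance matrix. The Python sketch below applies this to two entries copied from the covariance matrix in part e. The function name `corr_from_cov` is illustrative and not part of the original assignment.

```python
import math

# Variances (diagonal entries) and covariances copied from part e.
variance = {"PM2.5": 576.464, "PM10": 1029.632, "NO": 488.5468, "NH3": 3.4458}
covariance = {
    ("PM10", "PM2.5"): 761.3284,
    ("NH3", "NO"): -39.6642,
}


def corr_from_cov(cov_xy, var_x, var_y):
    """Pearson correlation: r = cov(x, y) / (sd(x) * sd(y))."""
    return cov_xy / math.sqrt(var_x * var_y)


results = {
    pair: corr_from_cov(c, variance[pair[0]], variance[pair[1]])
    for pair, c in covariance.items()
}

for (x, y), r in results.items():
    print(f"{x} - {y}: r = {r:.4f}")
```

The two values come out at about 0.988 and -0.967. The negative covariance between NH3 and NO means their correlation is negative as well.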

g. If the correlation between two variables is greater than +0.7 or less than -0.7, there is a strong
association between them.

Variables        Correlation
PM10 - PM2.5     0.9882
NOx - NO         0.97695
NH3 - NO         -0.96672
CO - PM2.5       0.783793
CO - PM10        0.767997
Ozone - NO2      0.951746
NH3 - NOx        -0.97207
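
The selection rule in part g can be sketched in Python as a filter over the lower triangle of the correlation matrix. The values are copied from part f, with signs taken to agree with the covariance matrix in part e. The names `lower`, `THRESHOLD` and `strong` are illustrative choices, not part of the original assignment.

```python
# Lower triangle of the correlation matrix from part f (diagonal omitted).
# Signs are taken to agree with the covariance matrix in part e.
names = ["PM2.5", "PM10", "NO", "NO2", "NOx", "NH3", "SO2", "CO", "Ozone"]
lower = [
    [],
    [0.9882],
    [0.145663, 0.121851],
    [-0.41979, -0.47694, -0.11592],
    [0.055719, 0.019585, 0.97695, 0.098786],
    [-0.14949, -0.10606, -0.96672, -0.01655, -0.97207],
    [0.303416, 0.320692, -0.2301, 0.033287, -0.22335, 0.174937],
    [0.783793, 0.767997, -0.08944, -0.0823, -0.10732, -0.05362, 0.360302],
    [-0.4949, -0.5331, -0.15182, 0.951746, 0.052451, 0.015288, 0.245867, -0.12381],
]

THRESHOLD = 0.7  # cut-off used in part g

# Keep every pair whose correlation exceeds the threshold in absolute value.
strong = [
    (names[i], names[j], r)
    for i, row in enumerate(lower)
    for j, r in enumerate(row)
    if abs(r) > THRESHOLD
]

# Print the strongest associations first.
for a, b, r in sorted(strong, key=lambda t: -abs(t[2])):
    print(f"{a} - {b}: {r}")
```

Comparing absolute values treats strong negative associations, such as NH3 - NOx, the same way as strong positive ones.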
h.
[Box plots for the first two variables, PM2.5 and PM10]

ASSIGNMENT 2

Create two columns of numbers in an Excel sheet, with the first column containing the numbers
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and the second column containing the numbers 2, 4, 6, 8, 10, 12, 14, 16, 18, 20.

a. Compute the covariance and correlation between two sets of numbers.


b. Now change the second column to 20,40,60,80,100,120,140,160,180,200 and repeat the
computation of covariance and correlation.
c. What do you observe? Is correlation or covariance insensitive to the scale of the variable?

Column 1   Column 2   Correlation   Covariance   Column 3   Column 4   Correlation   Covariance
1          2          1             18.3         1          20         1             183.3
2          4                                     2          40
3          6                                     3          60
4          8                                     4          80
5          10                                    5          100
6          12                                    6          120
7          14                                    7          140
8          16                                    8          160
9          18                                    9          180
10         20                                    10         200

The correlation between columns 1 and 2 and the correlation between columns 3 and 4 are the
same (1 in both cases). Correlation does not change when a variable is rescaled, whereas the
covariance rises from 18.3 to 183.3 when the second column is multiplied by 10. So it can be
concluded that correlation is insensitive to the scale of the variable, and covariance is
sensitive to the scale of the variable.
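
The same computation can be reproduced outside Excel. The Python sketch below assumes the sample definition of covariance (dividing by n - 1), which is consistent with the reported 18.3; a population version dividing by n would give 16.5 instead. The helper names `sample_cov` and `pearson` are illustrative.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return sample_cov(xs, ys) / (sample_cov(xs, xs) * sample_cov(ys, ys)) ** 0.5


col1 = list(range(1, 11))        # 1, 2, ..., 10
col2 = [2 * x for x in col1]     # 2, 4, ..., 20
col4 = [20 * x for x in col1]    # 20, 40, ..., 200

cov_a, r_a = sample_cov(col1, col2), pearson(col1, col2)
cov_b, r_b = sample_cov(col1, col4), pearson(col1, col4)

print(f"columns 1 and 2: covariance = {cov_a:.1f}, correlation = {r_a:.2f}")
print(f"columns 3 and 4: covariance = {cov_b:.1f}, correlation = {r_b:.2f}")
```

Multiplying one variable by a positive constant k multiplies the covariance by k, but it also multiplies that variable's standard deviation by k, so the factor cancels in the correlation.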
