Stats As - 2

Assessment -2
(MAT2001- Statistics for Engineers)
Embedded Lab-R Software

Winter semester 2019-20
Reg. no. – 19BIT0144 Name – Pranchal Sihare
[L21 + L22]
Problem – 1
Age group             10-20  20-30  30-40  40-50  50-60  60-70
Representative age    15     25     35     45     55     65
Hours in the library  340.2  258.3  270.4  234.7  294.6  328.5
Illustrate the relationship between the representative age and the time spent in the library
using a scatterplot.
R-code
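A minimal sketch of one way to draw this scatterplot in base R. The vector names `age` and `hours`, the plot title and the axis labels are chosen here for illustration.

```r
# Representative age of each group and hours spent in the library
age   <- c(15, 25, 35, 45, 55, 65)
hours <- c(340.2, 258.3, 270.4, 234.7, 294.6, 328.5)

# Scatterplot of hours in the library against representative age
plot(age, hours,
     main = "Age vs. hours in the library",
     xlab = "Representative age",
     ylab = "Hours in the library",
     pch  = 19)
```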
Graph
Problem – 2
Find the coefficient of correlation between x and y using Karl Pearson’s formula.
X  28  29  21  22  39  26  29  40  32  36
Y  15  25  35  20  30  28  18  17  27  31
R code
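A minimal sketch, assuming the data are entered as vectors `x` and `y`. It computes Karl Pearson's coefficient directly from its definition and compares it with R's built-in `cor()`; the names `r_manual` and `r_builtin` are illustrative.

```r
x <- c(28, 29, 21, 22, 39, 26, 29, 40, 32, 36)
y <- c(15, 25, 35, 20, 30, 28, 18, 17, 27, 31)

# Karl Pearson's coefficient from its definition:
# sum of cross-deviations divided by the root of the product of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# The same quantity from the built-in function
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```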
Problem – 3
Twelve recruits were subjected to a selection test to ascertain their suitability for a certain course
of training. At the end of training they were given a proficiency test. The marks scored by the
recruits are recorded below-
Recruit                 1   2   3   4   5   6   7   8   9   10  11  12
Selection test score    41  49  52  66  69  43  57  67  54  63  46  68
Proficiency test score  48  59  80  57  64  61  42  53  49  68  52  45
Calculate the rank correlation coefficient and comment on your result.

R-code
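A minimal sketch of the rank correlation in R, assuming the scores are stored in vectors named `selection` and `proficiency` (illustrative names). Since neither series has tied scores, the built-in Spearman coefficient can be cross-checked against the formula 1 - 6*sum(d^2)/(n(n^2 - 1)).

```r
selection   <- c(41, 49, 52, 66, 69, 43, 57, 67, 54, 63, 46, 68)
proficiency <- c(48, 59, 80, 57, 64, 61, 42, 53, 49, 68, 52, 45)

# Spearman's rank correlation from the built-in function
rho <- cor(selection, proficiency, method = "spearman")

# Cross-check with the rank-difference formula (valid here: no ties)
d <- rank(selection) - rank(proficiency)
n <- length(d)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(builtin = rho, formula = rho_formula))
```

For these scores the sum of squared rank differences is 276, which gives a coefficient of about 0.035. A value this close to zero suggests almost no association between the selection ranks and the proficiency ranks.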
Problem – 4
The following data gives the marks obtained by 12 students in statistics and computer science-
Students     1   2   3   4   5   6   7   8   9   10  11  12
Stats Marks  55  70  72  49  69  54  63  67  54  60  40  68
CS Marks     35  39  80  57  64  61  42  53  49  68  52  45
Compute the coefficient of correlation by the method of concurrent deviations.
R-code
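Base R has no built-in function for this method, so the following is a hand-rolled sketch using the usual formula r_c = ± sqrt(± (2c - n)/n), where n is the number of pairs of deviations and c is the number of concurrent deviations (both series moving in the same direction). The variable names are illustrative.

```r
stats_marks <- c(55, 70, 72, 49, 69, 54, 63, 67, 54, 60, 40, 68)
cs_marks    <- c(35, 39, 80, 57, 64, 61, 42, 53, 49, 68, 52, 45)

# Direction of change (+1, 0, -1) from each student to the next
dx <- sign(diff(stats_marks))
dy <- sign(diff(cs_marks))

n       <- length(dx)            # number of pairs of deviations
c_count <- sum(dx * dy > 0)      # deviations with the same sign

# r_c = +/- sqrt(+/- (2c - n)/n); the sign follows the sign of (2c - n)
ratio <- (2 * c_count - n) / n
r_c   <- sign(ratio) * sqrt(abs(ratio))

print(c(n = n, c = c_count, r_c = r_c))
```

For these marks there are 11 pairs of deviations, 9 of them concurrent, so the coefficient is sqrt(7/11), about 0.798.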
Problem – 5
The body weight and the BMI of 12 school-going children are given in the following table-
Weight  19    22    13    24    15    26    17    28    29    30    31    12
BMI     13.2  14.3  11.2  15.1  16.5  15.7  12.6  15.4  12.3  11.9  14.7  16.1
Let us fit a simple regression model of BMI on weight and examine the results.
R-code
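A minimal sketch of how the regression of BMI on weight could be fitted in R with `lm()`. The vector names `weight` and `bmi` and the object name `model` are chosen here for illustration.

```r
weight <- c(19, 22, 13, 24, 15, 26, 17, 28, 29, 30, 31, 12)
bmi    <- c(13.2, 14.3, 11.2, 15.1, 16.5, 15.7, 12.6, 15.4, 12.3, 11.9, 14.7, 16.1)

# Correlation between weight and BMI
r <- cor(weight, bmi)
print(r)

# Simple linear regression of BMI on weight
model <- lm(bmi ~ weight)

# Coefficients, R-squared, F statistic and its p-value
print(summary(model))
```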
Interpretation-
The correlation coefficient between body weight and BMI is r = -0.06766155, a very weak negative
correlation between these two variables. The value of R-squared is 0.004578, which means that only
about 0.46% of the variation in BMI is explained by weight through this linear model. This is very
low: about 99.54% of the variation remains unexplained. There could be several reasons for this, and
one of them is that other influencing variables may not have been included in the present model.
The F value shown in the output is the statistic for the variance ratio test of the regression
model. The significance of F, which is given as 0.8345, is the p-value of the F-test carried out in ANOVA. If
this value is less than 0.05 we say that the regression is statistically significant at the 5% level of
significance. Here 0.8345 is greater than 0.05, so the regression is not significant, which means that the
observed relationship could well be an occurrence by chance.
In the output, b0 is the intercept, with a value of 14.47459, and b1 is the regression coefficient
for weight, with a value of -0.018. The regression coefficient is negative, which shows that the fitted
BMI decreases slightly as weight increases.
The regression output can be written as the mathematical equation
BMI = 14.475 - 0.018 * weight
Suppose the body weight of one student is known to be 25 kg. Using the above equation, the estimated BMI is
14.475 - 0.018 * 25 = 14.025. Since this is only an estimate, we have to interpret it as the average BMI
corresponding to the given weight, assuming that other parameters are unchanged.
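The prediction for a 25 kg child can be sketched in R as follows. This assumes the same data vectors as the problem statement; the names `model`, `new_child`, `pred` and `by_hand` are illustrative. `predict()` uses the unrounded coefficients, so its value (about 14.03) differs slightly from the hand calculation with rounded coefficients.

```r
weight <- c(19, 22, 13, 24, 15, 26, 17, 28, 29, 30, 31, 12)
bmi    <- c(13.2, 14.3, 11.2, 15.1, 16.5, 15.7, 12.6, 15.4, 12.3, 11.9, 14.7, 16.1)

model <- lm(bmi ~ weight)

# Estimated average BMI at weight = 25 kg, using the fitted (unrounded) coefficients
new_child <- data.frame(weight = 25)
pred <- unname(predict(model, newdata = new_child))

# The same estimate by hand with the rounded coefficients from the equation
by_hand <- 14.475 - 0.018 * 25

print(c(predict = pred, by_hand = by_hand))
```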
Problem – 6
Obtain a linear relationship between weight (kg) and height (cm) of 12 subjects.
Weight  90   62   73   54   75   88   97   68   99   70   51   72
Height  179  165  182  149  190  175  180  154  170  165  147  161
R-code
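A minimal sketch, assuming weight is regressed on height (the direction that matches the equation reported in the interpretation). The names `weight`, `height` and `fit` are illustrative.

```r
weight <- c(90, 62, 73, 54, 75, 88, 97, 68, 99, 70, 51, 72)
height <- c(179, 165, 182, 149, 190, 175, 180, 154, 170, 165, 147, 161)

# Linear regression of weight on height
fit <- lm(weight ~ height)

# Intercept and slope of the fitted line
print(coef(fit))

# p-value of the slope and Multiple R-squared are in the summary
print(summary(fit))
```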
Interpretation-
The regression equation is-
Weight = -56.759 + 0.7834 * height
Since the p-value of the test, 0.01469, is less than 0.05, we reject the null hypothesis that the slope is
zero. Therefore the model we found is significant at the 5% level. The Multiple R-squared is the coefficient
of determination. It measures how much of the variation in the response is explained by the model. In this
case the R-squared value is 0.4643. Therefore about 46.43% of the variation in weight is explained by height.
Practice Problem – 1
The following data refers to the daily sales of tomatoes (in kg) at different prices (in Rupees)
observed on different days in a market-
Price          4.5  5.5  4.5  4.5  4.0  5.5  5.5  6.5  5.0  5.5  6.0  4.5
Quantity Sold  125  115  140  140  150  150  130  120  130  100  105  150
Carry out linear regression analysis for this data.
R-code
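A minimal sketch, assuming price is regressed on quantity sold, since that is the direction of the equation reported in the interpretation. The names `price`, `quantity` and `fit` are illustrative.

```r
price    <- c(4.5, 5.5, 4.5, 4.5, 4.0, 5.5, 5.5, 6.5, 5.0, 5.5, 6.0, 4.5)
quantity <- c(125, 115, 140, 140, 150, 150, 130, 120, 130, 100, 105, 150)

# Linear regression of price on quantity sold
fit <- lm(price ~ quantity)

# Intercept and slope of the fitted line
print(coef(fit))

# p-value of the slope and Multiple R-squared
print(summary(fit))
```

If the aim were to predict sales from price, the formula would be `quantity ~ price` instead, which gives a different line.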
Interpretation-
The regression equation is-
price = 8.664 - 0.027 * quantity
Since the p-value of the test, 0.02668, is less than 0.05, we reject the null hypothesis that the slope is
zero. Therefore the model we found is significant at the 5% level. The Multiple R-squared is the coefficient
of determination. It measures how much of the variation in the response is explained by the model. In this
case the R-squared value is 0.4026. Therefore about 40.26% of the variation in price is explained by quantity.
The success of a shopping center can be represented as a function of the distance (in miles) from
the center of the population and the number of clients (in hundreds of people) who will visit.
The data is given in the table below-
No. of customers (x)  8   7   6   4   2   1
Distance (y)          15  19  25  23  34  40
a) Calculate the linear correlation coefficient.
b) If the mall is located 2 miles from the center of the population, how many customers should
the shopping center expect?
c) To receive 500 customers, at what distance from the center of the population should the
shopping center be located?
R-code
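A minimal sketch covering parts a) to c), assuming the number of customers (in hundreds) is regressed on distance, which matches the equation used in the answers. All variable names are illustrative.

```r
customers <- c(8, 7, 6, 4, 2, 1)         # x, in hundreds of people
distance  <- c(15, 19, 25, 23, 34, 40)   # y, in miles

# a) Linear correlation coefficient
r <- cor(customers, distance)

# b) Regression of customers (x) on distance (y), then predict at y = 2 miles
fit <- lm(customers ~ distance)
b0  <- coef(fit)[["(Intercept)"]]
b1  <- coef(fit)[["distance"]]
x_at_2 <- b0 + b1 * 2

# c) Solve x = b0 + b1 * y for y when x = 5 (500 customers)
y_for_5 <- (5 - b0) / b1

print(c(r = r, b0 = b0, b1 = b1, x_at_2 = x_at_2, y_for_5 = y_for_5))
```

Note that a distance of 2 miles lies well outside the observed range of 15 to 40 miles, so the prediction in part b) is an extrapolation.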
a) Linear correlation coefficient = -0.950
b) The regression equation of x on y is
x = 12.05 - 0.284 * y
Given y = 2 miles,
x = 12.05 - 0.284 * 2
x = 11.48
Customers = 100 * x [as x is given in hundreds of people]
So the shopping center can expect about 1148 customers.
c) The shopping center is to receive 500 customers, so
x = 500/100 [as x is in hundreds of people]
x = 5
Now, y = (12.05 - x)/0.284 [from the regression equation]
y = 24.83 [using the unrounded coefficients]
So the shopping center should be located about 24.83 miles from the center of the population.
Find the correlation between Experience and Income. Also fit a regression equation and
interpret the result.
Experience    0-5   5-10  10-15  15-20  20-25  25-30
Mid-term (E)  2.5   7.5   12.5   17.5   22.5   27.5
Income (I)    1000  6000  9500   15400  18000  26000
R-code
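A minimal sketch, assuming experience (E) is regressed on income (I), the direction of the equation reported in the interpretation. The names `experience`, `income` and `fit` are illustrative.

```r
experience <- c(2.5, 7.5, 12.5, 17.5, 22.5, 27.5)          # E, class mid-values
income     <- c(1000, 6000, 9500, 15400, 18000, 26000)     # I

# Correlation between experience and income
r <- cor(experience, income)

# Linear regression of experience on income
fit <- lm(experience ~ income)
s   <- summary(fit)

print(r)
print(coef(fit))
# Multiple and adjusted R-squared are reported separately in the summary
print(c(r_squared = s$r.squared, adj_r_squared = s$adj.r.squared))
```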
Interpretation-
The correlation is 0.992.
The regression equation is-
E = 1.944 + 0.001032 * I
Since the p-value of the test, 9.265e-05, is less than 0.05, we reject the null hypothesis that the
slope is zero, so the model is significant at the 5% level. The Multiple R-squared is the coefficient
of determination. It measures how much of the variation in the response is explained by the model.
In this case the Multiple R-squared value is 0.9843 (0.9804 is the adjusted R-squared). Therefore about
98.43% of the variation in E is explained by I.
