
Mathematics II: Assignment 10

Chapters 19-25
Due on Sunday, 03 May 2020

Dr. Gaurav Bhatnagar

Student, Shubham Singh, 19-11-EC-050

Student, Shubham Singh, 19-11-EC-050, Mathematics II (Dr. Gaurav Bhatnagar): Assignment 9

Contents
Problem 1 Part (a)
Problem 1 Part (b)
Problem 2 Part (a)
Problem 2 Part (b)
Problem 2 Part (c)
Problem 3 Part (a)
Problem 3 Part (b)
Problem 3 Part (c)
Problem 3 Part (d)
Problem 3 Part (e)


Problem 1 (Homework Problem 1) Part(a)


INPUT (General/ before input 1, 2, 3 and 4):


INPUT 1:

INPUT 2:

INPUT 3:

INPUT 4:


OUTPUT 1:

The total number of COVID cases in India is rising daily at a constant rate. However, the
total is increasing more slowly than predicted by the linear regression analysis done in
the past (refer to Part 3, Answer 5, Assignment 8). The regression analysis predicted 13244
cases for the 43rd day (dt. 13-04-20), whereas the actual figure is only 9352, as is evident
from the above chart. Similarly, the predicted number of cases for the 49th day
(dt. 19-04-20) came out to be 36288, which is about double the actual figure of 17656.
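
The gap between the prediction and the actual count can be checked with a few lines of Python. This is only a sketch that uses the four figures quoted above; the dictionary layout and the function name `overshoot_ratio` are illustrative assumptions and are not taken from the original notebook.

```python
# Predicted vs. actual totals quoted in the text above.
# The dictionary layout and all names here are illustrative assumptions.
cases = {
    43: {"predicted": 13244, "actual": 9352},   # 13-04-20
    49: {"predicted": 36288, "actual": 17656},  # 19-04-20
}


def overshoot_ratio(predicted, actual):
    """Return how many times larger the prediction is than the actual count."""
    return predicted / actual


ratios = {
    day: overshoot_ratio(v["predicted"], v["actual"]) for day, v in cases.items()
}

for day, ratio in ratios.items():
    print(f"Day {day}: prediction is {ratio:.2f} times the actual count")
```

The ratio is about 1.4 on day 43 and about 2.1 on day 49, so the earlier fit overshoots by more as the days pass.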


Reference: Part 3, Answer 5, Assignment 8


OUTPUT 2:

The number of new cases in India is rising daily, despite some falls in the rate initially.
These falls can be seen as dips in the left half of the above bar graph.


OUTPUT 3:

The total number of cases in Delhi state has been rising continuously. The rate of increase
is almost constant, with a small slowdown observed initially. A strong correlation can be
observed between the total number of cases and the number of days passed.


OUTPUT 4:

The number of new cases in Delhi is increasing daily at a steady rate. However, there are
quite a few outliers in the line plot, as can be seen from the above graph. Initially the
number of new cases remained almost the same, although some days had 0 new cases.


Problem 1 (Homework Problem 1) Part(b)


Community transmission of the coronavirus (or any other virus) is said to take place when
the source of infection for a large number of cases in an area cannot be traced, that is, when
individuals pick up the infection without travelling to countries where the virus is
circulating or having been in contact with known confirmed cases.
To determine whether community spread has begun, one possible way is to analyze samples from
people who show symptoms of the disease and for whom the source of the infection cannot be
determined. However, taking samples from every patient showing such symptoms is not
practical because of the following constraints:
1. The number of patients showing such symptoms is quite high. Collecting their
samples alone will be a difficult task, apart from the cost involved in collection and
analysis (Size and Cost constraint)
2. Even if this many samples are collected, analyzing them would take a lot of time;
probably enough that the conclusion of the survey will be useless by the time it is
done (Time constraint)
Hence a sample has to be selected from the available population for this analysis.
So, the survey will involve the following tasks:
1. Determining the symptoms on the basis of which the samples will be selected and
whether the source of the infection causing such symptoms is known or not
(Determining the condition to select a sample)
2. Identifying the population showing such symptoms.
3. Selecting a sample that represents the actual population well, and analyzing
the samples to check whether the infection is caused by corona or by some other factor
DETERMINING THE CONDITION:
The most visible and identifiable symptoms of corona are influenza-like fever and severe
acute respiratory-like illness. So, one condition is whether a person shows these symptoms
or not. The second condition is whether the source of infection is known or not.
IDENTIFYING THE POPULATION AND SELECTING A SAMPLE FROM IT
This can be done by following the multistage cluster sampling method (p.340, Freedman
and Purves’s Intro. To Statistics). India is divided into states, states into
districts, and districts into subdivisions.
First, 5 different zones are identified in the country: north, south, west, central and east
(includes northeast). Within each zone, all population centers of similar sizes are grouped
together. One such grouping would be all the districts that have a population between 1
million and 5 million and are not a green zone. Then a random sample of these
districts is selected.
Next, from within the selected districts, some subdivisions are selected at random.
Then, from these subdivisions, some hospitals are selected at random. The selection will
exclude all large private hospitals (since the patients there are not part of the local
community). From the selected hospitals, the patients with the appropriate symptoms and
for whom the source of infection is unknown are identified.
From these identified patients, samples of randomly selected patients are taken.
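
The stages described above can be sketched in Python as nested random selections. This is a minimal illustration under stated assumptions: the nested dictionary, the `toy_zone` helper, the stage sizes and all identifiers are made up, and the filtering steps (population between 1 and 5 million, not a green zone, no large private hospitals, symptomatic patients with unknown source) are assumed to have been applied before the dictionary is built.

```python
import random


def multistage_sample(zones, n_districts, n_subdivisions, n_hospitals,
                      n_patients, seed=0):
    """Sample districts, then subdivisions, then hospitals, then patients.

    `zones` maps zone -> district -> subdivision -> hospital -> list of
    eligible patient IDs. The structure is an assumption for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chosen = []
    for zone, districts in zones.items():
        picked_districts = rng.sample(sorted(districts),
                                      min(n_districts, len(districts)))
        for district in picked_districts:
            subdivisions = districts[district]
            picked_subs = rng.sample(sorted(subdivisions),
                                     min(n_subdivisions, len(subdivisions)))
            for sub in picked_subs:
                hospitals = subdivisions[sub]
                picked_hospitals = rng.sample(sorted(hospitals),
                                              min(n_hospitals, len(hospitals)))
                for hosp in picked_hospitals:
                    patients = hospitals[hosp]
                    picked = rng.sample(patients,
                                        min(n_patients, len(patients)))
                    chosen.extend((zone, district, sub, hosp, p)
                                  for p in picked)
    return chosen


def toy_zone(prefix):
    """Made-up zone: 4 districts x 3 subdivisions x 3 hospitals x 10 patients."""
    return {
        f"{prefix}-D{d}": {
            f"{prefix}-D{d}-S{s}": {
                f"{prefix}-D{d}-S{s}-H{h}": [
                    f"{prefix}-D{d}-S{s}-H{h}-P{p}" for p in range(10)
                ]
                for h in range(3)
            }
            for s in range(3)
        }
        for d in range(4)
    }


zones = {name: toy_zone(name)
         for name in ["north", "south", "west", "central", "east"]}
sample = multistage_sample(zones, n_districts=2, n_subdivisions=2,
                           n_hospitals=1, n_patients=3)
print(len(sample), "patients selected")
```

Every stage uses a simple random draw, so each zone contributes the same number of patients here; in a real survey the stage sizes would depend on the budget and on how many eligible patients each hospital has.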


Problem 2 (Homework Problem 2) Part(a)


INPUT/OUTPUT:


Problem 2 (Homework Problem 2) Part(b)


INPUT 1/OUTPUT 1:


INPUT 2:


OUTPUT 2:

The degree of f(x) with respect to x is 2, which is even. Hence the graph of f(x) is symmetric
about the y-axis. Also, f(x) is similar to the general equation of the normal curve. Thus, it
is symmetric about the y-axis, like the normal curve.


Problem 2 (Homework Problem 2) Part(c)


INPUT 1:


OUTPUT 1:


INPUT 2:


OUTPUT 2:


The normal curve is symmetric about zero and the area under it is 100% (1.0 in decimal)
(p.96, Freedman and Purves’s Intro. to Statistics). Hence the areas are also symmetric about
0. This explains the equality of the ratios
A1/A0 = A1/A9 and A2/A0 = A3/A0
(Ai represents the area of the i-th entry in the above tables),
because these areas are correspondingly symmetric about 0.
On changing the value of a from 0 to 1 and to 2, we shift the normal curve by 1 and 2 units
respectively. Thus, the obtained curve (and consequently the areas below it) becomes
symmetric about 1 and 2 respectively. Also, when the value of b is increased k times, the
value of the curve is scaled by a factor of 1/k.
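
The symmetry of the areas can be checked numerically. The sketch below is an analogue only: the f(x) used in the notebook is not reproduced here, so the code assumes the usual normal curve with centre `mean` (playing the role of a) and SD 1, and the function names are illustrative.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def area(lo, hi, mean=0.0, sd=1.0):
    """Area under the normal curve between lo and hi."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


# Centre at 0: intervals mirrored about 0 have equal areas.
left = area(-2, -1)
right = area(1, 2)

# Centre moved to 1: the mirror point moves to 1 as well.
shifted_left = area(-1, 0, mean=1)
shifted_right = area(2, 3, mean=1)

print(f"{left:.4f} {right:.4f} {shifted_left:.4f} {shifted_right:.4f}")
```

All four areas agree, because shifting the centre moves the curve without changing its shape.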


Problem 3 (Homework Problem 3) Part(a)


INPUT/OUTPUT: (Note: Comments in cells 18, 21, 25 constitute part of the answer.)


Problem 3 (Homework Problem 3) Part(b)


INPUT 1/OUTPUT 1:

Plotting the internal and minor marks on the same graph reveals a trend: many peaks and
falls in the 'internal' marks line correspond with peaks and falls in the 'minor' marks
line. This suggests that, in general, people who work hard on Assignments and Quizzes tend
to do well in the minor also. We do another analysis to test this claim.


INPUT 2/OUTPUT 2:

There are 4 students whose score is >=95th percentile in both the internal and minor
assessments, while there are 10 whose score is >=95th percentile in Midterm 1. The analysis
was repeated for the 90th percentile score.


INPUT 3/OUTPUT 3:

There are 9 students whose score is >=90th percentile in both the internal and minor
assessments, while there are 25 whose score is >=90th percentile in internal.
Hence it can be concluded that Sir’s claim is right for 40% of the students (4 out of 10) if
the 95th percentile is considered a good score. If a score at the 90th percentile is
considered good performance, then the claim is right for 36% of the students (9 out of 25).
My analysis is based on the assumption that more hard work means more marks. This is true in
the general sense (for math this might not be the case), but there are students who work hard
and yet cannot perform well. These students are represented by the points where the peaks in
the line plot of internal coincide with falls in the line plot of minor marks. In the second
analysis, the number of these students is the difference between the number of students with
good performance in internal and the number with good performance in both minor and internal.
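
The percentile comparison can be sketched as follows. This is an illustration only: the marks are made up for 20 students, the nearest-rank definition of a percentile is an assumption (the notebook may have used a different one), and the function names are hypothetical.

```python
def percentile_cutoff(marks, pct):
    """Nearest-rank percentile: the mark at rank ceil(pct * n / 100)."""
    ordered = sorted(marks)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct*n/100
    return ordered[max(rank, 1) - 1]


def overlap_share(internal, minor, pct):
    """Count students at or above the cutoff in internal, and in both."""
    cut_internal = percentile_cutoff(internal, pct)
    cut_minor = percentile_cutoff(minor, pct)
    top_internal = [k for k, m in enumerate(internal) if m >= cut_internal]
    both = [k for k in top_internal if minor[k] >= cut_minor]
    return len(top_internal), len(both), len(both) / len(top_internal)


# Made-up marks for 20 students (not the class data).
internal = list(range(1, 21))
minor = list(range(1, 21))
minor[4], minor[18], minor[19] = 19, 20, 5  # one top internal scorer slips

n_top, n_both, share = overlap_share(internal, minor, 90)
print(n_top, n_both, round(share, 2))

# The shares reported in the text, recomputed from the quoted counts.
reported_95 = 4 / 10
reported_90 = 9 / 25
```

In the toy data, the student who is in the top group for internal but not for minor corresponds to a point where a peak in one line plot meets a fall in the other.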


Problem 3 (Homework Problem 3) Part(c)


We have 120 students in total out of which 20 students are selected at random.

THE BOX representing the problem

120 STUDENTS
AVERAGE = ?
SD = ?
20 draws

The sample average is the sum of the marks obtained by the 20 students divided by 20. Let
this be mu_s. Also, n = 20. Then the average marks for the 120 students will be mu_s +/- some
chance error. The sample average mark is like the average of the draws. So, the SE of the
sample average, which is denoted by SE_s, can be obtained as:
SE_s = SE / (sample size) = SE/n
where
SE = sqrt (no. of draws) X SD of box
= sqrt(n) X SD of box
(SE is the standard error of the sum of the n draws), so that
SE_s = SD of box / sqrt(n)
Since the SD of the box is unknown, it can be estimated by the SD of the sample. Let this be
denoted by sigma_s. Therefore,
average = mu_s (+-) SE_s = mu_s (+-) sigma_s / sqrt(n).
The 95% confidence interval can be determined by going 2 SEs away from the sample
average. Therefore,
95% confidence interval = ( (mu_s - 2*SE_s) , (mu_s + 2*SE_s) )
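
A minimal Python sketch of this calculation is given below. The 20 marks are made up for illustration, the function name is hypothetical, and the draws are treated as if made with replacement (no correction for sampling without replacement is applied).

```python
import math
import statistics


def confidence_interval_95(sample):
    """Return (sample average, SE of the average, 95% confidence interval)."""
    n = len(sample)
    mu_s = sum(sample) / n
    # SD with divisor n, used as the estimate of the SD of the box.
    sigma_s = statistics.pstdev(sample)
    se_avg = sigma_s / math.sqrt(n)  # SE_s = SD of box / sqrt(n)
    return mu_s, se_avg, (mu_s - 2 * se_avg, mu_s + 2 * se_avg)


# Made-up marks for 20 randomly selected students (not the class data).
sample = [12, 15, 9, 14, 10, 11, 13, 8, 16, 12,
          10, 14, 9, 13, 11, 15, 12, 10, 14, 12]

mu_s, se_avg, (low, high) = confidence_interval_95(sample)
print(f"average = {mu_s:.2f}, SE = {se_avg:.2f}, "
      f"95% CI = ({low:.2f}, {high:.2f})")
```

The interval is centred on the sample average and is 4 SEs wide, which matches the formula above.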


INPUT:


OUTPUT:

It can be observed from the above tables that the actual mean lies within the 95% confidence
interval around the predicted mean in 5 out of 5 runs of the experiment.
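
Five runs is a small number, so the same check can be repeated many times in a simulation. The sketch below is an assumption-laden illustration: the box of 120 marks is randomly generated (marks from 0 to 30 are made up), the seeds and names are arbitrary, and it is not the code from the notebook.

```python
import math
import random
import statistics


def coverage(box, n_draws, trials, seed=0):
    """Fraction of samples whose 2-SE interval contains the box average."""
    rng = random.Random(seed)
    true_mean = sum(box) / len(box)
    hits = 0
    for _ in range(trials):
        # Simple random sample, drawn without replacement.
        sample = rng.sample(box, n_draws)
        mu_s = sum(sample) / n_draws
        se_avg = statistics.pstdev(sample) / math.sqrt(n_draws)
        if mu_s - 2 * se_avg <= true_mean <= mu_s + 2 * se_avg:
            hits += 1
    return hits / trials


# Made-up box: 120 marks between 0 and 30 (not the class data).
box_rng = random.Random(1)
box = [box_rng.randint(0, 30) for _ in range(120)]

rate = coverage(box, n_draws=20, trials=2000)
print(f"{rate:.1%} of the intervals cover the actual mean")
```

With many repetitions the share of intervals that cover the actual mean should settle near the nominal 95%, although a sample of 20 and drawing without replacement can move it slightly.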


Problem 3 (Homework Problem 3) Part(d)


INPUT/OUTPUT: (Note: Comments in cells 56,46 constitute part of the answer.)


Problem 3 (Homework Problem 3) Part(e)


On analyzing the data from experiment (c) and experiment (d), we find that:
• The histograms for both the datasets fit the normal curve quite well except for one high
peak in both of them (refer to the figures below).

Experiment (a)

Experiment (d)

It is notable, however, that the peaks belong to different class intervals (which
represent marks). The histogram for exp. (a) has a high peak for the class interval 7-9,
while the histogram for exp. (d) has the high peak for the class interval 13-15. This shows
that the randomly generated data is more centered with respect to the extremes, unlike
real-life data where the majority of marks are comparatively low.
This is reflected by the fact that the mean marks for the randomly generated data are
considerably higher than the mean marks for the actual data.

• For both the cases, the actual mean lies within the 95% confidence interval around the
predicted sample mean 5 out of 5 times.
The following conclusions can be drawn from the experiment:
1. The probability histograms for the average of draws for both the datasets follow the
normal curve quite well. The histograms made by me have the marks on the x-axis.
However, the probability histogram for the average will also remain the same, since
the latter operation is just a change of scale (fig 1, p.411). This is in accordance with
what I have studied in the book (box, p.412).
2. For both the datasets, the SD of the box was estimated by the SD of the sample because
the sample size was large (box, p.4160). The results came out to be quite accurate (judged
by the fact that the actual mean lay within the predicted mean range).

3. The 95% confidence interval means that the actual mean lies within 2 SEs either way
from the sample mean in 95% of the samples (above Ex 3, p.417). This came out to
be true for both datasets, since the actual mean lay within this range 95% of the time
(95% of 5 is 4.75, which I round off to 5).
4. The samples in both the datasets were chosen by probability methods. That is the
reason the formulas for simple random samples could be applied here (point 6, p.394).
5. I performed statistical inference on both the datasets. Selecting students in (c), and
selecting marks, summing them up and then making a draw among those sums in (d),
was equivalent to making draws from a box. My statistical inference is justified
(according to me) since I am able to put a chance model for the data in both cases
(box and above box, p.455).
