BSc IT Computer Techniques Stats Solutions

BSc.
(Information Technology)
(Semester IV)
2018-19
Computer Oriented Statistical Techniques
(USIT 403 Core)
University Paper Solution
By
Ms. Maitreyi Joglekar

Question 1
a. Explain-
i) Mean Deviation ii) Variance
Ans i) Mean Deviation-
Consider data with n observations X1, X2, ..., Xn occurring with frequencies
f1, f2, ..., fn. The mean deviation about the mean is denoted by MD and is defined as-
$MD = \frac{\sum f\,|X - \bar{X}|}{\sum f}$
Coefficient of Mean Deviation = $\frac{MD}{\bar{X}}$
Where $\bar{X} = \frac{\sum fX}{\sum f}$
ii) Variance-
Variance is the square of the standard deviation.
$S.D. = \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$
Where $\bar{X} = \text{mean} = \frac{\sum X}{N}$
Coefficient of SD = $\frac{\sigma}{\bar{X}}$
Variance = $\sigma^2$
b. Calculate mean and std. deviation.
Ans
Class Interval   Mark (X)   f     fX      X - X bar   (X - X bar)^2   f(X - X bar)   f(X - X bar)^2
20-30            25         3     75      -29.72      883.2784        -89.16         2649.835
30-40            35         61    2135    -19.72      388.8784        -1202.92       23721.58
40-50            45         132   5940    -9.72       94.4784         -1283.04       12471.15
50-60            55         153   8415    0.28        0.0784          42.84          11.9952
60-70            65         140   9100    10.28       105.6784        1439.2         14794.98
70-80            75         51    3825    20.28       411.2784        1034.28        20975.2
80-90            85         2     170     30.28       916.8784        60.56          1833.757
Total                       542   29660   1.96        2800.549        1.76           76458.49
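Mean = $\bar{X} = \frac{\sum fX}{\sum f} = \frac{29660}{542}$ = 54.72
Mean Deviation = $MD = \frac{\sum f\,|X - \bar{X}|}{\sum f} = \frac{5152}{542}$ = 9.51
(The deviations are taken in absolute value. The signed total 1.76 in the table is only rounding residue and must not be used here.)
$S.D. = \sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}} = \sqrt{\frac{76458.49}{542}}$
= 11.877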
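The figures for this table can be cross-checked with a short R sketch. It assumes the class marks and frequencies exactly as tabulated above and divides by the total frequency N (population formulas), not by N - 1; the variable names are illustrative only.

```r
# Class marks and frequencies from the table in Question 1(b)
x <- c(25, 35, 45, 55, 65, 75, 85)
f <- c(3, 61, 132, 153, 140, 51, 2)

n      <- sum(f)                              # total frequency
mean_x <- sum(f * x) / n                      # grouped mean
sd_x   <- sqrt(sum(f * (x - mean_x)^2) / n)   # standard deviation (divide by N)
md_x   <- sum(f * abs(x - mean_x)) / n        # mean deviation about the mean

print(round(c(mean = mean_x, sd = sd_x, md = md_x), 3))
```

Because the mean is not rounded before the deviations are taken, the last digits may differ slightly from a hand calculation that uses 54.72.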
c. Calculate semi interquartile range of following-

Ans
Class Interval   Mark (X)   f     fX       cf
170-180          175        32    5600     32
180-190          185        68    12580    100
190-200          195        85    16575    185
200-210          205        92    18860    277
210-220          215        100   21500    377
220-230          225        95    21375    472
230-240          235        70    16450    542
240-250          245        28    6860     570
Total                       570   119800
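$Q_i = L_1 + \frac{c}{f}\left(\frac{iN}{4} - cf\right)$ where i = 1, 2, 3
Here L1 is the lower limit of the quartile class, c is its width, f is its frequency and cf is the cumulative frequency of the preceding class.
Q1: N/4 = 142.5 lies in class 190-200, so Q1 = 190 + (10/85)(142.5 - 100) = 195
Q3: 3N/4 = 427.5 lies in class 220-230, so Q3 = 220 + (10/95)(427.5 - 377) = 225.32
Semi-interquartile range = (Q3 - Q1)/2

= 15.16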
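The quartile formula can be sketched in R as follows. The helper name grouped_quartile() is hypothetical (it is not a base R function), and the sketch assumes equal class widths and the frequencies from the table in Question 1(c).

```r
# Lower class limits, frequencies and common class width from Question 1(c)
lower <- seq(170, 240, by = 10)
f     <- c(32, 68, 85, 92, 100, 95, 70, 28)
width <- 10

# Hypothetical helper: i-th quartile of grouped data by linear interpolation
grouped_quartile <- function(i, lower, f, width) {
  n      <- sum(f)
  target <- i * n / 4                       # position iN/4
  cum    <- cumsum(f)                       # cumulative frequencies
  k      <- which(cum >= target)[1]         # index of the quartile class
  cf     <- if (k > 1) cum[k - 1] else 0    # cumulative frequency before that class
  lower[k] + (width / f[k]) * (target - cf)
}

q1       <- grouped_quartile(1, lower, f, width)
q3       <- grouped_quartile(3, lower, f, width)
semi_iqr <- (q3 - q1) / 2

print(c(Q1 = q1, Q3 = q3, semi_IQR = semi_iqr))
```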
d. Compute mean deviation-
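Ans
Class Interval   Mark (X)   f     fX     X - X bar   f(X - X bar)   f|X - X bar|
0-10             5          8     40     -32.667     -261.336       261.336
10-20            15         12    180    -22.667     -272.004       272.004
20-30            25         20    500    -12.667     -253.34        253.34
30-40            35         25    875    -2.667      -66.675        66.675
40-50            45         15    675    7.333       109.995        109.995
50-60            55         9     495    17.333      155.997        155.997
60-70            65         6     390    27.333      163.998        163.998
70-80            75         5     375    37.333      186.665        186.665
80-90            85         5     425    47.333      236.665        236.665
Total                       105   3955   65.997      -0.035         1706.675
Mean = 3955/105 = 37.667
Mean Deviation = $MD = \frac{\sum f\,|X - \bar{X}|}{\sum f} = \frac{1706.675}{105}$
Mean deviation = 16.25
(The signed total -0.035 is only rounding residue; the mean deviation uses the absolute deviations.)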
e. Compute median

Ans
f. Define factors and data frames in R.

Ans Factors are data objects used to categorize data and store it as levels. They
can store both strings and integers. They are useful for columns that have a limited
number of unique values, such as "Male"/"Female" or TRUE/FALSE.
They are useful in data analysis for statistical modelling.
Factors are created using the factor() function, which takes a vector as input.
Example:
# character vector of regions
data <- c("East", "West", "south", "North")
# convert it to a factor; the unique values become the levels
factdata <- factor(data)
print(factdata)
A data frame is a table or a two-dimensional array-like structure in which each
column contains values of one variable and each row contains one set of values
from each column.
Data frames are created in R as follows:
# each named argument becomes one column of the data frame
emp.data <- data.frame(emp_id = 1:3,
                       emp_name = c("Rick", "Dan", "Michelle"),
                       salary = c(1000, 1500, 2000))
print(emp.data)
Question 2
a. Obtain Karl Pearson’s coefficient of skewness-
Ans Karl Pearson’s coefficient of skewness

$S_k = \frac{3(\text{Mean} - \text{Median})}{\sigma}$
Where,
Mean = $\frac{\sum fX}{\sum f}$ = 22
Median = Middle value = 22.7
$\sigma$ = std. deviation of data = 0.529
Therefore, putting values in the formula,

Sk= - 1.4
Conclusion- The distribution is negatively skewed.
b. Define-
i) Raw moments ii) Central Moments
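Ans Central Moments- Arithmetic mean of various powers of deviations taken from the mean $\bar{X}$.
$\mu_r = \frac{\sum (X - \bar{X})^r}{N}$ (ungrouped data)
$\mu_r = \frac{\sum f (X - \bar{X})^r}{\sum f}$ (grouped data)
Raw Moments- Arithmetic mean of various powers of deviations taken from the origin.
$\mu_r' = \frac{\sum X^r}{N}$ (ungrouped data)
$\mu_r' = \frac{\sum f X^r}{\sum f}$ (grouped data)
Relation between Raw and Central Moments-
$\mu_1 = 0$
$\mu_2 = \mu_2' - (\mu_1')^2$
$\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3$
$\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4$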
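The relations between raw and central moments can be checked numerically. The sketch below is only an illustration: it reuses the seven values from Question 2(f) as sample data, computes both kinds of moments directly, and then rebuilds the central moments from the raw ones.

```r
x <- c(2, 3, 5, 7, 4, 8, 1)   # illustrative data, taken from Question 2(f)

raw     <- sapply(1:4, function(r) mean(x^r))               # raw moments about the origin
central <- sapply(1:4, function(r) mean((x - mean(x))^r))   # central moments about the mean

# Central moments rebuilt from the raw moments using the relations above
mu2 <- raw[2] - raw[1]^2
mu3 <- raw[3] - 3 * raw[1] * raw[2] + 2 * raw[1]^3
mu4 <- raw[4] - 4 * raw[1] * raw[3] + 6 * raw[1]^2 * raw[2] - 3 * raw[1]^4

# all.equal() compares with a small numerical tolerance
print(all.equal(c(mu2, mu3, mu4), central[2:4]))
```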
c. If a pair of dice is thrown and X denotes sum of numbers on them. Find PDF of X. Also
find expectation of X.
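Ans Let X denote the sum of the numbers appearing on the two dice.
X can take values from 2 to 12.
The sample space has 36 equally likely outcomes.
So the PDF of X is-
X      2      3      4      5      6      7      8      9      10     11     12
P(X)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
• Mean of X, E(X)
E(X) = ∑ x P(x) = 252/36
E(X) = 7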
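The distribution of the sum and its expectation can also be obtained by enumerating all 36 outcomes in R. This is a minimal sketch; the variable names are illustrative.

```r
sums <- outer(1:6, 1:6, "+")        # 6 x 6 matrix of all equally likely totals
pmf  <- table(sums) / 36            # P(X = x) for x = 2, ..., 12
x    <- as.numeric(names(pmf))      # the possible values of X
ex   <- sum(x * as.numeric(pmf))    # E(X) = sum of x * P(x)

print(pmf)
print(ex)
```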
d. A random variable X has the following PDF.

X      0   1   2    3    4    5     6      7
P(X)   0   k   2k   3k   3k   k^2   2k^2   7k^2
Find (i) k (ii) P(X < 6) (iii) P(X >= 6) (iv) P(0 < X < 4)

Ans
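One possible working, taking the table exactly as printed: the probabilities must add up to 1, so k + 2k + 3k + 3k + k^2 + 2k^2 + 7k^2 = 1, i.e. 10k^2 + 9k - 1 = 0. The R sketch below solves this quadratic, keeps the positive root and adds up the required probabilities. If the printed table differs from the original question paper, the figures change accordingly.

```r
# Coefficients of 10k^2 + 9k - 1 = 0 in increasing order of power, as polyroot() expects
roots <- Re(polyroot(c(-1, 9, 10)))
k     <- roots[roots > 0]              # a probability constant must be positive

# P(X = 0), ..., P(X = 7) as printed in the question
p <- c(0, k, 2 * k, 3 * k, 3 * k, k^2, 2 * k^2, 7 * k^2)
names(p) <- 0:7

p_lt6  <- sum(p[as.character(0:5)])    # (ii)  P(X < 6)
p_ge6  <- sum(p[as.character(6:7)])    # (iii) P(X >= 6)
p_1to3 <- sum(p[as.character(1:3)])    # (iv)  P(0 < X < 4)

print(c(k = k, "P(X<6)" = p_lt6, "P(X>=6)" = p_ge6, "P(0<X<4)" = p_1to3))
```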
e. What is kurtosis? Explain types and measures of Kurtosis.

Ans Kurtosis-
• The degree of “peakedness/flatness” of a distribution is known as Kurtosis.
Usually it is taken compared to the normal distribution.
• Leptokurtic- High peak- Lepto- Skinny peak- +ve kurtosis
• Mesokurtic- Normal distribution curve- zero kurtosis
• Platykurtic- Flat-topped- platy-broad peak- -ve kurtosis
Coefficient of Kurtosis = $\frac{\mu_4}{\sigma^4} - 3$
f. Define skewness. Compute coefficient of Skewness for 2, 3, 5, 7, 4, 8, 1

Ans Skewness is the degree of asymmetry (lack of symmetry, or departure from symmetry) of a distribution.
It is concerned with the shape of the curve.
Two types-
1. Positive Skewness
2. Negative Skewness
1. Positive Skewness-
• The tail on the right side of the distribution is longer.
• The mass of the distribution is concentrated on the left of the figure.
• The mean and median will be greater than the mode.
2. Negative Skewness-
• The tail on the left side of the distribution is longer.
• The mass of the distribution is concentrated on the right of the figure.
• The mean and median will be less than the mode.
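For the data 2, 3, 5, 7, 4, 8, 1:
N = 7
∑X = 30, so Mean = 30/7 = 4.2857
∑X^2 = 168
s.d. = $\sigma = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$ = 2.3734
Median = 4 (middle value of 1, 2, 3, 4, 5, 7, 8)
$S_k = \frac{3(\text{Mean} - \text{Median})}{\sigma}$
$S_k = 0.3611$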
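The same calculation can be sketched in R. Note that the built-in sd() divides by N - 1, so the population standard deviation is computed explicitly here to match the formula used in this answer.

```r
x <- c(2, 3, 5, 7, 4, 8, 1)

n      <- length(x)
mean_x <- mean(x)
sd_pop <- sqrt(sum(x^2) / n - mean_x^2)    # population SD (divide by N)
med_x  <- median(x)
sk     <- 3 * (mean_x - med_x) / sd_pop    # Karl Pearson's median-based coefficient

print(round(c(mean = mean_x, sd = sd_pop, median = med_x, Sk = sk), 4))
```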
Question 3
a. A random sample of 100 balls gave 10 % defective balls. Find 99 % confidence limits
for the percentage of defective balls in the consignment.
Ans Given-
p = 10/100 = 0.1
q = 1 - p
q = 0.9
N = 100
To find 99% confidence limits, critical value of Z is- Zc = 2.58
$\sigma_p = \sqrt{\frac{pq}{N}}$ = 0.03
99 % confidence limits:
$p \pm Z_c \sigma_p$
= (0.1 - 2.58 * 0.03), (0.1 + 2.58 * 0.03)
= (0.0226, 0.1774), i.e. 2.26 % to 17.74 % defective
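The confidence limits can be reproduced in R with qnorm(), which returns the critical value instead of reading it from a table. This is a sketch of the large-sample formula above; qnorm() gives about 2.576 where the tables give 2.58, so the limits agree only to about three decimal places.

```r
p  <- 0.10                      # sample proportion of defective balls
n  <- 100                       # sample size
zc <- qnorm(1 - 0.01 / 2)       # two-sided 99% critical value
se <- sqrt(p * (1 - p) / n)     # standard error of the proportion
ci <- p + c(-1, 1) * zc * se    # lower and upper confidence limits

print(round(ci, 4))
```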
b. Explain with example the built-in functions available in R to generate normal distribution.

Ans
In a random collection of data from independent sources, it is generally observed that
the distribution of data is normal. This means that, on plotting a graph with the value of
the variable on the horizontal axis and the count of the values on the vertical axis, we get
a bell-shaped curve. The center of the curve represents the mean of the data set. In the
graph, fifty percent of values lie to the left of the mean and the other fifty percent lie to
the right of the mean. This is referred to as the normal distribution in statistics.
R has four built-in functions for the normal distribution. They are described below.
dnorm(x, mean, sd) gives the height of the density curve at x.
pnorm(x, mean, sd) gives the cumulative probability P(X <= x).
qnorm(p, mean, sd) gives the value whose cumulative probability is p.
rnorm(n, mean, sd) generates n random values from the distribution.
Following is the description of the parameters used in above functions −
• x is a vector of numbers.
• p is a vector of probabilities.
• n is number of observations (sample size).
• mean is the mean of the distribution. Its default value is zero.
• sd is the standard deviation. Its default value is 1.
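A short example of the four functions is sketched below. The particular arguments (0, 1.96, 0.975, and a sample of 50 values with mean 50 and standard deviation 10) are chosen only for illustration.

```r
set.seed(1)                              # only to make the random draw repeatable

d <- dnorm(0, mean = 0, sd = 1)          # density of N(0, 1) at x = 0
p <- pnorm(1.96, mean = 0, sd = 1)       # P(X <= 1.96)
q <- qnorm(0.975, mean = 0, sd = 1)      # value with 97.5% of the area to its left
r <- rnorm(50, mean = 50, sd = 10)       # 50 random values from N(50, 10^2)

print(c(density = d, cumulative = p, quantile = q))
print(summary(r))
```

Calling hist(r) on the generated sample would show the roughly bell-shaped histogram described above.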
c. In a sample of 100, 35 are working as professors. Construct a 95% confidence interval for
the proportion of professors.
Ans Given-
p = 35/100 = 0.35
q = 1 - p
q = 0.65

N = 100
To find 95% confidence limits, critical value of Z is- Zc = 1.96
$\sigma_p = \sqrt{\frac{pq}{N}}$ = 0.0477
95 % confidence limits:
$p \pm Z_c \sigma_p$
= (0.35 - 1.96 * 0.0477), (0.35 + 1.96 * 0.0477)
= (0.257, 0.443)
d. What is hypothesis test? Explain.

Ans Statistical Hypothesis:
Any assumption, guess or statement regarding the population is called a
‘Statistical Hypothesis’. The statistical hypothesis may or may not be true. Generally
the statistical hypothesis is about the probability distribution of the population.
Null Hypothesis:
A statistical hypothesis which is formulated for the purpose of rejecting or nullifying it
is called a ‘null hypothesis’. A null hypothesis is denoted by H0 (H nought). Generally
H0 is based on some past data or record.
Example:
If we want to decide whether a given coin is loaded, we formulate the hypothesis that
the coin is fair (i.e., p = 0.5, where p is the probability of heads).
Similarly, if we want to decide whether one procedure is better than another, we
formulate the hypothesis that there is no difference between the procedures (i.e., any
observed differences are due merely to fluctuations in sampling from the same
population). Such hypotheses are often called null hypotheses and are denoted by H0.
Alternate Hypothesis:
Any hypothesis which differs from the given null hypothesis is called an ‘alternate
hypothesis’. An alternate hypothesis is denoted by H1 (H one).
Example:
If the null hypothesis is p = 0.5, alternative hypotheses might be p = 0.7 or p ≠ 0.5.
Test of Hypothesis:
A procedure which enables us to determine whether an observed sample differs
significantly from the results expected, and thus helps us to decide whether to accept or
reject the hypothesis, is called a ‘Test of Hypothesis’ or ‘Decision Rule’.
Type-I Error and Type-II Error:
If we reject a hypothesis when it should be accepted (i.e., reject a true H0), we say that a
Type-I error has been made.
If we accept a hypothesis when it should be rejected (i.e., accept a false H0), we say that a
Type-II error has been made.
In order for decision rules (or tests of hypotheses) to be good, they must be designed so
as to minimize errors of decision. This is not a simple matter, because for any given
sample size, an attempt to decrease one type of error is generally accompanied by an
increase in the other type of error. In practice, one type of error may be more serious
than the other, and so a compromise should be reached in favour of limiting the more
serious error. The only way to reduce both types of error is to increase the sample size,
which may or may not be possible.
e. A certain coin showed heads on 270 occasions in 500 tosses. Test the claim that the coin
is unbiased.
Ans Population    Sample
p = 0.5           P = 270/500 = 0.54
q = 0.5           Q = 0.46
N = 500
H0: Coin is unbiased (p = 0.5)
H1: Coin is biased (p ≠ 0.5)
Z score-
$Z = \frac{P - p}{\sqrt{pq/N}} = \frac{0.54 - 0.5}{\sqrt{0.5 \times 0.5 / 500}}$
Z = 1.79
Using two tailed test for LOS 1%
Critical value of Z is 2.58.
|Z| < 2.58
Hence, Accept H0.
Coin is unbiased.
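The test can be sketched in R using the large-sample Z statistic for a proportion, Z = (P - p) / sqrt(pq/N), with the two-tailed critical value taken from qnorm() at the 1% level of significance. Variable names are illustrative; the same steps apply to any other test of a single proportion.

```r
p0    <- 0.5                  # proportion of heads under H0 (fair coin)
n     <- 500                  # number of tosses
p_hat <- 270 / n              # observed proportion of heads

z      <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic
z_crit <- qnorm(1 - 0.01 / 2)                      # two-tailed critical value at 1% LOS
reject <- abs(z) > z_crit                          # TRUE would mean reject H0

print(c(z = z, critical = z_crit))
print(reject)
```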
f. A car manufacturer claims that 40 % of all cars built will still be running after 10 years.
A random sample of 400 cars showed that 150 cars were still running after 10 years. Test the claim
at 1% LOS.
Ans Population    Sample
p = 0.4           P = 150/400 = 0.375
q = 0.6           N = 400
H0: p = 0.4
H1: p is not 0.4
Z score-
$Z = \frac{P - p}{\sqrt{pq/N}} = \frac{0.375 - 0.4}{\sqrt{0.4 \times 0.6 / 400}}$
Z = -1.02
Using two tailed test for LOS 1%
Critical value of Z is 2.58.
|Z| < 2.58
Hence, Accept H0.
Question 4
a. Test the significance of the following rank correlation coefficient at 5% level-
R = 0.139, n = 10
Ans At 5% level of significance-
The hypotheses are defined as

𝐻0: 𝑅 = 0
𝐻1: 𝑅 ≠ 0
And with n = 10 the tables show that

P(|R| > 0.6485) = 0.05
So for a two tailed test we should accept H0, since here R = 0.139 lies between
-0.6485 and 0.6485.
Therefore,
we can say that there is no significant correlation at 5% level.
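b. 20 % of apples in a large consignment are found to be bad. Find the probability that at least 25 %
apples are bad in a sample of size 400 drawn from it.
Ans p = 20/100 = 0.2     N = 400
q = 1 - 0.2 = 0.8        P = 0.25
Z score for proportion-
$Z = \frac{P - p}{\sqrt{pq/N}} = \frac{0.25 - 0.2}{\sqrt{0.2 \times 0.8 / 400}} = \frac{0.05}{0.02}$
Z = 2.5
Required probability = P(Z >= 2.5) = 0.5 - 0.4938 = 0.0062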
c. 20 samples of size 100 are selected. Find the expected number of samples that will have at
least 14 defective blades if the total consignment has 10 % defective.
Ans N = 100
p = 0.1
Poisson distribution-
$P(X) = \frac{e^{-m} m^X}{X!}$
Where m = Np
m = 100 * 0.1 = 10
P(X = at least 14) = P(14) + P(15) + P(16) + P(17) + P(18) + P(19) + P(20) (later terms are negligible)

P(X = at least 14) = 0.052 + 0.0347 + 0.0216 + 0.0127 + 0.0071 + 0.0037 + 0.0018
P(X = at least 14) = 0.13367
Expected number of samples = 20 * 0.13367 = 2.67, i.e. about 3 samples
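The Poisson probabilities can be obtained in R with dpois() and ppois(). The sketch below computes both the truncated sum P(14) + ... + P(20) used in this answer and the full upper tail P(X >= 14), and then scales the tail probability by the 20 samples. The values are computed without intermediate rounding, so they differ slightly from the hand calculation.

```r
m <- 100 * 0.1                                         # Poisson mean m = Np

p_sum  <- sum(dpois(14:20, lambda = m))                # truncated sum P(14) + ... + P(20)
p_tail <- ppois(13, lambda = m, lower.tail = FALSE)    # full upper tail P(X >= 14)
expected_samples <- 20 * p_tail                        # expected count out of 20 samples

print(c(truncated = p_sum, full_tail = p_tail, expected = expected_samples))
```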
d. How to calculate contingency coefficient?

Ans
e. It is known that 30 % of male adults are unmarried. A sample of 100 male adults is selected.
Find the chance that-
a. 25-32 % are unmarried
b. At most 33 % are unmarried
Ans N = 100
p = 0.3
m = Np = 30
Poisson distribution-
$P(X) = \frac{e^{-m} m^X}{X!}$
a) P(25-32 %) = P(25) + P(26) + P(27) + P(28) + P(29) + P(30) + P(31) + P(32) = 0.53 (approx.)
b) P(At most 33) = P(0) + P(1) + ... + P(33) = 0.74 (approx.)
f.

Ans
Question 5
a. Explain-
i. coefficient of correlation ii. Std. error of estimate
Ans
b. Fit a parabola for given data-
Ans

c.
Ans
d.
Ans

e.
Ans
f.
Ans
