
Q1) Identify the data type for the following:

Activity                                  Data Type
Number of beatings from wife              Discrete
Result of rolling a die                   Discrete
Weight of a person                        Continuous
Weight of gold                            Continuous
Distance between two places               Continuous
Length of a leaf                          Continuous
Dog's weight                              Continuous
Blue color                                Discrete
Number of kids                            Discrete
Number of tickets in Indian railways      Discrete
Number of times married                   Discrete
Gender (Male or Female)                   Discrete

Q2) Classify each of the following as Nominal, Ordinal, Interval, or Ratio.

Data                              Data Type
Gender                            Nominal
High School Class Ranking         Ordinal
Celsius Temperature               Interval
Weight                            Ratio
Hair Color                        Nominal
Socioeconomic Status              Ordinal
Fahrenheit Temperature            Interval
Height                            Ratio
Type of living accommodation      Nominal
Level of Agreement                Ordinal
IQ (Intelligence Scale)           Interval
Sales Figures                     Ratio
Blood Group                       Nominal
Time of Day                       Ordinal
Time on a Clock with Hands        Interval
Number of Children                Ratio
Religious Preference              Nominal
Barometer Pressure                Ratio
SAT Scores                        Interval
Years of Education                Ratio

Q3) Three coins are tossed. Find the probability that two heads and one tail are
obtained.
Sol: E = {HHT, HTH, THH}, so n(E) = 3
n(S) = 2³ = 8
P(E) = n(E)/n(S)
P(E) = 3/8 = 0.375
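
The count for Q3 can be checked by brute-force enumeration. The sketch below is only an illustration in Python (the variable names are my own, not part of the assignment): it lists all eight equally likely outcomes and keeps those with exactly two heads.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# Favourable outcomes: exactly two heads (and therefore exactly one tail).
favourable = [o for o in outcomes if o.count("H") == 2]

p_two_heads = Fraction(len(favourable), len(outcomes))
print(p_two_heads, float(p_two_heads))  # 3/8 0.375
```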
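Q4) Two dice are rolled. Find the probability that the sum is
a) Equal to 1
b) Less than or equal to 4
c) Divisible by both 2 and 3

Sol: n(S) = 6 × 6 = 36
a) The smallest possible sum is 2, so n(E) = 0
P(E) = 0
b) E = {(1,1), (1,2), (2,1), (1,3), (2,2), (3,1)}, so n(E) = 6
P(E) = 6/36 = 1/6 ≈ 0.167
c) A sum divisible by both 2 and 3 must be 6 or 12:
E = {(1,5), (2,4), (3,3), (4,2), (5,1), (6,6)}, so n(E) = 6
P(E) = 6/36 = 1/6 ≈ 0.167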
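
As a cross-check on Q4, the 36 outcomes are few enough to enumerate directly. This is a minimal sketch; the helper name `probability` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability that the sum of the two dice satisfies `event`."""
    hits = sum(1 for a, b in rolls if event(a + b))
    return Fraction(hits, len(rolls))

p_a = probability(lambda s: s == 1)                      # sum equal to 1
p_b = probability(lambda s: s <= 4)                      # sum at most 4
p_c = probability(lambda s: s % 2 == 0 and s % 3 == 0)   # divisible by 2 and 3
print(p_a, p_b, p_c)  # 0 1/6 1/6
```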

Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at
random. What is the probability that none of the balls drawn is blue?

Sol: Number of ways to draw two balls with neither of them blue = n(E) = (2+3)C2 = 5C2
= (5*4)/(2*1) = 10
All possible outcomes = n(S) = 7C2 = (7*6)/(2*1) = 21
P(E) = 10/21 ≈ 0.476
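
The same combination counts can be reproduced with `math.comb` from the Python standard library. This is just a sketch of the arithmetic above, with variable names of my own choosing.

```python
from fractions import Fraction
from math import comb

red, green, blue = 2, 3, 2
total = red + green + blue

# Ways to pick 2 balls from the non-blue ones, and from all balls.
ways_no_blue = comb(red + green, 2)
ways_any = comb(total, 2)

p_no_blue = Fraction(ways_no_blue, ways_any)
print(ways_no_blue, ways_any, p_no_blue)  # 10 21 10/21
```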

Q6) Calculate the expected number of candies for a randomly selected child.
Below are the probabilities of the candy count for each child (ignoring the nature of
the child; generalized view).

CHILD    Candies count    Probability
A        1                0.015
B        4                0.20
C        3                0.65
D        5                0.005
E        6                0.01
F        2                0.120

Child A – probability of having 1 candy = 0.015.
Child B – probability of having 4 candies = 0.20
Sol: $E(x) = \sum x \cdot P(x)$
E(x) = 1*0.015 + 4*0.20 + 3*0.65 + 5*0.005 + 6*0.01 + 2*0.120
= 3.09
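
The weighted sum in Q6 is easy to script. The sketch below assumes the table is stored as a plain dictionary (a layout chosen for this illustration) and first confirms that the probabilities add up to 1.

```python
import math

# child -> (candies count, probability), copied from the table in Q6
candies = {
    "A": (1, 0.015),
    "B": (4, 0.20),
    "C": (3, 0.65),
    "D": (5, 0.005),
    "E": (6, 0.01),
    "F": (2, 0.120),
}

# A valid probability distribution must sum to 1.
total_probability = sum(p for _, p in candies.values())
assert math.isclose(total_probability, 1.0)

# E(x) = sum of x * P(x)
expected = sum(x * p for x, p in candies.values())
print(round(expected, 2))  # 3.09
```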

Q7) Calculate the mean, median, mode, variance, standard deviation, and range
for the Points, Score, and Weigh columns of the given dataset, and comment
on the values / draw inferences.
Use the Q7.csv file.
Sol: Points dataset:
Mean = 3.596
Median = 3.695
Mode = 3.92, 3.07
Variance = $\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$ = 0.285
Standard deviation = $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$ = 0.534
Range = 2.17
Median is greater than Mean. The data is left skewed.
Score dataset:
Mean = 3.217
Median = 3.325
Mode = 3.44
Variance = $\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$ = 0.957
Standard deviation = $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$ = 0.978
Range = 3.91
Median is greater than Mean. The data is left skewed.

Weigh dataset:
Mean = 17.84
Median = 17.71
Mode = 17.02, 18.9
Variance = $\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$ = 3.193
Standard deviation = $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$ = 1.786
Range = 8.4
Mean is greater than Median. The data is right skewed.

Q8) Calculate Expected Value for the problem below


a) The weights (X) of patients at a clinic (in pounds), are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected
Value of the Weight of that patient?
Sol: $E(x) = \sum x \cdot p(x)$
Each patient is equally likely to be chosen, so $p(x) = \frac{1}{9}$
$E(x) = \frac{1}{9}(108 + 110 + 123 + 134 + 135 + 145 + 167 + 187 + 199) = \frac{1308}{9}$
= 145.33

Q9) Calculate Skewness, Kurtosis & draw inferences on the following data
Cars speed and distance
Use Q9_a.csv

SP and Weight(WT)
Use Q9_b.csv
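Sol: With mean $\mu$ and standard deviation $\sigma$:

Skewness = $\frac{1}{N\sigma^3}\sum_{i=1}^{N}(x_i-\mu)^3$

Kurtosis = $\frac{1}{N\sigma^4}\sum_{i=1}^{N}(x_i-\mu)^4$

Some kurtosis values below are negative, which the ratio above can never be, so
the reported figures are excess kurtosis (the ratio above minus 3).

Cars speed and distance:
Speed:
Skewness = -0.11395477012828319
Kurtosis = -0.5771474239437371
Speed is nearly symmetric (very slightly left skewed) and flatter than a normal distribution.
Distance:
Skewness = 0.7824835173114966
Kurtosis = 0.24801865717051808
Distance is right skewed, with slightly heavier tails than a normal distribution.

SP and WT:
SP:
Skewness = 1.581453679442373
Kurtosis = 2.7235214865269173
SP is strongly right skewed and has heavy tails.
WT:
Skewness = -0.6033099322115126
Kurtosis = 0.8194658792266849
WT is moderately left skewed, with somewhat heavier tails than a normal distribution.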
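
The moment formulas for skewness and kurtosis can be sketched in plain Python as follows. The function name and the sample list are hypothetical, since Q9_a.csv and Q9_b.csv are not reproduced here. Statistical libraries often apply small-sample bias corrections, so their output may differ slightly from this uncorrected version.

```python
import math

def skewness_and_excess_kurtosis(values):
    """Population (uncorrected) skewness and excess kurtosis of a list."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    skew = sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)
    kurt = sum((x - mu) ** 4 for x in values) / (n * sigma ** 4)
    return skew, kurt - 3.0  # subtract 3 so a normal distribution scores 0

# Hypothetical sample with a long right tail, for illustration only.
sample = [2, 4, 4, 5, 7, 9, 15]
skew, excess_kurt = skewness_and_excess_kurtosis(sample)
print(f"skewness={skew:.3f}, excess kurtosis={excess_kurt:.3f}")
```

A positive skewness here reflects the long right tail created by the value 15.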

Q10) Draw inferences about the following boxplot & histogram


A: Most of the data points fall in the 50-100 range, with a frequency of about 200.

The least populated range is around 400, with a frequency of roughly 0-5.

Since the right tail is longer, the histogram is right skewed.

A: Since the distance from Q3 to the maximum is larger than the distance from the
minimum to Q1, and the upper side has outliers, the box plot is right skewed.

Q11) Suppose we want to estimate the average weight of an adult male in
Mexico. We draw a random sample of 2,000 men from a population of
3,000,000 men and weigh them. We find that the average person in our
sample weighs 200 pounds, and the standard deviation of the sample is 30
pounds. Calculate the 94%, 98%, and 96% confidence intervals.

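Sol: n = 2000
$\bar{x} = 200$
$\sigma = 30$ (the sample standard deviation, used as the estimate of the population value)

Range Estimate = $\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$, where $\frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{2000}} \approx 0.6708$

For confidence level 94% (z = 1.88):
Range Estimate = $200 \pm 1.88 \cdot \frac{30}{\sqrt{2000}}$
= 200 ± 1.261
Confidence Interval = [198.739, 201.261]

For confidence level 98% (z = 2.33):
Range Estimate = $200 \pm 2.33 \cdot \frac{30}{\sqrt{2000}}$
= 200 ± 1.563
Confidence Interval = [198.437, 201.563]

For confidence level 96% (z = 2.05):
Range Estimate = $200 \pm 2.05 \cdot \frac{30}{\sqrt{2000}}$
= 200 ± 1.375
Confidence Interval = [198.625, 201.375]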
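
The three intervals can be recomputed without a z-table using `statistics.NormalDist` from the Python standard library. This is a sketch, and the function name `z_confidence_interval` is my own. Because it uses unrounded critical values (for example about 2.326 instead of 2.33), the endpoints may differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level):
    """Two-sided z interval: mean +/- z * sd / sqrt(n)."""
    # Critical value that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

for level in (0.94, 0.98, 0.96):
    low, high = z_confidence_interval(200, 30, 2000, level)
    print(f"{level:.0%}: [{low:.3f}, {high:.3f}]")
```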


Q12) Below are the scores obtained by a student in tests

34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.
2) What can we say about the student marks?
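Sol: Mean = 738/18 = 41
Median = (40 + 41)/2 = 81/2 = 40.5
Mode = 41
Variance = $\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$ = 434/18 = 24.111
Standard deviation = $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}$ = $\sqrt{24.111}$ = 4.910
Most marks lie close to 41, where the mean, median, and mode nearly coincide. The two
high scores, 49 and 56, pull the mean slightly above the median, giving a mild right skew.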
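
These figures can be verified with the `statistics` module from the Python standard library. The sketch uses `pvariance` and `pstdev`, the population versions that divide by N, to match the formula used in this solution.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
variance = statistics.pvariance(scores)  # divides by N, not N - 1
std_dev = statistics.pstdev(scores)

print(mean, median, mode, round(variance, 3), round(std_dev, 3))
```

Using `statistics.variance` and `statistics.stdev` instead would divide by N - 1 and give slightly larger values.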
Q13) What is the nature of skewness when the mean and median of the data are equal?
A: The distribution is symmetric and has zero skewness.
Q14) What is the nature of skewness when mean > median?
A: Right skewed.
Q15) What is the nature of skewness when median > mean?
A: Left skewed.
Q16) What does a positive kurtosis value indicate for the data?
A: Leptokurtic: the peak is sharper and the tails are heavier than those of a normal distribution.
Q17) What does a negative kurtosis value indicate for the data?
A: Platykurtic: the peak is flatter and the tails are lighter than those of a normal distribution.
Q18) Answer the questions below using the boxplot visualization.

What can we say about the distribution of the data?

A: Let's assume the box plot represents marks received by students in a class.
About 75% of the students score above 10 (Q1) and the remaining 25% score less.
About half of the students score above the median of roughly 15.5.
What is the nature of skewness of the data?
A: Left skewed; the median is greater than the mean.
What will be the IQR of the data (approximately)?
A: Q1 = 10, Q2 = 15.5, Q3 = 18
IQR = Q3 - Q1 = 8 (approx.)

Q19) Comment on the boxplot visualizations below.

Draw an inference from the distribution of data for Boxplot 1 with respect to
Boxplot 2.
A: Comparing the two plots, the whiskers extend further in boxplot 2,
so its data covers a wider range of values than boxplot 1.
The mean and median are equal, hence the distribution is symmetrical.

Q 20) Calculate probability from the given dataset for the below cases

Data _set: Cars.csv


Calculate the probability of MPG of Cars for the below cases.
MPG <- Cars$MPG
a. P(MPG>38)
b. P(MPG<40)
c. P (20<MPG<50)
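Sol: Assuming MPG is approximately normally distributed with
μ = 34.422, σ = 9.074, and $z = \frac{x-\mu}{\sigma}$:
a) P(MPG > 38):
z(38) = (38 - 34.422)/9.074 = 0.3943
P(MPG < 38) = 0.6517 (from the z-table)
P(MPG > 38) = 1 – 0.6517 = 0.3483
The probability that MPG > 38 is 34.83%

b) P(MPG < 40):
z(40) = (40 - 34.422)/9.074 = 0.6147
P(MPG < 40) = 0.7291 (from the z-table)
P(MPG < 40) = 72.91%
c) P(20 < MPG < 50):
z(20) = -1.5893, P(MPG < 20) = 0.0571
z(50) = 1.7167, P(MPG < 50) = 0.9564
Total probability = P(MPG < 50) – P(MPG < 20) = 0.9564 – 0.0571 = 0.8993
P(20 < MPG < 50) ≈ 90%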
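
For comparison, the same probabilities can be computed directly from the normal CDF with `statistics.NormalDist`, using the mean and standard deviation quoted in this solution. This is a sketch under the same normality assumption. The results differ from the table lookups in about the third decimal place, because a z-table rounds z to two decimals.

```python
from statistics import NormalDist

# Normal model for MPG with the mean and standard deviation from the solution.
mpg = NormalDist(mu=34.422, sigma=9.074)

p_above_38 = 1 - mpg.cdf(38)
p_below_40 = mpg.cdf(40)
p_20_to_50 = mpg.cdf(50) - mpg.cdf(20)

print(f"P(MPG > 38)      = {p_above_38:.4f}")
print(f"P(MPG < 40)      = {p_below_40:.4f}")
print(f"P(20 < MPG < 50) = {p_20_to_50:.4f}")
```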

Q 21) Check whether the data follows normal distribution


a) Check whether the MPG of Cars follows Normal Distribution
Dataset: Cars.csv
Sol: MPG:
Mean = 34.422
Median = 35.152
Skew = -0.174
Kurtosis = -0.647

The mean is almost equal to the median and the skewness is near 0, so the
MPG data approximately follows a normal distribution.

b) Check whether the Adipose Tissue (AT) and Waist Circumference (Waist)
from the wc-at data set follow a normal distribution
Dataset: wc-at.csv

Sol: AT:
Mean = 101.89403669724771
Median = 96.54
Skew = 0.576
Kurtosis = -0.327

The mean is greater than the median, so the data is right skewed, which suggests
AT is not well described by a normal distribution.

Waist:
Mean = 91.90183486238531
Median = 90.8
Skew = 0.1322041763592883
Kurtosis = -1.1072764806858817

The mean is almost equal to the median and the skewness is near 0, so the
Waist data approximately follows a normal distribution.

Q 22) Calculate the Z scores of 90% confidence interval, 94% confidence
interval, 60% confidence interval

Sol: For a confidence level C, the area in each tail is (1 - C)/2.
90% confidence interval:
(1 - 0.90)/2 = 0.10/2 = 0.05
Z score = ±1.645
94% confidence interval:
(1 - 0.94)/2 = 0.06/2 = 0.03
Z score = ±1.88
60% confidence interval:
(1 - 0.60)/2 = 0.40/2 = 0.20
Z score = ±0.84
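Q 23) Calculate the t scores of 95% confidence interval, 96% confidence
interval, 99% confidence interval for sample size of 25
Sol: Degrees of freedom = n - 1 = 24
95% confidence interval:
(1 - 0.95)/2 = 0.05/2 = 0.025
T score = ±2.064

96% confidence interval:
(1 - 0.96)/2 = 0.04/2 = 0.02
T score = ±2.172

99% confidence interval:
(1 - 0.99)/2 = 0.01/2 = 0.005
T score = ±2.797

For reference, the one-sided quantiles at 0.95, 0.96, and 0.99 with 24 degrees of
freedom are 1.711, 1.828, and 2.492.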

Q 24) A Government company claims that an average light bulb lasts 270
days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs
last an average of 260 days, with a standard deviation of 90 days. If the
CEO's claim were true, what is the probability that 18 randomly selected
bulbs would have an average life of no more than 260 days

Hint:

R code: pt(tscore, df)

df: degrees of freedom

Sol: n = 18
$\bar{x} = 260$
$\sigma = 90$
$\mu = 270$

$t = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{260-270}{90/\sqrt{18}} = \frac{-10\sqrt{18}}{90} = -\frac{\sqrt{18}}{9} \approx -0.471$

Degrees of freedom = n - 1 = 17

$P(\bar{x} \le 260) = P(t \le -0.471)$ = 0.3218 (one-tailed)

Assuming the claimed population mean of 270 days is true, the probability that
18 randomly selected bulbs would have an average life of no more than 260 days
is 0.3218.
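
The hint refers to R's `pt`. Python's standard library has no Student-t CDF, so as an illustrative stand-in (not the method used in the solution) the sketch below integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `t_cdf` and the step count are assumptions made for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) via composite Simpson integration from 0 to t (steps must be even)."""
    h = t / steps  # negative when t < 0, so the integral is subtracted from 0.5
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

t_score = (260 - 270) / (90 / math.sqrt(18))
p = t_cdf(t_score, df=17)
print(f"t = {t_score:.3f}, P(T <= t) = {p:.4f}")
```

The result should agree with the one-tailed probability in the solution to about three decimal places.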
