You are on page 1of 9

Q1) Calculate Skewness, Kurtosis & draw inferences on the following data

a. Cars speed
Solution:
i) Skewness = -0.117
Q1_a.speed.mean()
Q1_a.speed.median()
Q1_a.speed.mode()

from scipy import stats


stats.mode(Q1_a.speed)

Q1_a.speed.var()
Q1_a.speed.std()
range = max(Q1_a.speed) - min(Q1_a.speed)
range

Q1_a.speed.skew()

ii) Kurtosis: -0.508


Q1_a.speed.kurt()

b. Distance
Solution:
iii) Skewness = 0.806
Q1_a.dist.mean()
Q1_a.dist.median()
Q1_a.dist.mode()

from scipy import stats


stats.mode(Q1_a.dist)

Q1_a.dist.var()
Q1_a.dist.std()
range = max(Q1_a.dist) - min(Q1_a.dist)
range
Q1_a.dist.skew()

iv) Kurtosis: 0.405


Q1_a.dist.kurt()
Skewness for speed= -0.117, skewness value is (-) so it is left skewed. Since
magnitude is slightly greater than 0 it is slightly left skewed. Furthermore, for
distance= 0.806, right skewed(+) slight magnitude to right.
b. Top Speed (SP) and Weight (WT)

Solution:
SP: Skewness = 1.611 Kurtosis = 2.977
WT: Skewness = -0.614 Kurtosis = 0.950
Q2) Draw inferences about the following boxplot & histogram

Solution: We shall say this is right skewed since most of the data gathered at right
with long tail. The most of the data points are concentrated in the range 50-100
with frequency 200. And least range of weight is 400 somewhere around 0-10. So
the expected value the above distribution is 75.
Solution: There are outlier on the upper side of box plot. If the distribution of data
is skewed to the right, the mode is often less than the median, which is less than
the mean and there is less data points between Q1 and bottom point.

Q3) Suppose we want to estimate the average weight of an adult male in


Mexico. We draw a random sample of 2,000 men from a population of
3,000,000 men and weigh them. We find that the average person in our
sample weighs 200 pounds, and the standard deviation of the sample is 30
pounds. Calculate 94%,98%,96% confidence interval?

Solution: SE = s/n = 30/2000


Degrees of freedom= 2000-1= 1999
Confidence interval= 94%
The critical value is the t score having 1999 degrees of freedom and a probability
equal to 0.94. From the t chart, we find that the critical value is 1.882.

(1- σ/2)= 1-0.03) =0.97


Confidence interval for 94% is 1.882
Confidence interval for 98%= 2.33
Confidence interval for 96% = 2.05
Q4) Below are the scores obtained by a student in tests

34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.
2) What can we say about the student marks?
Solution:
Mean= 41 Median= 40.5
variance= 25.529 Standard deviation= 5.052
Q5) What is the nature of skewness when mean, median of data are equal?
Solution: Symmetrical
Q6) What is the nature of skewness when mean > median?
Solution: Right skewed
Q7) What is the nature of skewness when median > mean?
Solution: Left skewed
Q8) What does positive kurtosis value indicates for a data?
Solution: indicated that the distribution has heavier tails than the normal
distribution.
Q9) What does negative kurtosis value indicates for a data?
Solution: The distribution of the data has lighter tails and a flatter peaks than the
normal distribution.
Q10) Answer the below questions using the below boxplot visualization.

What can we say about the distribution of the data?


Solution: Data is distributed in de assigned format. For example, a group of
tuition student in Center A, Those student who age above 15 years old are
approximately 40% and the remaining are less.
What is nature of skewness of the data?
Solution: Left skewed, median is greater than mean
What will be the IQR of the data (approximately)?
Solution: 18 -10 = 8(IQR)
Q11) Comment on the below Boxplot visualizations?

Draw an Inference from the distribution of data for Boxplot 1 with respect
Boxplot 2.
Solution: By observing the both plots, the whisker level is high for boxplot 2, mean
and median are equal with different distribution. It is symmetrical. The plot 1 is
designed with range 3 while the plot 2 is range is 1.5

Q12)

Answer the following three questions based on the boxplot above.


(i) What is inter-quartile range of this dataset? (please approximate the
numbers) In one line, explain what this value implies.
Solution: IQR: 12 – 5 = 7. Means, the mean is more than median.
(ii) What can we say about the skewness of this dataset?
Solution: Right skewed.
(iii) If it was found that the data point with the value 25 is actually 2.5,
how would the new boxplot be affected?
Solution: 3

Q13)

Answer the following three questions based on the histogram above.


(i) Where would the mode of this dataset lie?
Solution: The mode lie on the 7, on the x axis of the histogram.
(ii) Comment on the skewness of the dataset.
Solution: Right skewed
(iii) Suppose that the above histogram and the boxplot in question 2 are
plotted for the same dataset. Explain how these graphs complement
each other in providing information about any dataset.
Solution: Histograms and box plots are very similar in that they both help to
visualize and describe numeric data. Although histograms are better in
determining the underlying distribution of the data, box plots allow you to
compare multiple data sets better than histograms as they are less detailed
and take up less space.

You might also like