OBE Remarks:
Q.No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
CO No. CO4 CO4 CO5 CO5 CO4 CO4 CO5 CO5
Section-A
A. Attempt all the questions. (10x2 =20)
to the right; represented by a positive value) or negatively skewed (skew to the left; longer tail to the left;
with a negative value).
Kurtosis - measures how peaked a distribution is and how light or heavy its tails are. In other
words, how much of the distribution is actually located in the tails? A normal distribution has an
excess kurtosis of zero (a kurtosis of 3) and is said to be mesokurtic. A positive value means that
the tails are heavier than those of a normal distribution, and the distribution is said to be
leptokurtic (with a higher, more acute "peak"). A negative value means that the tails are lighter
than those of a normal distribution, and the distribution is said to be platykurtic (with a lower, flatter "peak").
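The sign conventions above can be checked numerically. The following is a minimal sketch that uses the moment-based coefficients (population form, dividing by n); the two samples are hypothetical and serve only as an illustration.

```python
def central_moment(data, k):
    """k-th central moment of a list of numbers (population form, divide by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data):
    """Moment coefficient of skewness: m3 / m2^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment coefficient of kurtosis minus 3, so that a normal curve scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical samples, not taken from the question paper.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 15]   # long tail to the right
symmetric = [1, 2, 3, 4, 5]                   # no skew, flat shape

print("right-tailed:", skewness(right_tailed), excess_kurtosis(right_tailed))
print("symmetric:   ", skewness(symmetric), excess_kurtosis(symmetric))
```

A positive skewness for the first sample and a negative excess kurtosis for the second correspond to "positively skewed" and "platykurtic" in the terms used above.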
Median – It is the value of the variable that divides the group into two equal parts, one
part comprising all values greater than the median and the other all values less than it.
Deciles are the values that divide a set of observations into ten equal parts. There are
therefore nine deciles, denoted D1, D2, D3, D4, ……… D9.
Percentiles divide a set of observations into 100 equal parts. There are 99 percentiles,
denoted P1, P2, P3, P4, ……… P99.
A quartile is a type of quantile that divides the observations into four equal parts. The first
quartile (Q1) is defined as the middle number between the smallest number and the median of
the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the
middle value between the median and the highest value of the data set.
Partition values   Equal parts   Notation
Median             2             Med
Quartiles          4             Q1 to Q3
Deciles            10            D1 to D9
Percentiles        100           P1 to P99
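The partition values in the table can be computed with Python's standard `statistics` module, as sketched below. The data set is hypothetical, and `statistics.quantiles` uses its own interpolation rule (the default "exclusive" method), which may differ slightly from the textbook formula used in class.

```python
import statistics

# Hypothetical data set, used only to illustrate the partition values.
marks = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42]

median = statistics.median(marks)
quartiles = statistics.quantiles(marks, n=4)       # Q1, Q2, Q3
deciles = statistics.quantiles(marks, n=10)        # D1 ... D9
percentiles = statistics.quantiles(marks, n=100)   # P1 ... P99

print("Median:", median)
print("Quartiles:", quartiles)
print("Number of deciles:", len(deciles))
print("Number of percentiles:", len(percentiles))
```

Q2, D5 and P50 all coincide with the median, which is why the median is listed as the partition value for two equal parts.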
The addition theorem is generally written in set notation as
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Multiplication theorem
The multiplication law of probability applies to the joint occurrence of events. When the events
have to occur together, we use the multiplication law. Two cases arise: the events are either
independent or dependent.
Multiplication or Conditional Probability
The probability of an event B when it is known that the event A has already occurred is
P(B/A) = P(A∩B) / P(A), if P(A) > 0
i.e. P(A∩B) = P(A).P(B/A)
If A and B are independent events:
P(A∩B) = P(A).P(B)
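Both theorems can be verified on a small sample space. The sketch below assumes a single roll of a fair die, with events A and B chosen purely for illustration; exact fractions are used so that the identities hold without rounding.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair die (an assumption, not from the paper).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: an even number
B = {4, 5, 6}   # event B: a number greater than 3


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))


# Addition theorem: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(B/A) = P(A ∩ B) / P(A)
p_b_given_a = prob(A & B) / prob(A)

# A and B are independent only if P(A ∩ B) = P(A).P(B)
independent = prob(A & B) == prob(A) * prob(B)

print("P(A ∪ B) =", p_union)
print("P(B/A)   =", p_b_given_a)
print("Independent?", independent)
```

Here A and B are dependent, so P(A∩B) has to be obtained as P(A).P(B/A) rather than P(A).P(B).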
Accept H0 Reject H0
A Type I error occurs when the null hypothesis is rejected although it is true. A Type II error
occurs when the null hypothesis is accepted although it is false. The probability of a Type I
error is denoted by α and the probability of a Type II error by β.
(1 – β) is called the power of the test.
Section-B
11. In a hotel a total of 500 bulbs were installed simultaneously and their failures over time were
observed as given below. You are required to calculate the mean life of the bulbs:
End of Week       1   2   3    4    5    6    7
No. of failures   12  40  108  242  346  428  500
Sol.
End of Week   No. of failures (cumulative)   CI    f     Mid-point (x)   fx
1             12                             0-1   12    0.5             6
2             40                             1-2   28    1.5             42
3             108                            2-3   68    2.5             170
4             242                            3-4   134   3.5             469
5             346                            4-5   104   4.5             468
6             428                            5-6   82    5.5             451
7             500                            6-7   72    6.5             468
Total                                              500                   2074
Mean life = ∑fx / ∑f = 2074 / 500 = 4.148 weeks
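The working in the table can be reproduced in a few lines. This sketch converts the cumulative failures from the question into class frequencies, takes the mid-point of each weekly interval, and computes the mean as ∑fx / ∑f; the variable names are illustrative.

```python
# Cumulative number of failed bulbs at the end of weeks 1..7 (from the question).
cumulative_failures = [12, 40, 108, 242, 346, 428, 500]

# Frequency of each weekly class = difference between successive cumulative counts.
previous = [0] + cumulative_failures[:-1]
frequencies = [c - p for c, p in zip(cumulative_failures, previous)]

# Mid-point of the class interval (week-1, week): 0.5, 1.5, ..., 6.5
midpoints = [week - 0.5 for week in range(1, len(frequencies) + 1)]

total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean_life = total_fx / sum(frequencies)

print("f  =", frequencies)
print("∑fx =", total_fx)
print("Mean life (weeks) =", mean_life)
```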
12. Given the figures of production (in thousand tonnes) of a fertilizer factory:
Year 2002 2003 2004 2005 2006 2007
Production 10 12 15 16 18 19
Fit a straight line trend by the least squares method and estimate the trend for 2008. Also find the
monthly rate of growth.
Sol.
X       Y    x = 2(X - 2004.5)   x²    xY
2002    10   -5                  25    -50
2003    12   -3                  9     -36
2004    15   -1                  1     -15
2005    16    1                  1      16
2006    18    3                  9      54
2007    19    5                  25     95
Total   90    0                  70     64
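Using the method of least squares
Trend equation: Y = a + bx
Normal equations:
∑Y = Na + b∑x
∑xY = a∑x + b∑x²
Since ∑x = 0, a = ∑Y/N = 90/6 = 15 and b = ∑xY/∑x² = 64/70 = 0.914, so Y = 15 + 0.914x
For the year 2008, x = 2(2008 - 2004.5) = 7, so Y = 15 + 0.914(7) = 21.4 thousand tonnes
Monthly rate of growth: x is measured in half-year units, so the annual increase is 2b = 1.83
thousand tonnes and the monthly increase is 1.83/12 = 0.152 thousand tonnes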
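The trend fit can be checked with a short script. This is a sketch under the same coding as the table (origin at 2004.5, x in half-year units); the function name `trend` is illustrative, and the monthly growth is taken as the annual increase 2b divided by 12. Results agree with the worked values up to rounding.

```python
years = [2002, 2003, 2004, 2005, 2006, 2007]
production = [10, 12, 15, 16, 18, 19]   # thousand tonnes

n = len(years)
origin = sum(years) / n                        # 2004.5, the middle of the period
x = [2 * (year - origin) for year in years]    # -5, -3, -1, 1, 3, 5

# Because ∑x = 0, the normal equations reduce to a = ∑Y/N and b = ∑xY/∑x².
a = sum(production) / n
b = sum(xi * yi for xi, yi in zip(x, production)) / sum(xi ** 2 for xi in x)


def trend(year):
    """Trend value (thousand tonnes) for a given calendar year."""
    return a + b * 2 * (year - origin)


trend_2008 = trend(2008)
monthly_growth = 2 * b / 12    # x moves 2 units per year

print("a =", a, " b =", round(b, 3))
print("Trend for 2008 =", round(trend_2008, 2))
print("Monthly growth =", round(monthly_growth, 3))
```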
13. “Our managers can improve managerial decisions to a great extent, if they are adequately
familiar with the basic tools of statistics.” Explain and illustrate.
14. The height (in cm) and weight (in Kg) of 10 basketball players of a team are:
Player 1 2 3 4 5 6 7 8 9 10
Height (X) 186 189 190 192 193 193 198 201 203 205
Weight (Y) 85 85 86 90 87 91 93 103 100 101
Calculate:
a) The coefficient of correlation between X and Y.
b) The regression line of Y on X
c) The estimated weight of a player whose height measures 208 cm.
Sol.
            X      Y     X²       Y²      XY
            186    85    34596    7225    15810
            189    85    35721    7225    16065
            190    86    36100    7396    16340
            192    90    36864    8100    17280
            193    87    37249    7569    16791
            193    91    37249    8281    17563
            198    93    39204    8649    18414
            201    103   40401    10609   20703
            203    100   41209    10000   20300
            205    101   42025    10201   20705
Total       1950   921   380618   85255   179971
            ∑X     ∑Y    ∑X²      ∑Y²     ∑XY
Coefficient of correlation (r) = 0.94
Regression of X on Y: slope bxy = 0.87, intercept = 114.63
Regression of Y on X: slope byx = 1.02, intercept = -107.14
Regression line of Y on X
Y – 92.1 = 1.022 (X – 195)
To estimate Y for X = 208, put X = 208 in the above equation:
Y = 105.4 kg
The estimated weight of a player whose height measures 208 cm is therefore about 105.4 kg.
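The figures above can be reproduced from the raw data. The sketch below uses the usual sums-of-squares formulas for r and for the slope and intercept of the regression line of Y on X; variable names are illustrative, and small differences from the printed values are due to rounding.

```python
import math

heights = [186, 189, 190, 192, 193, 193, 198, 201, 203, 205]   # X, cm
weights = [85, 85, 86, 90, 87, 91, 93, 103, 100, 101]          # Y, kg

n = len(heights)
sum_x, sum_y = sum(heights), sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)

# Corrected sums of squares and products.
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

r = sxy / math.sqrt(sxx * syy)            # coefficient of correlation
b_yx = sxy / sxx                          # slope of Y on X
a_yx = sum_y / n - b_yx * sum_x / n       # intercept of Y on X
estimate_208 = a_yx + b_yx * 208          # estimated weight at 208 cm

print("r =", round(r, 2))
print("Y on X: Y =", round(a_yx, 2), "+", round(b_yx, 3), "X")
print("Estimated weight at 208 cm =", round(estimate_208, 1), "kg")
```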
15. Discuss the concept of Business analytics with its meaning, types and applications in various
functions of management.
Sol. Business Analytics is the use of data, information technology, statistical analysis, quantitative
methods, and mathematical or computer-based models to help managers gain improved insight about
their business operations and make better, fact-based decisions.
care industry, you can better manage the patient population by using prescriptive analytics to
measure the number of patients who are clinically obese, then add filters for factors like diabetes and
LDL cholesterol. A prescriptive model can be applied to almost any industry, target group, or problem.
Predictive analytics uses big data to identify past patterns and predict the future. For example, some
companies use predictive analytics for sales lead scoring. Some have gone one step further and use
it for the entire sales process, analyzing lead source, number of communications, types of
communications, social media, documents, and CRM data. Predictive analytics can be used to support
sales, marketing, or other types of complex forecasts.
Diagnostic analytics is used for discovery, or to determine why something happened. For example,
in a social media marketing campaign you can track posts, mentions, followers, fans, page views,
reviews, pins, etc. Thousands of online mentions can be distilled into a single view to see what
worked in your past campaigns and what didn't.
Descriptive analytics, or data mining, is at the bottom of the big data value chain, but it can be
valuable for uncovering patterns that offer insight. A simple example of descriptive analytics is
assessing credit risk using past financial performance. Descriptive analytics can also be useful in the
sales cycle, for example to categorize customers by their likely product preferences and sales cycle.
Finance
Business analytics is of utmost importance to the finance sector. Data scientists are in high demand
in investment banking, portfolio management, financial planning, budgeting, forecasting, etc.
Marketing
Analytics helps in studying consumer buying patterns, analysing trends, identifying the target
audience, choosing advertising techniques that appeal to consumers, forecasting supply
requirements, etc.
HR Professional
HR professionals can use data to find information about the educational background of high
performing candidates, employee attrition rate, number of years of service of employees, age,
gender, etc. This information can play a pivotal role in the selection procedure of a candidate.
CRM
Analytics helps one analyse the key performance indicators, which further helps in decision making
and in framing strategies to strengthen the relationship with consumers. Demographics and data
about other socio-economic factors, purchasing patterns, lifestyle, etc., are of prime importance
among the data available.
Manufacturing
Analytics can help in supply chain management, inventory management, measuring performance
against targets, risk mitigation plans, improving efficiency on the basis of product data, etc.
Credit Card Companies
Credit card transactions of a customer can reveal many factors: financial health, lifestyle,
purchase preferences, behavioural trends, etc.
Section-C
C. Attempt all the questions. (5x10 = 50)
16. Attempt any one.
a. The local authorities in a certain city install 10,000 electric lamps in the streets of the city. If the lamps
have an average life of 1000 burning hours with an SD of 200 hours, what number of lamps might be expected to fail
i) In first 800 burning hours
ii) Between 800 and 1200 burning hours
iii) After 1400 burning hours
Given the area under the standard normal curve between 0 and z:
Z 0.5 1.0 1.5 2.0
Area 0.1915 0.3413 0.4332 0.4772
P (X > 1400)
= 0.5 - P (1000 < X < 1400) = 0.5 - P (0 < Z < 2) = 0.5 – 0.4772
The required probability is 0.0228
Number of lamps expected to fail after 1400 burning hours = 10,000 × 0.0228 = 228
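All three parts of the question follow the same pattern: convert the burning hours to a z-value, look up the area between 0 and z, and combine with 0.5. The sketch below applies this using only the areas given in the question; the working for parts (i) and (ii) is not shown above, so those two results are computed here from the table rather than copied from the solution.

```python
# Area under the standard normal curve between 0 and z (table given in the question).
area_0_to_z = {0.5: 0.1915, 1.0: 0.3413, 1.5: 0.4332, 2.0: 0.4772}

lamps = 10_000
mean_life = 1000    # burning hours
sd = 200            # burning hours


def z_value(hours):
    """Standardise a lifetime: z = (x - mean) / SD."""
    return (hours - mean_life) / sd


# i) P(X < 800) = 0.5 - P(-1 < Z < 0), using symmetry of the normal curve
p_first_800 = 0.5 - area_0_to_z[abs(z_value(800))]

# ii) P(800 < X < 1200) = P(-1 < Z < 0) + P(0 < Z < 1)
p_800_to_1200 = area_0_to_z[abs(z_value(800))] + area_0_to_z[z_value(1200)]

# iii) P(X > 1400) = 0.5 - P(0 < Z < 2)
p_after_1400 = 0.5 - area_0_to_z[z_value(1400)]

fail_first_800 = round(lamps * p_first_800)
fail_800_to_1200 = round(lamps * p_800_to_1200)
fail_after_1400 = round(lamps * p_after_1400)

print("i)   In first 800 hours:    ", fail_first_800)
print("ii)  Between 800 and 1200:  ", fail_800_to_1200)
print("iii) After 1400 hours:      ", fail_after_1400)
```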
b. Define the probability distribution. Explain the salient features of Binomial, Poisson and Normal
distribution.
Sol.
Basis               Binomial Distribution        Poisson Distribution
Formula             p(x) = nCx p^x q^(n-x),      p(x) = e^(-λ) λ^x / x!,
                    x = 0, 1, 2, …, n            x = 0, 1, 2, …, where λ = np
Property            Statistical independence     Statistical independence
                    Dichotomy                    Dichotomy
                    Constant probability         Constant probability
                    Identical conditions         Identical conditions
Parameter           n, p                         λ
Mean                np                           λ
Variance            npq                          λ
Special condition   ----                         When n is large and p is small
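The two formulas in the table, and the special condition under which the Poisson distribution approximates the binomial, can be illustrated as follows. This is a sketch only: the values n = 1000 and p = 0.002 are hypothetical, chosen so that n is large, p is small, and λ = np = 2.

```python
import math


def binomial_pmf(x, n, p):
    """p(x) = nCx p^x q^(n-x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """p(x) = e^(-λ) λ^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Hypothetical parameters: n large, p small.
n, p = 1000, 0.002
lam = n * p

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))

# Mean and variance of a small binomial, computed from the pmf, match np and npq.
small_n, small_p = 10, 0.3
binomial_mean = sum(x * binomial_pmf(x, small_n, small_p) for x in range(small_n + 1))
binomial_var = sum((x - binomial_mean) ** 2 * binomial_pmf(x, small_n, small_p)
                   for x in range(small_n + 1))
print("mean =", binomial_mean, " variance =", binomial_var)
```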
day’s production and is found to be defective. What is the probability that it was produced by machine
M2?
Sol. Using Bayes Theorem
Required probability = 0.012 / 0.031 = 0.387
a) A drug is said to be useful for the treatment of cold. In an experiment carried out on 160 persons
suffering from cold, half of the persons were treated with the drug and the other half with sugar pills.
The effect of the treatment is described in the following table:
Helped Harmful No Effect
Drug 52 10 18
Sugar Pills 44 10 26
Test the effectiveness of the drug.
[For 2 df the value of chi-square is 5.99 at 5% level of significance]
Sol.
H0: The drug is no more effective than sugar pills, i.e. the effect is independent of the treatment.
H1: The drug's effect differs from that of sugar pills, i.e. the effect depends on the treatment.
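The computation that tests these hypotheses is not shown in the solution. One way to carry it out is the chi-square test of independence sketched below, which uses the observed table from the question and the critical value 5.99 given for 2 df; the variable names are illustrative.

```python
# Observed frequencies from the question: columns are Helped, Harmful, No Effect.
observed = {
    "Drug": [52, 10, 18],
    "Sugar Pills": [44, 10, 26],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Expected frequency under H0 = (row total x column total) / grand total.
chi_square = 0.0
for row, row_total in zip(rows, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(rows) - 1) * (len(col_totals) - 1)
critical_value = 5.99            # given in the question for 2 df at the 5% level
reject_h0 = chi_square > critical_value

print("Chi-square =", round(chi_square, 3), "with", degrees_of_freedom, "df")
print("Reject H0?", reject_h0)
```

Since the calculated value is below 5.99, H0 is not rejected on this data: the drug does not show a significantly different effect from sugar pills.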
b) Define null hypothesis, alternate hypothesis, critical region and two sided test, used in testing of
statistical hypothesis.
Sol.
Null Hypothesis: In statistical inference of observed data of a scientific experiment, the null
hypothesis refers to a general or default position: that there is no relationship between two measured
phenomena, or that a potential medical treatment has no effect. Rejecting or disproving the
null hypothesis – and thus concluding that there are grounds for believing that there is a relationship
between two phenomena or that a potential treatment has a measurable effect – is a central task in the
modern practice of science, and gives a precise sense in which a claim is capable of being proven false.
Example
Given the test scores of two random samples of men and women, does one group differ from the other?
A possible null hypothesis is that the mean male score is the same as the mean female score:
H0: μ1 = μ2
where:
H0 = the null hypothesis
μ1 = the mean of population 1, and
μ2 = the mean of population 2.
A stronger null hypothesis is that the two samples are drawn from the same population, such that the
variance and shape of the distributions are also equal.
Critical Region
The critical region CR, or rejection region RR, is a set of values of the test statistic for which the null
hypothesis is rejected in a hypothesis test. That is, the sample space for the test statistic is partitioned into
two regions; one region (the critical region) will lead us to reject the null hypothesis H0, the other will
not. So, if the observed value of the test statistic is a member of the critical region, we conclude "Reject
H0"; if it is not a member of the critical region then we conclude "Do not reject H0".
Two-tail Test
In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of
computing the statistical significance of a data set in terms of a test statistic, depending on whether only
one direction is considered extreme (and unlikely) or both directions are considered extreme. Alternative
names are one-sided and two-sided tests; the term "tail" is used because the extremes of distributions are
often small, as in the normal distribution or "bell curve".
If the test statistic is always positive (or zero), only the one-tailed test is generally applicable, while if the
test statistic can assume positive and negative values, both the one-tailed and two-tailed test are of use.
Figure: A two-tailed test corresponds to both extreme negative and extreme positive directions of the test
statistic, here the normal distribution.
25      3     9       30     -4     16
                      35      1      1
                      45     11    121
Total         80                   380
S1² = 10, S2² = 38
F = S2² / S1²
Fcal = 3.8
Ftab = 3.35
Since Fcal > Ftab, H0 is rejected.
b. The sales manager of a large company conducted a sample survey to examine the daily sales performance
of its salesmen posted in two states, Uttar Pradesh and Madhya Pradesh. The results of the survey are
shown in the following table:
Sol.
H0: There is no significant difference between the average sales in the two states.
H1: There is a significant difference between the average sales in the two states.
Substituting the values in the formula,
Z = 128.2, which is more than the table value (1.96).
Hence H0 is rejected, and it is concluded that there is a significant difference between the
average sales in the two states.
20. Attempt any one.
a) Calculate price index numbers for the year 2015 with 2014 as the base year from the following data
using:
Sol.
Commodity   p0   q0   p1   q1   p0q0   p0q1   p1q0   p1q1
A           10   10   12   12   100    120    120    144
B           15   5    20   6    75     90     100    120
C           8    10   10   11   80     88     100    110
D           20   3    25   2    60     40     75     50
E           50   10   60   9    500    450    600    540
Total                           815    788    995    964
b) Marshall-Edgeworth Index = (∑p1q0 + ∑p1q1) / (∑p0q0 + ∑p0q1) × 100
= (995 + 964) / (815 + 788) × 100 = 1959 / 1603 × 100
Answer = 122.2
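The column totals and the Marshall-Edgeworth index can be recomputed from the commodity data. In the sketch below, the Laspeyres, Paasche and Fisher indices are included as an assumption about which other methods the question listed, since that part of the question is cut off.

```python
import math

# (p0, q0, p1, q1) for each commodity, taken from the table above.
commodities = {
    "A": (10, 10, 12, 12),
    "B": (15, 5, 20, 6),
    "C": (8, 10, 10, 11),
    "D": (20, 3, 25, 2),
    "E": (50, 10, 60, 9),
}

sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in commodities.values())
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in commodities.values())
sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in commodities.values())
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in commodities.values())

# Marshall-Edgeworth: weights are the sum of base and current quantities.
marshall_edgeworth = (sum_p1q0 + sum_p1q1) / (sum_p0q0 + sum_p0q1) * 100

# Assumed additional methods (not confirmed by the surviving question text).
laspeyres = sum_p1q0 / sum_p0q0 * 100      # base-year quantity weights
paasche = sum_p1q1 / sum_p0q1 * 100        # current-year quantity weights
fisher = math.sqrt(laspeyres * paasche)    # geometric mean of the two

print("Totals:", sum_p0q0, sum_p0q1, sum_p1q0, sum_p1q1)
print("Marshall-Edgeworth:", round(marshall_edgeworth, 1))
print("Laspeyres:", round(laspeyres, 1), " Paasche:", round(paasche, 1),
      " Fisher:", round(fisher, 1))
```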
According to the factor reversal test, if the factors, i.e. price and quantity, in a price (or quantity) index
formula are interchanged so that a quantity (or price) index formula is obtained, then the product of
the two formulae should equal the value index number.
These tests are used to check which method is best for calculating an index number. They also
check the accuracy of index numbers obtained from different sources and help to compare the
values of index numbers.
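The factor reversal test can be tried on the totals from the table in part (a). The sketch below compares the Laspeyres and Fisher formulas as an illustration; the choice of these two formulas is an assumption, not something stated in the solution. An index passes the test if its price index times its quantity index equals the value ratio ∑p1q1 / ∑p0q0.

```python
import math

# Column totals from the index number table in part (a).
sum_p0q0, sum_p0q1, sum_p1q0, sum_p1q1 = 815, 788, 995, 964

value_ratio = sum_p1q1 / sum_p0q0    # value index, as a ratio

# Laspeyres price and quantity indices (base-year weights), as ratios.
laspeyres_price = sum_p1q0 / sum_p0q0
laspeyres_quantity = sum_p0q1 / sum_p0q0

# Paasche price and quantity indices (current-year weights), as ratios.
paasche_price = sum_p1q1 / sum_p0q1
paasche_quantity = sum_p1q1 / sum_p1q0

# Fisher's ideal index: geometric mean of Laspeyres and Paasche.
fisher_price = math.sqrt(laspeyres_price * paasche_price)
fisher_quantity = math.sqrt(laspeyres_quantity * paasche_quantity)

laspeyres_product = laspeyres_price * laspeyres_quantity
fisher_product = fisher_price * fisher_quantity

print("Value ratio:        ", round(value_ratio, 4))
print("Laspeyres P x Q:    ", round(laspeyres_product, 4))
print("Fisher P x Q:       ", round(fisher_product, 4))
```

On this data Fisher's index reproduces the value ratio, while the Laspeyres product falls slightly short, so only Fisher's formula satisfies the test.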
b. What are time series? Explain the different components of time series. Discuss any two
methods of forecasting the trend.
Sol) A series of observations recorded over time is known as a time series. Time series models use past
history to predict the future. The components of a time series are as follows:
Secular trend: the tendency of the time series data to increase, decrease or stagnate over a
long passage of time. For Ex: Population
Seasonal component: the variability in the behavioural pattern during different seasons in a year.
For Ex: Sale of ACs and fans.
Cyclical component: almost synonymous with the business cycle, reflecting the upswing and
downswing of the data over extended periods of time. For Ex: Recession
Random or Irregular component: irregular variations caused by random factors and sporadic
causes like strikes, natural disasters and so on.