
Statistics & Probability Questions and Training

Note:
1. Answers should be in Chegg format only.
2. Kindly provide step-by-step solutions.
3. Refer to the Chegg guidelines slides before answering the questions.

Question 1

Solution:

The data are category frequencies, so we use the χ² goodness-of-fit test. No level of significance is specified, so we assume α = 0.05. There are 4 categories (colors).
Under the null hypothesis the distribution is uniform, that is, the first-peck frequency is the same for each color. The total number of first pecks is 15 + 8 + 5 + 4 = 32; dividing by the 4 colors gives an expected frequency of $f_e = 32/4 = 8$ for each color.

Color    First Peck        Expected         f_o − f_e   (f_o − f_e)²   (f_o − f_e)²/f_e
         Frequency (f_o)   Frequency (f_e)
Blue     15                8                 7          49             6.125
Green     8                8                 0           0             0.000
Yellow    5                8                −3           9             1.125
Red       4                8                −4          16             2.000
Total    32               32                 0                         9.25

Step 1: $H_0$: The first-peck frequencies are equal for each color; the bobwhites have no color preference.
$H_1$: The first-peck frequencies are not equal for each color; the bobwhites have a color preference.

Step 2: Level of significance α = 0.05

Step 3: We use the χ² goodness-of-fit test with 4 − 1 = 3 degrees of freedom (number of categories minus one).

Reject $H_0$ if $\chi^2_{\text{calc}} > \chi^2_{0.05,\,3} = 7.815$

Step 4: Calculate $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 6.125 + 0.000 + 1.125 + 2.000 = 9.25$

Step 5: Reject $H_0$ because 9.25 > 7.815.

P-value ≈ 0.026
The bobwhites do have a color preference: the first-peck frequencies are not equal across colors.
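The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: the function names are hypothetical, the critical value 7.815 is the one quoted in Step 3, and the closed-form tail probability is valid only for 3 degrees of freedom.

```python
import math

def chi_square_gof(observed, expected=None):
    """Return the chi-square goodness-of-fit statistic and its degrees of freedom."""
    k = len(observed)
    if expected is None:
        # Uniform null hypothesis: every category has the same expected count.
        expected = [sum(observed) / k] * k
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1

def chi_square_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

first_pecks = [15, 8, 5, 4]  # blue, green, yellow, red
stat, df = chi_square_gof(first_pecks)
reject_h0 = stat > 7.815     # critical value for alpha = 0.05, 3 df
p_value = chi_square_sf_3df(stat)
print(stat, df, reject_h0, p_value)
```

Passing an explicit `expected` list lets the same helper handle a non-uniform null hypothesis.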
Question 2

The least-squares regression equation is $\hat{y} = 656.8x + 16{,}561$, where $y$ is the median
income and $x$ is the percentage of adults 25 years and older with at least a bachelor's degree in the
region. The scatter diagram indicates a linear relation between the two variables with a
correlation coefficient of 0.7333. Complete parts (a) through (d).

a) Predict the median income of a region in which 25% of adults 25 years and older
have at least a bachelor’s degree.

Solution:

The given equation $\hat{y} = 656.8x + 16{,}561$ has the form of a least-squares regression line,
$\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$.

The population regression line is $y = \beta_1 x + \beta_0$, where $\hat{\beta}_1$ and $\hat{\beta}_0$ are
the statistics that estimate the population parameters $\beta_1$ and $\beta_0$.

Because $x$ is the percentage itself, 25% corresponds to $x = 25$. Substitute this value into the
least-squares regression equation.

That leads to $\hat{y} = 656.8 \times 25 + 16{,}561$

$\hat{y} = 16{,}420 + 16{,}561$

$\hat{y} = 32{,}981$
32,981 is the predicted median income of a region in which 25% of adults 25 years and older have at
least a bachelor's degree.
Question 3

Solution:
The given information is summarized below:

                     Group Runners                     Individual Runners
Sample size          $n_1 = 32$                        $n_2 = 30$
Sample mean          $\bar{x}_1 = 49$ miles per week   $\bar{x}_2 = 47.2$ miles per week
Standard deviation   $\sigma_1 = 4.2$ miles per week   $\sigma_2 = 4.8$ miles per week

Step 1: $H_0: \mu_1 - \mu_2 = 0$

$H_1: \mu_1 - \mu_2 \neq 0$

Step 2: Level of significance α = 0.05


Step 3: We use the two-tailed Z test for the difference between the means of two independent populations.

Both samples are large ($n_1 \geq 30$ and $n_2 \geq 30$), so we use the formula below to draw the
conclusion:

$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Step 4: Calculate

$Z = \frac{(49 - 47.2) - 0}{\sqrt{\frac{4.2^2}{32} + \frac{4.8^2}{30}}}$

$= \frac{1.8}{\sqrt{\frac{17.64}{32} + \frac{23.04}{30}}}$

$= \frac{1.8}{\sqrt{0.55125 + 0.768}}$

$= \frac{1.8}{\sqrt{1.31925}}$

$= \frac{1.8}{1.1486}$

$= 1.5671$
Step 5: At the 5% level of significance the critical values of z are ±1.96.

Decision rule: do not reject $H_0$ if −1.96 ≤ z ≤ 1.96; otherwise reject $H_0$.

Since −1.96 < 1.5671 < 1.96, we fail to reject $H_0$ and conclude that there is no significant difference
between the mean number of miles run per week by group runners and by individual runners.
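The Z computation and decision rule can be reproduced with Python's standard library. This is a small sketch under the same assumptions as the solution (known population standard deviations, two-tailed test at α = 0.05); the helper name `two_sample_z` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two independent sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2 - hypothesized_diff) / standard_error

z = two_sample_z(49, 4.2, 32, 47.2, 4.8, 30)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)    # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed p-value
reject_h0 = abs(z) > critical
print(round(z, 4), round(critical, 2), reject_h0)
```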
Statistics Test

Question 1:

Step 1 of 2

The sample size is large (n = 51) and the population is infinite, so we can assume that
the sampling distribution of the sample proportion is approximately normal.

The mean of the sample proportion is 0.85. The standard deviation of the sample proportion can be
calculated as

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$ and $n$ is the sample size.

Substituting $p = 0.85$, $q = 1 - p = 0.15$ and $n = 51$, we get

$\sigma_{\hat{p}} = \sqrt{\frac{0.85 \times 0.15}{51}}$

$\sigma_{\hat{p}} = \sqrt{\frac{0.1275}{51}}$

$\sigma_{\hat{p}} = \sqrt{0.0025}$

$\sigma_{\hat{p}} = 0.05$
Explanation:
The standard deviation of the sample proportion is 0.05.

Step 2 of 2

To find the probability that the sample proportion falls between 0.9115 and 0.946, we standardize
the sample proportion using the formula below:

$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$

where $\hat{p}$ is the sample proportion ($\hat{p}_1 = 0.9115$, $\hat{p}_2 = 0.946$),

$p$ is the population proportion (0.85), and

$\sigma_{\hat{p}}$ is the standard deviation of the sample proportion (0.05).

Standardized score for $\hat{p}_1$: $z_1 = \frac{0.9115 - 0.85}{0.05} = 1.23$

Standardized score for $\hat{p}_2$: $z_2 = \frac{0.946 - 0.85}{0.05} = 1.92$

Using a standard normal distribution table or a calculator, we can find the probability that the
sample proportion falls between these two z-scores:

$P(\hat{p}_1 < \hat{p} < \hat{p}_2) = P(z_1 < z < z_2) = P(1.23 < z < 1.92)$

$= P(z < 1.92) - P(z < 1.23) \approx 0.0819$

Explanation:

The probability that $\hat{p}$ falls between $\hat{p}_1$ and $\hat{p}_2$ is $P(1.23 < z < 1.92) \approx 0.0819$.

Final Answer:

The probability that the sample proportion will be between 0.9115 and 0.946 is approximately 8.19%.
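As a cross-check of the table lookup, the same probability can be computed directly from the normal approximation. The sketch below uses only the standard library; `proportion_between` is an illustrative helper name, not something from the original solution.

```python
from math import sqrt
from statistics import NormalDist

def proportion_between(p, n, low, high):
    """P(low < p_hat < high) under the normal approximation to the sample proportion."""
    sd = sqrt(p * (1 - p) / n)           # standard deviation of p_hat
    dist = NormalDist(mu=p, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

prob = proportion_between(0.85, 51, 0.9115, 0.946)
print(round(prob, 4))
```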

Question 2:

Solution:

Step 1 of 3:

To test the hypothesis that the die is fair, we can use the chi-square goodness-of-fit test.

$H_0$ (null hypothesis): the die is fair.

$H_1$ (alternative hypothesis): the die is not fair.

If the die were fair, the expected frequency for each face would be 20, since the 120 rolls would be
spread equally over the 6 faces. The observed and expected frequencies are given in the table below:

Die Face   Observed Frequency (Oᵢ)   Expected Frequency (Eᵢ)
1          25                        20
2          17                        20
3          15                        20
4          23                        20
5          24                        20
6          16                        20

Explanation: Expected frequency = Total number of rolls / Number of faces on the die = 120 / 6 = 20

Step 2 of 3:

To calculate the chi-square statistic, we use the formula:

Chi-square statistic: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

where Oᵢ is the observed frequency for each face of the die, Eᵢ is the expected frequency for each face of
the die, and the sum is taken over all faces of the die.

Substituting the observed and expected frequencies into this formula, we get:

$\chi^2 = \frac{(25-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(15-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(24-20)^2}{20} + \frac{(16-20)^2}{20}$

$\chi^2 = 1.25 + 0.45 + 1.25 + 0.45 + 0.8 + 0.8$

$\chi^2 = 5$

Die Face   Observed Frequency (Oᵢ)   Expected Frequency (Eᵢ)   (Oᵢ−Eᵢ)²   (Oᵢ−Eᵢ)²/Eᵢ
1          25                        20                        25         1.25
2          17                        20                         9         0.45
3          15                        20                        25         1.25
4          23                        20                         9         0.45
5          24                        20                        16         0.8
6          16                        20                        16         0.8
Total                                                                     5

Explanation:

The chi-square test statistic is equal to 5.

Step 3 of 3:
The degrees of freedom for this test are df = k − 1 = 6 − 1 = 5, where k is the number of categories being
tested.

Using a chi-square distribution table or calculator with df = 5 and a significance level of α = 0.05, we find
that the critical value is approximately $\chi^2_{\text{critical}} = 11.07$.

Since our calculated chi-square value (5) is less than the critical value (11.07), we fail to reject the null
hypothesis that the die is fair at a significance level of α = 0.05.

Explanation:

We conclude that there is not enough evidence to suggest that the die is unfair.

Final Answer:

The chi-square value for the given data is 5, and the data are consistent with a fair die.

Question 3:

Solution:

Step 1 of 2:

A continuous random variable X is said to have a uniform distribution over an interval [a, b] if its
probability density function (PDF) is given by:

$f(x) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$

In this case, the interval is [10, 15], so the PDF of X is:

$f(x) = \begin{cases} \frac{1}{15-10} = \frac{1}{5}, & 10 \le x \le 15 \\ 0, & \text{otherwise} \end{cases}$
Explanation: The uniform distribution is also called the rectangular distribution because its density is
constant over the interval (all values are equally likely).

Step 2 of 2:

The probability that X takes a value of at least 14 is the area under the PDF curve to the right of
x = 14. Since the PDF is constant over the interval [14, 15], this area is simply the height of the PDF,
f(14) = 1/5, multiplied by the width of the interval [14, 15].

That is, if we let f(x) denote the PDF of X, then:

$P(X \geq 14) = \int_{14}^{15} f(x)\,dx$

$P(X \geq 14) = (15 - 14) \times f(14)$

$P(X \geq 14) = 1 \times \frac{1}{5}$

$P(X \geq 14) = 0.2$
Explanation: X is uniformly distributed over the closed interval [a, b], so both endpoints a and b are
included in its range.

Final Answer: Therefore, the probability that X will take on a value of at least 14 is 0.2.

Question 4:

Solution:

Step 1 of 1:
To estimate the proportion of left-handed scientists, we can use the sample proportion of left-handed
scientists in a sample of a certain size to estimate the true population proportion. The sample size needed
to estimate the population proportion with a margin of error of 4% and a confidence level of 95% can be
calculated using the following formula:

$n = \frac{z^2\,\hat{p}\,\hat{q}}{E^2}$

where $z$ is the z-score corresponding to the desired confidence level,

$\hat{p}$ is the sample proportion,

$\hat{q} = 1 - \hat{p}$ is the complement of the sample proportion,
and $E$ is the margin of error.

Substituting $z = 1.96$ for a 95% confidence level,

$\hat{p} = 0.08$ based on the previous study (so $\hat{q} = 1 - 0.08 = 0.92$),

and $E = 0.04$ in the above formula, we get:

$n = \frac{1.96^2 \times 0.08 \times 0.92}{0.04^2}$

$n = \frac{0.2827}{0.0016} \approx 176.7$

Rounding up to the next whole number gives n = 177.
Explanation:

For $\hat{p}$ below 0.5, a smaller value of $\hat{p}$ gives a smaller required sample size and a larger value gives a
larger one; the required sample size is largest at $\hat{p} = 0.5$.

Final Answer:

A sample size of at least 177 scientists is needed to estimate the population proportion of left-handed
scientists with a margin of error of 4% and a confidence level of 95%.
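The sample-size formula translates into a short helper. This is a sketch with hypothetical names; it takes the z-score from the standard normal quantile function instead of the rounded table value 1.96 and rounds the result up, since a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_hat, margin, confidence=0.95):
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)

n_required = sample_size_for_proportion(0.08, 0.04)
print(n_required)
```

Calling the helper with `p_hat=0.5` gives the most conservative sample size, which is the usual choice when no prior estimate of the proportion is available.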

Question 5:

Solution:

a. Step 1 of 2

The random variable X follows a binomial distribution with parameters n and p.

Let X be the number of sites that contain the keyword. The probability that a site contains the keyword is
p = 0.20.

The number of sites visited first is n = 10.

The probability mass function of $X \sim B(n, p) = B(10, 0.20)$ is:

$P_X(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
Explanation: The binomial distribution is a discrete probability distribution.

Step 2 of 2

The probability that at least 5 of the first 10 sites contain the given keyword can be computed as follows:

$P(X \geq 5) = 1 - P(X < 5)$
$P(X \geq 5) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]$

$P(X=0) = \frac{10!}{0!\,(10-0)!}\,(0.2)^0 (0.8)^{10} = 0.1074$

$P(X=1) = \frac{10!}{1!\,(10-1)!}\,(0.2)^1 (0.8)^9 = 0.2684$

$P(X=2) = \frac{10!}{2!\,(10-2)!}\,(0.2)^2 (0.8)^8 = 0.3020$

$P(X=3) = \frac{10!}{3!\,(10-3)!}\,(0.2)^3 (0.8)^7 = 0.2013$

$P(X=4) = \frac{10!}{4!\,(10-4)!}\,(0.2)^4 (0.8)^6 = 0.0881$
Substituting the values, we get

$P(X \geq 5) = 1 - P(X=0) - P(X=1) - P(X=2) - P(X=3) - P(X=4)$

$P(X \geq 5) = 1 - 0.1074 - 0.2684 - 0.3020 - 0.2013 - 0.0881$
$P(X \geq 5) = 1 - 0.9672$
$P(X \geq 5) = 0.0328$
Explanation: Therefore, the probability that at least 5 of the first 10 sites contain the given keyword is
approximately 0.0328.

b. Step 1 of 1:

Let Y be the number of sites visited up to and including the first site that contains the keyword.
The probability that a site contains the keyword is p = 0.20. The random variable Y follows a geometric
distribution with parameter p. The probability mass function of Y is:

$f(y) = P(Y = y) = (1-p)^{y-1}\,p$
The probability that the search engine had to visit at least 5 sites in order to find the first occurrence of a
keyword can be computed as follows:

$P(Y \geq 5) = 1 - P(Y \leq 4)$
$= 1 - \left[(1-0.2)^{4-1}(0.2) + (1-0.2)^{3-1}(0.2) + (1-0.2)^{2-1}(0.2) + (1-0.2)^{1-1}(0.2)\right]$
$= 1 - (0.1024 + 0.128 + 0.16 + 0.2)$
$= 1 - 0.5904$
$= 0.4096$
Explanation:

Therefore, the probability that the search engine had to visit at least 5 sites in order to find the first
occurrence of a keyword is approximately 0.4096.

Final Answer:

a. The probability that at least 5 of the first 10 sites contain the given keyword is approximately
0.0328.
b. The probability that the search engine had to visit at least 5 sites in order to find the first
occurrence of a keyword is approximately 0.4096.
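Both parts can be checked numerically with the standard library. The function names below are illustrative and are not taken from the original solution; part (b) uses the shortcut that needing at least k visits means the first k − 1 sites all lack the keyword.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_at_least(k, n, p):
    """P(X >= k), computed through the complement P(X <= k - 1)."""
    return 1 - sum(binomial_pmf(i, n, p) for i in range(k))

def geometric_at_least(k, p):
    """P(first success occurs on trial k or later): the first k - 1 trials all fail."""
    return (1 - p) ** (k - 1)

part_a = binomial_at_least(5, 10, 0.20)
part_b = geometric_at_least(5, 0.20)
print(round(part_a, 4), round(part_b, 4))
```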

Answering Directions:
- Solve at least 3 of the 5 given questions.
- Write a detailed explanation, including the steps used in solving the question. Write formulas, theories and concepts using MathType.
- You must follow the formatting/settings provided in the email with this test paper.
- Include necessary formulas, equations, calculations, special characters and numerical values if required.
- Handwritten solutions are not accepted.
- Do not provide plagiarised content (copied from any online or offline source).

Question 1
Step 1 of 3

Given $u(x) = 6x - 1$ and the PDF $f(x) = 1$, $0 \le x \le 1$.

The expected value of u with respect to the pdf f(x) is given by

$E[u(x)] = \int u(x) f(x)\,dx$

To estimate the mean value of u(x) by Monte Carlo simulation, we perform the
following steps:

1. Calculate u(x) for each pseudorandom number in the sequence, which gives a
sample of u(x) values.
2. Calculate the sample mean and sample standard deviation of the u(x)
values. These give an estimate of the expected value of u(x) and of its
uncertainty.
3. Multiply the sample mean by 1, which is the length of the
interval [0, 1]. This gives an estimate of the integral of u(x) over [0, 1].

Explanation: Monte Carlo simulation is a statistical tool used in risk analysis and
decision making with quantitative data.

Step 2 of 3

(a) The expected value of u with respect to the pdf f(x) is given by

$E[u(x)] = \int u(x) f(x)\,dx$

Substituting the given values into the above formula:

$E[u(x)] = \int_0^1 (6x - 1)\,dx$

$= \left[\frac{6x^2}{2} - x\right]_0^1$ (using the integration formula $\int x^n\,dx = \frac{x^{n+1}}{n+1}$)

$= \frac{6}{2} - 1$

$= 2$

Explanation: The expected value of a random variable is the mean of its distribution.


Step 3 of 3

(b) We use the formulas for the sample mean of u(x) and the sample standard deviation s.

$\text{Sample Mean} = \frac{\sum_{i=1}^{10} u(x_i)}{10}$

$\text{Sample Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{10} u(x_i)^2 - \left(\sum_{i=1}^{10} u(x_i)\right)^2 / n}{n - 1}}$

Hence, we tabulate the data for u(x) = 6x − 1 and also u(x)².

Pseudorandom number   u(x) = 6x − 1   u(x)²
0.985432               4.912592       24.13356016
0.509183               2.055098        4.22342779
0.054991              -0.670054        0.448972363
0.641903               2.851418        8.130584611
0.390188               1.341128        1.798624312
0.719905               3.31943        11.01861552
0.876302               4.257812       18.12896303
0.251175               0.50705         0.257099703
0.160938              -0.034372        0.001181434
0.487301               1.923806        3.701029526
Total                 20.463908       71.84205845
$\text{Sample Mean} = \frac{\sum_{i=1}^{10} u(x_i)}{10} = \frac{20.463908}{10} = 2.0463908$

Sample Mean ≈ 2.05

Substituting the totals into the formula for the sample standard deviation, we get

$\text{Standard Deviation} = \sqrt{\frac{71.84205845 - (20.463908)^2 / 10}{9}}$

$\text{Standard Deviation} = \sqrt{\frac{71.84205845 - 41.87715306}{9}}$

$\text{Standard Deviation} = \sqrt{\frac{29.96490539}{9}}$

$\text{Standard Deviation} = \sqrt{3.329433932}$
Standard Deviation ≈ 1.82

u = Sample Mean × 1
u = 2.05

Explanation: The Monte Carlo estimate of u is 2.05; it estimates the
expected value of u(x) using pseudorandom numbers.

Final Answer:
a. The expected value of u is 2.
b. The Monte Carlo estimate of u is 2.05.
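The tabulated calculation can be reproduced with the `statistics` module. In this sketch the ten pseudorandom numbers are copied from the table above; the standard error of the mean is an extra quantity, not computed in the original solution, added here as one common way to express the uncertainty of the estimate.

```python
from math import sqrt
from statistics import mean, stdev

# Pseudorandom numbers from the table above.
samples = [0.985432, 0.509183, 0.054991, 0.641903, 0.390188,
           0.719905, 0.876302, 0.251175, 0.160938, 0.487301]

def u(x):
    return 6 * x - 1

values = [u(x) for x in samples]
estimate = mean(values)          # Monte Carlo estimate of E[u(X)]
spread = stdev(values)           # sample standard deviation (n - 1 in the denominator)
standard_error = spread / sqrt(len(values))
print(round(estimate, 2), round(spread, 2), round(standard_error, 2))
```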

Question 2

Question 3

Step 1 of 2:
Let f and g be positive log-convex functions defined on the convex set C. We want to
show that their sum h(x) = f(x) + g(x) is also log-convex over C.
A function f is log-convex if and only if its logarithm log f is convex. Equivalently, we can
show that the logarithm of h(x),
$\log h(x) = \log(f(x) + g(x))$, is convex on C.
Explanation:

A standard example of a logarithmically convex function is $f(x) = e^x$.

Step 2 of 2:
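As f and g are log-convex, for all $x_1, x_2 \in C$ and $0 \le t \le 1$:
$\log f(t x_1 + (1-t) x_2) \le t \log f(x_1) + (1-t) \log f(x_2)$
$\log g(t x_1 + (1-t) x_2) \le t \log g(x_1) + (1-t) \log g(x_2)$

Since f and g are positive, exponentiating both sides of each inequality gives:

$f(t x_1 + (1-t) x_2) \le f(x_1)^t f(x_2)^{1-t}$
$g(t x_1 + (1-t) x_2) \le g(x_1)^t g(x_2)^{1-t}$

Adding these two inequalities:

$h(t x_1 + (1-t) x_2) \le f(x_1)^t f(x_2)^{1-t} + g(x_1)^t g(x_2)^{1-t}$

By Hölder's inequality with exponents 1/t and 1/(1−t), positive numbers a, b, c, d satisfy
$a^t b^{1-t} + c^t d^{1-t} \le (a + c)^t (b + d)^{1-t}$. Applying this with $a = f(x_1)$, $b = f(x_2)$,
$c = g(x_1)$ and $d = g(x_2)$:

$h(t x_1 + (1-t) x_2) \le [f(x_1) + g(x_1)]^t\,[f(x_2) + g(x_2)]^{1-t} = h(x_1)^t\,h(x_2)^{1-t}$

Taking logarithms of both sides:

$\log h(t x_1 + (1-t) x_2) \le t \log h(x_1) + (1-t) \log h(x_2)$

Explanation:

Log-convexity is stronger than convexity: the function $f(x) = x^2$ is convex, but its logarithm
$\log f(x) = 2 \log|x|$ is not, so $x^2$ is not log-convex.

Final Answer:
The sum h(x) = f(x) + g(x) of the positive log-convex functions f(x) and g(x) is also a log-convex
function on the convex set C.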
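A numerical spot check can make the claim concrete. The sketch below tests the defining inequality of log-convexity on a grid for one hand-picked pair of log-convex functions, f(x) = exp(x²) and g(x) = exp(−x); these example functions, the grid and the helper name are all assumptions made for illustration, and a grid check is evidence, not a proof.

```python
import math

def is_log_convex_on_grid(h, points, weights, tol=1e-9):
    """Check log h(t*x1 + (1-t)*x2) <= t*log h(x1) + (1-t)*log h(x2) on a grid."""
    for x1 in points:
        for x2 in points:
            for t in weights:
                mid = t * x1 + (1 - t) * x2
                lhs = math.log(h(mid))
                rhs = t * math.log(h(x1)) + (1 - t) * math.log(h(x2))
                if lhs > rhs + tol:
                    return False
    return True

def f(x):
    return math.exp(x * x)   # log f(x) = x^2 is convex

def g(x):
    return math.exp(-x)      # log g(x) = -x is convex

def h(x):
    return f(x) + g(x)       # the sum whose log-convexity is being checked

points = [i / 4 for i in range(-8, 9)]    # grid on [-2, 2]
weights = [i / 10 for i in range(11)]     # t = 0.0, 0.1, ..., 1.0
sum_is_log_convex = is_log_convex_on_grid(h, points, weights)
print(sum_is_log_convex)
```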

Question 4
Question 5

Step 1 of 2:
A Poisson process is used for the above question. A Poisson process counts the number of events
that occur in a given time frame.
Properties of a Poisson process:

- The number of events that occur in a time frame is a random variable that follows a
Poisson distribution.
- Events are independent.
- The rate at which events occur is constant.
Explanation:
Applications of the Poisson process include phone calls arriving at an exchange, customer arrivals
at a store, and many more.
Step 2 of 2:
Here the time between events, that is, between passing cars, follows an exponential distribution with rate λ.
The expected time of the nth event in a Poisson process with rate λ is n/λ.
Let X be the number of cars that pass up to and including the 10th black car.

$X \sim NB(r, p)$
$X \sim NB(10, 0.3)$
$E(X) = 10/0.3$
$E(X) = 33.3$
On average, 33.3 cars pass until the 10th black car arrives.
Let T be the time until the 10th black car arrives. T is the sum of the 10 waiting times
between consecutive black cars:

Black car 1, black car 2, ..., black car 10

Each car is black with probability 0.3, independently of the other cars, so the black cars form
a Poisson process with rate 0.3λ. The times between consecutive black cars are therefore
independent and exponentially distributed with rate 0.3λ.

Their sum follows a gamma distribution with parameters 10 and 0.3λ.

Let X∼Γ(k, θ) for some k, θ>0
where Γ is the gamma distribution with shape k and scale θ.
The expectation of X is given by:
E(X)=kθ
The expected value of an exponential distribution with rate 0.3λ is 1/(0.3λ), so here k = 10 and θ = 1/(0.3λ).
This leads to the expected time until the 10th black car arrives:

$E(T) = \frac{10}{0.3\lambda} = \frac{33.3}{\lambda}$

This agrees with an average of 33.3 cars at an average of 1/λ minutes per car.
Explanation:
Three distributions are used in the above Poisson process: the negative binomial,
exponential and gamma distributions.
Final Answer:
The student should expect to wait approximately 33.3/λ minutes until the 10th black car arrives.

Wise choice tutoring


(Statistics)
Question 1:

Which of the following statements accurately describes the normality condition based
on the constructed normal probability plot for a random sample of college students'
GPAs?

Choose the correct answer below

A. The GPAs of the college students in this sample exhibit an approximate normal
distribution. However, due to the sample's lack of representativeness for the entire
population of college students, it cannot be concluded that the GPAs of all college
students follow a normal distribution.

B. Given that the sample data display a normal distribution and were obtained
randomly, it is reasonable to assume that the GPAs of all college students follow a
normal distribution.

C. The normal probability plot indicates that the data is not fulfilling the conditions of a
normal distribution. Thus, it is not possible to conclude that the population data are
normally distributed.

D. Given that the sample data display a normal distribution, it is reasonable to assume
that the GPAs of all college students follow a normal distribution.
Solution:
Step 1 of 1
There are many ways to check whether data follow a normal distribution.
One of them is the normal probability plot, which is used in this example. It shows
how closely the ordered data and the corresponding normal scores are linearly related. We can see
only minor deviations, and the points at the upper and lower extremes do not deviate much from
the straight line. The points follow the trend line, so the sample data are approximately normal.

Option A is incorrect because, although it recognizes that the GPAs in this sample
are approximately normal, it claims that the sample is not representative of the
population. The question states that the sample is random, so this objection does not apply.

Option B is correct because the sample data display an approximately normal distribution and were
obtained randomly, so it is reasonable to assume that the GPAs of all college students follow a
normal distribution.

Option C is incorrect because it states that the normal probability plot indicates that the
data do not satisfy the conditions of a normal distribution, whereas the points follow the
straight line closely.

Option D is incorrect because it relies only on the sample data displaying a normal distribution
and leaves out the requirement that the sample be obtained randomly.

Explanation: A normal probability plot is constructed by ordering the observations, finding the
percentile of each observation, obtaining the Z score for each percentile, and finally plotting
the observations (vertical axis) against the Z scores (horizontal axis).

Final Answer

Option B is correct: the normal probability plot is approximately linear and the sample was
obtained randomly.

Question 2:
The lifetimes of two brands of light bulbs, A and B, are normally distributed. Brand A
has a mean lifetime of 800 hours with a standard deviation of 50 hours, while brand B
has a mean lifetime of 850 hours with a standard deviation of 60 hours. If a random
light bulb is selected and it lasts more than 820 hours, what is the probability that it
belongs to brand B?
Solution:
Step 1 of 2

[Figure: normal curve for brand B with µ = 850 and σ = 60; the shaded area is P(x ≥ 820), where x = 820 corresponds to z = −0.5.]
Here we use Bayes' theorem, whose formula is
P(B|A) = P(A|B) * P(B) / P(A)
Here P(B|A) is the probability that the bulb is from brand B given that it lasts more than 820 hours,
P(A|B) is the probability that a bulb lasts more than 820 hours given that it is from
brand B,
P(B) is the probability that the bulb is from brand B, and
P(A) is the probability that a bulb lasts more than 820 hours.
P(A|B) equals 1 − P(X ≤ 820), where X is normally distributed with
mean 850 hrs and standard deviation 60 hrs. We use the z-score formula to evaluate it.
The z-score formula is:
$z = \frac{x - \mu}{\sigma}$
Given x = 820 hrs.
For brand B, μ = mean = 850 hrs and σ = standard deviation = 60 hrs.
Substituting the values for brand B, we have
$z = \frac{820 - 850}{60} = -0.5$
From the normal distribution table, the probability of a z-score less than −0.5 is
0.3085.
$P(A|B) = 1 - P(X \le 820) = 1 - 0.3085 = 0.6915$
Next, P(B) = 0.5, assuming both brands are equally likely to be selected.
By the law of total probability, $P(A) = P(A \cap B) + P(A \cap B^c)$, where $B^c$ is the event that the bulb is from brand A.
We know that $P(A \cap B) = P(A|B) \times P(B)$.
Substituting the values of P(A|B) and P(B), we have
$P(A \cap B) = 0.6915 \times 0.5 = 0.3458$
Explanation: The above calculation was done for brand B. The same steps apply to brand A.
Step 2 of 2
For brand A we have μ = mean = 800 hrs and σ = standard deviation = 50 hrs.
Substituting the values for brand A in the z statistic:
$z = \frac{820 - 800}{50} = 0.4$
From the normal distribution table, the probability of a z-score less than 0.4 is 0.6554, so for
brand A
$P(X \le 820) = 0.6554$
Writing $B^c$ for the event that the bulb is from brand A:
$P(A|B^c) = 1 - 0.6554 = 0.3446$
$P(A \cap B^c) = P(A|B^c) \times P(B^c) = 0.3446 \times 0.5 = 0.1723$
$P(A) = P(A \cap B) + P(A \cap B^c) = 0.3458 + 0.1723 = 0.5181$
P(B|A) = (0.6915*0.5)/0.5181 = 0.6674
Explanation: Brand B bulbs have a longer life expectancy than brand A bulbs, so a bulb that lasts more than 820 hours is more likely to be from brand B.
Final Answer
Therefore, the probability that a randomly selected light bulb belongs to brand B, given that it lasts
more than 820 hours, is approximately 66.74%.
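The Bayes computation can be cross-checked numerically. This sketch assumes, as the solution does, that the two brands are equally likely to be selected; the function name `posterior_brand_b` is illustrative, and the normal tail probabilities come from `statistics.NormalDist` instead of a printed table.

```python
from statistics import NormalDist

def posterior_brand_b(threshold, prior_b=0.5):
    """P(brand B | lifetime > threshold), using the normal lifetime models of both brands."""
    brand_a = NormalDist(mu=800, sigma=50)
    brand_b = NormalDist(mu=850, sigma=60)
    tail_a = 1 - brand_a.cdf(threshold)   # P(lifetime > threshold | brand A)
    tail_b = 1 - brand_b.cdf(threshold)   # P(lifetime > threshold | brand B)
    evidence = prior_b * tail_b + (1 - prior_b) * tail_a   # law of total probability
    return prior_b * tail_b / evidence

posterior = posterior_brand_b(820)
print(posterior)
```

Changing `prior_b` shows how sensitive the answer is to the assumed market share of each brand.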

Question 3:
A pharmaceutical company claims that a new drug reduces cholesterol levels in
patients. A random sample of 100 patients who took the drug for a month had a mean
cholesterol reduction of 15 mg/dL with a standard deviation of 5 mg/dL. The company
wants to test if the drug is effective in reducing cholesterol levels by at least 10 mg/dL.
Perform a hypothesis test at a significance level of 0.01 to determine if there is evidence
to support the company's claim.
Solution:
Step 1 of 2
We perform a hypothesis test at the α = 0.01 level of significance to determine whether there is
evidence for the company's claim that the new drug reduces cholesterol levels in
patients by at least 10 mg/dL. There are five steps to follow when testing a hypothesis:
Step 1: $H_0$: The null hypothesis is that the mean cholesterol reduction is at most
10 mg/dL.
Step 2: $H_A$: The alternative hypothesis is that the mean cholesterol reduction is greater
than 10 mg/dL.
Step 3: We use the t-test because we have a sample mean and a sample standard deviation:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. The t statistic is used because the population standard deviation σ is unknown.
Step 4: Next calculate the p-value. The p-value is the probability of observing a
test statistic at least as extreme as the one calculated from the sample data, assuming that $H_0$ is
true.
Step 5: Compare the p-value to the significance level. If the p-value is less than or
equal to the significance level, we reject $H_0$. Otherwise, we fail to reject $H_0$.
Explanation: The t statistic is used when the population standard deviation is unknown and is
estimated by the sample standard deviation; for a large sample (n ≥ 30) the t distribution is close to the standard normal distribution.
Step 2 of 2

Using these steps, we can calculate the test statistic and p-value as follows:
$H_0$: µ ≤ 10

$H_A$: µ > 10

Given n = 100
$\bar{x}$ = 15
s = 5
Level of significance = 0.01
Substituting the values in the formula for the t statistic:
$t = \frac{15 - 10}{5/\sqrt{100}} = \frac{5}{0.5}$
t = 10
The degrees of freedom for this test are n − 1 = 100 − 1 = 99.
Using a t-distribution table or calculator with 99 degrees of freedom, we find that
the p-value for t = 10 is far smaller than 0.01 (it is essentially 0).
Explanation: Since the p-value is less than the significance level, the data support the
pharmaceutical company's claim about the drug's effect on cholesterol.
Final Answer:
We reject the null hypothesis in favor of the alternative hypothesis and conclude that there
is evidence to support the company's claim that the new drug reduces cholesterol levels
by at least 10 mg/dL.

A manufacturer's annual losses follow a distribution with density function

$f(x) = \begin{cases} \frac{x}{5000}, & 0 \le x \le 100 \\ 0, & \text{otherwise} \end{cases}$
The manufacturer takes out insurance to cover the losses with an annual deductible of 10.
Calculate the expected value of the manufacturer's losses NOT paid by the insurer.
Solution
Step 1 of 2
The expected value of a continuous random variable is the probability-weighted average of all the
values that the variable can take.
Let Y denote the losses that are not paid by the insurer. With an annual deductible of 10, the
manufacturer bears the full loss when it is below 10 and bears 10 otherwise, so Y can be written
in terms of X as

$Y = \begin{cases} x, & 0 \le x < 10 \\ 10, & 10 \le x \le 100 \end{cases}$
The average value of the manufacturer's losses not paid by the insurer is the expected value of the random
variable Y.
Explanation: We calculate the average value of the manufacturer's losses not paid by the insurer.
Step 2 of 2
If x is a continuous random variable with probability density function f(x) on the
interval [m, n] and g(x) is a function of x, then the expected value of g(x) is

$E[g(x)] = \int_m^n g(x) f(x)\,dx$

Taking g(x) = Y, which equals x for 0 ≤ x < 10 and 10 for 10 ≤ x ≤ 100, the expected value of Y
is

$E(Y) = \int_0^{10} x f(x)\,dx + \int_{10}^{100} 10 f(x)\,dx$

Substituting f(x) = x/5000:

$= \int_0^{10} x \cdot \frac{x}{5000}\,dx + \int_{10}^{100} 10 \cdot \frac{x}{5000}\,dx$

$= \frac{1}{5000}\left[\int_0^{10} x^2\,dx + \int_{10}^{100} 10x\,dx\right]$

$= \frac{1}{5000}\left[\left[\frac{x^3}{3}\right]_0^{10} + \left[\frac{10x^2}{2}\right]_{10}^{100}\right]$ (using the integration formula $\int x^n\,dx = \frac{x^{n+1}}{n+1}$)

$= \frac{1}{5000}\left[\left(\frac{10^3}{3} - \frac{0^3}{3}\right) + \left(\frac{10 \times 100^2}{2} - \frac{10 \times 10^2}{2}\right)\right]$

$= \frac{1}{5000}\left[\frac{1000}{3} + 49{,}500\right]$

≈ 9.97
Explanation: We calculate the average value of the manufacturer's losses not paid by the insurer.
Final Solution
The expected value of the manufacturer's losses NOT paid by the insurer is approximately equal to
9.97
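An expected value of this kind can be sanity-checked by numerical integration. The sketch below is an illustration only: it models the retained loss as min(x, deductible), which is the usual reading of an ordinary deductible, and integrates it against the density x/5000 with a simple midpoint rule; the function name and step count are arbitrary choices.

```python
def expected_retained_loss(deductible=10.0, upper=100.0, steps=100_000):
    """Midpoint-rule estimate of E[min(X, deductible)] for density f(x) = x / 5000 on [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width              # midpoint of the i-th subinterval
        density = x / 5000
        total += min(x, deductible) * density * width
    return total

retained = expected_retained_loss()
print(retained)
```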

Question 1
(a) Step 1 of 2
The expected value of u with respect to probability density function f(x) is given
by
E [u( x )]=∫ u (x)f (x ) dx
Explanation: The expected value of a function u(x) with respect to a probability
density function f(x) is given by the integral of the product of u(x) and f(x) over the
range of x.
Step 2 of 2
Given $u(x) = 6x - 1$ and $f(x) = 1$, $0 \le x \le 1$.

Substituting these into the above formula:

$E[u(x)] = \int_0^1 (6x - 1)\,dx$

$= \left[\frac{6x^2}{2} - x\right]_0^1$ (using the integration formula $\int x^n\,dx = \frac{x^{n+1}}{n+1}$)

$= \frac{6}{2} - 1$

$= 2$
Explanation: The expected value of u with respect to the pdf is 2.
(b)
Step 1 of 2

To use Monte Carlo simulation to estimate the mean value of u(x), follow
these steps:

1. First calculate u(x) for each pseudorandom number in the sequence, which
gives a sample of u(x) values.
2. Next, calculate the sample mean and sample standard deviation of the
u(x) values. These give an estimate of the expected value of u(x) and of its uncertainty.
3. Multiply the sample mean by 1, which is the length of the interval [0, 1].
This gives an estimate of the integral of u(x) over [0, 1].

Explanation: When a Monte Carlo simulation is complete, it results in a range of
possible outcomes.

Step 2 of 2
Use the formulas for the sample mean of u(x) and the sample standard deviation s.

$\text{Sample Mean} = \frac{\sum_{i=1}^{10} u(x_i)}{10}$

$\text{Sample Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{10} u(x_i)^2 - \left(\sum_{i=1}^{10} u(x_i)\right)^2 / n}{n - 1}}$

Tabulate the data for u(x) = 6x − 1 and also the square of u(x).

Pseudorandom number   u(x) = 6x − 1   u(x)²
0.985432               4.912592       24.13356016
0.509183               2.055098        4.22342779
0.054991              -0.670054        0.448972363
0.641903               2.851418        8.130584611
0.390188               1.341128        1.798624312
0.719905               3.31943        11.01861552
0.876302               4.257812       18.12896303
0.251175               0.50705         0.257099703
0.160938              -0.034372        0.001181434
0.487301               1.923806        3.701029526
Total                 20.463908       71.84205845
$\text{Sample Mean} = \frac{\sum_{i=1}^{10} u(x_i)}{10} = \frac{20.463908}{10} = 2.0463908$

Sample Mean ≈ 2.05

Substituting the totals into the formula for the sample standard deviation:

$\text{Standard Deviation} = \sqrt{\frac{71.84205845 - (20.463908)^2 / 10}{9}}$

$\text{Standard Deviation} = \sqrt{\frac{71.84205845 - 41.87715306}{9}}$

$\text{Standard Deviation} = \sqrt{\frac{29.96490539}{9}}$

$\text{Standard Deviation} = \sqrt{3.329433932}$

Standard Deviation ≈ 1.82

u = Sample Mean × 1
u = 2.05

Explanation: The Monte Carlo estimate of u is 2.05; it estimates the
expected value of u(x) using pseudorandom numbers.
Final Answer:
a. The expected value of u is 2.
b. The Monte Carlo estimate of u is 2.05.

Question 2

Question 3

Solution:
Step 1 of 2:
Let f and g be positive log-convex functions defined on the convex set C. We want to
show that their sum h(x) = f(x) + g(x) is also log-convex over C.
A function f is log-convex if and only if its logarithm log f is convex. Equivalently, we can
show that the logarithm of h(x),
$\log h(x) = \log(f(x) + g(x))$, is convex on C.
As f and g are log-convex, for all $x_1, x_2 \in C$ and $0 \le t \le 1$:
$\log f(t x_1 + (1-t) x_2) \le t \log f(x_1) + (1-t) \log f(x_2)$
$\log g(t x_1 + (1-t) x_2) \le t \log g(x_1) + (1-t) \log g(x_2)$

Explanation: Log-convexity is stronger than convexity: the function $f(x) = x^2$ is convex, but its
logarithm $\log f(x) = 2 \log|x|$ is not, so $x^2$ is not log-convex.

Step 2 of 2:
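Since f and g are positive, exponentiating both sides of the two inequalities from Step 1 gives:
$f(t x_1 + (1-t) x_2) \le f(x_1)^t f(x_2)^{1-t}$
$g(t x_1 + (1-t) x_2) \le g(x_1)^t g(x_2)^{1-t}$

Adding these two inequalities:

$h(t x_1 + (1-t) x_2) \le f(x_1)^t f(x_2)^{1-t} + g(x_1)^t g(x_2)^{1-t}$

By Hölder's inequality with exponents 1/t and 1/(1−t), positive numbers a, b, c, d satisfy
$a^t b^{1-t} + c^t d^{1-t} \le (a + c)^t (b + d)^{1-t}$. Applying this with $a = f(x_1)$, $b = f(x_2)$,
$c = g(x_1)$ and $d = g(x_2)$:

$h(t x_1 + (1-t) x_2) \le [f(x_1) + g(x_1)]^t\,[f(x_2) + g(x_2)]^{1-t} = h(x_1)^t\,h(x_2)^{1-t}$

Taking logarithms of both sides:

$\log h(t x_1 + (1-t) x_2) \le t \log h(x_1) + (1-t) \log h(x_2)$

Explanation:
The sum h(x) = f(x) + g(x) of the positive log-convex functions f(x) and g(x) satisfies the
defining inequality of log-convexity on the convex set C.
Final Answer:
The sum of two positive log-convex functions is also a log-convex function.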
Question 4

Question 5
Using the following description, construct an ER diagram.
Information needs to be stored about books. Its ISBN (International Standard Book Number)
uniquely identifies each book. Other information about a book includes its title and publication
date. In addition to book information, there is also information stored about the book's
publisher. This includes a unique publisher identifier, publisher name, and publisher address. A
single publisher can only publish a book.
Information on the authors of a book is also stored. This information includes the author's social
security number, name, and address. Either a single author or several authors can write any
single book.
When the book is printed, it is sent to a printer. Information about the printer includes a unique
printer identifier, printer name, and address. A contract is written that indicates the number of
books the printer will print and the printing deadline the printer needs to meet. At times, a
single book might be contracted to several printers if the quantity required to be printed
exceeds the printer's production capacity.

Question 1
Question 2

Passing cars. A student procrastinates, watching cars pass his house. After 10 black cars
have passed, he will (finally) start his homework. He notices that the time between
consecutive cars that pass is Exponentially distributed, and the times are independent,
each with expected time of 1/5 of a minute. He also assumes that each car has a 30%
chance of being black, and the colors of the car are independent. How long should he
wait until the 10th black car arrives?
Solution:
Step 1 of 2:
The student is waiting for the 10th black car to pass by his house. He observes that the
time between consecutive cars passing by his house is exponentially distributed with an
expected time of 1/5 of a minute. The exponential distribution is a continuous
probability distribution that often concerns the amount of time until some specific event
happens.
Let the time between consecutive cars passing by his house be T.
T is exponentially distributed with an expected time of 1/5 of a minute, so the rate
parameter λ is equal to 5.
The probability density function of T is:
$f_T(t) = \lambda e^{-\lambda t} = 5e^{-5t}$

Explanation: E
Step 2 of 2:
Each car has a 30% chance of being black, independently of the other cars, so the
probability that the i-th car passing by his house is black is:
P(black car) = 0.3

We need to find the expected time until the 10th black car passes by his house. Denote
this time as X. We can express X as follows:
$X = T_1 + T_2 + \dots + T_{10}$
where $T_i$ denotes the time between the (i−1)th and i-th black car passing by his house.
Using linearity of expectation, the expected value of X is:
$E(X) = E(T_1) + E(T_2) + \dots + E(T_{10})$

Let's find $E(T_1)$. Let N be the number of cars that pass up to and including the first black
car. N follows a geometric distribution with p = 0.3, so on average E(N) = 1/0.3 cars pass for
each black car. Each gap between consecutive cars has expected length 1/λ = 1/5 of a minute, so
$E(T_1) = E(N) \times \frac{1}{\lambda} = \frac{1}{0.3} \times \frac{1}{5} = \frac{2}{3}$
Equivalently, $T_i$ is exponentially distributed with rate 0.3 × 5 = 1.5 per minute.
Substituting this value in our expression for E(X):
$E(X) = 10 \times \frac{2}{3} \approx 6.67$

On average, he should expect to wait about 6.67 minutes until the 10th black car passes by his
house.
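An expected waiting time like this one can be sanity-checked by simulation. The sketch below follows the problem statement directly: gaps between cars are exponential with rate 5 per minute, and each car is black with probability 0.3. The function names, the seed and the number of trials are arbitrary choices made for illustration.

```python
import random

def simulate_wait(rng, target_black=10, p_black=0.3, cars_per_minute=5.0):
    """Simulate the time (in minutes) until the target number of black cars has passed."""
    elapsed = 0.0
    black_seen = 0
    while black_seen < target_black:
        elapsed += rng.expovariate(cars_per_minute)   # gap until the next car
        if rng.random() < p_black:                    # the car is black with probability 0.3
            black_seen += 1
    return elapsed

def average_wait(trials=20_000, seed=0):
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    return sum(simulate_wait(rng) for _ in range(trials)) / trials

simulated_mean = average_wait()
print(simulated_mean)
```

The simulated average can be compared with the analytical expectation; increasing `trials` tightens the agreement.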
Question 3
Question 4

Question 5
