
DATA SCIENCE WITH R
NULL AND ALTERNATIVE HYPOTHESES

A vendor claims that his company fills any accepted order, on the average, in at most six
working days. You suspect that the average is greater than six working days and want to test
the claim. How will you set up the null and alternative hypotheses?
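
One conventional way to write this setup, sketched under the assumption that the vendor's claim is placed in the null hypothesis and your suspicion in the alternative (the symbol \mu for the mean fill time in working days is introduced here for illustration):

```latex
H_0:\ \mu \le 6 \qquad \text{versus} \qquad H_1:\ \mu > 6
```

With this arrangement the test is right-tailed: only sample means well above six days count as evidence against the claim.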
NORMAL DISTRIBUTION:

• Mercury makes a 2.4-litre V-6 engine, the Laser XRi, used in speedboats. The company's
engineers believe that the engine delivers an average power of 220 horsepower and that
the standard deviation of power delivered is 15 horsepower.

A potential buyer intends to sample 100 engines (each engine to be run a single time). What
is the probability that the sample mean will be less than 217 horsepower?
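
A minimal sketch of the calculation, written in Python with only the standard library. It assumes the sample mean is approximately normal with standard error sigma / sqrt(n), which is reasonable for n = 100; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 220.0      # believed mean power (hp)
sigma = 15.0    # believed standard deviation (hp)
n = 100         # number of engines sampled

# Standard error of the sample mean: 15 / sqrt(100) = 1.5 hp
standard_error = sigma / sqrt(n)

# Sampling distribution of the mean (normal approximation)
sampling_dist = NormalDist(mu, standard_error)

# P(sample mean < 217), i.e. a z-score of (217 - 220) / 1.5 = -2
p_below_217 = sampling_dist.cdf(217.0)
print(f"P(sample mean < 217 hp) = {p_below_217:.4f}")
```

The result is the area to the left of z = -2 under the standard normal curve, a little over 2%.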
CONTINUOUS PROBABILITY DISTRIBUTIONS

• On average, a one-day international cricket match lasts about 500 minutes.
• It has about 300 minutes of actual play and 200 minutes of advertisements.
• Let us assume that the number of minutes of actual play in the cricket telecast is
uniformly distributed from a low of 260 minutes to a high of 340 minutes.
• What is the probability that the match will have between 280 and 320 minutes of actual
play?
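
For a continuous uniform distribution, the probability of an interval is its width divided by the width of the whole range. A short Python sketch of that arithmetic (variable names are illustrative):

```python
# Range of the uniform distribution for minutes of actual play
low, high = 260.0, 340.0

# Interval of interest
a, b = 280.0, 320.0

# P(a <= X <= b) = (b - a) / (high - low) for a uniform distribution
p_between = (b - a) / (high - low)
print(f"P(280 <= play minutes <= 320) = {p_between:.2f}")
```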
NORMAL

• A study report claimed that the average number of weeks an individual is unemployed is 17.5
weeks. Assume that for the population of all unemployed individuals the population mean length
of unemployment is 17.5 weeks and that the population standard deviation is 4 weeks.
• Suppose you would like to select a sample of 50 unemployed individuals for a follow-up study:
• 1. What is the probability that a simple random sample of 50 unemployed individuals will
provide a sample mean within a week of the population mean?
• 2. What is the probability that a simple random sample of 50 unemployed individuals will
provide a sample mean within half a week of the population mean?
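
Both questions ask for P(|sample mean - population mean| <= margin). A sketch in Python, assuming the sample mean is approximately normal with standard error 4 / sqrt(50); the helper name prob_within is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu = 17.5     # population mean (weeks)
sigma = 4.0   # population standard deviation (weeks)
n = 50        # sample size

# Sampling distribution of the mean under the normal approximation
sampling_dist = NormalDist(mu, sigma / sqrt(n))

def prob_within(margin):
    """Probability that the sample mean falls within +/- margin of mu."""
    return sampling_dist.cdf(mu + margin) - sampling_dist.cdf(mu - margin)

p_within_one_week = prob_within(1.0)
p_within_half_week = prob_within(0.5)
print(f"Within 1 week:   {p_within_one_week:.4f}")
print(f"Within 0.5 week: {p_within_half_week:.4f}")
```

Halving the margin lowers the probability noticeably, which is the point of the comparison between the two parts.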
CENTRAL LIMIT THEOREM

• According to a study by MotherCare, the average child plays for 2 hours when active,
with a standard deviation of 0.7 hours.
• The school class teacher wants to plan a trip with 50 children.
• As a statistician, what is the probability that the children will play, on average, for more
than 2.2 hours?
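
A sketch of the central limit theorem calculation in Python. It assumes the question concerns the mean play time of the 50 children and that this mean is approximately normal with standard error 0.7 / sqrt(50).

```python
from math import sqrt
from statistics import NormalDist

mu = 2.0      # mean play time per child (hours)
sigma = 0.7   # standard deviation (hours)
n = 50        # children on the trip

# By the central limit theorem, the mean of 50 play times is roughly normal
standard_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, standard_error)

# P(sample mean > 2.2 hours) is the upper-tail area
p_above = 1.0 - sampling_dist.cdf(2.2)
print(f"Standard error = {standard_error:.4f} hours")
print(f"P(mean play time > 2.2 hours) = {p_above:.4f}")
```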
QUALITY CONTROL:

• A cold drink manufacturer says that the amount of liquid in its bottle is at least 1500 ml.
He also says that the standard deviation of the amount in the bottles manufactured is 30 ml.
• As a Food and Drug inspector, you want to check whether the manufacturer is telling the truth
or misleading the customers. Therefore, you want to check the quantity of liquid in a
sample.

• You take a sample of 10 cold drink bottles and find the average amount of liquid in
them. You get 1489 ml as the average. What do you do now?
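
One possible next step is a left-tailed z-test of H0: mean >= 1500 ml against H1: mean < 1500 ml. The Python sketch below rests on assumptions not stated in the exercise: fill volumes are normally distributed, the claimed standard deviation of 30 ml is taken as the known population value, and the significance level is 5%.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean = 1500.0   # manufacturer's claim (ml)
sigma = 30.0            # claimed standard deviation, treated as known (assumption)
n = 10                  # bottles sampled
sample_mean = 1489.0    # observed average (ml)
alpha = 0.05            # significance level (assumption, not given in the exercise)

# Test statistic for the sample mean
z = (sample_mean - claimed_mean) / (sigma / sqrt(n))

# Left-tailed p-value: probability of a sample mean this low if the claim holds
p_value = NormalDist().cdf(z)

reject_claim = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject claim: {reject_claim}")
```

Under these assumptions the shortfall of 11 ml is within what a sample of 10 bottles could plausibly show by chance, so the sample alone would not justify rejecting the claim.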
NORMAL DISTRIBUTION:

• Netflix is planning to invest heavily in online television services.
• As part of the decision, the company wants to estimate the average number of online shows a
family of four would watch per day.
• A random sample of n = 100 families is obtained, and in this sample the average number of
shows viewed per day is 6.5. The population standard deviation is known to be 3.2.
• Construct a 95% confidence interval for the average number of online television shows
watched by the entire population of families of four.
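
With a known population standard deviation, the interval is sample mean +/- z * sigma / sqrt(n). A minimal Python sketch (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 6.5   # average shows per day in the sample
sigma = 3.2         # known population standard deviation
n = 100             # families sampled
confidence = 0.95

# Two-sided critical value: about 1.96 for 95% confidence
z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

margin = z_crit * sigma / sqrt(n)
ci_lower = sample_mean - margin
ci_upper = sample_mean + margin
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
```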
NORMAL DISTRIBUTION:

• Suppose the weight of 2-year-old boys in India is normally distributed with a mean
of 9.5 kg and a standard deviation of 1.1 kg. Calculate the percentage of boys for the
following conditions:
• 1. Weight is less than 8.4 kg
• 2. Weight is between 7.3 kg and 11.7 kg
• 3. Weight is more than 12.8 kg
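
The three cut-offs sit at 1, 2 and 3 standard deviations from the mean, so the answers follow directly from the normal CDF. A short Python sketch:

```python
from statistics import NormalDist

# Weight distribution of 2-year-old boys as given in the exercise
weights = NormalDist(mu=9.5, sigma=1.1)

# 1. Less than 8.4 kg (one standard deviation below the mean)
p_less_8_4 = weights.cdf(8.4)

# 2. Between 7.3 kg and 11.7 kg (within two standard deviations)
p_between = weights.cdf(11.7) - weights.cdf(7.3)

# 3. More than 12.8 kg (three standard deviations above the mean)
p_more_12_8 = 1.0 - weights.cdf(12.8)

for label, p in [("< 8.4 kg", p_less_8_4),
                 ("7.3 to 11.7 kg", p_between),
                 ("> 12.8 kg", p_more_12_8)]:
    print(f"{label}: {100 * p:.2f}%")
```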
CONFIDENCE INTERVAL:

• A bank president is interested in knowing the average balance of all the savings
accounts in the bank. He needs to give this number in the investor briefing.
• Therefore, a random sample of 400 accounts is taken from the bank database to compute
the average balance of the accounts.
• Since it is going to be used in the press, the sample mean should be a good approximation
of the average balance of all the accounts.
• What level of confidence in the mean is the president going to be satisfied with?
CHI SQ TEST

In an automated process, a machine fills cans of coffee. If the average amount filled is
different from what it should be, the machine may be adjusted to correct the mean. If the
variance of the filling process is too high, however, the machine is out of control and needs
to be repaired. Therefore, from time to time regular checks of the variance of the filling
process are made. This is done by randomly sampling filled cans, measuring their amounts,
and computing the sample variance. A random sample of 30 cans gives an estimate s² =
18,540. Give a 95% confidence interval for the population variance.
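
The interval is ((n - 1) s² / chi²_upper, (n - 1) s² / chi²_lower) with n - 1 = 29 degrees of freedom. The Python sketch below assumes the fill amounts are normally distributed and hard-codes the two chi-square quantiles as read from a standard chi-square table, since the Python standard library has no chi-square quantile function.

```python
n = 30            # cans sampled
s2 = 18540.0      # sample variance
df = n - 1        # degrees of freedom = 29

# Chi-square quantiles for df = 29, taken from a standard table (assumption:
# hard-coded here rather than computed)
CHI2_0_975_DF29 = 45.722   # upper 2.5% point
CHI2_0_025_DF29 = 16.047   # lower 2.5% point

# 95% confidence interval for the population variance
var_lower = df * s2 / CHI2_0_975_DF29
var_upper = df * s2 / CHI2_0_025_DF29
print(f"95% CI for variance: ({var_lower:.1f}, {var_upper:.1f})")
```

The interval is not symmetric around s² because the chi-square distribution is skewed.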
HYPOTHESIS TESTING:

• An automatic bottling plant machine fills cola into 2-litre bottles.
• A lawyer wants to test the null hypothesis that the average amount filled by the machine
in the bottle is at least 2000 cm³.
• A random sample of 40 bottles coming out of the machine was selected and measured. The
sample mean was found to be 1999.6 cm³.
• The population standard deviation is 1.30 cm³.
• Test the null hypothesis at an alpha of 5%.
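
A sketch of the left-tailed z-test for H0: mean >= 2000 cm³ against H1: mean < 2000 cm³, in Python with the standard library only. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 2000.0          # hypothesised minimum mean fill (cm^3)
sample_mean = 1999.6   # observed sample mean (cm^3)
sigma = 1.30           # known population standard deviation (cm^3)
n = 40                 # bottles sampled
alpha = 0.05

# Test statistic
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Left-tailed critical value (about -1.645) and p-value
z_crit = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z)

reject_h0 = z < z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Although the shortfall is only 0.4 cm³, the small standard deviation and the sample of 40 make it statistically significant at the 5% level.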
1 SAMPLE T TEST

• The class X students of all the schools based in New York appear for their annual exams.
• The Governor says that last year, he had given $100,000 to the city corporation for the
development of schools, including recruiting new teachers and paying for more
computers.
• He says that the mean mathematics score of the students has increased since last year,
from the earlier 50% to 55% now.
• You need to test if the governor is speaking the truth. For that you have a sample of
randomly selected students. Is he speaking the truth?
PAIRED SAMPLE T TEST

• The class X students of all the schools based in New York appear for their annual exams.
• The Governor says that last year, he had given $100,000 to the city corporation for the
development of schools, including recruiting new teachers and paying for more
computers.
• He says that there is a significant difference between the reading and writing
scores of the students.
• You need to test if the governor is speaking the truth. For that you have a sample of
randomly selected students. Is he speaking the truth?
INDEPENDENT GROUP T TEST

• The class X students of all the schools based in New York appear for their annual exams.
• The Governor says that last year, he had given $100,000 to the city corporation for the
development of schools, including recruiting new teachers and paying for more computers.
• He says that there is a significant difference between the math scores of the
girl students and those of the boys.
• You need to test if the governor is speaking the truth. For that you have a sample of
randomly selected students. Is he speaking the truth?
ANOVA

In order to test the consumer’s preference for Brazilian coffee, three kinds of coffee were served:
a group of 21 randomly chosen customers were served pure Brazilian coffee; another group of 20
randomly chosen customers were served pure Colombian coffee; and a third group of 22
randomly chosen customers were served pure African-grown coffee.
Suppose that data for the three groups were consumers’ ratings of the coffee on a scale of 0 to 100
and that certain computations were carried out with these data, leading to the following value of
the ANOVA test statistic: F =2.02.
Is there evidence to conclude that any of the three kinds of coffee leads to an average consumer
rating different from that of the other two kinds?
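
A worked comparison against the critical value, sketched under two assumptions the exercise leaves open: a significance level of 5%, and a critical value read from a standard F table.

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 \qquad H_1:\ \text{not all } \mu_i \text{ are equal}

\mathrm{df}_1 = k - 1 = 3 - 1 = 2, \qquad
\mathrm{df}_2 = n - k = (21 + 20 + 22) - 3 = 60

F = 2.02 \;<\; F_{0.05;\,2,\,60} \approx 3.15
\quad\Rightarrow\quad \text{do not reject } H_0
```

At this level the data give no evidence that any of the three coffees has a different average rating.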
CHI SQ

Suppose we have two dice, one fair and the other loaded (unfair). One die is handed
to you and you are asked to determine whether it is fair or doctored. How will you do this
at a 95% confidence level?
For this, you roll the die 600 times and record how many times each number occurs.
CHI SQ

You work in the academic department of a college, and there are various branches of engineering
like Electronics, Civil, Mechanical, Computers and I.T. Over the last few years, the number of
students in each of these branches has changed.
This could be random variation in the data. Is the variation in the number of students across
the branches due to chance alone, or is there a more specific reason?

BRANCH        2013   2014   2015   2016   2017
Electronics    363    375    377    313    368
Civil          131    186    188    173    109
Mechanical     182    220    170    237    177
Computers      297    221    300    396    221
I.T.           219    238    215    213    217
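
One reasonable reading of this exercise is a chi-square test of independence between branch and year. The Python sketch below computes the statistic from the table; the 5% significance level is an assumption, and the critical value for 16 degrees of freedom is hard-coded from a standard chi-square table.

```python
# Observed enrolments per branch for 2013..2017, copied from the table
observed = {
    "Electronics": [363, 375, 377, 313, 368],
    "Civil":       [131, 186, 188, 173, 109],
    "Mechanical":  [182, 220, 170, 237, 177],
    "Computers":   [297, 221, 300, 396, 221],
    "I.T.":        [219, 238, 215, 213, 217],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total
chi_sq = 0.0
for row, row_total in zip(rows, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)   # (5 - 1) * (5 - 1) = 16

# Critical value at alpha = 0.05 for df = 16, from a standard table (assumption)
CHI2_CRIT_0_05_DF16 = 26.296

reject_chance_alone = chi_sq > CHI2_CRIT_0_05_DF16
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {reject_chance_alone}")
```

If the statistic exceeds the critical value, the year-to-year shifts in branch shares are larger than chance alone would explain.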
COVARIANCE

You work in the strategy department of a manufacturing company.

The company wants to see the relationship between the number of workers (x) and the
number of items produced (y) in its plant. Therefore, you obtain 10 samples, each 1 hour
long, from the floor.
No. of workers (x)   No. of items (y)
 4                   14
 6                   16
 8                   19
 9                   22
10                   24
12                   29
 7                   12
 8                   22
 9                   21
14                   32
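
A sketch of the sample covariance computed by hand in Python, taking the first column as workers (x) and the second as items (y), in line with the definitions in the text. It uses the n - 1 divisor; dividing by n instead would give the population covariance.

```python
# Ten one-hour samples: workers on the floor (x) and items produced (y)
workers = [4, 6, 8, 9, 10, 12, 7, 8, 9, 14]
items = [14, 16, 19, 22, 24, 29, 12, 22, 21, 32]

n = len(workers)
mean_x = sum(workers) / n
mean_y = sum(items) / n

# Sample covariance: sum of products of deviations, divided by n - 1
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(workers, items)) / (n - 1)

print(f"mean workers = {mean_x:.1f}, mean items = {mean_y:.1f}")
print(f"sample covariance = {cov_xy:.3f}")
```

A positive covariance indicates that hours with more workers tend to produce more items; its size depends on the units, so a correlation coefficient is the usual follow-up for judging strength.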
NULL HYPOTHESIS FOR VARIOUS TESTS
