# Hypothesis Test

1. A consumer group, concerned about the mean fat content of a certain grade of steak burger, submits a random
sample of 12 steak burgers to an independent laboratory for analysis. The percentage of fat in each of the
steak burgers is as follows: 21 18 19 16 18 24 22 19 24 14 18 15. The manufacturer claims that the mean fat
content of this grade of steak burger is less than 20%. Assuming percentage fat content to be normally distributed
with a standard deviation of 3, carry out an appropriate hypothesis test in order to advise the consumer group as
to the validity of the manufacturer's claim.
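
A minimal sketch of how problem 1 could be set up as a one-sample z-test with known standard deviation, using only the Python standard library. The problem does not state a significance level, so `alpha = 0.05` is an assumption made here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

fat = [21, 18, 19, 16, 18, 24, 22, 19, 24, 14, 18, 15]
mu0, sigma = 20, 3      # boundary value of the claim and the known population sd
alpha = 0.05            # assumed level; not given in the problem

# H0: mu = 20 against H1: mu < 20 (the manufacturer's claim)
z = (mean(fat) - mu0) / (sigma / sqrt(len(fat)))
p_value = NormalDist().cdf(z)   # lower-tail probability
reject_h0 = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the sample mean is 19, and the test does not find enough evidence to support the claim that the mean is below 20%.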

2. During a particular week, 13 babies were born in a maternity unit. Part of the standard procedure is to measure
the length of the baby. Given below is a list of the lengths, in centimeters, of the babies born in this particular
week. 49 50 45 51 47 49 48 54 53 55 45 50 48. Assuming that this sample came from an underlying normal
population, test, at the 5% significance level, the hypothesis that the population mean length is 50 cm.
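
When the population standard deviation is unknown, as in problem 2, the statistic is a one-sample t with n − 1 degrees of freedom. The sketch below computes it with the standard library; the critical value 2.179 (12 degrees of freedom, two-sided 5%) is taken from a standard t table and hard-coded as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [49, 50, 45, 51, 47, 49, 48, 54, 53, 55, 45, 50, 48]
mu0 = 50
t_crit = 2.179   # t table value, 12 df, two-sided 5% (assumed from tables)

n = len(lengths)
# stdev() is the sample standard deviation (divisor n - 1)
t = (mean(lengths) - mu0) / (stdev(lengths) / sqrt(n))
reject_h0 = abs(t) > t_crit

print(f"t = {t:.3f} with {n - 1} df, reject H0: {reject_h0}")
```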

3. A random sample of 12 steel ingots was taken from a production line. The masses, in kilograms, of these ingots are
given below. 24.8 30.8 28.1 24.8 27.4 22.1 24.7 27.3 27.5 27.8 23.9 23.2. Assuming that this sample came
from an underlying normal population, investigate the claim that its mean exceeds 25.0 kg. Use α = 1%.

4. A car manufacturer introduces a new method of assembling a particular component. The old method had a mean
assembly time of 42 minutes. The manufacturer would like the assembly time to be as short as possible, and so he
expects the new method to have a smaller mean. A random sample of assembly times (minutes) taken after the
new method had become established was 27 39 28 41 47 42 35 32 38. Stating any necessary distributional
assumptions, investigate the manufacturer's expectation. Use α = 2%.

5. A random sample of 15 workers from a vacuum flask assembly line was selected from a large number of such
workers. Ivor Stopwatch, a work-study engineer, asked each of these workers to assemble a one-litre vacuum flask
at their normal working speed. The times taken, in seconds, to complete these tasks are given below: 109.2 146.2
127.9 92.0 108.5 91.1 109.8 114.9 115.3 99.0 112.8 130.7 141.7 122.6 119.9. Assuming that this sample came
from an underlying normal population, investigate the claim that the population mean assembly time is less than
2 minutes. Use α = 5%.

6. A pharmacist claims that more than 60% of all customers simply collect a prescription. One of her assistants notes
that, in a random sample of 12 customers, 10 simply collected a prescription. Does this provide sufficient evidence,
at the 5% level, to support the pharmacist's claim?
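
With only 12 customers, a normal approximation is questionable, so problem 6 can be handled with an exact binomial test. The following sketch computes the upper-tail p-value P(X ≥ 10) under H0: p = 0.6.

```python
from math import comb

n, x, p0 = 12, 10, 0.6
alpha = 0.05

# Exact upper-tail p-value for X ~ Binomial(12, 0.6): P(X >= 10)
p_value = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The p-value is about 0.083, so at the 5% level the sample does not provide sufficient evidence for the claim.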

7. In a survey carried out in Funville, 14 children out of a random sample of 30 said that they bought the Bopper
comic regularly. Test, at the 10% level of significance, the hypothesis that the true proportion of all children who
buy this comic regularly is 0.35.

8. A random sample of 36 coffee drinkers were each asked to taste-test a new brand of coffee. The responses are
listed below with L representing 'like', I representing 'indifferent', and D representing 'dislike'.
LDLLDLLLLILD
LLLILDLILLDI
ILLLDLLLLIDL
Do these data support the claim that more than half of all coffee drinkers like this new brand of coffee?
Use α = 5%.

9. Packets of ground filter coffee have a nominal weight of 200 g. The distribution of weights may be assumed to be
normal. A random sample of 32 packets had the following weights. 218 207 214 189 211 206 203 217 183 186
197 219 213 207 214 203 204 195 197 213 212 188 221 217 184 186 216 198 211 216 200 208. Investigate
the assumption that the mean weight of all packets is 200 g. Test the hypothesis that 15% of packets weigh less
than 190 g. Use α = 3%.

10. Employees of a firm carrying out motorway maintenance are issued with brightly colored waterproof jackets. These
come in different sizes numbered 1 to 5. The last 40 jackets issued were of the following sizes.
2 3 3 1 3 3 2 4 3 2 5 4 1 2 3 3 2 4 5 3 2 4 4 1 5 3 3 2 3 3 1 3 4 3 3 2 5 1 4 4
Assuming that the 40 employees may be regarded as a random sample of all employees, test the hypothesis, at
the 5% significance level, that 40% of all employees require size 3. Test the claim that size 3 is the median size.

11. The Acme Company has developed a new battery. The engineer in charge claims that the new battery will operate
continuously for at least 7 minutes longer than the old battery. To test the claim, the company selects a simple
random sample of 100 new batteries and 100 old batteries. The old batteries run continuously for 190 minutes
with a standard deviation of 20 minutes; the new batteries, 200 minutes with a standard deviation of 40 minutes.
Test the engineer's claim that the new batteries run at least 7 minutes longer than the old. Use a 0.05 level of
significance. (Assume that there are no outliers in either sample.)
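
A sketch of the two-sample comparison in problem 11. Because both samples have 100 observations, it uses a normal approximation for the test statistic rather than a t distribution; that substitution is an assumption of this example. The null hypothesis is that the new battery lasts at least 7 minutes longer, tested against the alternative that the gain is smaller.

```python
from math import sqrt
from statistics import NormalDist

n_new, mean_new, sd_new = 100, 200, 40
n_old, mean_old, sd_old = 100, 190, 20
d0 = 7          # hypothesized minimum gain in minutes
alpha = 0.05

se = sqrt(sd_new**2 / n_new + sd_old**2 / n_old)
z = ((mean_new - mean_old) - d0) / se
p_value = NormalDist().cdf(z)   # lower tail: H1 is "gain < 7 minutes"
reject_h0 = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```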

12. Forty-four sixth graders were randomly selected from a school district. Then, they were divided into 22 matched
pairs, each pair having equal IQ's. One member of each pair was randomly selected to receive special training.
Then, all of the students were given an IQ test. Test results are summarized below.

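| Pair | Training | No training | Pair | Training | No training |
|------|----------|-------------|------|----------|-------------|
| 1 | 95 | 90 | 12 | 85 | 83 |
| 2 | 89 | 85 | 13 | 87 | 83 |
| 3 | 76 | 73 | 14 | 85 | 83 |
| 4 | 92 | 90 | 15 | 85 | 82 |
| 5 | 91 | 90 | 16 | 68 | 65 |
| 6 | 53 | 53 | 17 | 81 | 79 |
| 7 | 67 | 68 | 18 | 84 | 83 |
| 8 | 88 | 90 | 19 | 71 | 60 |
| 9 | 75 | 78 | 20 | 46 | 47 |
| 10 | 85 | 89 | 21 | 75 | 77 |
| 11 | 90 | 95 | 22 | 80 | 83 |

Do these results provide evidence that the special training helped or hurt student performance? Use a 0.05
level of significance. Assume that the mean differences are approximately normally distributed.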
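
For matched pairs such as those in problem 12, the test works on the within-pair differences. The sketch below computes the paired t statistic from the table; the critical value 2.080 (21 degrees of freedom, two-sided 5%) comes from a standard t table and is hard-coded as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

training = [95, 89, 76, 92, 91, 53, 67, 88, 75, 85, 90,
            85, 87, 85, 85, 68, 81, 84, 71, 46, 75, 80]
no_training = [90, 85, 73, 90, 90, 53, 68, 90, 78, 89, 95,
               83, 83, 83, 82, 65, 79, 83, 60, 47, 77, 83]
t_crit = 2.080   # t table value, 21 df, two-sided 5% (assumed from tables)

# Difference within each matched pair (training minus no training)
diffs = [a - b for a, b in zip(training, no_training)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))   # H0: mean difference = 0
reject_h0 = abs(t) > t_crit

print(f"mean difference = {mean(diffs)}, t = {t:.3f}, reject H0: {reject_h0}")
```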

13. Within a school district, students were randomly assigned to one of two Math teachers - Mrs. Smith and Mrs.
Jones. After the assignment, Mrs. Smith had 30 students, and Mrs. Jones had 25 students. At the end of the year,
each class took the same standardized test. Mrs. Smith's students had an average test score of 78, with a standard
deviation of 10; and Mrs. Jones' students had an average test score of 85, with a standard deviation of 15.
Test the hypothesis that Mrs. Smith and Mrs. Jones are equally effective teachers. Use a 0.10 level of significance.
(Assume that student performance is approximately normal.)

14. The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are very satisfied with the
service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random
sampling. Among the sampled customers, 73 percent say they are very satisfied. Based on these findings, can we
reject the CEO's hypothesis that 80% of the customers are very satisfied? Use a 0.05 level of significance.
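
A short sketch of the one-proportion z-test for the CEO's claim, treated as a two-sided test of H0: p = 0.80. The standard error uses the hypothesized proportion, as is usual for this test.

```python
from math import sqrt
from statistics import NormalDist

p0, p_hat, n = 0.80, 0.73, 100
alpha = 0.05

se = sqrt(p0 * (1 - p0) / n)             # standard error under H0
z = (p_hat - p0) / se
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here z = −1.75 and the two-sided p-value is about 0.08, so the claim cannot be rejected at the 0.05 level.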

16. Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The company states that the
drug is equally effective for men and women. To test this claim, they choose a simple random sample of 100 women
and 200 men from a population of 100,000 volunteers. At the end of the study, 38% of the women caught a cold;
and 51% of the men caught a cold. Based on these findings, can we reject the company's claim that the drug is
equally effective for men and women? Use a 0.04 level of significance.
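
The comparison of two independent proportions in problem 16 can be sketched with a pooled two-proportion z-test. The pooled estimate is used in the standard error because H0 states that the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

n_w, p_w = 100, 0.38   # women: sample size and proportion who caught a cold
n_m, p_m = 200, 0.51   # men
alpha = 0.04

pooled = (p_w * n_w + p_m * n_m) / (n_w + n_m)
se = sqrt(pooled * (1 - pooled) * (1 / n_w + 1 / n_m))
z = (p_w - p_m) / se
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided: H1 is p_w != p_m
reject_h0 = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

For the one-sided version in the next problem, the same z would be compared against a lower-tail p-value, `NormalDist().cdf(z)`, at the 0.01 level.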

17. Suppose the previous example is stated a little bit differently. Suppose the Acme Drug Company develops a new
drug, designed to prevent colds. The company states that the drug is more effective for women than for men. To
test this claim, they choose a simple random sample of 100 women and 200 men from a population of 100,000
volunteers. At the end of the study, 38% of the women caught a cold; and 51% of the men caught a cold. Based on
these findings, can we conclude that the drug is more effective for women than for men? Use a 0.01 level of
significance.

18. A product designer is interested in reducing the drying time of a paint. Two paint formulas are tested;
formula 1 has the standard chemical content and formula 2 has a new drying ingredient that tends to reduce the
drying time. From experience, the standard deviation of the drying time is known [...]
Thirty-five panels are painted with formula 1 and another 35 with formula 2. The two sample mean drying times
are 116 minutes for formula 1 and 112 minutes for formula 2. What conclusion can the product designer draw
about the effectiveness of the new ingredient at the 0.01 significance level?

19. Five samples of a ferrous substance are used to determine whether there is a difference between a laboratory
chemical analysis and an X-ray fluorescence analysis of the iron content. Each sample is divided into two
subsamples and the two types of analysis are applied. The coded data for the iron content analyses are shown
below.

| X-ray | 2.0 | 2.0 | 2.3 | 2.1 | 2.4 |
|-------|-----|-----|-----|-----|-----|
| Chemical | 2.2 | 1.9 | 2.5 | 2.3 | 2.4 |

Assuming that the populations are normal, test at the 0.05 significance level whether the two methods of
analysis give, on average, the same result.

20. A sample of 50 households in one community shows that 10 of them are watching a television special on the
national economy. In a second community, 15 households in a random sample of 50 are watching the special.
Test the hypothesis that the overall proportion of viewers in the two communities does not differ, using the
1% significance level.

21. The teaching of Statistics using Excel and Winstats is being trialled. To determine whether students differ
in their support for the new teaching method, a random sample of 20 students is taken from each class section.
In section A, 18 are in favor, whereas in section B, 14 are in favor. At the 0.05 significance level, can it
be concluded that the proportion of students in favor of the new way of teaching Statistics is the same in the
two sections?

22. Suppose a researcher is interested in evaluating the association between owning a vehicle and the driver's
socioeconomic level. To this end, a sample of drivers is taken and classified in a contingency table, with the
following results:

| Owns a vehicle | Low socioeconomic level | Medium socioeconomic level | High socioeconomic level |
|----------------|-------------------------|----------------------------|--------------------------|
| Yes | 8 | 15 | 28 |
| No | 13 | 16 | 14 |

Table I. Contingency table, observed values.

Do these data support the claim that vehicle ownership depends on socioeconomic level? Use a significance
level of α = 0.05.
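
A minimal sketch of the chi-square test of independence for the 2×3 table in problem 22 (vehicle ownership by socioeconomic level). Expected counts are row total × column total / grand total. The critical value 5.991 (2 degrees of freedom, 5%) is taken from a standard chi-square table and hard-coded as an assumption.

```python
observed = [
    [8, 15, 28],    # owns a vehicle: low, medium, high
    [13, 16, 14],   # does not own a vehicle
]
chi2_crit = 5.991   # chi-square table value, 2 df, 5% (assumed from tables)

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
reject_h0 = chi2 > chi2_crit

print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```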

23. As an example, consider the expected distribution of the individuals in a population classified by blood
group. According to population studies, this distribution is expected to follow the percentages in the table
below, which also lists the observed frequencies in a sample:

| Blood group | Expected (%) | Observed |
|-------------|--------------|----------|
| AB | 2.0 | 4 |
| A | 30.5 | 48 |
| B | 9.3 | 15 |
| O | 58.2 | 83 |

Do the observed data fit the theoretical distribution? Use α = 1%.
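
The goodness-of-fit question in problem 23 compares the observed blood-group counts with the counts expected under the stated percentages. The sketch below computes the chi-square statistic; the critical value 11.345 (3 degrees of freedom, 1%) is taken from a standard chi-square table and hard-coded as an assumption.

```python
groups = ["AB", "A", "B", "O"]
expected_pct = [2.0, 30.5, 9.3, 58.2]
observed = [4, 48, 15, 83]
chi2_crit = 11.345   # chi-square table value, 3 df, 1% (assumed from tables)

n = sum(observed)
expected = [pct / 100 * n for pct in expected_pct]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(groups) - 1
reject_h0 = chi2 > chi2_crit

print(f"chi2 = {chi2:.4f}, df = {df}, reject H0: {reject_h0}")
```

The same pattern applies to the other goodness-of-fit problems in this section: only the observed counts, the hypothesized proportions, and the critical value change.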

24. Juan Méndez, marketing director at Aladino, is responsible for controlling the stock level of the four types
of car sold by the firm. In the past, he has ordered new cars on the premise that the four types are equally
popular and that the demand for each type is the same. Recently, however, stock has become harder to control.
Juan believes he should test his hypothesis of uniform demand. Is the demand uniform across the four types of
car?

| Car type | Observed sales | Expected sales |
|----------|----------------|----------------|
| Kia | 15 | 12 |
| Fiesta | 11 | 12 |
| Focus | 10 | 12 |
| Clio | 12 | 12 |
25. Paty Alvarado is the research director at Plaguicidas. In her current project, Paty must determine whether
there is any relationship between the effectiveness rating that consumers give a new insecticide and the
location (urban or rural) where it is used. Of the 100 consumers surveyed, 75 lived in urban areas and 25 in
rural areas. Table 1.2 summarizes the consumers' ratings (fo denotes the observed frequency).

| Rating | Urban | Rural |
|--------|-------|-------|
| Above average | fo = 20 | fo = 11 |
| Average | fo = 40 | fo = 8 |

H0: Rating and location are independent.

26. A University conducted a survey of its recent graduates to collect demographic and health information for future
planning purposes as well as to assess students' satisfaction with their undergraduate experiences. The survey
revealed that a substantial proportion of students were not engaging in regular exercise, many felt their nutrition
was poor and a substantial number were smoking. In response to a question on regular exercise, 60% of all
graduates reported getting no regular exercise, 25% reported exercising sporadically and 15% reported exercising
regularly as undergraduates. The next year the University launched a health promotion campaign on campus in an
attempt to increase health behaviors among undergraduates. The program included modules on exercise, nutrition
and smoking cessation. To evaluate the impact of the program, the University again surveyed graduates and asked
the same questions. The survey was completed by 470 graduates and the following data were collected on the
exercise question:

| | No regular exercise | Sporadic exercise | Regular exercise |
|---|---|---|---|
| Number of students | 255 | 125 | 90 |

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following
the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.

27. The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among
Americans in 2002. The distribution was based on specific values of body mass index (BMI), computed as weight in
kilograms divided by height in meters squared. Underweight was defined as BMI < 18.5, normal weight as BMI between
18.5 and 24.9, overweight as BMI between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were
distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want
to assess whether the distribution of BMI is different in the Framingham Offspring sample. Using data from the
n=3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study we
created the BMI categories as defined and observed the following:

| | BMI < 18.5 | BMI 18.5-24.9 | BMI 25.0-29.9 | BMI ≥ 30 |
|---|---|---|---|---|
| Number of participants | 20 | 932 | 1374 | 1000 |

Set up the hypotheses and run the test at a 5% level of significance.

28. In a prior example we evaluated data from a survey of university graduates which assessed, among other things,
how frequently they exercised. The survey was completed by 470 graduates. In the prior example we used the
χ2 goodness-of-fit test to assess whether there was a shift in the distribution of responses to the exercise question
following the implementation of a health promotion campaign on campus. We specifically considered one sample
(all students) and compared the observed distribution to the distribution of responses the prior year (a historical
control). Suppose we now wish to assess whether there is a relationship between exercise on campus and students'
living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The
response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to
and from the university). The data are shown below.

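| Living arrangement | No regular exercise | Sporadic exercise | Regular exercise |
|---|---|---|---|
| Dormitory | 32 | 30 | 28 |
| On-campus apartment | 74 | 64 | 42 |
| Off-campus apartment | 110 | 25 | 15 |
| At home | 39 | 6 | 5 |
| Total | 255 | 125 | 90 |

Set up the hypotheses and determine the level of significance.

H0: Living arrangement and exercise are independent; α = 0.05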

29. A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce
pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever
currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed
to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard
pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned
treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain.
Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the
same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a
clinically meaningful reduction). The following data were observed in the trial.

| Treatment group | n | Number with reduction of 3+ points | Number with reduction of fewer than 3 points |
|---|---|---|---|
| New pain reliever | 50 | 23 | |
| Standard pain reliever | 50 | 11 | |

Test whether there was a significant difference in the proportions of patients reporting a meaningful reduction
(i.e., a reduction of 3 or more scale points) using a Z statistic. Set up the hypotheses and use a 5% level of
significance.

30. A school principal classifies parents into three socioeconomic categories according to their area of
residence and into three levels of participation in school activities. Test the hypothesis that there is no
relationship between socioeconomic level and participation in school activities, at a 4% significance level.

| Participation | Low income | Medium income | High income |
|---------------|------------|---------------|-------------|
| Never | 28 | 48 | 16 |
| Occasional | 22 | 65 | 14 |
| Regular | 17 | 74 | 3 |

# Regression and Correlation

1. The marks of 12 students in a class in Mathematics and Physics are as follows:

| Mathematics | 2 | 3 | 4 | 4 | 5 | 6 | 6 | 7 | 7 | 8 | 10 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Physics | 1 | 3 | 2 | 4 | 4 | 4 | 6 | 4 | 6 | 7 | 9 | 10 |

Find the correlation coefficient of the distribution and interpret it.
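
A small sketch of the Pearson correlation coefficient for the Mathematics and Physics marks, computed from the sums of squares so that each step of the formula r = Sxy / sqrt(Sxx · Syy) is visible. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean

maths = [2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 10, 10]
physics = [1, 3, 2, 4, 4, 4, 6, 4, 6, 7, 9, 10]

x_bar, y_bar = mean(maths), mean(physics)
sxx = sum((x - x_bar) ** 2 for x in maths)
syy = sum((y - y_bar) ** 2 for y in physics)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, physics))

r = sxy / sqrt(sxx * syy)
print(f"r = {r:.4f}")
```

A value of r close to 1 indicates a strong positive linear relationship between the two sets of marks.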

2. An insurance company believes that the number of vehicles traveling on a certain motorway at more than
120 km/h can be expressed as a function of the number of accidents that occur on it. Over 5 days it obtained
the following results:

| Accidents | 5 | 7 | 2 | 1 | 9 |
|---|---|---|---|---|---|
| Number of vehicles | 15 | 18 | 10 | 8 | 20 |

Calculate the linear correlation coefficient. If 6 accidents occurred yesterday, how many vehicles can we
assume were traveling on the motorway at more than 120 km/h? Is the prediction reliable?

3. The number of workers (in millions) employed in agriculture in the years indicated was:

| Year | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|---|---|---|---|
| Employed | 2.1 | 2.04 | 1.96 | 1.74 | 1.69 | 1.49 | 1.25 | 1.16 |

a) Could this trend be explained by a regression line?
b) What limitations would the estimates made with that line have?

4. Match the regression lines y = -x + 16, y = 2x - 12, and y = 0.5x + 5 to the following scatter plots:

Assign the linear correlation coefficients r = 0.4, r = -0.85, and r = 0.7 to the scatter plots above.

5. The following table shows the marks obtained by 8 students in an exam, the hours of study devoted to
preparing for it, and the hours of television they watched in the days before the exam.

| Mark | 5 | 6 | 7 | 3 | 5 | 8 | 4 | 9 |
|---|---|---|---|---|---|---|---|---|
| Study hours | 7 | 10 | 9 | 4 | 8 | 10 | 5 | 14 |
| TV hours | 7 | 6 | 2 | 11 | 9 | 3 | 9 | 5 |

a) Plot the scatter diagrams for mark vs. study hours and mark vs. TV hours.
b) Is there a correlation between the variables studied? Of what type? In which case do you judge it to be stronger?
c) Find the correlation coefficients for mark vs. study hours and mark vs. TV hours. Which can be deduced more
precisely from a person's exam mark: the time devoted to studying or the time spent watching television?
Answer: r = 0.943382 and r = -0.846283, respectively.
d) With the same data, find the corresponding regression lines and estimate, for a student who scored 2 in the
exam, the hours studied and the hours of TV watched.

6. During her first year of life, Marta was weighed every month. Her weights are given in the following table:

| Age (months) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Weight (kg) | 3.2 | 3.7 | 4.2 | 5.3 | 5.7 | 6.5 | 6.8 | 7.2 | 7.9 | 7.7 | 8 | 8.5 |

a) Calculate the mean and the standard deviation of the weights.
b) Determine the equation of the regression line of y on x, explaining in detail the calculations performed
and the formulas used.
Answer: a) 6.225; 1.7181 b) y = 0.48706x + 3.05909
c) Find the coefficient of determination.
d) If Marta reaches 14 months, what weight would be expected?
e) If Marta is expected to weigh 7.7 kg, how old should she be?
f) If Marta is expected to weigh 8.9 kg, how old should she be?
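
The least-squares line for Marta's weights can be checked with a few lines of Python. This sketch regresses weight on age using slope = Sxy / Sxx and intercept = ȳ − slope · x̄, then evaluates the coefficient of determination and the fitted value at 14 months. Note that 14 months lies outside the observed range, so that prediction is an extrapolation.

```python
from statistics import mean

age = list(range(1, 13))   # months 1..12
weight = [3.2, 3.7, 4.2, 5.3, 5.7, 6.5, 6.8, 7.2, 7.9, 7.7, 8.0, 8.5]

x_bar, y_bar = mean(age), mean(weight)
sxx = sum((x - x_bar) ** 2 for x in age)
syy = sum((y - y_bar) ** 2 for y in weight)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, weight))

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r_squared = sxy ** 2 / (sxx * syy)        # coefficient of determination
weight_at_14 = intercept + slope * 14     # extrapolated weight at 14 months

print(f"y = {slope:.5f}x + {intercept:.5f}, R^2 = {r_squared:.4f}")
print(f"predicted weight at 14 months: {weight_at_14:.2f} kg")
```

The fitted coefficients agree with the answer given in part b). Parts e) and f) invert the same line: age = (weight − intercept) / slope.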

7. The owner of a hamburger restaurant in the city wants to determine the relationship between the introduction [...]

| Profits | 70 | 40 | 100 | 80 | 30 | 100 |
|---|---|---|---|---|---|---|
| Demand for domestic ketchup | 2 | 1 | 3 | 2 | 1 | 3 |
| Demand for imported ketchup | 50 | 65 | 75 | 30 | 45 | 35 |

a) The multiple linear regression equation.
b) The significance test of the model at the 5% level.
c) 95% confidence intervals for the model parameters.
d) 90% confidence intervals for the expected and the future profit when the demand for domestic ketchup
is 4 and the demand for imported ketchup is 50.
e) The coefficient of multiple determination.
f) If the demand for domestic ketchup is 4 and for imported ketchup is 58, what would the expected profits be?
g) If the demand for domestic ketchup is 3 and for imported ketchup is 60, what would the expected profits be?

8. A small study is conducted involving 17 infants to investigate the association between gestational age at birth,
measured in weeks, and birth weight, measured in grams.

a) Estimate the association between gestational age and infant birth weight. In this example, birth
weight is the dependent variable and gestational age is the independent variable.
b) Build a 95% confidence interval for the association between gestational age and birth weight.
c) If the gestational age is 37.8 weeks, what is the expected birth weight?
d) If the birth weight is 2990 g, what is the expected gestational age?

9. A researcher works with Alzheimer’s caregivers. The following table shows the data.

Table 1. Made-up data for the predictors of scores for quality of life.

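| Participant | Social Support | Caregiver Age | Financial Assets | Quality of Life | Year Born |
|---|---|---|---|---|---|
| 1 | 16 | 56 | 275,000 | 12 | 1950 |
| 2 | 26 | 44 | 325,000 | 8 | 1962 |
| 3 | 17 | 75 | 1,500,000 | 12 | 1931 |
| 4 | 27 | 59 | 2,100,000 | 12 | 1947 |
| 5 | 40 | 58 | 560,000 | 10 | 1948 |
| 6 | 20 | 78 | 790,000 | 9 | 1928 |
| 7 | 28 | 63 | 1,100,000 | 12 | 1943 |
| 8 | 38 | 44 | 973,000 | 18 | 1962 |
| 9 | 35 | 59 | 372,000 | 11 | 1947 |
| 10 | 21 | 76 | 70,000 | 8 | 1930 |
| 11 | 41 | 50 | 210,000 | 10 | 1956 |
| 12 | 10 | 82 | 65,000 | 5 | 1924 |
| 13 | 26 | 79 | 1,150,000 | 7 | 1927 |
| 14 | 38 | 69 | 15,000 | 10 | 1937 |
| 15 | 29 | 76 | 36,000 | 9 | 1930 |
| 16 | 36 | 73 | 72,000 | 15 | 1933 |
| 17 | 35 | 68 | 221,000 | 11 | 1938 |
| 18 | 15 | 71 | 14,000 | 8 | 1935 |
| 19 | 23 | 71 | 115,000 | 9 | 1935 |
| 20 | 29 | 75 | 28,000 | 8 | 1931 |
| 21 | 45 | 63 | 550,000 | 16 | 1943 |
| 22 | 23 | 79 | 79,000 | 10 | 1927 |
| 23 | 11 | 75 | 35,000 | 14 | 1931 |
| 24 | 15 | 67 | 110,000 | 8 | 1939 |
| 25 | 33 | 67 | 270,000 | 12 | 1939 |
| 26 | 16 | 54 | 250,000 | 11 | 1952 |
| 27 | 25 | 41 | 285,000 | 9 | 1965 |
| 28 | 16 | 75 | 120,000 | 13 | 1931 |
| 29 | 29 | 61 | 210,000 | 13 | 1945 |
| 30 | 42 | 56 | 560,000 | 11 | 1950 |
| 31 | 19 | 79 | 650,000 | 8 | 1927 |
| 32 | 27 | 65 | 130,000 | 11 | 1941 |
| 33 | 36 | 67 | 945,000 | 19 | 1939 |
| 34 | 34 | 57 | 272,000 | 10 | 1949 |
| 35 | 23 | 75 | 50,000 | 8 | 1931 |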

a) Determine the correlations of these variables with each other.

b) Suppose a person has a Social Support score of 20 and 50,000 in financial assets. What do you get if you
plug these numbers into the regression equation?
Answer: 9.42