Statistical Inference - Assignment 1

Instructor: Dr. Saqib-ur Rehman

Instructions:

1. All questions carry equal marks and are to be done using Minitab.


2. The Excel file containing all data sets is available at the link below:
https://docs.google.com/spreadsheets/d/12e9ZKHQp39NptOuayqTIHmm0QU4CdgOo/edit?usp=sharing&ouid=117706441792514565728&rtpof=true&sd=true

3. Each question names the worksheet of the data file that holds its dataset.

Question 1

A study conducted in Azezo, North West Ethiopia, found that malnutrition is prevalent in
primary and secondary school children because of economic poverty. The weights, in kilograms
(kg), of 60 randomly selected male Ethiopian-born school children of ages 12–15 years are
presented on worksheet “WEIGHT” of the data file. Use Minitab to answer the following questions:

a. Obtain a normal probability plot, boxplot, histogram, and stem-and-leaf diagram of the data.

b. Based on your results from part (a), can you reasonably apply the confidence interval
procedure to the data? Explain your reasoning.

c. Find and interpret a 95% confidence interval for the mean weight of all male Ethiopian-born
school children of ages 12–15 years. Assume that the population standard deviation is 4.5 kg.
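
As a hand check on the Minitab output for part (c): with a known population standard deviation, the interval is the one-mean z-interval, sample mean ± z(α/2)·σ/√n. The Python sketch below is only an illustration of that formula. The function name `z_interval` and the sample mean of 40.0 kg are assumptions made for the example; they are not taken from the WEIGHT worksheet.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a one-mean z-interval with known sigma."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# sigma = 4.5 kg and n = 60 come from the question;
# the sample mean 40.0 is a made-up placeholder, not the worksheet value.
lower, upper = z_interval(sample_mean=40.0, sigma=4.5, n=60)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) kg")
```

Replacing the placeholder with the sample mean computed from the worksheet should reproduce the interval that Minitab's 1-Sample Z procedure reports.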

Question 2

The U.S. Department of Energy collects fuel-economy information on new motor vehicles and
publishes its findings in Fuel Economy Guide. The data included are the result of vehicle testing
done at the Environmental Protection Agency’s National Vehicle and Fuel Emissions Laboratory
in Ann Arbor, Michigan, and by vehicle manufacturers themselves with oversight by the
Environmental Protection Agency. We provide the highway mileages, in miles per gallon (mpg),
for one year’s cars. Refer to worksheet “Mileage”.

a. Obtain a random sample of 35 of the mileages.

b. Use your data from part (a) and the t-interval procedure to find a 95% confidence interval for
the mean highway gas mileage of all cars of the year in question.

c. Does the mean highway gas mileage of all cars of the year in question lie in the confidence
interval that you found in part (b)? Would it necessarily have to? Explain your answers.

Question 3

J. Spletzer et al. examined gross job gains and losses as a percentage of the average of previous
and current employment figures. The net percentage gains (losses are negative gains) for jobs
in a simple random sample of 20 quarters are available on worksheet “Gain”.

a. Decide whether, on average, the net percentage gain for jobs exceeds 0.2. Assume a population
standard deviation of 0.42. Apply the one-mean z-test with a 5% significance level.

b. Obtain a normal probability plot, boxplot, histogram, and stem-and-leaf diagram of the data.

c. Remove the outliers (if any) from the data and then repeat part (a).

d. Comment on the advisability of using the z-test here.
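
For part (a), the hypotheses are H0: μ = 0.2 against Ha: μ > 0.2, so the test is right-tailed. The sketch below shows how the z statistic and P-value are formed, as a cross-check on Minitab. The function name `z_test_right_tailed` and the ten data values are hypothetical; they are not the 20 values on worksheet “Gain”.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_right_tailed(data, mu0, sigma):
    """One-mean z-test of H0: mu = mu0 against Ha: mu > mu0 (sigma known)."""
    n = len(data)
    z = (mean(data) - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# Hypothetical net percentage gains; NOT the values on worksheet "Gain".
gains = [0.5, -0.1, 0.4, 0.9, 0.3, 0.6, 0.2, 0.7, -0.3, 0.8]
z, p = z_test_right_tailed(gains, mu0=0.2, sigma=0.42)
print(f"z = {z:.3f}, P-value = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```

Running the same function on the data with and without outliers makes the comparison asked for in part (c) straightforward.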

Question 4

L. Petersen et al. evaluated the effects of integrated treatment for patients with a first episode of
psychotic illness. The study was designed to measure client satisfaction for both the integrated
treatment and a standard treatment. The data on the results of the client questionnaire are
available in the “Illness” worksheet of the attached data file.

a. Obtain normal probability plots, boxplots, and the standard deviations for the two samples.

b. Based on your results from part (a), which would you be inclined to use to compare the
population means: a pooled or a non-pooled t-procedure? Explain your answer.

c. Do the data provide sufficient evidence to conclude that, on average, clients preferred the
integrated treatment? Perform the required hypothesis test at the 1% significance level by using
both the pooled t-test and the non-pooled t-test. Compare your results.

d. Find a 98% confidence interval for the difference between mean client satisfaction scores for
the two treatments.
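
The two procedures compared in parts (b) and (c) differ only in how the standard error and the degrees of freedom are computed. The following sketch places both side by side. The function name `two_sample_t` and the satisfaction scores are illustrative assumptions, not the contents of the “Illness” worksheet.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """Return (t_pooled, df_pooled, t_welch, df_welch) for mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances
    diff = mean(x) - mean(y)

    # Pooled: assumes the two population standard deviations are equal.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Non-pooled (Welch): no equal-variance assumption.
    se2 = v1 / n1 + v2 / n2
    t_welch = diff / sqrt(se2)
    df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch


# Hypothetical satisfaction scores; NOT the worksheet data.
integrated = [26, 28, 30, 25, 27, 29]
standard = [22, 27, 19, 30, 24, 21, 25]
print(two_sample_t(integrated, standard))
```

The non-pooled degrees of freedom are generally not an integer and are often rounded down before a critical value is looked up. When the two sample standard deviations are close, the two t statistics are nearly the same, which is the comparison part (b) asks you to reason about.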

Question 5

Eleven tires were each measured for treadwear by two methods, one based on weight and the
other on groove wear. The following are the data, in thousands of miles:

Weight method   Groove method   Weight method   Groove method
    30.5            28.7            24.5            16.1
    30.9            25.9            20.9            19.9
    31.9            23.3            18.9            15.2
    30.4            23.1            13.7            11.5
    27.3            23.7            11.4            11.2
    20.4            20.9

At the 5% significance level, do the data provide sufficient evidence to conclude that, on average,
the two measurement methods give different results?
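
Because each tire is measured by both methods, the samples are paired, and the natural procedure is a paired t-test on the 11 differences. The sketch below computes the test statistic from the data in the table, as a cross-check on Minitab's Paired t output. It assumes that each adjacent (weight, groove) pair in the table belongs to one tire; the critical value is hard-coded from a standard t-table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Data from the table above, in thousands of miles (one pair per tire).
weight = [30.5, 30.9, 31.9, 30.4, 27.3, 20.4, 24.5, 20.9, 18.9, 13.7, 11.4]
groove = [28.7, 25.9, 23.3, 23.1, 23.7, 20.9, 16.1, 19.9, 15.2, 11.5, 11.2]

diffs = [w - g for w, g in zip(weight, groove)]
n = len(diffs)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Two-tailed critical value t(0.025) with df = n - 1 = 10, from a t-table.
T_CRIT = 2.228
print(f"n = {n}, mean difference = {mean(diffs):.3f}, t = {t_stat:.3f}")
print("Reject H0 at the 5% level:", abs(t_stat) > T_CRIT)
```

The test is two-tailed because the question asks only whether the methods give different results, not which one reads higher.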

Question 6

A study by researchers at the University of Maryland investigated whether the mean body
temperature of humans is 98.6°F. Among other data, the researchers obtained the body
temperatures of 93 healthy humans, as provided on worksheet “Temperature” in the data file.

a. Obtain a normal probability plot, boxplot, histogram, and stem-and-leaf diagram of the data.

b. Based on your results from part (a), can you reasonably apply one-standard-deviation
χ²-procedures to the data? Explain your reasoning.

c. In Exercise 9.81, you were asked to use these data to decide whether mean body temperature
of healthy humans differs from 98.6°F. There, you were to assume that the population standard
deviation of body temperatures for healthy humans is 0.63°F. At the 5% significance level, do the
data provide evidence against that assumption?

d. Find and interpret a 95% confidence interval for the population standard deviation of body
temperatures for healthy humans.
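
For reference, the standard one-standard-deviation formulas behind parts (c) and (d) are written out below, with s the sample standard deviation computed from the worksheet. The notation is an assumption of this note: the subscript on each χ² value denotes the area to its right under the chi-square curve with n − 1 degrees of freedom.

```latex
% Part (c): test statistic for H0: sigma = sigma_0 against Ha: sigma != sigma_0
\[
  \chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2},
  \qquad n = 93,\quad \sigma_0 = 0.63,\quad \mathrm{df} = n - 1 = 92
\]

% Part (d): confidence interval for sigma at confidence level 1 - alpha
\[
  \sqrt{\frac{(n-1)\,s^2}{\chi^2_{\alpha/2}}}
  \;\le\; \sigma \;\le\;
  \sqrt{\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2}}},
  \qquad \alpha = 0.05
\]
```

Both procedures are sensitive to departures from normality, which is why part (b) asks you to examine the plots first.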

Question 7

Diabetes in Native Americans. Preventable chronic diseases are increasing rapidly in Native
American populations. Following is a contingency table showing cross classification of
educational attainment and diabetic state for a sample of 1273 Native Americans (HS is high
school).

At the 1% significance level, do the data provide sufficient evidence to conclude that an
association exists between educational level and diabetic state for Native Americans?

Question 8

Consider the blood pressure data given in sheet “MULT REG” of the data file, in which researchers
observed the following data on 20 individuals with high blood pressure:

• blood pressure (y = BP, in mm Hg)
• age (x1 = Age, in years)
• weight (x2 = Weight, in kg)
• body surface area (x3 = BSA, in sq m)
• duration of hypertension (x4 = Dur, in years)
• basal pulse (x5 = Pulse, in beats per minute)
• stress index (x6 = Stress)

Solve the following parts using Minitab:

a. Calculate and investigate the regression coefficients of age, weight, body surface area,
duration of hypertension, basal pulse and stress index.
b. Calculate the correlations between the variables.
c. Estimate and interpret the value of R-square.
d. Predict the blood pressure when Age is 49, Weight is 93 kg, body surface area is 2,
duration of hypertension is 6.43, basal pulse is 69.6 and stress index is 54.
e. Calculate the variance inflation factor for each variable. What impact could this have on
regression results?
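
For part (e), the variance inflation factor of the j-th predictor is defined from an auxiliary regression of that predictor on all the other predictors. The formula below is the standard definition; the thresholds mentioned after it are common rules of thumb, not requirements of this assignment.

```latex
% R_j^2 is the R-square from regressing predictor j on the remaining predictors
\[
  \mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]
```

A VIF of 1 means the predictor is uncorrelated with the others. Values above roughly 4 are often taken as a reason to look closer, and values above roughly 10 as a sign of serious multicollinearity, which inflates the standard errors of the affected coefficients and makes their estimates unstable.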

Question 9

In 2005, the mean retail price of all history books was $78.01. This year’s retail prices for 40
randomly selected history books are given below:

At the 1% significance level, do the data provide sufficient evidence to conclude that this year’s
mean retail price of all history books has increased from the 2005 mean of $78.01? Assume that
the population standard deviation of prices for this year’s history books is $7.61.

Question 10

D. Sternberg et al. published the results of their study in which they examined 25 schizophrenic
patients who had been classified as either psychotic or not psychotic by hospital staff. The activity
of dopamine was measured in each patient by using the enzyme dopamine β-hydroxylase to
assess differences in dopamine activity between the two groups. The following are the data, in
nanomoles per milliliter-hour per milligram.

At the 1% significance level, do the data suggest that dopamine activity is higher, on average, in
psychotic patients?

Question 11

Prior to the 2008 election, the Quinnipiac University Poll asked a sample of U.S. residents, “If
Barack Obama is elected President, do you think the economy will get better, get worse or stay
about the same?” This problem is based on the results of that poll. Independent simple random
samples of 500 residents each in red (predominantly Republican), blue (predominantly
Democratic), and purple (mixed) states responded to the aforementioned question as follows:

At the 5% significance level, do the data provide sufficient evidence to conclude that the residents
of the red, blue, and purple states are nonhomogeneous with respect to their view?
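
The chi-square homogeneity test compares each observed count with the count expected if all three populations shared the same response distribution: expected = (row total × column total) / grand total. The sketch below illustrates the computation. The function name `chi_square_statistic` and the counts are hypothetical, since the poll results are not reproduced here; the critical value is hard-coded from a standard chi-square table.

```python
def chi_square_statistic(table):
    """Return (chi2, df) for a table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts (better, worse, same) for the red, blue and purple
# samples of 500 residents each; NOT the actual poll results.
observed = [
    [150, 200, 150],
    [250, 100, 150],
    [200, 150, 150],
]
chi2, df = chi_square_statistic(observed)
CHI2_CRIT = 9.488  # chi-square table value for alpha = 0.05, df = 4
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0 at 5%: {chi2 > CHI2_CRIT}")
```

The same statistic is used for the chi-square independence test in Question 7; only the sampling design and the wording of the hypotheses differ.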

Question 12

McKinney investigated the relationship between various factors and the number of exotic plants
in each state. On the worksheet “Plant” in the data file, you will find the data on population (in
millions), area (in thousands of square miles), and number of exotic plants for each state. Use the
technology of your choice to determine the linear correlation coefficient between each of the following:

a. population and area

b. population and number of exotic plants

c. area and number of exotic plants

d. Interpret and explain the results you got in parts (a)–(c).
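
Since the question allows any technology, a short Python sketch of the linear correlation coefficient, r = Sxy / √(Sxx·Syy), is given below. The function name `pearson_r` and the five data pairs are illustrative assumptions, not the values on worksheet “Plant”.

```python
from math import sqrt


def pearson_r(x, y):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five states; NOT the data on worksheet "Plant".
population = [1.2, 5.6, 8.4, 19.5, 36.0]
exotic_plants = [300, 650, 900, 1100, 1600]
print(f"r = {pearson_r(population, exotic_plants):.3f}")
```

Keep in mind for part (d) that r measures only the strength of a linear relationship, so a scatterplot should accompany each coefficient.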

Question 13

From the Statistical Abstract of the United States, we obtained data on percentage of gross
domestic product (GDP) spent on health care and life expectancy, in years, for selected countries.
Those data are provided on the worksheet “Health”. Do the required parts separately for each
gender.

a. Obtain a scatterplot for the data.

b. Decide whether finding a regression line for the data is reasonable. If so, then also do parts
(c)–(f).

c. Determine and interpret the regression equation for the data.

d. Identify potential outliers and influential observations.

e. In case a potential outlier is present, remove it and discuss the effect.

f. In case a potential influential observation is present, remove it and discuss the effect.
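
Parts (e) and (f) amount to fitting the least-squares line twice, with and without the suspect observation, and comparing the two equations. A minimal sketch of that comparison follows. The function name `regression_line` and the data are hypothetical, not the values on worksheet “Health”, and the last point is placed far to the right deliberately to mimic an influential observation.

```python
def regression_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


# Hypothetical (% of GDP on health care, life expectancy) pairs.
gdp_pct = [6.0, 7.5, 8.0, 9.0, 10.5, 16.0]
life_exp = [74.0, 76.5, 77.0, 78.5, 79.0, 77.5]

b0_all, b1_all = regression_line(gdp_pct, life_exp)
b0_cut, b1_cut = regression_line(gdp_pct[:-1], life_exp[:-1])  # drop last point
print(f"all points:   y-hat = {b0_all:.2f} + {b1_all:.3f} x")
print(f"point removed: y-hat = {b0_cut:.2f} + {b1_cut:.3f} x")
```

A large change in slope or intercept after removal suggests the observation is influential; a small change suggests it is at most an outlier in the y-direction.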
