
©  2017 University of South Africa

All rights reserved

Printed and published by the


University of South Africa
Muckleneuk, Pretoria

STA1610/WB01/2018–2020

70690561

Layout done by the Department

Florida
Contents

1 Data and Statistics

1.1 Variables

1.2 Population and Sample

2 Descriptive statistics: Tabular and Graphical Presentations

2.1 Visualizing Numerical data

3 Descriptive Statistics: Numerical Measures

3.1 The Central Tendency or Location

3.2 The Dispersion or Variability

3.3 Quartile and Coefficient of Variation

4 Introduction to Probability

5 Discrete Probability Distribution: Binomial and Poisson

5.1 General Information

5.2 Discrete Probability Distribution

5.2.1 Binomial Probability Distribution

5.2.2 Poisson Probability Distribution

6 Continuous Probability Distribution

6.1 Normal Probability Distribution

7 Sampling Distribution of the Mean and Proportions

7.1 Sampling Distribution of the Mean

7.2 Sampling Distribution of the Proportion

8 Point Estimations and Confidence Intervals

8.1 Point Estimations

8.2 Confidence Interval

8.2.1 Confidence Interval of the Mean

8.2.2 Confidence Interval of the Proportion

9 Hypothesis Testing of the Mean and Proportions

9.1 Hypothesis Testing of the Mean

9.2 Hypothesis Testing of the Proportion

10 Chi–squared Test

11 Simple linear regression
Preface

Welcome to the exciting workbook of Statistics.

I have written this workbook to make Statistics accessible to everyone, including those with a limited
mathematics background. Statistics affects all aspects of our lives and its applications are so numerous that, in a
sense, we are limited only by our own imagination in discovering new uses for Statistics. This workbook
emphasizes some important concepts of Statistics.

The applied nature of the Statistics discipline is reinforced by showing and teaching students how to choose a
correct statistical procedure or formula with a clear understanding of the particular concept involved. Fulfilling this
objective requires several features built into this workbook, including a focused scope of statistical concepts
and a large number of activities.

Calculators are very good at providing the numerical results of statistical processes. In this workbook, a calculator
is used so that students can understand the techniques and concepts by doing the calculations
by hand. Students should be aware that all assignments and examinations in this module are done by
hand, with the support of formulae and statistical tables. The approach adopted in the workbook is to divide
the solution of statistical problems into two stages and include them in every appropriate activity:

1. Identify a technique and

2. Calculate the Statistics. The calculation stage is completed in a manual way.

It is important that you do the exercises on your own before looking at my solutions. Even if you cannot
do an exercise, you should at least read through it and try to do it. Each exercise is designed to test your
understanding of the work immediately preceding it.

The scope helps students determine whether or not a statistical method or a particular formula is appropriate.
To enrich students’ learning experience, each topic was chosen for its relatively straightforward presentation
and useful applications. Other topics that expand students’ knowledge through a quick overview
include stem-and-leaf plots, statistical tables and graphs.

I hope that you enjoy studying this module as much as I have enjoyed compiling the workbook. In closing
I invite you to help me improve my presentation of this module. You can do this by bringing any errors,
obscurities, comments, suggestions or misprints to the attention of your lecturer.

Your Lecturer

Mr B J Kanyama

Chapter 1

Data and Statistics

1.1 Variables

SUMMARY

• Variables can be denoted using symbols.

• Qualitative (categorical) variables measure qualitative, descriptive or categorical characteristics of
subjects. The values of a qualitative variable cannot be described meaningfully using numbers.

• Quantitative variables measure quantitative characteristics of subjects. A quantitative variable can be
expressed numerically.

• Qualitative - nominal variable: Data values represent descriptions or classifications. The values have
no logical order.

• Qualitative - ordinal variable: Data values show order. Numbers used as labels indicate rank only; the
differences between them cannot be meaningfully interpreted.

• Quantitative - discrete variable: Data occur as integers or whole numbers.

• Quantitative - continuous variable: Continuous data can take any value within a range, to any level of
accuracy. When the data are on an interval scale, the differences between measured values make sense but
the data have no natural zero.

• Quantitative - ratio scale data: The data are ordered, have a continuous scale and have a natural or
true zero.
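
The classification in the summary can be sketched as a short decision procedure. The snippet below is only an illustration: the helper `classify` and its three yes/no arguments are hypothetical names introduced here, not part of the module's prescribed method.

```python
from enum import Enum


class VariableType(Enum):
    NOMINAL = "qualitative - nominal"
    ORDINAL = "qualitative - ordinal"
    DISCRETE = "quantitative - discrete"
    CONTINUOUS = "quantitative - continuous"


def classify(is_numeric, is_ordered=False, is_counted=False):
    """Hypothetical helper that follows the decision steps of the summary."""
    if not is_numeric:
        # Categories: ordered labels are ordinal, unordered labels are nominal.
        return VariableType.ORDINAL if is_ordered else VariableType.NOMINAL
    # Numbers: counts are discrete, measurements are continuous.
    return VariableType.DISCRETE if is_counted else VariableType.CONTINUOUS


examples = {
    "marital status": classify(is_numeric=False),
    "customer satisfaction (four-point scale)": classify(is_numeric=False, is_ordered=True),
    "number of magazine subscriptions": classify(is_numeric=True, is_counted=True),
    "annual income": classify(is_numeric=True),
}
for name, vtype in examples.items():
    print(f"{name}: {vtype.value}")
```

The first question to ask is always whether the values are genuinely numeric: a code such as 1 = 'South Africa' is a label, so it is classified with `is_numeric=False`.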

Activities

Question 1

The variable “customer satisfaction”, measured on the following four–point scale:

1 = ‘not satisfied’, 2 = ‘slightly satisfied’, 3 = ‘satisfied’ and 4 = ‘very satisfied’, represents a

1. discrete variable

2. nominal variable

3. ordinal variable

4. continuous variable

5. discrete and ordinal variable

Solution

The variable "customer satisfaction" (1 = ’not satisfied’, 2 = ’slightly satisfied’, 3 = ’satisfied’ and 4 = ’very
satisfied’) is a qualitative - ordinal variable: the values 1, 2, 3 and 4 only show the order of the categories
and cannot be interpreted as quantities. We could even remove them and still see the data values as an
ordered list.

Option (3)

Question 2

A survey of readers asked respondents to complete the following:

A. Gender

B. Marital status

C. Number of magazine subscriptions

D. Annual income

E. Rate the lecturer (very effective, effective, not too effective, not at all effective).

Determine the type of variable for each of the items given above. Which one represents a discrete variable?

1. C and D

2. A, B and E

3. C and E

4. Only D

5. Only C

Solution

A. Gender (male and female): represents a qualitative - nominal variable.

B. Marital status (single, married, divorced, widow): represents a qualitative - nominal variable.

C. Number of magazine subscriptions (can be 1, 2, 4 and so on): represents a quantitative - discrete
variable.

D. Annual income (can be R200 000.00 or R600 000.00, and so on): represents a quantitative - continuous
variable.

E. Rate the lecturer (very effective, effective, not too effective, not at all effective): represents a qualitative
- ordinal variable.

Option (5)

Question 3

The variable “country”, with possible values encoded as

1 = ‘South Africa’, 2 = ‘USA’, 3 = ‘UK’ and 4 = ‘Zimbabwe’, represents a

1. nominal variable

2. ordinal variable

3. discrete variable

4. continuous variable

5. both ordinal and discrete variable

Solution

The variable country (1 = ’South Africa’, 2 = ’USA’, 3 = ’UK’, 4 = ’Zimbabwe’) represents a qualitative -
nominal variable: the codes are labels only and imply no order.

Option (1)

Question 4

Which one of the following statements is incorrect?

1. The average mark for STA1610, with possible values such as 75%, 74.8% and 74.89%, is a continuous variable.

2. The number of children in a family is a discrete variable.

3. The median is sensitive to outliers.

4. The mean makes use of all observations.

5. In a data set we can have more than one mode.

Solution

1. Correct
The average marks for STA1610 can be, for example, 0.75, 0.748 and 0.7489: any level of accuracy is possible.

2. Correct
The number of children in a family can be, for example, 1, 10, 15 or 25 and so on: whole numbers only.

3. Incorrect
The median is the middle value, while an outlier is an extreme value in the dataset. The median is therefore
not sensitive to outliers.

4. Correct
The mean is the sum of the values in the dataset divided by the number of values.

5. Correct
With one mode the data set is unimodal, with two modes it is bimodal, and so on.

Option (3)

Question 5

Which one of the following statements is incorrect?

1. Nominal data are also called qualitative data.

2. Quantitative data in Statistics refers to continuous data alone.

3. Qualitative variables indicate that a person or object belongs in a category.

4. The method of payment is a qualitative variable.

5. The amount of money a person withdraws from an ATM today is a discrete variable.

Solution

1. Correct

2. Incorrect

Quantitative data include both discrete and continuous variables.

3. Correct

4. Correct

Method of payment can be cash, cheque, electronic etc.

5. Correct

The withdrawal can only be a whole-number amount, determined by the number of banknotes withdrawn
at a time.

Option (2)

Question 6

Consider the following variables:

A. Whether or not you own a Panasonic television set.

B. Your status as either a full time or a part time student.

C. The number of pupils who attended your primary school.

D. The weight of the unfilled container.

E. Condition, either poor, fair, good or excellent.

Which of the above are quantitative variables?

1. Only B

2. B and D

3. B , C and D

4. C and D

5. A, B and E

Solution

Given the following statements

A. A Panasonic television set (own or not own): represents a qualitative - nominal variable.

B. The status of the students (full time or part time): represents a qualitative - nominal variable.

C. The number of pupils who attended your primary school: represents a quantitative - discrete variable

D. The variable weight: represents a quantitative - continuous variable.

E. Variable condition (poor, fair, good and excellent): represents a qualitative - ordinal variable.

Option (4)

Question 7

Which one of the following statements is incorrect?

1. Stress level, age, gender, religion are examples of variables.

2. A person’s nationality (Mexican, Ethiopian, Australian) represents a nominal variable.

3. The number of times a mouse makes a wrong turn in a laboratory represents a continuous variable.

4. The position one finishes in a race is a qualitative variable.

5. The ethnic group to which a person belongs represents a categorical variable.

Solution

1. Correct

2. Correct

3. Incorrect

The number of times a mouse makes a wrong turn is a count, so it represents a discrete variable.

4. Correct

The position one finishes in a race can be 1st, 2nd, 3rd, ..., which represents a qualitative - ordinal
variable.

5. Correct

Option (3)

Question 8

Which one of the following statements is incorrect?

1. The number of minutes it takes to read a page is a continuous variable.

2. The number of problems in the text is a discrete variable.

3. The place of residence (Pretoria, Johannesburg, Danville) is a nominal variable.

4. Annual income represents a quantitative variable.

5. The type of residence (single or multiple family home) is an ordinal variable.

Solution

1. Correct

2. Correct

3. Correct

4. Correct

5. Incorrect

This is a qualitative - nominal variable

Option (5)

Question 9

Residents were asked a series of questions:

A. On what floor is your flat?

B. Do you own or rent your accommodation?

C. How large is your apartment (in square meters)?

D. How much did you spend?

E. Rate the availability of parking space as excellent, good, fair, poor or very poor.

Which of the above questions yield a quantitative variable?

1. A and C

2. C and D

3. Only D

4. A , C and D

5. B and E

Solution

A. This question refers to a quantitative - discrete variable.

B. This question refers to a qualitative - nominal variable.

C. This question represents a quantitative - continuous variable.

D. This question represents a quantitative - continuous variable.

E. The variable "rate the availability of parking" (excellent, good, fair, poor or very poor): represents a
qualitative - ordinal variable.

Option (4)

Question 10

The question “what is your marital status?” had the following responses

A. Married

B. Widowed

C. Divorced

D. Separated

E. Never married

The best graphical technique to summarize the data is the:

1. Bar chart

2. Pie chart

3. Histogram

4. Stem–and–leaf

5. Bar and Pie chart

Solution

The variable "response to the question about marital status" (Married, widowed, divorced, separated, never
married): represents a categorical variable.

There are two graphical techniques to summarize data for a qualitative variable: the bar chart and the pie chart.
Of the two, the bar chart is the better choice.

Option (1)

Question 11

Consider the following statements:

A. Score in a soccer match.

B. Temperature in degrees Celsius.

C. Number of defective machine parts.

D. Age to the nearest year of children in a classroom.

E. Number of modules needed for a degree.

Which one of the following variables is / are discrete?

1. A and B

2. B , C and D

3. C and E

4. A , C , D and E

5. A , B , C , D and E

Solution

Each statement is defined as indicated below

A. Quantitative - discrete variable

B. Quantitative - continuous variable

C. Quantitative - discrete variable

D. Quantitative - discrete variable

E. Quantitative - discrete variable

Option (4)

Question 12

Which one of the following statements is incorrect?

1. The number of ear pierces a person has, represents a discrete variable.

2. The opinion about legalization of marijuana is a qualitative variable.

3. The number of oil cans sold at a given petrol station is a continuous variable.

4. Height of a customer at a boutique is a quantitative variable.

5. The daily number of students going to school is discrete variable.

Solution

1. Correct

2. Correct

An opinion can be good or bad, alternatively an opinion can be valid or invalid.

3. Incorrect

The number of cans sold is a countable characteristic of the data, so we can only obtain
integers. This is a discrete variable.

4. Correct

Height can be measured therefore height represents a quantitative - continuous variable.

5. Correct

Option (3)

Question 13

Before leaving a particular restaurant, customers are asked to respond to the questions listed below.

A. What is the approximate distance of the restaurant from your residence?

B. Have you eaten at the restaurant previously?

C. If your answer to part (B) was yes, on how many occasions?

D. Which of the following attributes of the restaurant do you find most attractive: service, prices, quality
of food, or varied menu?

For each question, determine which possible responses are qualitative:

1. Only B

2. B and C

3. B and D

4. A and C

5. B C and D

Solution

Each question yields the following type of data:

A. Quantitative - continuous variable, the variable is ’distance’ that can be measured.

B. Qualitative - nominal variable, the variable is ’have you eaten previously’ (the response is yes or no).

C. Quantitative - discrete variable.

D. Qualitative - nominal variable, the variable is ’attribute of the restaurant you find most attractive’
(service, prices, quality of food, or varied menu).

Option (3)

Question 14

The following information is collected from an application form for a car-loan to a certain bank

A. Marital status

B. Total of monthly expenditures in rands

C. Number of jobs in the past ten years

D. Gender of the applicant

E. Street address of the applicant

Which of the above variables are quantitative?

1. A and B

2. B and C

3. C and D

4. A, B and E

5. B and E

Solution

A. Qualitative - nominal

B. Quantitative - continuous

C. Quantitative - discrete

D. Qualitative - nominal

E. Qualitative - nominal

Option (2)

Question 15

Which one of the following statements is incorrect?

1. Academic rank is an ordinal variable.

2. A discrete variable arises from a counting process.

3. Internet provider is a nominal variable.

4. Number of magazines subscribed to is a discrete variable.

5. Your status on whether you own a computer is a continuous variable.

Solution

1. Correct

2. Correct

3. Correct

4. Correct

5. Incorrect

Your status on whether you own a computer is a qualitative - nominal variable.

Option (5)

Question 16

Consider the following questions:

A. What is your occupation?

B. What is your income?

C. What degree did you do?

D. What is the amount of your student loan?

E. How would you rate the quality of instruction? (excellent, very good, good, fair, poor).

Identify the type of variable for each question. Which option lists the quantitative variables?

1. Only B

2. Only D

3. A, C and E

4. B and D

5. B, C, and D

Solution

A. Qualitative - nominal variable.

B. Quantitative - continuous variable.

C. Qualitative - ordinal variable.

D. Quantitative - continuous variable.

E. Qualitative - ordinal variable.

The quantitative variables are B and D.

Option (4)

1.2 Population and Sample

SUMMARY

• A sample is a portion (subset) of the population.

• A population is the complete set of objects considered in the study.

• The population size is the number of items in the population, denoted by $N$.

• The sample size is the number of items in a sample, denoted by $n$.

• A population parameter is a summarising measure of a specific aspect of an entire population, for
example the population mean $\mu$ and the population variance $\sigma^2$.

• A sample statistic is a summarising measure of a specific aspect of a sample, for example the
sample mean $\bar{X}$ and the sample variance $S^2$.
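
The distinction between parameters and statistics can be illustrated with Python's standard `statistics` module. This is a sketch under stated assumptions: the eight marks are treated as an entire population purely for illustration, and the "sample" is simply the first four values rather than a proper random sample.

```python
import statistics

# Assumption: these eight exam marks form the whole population.
population = [24, 27, 36, 48, 52, 52, 53, 59]
# Assumption: an illustrative (non-random) sample of four items.
sample = population[:4]

N = len(population)                        # population size
n = len(sample)                            # sample size

mu = statistics.mean(population)           # population mean (parameter)
sigma2 = statistics.pvariance(population)  # population variance, divides by N

x_bar = statistics.mean(sample)            # sample mean (statistic)
s2 = statistics.variance(sample)           # sample variance, divides by n - 1

print(N, mu, sigma2)
print(n, x_bar, s2)
```

Note the two different variance functions: `pvariance` divides by $N$, whereas `variance` divides by $n - 1$.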

Activities

Question 1

A descriptive measure of a population is called a

1. parameter

2. statistic

3. mean $\mu$

4. mean $\bar{X}$

5. size N

Solution

Option (1)

Question 2

Consider the following statistical measurements:

A. The mean

B. The median

C. The mode

D. The range

E The standard deviation

The less commonly used measure of central tendency of a distribution of scores is

1. Only A

2. C and D

3. A, B and D

4. C, D and E

5. B and C

Solution

Option (5)

Question 3

Which one of the following statements is incorrect?

1. One form of descriptive statistics uses graphical techniques.

2. One form of descriptive statistics uses numerical techniques.

3. In the language of statistics, population refers to a group of people.

4. Statistical inference is used to draw conclusions or inferences about characteristic of populations based
on sample data.

5. A summary measure that is computed from a sample to describe a characteristic of the population is
called a statistic.

Solution

1. Correct

2. Correct

3. Incorrect

In statistics, a population is the complete set of objects considered in the study; it need not be a group of people.

4. Correct

5. Correct

Option (3)

Question 4

Which of the statements is/are incorrect?

1. A variable is a characteristic of an item or individual being measured.

2. Population is a portion of a sample selected for analysis.

3. In a pie chart, the size of segments varies according to the percentage in each category.

4. The difference between the histogram and the bar chart is that the bars of the histogram are joined together
whereas those of a bar chart are not.

5. Stem-and-leaf display reveals more information than the frequency distribution.

Solution

1. Correct

2. Incorrect
Population is the entire or complete set of objects.

3. Correct

4. Correct

5. Correct

Option (2)

Chapter 2

Descriptive Statistics: Tabular and Graphical Presentations

2.1 Visualizing Numerical data

SUMMARY

• Visualising data involves using various tables and charts to help draw conclusions about data. The
choice of tables and charts depends on the type of data we have.

• A stem-and-leaf display allows us to see how a data set is distributed and where the data are concentrated.
The values in a stem-and-leaf display are presented in ascending order.

• A histogram is a way of summarising data that are measured on a continuous or discrete scale.

• A frequency polygon is a graph made by joining the centres of the tops of the columns of a histogram.

• Bar and pie charts are used for summarising data that are measured on a categorical scale.

Activities

Question 1

Given the following stem–and–leaf display for midterm exam scores in information systems:

5 | 0
6 |
7 | 4 4 6
8 | 1 9
9 | 2

Which one of the following statements is incorrect?

1. The sample size is 7.

2. Most of the candidates had distinctions.

3. The median is 76.

4. The fifth exam score is 1.

5. An ordered array for exam score is 50 74 74 76 81 89 92

Solution

Given the stem-and-leaf display of the midterm exam scores

5 | 0
6 |
7 | 4 4 6
8 | 1 9
9 | 2

1. Correct

The sample data are 50 74 74 76 81 89 92, therefore the sample size is n = 7.

2. Correct

The candidates who had distinctions are represented by 74 74 76 81 89 92.

3. Correct

The median is the fourth of the seven ordered values, which is 76.

4. Incorrect

The fifth exam score is 81 (stem 8, leaf 1), not 1.

5. Correct

Option (4)

Question 2

Consider the nine e-mail receipts data of two-digit integers given below

11 33 28 32 13 24 28 22 17

The correct stem-and-leaf display is

1. 1 | 1 3 7
   2 | 8 4 8 2
   3 | 3 2

2. 1 | 1 3 7
   3 | 3 2
   2 | 8 4 8 2

3. 1 | 1 3 7
   2 | 2 4 8 8
   3 | 2 3

4. 1 | 1 3 7
   3 | 2 3
   2 | 2 4 8 8

5. 11 13 17
   22 24 28 28
   32 33

Solution

The stem-and-leaf display, with stems and leaves in ascending order, is

1 | 1 3 7
2 | 2 4 8 8
3 | 2 3

Option (3)
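
The construction of a stem-and-leaf display for two-digit integers can be sketched in a few lines of Python. The function name `stem_and_leaf` is a hypothetical helper introduced for this illustration; it assumes non-negative integers where the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return (stem, leaves) rows for two-digit integers, in ascending order."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens digit is the stem, units digit the leaf
        rows[stem].append(leaf)
    # Keep empty stems between the smallest and largest stem so gaps stay visible.
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]


emails = [11, 33, 28, 32, 13, 24, 28, 22, 17]
for stem, leaves in stem_and_leaf(emails):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the values first guarantees that the leaves on each stem appear in ascending order, which is what distinguishes option 3 from option 1.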

Question 3

In the following stem-and-leaf display for a set of two-digit integers,

2 | 0 2 2 7
3 | 1 1 3 5 9

The correct original set of data is

1. 0 2 2 7 1 1 3 5 9

2. 20 22 22 27 31 31 33 35 39

3. 20 22 22 27 31 31 33 35 39

4. 20 22 27 31 33 35 39

5. 2 22 22 72 13 13 33 53 59

Solution

Given the stem-and-leaf display

2 | 0 2 2 7
3 | 1 1 3 5 9

The numerical values in the set of two-digit integers are 20 22 22 27 31 31 33 35 39

Option (3)

Question 4

The ages of a sample of 40 workers are shown below using a stem–and–leaf diagram

2 | 5 6 6 8 8 9
3 | 0 1 2 3 3 3 4 5 5 6 6 7 7 8 8 9 9
4 | 0 0 1 1 1 1 6 6 6 8 9
5 | 0 0 1 2 3
6 | 1

Which one of the following statements is incorrect?

1. The smallest age in the sample is 25

2. Most ages in the sample are greater than 40

3. The mode of the distribution is 41

4. The range of the distribution is 36

5. The median is 38

Solution

Given the stem-and-leaf display

2 | 5 6 6 8 8 9
3 | 0 1 2 3 3 3 4 5 5 6 6 7 7 8 8 9 9
4 | 0 0 1 1 1 1 6 6 6 8 9
5 | 0 0 1 2 3
6 | 1

The numerical values are


25 26 26 28 28 29 30 31 32 33 33 33 34 35 35
36 36 37 37 38 38 39 39 40 40 41 41 41 41 46
46 46 48 49 50 50 51 52 53 61

1. Correct

2. Incorrect

There are 15 values greater than 40 and 25 values less than or equal to 40.

3. Correct

The mode is the most frequently occurring value: 41 occurs four times.

4. Correct

The range is the largest value minus the smallest value: 61 - 25 = 36

5. Correct

The median position is $\frac{n+1}{2} = \frac{40+1}{2} = 20.5$
Because position 20.5 falls between the 20th and 21st values, the median is the average of
the 20th and 21st values in increasing order: $\frac{38+38}{2} = 38$

Option (2)
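
The steps of this solution can be checked with a short Python sketch. The dictionary `display` simply transcribes the stem-and-leaf diagram from the question; the variable names are illustrative choices, and the median calculation as written assumes an even number of values.

```python
import statistics

# Stem-and-leaf diagram from the question: stem -> leaves.
display = {
    2: [5, 6, 6, 8, 8, 9],
    3: [0, 1, 2, 3, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9],
    4: [0, 0, 1, 1, 1, 1, 6, 6, 6, 8, 9],
    5: [0, 0, 1, 2, 3],
    6: [1],
}
ages = sorted(10 * stem + leaf for stem, leaves in display.items() for leaf in leaves)

n = len(ages)
position = (n + 1) / 2            # median position, here between two values
lower = ages[int(position) - 1]   # 20th value (lists are indexed from 0)
upper = ages[int(position)]       # 21st value
median = (lower + upper) / 2      # average of the two middle values (even n)

mode = statistics.mode(ages)
data_range = max(ages) - min(ages)
over_40 = sum(age > 40 for age in ages)

print(n, position, median, mode, data_range, over_40)
```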

Question 5

The following data give the marks obtained in a statistics exam as a percentage:

3 | 5
4 | 3 8 8 9 9
5 | 4 5 5 9
6 | 1 3 6
7 | 3
9 | 5

Which one of the following statements is correct?

1. The range is 35

2. The mode is 48 and 55.

3. About 33% of the marks lie between the marks 40 and 50

4. The median is 5.

5. The majority of the students failed the statistics exam.

Solution

3 | 5
4 | 3 8 8 9 9
5 | 4 5 5 9
6 | 1 3 6
7 | 3
9 | 5

1. Incorrect

The range = the largest value - the smallest value = 95 - 35 = 60

2. Incorrect

The modes are 48, 49 and 55; each occurs twice.

3. Correct

Between 40 and 50 we have five of the 15 values: 43 48 48 49 and 49.
These values correspond to the proportion $\frac{5}{15} = 0.3333$ or 33.33%

4. Incorrect

The median is the middle (8th) of the 15 ordered values, which is 55:

35 43 48 48 49 49 54 55 55 59 61 63 66 73 95

5. Incorrect

The majority of the students passed the statistics exam: nine of the 15 students have marks of 50 and above.

Option (3)
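
When a data set has several modes, the checks in this solution can be reproduced with `statistics.multimode` (available from Python 3.8). This is a verification sketch only; the variable names are illustrative.

```python
import statistics

marks = [35, 43, 48, 48, 49, 49, 54, 55, 55, 59, 61, 63, 66, 73, 95]

data_range = max(marks) - min(marks)
modes = statistics.multimode(marks)     # every value tied for the highest frequency
median = statistics.median(marks)       # middle (8th) of the 15 ordered values
share_40_to_50 = sum(40 <= m <= 50 for m in marks) / len(marks)

print(data_range, modes, median, round(share_40_to_50, 4))
```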

Question 6

The following stem-and-leaf display was constructed for a random sample:

3 | 0
4 | 2 6
5 | 5 6 7
6 | 0 2 4 6
7 | 2 7 7
8 | 3

Which one of the following statements is incorrect?

1. The median for the random sample is 62

2. 50% of the values lie between the values 50 and 70.

3. The mode of the data set is 77

4. The sixth largest value in the random sample is 64

5. The range of the random sample is 53

Solution

Given the stem-and-leaf display

3 | 0
4 | 2 6
5 | 5 6 7
6 | 0 2 4 6
7 | 2 7 7
8 | 3

1. Incorrect

The ranked data:

30 42 46 55 56 57 60 62 64 66 72 77 77 83

With 14 values, the median is the average of the 7th and 8th values, 60 and 62: $\frac{60+62}{2} = 61$

2. Correct

Between 50 and 70 we have 7 values, which correspond to the proportion $\frac{7}{14} = 0.5$ or 50%

3. Correct

The most frequently occurring value is 77.

4. Correct

The sixth value is 64 when we count down from 83

5. Correct

The range equals the largest value minus the smallest value: 83 - 30 = 53

Option (1)

Chapter 3

Descriptive Statistics: Numerical Measures

3.1 The Central Tendency or Location

SUMMARY

• The central tendency or location defines the location of the middle or the centre of a distribution.

• The central tendency allows us to assign a value to what is the most representative value of the group.

• The measures of central tendency are the mean, the median and the mode.

• The mean is a measure of the average of the data values in a given dataset: the sample mean is
$\bar{X} = \frac{\sum x}{n}$ and the population mean is $\mu = \frac{\sum x}{N}$.

• The median is the value in the middle of an ordered dataset.

• The mode is the most frequently occurring value in a data set.

• A distribution is symmetrical when the values of the mean, median and mode are all equal. The normal
(bell-shaped) distribution is an example.

• A distribution is positively skewed when the value of the mean is more than the value of the median
and the mode. The tail is on the right hand side.

• A distribution is negatively skewed when the value of the mean is smaller than the value of the median
and the value of the mode. The tail is on the left hand side.
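
The rule of thumb for the shape of a distribution can be sketched in Python. The helper `describe_shape` is a hypothetical name introduced here, and it is a simplification: it compares only the mean with the median and does not check the mode.

```python
import statistics


def describe_shape(values):
    """Hypothetical helper: classify the shape by comparing mean and median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positively skewed"   # tail on the right hand side
    if mean < median:
        return "negatively skewed"   # tail on the left hand side
    return "symmetrical"


returns = [31, 27, 26, 30, 28, 31]                           # mean 28.83, median 29
love_counts = [2, 0, 6, 0, 3, 1, 0, 4, 9, 0, 5, 6, 1, 0, 2]  # mean 2.6, median 2

print(describe_shape(returns))
print(describe_shape(love_counts))
print(describe_shape([1, 2, 3]))
```

The two data sets are taken from the activities in this chapter; the third list is an invented symmetrical example.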

3.2 The Dispersion or Variability

• Dispersion is the degree to which values are spread out on the number line.

• The range is the difference between the largest and the smallest data values. It shows how widely spread
the data are by considering the distance between the largest and smallest values.

• The variance is related to the sum of the squared distances of the observations from their mean.
This sum of squared values measures the dispersion of the observations.

• The sample variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$ and the population variance is
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$.

• The standard deviation is the square root of the variance: $\sqrt{\text{variance}}$.

Activities

Question 1

The distribution is symmetrical when

1. the value of the mean, median and mode are all equal.

2. we use the sample to conclude about the population.

3. each observation in a sample is likely to be selected.

4. the value of the mean, median and mode are not equal.

5. we are able to calculate the mean and the standard deviation through the calculator.

Solution

Option (1)

Question 2

Consider the data collected on returns on investment as follows:

31 27 26 30 28 31

Which one of the following statements is correct?

1. The range is 5

2. The median is 28

3. The mode is 30.

4. The sample mean X is 28.

5. The distribution is symmetrical.

Solution

Given information below

The data are 31 27 26 30 28 31

The ranked data are 26 27 28 30 31 31

1. Correct

The range is the largest value minus the smallest value: 31 - 26 = 5

2. Incorrect

The median is the middle value of 26 27 28 30 31 31. With six values it is the average of 28 and 30:
$\frac{28+30}{2} = 29$

3. Incorrect

The mode is 31

4. Incorrect

The sample mean is $\bar{X} = \frac{31+27+26+30+28+31}{6} = \frac{173}{6} = 28.8333$

5. Incorrect

Because the mean = 28.8333, the median = 29 and the mode = 31 are not all equal, the distribution is not
symmetrical.

Option (1)

Question 3

Consider a random sample of the daily cost of electricity of five citizens:

23 11 15 26 12

Calculate the standard deviation of the cost of electricity.

1. 174

2. 637

3. 673

4. 051

5. 453

36
Solution

Given the sample data 23 11 15 26 12

The sample variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$

The sample mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{23+11+15+26+12}{5} = \frac{87}{5} = 17.4$

The sample variance is

$$S^2 = \frac{(23-17.4)^2 + (11-17.4)^2 + (15-17.4)^2 + (26-17.4)^2 + (12-17.4)^2}{5-1} = \frac{181.2}{4} = 45.3$$

The standard deviation is $S = \sqrt{45.3} = 6.7305$

Option (3)
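
The manual calculation above follows three steps: find the mean, sum the squared deviations, and divide by n - 1. A minimal Python sketch of the same steps is given below; the function name `sample_variance` is an illustrative choice, and floating-point arithmetic may differ from the hand calculation in the last decimal places.

```python
import math


def sample_variance(values):
    """Sample variance computed step by step, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


costs = [23, 11, 15, 26, 12]
s2 = sample_variance(costs)   # about 45.3
s = math.sqrt(s2)             # about 6.7305

print(round(s2, 1), round(s, 4))
```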

Question 4

Measures of central tendency include

1. the value of the mean, median and mode, when they have all the same value.

2. the presence of outlier.

3. whether the distribution is symmetrical or skewed.

4. sample size, mean and variance.

5. mean, median and mode.

Solution

The measures of central tendency (or location) are the mean, median and mode.

Option (5)

Question 5

Consider the data collected on the mass of six laboratory rats as follows:

27 31 28 30 26 27

Which one of the following statements is correct?

1. The range is zero.

2. The median is 29.

3. The mean is 2017.

4. The mode is 27.

5. Unordered data set is 26 27 27 28 30 31.

Solution

Given the data 27 31 28 30 26 27

The ranked data are 26 27 27 28 30 31

1. Incorrect
The range = 31 - 26 = 5

2. Incorrect
The median is the average of 27 and 28, which is $\frac{27+28}{2} = 27.5$

3. Incorrect
The mean is $\bar{X} = \frac{27+31+28+30+26+27}{6} = \frac{169}{6} = 28.17$

4. Correct
The mode is 27

5. Incorrect
The unordered data set is 27 31 28 30 26 27

Option (4)

Question 6

The examination marks are given below:

24 27 36 48 52 52 53 59


Given that $\sum_{i=1}^{8} (x_i - \bar{x})^2 = 1202.875$, the sample variance is equal to:

1. 150.36

2. 13.11

3. 171.84

4. 43.875

5. 1784

Solution

From the given data

24 27 36 48 52 52 53 59

If $\sum_{i=1}^{8} (x_i - \bar{X})^2 = 1202.875$,

the variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1} = \frac{1202.875}{8-1} = \frac{1202.875}{7} = 171.839$

Option (3)

Question 7

Consider the following statistical measurements:

A. The mean

B. The median

C. The mode

D. The range

E. The standard deviation

The most commonly used measure of central tendency of a distribution of scores is

1. Only A

2. C and D

3. A , B and D

4. C, D and E

5. B and C

Solution

The most commonly used measure of central tendency is the mean.

Option (1)

Question 8

For a sample of eight employees, the most recent hourly wage increases were 18 5 7 2 10 6 12 15 cents
per hour. The sample variance is

1. 2913

2. 20388

3. 540

4. 9213

5. 2391

Solution

The given data are 18 5 7 2 10 6 12 15

The sample variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$

The sample mean is $\bar{X} = \frac{18+5+7+2+10+6+12+15}{8} = \frac{75}{8} = 9.375$

$$S^2 = \frac{(18-9.375)^2 + (5-9.375)^2 + (7-9.375)^2 + (2-9.375)^2 + (10-9.375)^2 + (6-9.375)^2 + (12-9.375)^2 + (15-9.375)^2}{8-1} = \frac{203.875}{7} = 29.125$$

Option (1)
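
As a cross-check on the hand calculation, the same variance can be obtained from the standard computational identity $S^2 = \frac{\sum x_i^2 - (\sum x_i)^2 / n}{n-1}$. This shortcut is not the method used in the solution above; it is shown here only as an alternative, with a hypothetical function name, and uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def sample_variance_shortcut(values):
    """Sample variance via the sum of squares and the square of the sum."""
    n = len(values)
    sum_x = sum(values)                    # sum of the values
    sum_x2 = sum(x * x for x in values)    # sum of the squared values
    return (sum_x2 - Fraction(sum_x ** 2, n)) / (n - 1)


increases = [18, 5, 7, 2, 10, 6, 12, 15]
s2 = sample_variance_shortcut(increases)

print(float(s2))
```

For these data the sum of the values is 75 and the sum of the squared values is 907, so the numerator is 907 - 5625/8 = 203.875, in agreement with the solution.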

Question 9

Consider the following scores


5 3 2 13 2

The outlier score is:

1. 5

2. 2

3. 2 and 5

4. 2 and 13

5. 13

Solution

The outlier score is 13

Option (5)

Question 10

The mean tells you

1. The middle score if you line the scores up from the lowest to the highest.

2. The central tendency of the distribution.

3. The most frequent score in a distribution.

4. One high point that you are sure about.

5. About how spread out the scores are.

Solution

The mean tells us the central tendency of the distribution. The most frequent score in a distribution is the mode, not the mean.

Option (2)

Question 11

The most commonly used measure of dispersion of a distribution of scores is

1. The mode

2. The mean

3. The median

4. The standard deviation

5. The range

Solution

Option (4)

Question 12

A social psychologist asked 15 college students how many times they fell in love before they were eleven years old. The numbers of times were as follows:

2 0 6 0 3 1 0 4 9 0 5 6 1 0 2

Which one of the following statements is incorrect?

1. The range is 9

2. The median is 2

3. The distribution tails to the right

4. The mode is 1, 2 and 6 since 0 adds nothing

5. The value of the mode is smaller than the value of the median.

Solution

The ranked data 0 0 0 0 0 1 1 2 2 3 4 5 6 6 9

1. Correct

Range: $9 - 0 = 9$

2. Correct

The middle value is at position $\frac{n + 1}{2} = \frac{15 + 1}{2} = 8$; the 8th value in the ordered array is 2.

3. Correct

The mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{0 + 0 + 0 + 0 + 0 + 1 + 1 + 2 + 2 + 3 + 4 + 5 + 6 + 6 + 9}{15} = \frac{39}{15} = 2.6$

Because the mean ($= 2.6$) is greater than the median ($= 2$), the distribution is positively skewed, that is, it tails to the right.

4. Incorrect

Zero is also a value counted amongst the values that were given.

5. Correct

The mode is 0

Option (4)

Question 13

The variance tells you

1. about how spread out the scores are.

2. where the mean of the distribution is.

3. about the sum of the scores divided by the number of scores.

4. about the central tendency of the distribution.

5. that the population is known.

Solution

Option (1)

Question 14

The following performance scores have been recorded for 10 job applicants who have taken a pre-employment aptitude test at UNISA:

22 19 22 24 21 22 17 20 17 21

Which one of the following statements is incorrect?

1. The sample mean $\bar{X}$ is 20.5

2. The median is 21.5

3. The mode is 22

4. The range is 7

5. The distribution is skewed.

Solution

Given data 22 19 22 24 21 22 17 20 17 21

1. Correct

The sample mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{22 + 19 + 22 + 24 + 21 + 22 + 17 + 20 + 17 + 21}{10} = \frac{205}{10} = 20.5$

2. Incorrect

The median: The ranked data 17 17 19 20 21 21 22 22 22 24

The median is the average of 21 and 21 that gives 21.

3. Correct

4. Correct

The range $= 24 - 17 = 7$

5. Correct

Because the mean (20.5), the median (21) and the mode (22) are all different.

Option (2)

Question 15

The following data gives the ages in years of a sample of 8 employees from a government department:

31 43 56 23 49 42 33 61

Which statement is correct?


1. $\sum x_i = 410$

2. $\bar{X} = 41$

3. $\sum x_i^2 = 15540$

4. The standard deviation $s = 12.93$

5. The median is 36.

Solution

Consider the given data 31 43 56 23 49 42 33 61


1. $\sum x_i = 31 + 43 + 56 + 23 + 49 + 42 + 33 + 61 = 338$

2. $\bar{X} = \frac{\sum x_i}{n} = \frac{31 + 43 + 56 + 23 + 49 + 42 + 33 + 61}{8} = \frac{338}{8} = 42.25$

3. $\sum x_i^2 = 31^2 + 43^2 + 56^2 + 23^2 + 49^2 + 42^2 + 33^2 + 61^2 = 15450$

4. The sample variance

$S^2 = \frac{(31 - 42.25)^2 + (43 - 42.25)^2 + (56 - 42.25)^2 + (23 - 42.25)^2 + (49 - 42.25)^2 + (42 - 42.25)^2 + (33 - 42.25)^2 + (61 - 42.25)^2}{8 - 1} = \frac{1169.5}{7} = 167.0714$

The standard deviation $S = \sqrt{167.0714} = 12.9256$

5. The ranked data 23 31 33 42 43 49 56 61

The median is the average of 42 and 43, which is 42.5

Option (4)

Question 16

The following data gives the typing speeds (in words per minute) for several stenographers.

125 140 170 155 132 175 225 210 125 310

Which one of the following computations for this sample is incorrect?


1. $\sum_{i=1}^{10} x_i = 1767$

2. $\sum_{i=1}^{10} x_i^2 = 1767^2$

3. The mean is $\bar{X} = 176.7$

4. The range is equal to 185

5. The mode is 125

Solution

1. Correct

$\sum x_i = 125 + 140 + 170 + 155 + 132 + 175 + 225 + 210 + 125 + 310 = 1767$

2. Incorrect

$\sum x_i^2 = 125^2 + 140^2 + 170^2 + 155^2 + 132^2 + 175^2 + 225^2 + 210^2 + 125^2 + 310^2 = 342649$, which is not equal to $1767^2 = 3122289$

3. Correct

The sample mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{125 + 140 + 170 + 155 + 132 + 175 + 225 + 210 + 125 + 310}{10} = \frac{1767}{10} = 176.7$

4. Correct

The range = the largest value − the smallest value $= 310 - 125 = 185$

5. Correct

The mode $= 125$

Option (2)

Question 17

Which one of the following statements is not true about the mean?

1. It is the best measure of central tendency when data is not skewed.

2. In a symmetric distribution, the mean, the median and the mode are all equal.

3. It is not affected by extreme values or outliers.

4. It utilises all values in its calculation.

5. To calculate the mean, sum all values and divide by the count.

Solution

Option (3)

Question 18

A summary measure that is computed from a sample to describe a characteristic of the population is called

1. a parameter

2. a population

3. a statistic

4. inferential statistics

5. box–plot

Solution

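A summary measure computed from a sample is a statistic; a parameter is a summary measure of the whole population.

Option (3)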

Question 19

The statistics Department randomly selected 15 students and recorded their marks (in percentage) during the
last examination. The data of the 15 students are as follows:

50 72 63 60 80 79 41 48 23 72 96 34 43 55 51

Which one of the following statements is incorrect?

1. The mean mark is 578

2. The median mark is 55

3. The range mark is 73

4. Approximately 66% of the students pass the exam

5. The mode is 2

Solution

1. Correct

The mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{50 + 72 + 63 + 60 + 80 + 79 + 41 + 48 + 23 + 72 + 96 + 34 + 43 + 55 + 51}{15} = \frac{867}{15} = 57.8$

2. Correct

The median is the middle value when the data are arranged in increasing order. The position of the median is $\frac{n + 1}{2} = \frac{15 + 1}{2} = 8$, which means that the median is the 8th value in increasing order.

23 34 41 43 48 50 51 55 60 63 72 72 79 80 96
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th

The median is equal to 55.

3. Correct

The range = the largest value − the smallest value $= 96 - 23 = 73$

4. Correct

66% of 15 $= 0.66 \times 15 = 9.9 \approx 10$ students pass the exam.

Alternatively, we count the number of students who pass the exam and divide by 15. This gives $\frac{10}{15} = 0.6667$, which can be expressed as about 66%.

5. Incorrect
The mode is 72

Option (5)

Question 20

In a survey of office workers in a large city, each worker in a random sample of 10 workers was asked to
report the number of times during the previous month he or she has eaten an evening meal at a restaurant.

The results were


5 0 9 1 9 8 5 5 2 6

The standard deviation of the above data is

1. 5

2. 1022

3. 320

4. 283

5. 803

Solution

The mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{5 + 0 + 9 + 1 + 9 + 8 + 5 + 5 + 2 + 6}{10} = \frac{50}{10} = 5$

The formula for the sample variance is

$S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} = \frac{(5 - 5)^2 + (0 - 5)^2 + (9 - 5)^2 + (1 - 5)^2 + (9 - 5)^2 + (8 - 5)^2 + (5 - 5)^2 + (5 - 5)^2 + (2 - 5)^2 + (6 - 5)^2}{10 - 1} = \frac{92}{9} = 10.2222$

The standard deviation $S = \sqrt{\text{Variance}} = \sqrt{10.2222} = 3.1972$

Option (3)
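The two steps of this solution (variance first, then its square root) can be sketched as follows. The standard-library call `statistics.stdev` is included only as a cross-check; the variable names are illustrative.

```python
import math
import statistics

# Number of restaurant meals reported by the 10 workers
meals = [5, 0, 9, 1, 9, 8, 5, 5, 2, 6]
n = len(meals)

mean = sum(meals) / n
# Sample variance: squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in meals) / (n - 1)
# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mean, round(variance, 4), round(std_dev, 4))
print(round(statistics.stdev(meals), 4))  # same value from the standard library
```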

Question 21

Which one of the following sample statistics is a measure of spread (or dispersion)?

1. Sample mean

2. Sample mode

3. Sample median

4. Sample proportion

5. Sample standard deviation

Solution

Option (5)

Question 22

Which one of the following statements is incorrect?

1. The mean, median and mode are measures of central tendency.

2. When a distribution has more values to the left and tails to the right we say, it is skewed negatively.

3. When there is no difference in the values of the mean, median and mode we say, it is a normal distribution.

4. When a distribution is bell–shaped with the left half identical to the right half, it is symmetrical.

5. For the following data values: 9 7 8 6 9 10 14 the mean, median and mode are all equal.

Solution

1. Correct

2. Incorrect
The distribution is skewed positively.

3. Correct

4. Correct

5. Correct

The sample mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{9 + 7 + 8 + 6 + 9 + 10 + 14}{7} = \frac{63}{7} = 9$
The median is the middle value. The ranked data are: 6 7 8 9 9 10 14
Therefore the median is 9
The mode is the most repeated value which is 9

Option (2)

3.3 Quartile and Coefficient of Variation

• Quartiles split a set of ranked data into four equal parts:

• The first quartile $Q_1$ separates the smallest 25% of the values from the rest. The position of the first quartile is $Q_1 = \frac{n + 1}{4}$

• The second quartile $Q_2$ is the median, since 50% of the values lie below it. The position of the second quartile is $Q_2 = 2\left(\frac{n + 1}{4}\right)$

• The third quartile $Q_3$ separates the smallest 75% of the values from the rest. The position of the third quartile is $Q_3 = 3\left(\frac{n + 1}{4}\right)$

• The interquartile range is equal to the value of $Q_3$ − the value of $Q_1$

• The boxplot provides a graphical representation of the data based on the five-number summary: the smallest value, the value of $Q_1$, the value of $Q_2$ (the median), the value of $Q_3$ and the largest value.
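The position rule above can be sketched in code. The rounding convention used here is an assumption inferred from the worked solutions in the activities that follow: a whole-number position picks that value, a position ending in .5 averages the two neighbouring values, and any other position is rounded to the nearest whole number. The function names are illustrative, and statistical software may use a different interpolation rule.

```python
import math

def quartile_position(n, k):
    """Position of the k-th quartile (k = 1, 2, 3) among n ranked values."""
    return k * (n + 1) / 4

def value_at_position(ranked, pos):
    """Value at a 1-based, possibly fractional, position in ranked data."""
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ranked[lower - 1]
    if frac == 0.5:
        # Halfway between two positions: average the neighbouring values
        return (ranked[lower - 1] + ranked[lower]) / 2
    # Otherwise round the position to the nearest whole number
    nearest = lower if frac < 0.5 else lower + 1
    return ranked[nearest - 1]

def quartiles(data):
    """Return (Q1, Q2, Q3) using the (n + 1)/4 position rule."""
    ranked = sorted(data)
    n = len(ranked)
    return tuple(value_at_position(ranked, quartile_position(n, k)) for k in (1, 2, 3))

# Example: the bounced check fees from Question 1 of the activities
fees = [26, 28, 20, 21, 22, 25, 18, 23, 15, 30]
q1, q2, q3 = quartiles(fees)
iqr = q3 - q1
five_number_summary = (min(fees), q1, q2, q3, max(fees))
print(q1, q2, q3, iqr, five_number_summary)
```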

Activities

Question 1

The bounced check fees (in Rand) for a sample of 10 banks are:

26 28 20 21 22 25 18 23 15 30

Which one of the following statements is incorrect?

1. The position of $Q_1$ is 2.75.

2. The value of $Q_2$ is not always equal to the median.

3. The position of $Q_3$ is 8.25.

4. To calculate the value of $Q_3$, first rank the values from the smallest to the largest.

5. The interquartile range is 6.

Solution

Given the values

26 28 20 21 22 25 18 23 15 30

The ranked data: 15 18 20 21 22 23 25 26 28 30

The positions are $Q_1 = 2.75$, $Q_2 = 5.5$ and $Q_3 = 8.25$

1. The position of $Q_1 = \frac{n + 1}{4} = \frac{10 + 1}{4} = \frac{11}{4} = 2.75$

2. Incorrect, the value of $Q_2$ is always equal to the median.

The median is the average of 22 and 23, which is equal to 22.5.

The position of $Q_2 = 2\left(\frac{10 + 1}{4}\right) = 2 \times 2.75 = 5.5$

To get the value of $Q_2$: because 5.5 falls between the 5th and 6th values, the value of $Q_2$ is the average of the 5th value and the 6th value: $\frac{22 + 23}{2} = \frac{45}{2} = 22.5$

3. The position of $Q_3 = 3\left(\frac{n + 1}{4}\right) = 3\left(\frac{10 + 1}{4}\right) = 3 \times 2.75 = 8.25$

4. For the value of $Q_3$, we have to round the position of $Q_3 = 8.25$ to 8 and consider the eighth value, which is equal to 26

5. The interquartile range (IQR) is equal to $Q_3 - Q_1 = 26 - 20 = 6$

For the value of $Q_1$, we have to round the position of $Q_1 = 2.75$ to 3 and consider the third value in the ordered array, which is equal to 20

Option (2)

Question 2

The daily electricity consumption in kilowatt hours (kWh) by a sample of 10 households is

51 50 47 33 37 43 61 55 44 41

Which one of the following statements is incorrect?

1. The value of $Q_1 = 2.75$

2. The value of $Q_2 = 45.5$

3. The position of $Q_3 = 8.25$

4. The value of $Q_3 = 51$

5. The interquartile range IQR $= 10$

Solution

The ranked data: 33 37 41 43 44 47 50 51 55 61

1. The position of $Q_1 = \frac{n + 1}{4} = \frac{10 + 1}{4} = \frac{11}{4} = 2.75$

To calculate the value of $Q_1$, we have to round the position of $Q_1 = 2.75$ to 3 and consider the third value, which is 41

2. The position of $Q_2 = 2\left(\frac{n + 1}{4}\right) = 2\left(\frac{10 + 1}{4}\right) = 2 \times 2.75 = 5.5$

The value of $Q_2$ is the average of the 5th and 6th values, which gives $\frac{44 + 47}{2} = \frac{91}{2} = 45.5$

3. The position of $Q_3 = 3\left(\frac{n + 1}{4}\right) = 3\left(\frac{10 + 1}{4}\right) = 3 \times 2.75 = 8.25$

4. For the value of $Q_3$, we have to round the position of $Q_3 = 8.25$ to 8 and consider the 8th value in the ranked data, which is equal to 51

5. The interquartile range IQR $= 51 - 41 = 10$

Option (1)

Question 3

A sample of 13 students' marks is: 6 5 6 6 7 8 9 10 10 10 9 9 6

Which one of the following statements is correct?

1. The value of the first quartile $Q_1$ is 3.5

2. The position of the second quartile $Q_2$ is 8

3. The median is not always equal to $Q_2$

4. The value of the third quartile $Q_3$ is 8.5

5. The interquartile range is 3.5

Solution

The given data 6 5 6 6 7 8 9 10 10 10 9 9 6 n  13

The ranked data 5 6 6 6 6 7 8 9 9 9 10 10 10

1. The position of $Q_1 = \frac{n + 1}{4} = \frac{13 + 1}{4} = \frac{14}{4} = 3.5$

For the value of $Q_1$, we have to calculate the average of the third and the fourth values in the ordered array, which is $\frac{6 + 6}{2} = \frac{12}{2} = 6$

2. The position of $Q_2 = 2\left(\frac{n + 1}{4}\right) = 2\left(\frac{13 + 1}{4}\right) = 2 \times 3.5 = 7$

3. The median is always equal to the value of $Q_2$, as both represent 50% of the values; alternatively, you can show this with the calculation.

4. The position of $Q_3 = 3\left(\frac{n + 1}{4}\right) = 3\left(\frac{13 + 1}{4}\right) = 3 \times 3.5 = 10.5$

For the value of $Q_3$, we have to calculate the average of the 10th and the 11th values in the ordered array, which is $\frac{9 + 10}{2} = \frac{19}{2} = 9.5$

5. The interquartile range IQR $= 9.5 - 6 = 3.5$.

Option (5)

Question 4

The following sample shows the starting salary for new university graduates (in thousands of rand):

307 288 291 311 301


297 307 300 306 305

Which one of the following statements is correct?

1. The position of Q 1 is 297

2. The value of third quartile is 825

3. The second quartile is equal to the median.

4. The position of Q 3 is 307

5. The value of the interquartile range is 55.

Solution

Given the following data

307 288 291 311 301 297 307 300 306 305

The ranked data 288 291 297 300 301 305 306 307 307 311

n1 10  1 11
1. The position of Q 1     275
4 4 4

   
n1 10  1
2. The position Q 3  3 3  3  275  825
4 4

To calculate the value of Q 3 we have to round the position of Q 3 and consider the 8th value in the
ranked data which is equal to 307.

3. Correct

   
n1 10  1
4. The position of Q 3  3 3  3  275  825
4 4

5. The interquartile range I Q R  307  297  1

To calculate the value of Q 1  we have to round the position of Q 1  275 to 3 and we consider the 3rd
value in increasing order which is 297.

Option (3)

Question 5

The following data give the amount paid in rentals (in hundreds) for a random sample of 14 one–bedroomed
apartments in Arcadia Pretoria.

14 20 28 16 18 17 21 20 17 18 25 15 20 30

Which one of the following statements is incorrect?

1. The location of the first quartile is $Q_1 = 3.75$

2. The location of the second quartile is $Q_2 = 7.5$ and the value of $Q_2 = 19.0$

3. The location of the third quartile is $Q_3 = 11.25$

4. The value of the second quartile represents the value of the median.

5. The value of the interquartile range is 7.5

Solution

Given the following data

14 20 28 16 18 17 21 20 17 18 25 15 20 30

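The ranked data 14 15 16 17 17 18 18 20 20 20 21 25 28 30

1. The position of $Q_1 = \frac{n + 1}{4} = \frac{14 + 1}{4} = \frac{15}{4} = 3.75$

2. The position of $Q_2 = 2\left(\frac{n + 1}{4}\right) = 2\left(\frac{14 + 1}{4}\right) = 2 \times 3.75 = 7.5$

For the value of $Q_2$, we have to calculate the average of the 7th and the 8th values in the ordered array, which is $\frac{18 + 20}{2} = \frac{38}{2} = 19$

3. The position of $Q_3 = 3\left(\frac{n + 1}{4}\right) = 3\left(\frac{14 + 1}{4}\right) = 3 \times 3.75 = 11.25$

4. Correct

5. The interquartile range IQR $= Q_3 - Q_1 = 21 - 17 = 4$. Here the position of $Q_1 = 3.75$ is rounded to 4 (the 4th value is 17) and the position of $Q_3 = 11.25$ is rounded to 11 (the 11th value is 21).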

Option (5)

Question 6

A statistics practitioner gives the following output:

The sample size $n = 105$, the value of $Q_1 = 21.00$, the value of $Q_3 = 24.00$, the median $= 22.000$ and the sample mean $\bar{X} = 22.238$.

The distribution of the data set is

1. symmetric

2. positively skewed

3. asymptotic

4. negatively skewed

5. normal

Solution

Because the value of the sample mean $\bar{X} = 22.238$ is greater than the value of the median $= 22.000$, the distribution is positively skewed.

Option (2)

Chapter 4

Introduction to Probability

SUMMARY

• Probability is a way of expressing the likelihood that a particular event occurs as a number between 0 and 1.

• Probability calculations allow us to quantify uncertainty, that is, to express our uncertainty numerically.

• In order to determine any probability, you first need to obtain data. We can obtain data through experimentation, observation or experience.

• Outcome: The potential result of a random experiment, where the exact value is unknown before the experiment, but known after the experiment has been concluded.

• Sample space: An exhaustive list of all the possible outcomes of an experiment.

• An event is any subset of the collection of outcomes of an experiment. That means an event is a subset of the sample space. In this module we are interested in determining the probability of an event occurring.

• The probability of each individual outcome lies between 0 and 1.

• The sum of the probabilities of the outcomes in the sample space equals 1.

• The relative frequency is a probability. This is because it is the ratio of the frequency of occurrence of each outcome or event, $X$, to the total number of times the experiment was repeated, $n$.

Relative frequency of an event $A$ $= \frac{X}{n}$

• The complement rule: $P(A \text{ or } A^C) = 1$, that means $P(A) + P(A^C) = 1$, therefore $P(A^C) = 1 - P(A)$

• Two events A and B are said to be mutually exclusive if they do not intersect in any way, that means $P(A \text{ and } B) = 0$

• The additive rule for mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B)$

• The general additive rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

• Independent events: Two events are independent if the occurrence of one of the events has no influence on the occurrence of the other event, this means $P(A \text{ and } B) = P(A) \times P(B)$

• Conditional probabilities are probabilities of events that are conditional on the outcomes of other events.

$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)}$, where $P(A \text{ and } B) = P(B \text{ and } A)$

Activities

Question 1

The sample space of the toss of a fair die is

$S = \{1, 2, 3, 4, 5, 6\}$

If the die is balanced, each simple event has the same probability. Let event A represent an even number and let event B represent a number less than or equal to 4.

Which one of the following statements is incorrect?

1. Event $A = \{2, 4, 6\}$

2. Event $B = \{1, 2, 3, 4\}$

3. Joint event A and B $= \{2, 4\}$

4. $P(A) = \frac{3}{6}$ and $P(B) = \frac{4}{6}$

5. $P(A \text{ and } B) = \frac{2}{4}$

Solution

From the given information

The sample space $S = \{1, 2, 3, 4, 5, 6\}$

A = even numbers from the sample space S, that means $A = \{2, 4, 6\}$

B = a number less than or equal to 4 from the sample space S, that means $B = \{1, 2, 3, 4\}$

1. Correct

2. Correct

3. Correct

The joint event A and B is the intersection, that is, the elements appearing in both A and B: $\{2, 4\}$

4. Correct

$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in } S} = \frac{3}{6} = 0.5$

$P(B) = \frac{\text{number of outcomes in } B}{\text{total number of outcomes in } S} = \frac{4}{6} = 0.6667$

5. Incorrect

$P(A \text{ and } B) = \frac{\text{number of outcomes in } A \text{ and } B}{\text{total number of outcomes in } S} = \frac{2}{6}$

Option (5)
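For equally likely outcomes, the counting argument in this solution maps directly onto set operations. The following is a minimal sketch; the helper `prob` is a hypothetical name introduced for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Sample space for one toss of a fair die
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}   # even number
B = {x for x in S if x <= 4}       # number less than or equal to 4

def prob(event, sample_space=S):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(A & B)   # intersection: outcomes in both A and B
p_a_or_b = prob(A | B)    # union: outcomes in A or B (or both)

print(A & B, p_a, p_b, p_a_and_b, p_a_or_b)
```

The last value also agrees with the general additive rule, $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$.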

Question 2

The following table lists the joint probabilities of achieving grades of A and not achieving grades of A in two
MBA courses.

                                                Achieve a grade      Does not achieve a grade   Total
                                                of A in marketing    of A in marketing
Achieve a grade of A in statistics              0.053                0.130                      0.183
Does not achieve a grade of A in statistics     0.237                0.580                      0.817
Total                                           0.29                 0.71                       1.0

Which one of the following statements is incorrect?

1. $P(\text{Achieve a grade of A in marketing}) = 0.29$

2. $P(\text{Achieve a grade of A in statistics}) = 0.183$

3. The events "does not achieve a grade of A in marketing" and "does not achieve a grade of A in statistics" are mutually exclusive events.

4. The events "achieve a grade of A in marketing" and "achieve a grade of A in statistics" are independent events.

5. The probability that a student achieves a grade of A in marketing, given that he or she does not achieve a grade of A in statistics, is 0.2901

Solution

Given the contingency table above

1. Correct

$P(\text{Achieve a grade of A in marketing}) = 0.053 + 0.237 = 0.29$

2. Correct

$P(\text{Achieve a grade of A in statistics}) = 0.053 + 0.130 = 0.183$

3. Incorrect

Two events are mutually exclusive if the probability that both occur is equal to zero, i.e. $P(A \text{ and } B) = 0$

$P(\text{Does not achieve a grade of A in statistics and does not achieve a grade of A in marketing}) = \frac{0.580}{1.0} = 0.580$

Since this probability is 0.580, which is different from zero, the two events are not mutually exclusive.

4. Correct

Two events are independent if $P(A \text{ and } B) = P(A) \times P(B)$, where A represents the first event and B the second event.

$P(A \text{ and } B) = \frac{0.053}{1.0} = 0.053$, $P(A) = 0.29$, $P(B) = 0.183$

$P(A) \times P(B) = 0.29 \times 0.183 = 0.05307$

Because $P(A \text{ and } B) = 0.053$ is approximately equal to $P(A) \times P(B) = 0.05307$, the two events are independent.

5. Correct

Conditional probability rule: $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$, where A is the conditioning event (does not achieve a grade of A in statistics) and B is the event of interest (achieves a grade of A in marketing).

$P(A \text{ and } B) = \frac{0.237}{1.0} = 0.237$, $P(A) = 0.817$

$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.237}{0.817} = 0.2901$

Option (3)
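The marginal, joint and conditional probabilities used in this solution can be read off the table programmatically. This is a sketch only: the dictionary keys (`stat_A`, `mkt_A`, and so on) and the helper `marginal` are labels invented for the example, and the 0.001 tolerance for "approximately equal" is an assumption.

```python
# Joint probabilities from the table: (statistics outcome, marketing outcome)
joint = {
    ("stat_A", "mkt_A"): 0.053,
    ("stat_A", "mkt_not_A"): 0.130,
    ("stat_not_A", "mkt_A"): 0.237,
    ("stat_not_A", "mkt_not_A"): 0.580,
}

def marginal(label):
    """Marginal probability: add every joint probability that involves `label`."""
    return sum(p for key, p in joint.items() if label in key)

p_mkt_a = marginal("mkt_A")            # column total
p_stat_a = marginal("stat_A")          # row total
p_stat_not_a = marginal("stat_not_A")  # row total

# Mutually exclusive events would have a joint probability of zero
not_mutually_exclusive = joint[("stat_not_A", "mkt_not_A")] != 0

# Independence check: joint probability versus product of the marginals
approx_independent = abs(joint[("stat_A", "mkt_A")] - p_mkt_a * p_stat_a) < 0.001

# Conditional probability: P(A in marketing | not A in statistics)
p_mkt_given_not_stat = joint[("stat_not_A", "mkt_A")] / p_stat_not_a

print(round(p_mkt_a, 3), round(p_stat_a, 3), round(p_mkt_given_not_stat, 4))
```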

Question 3

Consider a sample space from an experiment in which a die is rolled. The sample space is $S = \{1, 2, 3, 4, 5, 6\}$

Let A represent the event of rolling an odd number, $A = \{1, 3, 5\}$; let B represent the event of rolling a number less than or equal to 4, $B = \{1, 2, 3, 4\}$; and let C be the event that a 5 or 6 is rolled, $C = \{5, 6\}$.

The probability that A and B both occur when the die is rolled is:

1. $\frac{3}{6}$

2. $\frac{4}{6}$

3. $\frac{2}{6}$

4. $\frac{1}{6}$

5. $\frac{5}{6}$

Solution

The given information is

The sample space $S = \{1, 2, 3, 4, 5, 6\}$, $A = \{1, 3, 5\}$, $B = \{1, 2, 3, 4\}$, $C = \{5, 6\}$

The event A and B $= \{1, 3\}$

$P(A \text{ and } B) = \frac{\text{number of outcomes in } A \text{ and } B}{\text{total number of outcomes in } S} = \frac{2}{6} = 0.3333$

Option (3)

Question 4

Given the following table of joint probability:

        A1     A2
B1      0.4    0.3
B2      0.2    0.1

Which one of the following statements is incorrect?

1. $P(A_1) = 0.6$

2. $P(B_1) = 0.7$

3. $P(A_1 \text{ and } B_1) = 0.4$

4. $P(A_1 \text{ and } A_2) = 1$

5. $P(B_1 \text{ and } B_2) = 0$

Solution

Given the contingency table of joint probabilities

        A1     A2     Total
B1      0.4    0.3    0.7
B2      0.2    0.1    0.3
Total   0.6    0.4    1.0

1. Correct

$P(A_1) = 0.4 + 0.2 = 0.6$

2. Correct

$P(B_1) = 0.4 + 0.3 = 0.7$

3. Correct

$P(A_1 \text{ and } B_1) = \frac{0.4}{1.0} = 0.4$

4. Incorrect

$P(A_1 \text{ and } A_2) = 0$

Because the two events are mutually exclusive.

5. Correct

Because the two events are mutually exclusive.

Option (4)

Question 5

The following table represents gas well completions during 1986 in North and South America.

                     Dry (A)   Not dry (B)   Total
North America (C)    ?         31            45
South America (D)    15        ?             ?
Total                29        ?             70

Complete the numbers missing in the above table and calculate $P(A \text{ and } C)$:

1. 14

2. 0.27

3. 0.41

4. 0.64

5. 0.20

Solution

Given the contingency table


        A     B     Total
C       14    31    45
D       15    10    25
Total   29    41    70

$P(A \text{ and } C) = \frac{\text{number of outcomes in } A \text{ and } C}{\text{total number}} = \frac{14}{70} = 0.2$

Option (5)
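The missing cells follow from the row and column totals by subtraction. A short sketch of that bookkeeping, with illustrative variable names, is:

```python
# Known entries from the gas well table
total = 70
north_total = 45
dry_total = 29
north_not_dry = 31
south_dry = 15

# Fill in the missing cells by subtraction from the totals
north_dry = north_total - north_not_dry   # Dry and North America
south_total = total - north_total         # South America row total
south_not_dry = south_total - south_dry   # Not dry and South America
not_dry_total = total - dry_total         # Not dry column total

# Joint probability P(A and C): dry wells in North America over all wells
p_a_and_c = north_dry / total

print(north_dry, south_not_dry, south_total, not_dry_total, p_a_and_c)
```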

Question 6

If $P(A) = 0.4$, $P(B^C) = 0.5$ and $P(A \text{ and } B) = 0.1$

Which one of the following statements is incorrect?

1. $P(B) = 0.5$

2. $P(A \mid B) = 0.2$

3. $P(A \text{ or } B) = 0.8$

4. Events A and B are mutually exclusive.

5. Events A and B are dependent.

Solution

$P(A) = 0.4$, $P(B^C) = 0.5$

1. Correct

$P(B) = 1 - P(B^C) = 1 - 0.5 = 0.5$

2. Correct

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$, where B is the conditioning event and A is the event of interest.

$P(A \text{ and } B) = 0.1$

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0.1}{0.5} = 0.2$

3. Correct

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.4 + 0.5 - 0.1 = 0.8$

4. Incorrect

Because $P(A \text{ and } B) = 0.1$, the two events A and B are not mutually exclusive. Two events are mutually exclusive when the probability that both occur is equal to zero, i.e. $P(A \text{ and } B) = 0$

5. Correct

Because the two events are dependent when

$P(A \text{ and } B) \neq P(A) \times P(B)$

$0.1 \neq 0.4 \times 0.5$

$0.1 \neq 0.2$

Option (4)

Question 7

Which one of the following statements is incorrect?

1. The probability refers to a number between 0 and 1 which expresses the chance that an event will occur.

2. An experiment is an activity of measurement that results in an outcome.

3. If the event of interest is A, then the probability that A will not occur is the probability of the complement of A.

4. When events A and B are independent then $P(A \text{ and } B) = P(A) \times P(B)$

5. The sex of the students (males and females) cannot be used as an example of mutually exclusive events.

Solution

1. Correct

2. Correct

3. Correct

4. Correct

5. Incorrect

Two events are mutually exclusive when $P(A \text{ and } B) = 0$. In this case the probability that a student is both male and female is equal to zero, so the sex of the students can be used as an example of mutually exclusive events.

Option (5)

Question 8

Suppose that $P(A) = 0.5$ and $P(B) = 0.3$

Which one of the following statements is incorrect?

1. If events A and B are mutually exclusive, then $P(A \text{ or } B) = 0.8$

2. If events A and B are independent, then $P(A \text{ or } B) = 0.8$

3. If events A and B are mutually exclusive, then $P(A \mid B) = 0$

4. If events A and B are independent, then $P(A \mid B) = 0.5$

5. If $P(A \text{ and } B) = 0.2$ then $P(A \text{ or } B) = 0.6$

Solution

Given information: $P(A) = 0.5$, $P(B) = 0.3$

1. Correct

The two events are mutually exclusive when $P(A \text{ and } B) = 0$, so

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.5 + 0.3 - 0 = 0.8$

2. Incorrect

Events A and B are independent when $P(A \text{ and } B) = P(A) \times P(B)$, so

$P(A \text{ and } B) = P(A) \times P(B) = 0.5 \times 0.3 = 0.15$

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.5 + 0.3 - 0.15 = 0.65$

3. Correct

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0}{0.3} = 0$, because the two events are mutually exclusive, so $P(A \text{ and } B) = 0$

4. Correct

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A) = 0.5$

5. Correct

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.5 + 0.3 - 0.2 = 0.6$

Option (2)
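The three cases in this solution differ only in the value of $P(A \text{ and } B)$ that is fed into the general additive rule. A minimal sketch, with a hypothetical helper `p_or`:

```python
def p_or(p_a, p_b, p_and):
    """General additive rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_and

p_a, p_b = 0.5, 0.3

# Mutually exclusive events: P(A and B) = 0
p_or_exclusive = p_or(p_a, p_b, 0.0)

# Independent events: P(A and B) = P(A) * P(B)
p_and_independent = p_a * p_b
p_or_independent = p_or(p_a, p_b, p_and_independent)

# Conditional probability under independence: P(A | B) = P(A and B) / P(B)
p_a_given_b_independent = p_and_independent / p_b

# Statement 5: P(A and B) is given as 0.2
p_or_given = p_or(p_a, p_b, 0.2)

print(round(p_or_exclusive, 2), round(p_or_independent, 2),
      round(p_a_given_b_independent, 2), round(p_or_given, 2))
```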

Question 9

The following table shows a sample of voters cross-classified according to place of residence and their pref-
erence for two candidates for parliament:

                      Place of residence
Preferred candidate   Urban   Suburban
A                     100     30
B                     60      40

What is the probability that a voter picked at random will be a suburban dweller and will prefer candidate B?

1. 0.1739

2. 0.3043

3. 0.4348

4. 40

5. 0.7139

Solution

Given the contingency table


        Urban   Suburban   Total
A       100     30         130
B       60      40         100
Total   160     70         230

$P(\text{suburban and B}) = \frac{40}{230} = 0.1739$

Option (1)

Question 10

Two events A and B are independent such that $P(A) = \frac{1}{4}$ and $P(B) = \frac{3}{4}$. What is the value of $P(A \mid B)$?

1. 0.75

2. 0.25

3. 0.00

4. 1.0

5. Need the value of $P(A \text{ and } B)$

Solution

Given that $P(A) = \frac{1}{4}$ and $P(B) = \frac{3}{4}$

Because A and B are independent: $P(A \text{ and } B) = P(A) \times P(B)$

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A) = \frac{1}{4} = 0.25$

Option (2)

Question 11

The physical science degrees conferred by a school between 1992 and 1995 were broken down as follows:

Gender
Major Male (M) Female (F) TOTAL
Physics (P) 25 25 50
Chemistry (C) 60 40 100
Geology (G) 30 20 50
TOTAL 115 85 200

Suppose that a person is selected at random from these graduates.

Which one of the following statements is incorrect?

1. $P(M) = 0.575$

2. $P(M \mid P) = 0.5$

3. $P(M \text{ and } P) = 0.125$

4. $P(C \text{ or } F) = 0.275$

5. Events M and P are not mutually exclusive.

Solution

The contingency table


M F Total
P 25 25 50
C 60 40 100
G 30 20 50
Total 115 85 200

1. Correct

$P(M) = \frac{115}{200} = 0.575$

2. Correct

$P(M \text{ and } P) = \frac{25}{200} = 0.125$

$P(P) = \frac{50}{200} = 0.25$

$P(M \mid P) = \frac{P(M \text{ and } P)}{P(P)} = \frac{0.125}{0.25} = 0.5$

3. Correct

$P(M \text{ and } P) = \frac{25}{200} = 0.125$

4. Incorrect

$P(C \text{ or } F) = P(C) + P(F) - P(C \text{ and } F) = \frac{100}{200} + \frac{85}{200} - \frac{40}{200} = \frac{100 + 85 - 40}{200} = \frac{145}{200} = 0.725$

5. Correct

Because $P(M \text{ and } P) = \frac{25}{200} = 0.125$ and not zero

Option (4)
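When a table holds counts rather than probabilities, every probability is a count divided by the grand total. The sketch below stores the table as a nested dictionary (an illustrative layout, not part of the study guide) and uses exact fractions.

```python
from fractions import Fraction

# Degree counts by major (rows) and gender (columns)
counts = {
    "P": {"M": 25, "F": 25},   # Physics
    "C": {"M": 60, "F": 40},   # Chemistry
    "G": {"M": 30, "F": 20},   # Geology
}
total = sum(sum(row.values()) for row in counts.values())

p_m = Fraction(sum(row["M"] for row in counts.values()), total)   # marginal P(M)
p_f = Fraction(sum(row["F"] for row in counts.values()), total)   # marginal P(F)
p_p = Fraction(sum(counts["P"].values()), total)                  # marginal P(P)
p_c = Fraction(sum(counts["C"].values()), total)                  # marginal P(C)

p_m_and_p = Fraction(counts["P"]["M"], total)   # joint probability
p_m_given_p = p_m_and_p / p_p                   # conditional probability
p_c_and_f = Fraction(counts["C"]["F"], total)
p_c_or_f = p_c + p_f - p_c_and_f                # general additive rule

print(float(p_m), float(p_m_given_p), float(p_m_and_p), float(p_c_or_f))
```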

Question 12

Given that $P(A) = 0.7$, $P(B) = 0.6$ and $P(A \text{ and } B) = 0.35$, which one of the following statements is incorrect?

1. $P(B^C) = 0.4$

2. A and B are dependent events.

3. $P(B \mid A) = 0.50$

4. $P(A \text{ or } B) = 0.95$

5. Events A and B are mutually exclusive

Solution

Given $P(A) = 0.7$, $P(B) = 0.6$, $P(A \text{ and } B) = 0.35$

1. Correct

$P(B^C) = 1 - P(B) = 1 - 0.6 = 0.4$

2. Correct

Events A and B are dependent when $P(A \text{ and } B) \neq P(A) \times P(B)$

$P(A) \times P(B) = 0.7 \times 0.6 = 0.42$

$P(A \text{ and } B) \neq P(A) \times P(B)$

$0.35 \neq 0.42$

3. Correct

$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.35}{0.7} = 0.5$

4. Correct

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.7 + 0.6 - 0.35 = 0.95$

5. Incorrect

Events A and B are mutually exclusive when $P(A \text{ and } B) = 0$, but because $P(A \text{ and } B) = 0.35$, we can say that events A and B are not mutually exclusive.

Option (5)

Question 13

Assume A and B are independent events with $P(A) = 0.40$ and $P(B) = 0.30$

Which one of the following statements is incorrect?

1. $P(A^C) = 0.60$

2. $P(A \text{ and } B) = 0.12$

3. $P(A \text{ or } B) = 0.58$

4. $P(B \mid A) = 0.3$

5. Events A and B are mutually exclusive.

Solution

Considering the information given below

Events A and B are independent when $P(A \text{ and } B) = P(A) \times P(B)$

$P(A) = 0.40$, $P(B) = 0.30$

1. Correct

$P(A^C) = 1 - P(A) = 1 - 0.40 = 0.60$

2. Correct

$P(A \text{ and } B) = P(A) \times P(B) = 0.40 \times 0.30 = 0.12$

3. Correct

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.40 + 0.30 - 0.12 = 0.58$

4. Correct

$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.12}{0.4} = 0.3$

5. Incorrect

Events A and B are independent, and $P(A \text{ and } B) = 0.12$ is not zero, so the mutually exclusive rule $P(A \text{ and } B) = 0$ is not satisfied.

Option (5)

Question 14

A computer is programmed to generate the eight single-digit integers 1, 2, 3, 4, 5, 6, 7 and 8 with equal frequency. Consider the experiment "the next integer generated". Define: Event A = "Odd number" $= \{1, 3, 5, 7\}$, Event B = "Number greater than 4" $= \{5, 6, 7, 8\}$ and Event C = "1 or 2" $= \{1, 2\}$, then $P(A \mid B)$ is

1. 0.5

2. 0.25

3. 1.0

4. 0.75

5. 0.125

Solution

From the given information

The sample space $S = \{1, 2, 3, 4, 5, 6, 7, 8\}$

$A = \{1, 3, 5, 7\}$, $B = \{5, 6, 7, 8\}$, $C = \{1, 2\}$

The joint event A and B $= \{5, 7\}$, as we are looking for the elements that appear in both A and B

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$

$P(A \text{ and } B) = \frac{\text{number of outcomes in } A \text{ and } B}{\text{total number of outcomes in } S} = \frac{2}{8} = 0.25$

$P(B) = \frac{\text{number of outcomes in } B}{\text{total number of outcomes in } S} = \frac{4}{8} = 0.5$

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0.25}{0.5} = 0.5$

Option (1)

Chapter 5

Discrete Probability Distribution: Binomial and Poisson

5.1 General Information

SUMMARY

• A discrete probability distribution is the probability distribution of a discrete random variable.

• The sum of the probabilities over all values of the random variable is equal to 1. Mathematically that means $\sum p(x_i) = 1$

• The mean ($\mu$) or the expected value is denoted by $E(X) = \sum x_i \, p(x_i)$

• The variance $\sigma^2 = \sum (x_i - \mu)^2 \, p(x_i)$

• The standard deviation $\sigma = \sqrt{\text{Variance}}$

Activities

Question 1

Which one of the following is a valid probability distribution?

1.
x       0       1       2       3
p(x)    0.512   0.384   0.096   0.008

2.
x       0       1       2       3
p(x)    0.1     0.3     0.4     0.1

3.
x       0       1       2       3
p(x)    0.01    0.01    0.01    0.98

4.
x       0       1       2       3
p(x)    0.25    0.46    0.04    0.24

5.
x       0       1       2       3
p(x)    0.15    0.25    0.5     0.3

Solution

x       0       1       2       3
p(x)    0.512   0.384   0.096   0.008

$\sum p(x_i) = 0.512 + 0.384 + 0.096 + 0.008 = 1$

Option (1)
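The validity check can be applied to all five candidate tables at once. This is a minimal sketch; `is_valid_distribution` is a hypothetical helper, and the small tolerance is an assumption that allows for floating-point rounding when the probabilities are added.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Valid if every probability lies in [0, 1] and the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# The five candidate tables from the question, keyed by option number
candidates = {
    1: [0.512, 0.384, 0.096, 0.008],
    2: [0.1, 0.3, 0.4, 0.1],
    3: [0.01, 0.01, 0.01, 0.98],
    4: [0.25, 0.46, 0.04, 0.24],
    5: [0.15, 0.25, 0.5, 0.3],
}

valid_options = [k for k, probs in candidates.items() if is_valid_distribution(probs)]
print(valid_options)
```

Options 2 to 5 fail because their probabilities add up to 0.9, 1.01, 0.99 and 1.2 respectively.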

Question 2

The number of pizzas delivered to university students each month is a random variable with the following
probability distribution

x       0     1     2     3
p(x)    0.1   0.3   ?     0.2

Which one of the following statements is incorrect?

1. $p(2) = 0.4$

2. $P(0 \le X \le 2) = 0.8$

3. $P(1 < X < 3) = 0.4$

4. $P(X > 1) = 0.6$

5. The mean $\mu = E(X) = 0.7$

Solution

Given the probability distribution

x       0     1     2     3
p(x)    0.1   0.3   ?     0.2

1. $p(2) = 1 - (0.1 + 0.3 + 0.2) = 1 - 0.6 = 0.4$

2. $P(0 \le X \le 2) = 0.1 + 0.3 + 0.4 = 0.8$

3. $P(1 < X < 3) = P(2) = 0.4$

4. $P(X > 1) = 0.4 + 0.2 = 0.6$

5. $\mu = \sum x_i \, p(x_i) = (0 \times 0.1) + (1 \times 0.3) + (2 \times 0.4) + (3 \times 0.2) = 0 + 0.3 + 0.8 + 0.6 = 1.7$

Option (5)
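Both steps of this solution, recovering the missing probability and then computing the expected value, can be sketched as follows. Using `None` to mark the unknown entry is a choice made for this example only.

```python
# Pizza distribution with one unknown probability (marked as None)
x = [0, 1, 2, 3]
p = [0.1, 0.3, None, 0.2]

# The probabilities must sum to 1, so the missing one is the remainder
missing = 1 - sum(q for q in p if q is not None)
p_complete = [missing if q is None else q for q in p]

# Expected value: sum of each value times its probability
mean = sum(xi * pi for xi, pi in zip(x, p_complete))

# Interval probabilities are sums over the values that satisfy the condition
p_0_to_2 = sum(pi for xi, pi in zip(x, p_complete) if 0 <= xi <= 2)
p_more_than_1 = sum(pi for xi, pi in zip(x, p_complete) if xi > 1)

print(round(missing, 2), round(mean, 2), round(p_0_to_2, 2), round(p_more_than_1, 2))
```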

Question 3

Based on past experience, a researcher knows that the probability distribution for X= the number of students
who come to her office on Wednesdays is given as

x       0      1      2      3      4
p(x)    0.10   0.20   0.50   0.15   0.05

Which one of the following statements is correct?

1. $P(X \ge 2) = 0.2$

2. $P(X \le 1) = 0.10$

3. $E(X) = 185$

4. $P(1 < X \le 4) = 0.7$

5. $P(2 < X < 4) = 0.65$

Solution

1. Incorrect

$P(X \ge 2) = 0.50 + 0.15 + 0.05 = 0.7$

2. Incorrect

$P(X \le 1) = 0.10 + 0.20 = 0.30$

3. Incorrect

$E(X) = \sum x_i \, p(x_i) = (0 \times 0.10) + (1 \times 0.20) + (2 \times 0.50) + (3 \times 0.15) + (4 \times 0.05) = 0 + 0.20 + 1.00 + 0.45 + 0.20 = 1.85$

4. Correct

$P(1 < X \le 4) = 0.50 + 0.15 + 0.05 = 0.7$

5. Incorrect

$P(2 < X < 4) = P(3) = 0.15$

Option (4)

Question 4

Suppose that the number of defective welds in a length of pipe has the probability distribution given below, where X represents the number of defective welds.

x       0      1      2      3      4      5      6
p(x)    0.60   0.20   0.10   0.05   0.03   0.01   0.01

The mean (expected value) of the number of defective welds is

1. 3.5

2. 0.17

3. 0.78

4. 1.38

5. 1.0

Solution

x 0 1 2 3 4 5 6
px 060 020 010 005 003 001 001


The mean   xi  pxi 
 0  060  1  020  2  010  3  005  4  003  5  001  6  001
 0  020  020  015  012  005  006
 078

Option(3)
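
The expected-value calculation above can be reproduced with a few lines of Python. This is a sketch; the helper name `expected_value` is an assumption chosen for the example.

```python
def expected_value(xs, ps):
    """Mean of a discrete random variable: the sum of x_i * p(x_i)."""
    return sum(x * p for x, p in zip(xs, ps))

# Distribution of the number of defective welds from Question 4.
welds = [0, 1, 2, 3, 4, 5, 6]
probs = [0.60, 0.20, 0.10, 0.05, 0.03, 0.01, 0.01]

mean = expected_value(welds, probs)
print(round(mean, 2))
```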

Question 5

Suppose X represents the number of students in STA1610. The probability distribution of X is as follows:

x      1      2      3      4      5
p(x)   0.25   0.33   0.17   0.15   0.10

If the mean $\mu = 2.52$, then the variance of X is

1. 1.6496

2. 1.6006

3. 3.0409

4. 1.7438

5. 1.2844

Solution

x      1      2      3      4      5
p(x)   0.25   0.33   0.17   0.15   0.10

with $\mu = 2.52$.

The variance

$\sigma^2 = \sum (x_i - \mu)^2 \, p(x_i)$
$= (1 - 2.52)^2 (0.25) + (2 - 2.52)^2 (0.33) + (3 - 2.52)^2 (0.17) + (4 - 2.52)^2 (0.15) + (5 - 2.52)^2 (0.10)$
$= 0.5776 + 0.08923 + 0.03917 + 0.32856 + 0.61504$
$= 1.6496$

Option (1)
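
As a cross-check of the variance formula used in this solution, the following Python sketch computes the mean, variance and standard deviation of a discrete distribution. The function name `discrete_variance` is hypothetical.

```python
import math

def discrete_variance(xs, ps):
    """Variance of a discrete random variable: sum of (x_i - mu)^2 * p(x_i)."""
    mu = sum(x * p for x, p in zip(xs, ps))
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))

# Distribution from Question 5.
xs = [1, 2, 3, 4, 5]
ps = [0.25, 0.33, 0.17, 0.15, 0.10]

var = discrete_variance(xs, ps)
sd = math.sqrt(var)  # standard deviation is the square root of the variance
print(round(var, 4), round(sd, 4))
```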

Question 6

The probability distribution of a discrete random variable X is shown below, where X represents the number
of cars owned by a family:

x      0      1      2      3
p(x)   0.25   0.40   0.20   0.15

Which probability is incorrect?

1. $P(X > 1) = 0.35$

2. $P(X \le 2) = 0.85$

3. $P(1 \le X \le 2) = 0.60$

4. $P(X < 1) = 0.25$

5. $P(0 < X \le 1) = 0.65$

Solution

x      0      1      2      3
p(x)   0.25   0.40   0.20   0.15

1. Correct

$P(X > 1) = 0.20 + 0.15 = 0.35$

2. Correct

$P(X \le 2) = 0.25 + 0.40 + 0.20 = 0.85$

3. Correct

$P(1 \le X \le 2) = 0.40 + 0.20 = 0.60$

4. Correct

$P(X < 1) = P(0) = 0.25$

5. Incorrect

$P(0 < X \le 1) = P(1) = 0.40$

Option (5)

Question 7

After analyzing the frequency with which cross-country skiers participate in their sport, a sportswriter created
the following probability distribution for X = number of times per year cross-country skiers ski.

x      0      1      2      3      4      5      6      7      8
p(x)   0.04   0.09   0.19   0.21   0.16   0.12   0.08   0.06   0.05

Which one of the following statements is correct?

1. $P(X = 3) = 0.12$

2. $P(X \ge 5) = 0.19$

3. $P(5 \le X \le 7) = 0.14$

4. $P(X \le 3) = 0.32$

5. The mean $\mu = 3.64$

Solution

Given the probability distribution

x      0      1      2      3      4      5      6      7      8
p(x)   0.04   0.09   0.19   0.21   0.16   0.12   0.08   0.06   0.05

1. Incorrect

$P(X = 3) = 0.21$

2. Incorrect

$P(X \ge 5) = 0.12 + 0.08 + 0.06 + 0.05 = 0.31$

3. Incorrect

$P(5 \le X \le 7) = 0.12 + 0.08 + 0.06 = 0.26$

4. Incorrect

$P(X \le 3) = 0.04 + 0.09 + 0.19 + 0.21 = 0.53$

5. Correct

The mean $\mu = \sum x_i \, P(x_i) = (0 \times 0.04) + (1 \times 0.09) + (2 \times 0.19) + (3 \times 0.21) + (4 \times 0.16) + (5 \times 0.12) + (6 \times 0.08) + (7 \times 0.06) + (8 \times 0.05) = 3.64$

Option (5)

Question 8

The mean length of stay in hospital is useful for planning purposes. Suppose that the following is the
distribution of the length of stay in a hospital after a minor operation:

Days          2      3      4      5      6
Probability   0.05   0.20   0.40   0.20   ?

The mean $\mu$ (or expected value $E(X)$) of the length of stay is

1. 0.15

2. 0.17

3. 3.3

4. 4.0

5. 4.2

Solution

The given probability distribution

Days          2      3      4      5      6      Total
Probability   0.05   0.20   0.40   0.20   0.15   1

$p(6) = 1 - (0.05 + 0.20 + 0.40 + 0.20) = 1 - 0.85 = 0.15$

The mean $\mu = \sum x_i \, p(x_i) = (2 \times 0.05) + (3 \times 0.20) + (4 \times 0.40) + (5 \times 0.20) + (6 \times 0.15) = 4.20$

Option (5)

Question 9

Consider the probability distribution of the number of sales as below:

x      0       1       2      3
p(x)   0.512   0.382   ?      0.008

Which one of the following statements is incorrect?

1. $p(2) = 0.098$

2. $P(0 < X \le 2) = 0.48$

3. $P(1 \le X \le 3) = 0.488$

4. $P(X \ge 0) = 1$

5. $E(X) = \mu = 0.0602$

Solution

1. Correct

$p(2) = 1 - (0.512 + 0.382 + 0.008) = 1 - 0.902 = 0.098$

2. Correct

$P(0 < X \le 2) = p(1) + p(2) = 0.382 + 0.098 = 0.48$

3. Correct

$P(1 \le X \le 3) = p(1) + p(2) + p(3) = 0.382 + 0.098 + 0.008 = 0.488$

4. Correct

$P(X \ge 0) = p(0) + p(1) + p(2) + p(3) = 0.512 + 0.382 + 0.098 + 0.008 = 1$

5. Incorrect

$E(X) = \mu = \sum x \, p(x)$
$= (0 \times 0.512) + (1 \times 0.382) + (2 \times 0.098) + (3 \times 0.008)$
$= 0.602$

Option (5)

5.2 Discrete Probability Distribution

5.2.1 Binomial Probability Distribution

SUMMARY

• A binomial random variable is a discrete random variable that counts the number of successful outcomes
in n simple independent trials, each of which can either succeed or fail.

• The binomial distribution is associated with questions that allow a "yes" or "no" type of answer. This means
there are two outcomes, "success" or "failure".

• The probability of success is the same for each trial and is denoted by $\pi$.

• To calculate a binomial probability we can use either the formula
$p(x) = \frac{n!}{x!(n-x)!} \pi^x (1-\pi)^{n-x}$
(where, for example, $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$) or the binomial statistics tables enclosed at the end of
section 2.

• The mean $\mu = n\pi$

• The variance $\sigma^2 = n\pi(1-\pi)$

• The standard deviation $\sigma = \sqrt{\text{variance}} = \sqrt{n\pi(1-\pi)}$

• How do we identify a binomial random variable?

1. The experiment consists of n identical trials.

2. Each trial results in one of two outcomes, which we can define as either a success or a failure.

3. The outcomes from trial to trial are independent.

4. The probability of success ($\pi$) is the same for each trial. The probability of failure is therefore
$1 - \pi$.

5. The random variable equals the number of successes in the n trials, and can only take on whole
number values between 0 and n.
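
The binomial formula in this summary can be evaluated directly instead of reading the tables. The sketch below uses Python's standard `math.comb` for $\frac{n!}{x!(n-x)!}$; the function name `binomial_pmf` and the parameter name `pi` (for the probability of success) are assumptions made for the example.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial variable with n trials and success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Example values: n = 6 trials, probability of success 0.20, x = 2 successes.
p2 = binomial_pmf(2, 6, 0.20)
print(round(p2, 4))

# The probabilities over x = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(x, 6, 0.20) for x in range(7))
print(round(total, 10))
```

Results computed this way may differ from the printed tables in the last decimal place, because the tables are rounded to four decimals.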

Activities

Question 1

Given a binomial random variable with $n = 6$ and $\pi = 0.20$ (Hint: Use the formula or the binomial Table
1 to calculate the following probabilities).

The incorrect statement is

1. $P(X = 2) = 0.2458$

2. $P(X = 5) = 0.0005$

3. $P(X \le 1) = 0.6553$

4. $P(X \ge 5) = 0.0016$

5. The mean $\mu = 1.2$

Solution

The given information is

The probability of success $\pi = 0.20$

The sample size $n = 6$

1. Correct

$P(X = 2)$?

Using the formula, with $6! = 720$, $2! = 2$, $4! = 24$, $0.20^2 = 0.04$ and $0.80^4 = 0.4096$:

$p(2) = \frac{6!}{2!(6-2)!} (0.20)^2 (1 - 0.20)^{6-2}$

$= \frac{720}{2 \times 24} \times 0.04 \times 0.4096$

$= 15 \times 0.04 \times 0.4096 = 0.24576$

Using the statistics tables (extract of the binomial table for $n = 6$, column $\pi = 0.20$):

n    x    0.20
6    0    0.2621
     1    0.3932
     2    0.2458
     3    0.0819
     4    0.0154
     5    0.0015
     6    0.0001

2. Incorrect

$P(X = 5) = 0.0015$, not 0.0005.

3. Correct

$P(X \le 1) = P(X = 0) + P(X = 1) = 0.2621 + 0.3932 = 0.6553$

4. Correct

$P(X \ge 5) = P(X = 5) + P(X = 6) = 0.0015 + 0.0001 = 0.0016$

5. Correct

The mean $\mu = n\pi = 6 \times 0.20 = 1.2$

Option (2)

Question 2

A certain type of tomato seed germinates 90% of the time. A backyard farmer planted 25 seeds. The expected
number (mean) of seeds that germinate is

1. 225

2. 259

3. 0036

4. 2778

5. 225

Solution

From the given information

$\pi = 0.90$, $n = 25$, $\mu = ?$

The mean $\mu = n\pi = 25 \times 0.90 = 22.5$

Option (5)

Question 3

Suppose that 10% of butterflies have damaged wings. If a random sample of 10 butterflies is selected, what
is the probability that more than four have damaged wings?

1. 0.0112

2. 0.0128

3. 0.0016

4. 0.9372

5. 0.9984

Solution

The given information

$\pi = 0.10$, $n = 10$, $P(X > 4)$?

There are two ways of calculating this probability:

1.
$P(X > 4) = P(X \ge 5)$
$= P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)$
$= 0.0015 + 0.0001 + 0.0000 + 0.0000 + 0.0000 + 0.0000$ (binomial tables)
$= 0.0016$

2.
$P(X > 4) = P(X \ge 5)$
$= 1 - P(X \le 4)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)]$
$= 1 - [0.3487 + 0.3874 + 0.1937 + 0.0574 + 0.0112]$ (binomial tables)
$= 1 - 0.9984$
$= 0.0016$

Option (3)
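
The two routes to the tail probability in this solution (summing the upper tail directly, or subtracting the lower tail from 1) can be compared numerically. The following is a minimal sketch; `binomial_pmf` is a hypothetical helper defined here, not a function from the study guide.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial variable with n trials and success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 10, 0.10

# Way 1: add P(X = 5) up to P(X = 10).
direct = sum(binomial_pmf(x, n, pi) for x in range(5, n + 1))

# Way 2: complement rule, 1 - P(X <= 4).
complement = 1 - sum(binomial_pmf(x, n, pi) for x in range(0, 5))

print(round(direct, 4), round(complement, 4))
```

The complement rule needs fewer terms whenever the requested tail is longer than the remaining part of the distribution.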

Question 4

Using the binomial distribution, if $n = 4$ and $\pi = 0.20$, then $P(X < 2)$ is

1. 0.1536

2. 0.9728

3. 0.1808

4. 0.0272

5. 0.8192

Solution

From the given information

$\pi = 0.20$, $n = 4$, $P(X < 2)$?

$P(X < 2) = P(X = 0) + P(X = 1)$
$= 0.4096 + 0.4096$
$= 0.8192$

Option (5)

Question 5

In the Limpopo province about 30% of adults have four-year college degrees. Suppose five adults are
randomly selected. Calculate the expected value (mean) and the standard deviation of this binomial distribution.

1. 1.5 and 1.05

2. 3.5 and 0.45

3. 1.5 and 1.025

4. 0.30 and 1.2

5. 1.05 and 1.25

Solution

From the given information

$\pi = 0.30$, $n = 5$, mean $\mu = ?$, standard deviation $\sigma = ?$

The mean $\mu = n\pi = 5 \times 0.30 = 1.5$

The standard deviation $\sigma = \sqrt{n\pi(1-\pi)} = \sqrt{5 \times 0.30 \times (1 - 0.30)} = \sqrt{1.05} = 1.0247$

Option (3)
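
The mean and standard deviation formulas used here are short enough to wrap in one helper. The sketch below is illustrative only; the name `binomial_mean_sd` is an assumption.

```python
from math import sqrt

def binomial_mean_sd(n, pi):
    """Return (mean, standard deviation) of a binomial distribution."""
    mean = n * pi
    sd = sqrt(n * pi * (1 - pi))
    return mean, sd

# Question 5: five adults, probability of a degree 0.30.
mean, sd = binomial_mean_sd(5, 0.30)
print(mean, round(sd, 4))
```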

Question 6

According to a report from the Center for Studying Health System Change, 20% of South Africans delay or go
without medical care because of concerns about cost. Suppose six individuals are randomly selected. The
probability that more than 4 will delay or go without medical care is

1. 0.0015

2. 0.9984

3. 0.0154

4. 0.0016

5. 0.0017

Solution

The given information is

$\pi = 0.20$, $n = 6$, $P(X > 4)$?

$P(X > 4) = P(X = 5) + P(X = 6)$
$= 0.0015 + 0.0001$
$= 0.0016$

Option (4)

Question 7

Suppose that an admission test for a certain university is designed so that the probability of passing it is 45%.
Find the probability that among 5 candidates who take the test, more than 3 will pass.

1. 0.2757

2. 0.1128

3. 0.1313

4. 0.0185

5. 0.0102

Solution

The given information is

$\pi = 0.45$, $n = 5$, $P(X > 3)$?

$P(X > 3) = P(X = 4) + P(X = 5)$
$= 0.1128 + 0.0185$
$= 0.1313$

Option (3)

Question 8

The probability that a certain machine will produce a defective item is 0.25. If a random sample of 6 items is
taken from the output of this machine, what is the probability that there will be at least five defective items
in the sample?

1. 0.2373

2. 0.0044

3. 0.0046

4. 0.25

5. 1.5

Solution

The given information is shown below

$\pi = 0.25$, $n = 6$, $P(X \ge 5)$?

$P(X \ge 5) = P(X = 5) + P(X = 6)$
$= 0.0044 + 0.0002$
$= 0.0046$

Option (3)

Question 9

A motor company has purchased steel parts from a supplier for several years and has found that 10% of the
parts must be returned because they are defective. An order of 5 parts is received. What is the probability
that more than three of these parts are defective?

1. 0.0004

2. 0.0001

3. 0.5905

4. 0.0729

5. 0.0086

Solution

This is a binomial probability distribution

$\pi = 0.10$, $n = 5$, $P(X > 3)$?

$P(X > 3) = P(X = 4) + P(X = 5)$
$= 0.0004 + 0.0000$
$= 0.0004$

Option (1)

5.2.2 Poisson Probability Distribution

SUMMARY

• The Poisson distribution is a discrete probability distribution that describes the probability that X events
occur during a specified interval (of time, distance, area or volume), provided that the average number of
occurrences per interval, denoted $\lambda$, is known and that the events occur independently of the time since the
last event.

• To calculate the probability of a Poisson random variable, we can use either the formula or the Poisson
statistics tables provided.

The formula is $P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$

x represents the number of occurrences of an event.

x! is the factorial of x.

$\lambda$ is a positive number that represents the expected number (or the mean $\mu$) of occurrences for a given
interval.

e is the base of the natural logarithm ($e \approx 2.71828$).

• The mean and the variance of a Poisson distribution are equal, which is why we can write
$\lambda = \text{Variance}(X) = \text{mean}(X)$.

Activities
Question 1

Calculate the probability that three bank robberies occur in a day, where the number of bank robberies that
occur in a large Gauteng city is Poisson distributed with a mean $\lambda$ equal to 1.8 per day. (Hint: Use the
Poisson tables to calculate the probability.)

1. 0.1653

2. 0.2975

3. 0.1607

4. 0.2768

5. 0.3329

Solution

The mean $\lambda = 1.8$, $P(X = 3)$?

Using the formula

$P(X = 3) = \frac{e^{-1.8} (1.8)^3}{3!}$, where $e^{-1.8} = 2.71828^{-1.8} = 0.1653$, $(1.8)^3 = 5.832$ and $3! = 3 \times 2 \times 1 = 6$

$P(X = 3) = \frac{0.1653 \times 5.832}{6}$

$P(X = 3) = 0.1607$

Using the Poisson distribution tables (extract for $\lambda = 1.1$ to $2.0$, row $x = 3$):

x     1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8     1.9     2.0
3     0.0738  0.0867  0.0998  0.1128  0.1255  0.1378  0.1496  0.1607  0.1710  0.1804

Option (3)

Question 2

The number of accidents that occur at a busy intersection is Poisson distributed with a mean of 3.5 per week.
The probability of no accidents in one week is

1. 0.0302

2. 0.0000

3. 0.1359

4. 0.3020

5. 1.0000

Solution

Considering the information as given below

The mean $\lambda = 3.5$, $P(X = 0)$?

Using the formula:

$P(X = 0) = \frac{e^{-3.5} (3.5)^0}{0!}$, where $e^{-3.5} = 2.71828^{-3.5} = 0.0302$, $(3.5)^0 = 1$ and $0! = 1$

$= \frac{0.0302 \times 1}{1}$

$= 0.0302$

Using the Poisson distribution tables (extract for $\lambda = 3.1$ to $4.0$, row $x = 0$):

x     3.1     3.2     3.3     3.4     3.5     3.6     3.7     3.8     3.9     4.0
0     0.0450  0.0408  0.0369  0.0334  0.0302  0.0273  0.0247  0.0224  0.0202  0.0183

Option (1)

Question 3

The Poisson distribution was used to model the number of faults in the gearboxes of buses. Suppose the faults
occur at an average of 2.5 per month. The probability that at least 2 faults are found in a month is

1. 0.7127

2. 0.0821

3. 0.2565

4. 0.9692

5. 0.7172

Solution

Given that the mean $\lambda = 2.5$, $P(X \ge 2)$?

$P(X \ge 2) = P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)$
$= 0.2565 + 0.2138 + 0.1336 + 0.0668 + 0.0278 + 0.0099 + 0.0031 + 0.0009 + 0.0002 + 0.0000 + 0.0000$
$= 0.7126$

Using the Poisson distribution tables (extract for $\lambda = 2.1$ to $3.0$, row $x = 0$):

x     2.1     2.2     2.3     2.4     2.5     2.6     2.7     2.8     2.9     3.0
0     0.1225  0.1108  0.1003  0.0907  0.0821  0.0743  0.0672  0.0608  0.0550  0.0498

The remaining entries of the column $\lambda = 2.5$ are:

x     2.5
1     0.2052
2     0.2565
3     0.2138
4     0.1336
5     0.0668
6     0.0278
7     0.0099
8     0.0031
9     0.0009
10    0.0002
11    0.0000
12    0.0000

Alternatively, we know that the probabilities in each column add up to 1. Instead of adding the probabilities
from 2 up to 12, it is easier to add the probabilities for 0 and 1 and subtract the result from 1. (The small
difference between 0.7126 and 0.7127 is due to rounding in the tables.) This means

$P(X \ge 2) = 1 - P(X \le 1)$
$= 1 - [P(0) + P(1)]$
$= 1 - [0.0821 + 0.2052]$
$= 1 - 0.2873$
$= 0.7127$

Option (1)
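
The two answers in this solution (0.7126 from summing rounded table entries and 0.7127 from the complement) can be compared in code. The sketch below is an illustration only; `poisson_pmf` and `poisson_cdf` are hypothetical helper names.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): the sum of the probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.5

# Complement rule: P(X >= 2) = 1 - P(X <= 1), using unrounded probabilities.
at_least_two = 1 - poisson_cdf(1, lam)

# Summing table entries that were each rounded to four decimals first.
rounded_sum = sum(round(poisson_pmf(k, lam), 4) for k in range(2, 13))

print(round(at_least_two, 4), round(rounded_sum, 4))
```

Rounding each term before adding accumulates small errors, which is why the complement route is usually the more accurate of the two.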

Question 4

Assume that X is a Poisson random variable with an average $\lambda = 6.0$.

Which one of the following statements is incorrect?

1. $P(X = 0) = 0.0025$

2. The variance is 2.45

3. $P(X < 3) = 0.062$

4. $P(X > 5) = 0.5543$

5. P 2  X  5  02824

Solution

Using the Poisson statistics tables with a mean $\lambda = 6.0$ (extract for $\lambda = 5.1$ to $6.0$, row $x = 0$):

x     5.1     5.2     5.3     5.4     5.5     5.6     5.7     5.8     5.9     6.0
0     0.0061  0.0055  0.0050  0.0045  0.0041  0.0037  0.0033  0.0030  0.0027  0.0025

The remaining entries of the column $\lambda = 6.0$ are:

x     6.0
1     0.0149
2     0.0446
3     0.0892
4     0.1339
5     0.1606
6     0.1606
7     0.1377
8     0.1033
9     0.0688
10    0.0413
11    0.0225
12    0.0113
13    0.0052
14    0.0022
15    0.0009
16    0.0003
17    0.0001
18    0.0000

1. Correct
$P(X = 0) = 0.0025$

2. Incorrect

The variance is equal to the mean, $\lambda = 6$.

3. Correct

$P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)$
$= 0.0025 + 0.0149 + 0.0446$
$= 0.0620$

4. Correct

There are two ways of doing this: we can add the values from $P(X = 6) = 0.1606$ up to $P(X = 18) = 0.0000$
as indicated in the table above, or we can use the second procedure demonstrated below.

$P(X > 5) = P(X \ge 6)$
$= 1 - P(X \le 5)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)]$
$= 1 - [0.0025 + 0.0149 + 0.0446 + 0.0892 + 0.1339 + 0.1606]$
$= 1 - 0.4457 = 0.5543$

5. Correct

P2  X  5  P1  X  4  PX  4  P0


 PX  0  PX  1  PX  2  PX  3  PX  4  P X  0
 00025  00147  00446  00892  01339  00025
 02824

Option(2)

Question 5

A random variable Y has the Poisson distribution with parameter 3.8. The probability $P(Y \le 5)$ is then equal
to

1. 0.1477

2. 0.8156

3. 0.1944

4. 0.8344

5. 0.1844

Solution

We are given:

The mean $\lambda = 3.8$, $P(Y \le 5)$?

$P(Y \le 5) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4) + P(Y = 5)$
$= 0.0224 + 0.0850 + 0.1615 + 0.2046 + 0.1944 + 0.1477$
$= 0.8156$

Option (2)

Question 6

Tornadoes for January in Kansas follow a Poisson distribution with an average of 3.2 per month. The
probability that in the next January Kansas will experience exactly 2 tornadoes is:

1. 0.7913

2. 0.4076

3. 0.2087

4. 0.1304

5. 0.2226

Solution

The mean $\lambda = 3.2$, $P(X = 2)$?

$P(X = 2) = 0.2087$ using the statistics tables

Option (3)

Question 7

A cashier at Caster’s cafeteria can total an average of 1.2 trays per minute.

Which one of the following statements is incorrect?

1. The problem follows a Poisson distribution.

2. The variance of this problem is 1.2 trays per minute.

3. The probability that the cashier will total exactly zero trays per minute is 0.3012.

4. The probability that the cashier will total at least four trays per minute is 0.0338.

5. The probability that the cashier will total at most three trays per minute is 0.8795.

Solution

The mean $\lambda = 1.2$

1. Correct
The average number of occurrences is given as 1.2 per interval of time.

2. Correct
The variance is equal to the mean.

3. Correct
$P(X = 0) = 0.3012$

4. Correct

$P(X \ge 4) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
$= 1 - [0.3012 + 0.3614 + 0.2169 + 0.0867]$
$= 1 - 0.9662 = 0.0338$

5. Incorrect

$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= 0.3012 + 0.3614 + 0.2169 + 0.0867$
$= 0.9662$

Option (5)

Question 8

The Poisson distribution was used to model the number of faults that arise in the gearboxes of buses. Suppose the
faults occur at an average rate of 2.5 per month. The probability that at least one fault is found in a month,
$P(X \ge 1)$, is

1. 0.2052

2. 0.2873

3. 0.7127

4. 0.9179

5. 0.9197

Solution

The mean $\lambda = 2.5$

$P(X \ge 1) = 1 - P(X = 0)$
$= 1 - 0.0821$
$= 0.9179$

Option (4)

Chapter 6

Continuous Probability Distribution

6.1 Normal Probability Distribution

SUMMARY

• A continuous probability distribution describes a random variable that is continuous.

• In this module we explore only the normal probability distribution, without discussing other continuous
probability distributions.

• A normal distribution is a symmetric, bell-shaped curve centred at its mean value.

• A normal distribution has two parameters: the mean $\mu$ and the variance $\sigma^2$. We denote a normal
random variable by $N(\mu, \sigma^2)$.

• To calculate a probability for a normal random variable, we first standardise it.
Standardisation transforms the values from a normal population so that they have a
standard mean and variance. By so doing we remove the unit of measurement of the variable (for
example kilograms, metres or seconds), so that the standardised values have no unit.

• By standardising a normal random variable X with mean $\mu$ and standard deviation $\sigma$, we create a
random variable called Z that has a standard normal distribution.

• The standard normal variable is a normal random variable with a mean equal to zero and a standard
deviation equal to 1. We denote the standard normal variable by $Z \sim N(0, 1)$.

• To convert a normal variable $X \sim N(\mu, \sigma^2)$ into the standard normal variable $Z \sim N(0, 1)$
we use the formula $Z = \frac{X - \mu}{\sigma}$.

• The standard normal curve is symmetric, bell-shaped and asymptotic, and the total area under the
curve is equal to 1.

• We use the standard normal Z statistics tables to calculate the probabilities.
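
The standardisation step and the table lookup described in this summary can be sketched in Python. The cumulative area is computed from the standard library's error function `math.erf` rather than read from a table; the function names `standardise` and `phi` and the example values (mean 100, standard deviation 20, value 145) are assumptions chosen for illustration.

```python
from math import erf, sqrt

def standardise(x, mu, sigma):
    """Z = (X - mu) / sigma: convert a normal value to a unit-free Z-score."""
    return (x - mu) / sigma

def phi(z):
    """Area under the standard normal curve to the left of z, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = standardise(145, 100, 20)   # Z-score of X = 145
lower_tail = phi(z)             # P(X < 145)
upper_tail = 1 - phi(z)         # P(X > 145), using total area = 1

print(z, round(lower_tail, 4), round(upper_tail, 4))
```

Because the curve is symmetric, `phi(-z)` equals `1 - phi(z)`, which is the property the positive and negative Z tables rely on.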

Activities

Question 1

Using the standardised normal Z table, calculate $P(Z > -1.65)$, where Z is normally distributed with a mean
equal to 0 and a variance equal to 1. The correct answer is

1. 0.0495

2. 0.0548

3. 0.9452

4. 0.9505

5. 0.6139

Solution

$P(Z > -1.65)$?

Z is normally distributed with a mean equal to zero and a variance equal to 1.

[Figure: standard normal curve with the area to the right of Z = -1.65 shaded; shaded area = 0.9505]

Because we shade a big area, we use the positive Z standardised normal table.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
:
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545

$P(Z > -1.65) = 0.9505$

Option (4)

Question 2

The long-distance calls made by the employees of a company are normally distributed with a mean of 6.3
minutes and a standard deviation of 22 minutes. Use the normal standardised table to calculate the probability
that a call lasts less than 7 minutes.

1. 06179

2. 06293

3. 03821

4. 04880

5. 05120

Solution

The population mean $\mu = 6.3$. The population standard deviation $\sigma = 22$. $P(X < 7)$?

Let us convert the random variable X into Z, using the standardisation formula $Z = \frac{X - \mu}{\sigma}$

$P(X < 7) = P\left(Z < \frac{7 - 6.3}{22}\right) = P(Z < 0.03)$

[Figure: standard normal curve with the area to the left of Z = 0.03 shaded; shaded area = 0.5120]

Because we shade a big area under the normal curve, we use the positive Z standardised tables.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
:
3.0

$P(Z < 0.03) = 0.5120$

Option (5)

Question 3

Suppose Z is normally distributed with a mean $\mu = 0$ and a variance $\sigma^2 = 1$. Calculate $P(Z < -1.59)$.

1. 0.9441

2. 0.0559

3. 0.0668

4. 0.1469

5. 0.0559

Solution

$P(Z < -1.59)$?

[Figure: standard normal curve with the area to the left of Z = -1.59 shaded; shaded area = 0.0559]

Because we shade a small area under the normal curve, we use the negative Z standardised tables.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0   0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
:
-1.5   0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
:
-0.0

$P(Z < -1.59) = 0.0559$

Option (2)

Question 4

X is normally distributed with mean $\mu = 100$ and standard deviation $\sigma = 20$. What is the probability
that X is greater than 145?

1. 0.9878

2. 0.0139

3. 0.9778

4. 0.9861

5. 0.0122

Solution

From the given information

The population mean $\mu = 100$. The population standard deviation $\sigma = 20$. $P(X > 145)$?

Let us convert the random variable X into Z, using the standardisation formula $Z = \frac{X - \mu}{\sigma}$

$P(X > 145) = P\left(Z > \frac{145 - 100}{20}\right) = P(Z > 2.25)$

[Figure: standard normal curve with the area to the right of Z = 2.25 shaded; shaded area = 0.0122]

Because we shade a small area under the normal curve, we use the negative Z standardised tables: by symmetry,
$P(Z > 2.25) = P(Z < -2.25)$.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
:
-2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110

$P(Z > 2.25) = 0.0122$

Option (5)

Question 5

The distribution that has a mean of zero and a standard deviation of one is called the

1. binomial probability distribution.

2. Poisson probability distribution.

3. standard normal distribution.

4. frequency distribution.

5. skewed distribution.

Solution

Option (3)

Question 6

If the Z-score is given as $Z = 1.96$, and X is normally distributed with a mean $\mu = 60$
and a standard deviation $\sigma = 6$, then the x-value that this Z-score corresponds to is

Solution

From the given information

$Z = 1.96$, $\sigma = 6$, $\mu = 60$, $X = ?$

The standardised $Z = \frac{X - \mu}{\sigma}$

$1.96 = \frac{X - 60}{6}$

Multiply both sides by 6 to solve the equation:

$6 \times 1.96 = 6 \times \frac{X - 60}{6}$

$11.76 = X - 60$

$11.76 + 60 = X$

$71.76 = X$

Option (1)
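
Solving the standardisation formula for X, as done above, gives $X = \mu + Z\sigma$. A minimal Python sketch of this reverse conversion follows; the function name `x_from_z` is an assumption made for the example.

```python
def x_from_z(z, mu, sigma):
    """Invert Z = (X - mu) / sigma to recover the raw value X."""
    return mu + z * sigma

# Question 6: Z = 1.96, mean 60, standard deviation 6.
x = x_from_z(1.96, 60, 6)
print(round(x, 2))
```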

Question 7

For a random variable Z from the standard normal distribution with a mean $\mu = 0$ and a standard deviation
$\sigma = 1$, which one of the following is incorrect?

1. $P(Z < -1.52) = 0.0643$

2. $P(Z < 1.48) = 0.9306$

3. $P(Z > -0.43) = 0.6664$

4. $P(Z > 0.74) = 0.7704$

5. $P(2.10 < Z < 2.34) = 0.0083$

Solution

1. Correct
$P(Z < -1.52) = 0.0643$

[Figure: standard normal curve with the area to the left of Z = -1.52 shaded; shaded area = 0.0643]

Because we have shaded a small area, we use the negative Z standardised normal table.

2. Correct
$P(Z < 1.48) = 0.9306$

[Figure: standard normal curve with the area to the left of Z = 1.48 shaded; shaded area = 0.9306]

We use the positive Z standardised normal tables.

3. Correct

$P(Z > -0.43) = 0.6664$

[Figure: standard normal curve with the area to the right of Z = -0.43 shaded; shaded area = 0.6664]

We have used the positive Z standardised normal tables.

4. Incorrect

$P(Z > 0.74) = 0.2296$, not 0.7704.

[Figure: standard normal curve with the area to the right of Z = 0.74 shaded; shaded area = 0.2296]

We have used the negative Z tables.

5. Correct

$P(2.10 < Z < 2.34) = P(Z < 2.34) - P(Z < 2.10)$
$= 0.9904 - 0.9821$
$= 0.0083$

[Figure: standard normal curve with the area between Z = 2.10 and Z = 2.34 shaded; area to the left of 2.10 = 0.9821, area to the left of 2.34 = 0.9904]

In this case we read each value according to the sign of the number: a negative Z value is read from
the negative Z tables and a positive Z value from the positive Z tables, and then we subtract the two values
obtained. Here we have used the positive Z tables because both Z values, 2.10 and 2.34, are positive.

Option (4)

Question 8

For a particular group of scores, the calculated mean and standard deviation are 20 and 5 respectively. The Z
score for a raw score of 30 is

1. 55

2. 2

3. 26

4. 10

5. 2

Solution

  20  5 X  30

X 
The standardised Z 

30  20
Z 2
5

Option (2)

Question 9

A psychologist has been studying eye fatigue using a particular measure, which she administers to students
after they have worked for 1 hour writing on a computer. On this measure she has found that the distribution
follows a normal curve. Using a normal probabilities table, what is the probability of students having a Z-score
below 1.5?

1. 0.0668

2. 0.9394

3. 0.9332

4. 0.0606

5. 0.9345

Solution

$P(Z < 1.5)$?

[Figure: standard normal curve with the area to the left of Z = 1.5 shaded; shaded area = 0.9332]

Because we have shaded a big area, we have to use the positive Z standardised tables.

Option (3)

Question 10

The average annual salary of a high school teacher is R43 000. Let teacher salaries be normally distributed with a
standard deviation of R18 000. Calculate $P(X > \text{R}80\,000)$.

1. 0.0228

2. 0.9803

3. 2.06

4. 0.9772

5. 0.0197

Solution

From the given information

The population mean $\mu = 43\,000$. The population standard deviation $\sigma = 18\,000$. $P(X > 80\,000)$?

Let us convert the random variable X into Z, using the standardised $Z = \frac{X - \mu}{\sigma}$

$P(X > 80\,000) = P\left(Z > \frac{80\,000 - 43\,000}{18\,000}\right) = P(Z > 2.06)$

[Figure: standard normal curve with the area to the right of Z = 2.06 shaded; shaded area = 0.0197]

Because we shade a small area, we have to use the negative Z standardised tables.

Option (5)

Question 11

If the area to the right of a positive $Z_1$ is 0.0869, then the value of $Z_1$ must be

1. -1.36

2. 1.36

3. 0.5319

4. 0.08

5. 1.80

Solution

[Figure: standard normal curve with the area to the right of the unknown positive Z value shaded; shaded area = 0.0869]

We are given the area (the probability), which is equal to 0.0869, and are asked to find the
corresponding value of Z. Because the area is small, we use the negative Z standardised tables,
knowing that the standardised normal distribution is symmetric.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
:
-1.3   0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823

The area 0.0869 lies in row -1.3 and column 0.06, so by symmetry the value of $Z_1$ is $1.3 + 0.06 = 1.36$.

Option (2)

Question 12

Given that Z is a standard normal variable, the variance of Z

1. is always greater than 20

2. is always greater that 10

3. is always equal to 10

4. is always equal to zero

5. Cannot be calculated

Solution

Option (3)

Question 13

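Suppose that Z is normally distributed with a mean $\mu = 0$ and a variance equal to 1.

Which of the following statements is incorrect?

1. $P(Z > 2.64) = 0.0041$

2. $P(Z < -0.87) = 0.1922$

3. $P(-1.4 < Z < 1.4) = 0.8384$

4. $P(Z < -2.8) = 0.9974$

5. $P(Z < -0.74) = P(Z > 0.74)$

Solution

1. Correct

[Figure: standard normal curve with the area to the right of Z = 2.64 shaded; shaded area = 0.0041]

2. Correct

[Figure: standard normal curve with the area to the left of Z = -0.87 shaded; shaded area = 0.1922]

3. Correct

$P(-1.4 < Z < 1.4) = P(Z < 1.4) - P(Z < -1.4)$

$= 0.9192 - 0.0808$

$= 0.8384$

[Figure: standard normal curve with the area between Z = -1.4 and Z = 1.4 shaded; area to the left of -1.4 = 0.0808, area to the left of 1.4 = 0.9192]

4. Incorrect

[Figure: standard normal curve with the area to the left of Z = -2.8 shaded; shaded area = 0.0026]

$P(Z < -2.8) = 0.0026$

5. Correct

$P(Z < -0.74) = 0.2296$

$P(Z > 0.74) = 0.2296$

[Figure: standard normal curve with the area to the left of Z = -0.74 shaded; shaded area = 0.2296]

[Figure: standard normal curve with the area to the right of Z = 0.74 shaded; shaded area = 0.2296]

Option (4)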

Question 14

The owner of an appliance store uses a normal distribution with mean 10 and variance 9 to model the weekly
net sales. Calculate $P(X < 3.5)$.

1. 2167

2. 00062

3. 02358

4. 09850

5. 00150

Solution


The population mean $\mu = 10$. The population standard deviation $\sigma = \sqrt{9} = 3$. $P(X < 3.5)$?

Let us convert the random variable X into Z, using the standardised $Z = \frac{X - \mu}{\sigma}$

$P(X < 3.5) = P\left(Z < \frac{3.5 - 10}{3}\right) = P(Z < -2.17) = 0.0150$

[Figure: standard normal curve with the area to the left of Z = -2.17 shaded; shaded area = 0.0150]

Because we shade a small area under the normal curve, we use the negative Z standardised tables.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
:
-2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
:

Option (5)

Question 15

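Suppose Z is normally distributed with mean $\mu = 0$ and variance $\sigma^2 = 1$.

Which one of the following statements is correct?

1. $P(Z > -1.60) = 0.0548$

2. $P(Z > 1.55) = 0.0606$

3. $P(-1.40 < Z < 0.60) = 0.6494$

4. $P(Z < 2.09) = 0.0019$

5. $P(Z < -1.67) = 0.9525$

Solution

1. Incorrect

$P(Z > -1.60) = 0.9452$

[Figure: standard normal curve with the area to the right of Z = -1.60 shaded; shaded area = 0.9452]

2. Correct

$P(Z > 1.55) = 0.0606$

[Figure: standard normal curve with the area to the right of Z = 1.55 shaded; shaded area = 0.0606]

3. Incorrect

$P(-1.40 < Z < 0.60) = P(Z < 0.60) - P(Z < -1.40)$

$= 0.7257 - 0.0808$

$= 0.6449$

[Figure: standard normal curve with the area between Z = -1.40 and Z = 0.60 shaded; area to the left of -1.40 = 0.0808, area to the left of 0.60 = 0.7257]

4. Incorrect

$P(Z < 2.09) = 0.9817$

[Figure: standard normal curve with the area to the left of Z = 2.09 shaded; shaded area = 0.9817]

5. Incorrect

$P(Z < -1.67) = 0.0475$

[Figure: standard normal curve with the area to the left of Z = -1.67 shaded; shaded area = 0.0475]

Option (2)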

Question 16

The weights of a large group of high school students are normally distributed with a mean of
55 kg and a standard deviation of 5 kg. What is the probability that the weight of a student from this
group will be more than 63 kg?

1. 0.9452

2. 0.0458

3. 0.1446

4. 0.0548

5. 0.8554

Solution

The population mean $\mu = 55$. The population standard deviation $\sigma = 5$. $P(X > 63)$?

Let us convert the random variable X into Z, using the standardised $Z = \frac{X - \mu}{\sigma}$

$P(X > 63) = P\left(Z > \frac{63 - 55}{5}\right) = P(Z > 1.60) = 1 - P(Z < 1.60) = 1 - 0.9452 = 0.0548$

[Figure: standard normal curve marked at Z = 1.60; area to the left of 1.60 = 0.9452, so the area to the right = 0.0548]

Option (4)

Question 17

A courier service company has found that their delivery time of parcels to clients is normally distributed with
a mean of 45 minutes and a standard deviation of 8 minutes. The probability that a randomly selected parcel
will take less than 48 minutes to deliver is

1. 0.375

2. 0.32

3. 0.0648

4. 0.3520

5. 0.6480

Solution

The population mean $\mu = 45$. The population standard deviation $\sigma = 8$. $P(X < 48)$?

Let us convert the random variable X into Z, using the standardised $Z = \frac{X - \mu}{\sigma}$

$P(X < 48) = P\left(Z < \frac{48 - 45}{8}\right) = P(Z < 0.375) \approx P(Z < 0.38) = 0.6480$

[Figure: standard normal curve with the area to the left of Z = 0.38 shaded; shaded area = 0.6480]

Option (5)

Chapter 7

Sampling Distribution of the Mean and Proportions

SUMMARY

• We explore the concept of taking a sample from a population, and discuss the distributional properties
of the sample mean and sample proportion calculated from these samples.

• The sample mean is normally distributed when the underlying data are normally distributed.

• The idea behind probability sampling is random selection, and for this reason we typically call these
random samples.

• In this module we will not discuss the categories of probability samples, such as simple random, systematic,
stratified and cluster samples.

• A sampling distribution describes how a sample statistic varies when calculated from all samples of size
n drawn from a single population.

7.1 Sampling Distribution of the Mean

SUMMARY

• The sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$, because the mean (expected
value) of all sample means of size n selected from the population is equal to the population mean $\mu$.
This is denoted by $E(\bar{X}) = \mu$.

• A statistic is said to be an unbiased estimator of a population parameter when the mean (expected value)
of its sampling distribution is equal to that parameter; that is, on
average, the statistic estimates the correct value.

• If the variance of the variable X is $\sigma^2$, then the variance of the sample mean $\bar{X}$ is
$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$.

• The standard deviation of the mean $\bar{X}$, which we call the standard error of $\bar{X}$, is
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

• If a random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$, denoted
by $X \sim N(\mu, \sigma^2)$, then the random variable $\bar{X}$ is also normally distributed, with mean $\mu$ and
variance $\frac{\sigma^2}{n}$. This is denoted by $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

• To calculate a probability for the random variable $\bar{X}$, we have to convert $\bar{X}$ into the standardised
Z-value given by the formula $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.

• The test statistic is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.
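
The standard error and the Z-value for a sample mean, as summarised above, can be computed as follows. This is a sketch only: the function names are assumptions, the normal area is obtained from the standard library's `math.erf` instead of the Z tables, and the example values (mean 1000, standard deviation 200, sample size 16) are chosen for illustration.

```python
from math import erf, sqrt

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def prob_sample_mean_above(xbar, mu, sigma, n):
    """P(sample mean > xbar) for samples of size n from N(mu, sigma^2)."""
    z = (xbar - mu) / standard_error(sigma, n)
    return 1 - phi(z)

se = standard_error(200, 16)
prob = prob_sample_mean_above(1050, 1000, 200, 16)
print(se, round(prob, 4))
```

Note how the denominator is the standard error rather than the population standard deviation; larger samples give a smaller standard error and therefore a larger Z-value for the same difference.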

Activities

Question 1

A sample of $n = 16$ observations is drawn from a normal population with a mean $\mu = 1000$ and a standard
deviation $\sigma = 200$. Use the standardised normal Z tables to calculate the probability $P(\bar{X} > 1050)$.

1. 0.8413

2. 0.1587

3. 0.5398

4. 0.8438

5. 0.4602

Solution

The given information is

The sample size $n = 16$. The population mean $\mu = 1000$. The population standard deviation $\sigma = 200$.

$P(\bar{X} > 1050)$?

Step 1

Because we are given the random variable $\bar{X}$, we need to convert $\bar{X}$ into Z by using the formula of the test
statistic $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, as developed below.

If $\bar{X} = 1050$, what is the corresponding value of Z?

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{1050 - 1000}{200 / \sqrt{16}} = \frac{50}{50} = 1$

Step 2

$P(\bar{X} > 1050)$ can be written as $P(Z > 1)$, and this takes us back to the concept of chapter 6. To solve this
probability we draw a normal graph as shown below.

[Figure: standard normal curve with the area to the right of Z = 1.00 shaded; shaded area = 0.1587]

Because we shade a small area under the normal curve, we use the negative Z table to calculate the probability.

$P(Z > 1.00) = 0.1587$

Option (2)

Question 2

A normally distributed population has a mean of 40 and a standard deviation of 12. If the sample size is 100,
the standard error of the sample mean is equal to:

1. 4

2. 28

3. 12

4. 02

5. 12

140
Solution

The population mean   40

The population standard deviation   12

The sample size n  100

 12
The standard error  X      12
n 100

Option (3)

Question 3

For a random variable that is normally distributed, with a population mean $\mu = 80$ and a population standard
deviation $\sigma = 10$, the probability that a simple random sample of 25 items will have a mean that is less than
85 is

1. 0.9798

2. 0.9938

3. 0.0062

4. 0.0054

5. 0.0202

Solution

The given information is as follows:

• The random variable is normally distributed

• The population mean $\mu = 80$

• The population standard deviation $\sigma = 10$

• The sample size n = 25

• The question is $P(\bar{X} < 85) = ?$

Step 1

Because we are given the sample mean $\bar{X}$, we need to convert $\bar{X}$ into Z by using the formula of the test
statistic $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.

If $\bar{X} = 85$, the corresponding value of Z is

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{85 - 80}{10 / \sqrt{25}} = \dfrac{5}{2} = 2.5$

Step 2

$P(\bar{X} < 85)$ can now be written as $P(Z < 2.5)$, and this takes us back to the concepts of chapter 6. To solve
this probability we draw a normal graph.

[Figure: standard normal curve with the area to the left of Z = 2.5 shaded; shaded area = 0.9938]

Because we shade a big area under the normal curve, we use the positive Z-table.

$P(Z < 2.50) = 0.9938$

Option (2)
Question 4

Which one of the following statements is incorrect?

1. The sampling distribution of the mean will have the same mean as the original population from which the
samples were drawn.

2. The standard deviation of the sampling distribution of the mean is also called the standard error.

3. When the population mean $\mu = 160$, the population standard deviation $\sigma = 25$ and n = 64, the standard
error is 3.125.

4. A confidence interval is an estimate for which there is a specified degree of certainty that the population
parameter will be in the interval.

5. We use the t-distribution for the statistical inference of the population mean when the population
standard deviation is known under the assumption that the population is normally distributed.

Solution

1. Correct

The mean (expected value) of all sample means of size n selected from the population is equal
to the population mean $\mu$. This is denoted by $E(\bar{X}) = \mu$.

2. Correct

3. Correct

When $\mu = 160$, $\sigma = 25$ and n = 64, the standard error is

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{25}{\sqrt{64}} = 3.125$

4. Correct

5. Incorrect
We use the t-distribution when the population standard deviation is unknown but the sample standard
deviation is known.

Option (5)
Question 5

For a random variable that is normally distributed, with a population mean $\mu = 80$ and a population standard
deviation $\sigma = 10$, the probability that a simple random sample of 25 items will have a mean that is between
79 and 85 is

1. 0.9938

2. 0.3085

3. 0.6853

4. 0.4997

5. 0.6835

Solution

We are given

The population mean $\mu = 80$. The population standard deviation $\sigma = 10$. The sample size n = 25.

The question is $P(79 < \bar{X} < 85) = ?$

Step 1

Because the sample mean $\bar{X}$ is given, we need to convert $\bar{X}$ into Z by using the formula of the test
statistic $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.

When $\bar{X} = 79$, the corresponding value of Z is

$Z = \dfrac{79 - 80}{10 / \sqrt{25}} = \dfrac{-1}{2} = -0.5$

When $\bar{X} = 85$, the corresponding value of Z is

$Z = \dfrac{85 - 80}{10 / \sqrt{25}} = \dfrac{5}{2} = 2.5$

Step 2

$P(79 < \bar{X} < 85)$ can now be written as $P(-0.5 < Z < 2.5)$, and this takes us back to the concepts of
chapter 6. To solve this probability we draw a standardised normal graph.

Because the shaded area lies between the two Z-values, the probability is equal to the difference between
$P(Z < 2.5)$ and $P(Z < -0.5)$. Each value is read from the table that matches its sign: the negative Z-value
from the negative Z-table and the positive Z-value from the positive Z-table.

$P(-0.5 < Z < 2.5) = P(Z < 2.5) - P(Z < -0.5)$

$= 0.9938 - 0.3085$

$= 0.6853$

[Figure: standard normal curve with the area between Z = -0.5 and Z = 2.5 shaded; area below -0.5 = 0.3085,
area below 2.5 = 0.9938, shaded area = 0.6853]

Option (3)
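The "difference of two cumulative probabilities" idea used in this solution can be sketched as follows. This is a minimal illustration with a hypothetical function name; it works directly with the sampling distribution $N(\mu, \sigma^2/n)$ through `statistics.NormalDist` rather than with the printed tables.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(lower: float, upper: float,
                      mu: float, sigma: float, n: int) -> float:
    """P(lower < sample mean < upper) under the normal sampling distribution."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))  # mean mu, std. error sigma/sqrt(n)
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)


# Values from the question above: mu = 80, sigma = 10, n = 25
p = prob_mean_between(79, 85, mu=80, sigma=10, n=25)
print(round(p, 4))  # 0.6853
```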

Question 6

A random sample of size n = 12 was selected from a population and the data are as follows:

36 61 47 23 51 82 71 12 71 65 42 50

The sample mean is 50.92 and the sample standard deviation is 20.64.

The standard error (or the standard deviation of the mean) is equal to

1. 1.720

2. 3.5490

3. 0.379

4. 20.637

5. 5.9583

Solution

The data: 36 61 47 23 51 82 71 12 71 65 42 50, with n = 12

The sample mean

$\bar{X} = \dfrac{\sum X}{n} = \dfrac{36 + 61 + 47 + 23 + 51 + 82 + 71 + 12 + 71 + 65 + 42 + 50}{12} = \dfrac{611}{12} = 50.9167$

The sample variance

$S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} = \dfrac{(36 - 50.9167)^2 + (61 - 50.9167)^2 + \dots + (50 - 50.9167)^2}{12 - 1} = 425.9015$

The sample standard deviation is $S = \sqrt{425.9015} = 20.6374$

The standard error $\sigma_{\bar{X}} = \dfrac{S}{\sqrt{n}} = \dfrac{20.6374}{\sqrt{12}} = 5.9575$

With the rounded value S = 20.64 given in the question, $\dfrac{20.64}{\sqrt{12}} = 5.9583$.

Option (5)
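The hand calculation above can be reproduced with Python's standard library. This is a sketch only; `standard_error_from_sample` is a hypothetical helper name, and `statistics.stdev` uses the same n - 1 divisor as the sample variance formula in the solution.

```python
from math import sqrt
from statistics import mean, stdev

data = [36, 61, 47, 23, 51, 82, 71, 12, 71, 65, 42, 50]


def standard_error_from_sample(values: list) -> float:
    """Estimated standard error of the mean: S / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


print(round(mean(data), 4))                        # 50.9167
print(round(stdev(data), 4))                       # 20.6374
print(round(standard_error_from_sample(data), 4))  # 5.9575
```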

Question 7

The amount of time it takes to complete a final examination is normally distributed with a mean of 75
minutes and a standard deviation of 8 minutes. If 64 students were randomly sampled, the probability that
the sample mean of the sampled students exceeds 76 minutes is

1. 0.8413

2. 0.4602

3. 0.1587

4. 0.5398

5. 0.1578

Solution

The mean $\mu = 75$, $\sigma = 8$, n = 64, $P(\bar{X} > 76) = ?$

$P(\bar{X} > 76) = P\left(Z > \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \dfrac{76 - 75}{8 / \sqrt{64}}\right) = P\left(Z > \dfrac{1}{1}\right) = P(Z > 1) = 0.1587$

[Figure: standard normal curve with the area to the right of Z = 1.00 shaded; shaded area = 0.1587]

Option (3)
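Question 8

Employees in a large manufacturing plant worked an average of 62.0 hours of overtime last year, with a
standard deviation of 15.0 hours. For a random sample of 36 employees, the probability that the average
number of overtime hours will be greater than 58 hours is

1. 0.4452

2. 0.3936

3. 0.0548

4. 0.9452

5. 0.4364

Solution

The population mean $\mu = 62.0$, $\sigma = 15.0$, n = 36, $P(\bar{X} > 58) = ?$

$P(\bar{X} > 58) = P\left(Z > \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \dfrac{58 - 62}{15 / \sqrt{36}}\right) = P\left(Z > \dfrac{-4}{2.5}\right) = P(Z > -1.6)$

The negative Z-table gives $P(Z < -1.6) = 0.0548$, so

$P(Z > -1.6) = 1 - 0.0548 = 0.9452$

[Figure: standard normal curve with the area to the right of Z = -1.6 shaded; unshaded left tail = 0.0548]

Option (4)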
Question 9

Given a normal population whose mean is 50 and whose standard deviation is 5, find the probability that a
random sample of 25 has a mean greater than 52.

1. 0.9772

2. 0.9778

3. 0.0228

4. 0.0222

5. 0.2280

Solution

$\mu = 50$, $\sigma = 5$, n = 25, $P(\bar{X} > 52) = ?$

$P(\bar{X} > 52) = P\left(Z > \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \dfrac{52 - 50}{5 / \sqrt{25}}\right) = P\left(Z > \dfrac{2}{1}\right) = P(Z > 2) = 0.0228$

[Figure: standard normal curve with the area to the right of Z = 2.00 shaded; shaded area = 0.0228]

Option (3)
Question 10

A random sample of size n = 12 was selected from a population and the data are as follows:

36 61 47 23 51 82 71 12 71 65 42 50
Mean $\bar{x} = 50.92$. Standard deviation S = 20.64.

The standard error of the sample mean is equal to

1. 1.720

2. 3.5490

3. 0.379

4. 20.637

5. 5.9583

Solution

The standard error of the sample mean is equal to $\dfrac{S}{\sqrt{n}} = \dfrac{20.64}{\sqrt{12}} = 5.9583$

Option (5)
7.2 Sampling Distribution of the Proportion

SUMMARY

• The population proportion is denoted by $\pi$.

• The sample proportion is denoted by p.

• The sampling distribution of the sample proportion is approximated by the normal distribution, with
mean $\mu_p = \pi$.

• The standard deviation of the sampling distribution (also called the standard error of the proportion) when
$\pi$ is known is given by $\sigma_p = \sqrt{\dfrac{\pi (1 - \pi)}{n}}$.

• The standard deviation of the sampling distribution (also called the standard error of the proportion) when
$\pi$ is unknown is given by $\sigma_p = \sqrt{\dfrac{p (1 - p)}{n}}$.

• The approximation for the sampling distribution of the proportion is valid if:

1. the population proportion of success, $\pi$, is not too close to 0 or 1.

2. the sample size, n, is reasonably large.

3. the products $n\pi$ and $n(1 - \pi)$ are at least 5.

• The standardised Z-value of the sample proportion is given by the formula commonly called the test statistic:
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$

• We can now conduct probability calculations for the sample proportion in the same way as for the
sample mean.
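As a rough illustration of the summary above, the following sketch computes the standard error of a proportion, checks the rule of thumb for the normal approximation, and evaluates a lower-tail probability. The function names and the example values ($\pi = 0.80$, n = 100) are chosen for this sketch, and `statistics.NormalDist` stands in for the Z-tables.

```python
from math import sqrt
from statistics import NormalDist


def proportion_standard_error(pi: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)


def normal_approximation_ok(pi: float, n: int) -> bool:
    """Rule of thumb from the summary: n*pi and n*(1 - pi) are both at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5


def prob_proportion_less(p: float, pi: float, n: int) -> float:
    """P(sample proportion < p) using the normal approximation."""
    z = (p - pi) / proportion_standard_error(pi, n)
    return NormalDist().cdf(z)


print(normal_approximation_ok(0.80, 100))               # True
print(round(proportion_standard_error(0.80, 100), 4))   # 0.04
print(round(prob_proportion_less(0.75, 0.80, 100), 4))  # 0.1056
```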

Activities

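Question 1

In a binomial experiment with the sample size n = 300 and the sample proportion p = 0.5, the standard
error for the proportion p is

1. 0.0283

2. 0.0008

3. 0.1238

4. 0.0016

5. 0.2886

Solution

The sample size n = 300. The sample proportion p = 0.5. The standard error for the proportion p = ?

The standard error of the proportion $\sigma_p = \sqrt{\dfrac{p (1 - p)}{n}} = \sqrt{\dfrac{0.5 (1 - 0.5)}{300}} = \sqrt{0.000833} = 0.0289$

The listed value 0.0283 corresponds to p = 0.4, since $\sqrt{\dfrac{0.4 (1 - 0.4)}{300}} = \sqrt{0.0008} = 0.0283$; it is the
closest of the given options.

Option (1)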

Question 2

Determine the probability that in a sample of 100 the sample proportion is less than 0.75 if $\pi = 0.80$:

1. 1.25

2. 0.04

3. 0.1056

4. 0.1469

5. 0.8944

Solution

$P(p < 0.75) = ?$, $\pi = 0.80$, n = 100

Step 1

Because the sample proportion p is given, we need to convert p into Z by using the formula of the test statistic
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$.

When p = 0.75, the corresponding Z-value is

$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}} = \dfrac{0.75 - 0.80}{\sqrt{\dfrac{0.80 (1 - 0.80)}{100}}} = \dfrac{-0.05}{0.04} = -1.25$

Step 2

$P(p < 0.75)$ can now be written as $P(Z < -1.25)$, and this takes us back to the concepts of chapter 6. To
solve this probability we draw a standardised normal graph.

[Figure: standard normal curve with the area to the left of Z = -1.25 shaded; shaded area = 0.1056]

Because we shade a small area, we use the negative Z-table: $P(Z < -1.25) = 0.1056$.

Option (3)
Question 3

A random sample of size n = 400 was selected from a binomial population with the population proportion
$\pi = 0.2$. The number of observed successes in the sample is 96.

Which one of the following statements is incorrect?

1. The sample proportion is p = 0.24

2. The standard error is 0.02

3. The test statistic is 2.02

4. $P(p > 0.24) = 0.0228$

5. This is a case of a sampling distribution for a proportion.

Solution

The sample size n = 400. The population proportion $\pi = 0.2$. The number of successes X = 96.

1. Correct

The sample proportion $p = \dfrac{X}{n} = \dfrac{96}{400} = 0.24$

2. Correct

The standard error $\sigma_p = \sqrt{\dfrac{\pi (1 - \pi)}{n}} = \sqrt{\dfrac{0.2 (1 - 0.2)}{400}} = \sqrt{0.0004} = 0.02$

3. Incorrect

The test statistic is $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}} = \dfrac{0.24 - 0.2}{0.02} = \dfrac{0.04}{0.02} = 2$

4. Correct

$P(p > 0.24) = ?$

Let us convert p into Z by using the formula of the test statistic:

$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}} = \dfrac{0.24 - 0.2}{\sqrt{\dfrac{0.2 (1 - 0.2)}{400}}} = \dfrac{0.04}{0.02} = 2$

$P(p > 0.24) = P(Z > 2) = 0.0228$

5. Correct

Option (3)

Question 4

Consider a population proportion $\pi = 0.68$. The standard error of the proportion for n = 20 is

1. 0.1043

2. 0.0109

3. 0.2176

4. 0.1404

5. 0.034

Solution

$\pi = 0.68$, n = 20. The standard error $\sigma_p = ?$

The standard error $\sigma_p = \sqrt{\dfrac{\pi (1 - \pi)}{n}} = \sqrt{\dfrac{0.68 (1 - 0.68)}{20}} = \sqrt{0.01088} = 0.1043$

Option (1)
Question 5

A simple random sample with n = 300 is drawn from a binomial process in which $\pi = 0.4$. The test statistic
for the proportion of success p = 0.35 is

1. 1.7668

2. 0.0283

3. 1.37843

4. -1.7668

5. 1.6786

Solution

n = 300, $\pi = 0.4$, p = 0.35, the test statistic Z = ?

The test statistic is $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$

$Z = \dfrac{0.35 - 0.4}{\sqrt{\dfrac{0.4 (1 - 0.4)}{300}}} = \dfrac{-0.05}{\sqrt{0.0008}} = \dfrac{-0.05}{0.0283} = -1.7668$

Option (4)
Question 6

In a random sample of 85 people from a population, X is the number of left-handed people. In the population
a proportion $\pi = 0.20$ of the people are left-handed. The standard error for the proportion is equal to:

1. 0.0019

2. 0.0434

3. 0.0024

4. 0.80

5. 0.4338

Solution

n = 85. The population proportion $\pi = 0.20$. The standard error $\sigma_p = ?$

The standard error $\sigma_p = \sqrt{\dfrac{\pi (1 - \pi)}{n}} = \sqrt{\dfrac{0.20 (1 - 0.20)}{85}} = \sqrt{0.001882} = 0.0434$

Option (2)

Question 7

The probability of success on any trial of a binomial experiment is 25%. Find the probability that the
proportion of success in a sample of 500 is less than 22%.

1. 0.0606

2. 0.9332

3. 0.9394

4. 0.0668

5. 0.6060

Solution

The population proportion $\pi = 0.25$

The sample proportion p = 0.22

The sample size n = 500

$P(p < 0.22) = ?$

Step 1

Because the sample proportion p is given, we need to convert p into Z by using the formula of the test
statistic $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$.

When p = 0.22, the corresponding Z-value is

$Z = \dfrac{0.22 - 0.25}{\sqrt{\dfrac{0.25 (1 - 0.25)}{500}}} = \dfrac{-0.03}{\sqrt{0.000375}} = \dfrac{-0.03}{0.0194} = -1.55$

Step 2

$P(p < 0.22)$ can now be written as $P(Z < -1.55)$, and this takes us back to the concepts of chapter 6. To
solve this probability we draw a standardised normal graph.

[Figure: standard normal curve with the area to the left of Z = -1.55 shaded; shaded area = 0.0606]

Because we shade a small area, we use the negative Z-table: $P(Z < -1.55) = 0.0606$.

Option (1)
Question 8

A random sample of 50 households was selected for a telephone survey. The key question asked was, “Do you
or any member of your household own a cellular telephone with a built-in camera?” Of the 50 respondents,
15 said yes and 35 said no.

The standard error of the proportion of households with cellular telephones with a built-in camera is

1. 0.3

2. 0.0042

3. 0.0648

4. 0.7

5. 0.1684

Solution

The number of successes X = 15

The sample size n = 50

The sample proportion $p = \dfrac{X}{n} = \dfrac{15}{50} = 0.3$

The standard error $S_p = \sqrt{\dfrac{p (1 - p)}{n}} = \sqrt{\dfrac{0.3 (1 - 0.3)}{50}} = \sqrt{0.0042} = 0.0648$

Option (3)
Chapter 8

Point Estimations and Confidence Intervals

8.1 Point Estimations

• In many populations, we do not know the value of the population mean or proportion. Fortunately, we
can use the sample mean (or proportion), that is, the information provided in a sample, to provide an
estimate of the population value. This is called estimation.

• The objective of estimation is to determine the approximate value of a population parameter on the basis
of the sample statistic.

• To estimate the population value we can use a point estimate or an interval estimate.

8.2 Confidence Interval

• A confidence interval estimate provides a range of possible values that the true parameter value can assume,
along with the degree of confidence that the parameter value lies within the interval.

• We often talk about a 95%, 90% or 99% confidence interval for a parameter value.

• The confidence level is represented by the probability value $1 - \alpha$ associated with a confidence interval;
that means the interval contains the specified parameter with probability $(1 - \alpha)$.

• In this section we will discuss the confidence interval of the mean and the confidence interval of the
proportion.

8.2.1 Confidence Interval of the Mean

• The data have to be normally distributed.

• When the population standard deviation $\sigma$ is known, the confidence interval of the mean is given by
the formula

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

where $Z_{\alpha/2}$ is called the critical value of the confidence interval; this value is obtained from the
standardised normal Z-tables.

• When the population standard deviation $\sigma$ is unknown (but the sample standard deviation S of
the data is given), the confidence interval of the mean is determined by the formula

$\bar{X} \pm t_{(n-1;\,\alpha/2)} \dfrac{S}{\sqrt{n}}$

where $t_{(n-1;\,\alpha/2)}$ is called the critical value of the confidence interval; this value is obtained
from the t-student table. In most cases the sample size n < 30.
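For the case where $\sigma$ is known, the interval can be sketched in a few lines. The function names here are hypothetical, and the critical value comes from `statistics.NormalDist().inv_cdf` rather than from the printed Z-table, so it is 1.95996... instead of the rounded 1.96. The example values (mean 100, $\sigma = 25$, n = 50) are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_confidence_interval(x_bar: float, sigma: float, n: int,
                             confidence: float = 0.95) -> tuple:
    """Interval x_bar +/- Z_{alpha/2} * sigma / sqrt(n), sigma assumed known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = mean_confidence_interval(100, 25, 50, confidence=0.95)
print(round(z_critical(0.95), 2))      # 1.96
print(round(low, 2), round(high, 2))   # 93.07 106.93
```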

Activities

Question 1

A statistics practitioner took a random sample of 50 observations from a population with a standard deviation
of 25 and found a sample mean of 100. The 95% confidence interval of the population mean is

1. 93.07; 106.93

2. 100 ± 6.9296

3. 93.1; 106.93

4. 94.18; 105.82

5. 39.07; 106.29

Solution

The question is about the confidence interval of the mean.

The sample size n = 50

The population standard deviation $\sigma = 25$

The sample mean $\bar{X} = 100$

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$100 \pm Z_{0.05/2} \dfrac{25}{\sqrt{50}}$

$100 \pm Z_{0.025} \times 3.5355$

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

The table value 0.0250 corresponds to Z = -1.9 - 0.06 = -1.96, so the critical value is 1.96.

$100 \pm 1.96 \times 3.5355$

$100 \pm 6.9296$

$(100 - 6.9296;\ 100 + 6.9296)$

$(93.0704;\ 106.9296)$

Option (1)

Question 2

From the information given below, determine the 95% confidence interval of the population mean:

$\bar{X} = 200$, n = 100, S = 5

1. 189.02; 209.8

2. 109.008; 201.92

3. 199.008; 200.992

4. 200 ± 0.992

5. 198; 209

Solution

The sample mean $\bar{X} = 200$

The sample standard deviation S = 5

The sample size n = 100

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm t_{(n-1;\,\alpha/2)} \dfrac{S}{\sqrt{n}}$

$200 \pm t_{(100-1;\,0.05/2)} \dfrac{5}{\sqrt{100}}$

$200 \pm t_{(99;\,0.025)} \times 0.5$

The t-student table

Degrees of freedom  0.10   0.05   0.025  0.01   0.005
1
:
99                  1.290  1.660  1.984  2.365  2.626

The critical value (table value) at 99 degrees of freedom and $\alpha/2 = 0.025$ equals 1.984.

$200 \pm 1.984 \times 0.5$

$200 \pm 0.992$

$(200 - 0.992;\ 200 + 0.992)$

$(199.008;\ 200.992)$

Option (3)
Question 3

A random sample of 25 was drawn from a normal distribution with a population standard deviation of 5. The
sample mean is 80. The 95% confidence interval estimate of the population mean is:

1. 78.04; 81.96

2. 77.936; 82.064

3. 80 ± 1.96

4. 78.40; 81.59

5. 78.04; 81.96

Solution

The sample size n = 25

The population standard deviation $\sigma = 5$

The sample mean $\bar{X} = 80$

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$80 \pm Z_{0.05/2} \dfrac{5}{\sqrt{25}}$

$80 \pm Z_{0.025} \times 1$

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

The table value 0.0250 corresponds to Z = -1.9 - 0.06 = -1.96, so the critical value is 1.96.

$80 \pm 1.96 \times 1$

$80 \pm 1.96$

$(80 - 1.96;\ 80 + 1.96)$

$(78.04;\ 81.96)$

Option (1)

Question 4

It is known that the ages of men in a bar are normally distributed. A random sample of 8 men has a sample
mean $\bar{x} = 43.75$ and a sample standard deviation S = 15.05. The 95% confidence interval of the mean is:

1. 43.75 ± 12.5841

2. 31.1659; 56.3341

3. 13.1659; 65.3344

4. 43.75 ± 10.4291

5. 33.3209; 54.1791

Solution

The sample mean $\bar{X} = 43.75$

The sample standard deviation S = 15.05

The sample size n = 8

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm t_{(n-1;\,\alpha/2)} \dfrac{S}{\sqrt{n}}$

$43.75 \pm t_{(8-1;\,0.05/2)} \dfrac{15.05}{\sqrt{8}}$

$43.75 \pm t_{(7;\,0.025)} \times 5.3210$

The t-student table

Degrees of freedom  0.10   0.05   0.025  0.01   0.005
1
:
7                   1.415  1.895  2.365  2.998  3.499

The critical value (table value) at 7 degrees of freedom and $\alpha/2 = 0.025$ equals 2.365.

$43.75 \pm 2.365 \times 5.3210$

$43.75 \pm 12.5842$

$(43.75 - 12.5842;\ 43.75 + 12.5842)$

$(31.1658;\ 56.3342)$

Option (2)
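The t-based interval can be sketched in the same way. Because Python's standard library has no t-distribution quantile function, this illustration assumes the critical value is read from the t-student table and passed in as an argument; the function name is hypothetical.

```python
from math import sqrt


def t_confidence_interval(x_bar: float, s: float, n: int,
                          t_table_value: float) -> tuple:
    """Interval x_bar +/- t * S / sqrt(n); t is read from the t-table at n - 1 df."""
    margin = t_table_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the question above: mean 43.75, S = 15.05, n = 8, t(7; 0.025) = 2.365
low, high = t_confidence_interval(43.75, 15.05, 8, t_table_value=2.365)
print(round(low, 2), round(high, 2))  # 31.17 56.33
```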

Question 5

A random sample of 25 was drawn from a population. The sample mean and the sample standard deviation
are 510 and 125 respectively.

When calculating a confidence interval of the population mean, the t-distribution should be used because

1. the population standard deviation is known.

2. the population mean is known.

3. the sample standard deviation is known.

4. the population standard deviation is unknown.

5. the sample size is small, regardless of the population standard deviation.

Solution

Option (4)

Question 6

A simple random sample of 30 has been collected from a population for which it is known that the population
standard deviation is $\sigma = 10.0$. The sample mean has been calculated as 240. The 95% confidence interval
for the population mean is

1. 236.4215; 243.5785

2. 240 ± 3.5785

3. 236.2664; 243.7336

4. 236.2451; 243.7558

5. 243.5785; 236.4251

Solution

The sample size n = 30

The population standard deviation $\sigma = 10.0$

The sample mean $\bar{X} = 240$

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$240 \pm Z_{0.05/2} \dfrac{10}{\sqrt{30}}$

$240 \pm Z_{0.025} \times 1.8257$

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

The table value 0.0250 corresponds to Z = -1.9 - 0.06 = -1.96, so the critical value is 1.96.

$240 \pm 1.96 \times 1.8257$

$240 \pm 3.5784$

$(240 - 3.5784;\ 240 + 3.5784)$

$(236.4216;\ 243.5784)$

Option (1)

Question 7

A random sample of 24 observations is used to estimate the population mean. The sample mean and the
sample standard deviation are calculated as 104.6 and 28.8 respectively. The critical value (table value) to
be used to construct a 95% confidence interval for the population mean is

1. 1.96

2. 1.645

3. 2.069

4. 2.064

5. 1.711

Solution

The sample size n = 24

The sample mean $\bar{X} = 104.6$

The sample standard deviation S = 28.8

The level of significance $\alpha = 0.05$

The critical value for the confidence interval equals
$t_{(n-1;\,\alpha/2)} = t_{(24-1;\,0.05/2)} = t_{(23;\,0.025)} = 2.069$

The t-student table

Degrees of freedom  0.10   0.05   0.025  0.01   0.005
1
:
23                  1.319  1.714  2.069  2.500  2.807

The critical value (table value) at 23 degrees of freedom and $\alpha/2 = 0.025$ equals 2.069.

Option (3)
Question 8

A research firm conducted a survey to determine the mean amount steady smokers spend on cigarettes
during a week. They found the distribution of amounts spent per week followed a normal distribution with a
population standard deviation of R5. A random sample of 49 steady smokers revealed that the sample mean
$\bar{X}$ = R20. Determine the 95% confidence interval for $\mu$.

1. 18.60; 21.40

2. 19.37; 20.63

3. 19.80; 20.20

4. 19.83; 20.17

5. 18.825; 21.175

Solution

The sample size n = 49

The population standard deviation $\sigma = 5$

The sample mean $\bar{X} = 20$

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$20 \pm Z_{0.05/2} \dfrac{5}{\sqrt{49}}$

$20 \pm Z_{0.025} \times 0.7143$

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

The table value 0.0250 corresponds to Z = -1.9 - 0.06 = -1.96, so the critical value is 1.96.

$20 \pm 1.96 \times 0.7143$

$20 \pm 1.4000$

$(20 - 1.4000;\ 20 + 1.4000)$

$(18.6;\ 21.4)$

Option (1)

Question 9

A random sample of 40 men drank an average of 20 cups of coffee per week during a final examination,
with a sample standard deviation $S_x$ equal to 6 cups. The lower limit of an appropriate 90% confidence interval
for the population average number of cups of coffee drunk is

1. 20.000

2. 21.7674

3. 19.708

4. 18.2326

5. 18.014

Solution

The sample mean $\bar{X} = 20$

The sample standard deviation S = 6

The sample size n = 40

The confidence interval of the mean is given by the formula below with $\alpha = 0.10$

$\bar{X} \pm t_{(n-1;\,\alpha/2)} \dfrac{S}{\sqrt{n}}$

$20 \pm t_{(40-1;\,0.10/2)} \dfrac{6}{\sqrt{40}}$

$20 \pm t_{(39;\,0.05)} \times 0.9487$

The t-student table

Degrees of freedom  0.10   0.05   0.025  0.01   0.005
1
:
39                  1.303  1.683  2.021  2.423  2.704

The critical value (table value) at 39 degrees of freedom and $\alpha/2 = 0.05$ equals 1.683.

$20 \pm 1.683 \times 0.9487$

$20 \pm 1.5967$

$(20 - 1.5967;\ 20 + 1.5967)$

$(18.4033;\ 21.5967)$

The lower limit is 18.4033 and the upper limit is 21.5967. The listed value 18.2326 corresponds to a margin
of 1.7674 rather than 1.5967; it is the closest of the given options.

Option (4)
Question 10

The principal purpose of a 95% confidence interval for a mean is

1. to estimate a sample mean.

2. to test a hypothesis about a sample mean.

3. to estimate a population mean.

4. to provide an interval that covers 95% of the individual values in the population.

5. to estimate a population proportion.

Solution

Option (3)

Question 11

The average cost per night of a hotel room in Port Elizabeth township is R273. Assume this estimate is based
on a sample of 46 hotels and that the sample standard deviation is R65. The 95% confidence interval of the
population mean cost per night is

1. 253.6984; 292.3016

2. 256.7195; 289.2805

3. 254.0083; 291.9917

4. 265.1248; 287.2631

5. 292.5285; 253.4715

Solution

The sample mean $\bar{X} = 273$

The sample standard deviation S = 65

The sample size n = 46

The confidence interval of the mean is given by the formula below with $\alpha = 0.05$

$\bar{X} \pm t_{(n-1;\,\alpha/2)} \dfrac{S}{\sqrt{n}}$

$273 \pm t_{(46-1;\,0.05/2)} \dfrac{65}{\sqrt{46}}$

$273 \pm t_{(45;\,0.025)} \times 9.5837$

The t-student table

Degrees of freedom  0.10   0.05   0.025  0.01   0.005
1
:
45                  1.301  1.679  2.014  2.412  2.690

The critical value (table value) at 45 degrees of freedom and $\alpha/2 = 0.025$ equals 2.014.

$273 \pm 2.014 \times 9.5837$

$273 \pm 19.3016$

$(273 - 19.3016;\ 273 + 19.3016)$

$(253.6984;\ 292.3016)$

Option (1)
Question 12

The sample mean is $\bar{X} = 10$ for a sample of 100, and the population standard deviation is 1. The upper limit of
the 90% confidence interval for the population mean is

1. 0.196

2. 10.196

3. 9.804

4. 10.1645

5. 0.0196

Solution

The sample size n = 100

The population standard deviation $\sigma = 1$

The sample mean $\bar{X} = 10$

The confidence interval of the mean is given by the formula below with $\alpha = 0.10$

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$10 \pm Z_{0.10/2} \dfrac{1}{\sqrt{100}}$

$10 \pm Z_{0.05} \times 0.1$

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

We can see that 0.05 lies midway between the table values 0.0505 and 0.0495: $\dfrac{0.0505 + 0.0495}{2} = 0.05$

In the same way the corresponding value of Z is $\dfrac{1.64 + 1.65}{2} = 1.645$

$10 \pm 1.645 \times 0.1$

$10 \pm 0.1645$

$(10 - 0.1645;\ 10 + 0.1645)$

$(9.8355;\ 10.1645)$

The upper limit equals 10.1645

The lower limit equals 9.8355

Option (4)
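The table interpolation used in this solution can be sketched as a small helper. The function name is hypothetical, and the comparison with `statistics.NormalDist().inv_cdf` is included only to show that the interpolated table value is close to the exact quantile.

```python
from statistics import NormalDist


def interpolate_z(target_area: float, area_a: float, z_a: float,
                  area_b: float, z_b: float) -> float:
    """Linear interpolation between two neighbouring Z-table entries."""
    fraction = (target_area - area_a) / (area_b - area_a)
    return z_a + fraction * (z_b - z_a)


# Table entries from the row above: area 0.0505 at Z = -1.64, area 0.0495 at Z = -1.65
z_table = interpolate_z(0.05, 0.0505, -1.64, 0.0495, -1.65)
z_exact = NormalDist().inv_cdf(0.05)

print(round(abs(z_table), 3))          # 1.645
print(round(abs(z_exact), 3))          # 1.645
print(round(10 + abs(z_table) * 0.1, 4))  # 10.1645 (upper limit)
```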

Question 13

A simple random sample of 30 has been collected from a population for which it is known that the population
standard deviation $\sigma = 10.0$. The sample mean has been calculated as 240.0. The 95% confidence interval
for the population mean is

1. 236.4216; 243.5784

2. 240 ± 3.5785

3. 236.2664; 243.7336

4. 263.2451; 234.7558

5. 243.5785; 263.4251

Solution

n = 30, $\sigma = 10$, $\bar{X} = 240$, $\alpha = 0.05$

$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$

$240 \pm Z_{0.05/2} \dfrac{10}{\sqrt{30}}$

$240 \pm Z_{0.025} \times 1.8257$

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

$240 \pm 1.96 \times 1.8257$

$240 \pm 3.5784$

$(240 - 3.5784;\ 240 + 3.5784)$

$(236.4216;\ 243.5784)$

Option (1)
8.2.2 Confidence Interval of the Proportion

• The data have to be normally distributed.

• The confidence interval of the proportion is expressed by the formula
$p \pm Z_{\alpha/2} \sqrt{\dfrac{p (1 - p)}{n}}$,
where $\sqrt{\dfrac{p (1 - p)}{n}} \approx \sqrt{\dfrac{\pi (1 - \pi)}{n}}$ for estimating the population proportion.

• The sample proportion is $p = \dfrac{\text{number of successes}}{\text{Total}}$

• The population proportion is $\pi$

• The critical value when estimating the confidence interval is $Z_{\alpha/2}$

• The standard error of the proportion is $S_p = \sqrt{\dfrac{p (1 - p)}{n}}$

• The test statistic is $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi (1 - \pi)}{n}}}$
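The confidence interval of the proportion summarised above can be sketched as follows. The function name is hypothetical, the critical value is taken from `statistics.NormalDist().inv_cdf` rather than the Z-table, and the example counts (80 successes out of 400) are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, total: int,
                                   confidence: float = 0.95) -> tuple:
    """Interval p +/- Z_{alpha/2} * sqrt(p * (1 - p) / n)."""
    p = successes / total
    standard_error = sqrt(p * (1 - p) / total)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p - z * standard_error, p + z * standard_error


low, high = proportion_confidence_interval(80, 400, confidence=0.95)
print(round(low, 4), round(high, 4))  # 0.1608 0.2392
```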

Activities

Question 1

An airline has surveyed a simple random sample of travelers to find out whether they would be interested
in paying a higher fare in order to have access to e-mail during their flight. Of the 400 travelers surveyed,
80 said e-mail access would be worth a slight extra cost. The manager wants to construct a 95% confidence
interval for the population proportion of air travelers who are in favor of the airline’s e-mail idea.

Which one of the following statements is incorrect?

1. The sample proportion is p = 0.2

2. The critical value at $\alpha = 5\%$ is 1.1645

3. The standard error for the proportion is 0.02

4. The lower confidence limit is 0.1608

5. The upper confidence limit is 0.2392.

Solution

This question is related to the confidence interval of the proportion.

From the given information

The sample size n = 400

The number of successes X = 80

1. Correct

The sample proportion $p = \dfrac{\text{number of successes}}{\text{Total}} = \dfrac{80}{400} = 0.2$

2. Incorrect

The critical value is $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$

The standardised normal Z-table

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

3. Correct

The standard error of the proportion is $S_p = \sqrt{\dfrac{p (1 - p)}{n}} = \sqrt{\dfrac{0.2 (1 - 0.2)}{400}} = 0.02$

4. Correct

5. Correct

The confidence interval of the proportion is $p \pm Z_{\alpha/2} \sqrt{\dfrac{p (1 - p)}{n}}$

$0.2 \pm 1.96 \times 0.02$

$0.2 \pm 0.0392$

$(0.2 - 0.0392;\ 0.2 + 0.0392)$

$(0.1608;\ 0.2392)$

The lower confidence limit is 0.1608.

The upper confidence limit is 0.2392.

Option (2)

Question 2

A random sample of 80 observations results in 50 successes. The 95% confidence interval for the population
proportion of successes is

1. 0.625 ± 0.1060

2. 0.536; 0.714

3. 0.622; 0.628

4. 0.519; 0.731

5. 0.591; 0.371

Solution

The sample size n = 80

The number of successes X = 50

The sample proportion $p = \dfrac{\text{number of successes}}{\text{Total}} = \dfrac{50}{80} = 0.625$

The critical value is $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$

The standardised normal table

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

The standard error of the proportion is $S_p = \sqrt{\dfrac{p (1 - p)}{n}} = \sqrt{\dfrac{0.625 (1 - 0.625)}{80}} = 0.0541$

The confidence interval of the proportion is $p \pm Z_{\alpha/2} \sqrt{\dfrac{p (1 - p)}{n}}$

$0.625 \pm 1.96 \times 0.0541$

$0.625 \pm 0.1060$

$(0.625 - 0.1060;\ 0.625 + 0.1060)$

$(0.519;\ 0.731)$

Option (4)
Question 3

According to statistics reported on STATSA, a surprising number of motor vehicles are not covered by
insurance. The results, consistent with the STATSA report, showed 46 of 200 vehicles were not covered by
insurance. The 95% confidence interval estimate for the population proportion is

1. 0.1716; 0.2884

2. 0.2317; 0.2283

3. 0.23; 0.2884

4. 0.2884; 0.1716

5. 0.1810; 0.2790

Solution

The sample size n = 200

The number of successes X = 46

The sample proportion $p = \dfrac{\text{number of successes}}{\text{Total}} = \dfrac{46}{200} = 0.23$

The critical value is $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$

The standardised normal table

Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

The standard error of the proportion is $S_p = \sqrt{\dfrac{p (1 - p)}{n}} = \sqrt{\dfrac{0.23 (1 - 0.23)}{200}} = 0.0298$

The confidence interval of the proportion is $p \pm Z_{\alpha/2} \sqrt{\dfrac{p (1 - p)}{n}}$

$0.23 \pm 1.96 \times 0.0298$

$0.23 \pm 0.0584$

$(0.23 - 0.0584;\ 0.23 + 0.0584)$

$(0.1716;\ 0.2884)$

Option (1)
Chapter 9

Hypothesis Testing of the Mean and


Proportions

SUMMARY

 Hypothesis is a proposition or statement made by usually a researcher or an analyst concerning the


nature or value of some unknown population quantity.

 This statement needs to be substantiated using real data. This is because we do not want to take the
researcher or analyst statement simply like this but we need to collect the data in an attempt to confirm
the hypothesis.

 The process of testing these hypotheses using sample data is called statistical hypothesis testing. This
involves the knowledge of probability to come to the conclusion about the stated hypothesis.

 The conclusion is about whether the hypothesis should be rejected or not.

 A series of five steps are used to determine whether we reject or not a null hypothesis based on a sample
data as follows:

Step 1. To formulate or to provide the formal hypothesis statements called null hypothesis H0 and the
alternative hypothesis H1 

187 STA1610/1
Step 2. To determine the appropriate test for the given hypothesis statement. This can be between
one-tail test or two-tail test depending on the problem being considered. This step has to be correctly
specified.

When the symbol:  or  appear in the alternate hypothesis H1 , we refer to one tail-test.

When the symbol:  appears in the alternate hypothesis H1 , we refer to two-tail test.

Step 3. To specify the level of significance denoted by  . The level of significance is a fixed probability
of making the error of rejecting the null hypothesis even though it is true. This is specified by the
researcher and represents the degree of accuracy that the test should exhibit. In practice, we use 
equals 5%, 10% or 1%.

Step 4. To calculate the relevant test statistic. This is a quantity calculated from a sample data. It is
used to determine if the null hypothesis H0 should be rejected or not.

Step 5. To make a decision we consider three different approaches in this module:

(1) By using the test statistic and the critical value.

When the test statistic is greater than the critical value, we will reject the null hypothesis H0

The critical value or the table value, it defines the range of possible values of the test statistic
for which we will reject H0 , otherwise we fail to reject H0 

(2) By using the p-value.

When the p-value is less than the level of significance $\alpha$, we reject the null hypothesis $H_0$; otherwise we fail to reject $H_0$.

The p-value is the probability of getting a value of the test statistic as extreme as, or more extreme than, the one observed by chance alone, if the null hypothesis is true.

In using approaches (1) and (2), the decision depends on the nature of the alternative hypothesis (whether it is a one-tail or a two-tail test).

(3) By using the confidence interval.

When the value zero lies between the two confidence limits, we fail to reject $H_0$; when the value zero lies outside the confidence limits, we reject $H_0$.

 A Type 1 error occurs when the null hypothesis $H_0$ is rejected although it is in fact true.

 A Type 2 error occurs when the null hypothesis $H_0$ is not rejected although it is in fact false.

9.1 Hypothesis Testing of the Mean

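 The test statistic when the population standard deviation $\sigma$ is known is

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

 The test statistic when the population standard deviation is unknown and the sample standard deviation $S$ is used instead is

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$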
 To calculate the p-value, we need to know the value of the test statistic $Z$ and the nature of the alternative hypothesis $H_1$ (whether it is a one-tail or a two-tail test). The level of significance $\alpha$ is needed only afterwards, when the p-value is compared with it. In the case of a two-tail test, the tail probability is multiplied by two.

 To calculate the critical value, we need to know the level of significance $\alpha$ and make use of the standardised test statistic $Z$. The calculation depends on the nature of the alternative hypothesis $H_1$ (whether it is a one-tail or a two-tail test). In the case of a two-tail test, the level of significance $\alpha$ is divided by two before the corresponding value of $Z$ is read from the table.
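
The calculations described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `z_test_mean` and its `tail` argument are hypothetical, and the standard library's `statistics.NormalDist` stands in for the printed normal table, so results can differ from table answers in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha, tail="two"):
    """Z test for a mean when the population standard deviation is known.

    tail: "two" (H1: mu != mu0), "upper" (H1: mu > mu0) or "lower" (H1: mu < mu0).
    Returns (z, p_value, critical_value, reject_h0).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu0) / standard_error

    if tail == "two":
        # Tail probability is doubled; alpha is halved for the critical value.
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        critical_value = std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)
        critical_value = std_normal.inv_cdf(1 - alpha)
    elif tail == "lower":
        p_value = std_normal.cdf(z)
        critical_value = std_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")

    # p-value approach: reject H0 when the p-value is less than alpha.
    return z, p_value, critical_value, p_value < alpha
```

For example, `z_test_mean(458, 450, 20.5, 35, 0.05)` gives a test statistic of about 2.31 and a two-tail critical value of about 1.96. The p-value comes out near 0.021, slightly different from a table-based answer because the function does not round Z to two decimals first.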

Activities

Question 1

Consider the following information:

$H_0: \mu = 1000$ vs $H_1: \mu \neq 1000$

$\sigma = 200$, $n = 100$, $\bar{x} = 980$, $\alpha = 0.01$

Which one of the following statements is incorrect?

1. A two-tailed test is used.

2. The standard error is 20.

3. The test statistic is 0.1.

4. The critical value is 2.58.

5. Reject $H_0$ when the p-value is less than $\alpha = 0.01$.

Solution

$H_0: \mu = 1000$ vs $H_1: \mu \neq 1000$

$\sigma = 200$, $n = 100$, $\bar{X} = 980$, $\alpha = 0.01$

1. Correct

Because $H_1$ uses the symbol $\neq$.

2. Correct

The standard error is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{200}{\sqrt{100}} = 20$.

3. Incorrect

The test statistic is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{980 - 1000}{20} = -1$.

4. Correct

The critical value depends on the nature of the alternative hypothesis $H_1$ and the level of significance $\alpha$.

Because this is a two-tail test: $\frac{\alpha}{2} = \frac{0.01}{2} = 0.005$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-2.5   0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048

The nearest critical value is $Z = 2.58$.

5. Correct

Option (3)

Question 2

Consider testing the hypotheses

$H_0: \mu = 50$ vs $H_1: \mu > 50$

If $\sigma = 10$, $n = 64$, $\bar{X} = 53.5$ and $\alpha = 0.025$

Which one of the following statements is incorrect?

1. A one-tail test is appropriate in this situation.

2. The critical value at the 2.5% level is 1.96.

3. The test statistic is 2.80.

4. The p-value is 0.0052.

5. $H_0$ is rejected at the 2.5% level of significance.

Solution

The given information:

$H_0: \mu = 50$ vs $H_1: \mu > 50$

$\sigma = 10$, $n = 64$, $\bar{X} = 53.5$, $\alpha = 0.025$

1. Correct

Because $H_1$ uses the symbol $>$.

2. Correct

The critical value depends on the nature of the alternative hypothesis $H_1$ and the level of significance $\alpha$.
Because this is a one-tail test: $\alpha = 0.025$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233

The critical value is $Z = 1.96$.

3. Correct

The test statistic is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{53.5 - 50}{10 / \sqrt{64}} = \frac{3.5}{1.25} = 2.8$.

4. Incorrect

The p-value depends on the nature of the hypothesis and the test statistic $Z$. (Because this is a one-tail test, the tail probability is not multiplied by two.)

p-value $= P(Z > 2.8) = P(Z < -2.8)$. This takes us back to chapter 6.

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-2.8   0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019

p-value $= P(Z > 2.8) = 0.0026$

5. Correct

Reject $H_0$ when the p-value is less than $\alpha = 0.025$. Since $0.0026 < 0.025$, $H_0$ is rejected at the 2.5% level of significance.

Option (4)

Question 3

For a sample of 35 items from a population for which the population standard deviation is $\sigma = 20.5$, the sample mean is $\bar{x} = 458.0$. At the 0.05 level of significance, the tutor wants to test $H_0: \mu = 450$ against $H_1: \mu \neq 450$.

Which one of the following statements is incorrect?

1. A two-tailed test is appropriate to this situation.

2. The critical value is 1.96.

3. The test statistic is $z = 2.31$.

4. The p-value is 0.0104.

5. $H_0$ is rejected at the 5% level of significance.

Solution

The given information:

$H_0: \mu = 450$ vs $H_1: \mu \neq 450$

$\sigma = 20.5$, $n = 35$, $\bar{X} = 458$, $\alpha = 0.05$

1. Correct

Because $H_1$ uses the symbol $\neq$.

2. Correct

The critical value depends on the nature of the alternative hypothesis $H_1$ and the level of significance $\alpha$.
Because this is a two-tail test: $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233

The critical value is $Z = 1.96$.

3. Correct

The test statistic is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{458 - 450}{20.5 / \sqrt{35}} = \frac{8}{3.4651} = 2.3087$.

4. Incorrect

The p-value depends on the nature of the hypothesis and the test statistic $Z$. (Because this is a two-tail test, the tail probability is multiplied by two.)

p-value $= 2P(Z > 2.3087) \approx 2P(Z < -2.31)$. This takes us back to chapter 6.

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-2.3   0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084

p-value $= 2(0.0104) = 0.0208$

5. Correct

Reject $H_0$ when the p-value is less than $\alpha = 0.05$. Since $0.0208 < 0.05$, $H_0$ is rejected at the 5% level of significance.

Option (4)

Question 4

A professor of statistics disputes the claim that the average student spends 3 hours studying for the exam.
Which one of the following pairs of hypotheses is used to test the claim?

1. H0 :   3 H1 :   3

2. H0 :   3 H1 :   3

3. H0 :   3 H1 :   3

4. H0 :   3 H1 :   3

5. H0 : x  3 H1 : x  3

Solution

Option (2)

Question 5

Consider testing the hypotheses

$H_0: \mu = 400$ vs $H_1: \mu \neq 400$

If the value of the test statistic $z$ equals 0.87, then the p-value is:

1. 0.1922

2. 0.8078

3. 0.3844

4. 0.4681

5. Need more information.

Solution

$H_0: \mu = 400$ vs $H_1: \mu \neq 400$

The test statistic is $Z = 0.87$.

The p-value depends on the nature of the hypothesis and the test statistic $Z$. (Because this is a two-tail test, the tail probability is multiplied by two.)

p-value $= 2P(Z > 0.87) = 2P(Z < -0.87)$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-0.8   0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867

p-value $= 2(0.1922) = 0.3844$

Option (3)

Question 6

A laboratory tested a random sample of 30 chicken eggs and found that the mean amount of cholesterol per egg is 235 milligrams and the standard deviation is 20 milligrams. Suppose $H_0: \mu = 230$ is tested against $H_1: \mu \neq 230$ at the 5% significance level, with the assumption that the cholesterol content of chicken eggs is normally distributed, and suppose that the test statistic $Z$ is equal to 1.37.

What is the p-value of the test?

1. 0.0853

2. 0.9147

3. 0.8533

4. 0.6125

5. 0.1706

Solution

$H_0: \mu = 230$ vs $H_1: \mu \neq 230$

The test statistic is $Z = 1.37$.

The p-value depends on the nature of the hypothesis and the test statistic $Z$.

p-value $= 2P(Z > 1.37) = 2P(Z < -1.37)$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-1.3   0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823

p-value $= 2(0.0853) = 0.1706$

Option (5)

Question 7

Which of the following correctly describe a possible way to do hypothesis testing?

A. Calculate the test statistic and compare it to the significance level.

B. Calculate the p-value from the significance level, calculate the test statistic and compare the test statistic to the p-value.

C. Calculate the test statistic, find the corresponding p-value and compare the p-value to the significance level.

D. Find the critical Z-value from the significance level, calculate the test statistic and compare the test statistic to the critical Z-value.

E. Find the critical Z-value from the significance level, calculate the p-value from the critical Z-value, and compare the p-value to the significance level.

Which one of the following statements is correct?

1. B only

2. B and D

3. E only

4. B and E

5. C and D

Solution

A. Incorrect

The test statistic is compared to the critical value, not to the significance level.

B. Incorrect

The p-value is calculated from the test statistic and is then compared to the level of significance.

C. Correct

D. Correct

E. Incorrect

The p-value is calculated from the test statistic, and the critical value is calculated from the level of significance; we can then compare the test statistic with the critical value.

Option (5)

Question 8

Which one of the following statements is correct?

In the hypothesis testing process, you should

1. define the statistical hypotheses H0 and H1 

2. determine the region of acceptance of H0

3. calculate the test statistic.

4. compare the test statistic to the critical value.

5. all the above

Solution

Option (5)

Question 9

A bakery stated that the average number of loaves of bread sold daily is 3000. An employee thinks that the actual value might differ from this and wants to test this statement. The correct hypotheses are:

1. H0 :   3000 H1 :   3000

2. H0 :   3000 H1 :   3000

3. H0 :   3000 H1 :   3000

4. H0 :   3000 H1 :   3000

5. H0 :   3000 H1 :   3000

Solution

Option (1)

Question 10

In testing the null hypothesis $H_0: \mu = 15$ against its alternative $H_1$, the following information was given:

$\sigma = 5$, $\bar{X} = 18.1$, $n = 10$, $\alpha = 0.03$. The test statistic is

1. 0.025

2. 1.96

3. 0.975

4. 0.62

5. -1.96

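Solution

$H_0: \mu = 15$, tested against its alternative $H_1$

$\sigma = 5$, $n = 10$, $\bar{X} = 18.1$, $\alpha = 0.03$

The test statistic is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{18.1 - 15}{5 / \sqrt{10}} = \frac{3.1}{1.5811} = 1.9607$.

Option (2)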

Question 11

A researcher wants to carry out a hypothesis test involving the mean for a sample of size $n = 18$. The population standard deviation is unknown, but she is reasonably sure that the underlying population is approximately normally distributed. The test statistic she should use in carrying out the analysis is:

1. Z–test

2. t–test

3. Binomial test

4. Poisson test

5. Normal test

Solution

Option (2)

9.2 Hypothesis Testing of the Proportion

The test statistic for a proportion is

$$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$$

The population proportion is $\pi$.

The sample proportion is $p$.
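
As a rough sketch of the formula above, the following Python functions compute the test statistic for a proportion and an upper-tail p-value. The function names are hypothetical, and `statistics.NormalDist` is used in place of the printed normal table, so the last decimal may differ from hand calculations that round intermediate values.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(p_hat, pi0, n):
    """Z statistic for testing H0: pi = pi0 with sample proportion p_hat."""
    # The standard error uses the hypothesised proportion pi0, as in the formula.
    standard_error = sqrt(pi0 * (1 - pi0) / n)
    return (p_hat - pi0) / standard_error


def upper_tail_p_value(z):
    """P(Z > z) for a one-tail test with H1: pi > pi0."""
    return 1 - NormalDist().cdf(z)
```

For instance, `z_test_proportion(0.67, 0.40, 100)` is about 5.51, and `upper_tail_p_value(0.66)` is about 0.2546, the same value the normal table gives for $P(Z < -0.66)$.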

Activities

Question 1

Calculate the p-value of the test of the following hypotheses, given that the sample proportion is $p = 0.63$, $n = 100$ and the calculated test statistic is $z = 0.05$. The null hypothesis and the alternative hypothesis are

$H_0: \pi = 0.60$ vs $H_1: \pi > 0.60$

1. 0.4801

2. 0.5000

3. 0.5199

4. 0.6915

5. 0.7088

Solution

$H_0: \pi = 0.60$ vs $H_1: \pi > 0.60$

$p = 0.63$, $n = 100$, $Z = 0.05$

The test statistic is $Z = 0.05$.

The p-value depends on the nature of the hypothesis and the test statistic $Z$. This is a one-tail test, so the tail probability is not multiplied by two.

p-value $= P(Z > 0.05) = P(Z < -0.05)$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-0.0   0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641

p-value = 0.4801

Option (1)

Question 2

A statistics practitioner formulated the following hypotheses.

$H_0: \pi = 0.70$ vs $H_1: \pi > 0.70$

A random sample of 100 produced $p = 0.73$ with a test statistic of 0.66. The p-value is:

1. 0.03

2. 0.0003

3. 0.5239

4. 0.7454

5. 0.2546

Solution

$H_0: \pi = 0.70$ vs $H_1: \pi > 0.70$

$p = 0.73$, $n = 100$, $Z = 0.66$

The test statistic is $Z = 0.66$.

The p-value depends on the nature of the hypothesis and the test statistic $Z$. (This is a one-tail test.)

p-value $= P(Z > 0.66) = P(Z < -0.66)$

The standardized normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-0.6   0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451

p-value = 0.2546

Option (5)

Question 3

A popular weekly magazine asserts that fewer than 40% of households in South Africa have changed their lifestyles because of escalating gas prices. A recent survey of 100 households finds that 67 households have made lifestyle changes due to escalating gas prices.

Which one of the following statements is incorrect?

1. $H_0: \pi = 0.40$ vs $H_1: \pi < 0.40$

2. The test statistic is 5.5102.

3. The critical value when $\alpha = 10\%$ is 1.28.

4. The p-value equals 0.0000.

5. Conclusion: $H_0$ is not rejected at the 10% level of significance.

Solution

Given the following:

$H_0: \pi = 0.40$ vs $H_1: \pi < 0.40$

The sample size is $n = 100$.

The number of successes is $X = 67$.

The sample proportion is $p = \frac{\text{number of successes}}{\text{total}} = \frac{67}{100} = 0.67$.

1. Correct

Because the statement speaks about fewer households, $H_1$ uses the symbol $<$.

2. Correct

The test statistic is $Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.67 - 0.40}{\sqrt{\frac{0.40(1 - 0.40)}{100}}} = \frac{0.27}{0.0490} = 5.5102$.

3. Correct

The critical value depends on the nature of the alternative hypothesis $H_1$ and the level of significance $\alpha = 0.10$.

The standardised normal table

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0
:
-1.2   0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985

The critical value is $Z = 1.28$.

4. Correct

The p-value depends on the nature of the hypothesis and the test statistic $Z$.

p-value $= P(Z > 5.5102) \approx P(Z > 5.51) = 0.0000$

5. Incorrect

Rule: Reject $H_0$ when the p-value is less than the level of significance $\alpha$. Since $0.0000 < 0.10$, we reject $H_0$ at the 10% level of significance.

Option (5)

Question 4

In testing the null hypothesis $H_0: \pi = 0.40$ against its alternative $H_1$ at the 5% significance level, if the sample proportion is 0.45 and the p-value is 0.0764, the appropriate conclusion would be:

1. to reject H0

2. not to reject H0

3. to reject H1

4. need more information

5. to accept H0 because p–value is minimum

Solution

$H_0: \pi = 0.40$, tested against its alternative $H_1$, with $\alpha = 0.05$

The sample proportion is $p = 0.45$.

The p-value is 0.0764.

Rule: Fail to reject $H_0$ when the p-value is greater than the level of significance $\alpha$. Since $0.0764 > 0.05$, we fail to reject $H_0$ at the 5% level of significance.

Option (2)

Chapter 10

Chi–squared Test

SUMMARY

 The chi-squared test is applied when dealing with data that are categorical (nominal or qualitative) in nature.

 The chi-squared test makes use of a contingency table for two or more qualitative variables.

 The chi-squared test operates by comparing the observed values in a frequency table or contingency table with the values that we would expect if a given hypothesis were true.

 A series of five steps is used to decide whether or not to reject a null hypothesis on the basis of sample data:

Step 1. Formulate the formal hypothesis statements: the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

$H_0$: The two variables are independent (the two variables are not related).

$H_1$: The two variables are dependent (the two variables are related).

Step 2. Determine the appropriate degrees of freedom df for the given table of observed frequencies: the number of rows minus 1, times the number of columns minus 1, that is, $df = (r - 1)(c - 1)$.

Step 3. Specify the level of significance, denoted by $\alpha$. The level of significance is a fixed probability of making the error of rejecting the null hypothesis even though it is true. It is specified by the researcher and represents the degree of accuracy that the test should exhibit. In practice, we use $\alpha$ equal to 5%, 10% or 1%.

Step 4. Calculate the relevant test statistic. This is a quantity calculated from the observed frequencies. It is used to determine whether the null hypothesis $H_0$ should be rejected or not.

The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ represents the observed frequency and $E$ represents the expected frequency. The expected frequency of a cell is calculated as

$$E = \frac{\text{row total of the cell} \times \text{column total of the cell}}{\text{total number } n}$$

Step 5. Make a decision by using the test statistic and the critical value.

Rule: When the test statistic is greater than the critical value, we reject the null hypothesis $H_0$.

 The critical value, or table value, marks the boundary of the range of values of the test statistic for which we reject $H_0$; otherwise we fail to reject $H_0$.
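
The five steps can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function names are hypothetical, the table is passed as a list of rows without totals, and the critical value is assumed to be read from the chi-squared table and passed in by hand.

```python
def expected_frequencies(observed):
    """Expected frequency of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def reject_h0(observed, critical_value):
    """Decision rule: reject H0 when the statistic exceeds the critical value."""
    return chi_squared_statistic(observed) > critical_value
```

With a hypothetical table `[[30, 20], [20, 30]]`, every expected frequency is 25, the statistic is 4.0 with 1 degree of freedom, and `reject_h0(table, 3.841)` returns `True` at the 5% level.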

Activities

Question 1

If a contingency table has 4 rows and 5 columns, how many degrees of freedom are there for the chi-square ($\chi^2$) test for independence?

1. 20

2. 12

3. 15

4. 9

5. 10

Solution

The degrees of freedom: $df = (r - 1)(c - 1) = (4 - 1)(5 - 1) = 3 \times 4 = 12$

Option (2)

Question 2

Use the following contingency table:

A B Total
Yes 40 25 65
No 35 45 80
Total 75 70 145

The professor wants to test the independence for the two variables given in columns and in rows at the 5%
level of significance.

Which one of the following statements is incorrect?

1. The null hypothesis H0 is the two variables are independent.

2. The alternative hypothesis is the two variables are dependent.

3. The critical value is 3841

4. The expected frequency for Yes and B is 25

5. Suppose that the calculated test statistic  2  45455 the null hypothesis H0 is rejected at 5% level of
significance.

Solution

The observed contingency table ($\alpha = 0.05$)

         A     B     Total
Yes      40    25    65
No       35    45    80
Total    75    70    145 = n

1. Correct

2. Correct

3. Correct
The critical value is determined from the degrees of freedom and the level of significance $\alpha = 0.05$.
The degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$
Using the chi-squared table:

df    0.10     0.05     0.025    0.01     0.005
1     2.706    3.841    5.024    6.635    7.879
:

Therefore the critical value is 3.841.

4. Incorrect

The expected frequency is

$$E = \frac{\text{row total for Yes} \times \text{column total for B}}{\text{total number } n} = \frac{65 \times 70}{145} = 31.3793$$

5. Correct
Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value. Since $4.5455 > 3.841$, $H_0$ is rejected at the 5% level of significance.

Option (4)

Question 3

A sport preference poll showed the following data for men and women:

Favorite sport
Gender Basketball Football Golf Tennis Total
Male 24 17 30 18 89
Female 21 20 22 12 75
Total 45 37 52 30 164

Use the 5% level of significance and test to determine whether sport preferences depend on gender.

Which one of the following statements is incorrect?

1. Gender and favorite sport are independent.

2. Gender and favorite sport are dependent.

3. The degrees of freedom is 3.

4. The critical value is 7815.

5. Suppose that the calculated test statistic  2  330, the conclusion is to reject the null hypothesis H0 

Solution

1. Correct

2. Correct

3. Correct

The degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 1 \times 3 = 3$

4. Correct

Using the chi-squared table:

df    0.10     0.05     0.025    0.01     0.005
1     2.706    3.841    5.024    6.635    7.879
:
3     6.251    7.815    9.348    11.345   12.838
:

Therefore the critical value is 7.815.

5. Incorrect

Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value. Since $3.30 < 7.815$, $H_0$ is not rejected at the 5% level of significance.

Option (5)

Question 4

In a test for the independence of two variables, one of the variables has two possible categories and the other has three possible categories. Suppose that the calculated test statistic is $\chi^2 = 3.456$ and that the test is carried out at the 5% level of significance.

Which one of the following statements is incorrect?

1. $H_0$: The two variables are independent.

2. $H_1$: The two variables are dependent.

3. The degrees of freedom is 6.

4. The critical value is 5.991.

5. Fail to reject $H_0$ at the 5% level of significance.

Solution

The given information:

The number of rows equals 2.

The number of columns equals 3.

The value of the test statistic is $\chi^2 = 3.456$.

The level of significance is $\alpha = 0.05$.

1. Correct

2. Correct

3. Incorrect

The degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 1 \times 2 = 2$

4. Correct

df    0.10     0.05     0.025    0.01     0.005
1     2.706    3.841    5.024    6.635    7.879
2     4.605    5.991    7.378    9.210    10.597
3     6.251    7.815    9.348    11.345   12.838
:

Therefore the critical value is 5.991.

5. Correct

Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value. Since $3.456 < 5.991$, $H_0$ is not rejected at the 5% level of significance.

Option (3)

Question 5

To perform a chi-squared test of independence you don’t require

1. Two or more nominal variables.

2. The distribution to be negatively skewed.

3. The degrees of freedom.

4. The level of significance.

5. A test of a contingency table.

Solution

Option (2)

Question 6

Two employees (Peter and John) are monitored to determine whether there is any difference in the proportions of acceptable parts produced by the employees. The sample of parts produced is given below:

Quality         Employee           Total
                Peter     John
Acceptable      265       ?        950
Unacceptable    ?         ?        ?
Total           ?         700      1000

Which one of the following statements is correct?

1. $H_0$: The two variables are dependent.

2. The observed frequency for acceptable and employee John is 665.

3. The expected frequency for acceptable and employee Peter is 265.

4. The degrees of freedom is 1.

5. Suppose the calculated test statistic is $\chi^2 = 40$; $H_0$ is not rejected at the 5% level of significance.

Solution

The observed contingency table ($\alpha = 0.05$)

                Peter    John    Total
Acceptable      265      685     950
Unacceptable    35       15      50
Total           300      700     1000 = n

1. Incorrect
$H_0$: The two variables are independent.

2. Incorrect
The observed frequency is 685.

3. Incorrect
The expected frequency is

$$E = \frac{\text{row total for Acceptable} \times \text{column total for Peter}}{\text{total number } n} = \frac{950 \times 300}{1000} = 285$$

4. Correct
The degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$

5. Incorrect
Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value.
The critical value is determined from the degrees of freedom and the level of significance $\alpha = 0.05$.
Using the chi-squared table:

df    0.10     0.05     0.025    0.01     0.005
1     2.706    3.841    5.024    6.635    7.879
:

Therefore the critical value is 3.841.

Since the test statistic $\chi^2 = 40 > 3.841$, $H_0$ is rejected at the 5% level of significance.

Option (4)

Question 7

Let X = level of income and Y = political preference. Use the results shown in the table below and test at the 1% level of significance whether political preference and level of income are independent.

Level of income    Political party
                   A      B      C
1                  23     11     1
2                  40     75     31
3                  16     107    60
4                  2      14     10

Suppose that the calculated test statistic is 69.3875. Which statement is incorrect?

1. $H_0$: The variables are independent. $H_1$: The variables are dependent.

2. The degrees of freedom $df = 12$.

3. The critical value = 16.812.

4. There are 390 people.

5. $H_0$ is rejected. The variables are dependent.

Solution

The observed contingency table

         A      B      C      Total
1        23     11     1      35
2        40     75     31     146
3        16     107    60     183
4        2      14     10     26
Total    81     207    102    390 = n

The calculated test statistic is 69.3875.

1. Correct

2. Incorrect

The degrees of freedom: $df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 3 \times 2 = 6$

3. Correct

Using the chi-squared table:

df    0.10     0.05     0.025    0.01     0.005
1     2.706    3.841    5.024    6.635    7.879
:
6     10.645   12.592   14.449   16.812   18.458

Therefore the critical value is 16.812.

4. Correct

5. Correct

Since the test statistic $\chi^2 = 69.3875 > 16.812$, $H_0$ is rejected at the 1% level of significance.

Option (2)

Question 8

The trustee of a company’s pension plan has solicited the opinions of a sample of the company’s employees about a proposed revision of the plan. A breakdown of the responses is shown in the table below. We want to test whether there is enough evidence to infer that the responses differ among the three groups of employees.

Responses    Blue collar    White collar    Managers    TOTAL
For          67             32              11          110
Against      63             18              9           90
TOTAL        130            50              20          200

Which option provides the correct expected frequencies needed to calculate the $\chi^2$ value?

1.
Responses    Blue collar    White collar    Managers
For          11             32              67
Against      9              18              63

2.
Responses    Blue collar    White collar    Managers
For          71.5           27.5            11
Against      58.5           22.5            9

3.
Responses    Blue collar    White collar    Managers
For          72             27              11
Against      59             22              9

4.
Responses    Blue collar    White collar    Managers
For          71             28              11
Against      58             23              9

5.
Responses    Blue collar    White collar    Managers
For          71             28              11
Against      59             22              9

Solution

The observed frequencies

             Group of employees
Responses    Blue collar    White collar    Managers    Total
For          67             32              11          110
Against      63             18              9           90
Total        130            50              20          200

Expected frequencies

Responses    Blue collar    White collar    Managers    Total
For          71.5           27.5            11          110
Against      58.5           22.5            9           90
Total        130            50              20          200

The expected frequencies are calculated as follows, using the formula

$$E = \frac{\text{row total of the cell} \times \text{column total of the cell}}{\text{total number } n}$$

For and Blue collar: $\frac{110 \times 130}{200} = 71.5$

For and White collar: $\frac{110 \times 50}{200} = 27.5$

For and Managers: $\frac{110 \times 20}{200} = 11$

Against and Blue collar: $\frac{90 \times 130}{200} = 58.5$

Against and White collar: $\frac{90 \times 50}{200} = 22.5$

Against and Managers: $\frac{90 \times 20}{200} = 9$

Option (2)

Question 9

Consider the following EXCEL output testing for independence of two variables:

Contingency table
           Column 1    Column 2    Column 3    Total
Row 1      93          91          174         358
Row 2      907         909         1826        3642
Total      1000        1000        2000        4000

Chi-squared stat        0.331
df                      2
p-value                 0.847
Chi-squared critical    5.992

Which one of the following statements is incorrect?

1. Since $0.331 < 5.992$, the null hypothesis of independence cannot be rejected.

2. Since the p-value $0.847 > 0.05$, the two variables are dependent.

3. The degrees of freedom $df = (r - 1)(c - 1) = 2$.

4. The observed frequency for row 1, column 3 is 174.

5. The expected frequency for row 1, column 3 is 179.

Solution

The observed frequencies

Contingency table
           Column 1    Column 2    Column 3    Total
Row 1      93          91          174         358
Row 2      907         909         1826        3642
Total      1000        1000        2000        4000

Chi-squared stat              0.331
df                            2
p-value                       0.847
Chi-squared critical value    5.992

1. Correct

2. Incorrect

When the p-value is greater than the level of significance, we fail to reject $H_0$, and therefore we conclude that the two variables are independent.

3. Correct

4. Correct

5. Correct
The expected frequency is

$$E = \frac{\text{row total of row 1} \times \text{column total of column 3}}{\text{total number } n} = \frac{358 \times 2000}{4000} = 179$$

Option (2)
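
As a side check on the output in this question: for the special case of 2 degrees of freedom, the chi-squared upper-tail probability has the closed form $P(\chi^2 > x) = e^{-x/2}$, so both the p-value and the critical value can be reproduced without a table. The sketch below uses hypothetical function names and applies only when $df = 2$.

```python
from math import exp, log


def chi2_p_value_df2(statistic):
    """Upper-tail probability P(chi-squared > statistic), valid only for df = 2."""
    return exp(-statistic / 2)


def chi2_critical_df2(alpha):
    """Critical value x with P(chi-squared > x) = alpha, valid only for df = 2."""
    return -2 * log(alpha)
```

`chi2_p_value_df2(0.331)` is about 0.847, in line with the p-value in the output, and `chi2_critical_df2(0.05)` is about 5.991, the value in the chi-squared table for $df = 2$.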

Question 10

A sample of 500 shoppers was selected in a large metropolitan area to determine various information concerning consumer behavior. Among the questions asked was, “Do you enjoy shopping for clothing”. The results are summarized in the following contingency tables:

Observed frequencies
                   Gender
Enjoy shopping     Male     Female    Total
Yes                12       23        35
No                 10       36        46
Total              22       59        81

Expected frequencies
                   Gender
Enjoy shopping     Male     Female    Total
Yes                9.51     25.49     35
No                 12.49    33.51     46
Total              22       59        81

The test statistic $\chi^2_{STAT}$ is

1. 1.5766

2. 2.3547

3. 5.3214

4. 15.7662

5. 0.2135

Solution

Observed frequencies, with expected frequencies in brackets:

         Male          Female        Total
Yes      12 (9.51)     23 (25.49)    35
No       10 (12.49)    36 (33.51)    46
Total    22            59            81 = n

The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

$$\chi^2 = \frac{(12 - 9.51)^2}{9.51} + \frac{(23 - 25.49)^2}{25.49} + \frac{(10 - 12.49)^2}{12.49} + \frac{(36 - 33.51)^2}{33.51}$$

$$= 0.6520 + 0.2432 + 0.4964 + 0.1850 = 1.5766$$

Option (1)

Question 11

Consider the following table

                Car size bought
Buyer's age     Small    Medium    Large    Total
Under 30        10       22        34       66
30 - 45         24       42        48       114
Above 45        45       35        40       120
Total           79       99        122      300

Which one of the following statements is incorrect?

1. The expected frequency of buyers above 45 who bought a medium car is 39.6.

2. The observed frequency of buyers under the age of 30 who bought a large car is 34.

3. The degrees of freedom is 4.

4. The critical value is 9.488 at $\alpha = 0.05$.

5. Suppose that the test statistic $\chi^2_{STAT}$ is 18.35; $H_0$ cannot be rejected at the 5% level of significance.

Solution

The observed frequencies

                Car size bought
Buyer's age     Small    Medium    Large    Total
Under 30        10       22        34       66
30 - 45         24       42        48       114
Above 45        45       35        40       120
Total           79       99        122      300

1. Correct

The expected frequency is

$$E = \frac{\text{row total of Above 45} \times \text{column total of Medium}}{\text{total number } n} = \frac{120 \times 99}{300} = 39.6$$

2. Correct

3. Correct

The degrees of freedom: $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 2 \times 2 = 4$

4. Correct

The critical value is determined from the degrees of freedom and the level of significance $\alpha = 0.05$.

Using the chi-squared table:

df    0.10     0.05     0.025    0.01     0.005
1     2.706    3.841    5.024    6.635    7.879
:
4     7.779    9.488    11.143   13.277   14.860

Therefore the critical value is 9.488.

5. Incorrect

Since the test statistic 18.35 is greater than the critical value 9.488, we reject $H_0$.

Option (5)

Question 12

Consider the following information:

                 Seasons
Sales (units)    Summer    Winter    Autumn    Spring    Total
Shoprite         98        114       123       105       440
Spar             199       84        210       67        560
Total            297       198       333       172       1000

Which one of the following statements is correct?

1. $H_0$: The two variables are dependent.

2. $H_1$: The two variables are independent.

3. The degrees of freedom is 8.

4. The critical value at the 5% level is equal to 15.507.

5. The expected frequency for the cell Autumn and Spar is equal to 186.48.

Solution

1. Incorrect
$H_0$: The two variables are independent.

2. Incorrect
$H_1$: The two variables are dependent.

3. Incorrect
The degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 1 \times 3 = 3$

4. Incorrect
The critical value is 7.815.

5. Correct
The expected frequency is

$$E = \frac{\text{row total of Spar} \times \text{column total of Autumn}}{\text{total number } n} = \frac{560 \times 333}{1000} = 186.48$$

Option (5)

Question 13

Conduct a test to determine whether the two classifications L and M are independent, using the data given in the table below (use $\alpha = 0.05$).

         M1    M2     Total
L1       28    68     96
L2       56    36     92
Total    84    104    188

Which one of the following statements is incorrect?

1. $H_0$: The two classifications are independent.

2. $H_1$: The two classifications are dependent.

3. The degrees of freedom is 4.

4. The critical value (table value) is 3.841.

5. Suppose that the test statistic is $\chi^2_{cal} = 19.094$; $H_0$ is rejected at the 5% level of significance.

Solution

1. Correct

2. Correct

3. Incorrect

The degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$

4. Correct

5. Correct

Since 19.094 is greater than 3.841, $H_0$ is rejected at the 5% level of significance.

Option (3)

Chapter 11

Simple linear regression

SUMMARY

 We explore methods that describe possible relationships or associations between two variables measured on an interval (or ordinal) scale.

 When dealing with two variables, we can explore visually whether there is an association by drawing a scatter plot of one variable against the other. This gives us hints about the possible form of the association. The form of the association can be linear or non-linear. In this module we focus on linear relationships only.

 To measure the strength of the association, we calculate the correlation coefficient from the sample data.

 We are interested in studying the association between two variables, known as the dependent variable, denoted by $Y$, and the independent variable, denoted by $X$.

 The independent variable provides the basis for calculating the value of the dependent variable.

 The linear regression model is given by the equation $Y = \beta_0 + \beta_1 X + \text{error}$

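• The estimated regression model is given by the equation $\hat{Y} = b_0 + b_1 X$.

• The method of least squares provides the formulae for the estimates $b_0$ and $b_1$.

• The formulae are

\[ b_0 = \bar{Y} - b_1 \bar{X}, \qquad \bar{Y} = \frac{\sum Y}{n}, \qquad \bar{X} = \frac{\sum X}{n} \]

\[ b_1 = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2} \quad \text{or} \quad b_1 = \frac{\sum X_i Y_i - \frac{1}{n} \sum X_i \sum Y_i}{\sum X_i^2 - \frac{1}{n} \left(\sum X_i\right)^2} \]

• The correlation coefficient is given by the formula

\[ r = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \dfrac{\left(\sum X\right)^2}{n}\right)\left(\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}\right)}} \]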

• The value of the correlation coefficient r lies between −1 and 1.

• If r is close to zero, there is little or no linear relationship between X and Y.

• If r is between 0.3 and 0.5 in absolute value, this indicates a weak to moderate linear relationship
between X and Y.

• If r is between 0.5 and close to 1 in absolute value, this indicates a strong linear relationship between X and Y.

• If r equals 1 or −1, there is a perfect linear relationship between X and Y.

• If r is greater than 0, we say there is a positive relationship between X and Y.

• If r is less than 0, we say that there is a negative relationship between X and Y.

• The residual or error term is $e_i = Y_i - \hat{Y}_i$.

• The coefficient of determination is $r^2 = \dfrac{SSR}{SST}$, where

\[ SSR = b_0 \sum Y_i + b_1 \sum X_i Y_i - \frac{\left(\sum Y_i\right)^2}{n} \]

\[ SST = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n} \]

Activities

Question 1

To analyse the relationship between advertising and sales, the manager of a furniture store recorded the
monthly advertising budget (in thousands of rands) and the monthly sales for a sample of 6 months. The data
are presented below:

Advertising X    2.3    4.6    6.0    5.4    2.8    3.3
Sales Y           96    113    128     98     89   12.5

The equation of the regression line of Y on X takes the form:

1. $\hat{y} = 35.8 + 13.2X$

2. 
y  1328  358X

3. 
y  4066  894X

4. 
y  1491  4019X

5. 
y  048  023X

Solution

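          X       Y      $X^2$      $Y^2$        XY
        2.3      96       5.29       9216     220.8
        4.6     113      21.16      12769     519.8
        6.0     128      36.00      16384     768.0
        5.4      98      29.16       9604     529.2
        2.8      89       7.84       7921     249.2
        3.3    12.5      10.89     156.25     41.25
Total  24.4   536.5     110.34   56050.25   2328.25

$\sum X = 24.4$, $\sum Y = 536.5$, $\sum XY = 2328.25$, $\sum X^2 = 110.34$, $\sum Y^2 = 56050.25$, $n = 6$

\[ b_1 = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2} = \frac{6(2328.25) - (24.4)(536.5)}{6(110.34) - (24.4)^2} = \frac{878.9}{66.68} = 13.1809 \]

\[ \bar{Y} = \frac{\sum Y}{n} = \frac{536.5}{6} = 89.4167, \qquad \bar{X} = \frac{\sum X}{n} = \frac{24.4}{6} = 4.0667 \]

\[ b_0 = \bar{Y} - b_1 \bar{X} = 89.4167 - 13.1809 \times 4.0667 \approx 35.81 \]

The regression line is $\hat{Y} = b_0 + b_1 X$, therefore $\hat{Y} = 35.81 + 13.18X$, which rounds to $\hat{Y} = 35.8 + 13.2X$.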

Option (1)
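
The slope and intercept can also be checked directly from the column totals. The following sketch assumes the totals $n = 6$, $\sum X = 24.4$, $\sum Y = 536.5$, $\sum XY = 2328.25$ and $\sum X^2 = 110.34$ from the solution table; the variable names are illustrative. It gives a slope of about 13.18 and an intercept of about 35.81, which round to the coefficients in option 1.

```python
# Column totals for Question 1 (decimal points as implied by the data)
n = 6
sum_x, sum_y = 24.4, 536.5
sum_xy, sum_x2 = 2328.25, 110.34

# Least-squares slope: b1 = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - sum(X)^2)
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: b0 = mean(Y) - b1 * mean(X)
b0 = sum_y / n - b1 * (sum_x / n)

print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```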

Question 2

A study was conducted to determine the effects of sleep deprivation on people’s ability to solve problems.
The results were obtained as follows:

Number of hours X 8 12 16 20 24
Number of errors Y 86 610 814 1412 1612

If the slope b1  05765 the intercept b0 is

1. 05765

2. 10616

3. 1392

4. 63246

5. 22022

Solution

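$b_1 = 0.5765$

\[ \bar{Y} = \frac{\sum Y}{n} = \frac{8.6 + 6.10 + 8.14 + 14.12 + 16.12}{5} = \frac{53.08}{5} = 10.616 \]

\[ \bar{X} = \frac{\sum X}{n} = \frac{8 + 12 + 16 + 20 + 24}{5} = \frac{80}{5} = 16 \]

\[ b_0 = \bar{Y} - b_1 \bar{X} = 10.616 - 0.5765 \times 16 = 1.392 \]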

Option (3)

Question 3

Consider a data set containing the number of pages and the price for $n = 15$ books on a professor’s bookshelf.

The correlation coefficient, r, was calculated as 0.474.

Which one of the following statements is incorrect?

1. Number of pages and price are directly related.

2. A scatter diagram of the data will show an upwards slope.

3. The more pages a book has the higher the price.

4. The coefficient of determination will lie between −1 and 1.

5. If there is absolutely no relationship between number of pages and price, r will be equal to 0.

Solution

1. Correct

Because the correlation coefficient r is positive.

2. Correct

Because r is positive, it means that when the variable X increases, the variable Y increases as well.

3. Correct

4. Incorrect

The coefficient of determination $r^2$ lies between 0 and 1.

5. Correct

Option (4)

Question 4

If all the points in a scatter diagram lie on the regression line, then the correlation coefficient r:

1. must be 1

2. must be −1

3. must be either 1 or −1

4. must be 0

5. need more information

Solution

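When all the points lie on the regression line, there is a perfect linear relationship between X and Y, so r is
either 1 or −1, depending on whether the line slopes upwards or downwards.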

Option (3)

Question 5

Consider the following data


X     4     6     8     9    12
Y    12    16    25    30    39

$\sum XY = 1082$, $\sum Y = 122$, $\sum X = 39$, $\sum X^2 = 341$, $\sum Y^2 = 3446$

$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 130.4$, $SS_X = \sum (X_i - \bar{X})^2 = 36.8$

Which one of the following statements is correct?

1. The intercept is 354

2. The slope  321

  321  354X
3. The regression line is Y

4. When the coefficient of correlation is 099, the coefficient of determination is 09950

  354
5. When X  10, then the estimated Y

Solution

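1. Incorrect

The value 3.54 is the slope, not the intercept:

\[ b_1 = \frac{SS_{XY}}{SS_X} = \frac{130.4}{36.8} = 3.5435, \qquad \text{where } SS_{XY} = \sum XY - \frac{\sum X \sum Y}{n} \]

\[ \bar{Y} = \frac{\sum Y}{n} = \frac{122}{5} = 24.4, \qquad \bar{X} = \frac{\sum X}{n} = \frac{39}{5} = 7.8 \]

\[ b_0 = \bar{Y} - b_1 \bar{X} = 24.4 - 3.5435 \times 7.8 = -3.2393 \]

With the slope rounded to 3.54, $b_0 = 24.4 - 3.54 \times 7.8 = -3.21$.

2. Incorrect

The slope is 3.54.

3. Correct

The regression line is $\hat{Y} = b_0 + b_1 X$, thus $\hat{Y} = -3.21 + 3.54X$.

4. Incorrect

When $r = 0.99$, then $r^2 = 0.99^2 = 0.9801$.

5. Incorrect

When $X = 10$, then $\hat{Y} = -3.21 + 3.54X = -3.21 + 3.54 \times 10 = 32.19$.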

Option (3)
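
Each statement in this question can be checked numerically. The sketch below recomputes $SS_{XY}$ and $SS_X$ from their deviation definitions, then the slope, the intercept and the estimate at $X = 10$. The helper name `predict` is an assumption made for illustration. With unrounded coefficients the intercept is about −3.24 and the estimate about 32.2; the values −3.21 and 32.19 in the solution arise when the slope is first rounded to 3.54.

```python
xs = [4, 6, 8, 9, 12]
ys = [12, 16, 25, 30, 39]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares in deviation form
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_x = sum((x - x_bar) ** 2 for x in xs)

b1 = ss_xy / ss_x            # slope
b0 = y_bar - b1 * x_bar      # intercept


def predict(x, intercept=b0, slope=b1):
    """Estimated Y for a given X on the fitted line (illustrative helper)."""
    return intercept + slope * x


print(round(ss_xy, 1), round(ss_x, 1))   # sums of squares
print(round(b1, 4), round(b0, 4))        # slope and intercept
print(round(predict(10), 2))             # estimate at X = 10
```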

Question 6

Which one of the following statements is incorrect?

1. If $r = 0.86$, it implies that the relationship between the two variables examined is strong.

2. If $r^2 = 0.70$, it implies that 70% of the variation in Y is explained by the regression line.

3. If the coefficient of correlation r is highly negative, it cannot be reliable.

4. If $r = 0.64$, then $r^2 = 0.4096$.

5. r indicates the strength and the direction of the relationship.

Solution

1. Correct

2. Correct

3. Incorrect

A highly negative r is just as reliable as a highly positive one. It indicates a strong negative relationship:
when one variable increases, the other one decreases.

4. Correct

$r = 0.64$, so $r^2 = 0.64^2 = 0.4096$

5. Correct

Option (3)

Question 7

A production manager has compared the dexterity scores of five assembly line employees with their hourly
productivity (units per hour). A least-squares regression equation is calculated as $\hat{Y} = 19.2 + 3.0X$. If a job
applicant has a dexterity score of 15, his predicted productivity per hour will be

1. 19.2 units

2. 22.2 units

3. 64.2 units

4. 15.0 units

5. 45.0 units

Solution

  192  30X
The estimated equation is Y

  192  30  15  642


When X  15 then Y

Option (3)

Question 8

The following data represent marks obtained in a mathematics test, X, and an economics test, Y:

X    22    14    17     7    10
Y     7    17    12    27    22

$\sum X = 70$, $\sum Y = 85$, $\sum XY = 1005$, $\sum X^2 = 1118$

Suppose that the value of $SS_{XY} = -185$. The slope is equal to

1. 1341

2. 35768

3. 134

4. 138

5. 134

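Solution

\[ SS_{XY} = \sum XY - \frac{\sum X \sum Y}{n} = 1005 - \frac{70 \times 85}{5} = 1005 - 1190 = -185 \]

\[ SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{n} = 1118 - \frac{70^2}{5} = 1118 - 980 = 138 \]

The slope equals $b_1 = \dfrac{SS_{XY}}{SS_X} = \dfrac{-185}{138} = -1.3406$.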

Option (5)
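
When only the summary sums are available, the shortcut (computational) formulae are all that is needed. The sketch below uses the sums given in the question; the variable names are illustrative. It also computes the intercept, about 35.768, which is not what the question asks for but is a value that is easy to confuse with the slope.

```python
# Summary sums given in Question 8
n = 5
sum_x, sum_y = 70, 85
sum_xy, sum_x2 = 1005, 1118

# Shortcut (computational) formulae
ss_xy = sum_xy - sum_x * sum_y / n      # 1005 - 1190
ss_x = sum_x2 - sum_x ** 2 / n          # 1118 - 980

slope = ss_xy / ss_x
intercept = sum_y / n - slope * sum_x / n

print(ss_xy, ss_x, round(slope, 4), round(intercept, 3))
```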

Question 9

Suppose that the least squares regression line $\hat{Y} = 34.68 - 4.20X$ was fitted to a random sample of 11 pairs
of observations.

Which one of the following statements is incorrect?

1. Y is called the dependent variable and X is called the independent variable.

2. The Y-intercept for this regression line is equal to 34.68.

3. The slope is equal to 4.20.

4. Using the regression line, the estimated value of Y when $X = 20$ equals $-49.32$.

5. For each increase of one unit in X, the Y-value is predicted to decrease by 4.20.

Solution

1. Correct

2. Correct

3. Incorrect

The slope equals to 420.

4. Correct

  3468  420  20  4932


When X  20 then Y

5. Correct

Option (3)

Question 10

Consider the following data:

X:   81   75   71   61    96   56   85   18
Y:   80   82   83   57   100   30   68   56

Use the summary statistics below to calculate the coefficient of determination.

$\sum X_i^2 = 40849$, $\sum X_i = 543$, $\sum Y_i^2 = 41922$, $\sum Y_i = 556$, $\sum X_i Y_i = 40068$, $SSR = 1359.063$

1. 0.4143

2. 0.6437

3. 0.5124

4. 0.1464

5. 0.4673

Solution

The coefficient of determination is $r^2 = \dfrac{SSR}{SST}$.

\[ SST = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n} = 41922 - \frac{556^2}{8} = 41922 - 38642 = 3280 \]

\[ r^2 = \frac{SSR}{SST} = \frac{1359.063}{3280} = 0.4143 \]

Option (1)
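
The given value of SSR can be confirmed from the other summary statistics. The sketch below is a minimal check under one stated assumption: it computes SSR as $b_1 \times SS_{XY}$, which is algebraically equivalent to the formula $b_0 \sum Y + b_1 \sum XY - (\sum Y)^2 / n$ used in this chapter but is not the form given in the study guide. Variable names are illustrative.

```python
# Summary statistics given in Question 10
n = 8
sum_x, sum_y = 543, 556
sum_x2, sum_y2, sum_xy = 40849, 41922, 40068

sst = sum_y2 - sum_y ** 2 / n               # total variation in Y
ss_xy = sum_xy - sum_x * sum_y / n
ss_x = sum_x2 - sum_x ** 2 / n

# Assumption: SSR computed as b1 * SS_XY, an algebraically equivalent form
b1 = ss_xy / ss_x
ssr = b1 * ss_xy

r_squared = ssr / sst
print(round(sst, 1), round(ssr, 3), round(r_squared, 4))
```

Read as a percentage, the result says that roughly 41% of the variation in Y is explained by the regression line.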

Question 11

Which one of the following statements is incorrect?

1. The coefficient of correlation r is a number that indicates the direction of the relationship between the
dependent variable Y and the independent variable X.

2. The coefficient of correlation r is a number that indicates the strength of the relationship between the
dependent variable Y and the independent variable X.

3. If the coefficient of correlation $r = 1$, then the best-fit linear equation will include all the data points.

4. If the coefficient of correlation $r = 0$, then there is a linear relationship between the dependent variable
Y and the independent variable X.

5. If the coefficient of determination $r^2 = 0.81$, the coefficient of correlation r can be 0.90 or −0.90.

Solution

1. Correct

2. Correct

3. Correct

4. Incorrect

When $r = 0$, there is no linear relationship between X and Y.

5. Correct

When $r^2 = 0.81$, then $r = \pm\sqrt{0.81}$, that is, $r = 0.9$ or $r = -0.9$.

Option (4)

Question 12

Consider the following sample data:

X    12    23    11    23    14
Y    28    43    21    40    33

Suppose that $b_0 = 9.6$, $b_1 = 1.41$, $\sum X = 83$, $\sum Y = 165$, $\sum XY = 2938$ and $SST = 318$. Then the
coefficient of determination is

1. 0.94

2. 0.89

3. 0.66

4. 0.78

5. 0.49

Solution

The coefficient of determination is $r^2 = \dfrac{SSR}{SST}$.

\[ SSR = b_0 \sum Y + b_1 \sum XY - \frac{\left(\sum Y\right)^2}{n} = 9.6 \times 165 + 1.41 \times 2938 - \frac{165^2}{5} = 1584 + 4142.58 - 5445 = 281.58 \]

\[ r^2 = \frac{SSR}{SST} = \frac{281.58}{318} = 0.8855 \]

Option (2)

Question 13

The following are the numbers of minutes it took mechanics to assemble a piece of machinery in the morning,
X, and in the afternoon, Y:

X    11.1    12.0    13.7    17.3    14.8    15.3
Y    14.2    21.5    21.1    19.3    19.0    17.4

The coefficient of determination is

1. 0.238

2. 0.1667

3. 0.057

4. 0.488

5. 0.78

Solution

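From the scientific calculator we obtain:

$\sum X = 84.2$, $\sum Y = 112.5$, $\sum X^2 = 1207.32$, $\sum Y^2 = 2145.35$, $\sum XY = 1586$,
$n = 6$, $b_0 = 14.7932$, $b_1 = 0.2820$

\[ SSR = b_0 \sum Y + b_1 \sum XY - \frac{\left(\sum Y\right)^2}{n} = 14.7932 \times 112.5 + 0.2820 \times 1586 - \frac{112.5^2}{6} = 1664.235 + 447.252 - 2109.375 = 2.112 \]

\[ SST = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n} = 2145.35 - \frac{112.5^2}{6} = 2145.35 - 2109.375 = 35.975 \]

\[ r^2 = \frac{SSR}{SST} = \frac{2.112}{35.975} = 0.0587 \]

With unrounded values of $b_0$ and $b_1$ the result is $r^2 = 0.0568$, so the closest option is 0.057.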

Option (3)
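
Because SSR is a small difference between large numbers, it is sensitive to rounding in $b_0$ and $b_1$. The sketch below, with illustrative names and the data read as minutes with one decimal place, computes $r^2$ twice: once with full-precision coefficients and once with coefficients rounded to four decimals as in the solution. The first gives about 0.0568 and the second about 0.0587, and both are closest to option 3.

```python
xs = [11.1, 12.0, 13.7, 17.3, 14.8, 15.3]   # morning times X
ys = [14.2, 21.5, 21.1, 19.3, 19.0, 17.4]   # afternoon times Y
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n


def r_squared(intercept, slope):
    """r^2 = SSR / SST, with SSR = b0*sum(Y) + b1*sum(XY) - sum(Y)^2 / n."""
    ssr = intercept * sum_y + slope * sum_xy - sum_y ** 2 / n
    sst = sum_y2 - sum_y ** 2 / n
    return ssr / sst


print(round(r_squared(b0, b1), 4))                       # full-precision coefficients
print(round(r_squared(round(b0, 4), round(b1, 4)), 4))   # coefficients rounded to 4 decimals
```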

Question 14

Which one of the following statements is incorrect?

1. To draw a scatter diagram (plot), we need data for two variables.

2. If most of the points fall close to the line, we say that there is a linear relationship.

3. If one variable increases when the other does, we say that there is a negative linear relationship.

4. The objective addressed by the model is to analyze the relationship between two variables, X and Y .

5. If the coefficient of correlation $r = 0.9$, then the coefficient of determination $R^2 = 0.81$.

Solution

1. Correct

2. Correct

3. Incorrect

We say there is a positive linear relationship.

4. Correct

5. Correct

$r = 0.9$, so $r^2 = 0.9^2 = 0.81$

Option (3)
