STA 1610 Work Book PDF

© 2017 University of South Africa
All rights reserved
Printed and published by the

University of South Africa
Muckleneuk, Pretoria
STA1610/WB01/2018–2020
70690561
Layout done by the Department
Florida
Contents
1 Data and Statistics
1.1 Variables
1.2 Population and Sample
2 Descriptive Statistics: Tabular and Graphical Presentations
2.1 Visualizing Numerical Data
3 Descriptive Statistics: Numerical Measures
3.1 The Central Tendency or Location
3.2 The Dispersion or Variability
3.3 Quartile and Coefficient of Variation
4 Introduction to Probability
5 Discrete Probability Distribution: Binomial and Poisson
5.1 General Information
5.2 Discrete Probability Distribution
5.2.1 Binomial Probability Distribution
5.2.2 Poisson Probability Distribution
iii STA1610/1
6 Continuous Probability Distribution
6.1 Normal Probability Distribution
7 Sampling Distribution of the Mean and Proportions
7.1 Sampling Distribution of the Mean
7.2 Sampling Distribution of the Proportion
8 Point Estimations and Confidence Intervals
8.1 Point Estimations
8.2 Confidence Interval
8.2.1 Confidence Interval of the Mean
8.2.2 Confidence Interval of the Proportion
9 Hypothesis Testing of the Mean and Proportions
9.1 Hypothesis Testing of the Mean
9.2 Hypothesis Testing of the Proportion
10 Chi-squared Test
11 Simple Linear Regression
Preface
Welcome to the Statistics workbook.
I have written this workbook to make Statistics accessible to everyone, including those with a limited mathematics background. Statistics affects all aspects of our lives, and its applications are so numerous that, in a sense, we are limited only by our own imagination in discovering new uses for it. The workbook emphasises some important concepts of Statistics.
The applied nature of the discipline is reinforced by showing students how to choose the correct statistical procedure or formula with a clear understanding of the concept involved. Fulfilling this objective requires several features built into this workbook, including a focused scope of statistical concepts and a large number of activities.
Calculators are very good at providing the numerical results of statistical processes. One reason for using a calculator is that students come to understand the techniques and concepts by doing the calculations by hand. Students must be aware that all assignments and examinations in this module will be done by hand with the support of formulae and statistical tables. The approach adopted in the workbook is to divide the solution of statistical problems into two stages and to include them in every appropriate activity:
1. Identify a technique, and
2. Calculate the statistics. The calculation stage is completed manually.
It is important that you do the exercises on your own before looking at my solutions. Even if you cannot do an exercise, you should at least read through it and attempt it. Each exercise is designed to test your understanding of the work immediately preceding it.
The scope helps students determine whether or not a statistical method or a particular formula is appropriate. To enrich students' learning experience, each topic was chosen for its relatively straightforward presentation and useful applications. Other topics that expand students' knowledge and give a quick overview include stem-and-leaf plots, statistical tables and graphs.
I hope that you enjoy studying this module as much as I have enjoyed compiling the workbook. In closing, I invite you to help me improve my presentation of this module. You can do this by bringing any errors, obscurities, comments, suggestions or misprints to the attention of your lecturer.
Your Lecturer
Mr B J Kanyama
Chapter 1
Data and Statistics
1.1 Variables
SUMMARY
• Variables can be denoted using symbols.
• Qualitative (categorical) variables measure qualitative, descriptive or categorical characteristics of subjects. The values of a qualitative variable cannot be described meaningfully using numbers.
• Quantitative variables measure quantitative characteristics of subjects. Their values can be expressed numerically.
• Qualitative - nominal variable: Data values represent descriptions or classifications. The values have no logical order.
• Qualitative - ordinal variable: Data values show order. Numbers used as labels indicate rank only and cannot be interpreted as quantities.
• Quantitative - discrete variable: Data occur as integers or whole numbers.
• Quantitative - continuous variable: Data occur as continuous numbers with any level of accuracy. For interval-scale data, the differences between measured values make sense but the data have no natural zero.
• Quantitative - ratio scale data: The data are ordered, have a continuous scale and have a natural or true zero.
Activities
Question 1
The variable “customer satisfaction”, measured on the following four-point scale:
1 = ‘not satisfied’, 2 = ‘slightly satisfied’, 3 = ‘satisfied’ and 4 = ‘very satisfied’, represents a
1. discrete variable
2. nominal variable
3. ordinal variable
4. continuous variable
5. discrete and ordinal variable
Solution
The variable "customer satisfaction" (1 = ‘not satisfied’, 2 = ‘slightly satisfied’, 3 = ‘satisfied’ and 4 = ‘very satisfied’) is a qualitative - ordinal variable: the values 1, 2, 3 and 4 show the order of the categories but cannot be interpreted as quantities. We could even remove them and still read the categories as an ordered list.
Option (3)
Question 2
A survey of readers asked respondents to complete the following:
A. Gender
B. Marital status
C. Number of magazine subscriptions
D. Annual income
E. Rate the lecturer (very effective, effective, not too effective, not at all effective).
Determine the type of variable for each of the items above. Which of them represent a discrete variable?
1. C and D
2. A, B and E
3. C and E
4. Only D
5. Only C
Solution
A. Gender (male and female): represents a qualitative - nominal variable.
B. Marital status (single, married, divorced, widowed): represents a qualitative - nominal variable.
C. Number of magazine subscriptions (can be 1, 2, 4 and so on): represents a quantitative - discrete variable.
D. Annual income (can be R200 000.00 or R600 000.00, and so on): represents a quantitative - continuous variable.
E. Rate the lecturer (very effective, effective, not too effective, not at all effective): represents a qualitative
- ordinal variable.
Option (5)
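The classification above can also be written down as a small lookup table. The Python sketch below only records the types assigned in the solution and filters the discrete ones; the names `survey_items` and `discrete_items` are illustrative and are not part of the module material.

```python
# Variable types exactly as assigned in the solution to Question 2.
survey_items = {
    "A. Gender": ("qualitative", "nominal"),
    "B. Marital status": ("qualitative", "nominal"),
    "C. Number of magazine subscriptions": ("quantitative", "discrete"),
    "D. Annual income": ("quantitative", "continuous"),
    "E. Rate the lecturer": ("qualitative", "ordinal"),
}

# Keep only the items whose variable is discrete.
discrete_items = [
    name for name, (_, scale) in survey_items.items() if scale == "discrete"
]
print(discrete_items)
```

Only item C remains after filtering, which agrees with the option chosen above.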
Question 3
The variable “country”, with possible values encoded as
1 = ‘South Africa’, 2 = ‘USA’, 3 = ‘UK’ and 4 = ‘Zimbabwe’, represents a
1. nominal variable
2. ordinal variable
3. discrete variable
4. continuous variable
5. both ordinal and discrete variable
Solution
The variable "country" (1 = ‘South Africa’, 2 = ‘USA’, 3 = ‘UK’, 4 = ‘Zimbabwe’) represents a qualitative - nominal variable: the codes are labels only and carry no order.
Option (1)
Question 4
Which one of the following statements is incorrect?
1. The average mark for STA1610, which could take values such as 75%, 74.8% or 74.89%, is a continuous variable.
2. The number of children in a family is a discrete variable.
3. The median is sensitive to outliers.
4. The mean makes use of all observations.
5. A data set can have more than one mode.
Solution
1. Correct
The average mark for STA1610 could be, for example, 0.75, 0.748 or 0.7489.
2. Correct
The number of children in a family can be, for example, 1, 10, 15 or 25: these are integers.
3. Incorrect
The median is the middle value of the ordered data, whereas an outlier is an extreme value, so an outlier has little effect on the median.
4. Correct
The mean is the sum of the values in the data set divided by the number of values.
5. Correct
A data set with one mode is unimodal, one with two modes is bimodal, and so on.
Option (3)
Question 5
Which one of the following statements is incorrect?
1. Nominal data are also called qualitative data.
2. Quantitative data in Statistics refers to continuous data alone.
3. Qualitative variables indicate that a person or object belongs in a category.
4. The method of payment is a qualitative variable.
5. The amount of money a person withdraws from an ATM today is a discrete variable.
Solution
1. Correct
2. Incorrect
Quantitative data include both discrete and continuous variables.
3. Correct
4. Correct
Method of payment can be cash, cheque, electronic etc.
5. Correct
A withdrawal can only be an integer amount, determined by the number of bank notes dispensed at a time.
Option (2)
Question 6
Consider the following variables:
A. whether or not you own a Panasonic television set.
B. your status as either a full-time or a part-time student.
C. the number of pupils who attended your primary school.
D. the weight of an unfilled container.
E. condition, either poor, fair, good or excellent.
Which of the above are quantitative variables?
1. Only B
2. B and D
3. B , C and D
4. C and D
5. A, B and E
Solution
For the given statements:
A. Ownership of a Panasonic television set (own or do not own): represents a qualitative - nominal variable.
B. The status of the student (full-time or part-time): represents a qualitative - nominal variable.
C. The number of pupils who attended your primary school: represents a quantitative - discrete variable.
D. The weight of the container: represents a quantitative - continuous variable.
E. The condition (poor, fair, good or excellent): represents a qualitative - ordinal variable.
Option (4)
Question 7
Which one of the following statements is incorrect?
1. Stress level, age, gender, religion are examples of variables.
2. A person’s nationality (Mexican, Ethiopian, Australian) represents a nominal variable.
3. The number of times a mouse makes a wrong turn in a laboratory represents a continuous variable.
4. The position one finishes in a race is a qualitative variable.
5. The ethnic group to which a person belongs represents a categorical variable.
Solution
1. Correct
2. Correct
3. Incorrect
The number of times a mouse makes a wrong turn is a count, so it represents a discrete variable.
4. Correct
The position one finishes in a race can be 1st, 2nd, 3rd, and so on; this represents a qualitative - ordinal variable.
5. Correct
Option (3)
Question 8
Which one of the following statements is incorrect?
1. The number of minutes it takes to read a page is a continuous variable.
2. The number of problems in the text is a discrete variable.
3. The place of residence (Pretoria, Johannesburg, Danville) is a nominal variable.
4. Annual income represents a quantitative variable.
5. The type of residence (single or multiple family home) is an ordinal variable.
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
The types of residence have no natural order, so this is a qualitative - nominal variable.
Option (5)
Question 9
Residents were asked a series of questions:
A. On what floor is your flat?
B. Do you own or rent your accommodation?
C. How large is your apartment (in square meters)?
D. How much did you spend?
E. Rate the availability of parking space as excellent, good, fair, poor or very poor.
Which of the above questions yield quantitative variables?
1. A and C
2. C and D
3. Only D
4. A , C and D
5. B and E
Solution
A. This question refers to a quantitative - discrete variable.
B. This question refers to a qualitative - nominal variable.
C. This question represents a quantitative - continuous variable.
D. This question represents a quantitative - continuous variable.
E. The variable "rate the availability of parking" (excellent, good, fair, poor or very poor): represents a qualitative - ordinal variable.
Option (4)
Question 10
The question “what is your marital status?” had the following responses
A. Married
B. Widowed
C. Divorced
D. Separated
E. Never married
The best graphical technique to summarise the data is:
1. Bar chart
2. Pie chart
3. Histogram
4. Stem–and–leaf
5. Bar and Pie chart
Solution
The variable "response to the question about marital status" (married, widowed, divorced, separated, never married) is a categorical variable.
Two graphical techniques summarise data for a qualitative variable: the bar chart and the pie chart. The better of the two is the bar chart.
Option (1)
Question 11
Consider the following statements:
A. Score in a soccer match.
B. Temperature in degrees Celsius.
C. Number of defective machine parts.
D. Age to the nearest year of children in a classroom.
E. Number of modules needed for a degree.
Which of the above variables are discrete?
1. A and B
2. B , C and D
3. C and E
4. A , C , D and E
5. A , B , C , D and E
Solution
Each variable is classified as indicated below.
A. Quantitative - discrete variable
B. Quantitative - continuous variable
C. Quantitative - discrete variable
D. Quantitative - discrete variable
E. Quantitative - discrete variable
Option (4)
Question 12
Which one of the following statements is incorrect?
1. The number of ear piercings a person has is a discrete variable.
2. The opinion about legalization of marijuana is a qualitative variable.
3. The number of oil cans sold at a given petrol station is a continuous variable.
4. Height of a customer at a boutique is a quantitative variable.
5. The daily number of students going to school is a discrete variable.
Solution
1. Correct
2. Correct
An opinion can be good or bad; alternatively, it can be valid or invalid.
3. Incorrect
The number of cans sold is a countable characteristic, so we can only obtain integers. This is a discrete variable.
4. Correct
Height can be measured therefore height represents a quantitative - continuous variable.
5. Correct
Option (3)
Question 13
Before leaving a particular restaurant, customers are asked to respond to the questions listed below.
A. What is the approximate distance of the restaurant from your residence?
B. Have you eaten at the restaurant previously?
C. If your answer to part (B) was yes, on how many occasions?
D. Which of the following attributes of the restaurant do you find most attractive: service, prices, quality
of food, or varied menu?
Determine which of the questions have qualitative responses:
1. Only B
2. B and C
3. B and D
4. A and C
5. B, C and D
Solution
Each question yields data of the following type:
A. Quantitative - continuous variable: the variable is ‘distance’, which can be measured.
B. Qualitative - nominal variable: the variable is ‘have you eaten at the restaurant previously’ (the response is yes or no).
C. Quantitative - discrete variable.
D. Qualitative - nominal variable: the variable is ‘attribute of the restaurant you find most attractive’ (service, prices, quality of food, or varied menu).
Option (3)
Question 14
The following information is collected from a car-loan application form at a certain bank:
A. Marital status
B. Total of monthly expenditures in rands
C. Number of jobs in the past ten years
D. Gender of the applicant
E. Street address of the applicant
Which of the above variables are quantitative?
1. A and B
2. B and C
3. C and D
4. A, B and E
5. B and E
Solution
A. Qualitative - nominal
B. Quantitative - continuous
C. Quantitative - discrete
D. Qualitative - nominal
E. Qualitative - nominal
Option (2)
Question 15
Which one of the following statements is incorrect?
1. Academic rank is an ordinal variable.
2. A discrete variable arises from a counting process.
3. Internet provider is a nominal variable.
4. Number of magazines subscribed to is a discrete variable.
5. Your status on whether you own a computer is a continuous variable.
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Your status on whether you own a computer is a qualitative - nominal variable.
Option (5)
Question 16
Consider the following questions:
A. What is your occupation?
B. What is your income?
C. What degree did you do?
D. What is the amount of your student loan?
E. How would you rate the quality of instruction? (excellent, very good, good, fair, poor).
Identify the type of variable for each question. Which of them represent quantitative variables?
1. Only B
2. Only D
3. A, C and E
4. B and D
5. B, C and D
Solution
A. Qualitative - nominal variable.
B. Quantitative - continuous variable.
C. Qualitative - ordinal variable.
D. Quantitative - continuous variable.
E. Qualitative - ordinal variable.
Only B and D are quantitative.
Option (4)
1.2 Population and Sample
SUMMARY
• A sample is a portion (subset) of the population.
• A population is the complete set of objects considered in the study.
• The population size is the number of items in the population, denoted by $N$.
• The sample size is the number of items in a sample, denoted by $n$.
• A population parameter is a summarising measure of a specific aspect of an entire population, for example the population mean $\mu$ and the population variance $\sigma^2$.
• A sample statistic is a summarising measure of a specific aspect of a sample, for example the sample mean $\bar{X}$ and the sample variance $S^2$.
Activities
Question 1
A descriptive measure of a population is called a
1. parameter
2. statistic
3. mean $\mu$
4. mean $\bar{X}$
5. size $N$
Solution
Option (1)
Question 2
Consider the following statistical measurements:
A. The mean
B. The median
C. The mode
D. The range
E The standard deviation
The less commonly used measure of central tendency of a distribution of scores is
1. Only A
2. C and D
3. A, B and D
4. C, D and E
5. B and C
Solution
Option (5)
Question 3
Which one of the following statements is incorrect?
1. One form of descriptive statistics uses graphical techniques.
2. One form of descriptive statistics uses numerical techniques.
3. In the language of statistics, population refers to a group of people.
4. Statistical inference is used to draw conclusions or inferences about characteristic of populations based
on sample data.
5. A summary measure that is computed from a sample to describe a characteristic of the population is
called a statistic.
Solution
1. Correct
2. Correct
3. Incorrect
In statistics, a population is the complete set of objects considered in the study; it need not be a group of people.
4. Correct
5. Correct
Option (3)
Question 4
Which of the statements is/are incorrect?
1. A variable is a characteristic of an item or individual being measured.
2. Population is a portion of a sample selected for analysis.
3. In a pie chart, the size of segments varies according to the percentage in each category.
4. The difference between a histogram and a bar chart is that the bars of a histogram are joined together, whereas those of a bar chart are not.
5. Stem-and-leaf display reveals more information than the frequency distribution.
Solution
1. Correct
2. Incorrect
A population is the entire set of objects considered in the study; a sample is a portion of the population.
3. Correct
4. Correct
5. Correct
Option (2)
Chapter 2
Descriptive Statistics: Tabular and Graphical Presentations
2.1 Visualizing Numerical data
SUMMARY
• Visualising data involves using various tables and charts to help draw conclusions about data. The appropriate tables and charts depend on the type of data we have.
• A stem-and-leaf display allows us to see how a data set is distributed and where the data are concentrated. The values in a stem-and-leaf display are presented in ascending order.
• A histogram is a way of summarising data measured on a continuous or discrete scale.
• A frequency polygon is a graph made by joining the centres of the tops of the columns of a frequency histogram.
• Bar and pie charts are used for summarising data measured on a categorical scale.
Activities
Question 1
Given the following stem-and-leaf display for midterm exam scores on information systems:
5 | 0
7 | 4 4 6
8 | 1 9
9 | 2
Which one of the following statements is incorrect?
1. The sample size is 7.
2. Most of the candidates had distinctions.
3. The median is 76.
4. The fifth exam score is 1.
5. An ordered array of the exam scores is 50 74 74 76 81 89 92.
Solution
Given the stem-and-leaf display of the midterm exam scores
5 | 0
6 |
7 | 4 4 6
8 | 1 9
9 | 2
1. Correct
The sample data are 50 74 74 76 81 89 92, therefore the sample size is $n = 7$.
2. Correct
The candidates with distinctions are represented by the scores 74 74 76 81 89 92.
3. Correct
The median is the middle (fourth) value of the ordered data, which is 76.
4. Incorrect
The fifth exam score is 81, not 1.
5. Correct
Option (4)
Question 2
Consider the nine e-mail receipts data of two-digit integers given below
11 33 28 32 13 24 28 22 17
The correct stem-and-leaf display is
1. 1 | 1 3 7
   2 | 8 4 8 2
   3 | 3 2
2. 1 | 1 3 7
   3 | 3 2
   2 | 8 4 8 2
3. 1 | 1 3 7
   2 | 2 4 8 8
   3 | 2 3
4. 1 | 1 3 7
   3 | 2 3
   2 | 2 4 8 8
5. 11 13 17
   22 24 28 28
   32 33
Solution
The data in ascending order are 11 13 17 22 24 28 28 32 33, so the stem-and-leaf display is
1 | 1 3 7
2 | 2 4 8 8
3 | 2 3
Option (3)
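The construction used in this solution (sort the data, take the tens digit as the stem and the units digit as the leaf) can be sketched in Python. The function name `stem_and_leaf` is illustrative, and the sketch assumes non-negative two-digit integers as in this question.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves in ascending order} for two-digit integers."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens digit, units digit
        display[stem].append(leaf)
    return dict(display)


emails = [11, 33, 28, 32, 13, 24, 28, 22, 17]
display = stem_and_leaf(emails)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Sorting before splitting is what places the leaves in ascending order within each stem, which is the property that separates the correct display from the unsorted alternatives.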
Question 3
In the following stem-and-leaf display for a set of two digit integers,
2 | 0 2 2 7
3 | 1 1 3 5 9
The correct original set of data is
1. 0 2 2 7 1 1 3 5 9
2. 20 22 22 27 31 31 33 35 39
3. 20 22 22 27 31 31 33 35 39
4. 20 22 27 31 33 35 39
5. 2 22 22 72 13 13 33 53 59
Solution
Given a stem-and-leaf display
2 | 0 2 2 7
3 | 1 1 3 5 9
The numerical values in the set of two-digit integers are 20 22 22 27 31 31 33 35 39
Option (3)
Question 4
The ages of a sample of 40 workers are shown below using a stem–and–leaf diagram
2 | 5 6 6 8 8 9
3 | 0 1 2 3 3 3 4 5 5 6 6 7 7 8 8 9 9
4 | 0 0 1 1 1 1 6 6 6 8 9
5 | 0 0 1 2 3
6 | 1
Which one of the following statements is incorrect?
1. The smallest age in the sample is 25
2. Most ages in the sample are greater than 40
3. The mode of the distribution is 41
4. The range of the distribution is 36
5. The median is 38
Solution
Given a stem-and-leaf display
2 | 5 6 6 8 8 9
3 | 0 1 2 3 3 3 4 5 5 6 6 7 7 8 8 9 9
4 | 0 0 1 1 1 1 6 6 6 8 9
5 | 0 0 1 2 3
6 | 1
The numerical values are

25 26 26 28 28 29 30 31 32 33 33 33 34 35 35
36 36 37 37 38 38 39 39 40 40 41 41 41 41 46
46 46 48 49 50 50 51 52 53 61
1. Correct
2. Incorrect
Only 15 values are greater than 40, whereas 25 values are less than or equal to 40.
3. Correct
The mode is the most frequently occurring value: 41 occurs four times.
4. Correct
The range: the largest number minus the smallest number, $61 - 25 = 36$
5. Correct
The median position: $\frac{n+1}{2} = \frac{40+1}{2} = 20.5$
Because position 20.5 falls between the 20th and 21st values, the median is the average of the 20th and 21st values in increasing order: $\frac{38+38}{2} = 38$
Option (2)
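The figures in this solution can be checked with a short Python sketch that expands the stem-and-leaf display back into raw ages and uses the standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# Stem-and-leaf display from Question 4, stored as stem -> leaves.
display = {
    2: [5, 6, 6, 8, 8, 9],
    3: [0, 1, 2, 3, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9],
    4: [0, 0, 1, 1, 1, 1, 6, 6, 6, 8, 9],
    5: [0, 0, 1, 2, 3],
    6: [1],
}

# Rebuild each age as 10 * stem + leaf.
ages = [10 * stem + leaf for stem, leaves in display.items() for leaf in leaves]

n = len(ages)
median = statistics.median(ages)      # average of the 20th and 21st values
mode = statistics.mode(ages)          # most frequently occurring age
data_range = max(ages) - min(ages)    # largest minus smallest
over_40 = sum(1 for age in ages if age > 40)

print(n, median, mode, data_range, over_40)
```

Counting `over_40` directly is a quick way to test a claim such as "most ages are greater than 40" without listing the values by hand.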
Question 5
The following data gives the marks obtained in a statistics exam as a percentage:
3 | 5
4 | 3 8 8 9 9
5 | 4 5 5 9
6 | 1 3 6
7 | 3
9 | 5
Which one of the following statements is correct?
1. The range is 35
2. The mode is 48 and 55.
3. About 33% of the marks lie between the marks 40 and 50
4. The median is 5.
5. The majority of the students failed the statistics exam.
Solution
3 | 5
4 | 3 8 8 9 9
5 | 4 5 5 9
6 | 1 3 6
7 | 3
9 | 5
1. Incorrect
The range is the largest number minus the smallest number: $95 - 35 = 60$
2. Incorrect
The modes are 48, 49 and 55.
3. Correct
Between 40 and 50 we have five of the 15 values: 43 48 48 49 and 49. This corresponds to a proportion of $\frac{5}{15} = 0.3333$ or 33.33%.
4. Incorrect
The median is 55, the middle (eighth) value of the 15 ordered values:
35 43 48 48 49 49 54 55 55 59 61 63 66 73 95
5. Incorrect
The majority of the students passed the statistics exam: nine of the 15 students.
Option (3)
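A minimal Python sketch of the counts used in this solution is given below. The pass mark of 50 is an assumption: it is not stated in the question, but it is consistent with nine of the 15 students passing. The variable names are illustrative.

```python
import statistics

marks = [35, 43, 48, 48, 49, 49, 54, 55, 55, 59, 61, 63, 66, 73, 95]

data_range = max(marks) - min(marks)
modes = statistics.multimode(marks)  # every value that shares the highest count

# Proportion of marks lying between 40 and 50.
share_40_to_50 = sum(1 for m in marks if 40 <= m <= 50) / len(marks)

PASS_MARK = 50  # assumed pass mark, not given in the question
passed = sum(1 for m in marks if m >= PASS_MARK)

print(data_range, modes, round(share_40_to_50, 4), passed)
```

`statistics.multimode` is used instead of `statistics.mode` because this data set has several values tied for the highest frequency.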
Question 6
The following stem-and-leaf display was constructed for a random sample:

3 | 0
4 | 2 6
5 | 5 6 7
6 | 0 2 4 6
7 | 2 7 7
8 | 3
Which one of the following statements is incorrect?
1. The median for the random sample is 62
2. 50% of the values lie between the values 50 and 70.
3. The mode of the data set is 77
4. The sixth largest value in the random sample is 64
5. The range of the random sample is 53
Solution
Given the stem-and-leaf

3 | 0
4 | 2 6
5 | 5 6 7
6 | 0 2 4 6
7 | 2 7 7
8 | 3
1. Incorrect
The ranked data:
30 42 46 55 56 57 60 62 64 66 72 77 77 83
The median is the average of 60 and 62: $\frac{60+62}{2} = 61$
2. Correct
Between 50 and 70 we have 7 values, which corresponds to a proportion of $\frac{7}{14} = 0.5$ or 50%.
3. Correct
The most frequently occurring value is 77.
4. Correct
The sixth value is 64 when we count down from 83.
5. Correct
The range equals the largest value minus the smallest value: $83 - 30 = 53$
Option (1)
Chapter 3
Descriptive Statistics: Numerical Measures
3.1 The Central Tendency or Location
SUMMARY
• The central tendency or location defines the location of the middle or the centre of a distribution.
• The central tendency allows us to assign a value that is the most representative of the group.
• The measures of central tendency are the mean, the median and the mode.
• The mean is the average of the data values in a given data set. The sample mean is $\bar{X} = \frac{\sum x}{n}$ and the population mean is $\mu = \frac{\sum x}{N}$.
• The median is the value in the middle of an ordered data set.
• The mode is the most frequently occurring value in a set of discrete data.
• A distribution is symmetrical when the values of the mean, median and mode are all equal. The distribution is then called normal (bell-shaped).
• A distribution is positively skewed when the mean is greater than the median and the mode. The tail is on the right-hand side.
• A distribution is negatively skewed when the mean is smaller than the median and the mode. The tail is on the left-hand side.
3.2 The Dispersion or Variability
• Dispersion is the degree to which values are spread out on the number line.
• The range is the difference between the largest and the smallest data values. It shows how widely spread the data are by considering the distance between the largest and smallest values.
• The variance is related to the sum of the squared distances of the observations from their mean. This sum of squared values measures the dispersion of the observations.
• The sample variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$ and the population variance is $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$.
• The standard deviation is the square root of the variance: $\sqrt{\text{variance}}$.
Activities
Question 1
The distribution is symmetrical when
1. the value of the mean, median and mode are all equal.
2. we use the sample to conclude about the population.
3. each observation in a sample is likely to be selected.
4. the value of the mean, median and mode are not equal.
5. we are able to calculate the mean and the standard deviation through the calculator.
Solution
Option (1)
Question 2
Consider the data collected on returns on investment as follows:
31 27 26 30 28 31
Which one of the following statements is correct?
1. The range is 5.
2. The median is 28.
3. The mode is 30.
4. The sample mean $\bar{X}$ is 28.
5. The distribution is symmetrical.
Solution
The data are 31 27 26 30 28 31
The ranked data are 26 27 28 30 31 31
1. Correct
The range is the largest value minus the smallest value: $31 - 26 = 5$
2. Incorrect
The median is the middle of the ranked data 26 27 28 30 31 31, that is, the average of 28 and 30: $\frac{28+30}{2} = 29$
3. Incorrect
The mode is 31.
4. Incorrect
The sample mean is $\bar{X} = \frac{31+27+26+30+28+31}{6} = \frac{173}{6} = 28.8333$
5. Incorrect
Because the mean (28.8333), the median (29) and the mode (31) are not all equal, the distribution is not symmetrical.
Option (1)
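The same five checks can be sketched in Python with the standard `statistics` module. The symmetry test below applies the workbook's criterion (mean, median and mode all equal) literally; the variable names are illustrative.

```python
import statistics

returns = [31, 27, 26, 30, 28, 31]

data_range = max(returns) - min(returns)
median = statistics.median(returns)   # average of the two middle values
mode = statistics.mode(returns)
mean = statistics.mean(returns)       # 173 / 6

# Workbook criterion: symmetrical only if all three measures coincide.
is_symmetrical = mean == median == mode

print(data_range, median, mode, round(mean, 4), is_symmetrical)
```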
Question 3
Consider a random sample of the daily cost of electricity of five citizens:
23 11 15 26 12
Calculate the standard deviation of the cost of electricity.
1. 174
2. 637
3. 673
4. 051
5. 453
36
Solution
Given the sample data 23 11 15 26 12
The sample variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$
The sample mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{23+11+15+26+12}{5} = 17.4$
The sample variance is
$S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1} = \frac{(23-17.4)^2 + (11-17.4)^2 + (15-17.4)^2 + (26-17.4)^2 + (12-17.4)^2}{5-1} = \frac{181.2}{4} = 45.3$
The standard deviation is $S = \sqrt{45.3} = 6.7305$
Option (3)
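The steps of this solution (mean, squared deviations, division by n - 1, square root) translate directly into a few lines of Python. This is a sketch of the by-hand method rather than a replacement for it; the variable names are illustrative.

```python
import math

costs = [23, 11, 15, 26, 12]

n = len(costs)
mean = sum(costs) / n                              # sample mean
sum_sq_dev = sum((x - mean) ** 2 for x in costs)   # sum of squared deviations
sample_variance = sum_sq_dev / (n - 1)             # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)             # standard deviation

print(round(mean, 1), round(sample_variance, 1), round(sample_sd, 4))
```

Dividing by `n - 1` rather than `n` is what makes this the sample variance; dividing by `n` would give the population formula from the summary.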
Question 4
Measures of central tendency include
1. the value of the mean, median and mode, when they have all the same value.
2. the presence of outlier.
3. whether the distribution is symmetrical or skewed.
4. sample size, mean and variance.
5. mean, median and mode.
Solution
The measures of central tendency (or location) are the mean, median and mode.
Option (5)
Question 5
Consider the data collected on the mass of six laboratory rats as follows:
27 31 28 30 26 27
Which one of the following statements is correct?
1. The range is zero.
2. The median is 29.
3. The mean is 20.17.
4. The mode is 27.
5. The unordered data set is 26 27 27 28 30 31.
Solution
Given the data 27 31 28 30 26 27
The ranked data are 26 27 27 28 30 31
1. The range is $31 - 26 = 5$
2. The median is the average of 27 and 28, which is $\frac{27+28}{2} = 27.5$
3. The mean is $\bar{X} = \frac{27+31+28+30+26+27}{6} = \frac{169}{6} = 28.17$
4. The mode is 27
5. The unordered data set is 27 31 28 30 26 27
Option (4)
Question 6
The examination marks are given below:
24 27 36 48 52 52 53 59

Given that $\sum_{i=1}^{8} (x_i - \bar{x})^2 = 1202.875$, the sample variance is equal to:
1. 150.36
2. 13.11
3. 171.84
4. 43.875
5. 17.84
Solution
From the given data
24 27 36 48 52 52 53 59
8  2
If the i1 xi  X  1202875
 2
2
xi X 1202875 1202875
The variance is S     171839
n1 81 7
Option (3)
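As a check on the given sum of squares, the sketch below recomputes it from the raw marks and compares the hand formula with `statistics.variance` from the Python standard library, which also divides by n - 1. The variable names are illustrative.

```python
import statistics

marks = [24, 27, 36, 48, 52, 52, 53, 59]

n = len(marks)
mean = sum(marks) / n
sum_sq_dev = sum((x - mean) ** 2 for x in marks)  # the sum given in the question
sample_variance = sum_sq_dev / (n - 1)

# Library cross-check: statistics.variance is the sample (n - 1) variance.
library_variance = statistics.variance(marks)

print(sum_sq_dev, round(sample_variance, 3), round(library_variance, 3))
```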
Question 7
Consider the following statistical measurements:
A. The mean
B. The median
C. The mode
D. The range
E. The standard deviation
The most commonly used measure of central tendency of a distribution of scores is
1. Only A
2. C and D
3. A , B and D
4. C, D and E
5. B and C
Solution
The most commonly used measure of central tendency is the mean.
Option (1)
Question 8
For a sample of eight employees, the most recent hourly wage increases were 18 5 7 2 10 6 12 15 cents
per hour. The sample variance is
1. 2913
2. 20388
3. 540
4. 9213
5. 2391
Solution
The given data are 18 5 7 2 10 6 12 15
The sample variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$
The sample mean is $\bar{X} = \frac{18+5+7+2+10+6+12+15}{8} = \frac{75}{8} = 9.375$
$S^2 = \frac{(18-9.375)^2 + (5-9.375)^2 + (7-9.375)^2 + (2-9.375)^2 + (10-9.375)^2 + (6-9.375)^2 + (12-9.375)^2 + (15-9.375)^2}{8-1} = \frac{203.875}{7} = 29.125$
Option (1)
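When working by hand, the sum of squared deviations can also be obtained from the totals of x and x squared. The sketch below uses that algebraically equivalent shortcut, sum of x squared minus (sum of x) squared over n, which is not the form used in the solution above; it is shown only as a cross-check, and the variable names are illustrative.

```python
increases = [18, 5, 7, 2, 10, 6, 12, 15]

n = len(increases)
sum_x = sum(increases)                     # total of the observations
sum_x_sq = sum(x * x for x in increases)   # total of the squared observations

# Shortcut: sum of squared deviations without computing each deviation.
sum_sq_dev = sum_x_sq - sum_x ** 2 / n
sample_variance = sum_sq_dev / (n - 1)

print(sum_x, sum_x_sq, sum_sq_dev, sample_variance)
```

The shortcut avoids eight separate subtractions from 9.375, which is where rounding slips tend to creep in on a calculator.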
Question 9
Consider the following scores

5 3 2 13 2
The outlier score is:
1. 5
2. 2
3. 2 and 5
4. 2 and 13
5. 13
Solution
The outlier score is 13
Option (5)
Question 10
The mean tells you
1. The middle score if you line the score up from the lowest to the highest.
2. The central tendency of the distribution.
3. The most frequent score in a distribution.
4. One high point that you are sure about.
5. About how spread out the scores are.
Solution
The mean tells us the central tendency of the distribution. The most frequent score in a distribution is the mode, not the mean.
Option (2)
Question 11
The most commonly used measure of dispersion of a distribution of scores is
1. The mode
2. The mean
3. The median
4. The standard deviation
5. The range
Solution
Option (4)
Question 12
A social psychologist asked 15 College students how many times they fell in love before they were eleven
years old. The number of times were as follows:
2 0 6 0 3 1 0 4 9 0 5 6 1 0 2
Which one of the following statements is incorrect?
1. The range is 9.
2. The median is 2.
3. The distribution tails to the right.
4. The mode is 1, 2 and 6 since 0 adds nothing.
5. The value of the mode is smaller than the value of the median.
Solution
The ranked data are 0 0 0 0 0 1 1 2 2 3 4 5 6 6 9
1. Correct
The range is $9 - 0 = 9$
2. Correct
The position of the middle value is $\frac{n+1}{2} = \frac{15+1}{2} = 8$; the 8th value in the ordered array is 2.
3. Correct
The mean is $\bar{X} = \frac{\sum x_i}{n} = \frac{0+0+0+0+0+1+1+2+2+3+4+5+6+6+9}{15} = \frac{39}{15} = 2.6$
Because the mean (2.6) is greater than the median (2), the distribution is positively skewed, with its tail on the right-hand side.
4. Incorrect
Zero is also a value counted among the values that were given; it occurs five times and is therefore the mode.
5. Correct
The mode is 0, which is smaller than the median of 2.
Option (4)
Question 13
The variance tells you
1. about how spread out the scores are.
2. where the mean of the distribution is.
3. about the sum of the scores divided by the number of scores.
4. about the central tendency of the distribution.
5. that the population is known.
Solution
Option (1)
Question 14
The following performance scores have been recorded for 10 job applicants who have taken a pre-employment
aptitude test at UNISA:
22 19 22 24 21 22 17 20 17 21
1. The sample mean $\bar{X}$ is 20.5
2. The median is 21.5
3. The mode is 22
4. The range is 7
5. The distribution is skewed.
Solution
Given data 22 19 22 24 21 22 17 20 17 21
1. Correct
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{22 + 19 + 22 + 24 + 21 + 22 + 17 + 20 + 17 + 21}{10} = \frac{205}{10} = 20.5$
2. Incorrect
The ranked data: 17 17 19 20 21 21 22 22 22 24
The median is the average of the 5th and 6th values, 21 and 21, which gives 21.
3. Correct
4. Correct
The range = 24 - 17 = 7
5. Correct
Because the mean (20.5), the median (21) and the mode (22) all differ.
Option (2)
Question 15
The following data gives the ages in years of a sample of 8 employees from a government department:
31 43 56 23 49 42 33 61
Which statement is correct?
1. $\sum x_i = 410$
2. $\bar{X} = 41$
3. $\sum x_i^2 = 15540$
4. The standard deviation $s = 12.93$
5. The median is 36.
Solution
Consider the given data 31 43 56 23 49 42 33 61
1. $\sum x_i = 31 + 43 + 56 + 23 + 49 + 42 + 33 + 61 = 338$
2. $\bar{X} = \frac{\sum x_i}{n} = \frac{31 + 43 + 56 + 23 + 49 + 42 + 33 + 61}{8} = \frac{338}{8} = 42.25$
3. $\sum x_i^2 = 31^2 + 43^2 + 56^2 + 23^2 + 49^2 + 42^2 + 33^2 + 61^2 = 15450$
4. The sample variance
$S^2 = \frac{(31-42.25)^2 + (43-42.25)^2 + (56-42.25)^2 + (23-42.25)^2 + (49-42.25)^2 + (42-42.25)^2 + (33-42.25)^2 + (61-42.25)^2}{8-1}$
$= \frac{1169.5}{7}$
$= 167.0714$
The standard deviation $S = \sqrt{167.0714} = 12.9256$
5. The ranked data: 23 31 33 42 43 49 56 61
The median is the average of the 4th and 5th values, 42 and 43, which gives 42.5
Option (4)
Question 16
The following data gives the typing speeds (in words per minute) for several stenographers.
125 140 170 155 132 175 225 210 125 310
Which one of the following computations for this sample is incorrect?
1. $\sum_{i=1}^{10} x_i = 1767$
2. $\sum_{i=1}^{10} x_i^2 = 1767^2$
3. The mean is $\bar{X} = 176.7$
4. The range is equal to 185
5. The mode is 125
Solution
1. Correct
$\sum x_i = 125 + 140 + 170 + 155 + 132 + 175 + 225 + 210 + 125 + 310 = 1767$
2. Incorrect
$\sum x_i^2 = 125^2 + 140^2 + 170^2 + 155^2 + 132^2 + 175^2 + 225^2 + 210^2 + 125^2 + 310^2 = 342649$, which is not equal to $1767^2 = 3122289$.
3. Correct
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{125 + 140 + 170 + 155 + 132 + 175 + 225 + 210 + 125 + 310}{10} = \frac{1767}{10} = 176.7$
4. Correct
The range = the largest value - the smallest value = 310 - 125 = 185
5. Correct
The mode = 125
Option (2)
Question 17
Which one of the following statements is not true about the mean?
1. It is the best measure of central tendency when data is not skewed.
2. In a symmetric distribution, the mean, the median and the mode are all equal.
3. It is not affected by extreme values or outliers.
4. It utilises all values in its calculation.
5. To calculate the mean, sum all values and divide by the count.
Solution
Option (3)
Question 18
A summary measure that is computed from a sample to describe a characteristic of the population is called
1. a parameter
2. a population
3. a statistic
4. inferential statistics
5. box–plot
Solution
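A summary measure computed from a sample is a statistic; a parameter is a summary measure computed from the whole population.
Option (3)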
Question 19
The statistics Department randomly selected 15 students and recorded their marks (in percentage) during the
last examination. The data of the 15 students are as follows:
50 72 63 60 80 79 41 48 23 72 96 34 43 55 51
1. The mean mark is 578
2. The median mark is 55
3. The range mark is 73
4. Approximately 66% of the students pass the exam
5. The mode is 2
Solution
1. Correct
The mean $\bar{X} = \frac{\sum x_i}{n} = \frac{50 + 72 + 63 + 60 + 80 + 79 + 41 + 48 + 23 + 72 + 96 + 34 + 43 + 55 + 51}{15} = \frac{867}{15} = 57.8$
2. Correct
The median is the middle value when the data are arranged in increasing order. The position of the median is
$\frac{n+1}{2} = \frac{15+1}{2} = 8$, which means that the median is the 8th value in increasing order.
23 34 41 43 48 50 51 55 60 63 72 72 79 80 96
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th
The median equals 55.
3. Correct
The range = the largest value - the smallest value
= 96 - 23 = 73
4. Correct
66% of 15 = 0.66 × 15 = 9.9 ≈ 10 students pass the exam.
Alternatively, we count the number of students who pass the exam and divide by 15. This gives
$\frac{10}{15} = 0.6667$, which can be expressed as about 66%.
5. Incorrect
The mode is 72
Option (5)
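The summary measures in Question 19 can be reproduced as follows. This is a minimal sketch assuming Python 3.8 or later, where `statistics.multimode` is available; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Examination marks (in percentage) of the 15 students in Question 19.
marks = [50, 72, 63, 60, 80, 79, 41, 48, 23, 72, 96, 34, 43, 55, 51]

mean_mark = mean(marks)                 # sum of the marks divided by 15
median_mark = median(marks)             # 8th value of the 15 ranked marks
modes = multimode(marks)                # list of every most frequent value
mark_range = max(marks) - min(marks)    # largest value minus smallest value

print(mean_mark, median_mark, modes, mark_range)
```

`multimode` returns a list, so a data set with several equally frequent values reports all of them instead of just one.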
Question 20
In a survey of office workers in a large city, each worker in a random sample of 10 workers was asked to
report the number of times during the previous month he or she has eaten an evening meal at a restaurant.
The results were

5 0 9 1 9 8 5 5 2 6
The standard deviation of the above data is
1. 5
2. 10.22
3. 3.20
4. 2.83
5. 8.03
Solution
The mean $\bar{X} = \frac{\sum x_i}{n} = \frac{5+0+9+1+9+8+5+5+2+6}{10} = \frac{50}{10} = 5$
The formula for the sample variance is
$S^2 = \frac{\sum (x_i - \bar{X})^2}{n-1}$
$= \frac{(5-5)^2 + (0-5)^2 + (9-5)^2 + (1-5)^2 + (9-5)^2 + (8-5)^2 + (5-5)^2 + (5-5)^2 + (2-5)^2 + (6-5)^2}{10-1}$
$= \frac{92}{9} = 10.2222$
The standard deviation $S = \sqrt{\text{Variance}} = \sqrt{10.2222} = 3.1972$
Option (3)
Question 21
Which one of the following sample statistics is a measure of spread ( or dispersion)?
1. Sample mean
2. Sample mode
3. Sample median
4. Sample proportion
5. Sample standard deviation
Solution
Option (5)
Question 22
1. The mean, median and mode are measures of central tendency.
2. When a distribution has more values to the left and tails to the right we say, it is skewed negatively.
3. When there is no difference in the values of the mean, median and mode, we say it is a normal distribution.
4. When a distribution is bell–shaped with the left half identical to the right half, it is symmetrical.
5. For the following data values: 9 7 8 6 9 10 14 the mean, median and mode are all equal.
Solution
1. Correct
2. Incorrect
A distribution that has more values to the left and tails to the right is positively skewed.
3. Correct
4. Correct
5. Correct
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{9 + 7 + 8 + 6 + 9 + 10 + 14}{7} = \frac{63}{7} = 9$
The median is the middle value. The ranked data are: 6 7 8 9 9 10 14
Therefore the median is 9
The mode is the most repeated value, which is 9
Option (2)
3.3 Quartile and Coefficient of Variation
• Quartiles split a ranked set of data into four equal parts:
• The first quartile $Q_1$ separates the smallest 25% of the values from the rest. The position of the first quartile is $Q_1 = \frac{n+1}{4}$
• The second quartile $Q_2$ is the median, since 50% of the values lie below it. The position of the second quartile is $Q_2 = 2\left(\frac{n+1}{4}\right)$
• The third quartile $Q_3$ has 75% of the values below it. The position of the third quartile is $Q_3 = 3\left(\frac{n+1}{4}\right)$
• The interquartile range equals the value of $Q_3$ minus the value of $Q_1$
• The boxplot provides a graphical representation of the data based on the five-number summary: the
smallest value, the value of $Q_1$, the value of $Q_2$ (the median), the value of $Q_3$ and the largest value.
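The position rule above, together with the rounding convention that the worked solutions in this section appear to follow, can be sketched in Python. The function name `quartile_value` and the exact rounding rule (take the value at a whole-number position, average the two neighbours at a position ending in .5, otherwise round to the nearest position) are assumptions inferred from those solutions, and the sketch assumes at least two data values.

```python
def quartile_value(data, k):
    """Value of the k-th quartile (k = 1, 2, 3) using position k(n + 1)/4."""
    ranked = sorted(data)
    n = len(ranked)
    position = k * (n + 1) / 4        # 1-based position in the ranked data
    lower = int(position)             # whole part of the position
    fraction = position - lower
    if fraction == 0:                 # whole number: take that value
        return ranked[lower - 1]
    if fraction == 0.5:               # halfway: average the two neighbours
        return (ranked[lower - 1] + ranked[lower]) / 2
    # Otherwise round the position to the nearest whole number.
    return ranked[round(position) - 1]


# Illustrative data: ten values whose quartile positions are 2.75, 5.5 and 8.25.
fees = [26, 28, 20, 21, 22, 25, 18, 23, 15, 30]
q1, q2, q3 = (quartile_value(fees, k) for k in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Statistical software often interpolates between neighbouring values instead of rounding the position, so library quartile functions may give slightly different answers from this hand rule.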
Activities
Question 1
The bounced check fees (in Rand) for a sample of 10 banks are:
26 28 20 21 22 25 18 23 15 30
1. The position of $Q_1$ is 2.75.
2. The value of $Q_2$ is not always equal to the median.
3. The position of $Q_3$ is 8.25.
4. To calculate the value of $Q_3$, first rank the values from the smallest to the largest.
5. The interquartile range is 6.
Solution
Given the values
26 28 20 21 22 25 18 23 15 30
The ranked data: 15 18 20 21 22 23 25 26 28 30
The positions of the quartiles are $Q_1 = 2.75$, $Q_2 = 5.5$ and $Q_3 = 8.25$
1. The position of $Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = \frac{11}{4} = 2.75$
2. Incorrect, the value of $Q_2$ is always equal to the median.
The median is the average of 22 and 23, which equals 22.5.
The position of $Q_2 = 2\left(\frac{10+1}{4}\right) = 2 \times 2.75 = 5.5$
Because 5.5 falls between the 5th and 6th values, the value of $Q_2$ is the
average of the 5th value and the 6th value: $\frac{22 + 23}{2} = \frac{45}{2} = 22.5$
3. The position of $Q_3 = 3\left(\frac{n+1}{4}\right) = 3\left(\frac{10+1}{4}\right) = 3 \times 2.75 = 8.25$
4. To obtain the value of $Q_3$, we round the position of $Q_3 = 8.25$ to 8 and take the eighth value in the ranked data, which equals 26
5. The interquartile range (IQR) equals $Q_3 - Q_1 = 26 - 20 = 6$
To obtain the value of $Q_1$, we round the position of $Q_1 = 2.75$ to 3 and take the third value in the
ordered array, which equals 20
Option (2)
Question 2
The daily electricity consumption in kilowatt hours (kWh) by a sample of 10 households is
51 50 47 33 37 43 61 55 44 41
1. The value of $Q_1$ = 2.75
2. The value of $Q_2$ = 45.5
3. The position of $Q_3$ = 8.25
4. The value of $Q_3$ = 51
5. The interquartile range IQR = 10
Solution
The ranked data: 33 37 41 43 44 47 50 51 55 61
1. The position of $Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = \frac{11}{4} = 2.75$
To calculate the value of $Q_1$, we round the position of $Q_1 = 2.75$ to 3 and take the third
value, which is 41. The statement gives the position of $Q_1$, not its value.
2. The position of $Q_2 = 2\left(\frac{n+1}{4}\right) = 2\left(\frac{10+1}{4}\right) = 2 \times 2.75 = 5.5$
The value of $Q_2$ is the average of the 5th and 6th values, which gives $\frac{44 + 47}{2} = \frac{91}{2} = 45.5$
3. The position of $Q_3 = 3\left(\frac{n+1}{4}\right) = 3\left(\frac{10+1}{4}\right) = 3 \times 2.75 = 8.25$
4. To obtain the value of $Q_3$, we round the position of $Q_3$ to 8 and take the 8th value in the ranked data,
which is equal to 51
5. The interquartile range IQR = 51 - 41 = 10
Option (1)
Question 3
A sample of 13 students marks are: 6 5 6 6 7 8 9 10 10 10 9 9 6
1. The value of the first quartile $Q_1$ is 3.5
2. The position of the second quartile $Q_2$ is 8
3. The median is not always equal to $Q_2$
4. The value of the third quartile $Q_3$ is 8.5
5. The interquartile range is 3.5
Solution
The given data: 6 5 6 6 7 8 9 10 10 10 9 9 6, with n = 13
The ranked data: 5 6 6 6 6 7 8 9 9 9 10 10 10
1. The position of $Q_1 = \frac{n+1}{4} = \frac{13+1}{4} = \frac{14}{4} = 3.5$
For the value of $Q_1$, we calculate the average of the third and the fourth values in the ordered array,
which is $\frac{6 + 6}{2} = \frac{12}{2} = 6$
2. The position of $Q_2 = 2\left(\frac{n+1}{4}\right) = 2\left(\frac{13+1}{4}\right) = 2 \times 3.5 = 7$
3. The median is always equal to the value of $Q_2$, as both have 50% of the values below them; alternatively,
you can show this with the calculation.
4. The position of $Q_3 = 3\left(\frac{n+1}{4}\right) = 3\left(\frac{13+1}{4}\right) = 3 \times 3.5 = 10.5$
For the value of $Q_3$, we calculate the average of the tenth and the eleventh values in the ordered array,
which is $\frac{9 + 10}{2} = \frac{19}{2} = 9.5$
5. The interquartile range IQR = 9.5 - 6 = 3.5.
Option (5)
Question 4
The following sample shows the starting salary for new university graduates (in thousands of rand):
307 288 291 311 301

297 307 300 306 305
1. The position of Q 1 is 297
2. The value of third quartile is 825
3. The second quartile is equal to the median.
4. The position of Q 3 is 307
5. The value of the interquartile range is 55.
Solution
Given the following data
30.7 28.8 29.1 31.1 30.1 29.7 30.7 30.0 30.6 30.5
The ranked data: 28.8 29.1 29.7 30.0 30.1 30.5 30.6 30.7 30.7 31.1
1. The position of $Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = \frac{11}{4} = 2.75$
2. The position of $Q_3 = 3\left(\frac{n+1}{4}\right) = 3\left(\frac{10+1}{4}\right) = 3 \times 2.75 = 8.25$
To calculate the value of $Q_3$, we round the position of $Q_3$ to 8 and take the 8th value in the
ranked data, which is equal to 30.7.
3. Correct
4. The position of $Q_3 = 3\left(\frac{n+1}{4}\right) = 3\left(\frac{10+1}{4}\right) = 3 \times 2.75 = 8.25$
5. The interquartile range IQR = 30.7 - 29.7 = 1.0
To calculate the value of $Q_1$, we round the position of $Q_1 = 2.75$ to 3 and take the 3rd
value in increasing order, which is 29.7.
Option (3)
Question 5
The following data give the amount paid in rentals (in hundreds) for a random sample of 14 one–bedroomed
apartments in Arcadia Pretoria.
14 20 28 16 18 17 21 20 17 18 25 15 20 30
1. The location of the first quartile is $Q_1$ = 3.75
2. The location of the second quartile is $Q_2$ = 7.5 and the value of $Q_2$ = 19.0
3. The location of the third quartile is $Q_3$ = 11.25
4. The value of the second quartile represents the value of the median.
5. The value of the interquartile range is 7.5
Solution
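Given the following data
14 20 28 16 18 17 21 20 17 18 25 15 20 30
The ranked data: 14 15 16 17 17 18 18 20 20 20 21 25 28 30
1. The position of $Q_1 = \frac{n+1}{4} = \frac{14+1}{4} = \frac{15}{4} = 3.75$
2. The position of $Q_2 = 2\left(\frac{n+1}{4}\right) = 2\left(\frac{14+1}{4}\right) = 2 \times 3.75 = 7.5$
For the value of $Q_2$, we calculate the average of the seventh and the eighth values in the ordered array, which
is $\frac{18 + 20}{2} = \frac{38}{2} = 19$
3. The position of $Q_3 = 3\left(\frac{n+1}{4}\right) = 3\left(\frac{14+1}{4}\right) = 3 \times 3.75 = 11.25$
4. Correct
5. Rounding the position of $Q_1 = 3.75$ to 4 gives the 4th value, 17, and rounding the position of $Q_3 = 11.25$ to 11 gives the 11th value, 21.
The interquartile range IQR = $Q_3 - Q_1$ = 21 - 17 = 4.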
Option (5)
Question 6
A statistics practitioner gives the following output:
The sample size n  105, the value of Q 1  2100, the value of Q 3  2400, the median  22000 and the
sample mean X  22238.
The distribution of the data set is
1. symmetric
2. positively skewed
3. asymptotic
4. negatively skewed
5. normal
Solution
Because the value of the sample mean X  22238 is greater than the value of the median  22000, the
distribution is positively skewed.
Option (2)
Chapter 4
Introduction to Probability
SUMMARY
• Probability is a way of expressing the likelihood that a particular event occurs as a number between 0
and 1.
• Probability calculations allow us to quantify uncertainty, that is, to express our uncertainty
numerically.
• In order to determine any probability, you first need to obtain data. We can obtain data through
experimentation, observation or experience.
• Outcome: The potential result of a random experiment, where the exact value is unknown before the
experiment, but known after the experiment has been concluded.
• Sample space: An exhaustive list of all the possible outcomes of an experiment.
• An event is any subset of the collection of outcomes of an experiment. That means an event is a
subset of the sample space. In this module we are interested in determining the probability of an event
occurring.
• The probability of each individual outcome lies between 0 and 1.
• The sum of the probabilities of the outcomes in the sample space equals 1.
• The relative frequency is a probability, because it is the ratio of the frequency of occurrence X of
an outcome or event to the total number of times n the experiment was repeated.
Relative frequency of an event A = $\frac{X}{n}$
• The complement rule: $P(A \text{ or } A^C) = 1$, that means $P(A) + P(A^C) = 1$, therefore $P(A^C) = 1 - P(A)$
• Two events A and B are said to be mutually exclusive if they do not intersect in any way, that means
P(A and B) = 0
• The additive rule for mutually exclusive events: P(A or B) = P(A) + P(B)
• The general additive rule: P(A or B) = P(A) + P(B) - P(A and B)
• Independent events: Two events are independent if the occurrence of one of the events has no influence
on the occurrence of the other event, which means P(A and B) = P(A) × P(B)
• Conditional probabilities are probabilities of events that are conditional on the outcomes of other
events:
$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)}$, where P(A and B) = P(B and A)
Activities
Question 1
The sample space of the toss of a fair die is
S = {1, 2, 3, 4, 5, 6}
If the die is balanced, each simple event has the same probability. Let event A represent an even number and
let event B represent a number less than or equal to 4.
1. Event A = {2, 4, 6}
2. Event B = {1, 2, 3, 4}
3. Joint event A and B = {2, 4}
4. $P(A) = \frac{3}{6}$ and $P(B) = \frac{4}{6}$
5. $P(A \text{ and } B) = \frac{2}{4}$
Solution
From the given information
The sample space S = {1, 2, 3, 4, 5, 6}
A = even numbers from the sample space S, that means A = {2, 4, 6}
B = a number less than or equal to 4 from the sample space S, that means B = {1, 2, 3, 4}
1. Correct
2. Correct
3. Correct
The joint event A and B is the intersection, that is, the elements appearing in both A and B: {2, 4}
4. Correct
$P(A) = \frac{\text{number of outcomes in event } A}{\text{total number in } S} = \frac{3}{6} = 0.5$
$P(B) = \frac{\text{number of outcomes in event } B}{\text{total number in } S} = \frac{4}{6} = 0.6667$
5. Incorrect
$P(A \text{ and } B) = \frac{\text{number of outcomes in event } A \text{ and } B}{\text{total number in } S} = \frac{2}{6}$
Option (5)
Question 2
The following table lists the joint probabilities of achieving grades of A and not achieving grades of A in two
MBA courses.
                                              Achieve a grade     Does not achieve a grade   Total
                                              of A in marketing   of A in marketing
Achieve a grade of A in statistics            0.053               0.130                      0.183
Does not achieve a grade of A in statistics   0.237               0.580                      0.817
Total                                         0.29                0.71                       1.0
1. P(Achieve a grade of A in marketing) = 0.29
2. P(Achieve a grade of A in statistics) = 0.183
3. The events "does not achieve a grade of A in marketing" and "does not achieve a grade of A in statistics" are
mutually exclusive events.
4. The events "achieve a grade of A in marketing" and "achieve a grade of A in statistics" are independent events.
5. The probability that a student achieves a grade of A in marketing, given that he or she does not achieve
a grade of A in statistics, is 0.2901
Solution
Given the contingency table above
1. Correct
P(Achieve a grade of A in marketing) = 0.053 + 0.237 = 0.29
2. Correct
P(Achieve a grade of A in statistics) = 0.053 + 0.130 = 0.183
3. Incorrect
Two events are mutually exclusive if the probability that both occur equals zero, i.e. P(A and B) = 0
P(Does not achieve a grade of A in statistics and does not achieve a grade of A in marketing) = $\frac{0.580}{1.0} = 0.580$
Since this probability is 0.580, which is different from zero, the two events are not
mutually exclusive.
4. Correct
Two events are independent if P(A and B) = P(A) × P(B), where A represents the first event
and B the second event.
P(A and B) = $\frac{0.053}{1.0} = 0.053$, P(A) = 0.29, P(B) = 0.183
P(A) × P(B) = 0.29 × 0.183 = 0.05307
Because P(A and B) = 0.053 is approximately equal to P(A) × P(B) = 0.05307, the two
events are independent.
5. Correct
Conditional probability rule: $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$, where A is the conditioning event (does not achieve a
grade of A in statistics) and B is the event of interest (achieves a grade of A in marketing).
P(A and B) = $\frac{0.237}{1.0} = 0.237$, P(A) = 0.817
$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.237}{0.817} = 0.2901$
Option (3)
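The calculations for Question 2 can also be laid out in code. This is a sketch only: the dictionary keys and the helper `marginal` are hypothetical labels chosen for illustration, and the joint probabilities are those of the table in the question.

```python
# Joint probabilities from the table in Question 2.
# Keys are (statistics outcome, marketing outcome); labels are illustrative.
joint = {
    ("A_stats", "A_mkt"): 0.053,
    ("A_stats", "not_A_mkt"): 0.130,
    ("not_A_stats", "A_mkt"): 0.237,
    ("not_A_stats", "not_A_mkt"): 0.580,
}


def marginal(label, axis):
    """Sum the joint probabilities whose key matches label on the given axis."""
    return sum(p for key, p in joint.items() if key[axis] == label)


p_mkt = marginal("A_mkt", 1)               # column total for marketing
p_stats = marginal("A_stats", 0)           # row total for statistics
p_not_stats = marginal("not_A_stats", 0)   # row total for not A in statistics

# Conditional probability: A in marketing given not A in statistics.
p_mkt_given_not_stats = joint[("not_A_stats", "A_mkt")] / p_not_stats

# Independence check: compare the joint probability with the product of marginals.
product = p_mkt * p_stats

print(p_mkt, p_stats, p_not_stats, p_mkt_given_not_stats, product)
```

Because the table is rounded to three decimals, the independence check compares values approximately rather than exactly.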
Question 3
Consider a sample space from an experiment in which a die is rolled. The sample space is S = {1, 2, 3, 4, 5, 6}.
Let A represent the event of rolling an odd number, A = {1, 3, 5}; let B represent the event of rolling
a number less than or equal to 4, B = {1, 2, 3, 4}; and let C be the event that a 5 or 6 is rolled, C = {5, 6}.
The probability that A and B both occur when the die is rolled is:
1. $\frac{3}{6}$
2. $\frac{4}{6}$
3. $\frac{2}{6}$
4. $\frac{1}{6}$
5. $\frac{5}{6}$
Solution
The given information is
The sample space S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}, B = {1, 2, 3, 4}, C = {5, 6}
The event A and B = {1, 3}
$P(A \text{ and } B) = \frac{\text{number of outcomes in event } A \text{ and } B}{\text{total number in } S} = \frac{2}{6} = 0.3333$
Option (3)
Question 4
Given the following table of joint probability:
        A1    A2
B1      0.4   0.3
B2      0.2   0.1
1. P(A1) = 0.6
2. P(B1) = 0.7
3. P(A1 and B1) = 0.4
4. P(A1 and A2) = 1
5. P(B1 and B2) = 0
Solution
Given the contingency table of joint probability
        A1    A2    Total
B1      0.4   0.3   0.7
B2      0.2   0.1   0.3
Total   0.6   0.4   1.0
1. Correct
P(A1) = 0.4 + 0.2 = 0.6
2. Correct
P(B1) = 0.4 + 0.3 = 0.7
3. Correct
P(A1 and B1) = $\frac{0.4}{1.0} = 0.4$
4. Incorrect
P(A1 and A2) = 0
Because the two events are mutually exclusive.
5. Correct
Because the two events are mutually exclusive.
Option (4)
Question 5
The following table represents gas well completions during 1986 in North and South America.
                    Dry (A)   Not dry (B)   Total
North America (C)   ?         31            45
South America (D)   15        ?             ?
Total               29        ?             70
Complete the numbers missing in the above table and calculate P(A and C):
1. 14
2. 0.27
3. 0.41
4. 0.64
5. 0.20
Solution
Given the contingency table
        A     B     Total
C       14    31    45
D       15    10    25
Total   29    41    70
$P(A \text{ and } C) = \frac{\text{number of outcomes in event } A \text{ and } C}{\text{total number}} = \frac{14}{70} = 0.2$
Option (5)
Question 6
If P(A) = 0.4, $P(B^C)$ = 0.5 and P(A and B) = 0.1
1. P(B) = 0.5
2. $P(A \mid B)$ = 0.2
3. P(A or B) = 0.8
4. Events A and B are mutually exclusive.
5. Events A and B are dependent.
Solution
PA   04 PB C   05
1. Correct
P B  1  P B C   1  05  05
2. Correct
P A and B
PA B  where A represents the first event and B the second event under the
PB
conditional of outcome of A.
PA and B  01
P A and B 01
PA B    02
PB 05
69 STA1610/1
3. Correct
PA or B  PA  PB  PA and B

 04  05  01
 08
4. Incorrect
Because PA and B  01 therefore the two events A and B are not mutually exclusive. Two events
are mutually exclusive when the probability of the two events equals to zero i.e. PA and B  0
5. Correct
Because the two events are dependent when
PA and B  PA   P B

01  04  05
01  02
Option (4)
Question 7
1. Probability refers to a number between 0 and 1 which expresses the chance that an event will occur.
2. An experiment is an activity of measurement that results in an outcome.
3. If the event of interest is A, then the probability that A will not occur is the probability of the complement of A.
4. When events A and B are independent, then P(A and B) = P(A) × P(B)
5. The sex of the students (male and female) cannot be used as an example of mutually exclusive events.
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Two events are mutually exclusive when P(A and B) = 0. A student cannot be recorded as both male and
female, so the probability of both occurring together equals zero and the two events are mutually exclusive.
Option (5)
Question 8
Suppose that P(A) = 0.5 and P(B) = 0.3
1. If events A and B are mutually exclusive, then P(A or B) = 0.8
2. If events A and B are independent, then P(A or B) = 0.8
3. If events A and B are mutually exclusive, then $P(A \mid B)$ = 0
4. If events A and B are independent, then $P(A \mid B)$ = 0.5
5. If P(A and B) = 0.2, then P(A or B) = 0.6
Solution
Given information: P(A) = 0.5, P(B) = 0.3
1. Correct
If A and B are mutually exclusive, then P(A and B) = 0, so
P(A or B) = P(A) + P(B) - P(A and B)
= 0.5 + 0.3 - 0
= 0.8
2. Incorrect
If A and B are independent, then P(A and B) = P(A) × P(B) = 0.5 × 0.3 = 0.15, so
P(A or B) = P(A) + P(B) - P(A and B)
= 0.5 + 0.3 - 0.15
= 0.65
3. Correct
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0}{0.3} = 0$, because for mutually exclusive events P(A and B) = 0
4. Correct
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A) = 0.5$
5. Correct
P(A or B) = P(A) + P(B) - P(A and B)
= 0.5 + 0.3 - 0.2
= 0.6
Option (2)
Question 9
The following table shows a sample of voters cross-classified according to place of residence and their pref-
erence for two candidates for parliament:
                      Place of residence
Preferred candidate   Urban   Suburban
A                     100     30
B                     60      40
What is the probability that a voter picked at random will be a suburban dweller and will prefer candidate B?
1. 0.1739
2. 0.3043
3. 0.4348
4. 40
5. 0.7139
Solution
Given the contingency table
        Urban   Suburban   Total
A       100     30         130
B       60      40         100
Total   160     70         230
$P(\text{suburban and } B) = \frac{40}{230} = 0.1739$
Option (1)
Question 10
Two events A and B are independent such that $P(A) = \frac{1}{4}$ and $P(B) = \frac{3}{4}$. What is the value of $P(A \mid B)$?
1. 0.75
2. 0.25
3. 0.00
4. 1.0
5. Need the value of P(A and B)
Solution
Given that $P(A) = \frac{1}{4}$ and $P(B) = \frac{3}{4}$
Because A and B are independent: P(A and B) = P(A) × P(B)
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A) = \frac{1}{4} = 0.25$
Option (2)
Question 11
The physical science degrees conferred by a school between 1992 and 1995 were broken down as follows:
                Gender
Major           Male (M)   Female (F)   TOTAL
Physics (P)     25         25           50
Chemistry (C)   60         40           100
Geology (G)     30         20           50
TOTAL           115        85           200
Suppose that a person is selected at random from these graduates.
1. P(M) = 0.575
2. $P(M \mid P)$ = 0.5
3. P(M and P) = 0.125
4. P(C or F) = 0.275
5. Events M and P are not mutually exclusive.
Solution
The contingency table
        M     F     Total
P       25    25    50
C       60    40    100
G       30    20    50
Total   115   85    200
1. Correct
$P(M) = \frac{115}{200} = 0.575$
2. Correct
$P(M \mid P) = \frac{P(M \text{ and } P)}{P(P)}$
$P(M \text{ and } P) = \frac{25}{200} = 0.125$
$P(P) = \frac{50}{200} = 0.25$
$P(M \mid P) = \frac{0.125}{0.25} = 0.5$
3. Correct
$P(M \text{ and } P) = \frac{25}{200} = 0.125$
4. Incorrect
P(C or F) = P(C) + P(F) - P(C and F)
$= \frac{100}{200} + \frac{85}{200} - \frac{40}{200}$
$= \frac{100 + 85 - 40}{200}$
$= \frac{145}{200}$
= 0.725
5. Correct
Because $P(M \text{ and } P) = \frac{25}{200} = 0.125$ and not zero
Option (4)
Question 12
Given that P(A) = 0.7, P(B) = 0.6 and P(A and B) = 0.35, which one of the
following statements is incorrect?
1. $P(B^C)$ = 0.4
2. A and B are dependent events.
3. $P(B \mid A)$ = 0.50
4. P(A or B) = 0.95
5. Events A and B are mutually exclusive
Solution
Given P(A) = 0.7, P(B) = 0.6, P(A and B) = 0.35
1. Correct
$P(B^C) = 1 - P(B) = 1 - 0.6 = 0.4$
2. Correct
Events A and B are dependent when P(A and B) ≠ P(A) × P(B)
P(A) × P(B) = 0.7 × 0.6 = 0.42
P(A and B) ≠ P(A) × P(B)
0.35 ≠ 0.42
3. Correct
$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.35}{0.7} = 0.5$
4. Correct
P(A or B) = P(A) + P(B) - P(A and B)
= 0.7 + 0.6 - 0.35
= 0.95
5. Incorrect
Events A and B are mutually exclusive when P(A and B) = 0, but because P(A and B) = 0.35,
we can say that events A and B are not mutually exclusive.
Option (5)
Question 13
Assume A and B are independent events with P(A) = 0.40 and P(B) = 0.30
1. $P(A^C)$ = 0.60
2. P(A and B) = 0.12
3. P(A or B) = 0.58
4. $P(B \mid A)$ = 0.3
5. Events A and B are mutually exclusive.
Solution
Consider the given information
Events A and B are independent when P(A and B) = P(A) × P(B)
P(A) = 0.40, P(B) = 0.30
1. Correct
$P(A^C) = 1 - P(A) = 1 - 0.40 = 0.60$
2. Correct
P(A and B) = P(A) × P(B) = 0.40 × 0.30 = 0.12
3. Correct
P(A or B) = P(A) + P(B) - P(A and B)
= 0.40 + 0.30 - 0.12
= 0.58
4. Correct
$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.12}{0.4} = 0.3$
5. Incorrect
Events A and B are independent with P(A and B) = 0.12, which is not zero, so they do not satisfy the
mutually exclusive rule.
Option (5)
Question 14
A computer is programmed to generate the eight single-digit integers 1, 2, 3, 4, 5, 6, 7 and 8 with
equal frequency. Consider the experiment, “the next integer generated”. Define: Event A = “Odd number”
= {1, 3, 5, 7}, Event B = “Number greater than 4” = {5, 6, 7, 8} and Event C = “1 or 2” = {1, 2}; then
$P(A \mid B)$ is
1. 0.5
2. 0.25
3. 1.0
4. 0.75
5. 0.125
Solution
The sample space S = {1, 2, 3, 4, 5, 6, 7, 8}
A = {1, 3, 5, 7}, B = {5, 6, 7, 8}, C = {1, 2}
The joint event A and B = {5, 7}, as we are looking for the elements that appear in both A and B
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
$P(A \text{ and } B) = \frac{\text{number of outcomes in event } A \text{ and } B}{\text{total number in } S} = \frac{2}{8} = 0.25$
$P(B) = \frac{\text{number of outcomes in event } B}{\text{total number in } S} = \frac{4}{8} = 0.5$
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0.25}{0.5} = 0.5$
Option (1)
Chapter 5
Discrete Probability Distribution: Binomial and Poisson
5.1 General Information
SUMMARY
• A discrete probability distribution is the probability distribution of a discrete random
variable.
• The sum of the probabilities over all values of the random variable equals 1. Mathematically that means $\sum p(x_i) = 1$
• The mean ($\mu$) or the expected value is denoted by $E(X) = \sum x_i \, p(x_i)$
• The variance $\sigma^2 = \sum (x_i - \mu)^2 \, p(x_i)$
• The standard deviation $\sigma = \sqrt{\text{Variance}}$
Activities
Question 1
Which one of the following is a valid probability distribution?
1.
x      0       1       2       3
p(x)   0.512   0.384   0.096   0.008
2.
x      0     1     2     3
p(x)   0.1   0.3   0.4   0.1
3.
x      0      1      2      3
p(x)   0.01   0.01   0.01   0.98
4.
x      0      1      2      3
p(x)   0.25   0.46   0.04   0.24
5.
x      0      1      2     3
p(x)   0.15   0.25   0.5   0.3
Solution
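x      0       1       2       3
p(x)   0.512   0.384   0.096   0.008
$\sum p(x_i) = 0.512 + 0.384 + 0.096 + 0.008 = 1$
The probabilities in the other four tables sum to 0.9, 1.01, 0.99 and 1.2 respectively, so they are not valid probability distributions.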
Option (1)
Question 2
The number of pizzas delivered to university students each month is a random variable with the following
probability distribution
x      0     1     2     3
p(x)   0.1   0.3   ?     0.2
1. p(2) = 0.4
2. P(0 ≤ X ≤ 2) = 0.8
3. P(1 < X < 3) = 0.4
4. P(X > 1) = 0.6
5. The mean $\mu$ = E(X) = 0.7
Solution
Given the probability distribution
x      0     1     2     3
p(x)   0.1   0.3   ?     0.2
1. p(2) = 1 - (0.1 + 0.3 + 0.2)
= 1 - 0.6
= 0.4
2. P(0 ≤ X ≤ 2) = 0.1 + 0.3 + 0.4 = 0.8
3. P(1 < X < 3) = P(2) = 0.4
4. P(X > 1) = 0.4 + 0.2 = 0.6
5. $\mu = \sum x_i \, p(x_i)$
= (0 × 0.1) + (1 × 0.3) + (2 × 0.4) + (3 × 0.2)
= 0 + 0.3 + 0.8 + 0.6
= 1.7
Option (5)
Question 3
Based on past experience, a researcher knows that the probability distribution for X= the number of students
who come to her office on Wednesdays is given as
x      0      1      2      3      4
p(x)   0.10   0.20   0.50   0.15   0.05
1. P(X ≥ 2) = 0.2
2. P(X ≤ 1) = 0.10
3. E(X) = 18.5
4. P(1 < X ≤ 4) = 0.7
5. P(2 < X < 4) = 0.65
Solution
1. Incorrect
P(X ≥ 2) = 0.50 + 0.15 + 0.05 = 0.7
2. Incorrect
P(X ≤ 1) = 0.10 + 0.20 = 0.30
3. Incorrect
$E(X) = \mu = \sum x_i \, p(x_i)$
= (0 × 0.10) + (1 × 0.20) + (2 × 0.50) + (3 × 0.15) + (4 × 0.05)
= 0 + 0.20 + 1 + 0.45 + 0.2
= 1.85
4. Correct
P(1 < X ≤ 4) = 0.50 + 0.15 + 0.05 = 0.7
5. Incorrect
P(2 < X < 4) = P(3) = 0.15
Option (4)
Question 4
Suppose that the number of defective welds in a length of pipe has the probability distribution given below.
X, represents the number of defective welds
x      0      1      2      3      4      5      6
p(x)   0.60   0.20   0.10   0.05   0.03   0.01   0.01
The mean (expected value) of the number of defective welds is
1. 3.5
2. 0.17
3. 0.78
4. 1.38
5. 1.0
Solution
x      0      1      2      3      4      5      6
p(x)   0.60   0.20   0.10   0.05   0.03   0.01   0.01
The mean $\mu = \sum x_i \, p(x_i)$
= (0 × 0.60) + (1 × 0.20) + (2 × 0.10) + (3 × 0.05) + (4 × 0.03) + (5 × 0.01) + (6 × 0.01)
= 0 + 0.20 + 0.20 + 0.15 + 0.12 + 0.05 + 0.06
= 0.78
Option (3)
Question 5
Suppose X represents the number of students in STA1610. The probability distribution of X is as follows:
x      1      2      3      4      5
p(x)   0.25   0.33   0.17   0.15   0.10
If the mean $\mu$ = 2.52, then the variance of X is
1. 1.6496
2. 1.6006
3. 3.0409
4. 1.7438
5. 1.2844
Solution
x      1      2      3      4      5
p(x)   0.25   0.33   0.17   0.15   0.10
$\mu$ = 2.52
The variance
$\sigma^2 = \sum (x_i - \mu)^2 \, p(x_i)$
$= (1 - 2.52)^2 \times 0.25 + (2 - 2.52)^2 \times 0.33 + (3 - 2.52)^2 \times 0.17 + (4 - 2.52)^2 \times 0.15 + (5 - 2.52)^2 \times 0.10$
= 0.5776 + 0.08923 + 0.03917 + 0.32856 + 0.61504
= 1.6496
Option (1)
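The mean and variance of a discrete probability distribution can be verified with a short calculation. The sketch below uses the distribution from Question 5; the variable names are illustrative.

```python
# Values of X and their probabilities from Question 5.
xs = [1, 2, 3, 4, 5]
ps = [0.25, 0.33, 0.17, 0.15, 0.10]

# A valid probability distribution must sum to 1 (up to rounding error).
total = sum(ps)

mu = sum(x * p for x, p in zip(xs, ps))                 # E(X) = sum of x * p(x)
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))    # sum of (x - mu)^2 * p(x)
sigma = var ** 0.5                                      # standard deviation

print(total, mu, var, sigma)
```

Note that the standard deviation, roughly 1.2844, appears among the answer options as a distractor for the variance.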
Question 6
The probability distribution of a discrete random variable X is shown below, where X represents the number
of cars owned by a family:
x      0      1      2      3
p(x)   0.25   0.40   0.20   0.15
Which probability is incorrect?
1. P(X > 1) = 0.35
2. P(X ≤ 2) = 0.85
3. P(1 ≤ X ≤ 2) = 0.60
4. P(X < 1) = 0.25
5. P(0 < X ≤ 1) = 0.65
Solution
x      0      1      2      3
p(x)   0.25   0.40   0.20   0.15
1. Correct
P(X > 1) = 0.20 + 0.15 = 0.35
2. Correct
P(X ≤ 2) = 0.25 + 0.40 + 0.20 = 0.85
3. Correct
P(1 ≤ X ≤ 2) = 0.40 + 0.20 = 0.60
4. Correct
P(X < 1) = P(0) = 0.25
5. Incorrect
P(0 < X ≤ 1) = P(1) = 0.40
Option (5)
Question 7
After analyzing the frequency with which cross-country skiers participate in their sport, a sportswriter created
the following probability distribution for X = number of times per year cross-country skiers ski.
x      0      1      2      3      4      5      6      7      8
p(x)   0.04   0.09   0.19   0.21   0.16   0.12   0.08   0.06   0.05
1. P(X = 3) = 0.12
2. P(X ≥ 5) = 0.19
3. P(5 ≤ X ≤ 7) = 0.14
4. P(X ≤ 3) = 0.32
5. The mean $\mu$ = 3.64
Solution
Given the probability distribution
x      0      1      2      3      4      5      6      7      8
p(x)   0.04   0.09   0.19   0.21   0.16   0.12   0.08   0.06   0.05
1. Incorrect
P(X = 3) = 0.21
2. Incorrect
P(X ≥ 5) = 0.12 + 0.08 + 0.06 + 0.05 = 0.31
3. Incorrect
P(5 ≤ X ≤ 7) = 0.12 + 0.08 + 0.06 = 0.26
4. Incorrect
P(X ≤ 3) = 0.04 + 0.09 + 0.19 + 0.21 = 0.53
5. Correct
The mean $\mu = \sum x_i \, p(x_i)$ = (0 × 0.04) + (1 × 0.09) + (2 × 0.19) + (3 × 0.21) + (4 × 0.16) + (5 × 0.12) +
(6 × 0.08) + (7 × 0.06) + (8 × 0.05) = 3.64
Option (5)
Question 8
The mean length of stay in hospital is useful for planning purposes. Suppose that the following is the
distribution of the length of stay in a hospital after a minor operation:
Days          2      3      4      5      6
Probability   0.05   0.20   0.40   0.20   ?
The mean $\mu$ (or expected value E(X)) of the length of stay is
1. 0.15
2. 0.17
3. 3.3
4. 4.0
5. 4.2
Solution
The given probability distribution
Days          2      3      4      5      6      Total
Probability   0.05   0.20   0.40   0.20   0.15   1
p(6) = 1 - (0.05 + 0.20 + 0.40 + 0.20) = 1 - 0.85 = 0.15
The mean $\mu = \sum x_i \, p(x_i)$ = (2 × 0.05) + (3 × 0.20) + (4 × 0.40) + (5 × 0.20) + (6 × 0.15) = 4.20
Option (5)
Question 9
Consider the probability distribution of the number of sales as below:
x      0      1      2   3
p(x)   0.512  0.382  ?   0.008
Which statement is incorrect?
1. p(2) = 0.098
2. P(0 < X ≤ 2) = 0.48
3. P(1 ≤ X ≤ 3) = 0.488
4. P(X ≥ 0) = 1
5. E(X) = μ = 0.0602
Solution
1. Correct
p(2) = 1 − (0.512 + 0.382 + 0.008) = 1 − 0.902 = 0.098
2. Correct
P(0 < X ≤ 2) = p(1) + p(2) = 0.382 + 0.098 = 0.48
3. Correct
P(1 ≤ X ≤ 3) = p(1) + p(2) + p(3) = 0.382 + 0.098 + 0.008 = 0.488
4. Correct
P(X ≥ 0) = p(0) + p(1) + p(2) + p(3) = 0.512 + 0.382 + 0.098 + 0.008 = 1
5. Incorrect
E(X) = μ = Σ x × p(x)
= 0 × 0.512 + 1 × 0.382 + 2 × 0.098 + 3 × 0.008
= 0.602
Option (5)
5.2 Discrete Probability Distribution
5.2.1 Binomial Probability Distribution
SUMMARY
• A binomial random variable is a discrete random variable that counts the number of successful outcomes
in n independent trials, each of which can either succeed or fail.
• The binomial distribution is associated with questions that allow a "yes" or "no" type of answer. This means
each trial has two outcomes: "success" or "failure".
• The probability of success is the same for each trial and is denoted by π.
• To calculate a binomial probability we can use either the formula
$p(x) = \dfrac{n!}{x!\,(n-x)!}\,\pi^{x}(1-\pi)^{n-x}$, where for example 5! = 5 × 4 × 3 × 2 × 1 = 120,
or the binomial statistics tables enclosed at the end of section 2.
• The mean μ = nπ
• The variance σ² = nπ(1 − π)
• The standard deviation σ = √variance
• How do we identify a binomial random variable?
1. The experiment consists of n identical trials.
2. Each trial results in one of two outcomes, which we can define as either a success or a failure.
3. The outcomes from trial to trial are independent.
4. The probability of success (π) is the same for each trial. The probability of failure is therefore
1 − π.
5. The random variable equals the number of successes in the n trials, and can only take on whole
number values between 0 and n.
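As a check on the formula and on the mean and variance above, here is a minimal Python sketch. The function name binomial_pmf and the illustrative values n = 6, π = 0.20 are our own choices; in this module the same probabilities are normally read from the binomial tables.

```python
from math import comb, sqrt

def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial variable: n!/(x!(n-x)!) * pi^x * (1-pi)^(n-x)."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

# Illustrative values (assumed): n = 6 trials, success probability 0.20.
n, pi = 6, 0.20

p2 = binomial_pmf(2, n, pi)          # probability of exactly 2 successes
mean = n * pi                        # mu = n * pi
variance = n * pi * (1 - pi)         # sigma^2 = n * pi * (1 - pi)
std_dev = sqrt(variance)             # sigma = sqrt(variance)

print(round(p2, 4), mean, round(variance, 2), round(std_dev, 4))
```

math.comb(n, x) computes n!/(x!(n − x)!) directly, which avoids evaluating the three factorials separately.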
Activities
Question 1
Given a binomial random variable with n = 6 and π = 0.20 (Hint: use the formula or the binomial Table
1 to calculate the following probabilities).
The incorrect statement is
1. P(X = 2) = 0.2458
2. P(X = 5) = 0.0005
3. P(X ≤ 1) = 0.6553
4. P(X ≥ 5) = 0.0016
5. The mean μ = 1.2
Solution
The probability of success π = 0.20
The number of trials n = 6
1. Correct
P(X = 2)?
Using the formula: 6! = 720, 2! = 2, 4! = 24, 0.20² = 0.04 and 0.80⁴ = 0.4096, so
p(2) = 6! / (2!(6 − 2)!) × 0.20² × (1 − 0.20)^(6 − 2)
= 720 / (2 × 24) × 0.04 × 0.4096
= 15 × 0.04 × 0.4096 = 0.24576
Using the statistics tables (extract for n = 6, π = 0.20):
x    P(X = x)
0    0.2621
1    0.3932
2    0.2458
3    0.0819
4    0.0154
5    0.0015
6    0.0001
2. Incorrect
From the table, P(X = 5) = 0.0015, not 0.0005.
3. Correct
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.2621 + 0.3932 = 0.6553
4. Correct
P(X ≥ 5) = P(X = 5) + P(X = 6) = 0.0015 + 0.0001 = 0.0016
5. Correct
The mean μ = nπ = 6 × 0.20 = 1.2
Option (2)
Question 2
A certain type of tomato seed germinates 90% of the time. A backyard farmer planted 25 seeds. The expected
number (mean) of seeds that germinate is
1. 225
2. 259
3. 0036
4. 2778
5. 225
Solution
π = 0.90, n = 25, μ = ?
The mean μ = nπ = 25 × 0.90 = 22.5
Option (5)
Question 3
Suppose that 10% of butterflies have damaged wings. If a random sample of 10 butterflies is selected, what
is the probability that more than four have damaged wings?
1. 0.0112
2. 0.0128
3. 0.0016
4. 0.9372
5. 0.9984
Solution
The given information
π = 0.10, n = 10, P(X > 4)?
There are two ways of calculating this probability:
1. Add the upper-tail probabilities directly:
P(X > 4) = P(X ≥ 5)
= P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10)
= 0.0015 + 0.0001 + 0.0000 + 0.0000 + 0.0000 + 0.0000 (binomial tables)
= 0.0016
2. Use the complement:
P(X > 4) = P(X ≥ 5)
= 1 − P(X ≤ 4)
= 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)]
= 1 − (0.3487 + 0.3874 + 0.1937 + 0.0574 + 0.0112) (binomial tables)
= 1 − 0.9984
= 0.0016
Option (3)
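Both routes in this solution (adding the upper tail directly, or subtracting the lower tail from 1) can be compared numerically. The sketch below is only an illustration; the helper names binomial_pmf and binomial_cdf are our own and the values are exact rather than rounded table entries.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for a binomial variable with n trials and success probability pi."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

def binomial_cdf(x, n, pi):
    """P(X <= x), obtained by adding P(X = 0) up to P(X = x)."""
    return sum(binomial_pmf(k, n, pi) for k in range(x + 1))

n, pi = 10, 0.10

# Way 1: add P(X = 5) + ... + P(X = 10) directly.
direct = sum(binomial_pmf(k, n, pi) for k in range(5, n + 1))
# Way 2: complement rule, P(X > 4) = 1 - P(X <= 4).
complement = 1 - binomial_cdf(4, n, pi)

print(round(direct, 4), round(complement, 4))
```

The complement route needs fewer terms whenever the tail being asked for is the longer one, which is why it is usually the quicker method with tables as well.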
Question 4
Using the binomial distribution, if n = 4 and π = 0.20 then P(X < 2) is
1. 0.1536
2. 0.9728
3. 0.1808
4. 0.0272
5. 0.8192
Solution
π = 0.20, n = 4, P(X < 2)?
P(X < 2) = P(X = 0) + P(X = 1)
= 0.4096 + 0.4096
= 0.8192
Option (5)
Question 5
In the Limpopo province about 30% of adults have four-year college degrees. Suppose five adults are
randomly selected. Calculate the expected value (mean) and the standard deviation of this binomial distribution.
1. 1.5 and 1.05
2. 3.5 and 0.45
3. 1.5 and 1.025
4. 0.30 and 1.2
5. 1.05 and 1.25
Solution
π = 0.30, n = 5, mean μ = ?, standard deviation σ = ?
The mean μ = nπ = 5 × 0.30 = 1.5
The standard deviation σ = √(nπ(1 − π)) = √(5 × 0.30 × (1 − 0.30)) = √1.05 = 1.0247
Option (3)
Question 6
According to a report from the Center for Studying Health System Change, 20% of South Africans delay or go
without medical care because of concerns about cost. Suppose six individuals are randomly selected. The
probability that more than 4 will delay or go without medical care is
1. 0.0015
2. 0.9984
3. 0.0154
4. 0.0016
5. 0.0017
Solution
π = 0.20, n = 6, P(X > 4)?
P(X > 4) = P(X = 5) + P(X = 6)
= 0.0015 + 0.0001
= 0.0016
Option (4)
Question 7
Suppose that an admission test for a certain university is designed so that the probability of passing it is 45%.
Find the probability that among 5 candidates who take the test, more than 3 will pass.
1. 0.2757
2. 0.1128
3. 0.1313
4. 0.0185
5. 0.0102
Solution
π = 0.45, n = 5, P(X > 3)?
P(X > 3) = P(X = 4) + P(X = 5)
= 0.1128 + 0.0185
= 0.1313
Option (3)
Question 8
The probability that a certain machine will produce a defective item is 0.25. If a random sample of 6 items is
taken from the output of this machine, what is the probability that there will be at least five defective items
in the sample?
1. 0.2373
2. 0.0044
3. 0.0046
4. 0.25
5. 1.5
Solution
The given information is shown below
π = 0.25, n = 6, P(X ≥ 5)?
P(X ≥ 5) = P(X = 5) + P(X = 6)
= 0.0044 + 0.0002
= 0.0046
Option (3)
Question 9
A motor company has purchased steel parts from a supplier for several years and has found that 10% of the
parts must be returned because they are defective. An order of 5 parts is received. What is the probability
that more than three of these parts are defective?
1. 0.0004
2. 0.0001
3. 0.5905
4. 0.0729
5. 0.0086
Solution
This is a binomial probability distribution
π = 0.10, n = 5, P(X > 3)?
P(X > 3) = P(X = 4) + P(X = 5)
= 0.0004 + 0.0000 (binomial tables)
= 0.0004
Option (1)
5.2.2 Poisson Probability Distribution
SUMMARY
• The Poisson distribution is a discrete probability distribution that describes the probability of X events
occurring during a specified interval (of time, distance, area or volume), when the average number of occurrences
in that interval, denoted λ, is known and the events occur independently of the time since the last event.
• To calculate the probability of a Poisson random variable, we can use either the formula or the Poisson
statistics tables provided.
The formula is $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, where
x represents the number of occurrences of an event,
x! is the factorial of x,
λ is a positive number that represents the expected number (the mean μ) of occurrences for a given
interval, and
e is the base of the natural logarithm (e ≈ 2.71828).
• The mean and the variance of a Poisson distribution are equal, so we can say λ = Variance(X) =
mean(X).
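The Poisson formula above translates directly into a few lines of Python. This is a minimal sketch for checking table values; the function name poisson_pmf and the example means 1.8 and 3.5 are our own illustrative choices.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Illustrative checks against the kind of values found in the Poisson tables.
p_three = poisson_pmf(3, 1.8)   # exactly 3 events when the mean is 1.8
p_zero = poisson_pmf(0, 3.5)    # no events when the mean is 3.5

print(round(p_three, 4), round(p_zero, 4))
```

Note that for x = 0 the formula reduces to e^(−λ), because λ⁰ = 1 and 0! = 1.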
Activities
Question 1
Calculate the probability that three bank robberies occur in a day, where the number of bank robberies that
occur in a large Gauteng city is Poisson distributed with a mean λ equal to 1.8 per day. (Hint: use the
Poisson tables to calculate the probability.)
1. 0.1653
2. 0.2975
3. 0.1607
4. 0.2768
5. 0.3329
Solution
The mean λ = 1.8, P(X = 3)?
Using the formula
P(X = 3) = (2.71828^(−1.8) × 1.8³) / 3!, where 2.71828^(−1.8) = 0.1653,
1.8³ = 5.832 and 3! = 3 × 2 × 1 = 6
P(X = 3) = (0.1653 × 5.832) / 6
P(X = 3) = 0.1607
Using the Poisson distribution tables (extract, row x = 3):
λ       1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8     1.9     2.0
x = 3   0.0738  0.0867  0.0998  0.1128  0.1255  0.1378  0.1496  0.1607  0.1710  0.1804
Option (3)
Question 2
The number of accidents that occur at a busy intersection is Poisson distributed with a mean of 3.5 per week.
The probability of no accidents in one week is
1. 0.0302
2. 0.0000
3. 0.1359
4. 0.3020
5. 1.0000
Solution
Considering the information as given below
The mean λ = 3.5, P(X = 0)?
Using the formula:
P(X = 0) = (2.71828^(−3.5) × 3.5⁰) / 0!, where 2.71828^(−3.5) = 0.0302, 3.5⁰ = 1 and 0! = 1
= (0.0302 × 1) / 1
= 0.0302
Using the Poisson distribution tables (extract, row x = 0):
λ       3.1     3.2     3.3     3.4     3.5     3.6     3.7     3.8     3.9     4.0
x = 0   0.0450  0.0408  0.0369  0.0334  0.0302  0.0273  0.0247  0.0224  0.0202  0.0183
Option (1)
Question 3
The Poisson distribution was used to model the number of faults in the gearboxes of buses. Suppose the faults
occur at an average of 2.5 per month. The probability that at least 2 faults are found in a month is
1. 0.7127
2. 0.0821
3. 0.2565
4. 0.9692
5. 0.7172
Solution
Given that the mean λ = 2.5, P(X ≥ 2)?
P(X ≥ 2) = P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)
= 0.2565 + 0.2138 + 0.1336 + 0.0668 + 0.0278 + 0.0099 + 0.0031 + 0.0009 + 0.0002 + 0.0000 + 0.0000
= 0.7126
Extract from the Poisson tables (row x = 0):
λ       2.1     2.2     2.3     2.4     2.5     2.6     2.7     2.8     2.9     3.0
x = 0   0.1225  0.1108  0.1003  0.0907  0.0821  0.0743  0.0672  0.0608  0.0550  0.0498
Column λ = 2.5:
x     P(X = x)
0     0.0821
1     0.2052
2     0.2565
3     0.2138
4     0.1336
5     0.0668
6     0.0278
7     0.0099
8     0.0031
9     0.0009
10    0.0002
11    0.0000
12    0.0000
Alternatively, we know that the probabilities in each column of the table add up to 1. Instead of adding the
probabilities from 2 up to 12, it is easier to add the probabilities for 0 and 1 and subtract the result from
1 (the small difference from 0.7126 is due to rounding in the tables), this means
P(X ≥ 2) = 1 − P(X ≤ 1)
= 1 − [P(0) + P(1)]
= 1 − (0.0821 + 0.2052)
= 1 − 0.2873
= 0.7127
Option (1)
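The complement shortcut used in this solution can be written as a short Python sketch. It is an illustration under our own naming (poisson_pmf, poisson_cdf) and uses exact values rather than the four-decimal table entries.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), the sum of P(X = 0) up to P(X = x)."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.5
# P(X >= 2) = 1 - P(X <= 1) = 1 - [P(0) + P(1)]
p_at_least_two = 1 - poisson_cdf(1, lam)

print(round(p_at_least_two, 4))
```

Because a Poisson variable has no upper limit, the complement is the only way to get an exact "at least" probability; adding table entries from 2 upwards stops where the table stops.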
Question 4
Assume that X is a Poisson random variable with an average λ = 6.0. Which statement is incorrect?
1. P(X = 0) = 0.0025
2. The variance is 2.45
3. P(X < 3) = 0.062
4. P(X > 5) = 0.5543
5. P 2  X  5  02824
Solution
Using the Poisson statistics tables with a mean λ = 6.0
Extract from the Poisson tables (row x = 0):
λ       5.1     5.2     5.3     5.4     5.5     5.6     5.7     5.8     5.9     6.0
x = 0   0.0061  0.0055  0.0050  0.0045  0.0041  0.0037  0.0033  0.0030  0.0027  0.0025
Column λ = 6.0:
x     P(X = x)
0     0.0025
1     0.0149
2     0.0446
3     0.0892
4     0.1339
5     0.1606
6     0.1606
7     0.1377
8     0.1033
9     0.0688
10    0.0413
11    0.0225
12    0.0113
13    0.0052
14    0.0022
15    0.0009
16    0.0003
17    0.0001
18    0.0000
1. Correct
P(X = 0) = 0.0025
2. Incorrect
For a Poisson distribution the variance is equal to the mean, so the variance is 6.
3. Correct
P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
= 0.0025 + 0.0149 + 0.0446
= 0.0620
4. Correct
There are two ways of doing it: we can add the values from P(X = 6) = 0.1606 up to P(X = 18) =
0.0000 as indicated by the table above, or we can use the second procedure demonstrated below
P(X > 5) = P(X ≥ 6)
= 1 − P(X ≤ 5)
= 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)]
= 1 − (0.0025 + 0.0149 + 0.0446 + 0.0892 + 0.1339 + 0.1606)
= 1 − 0.4457 = 0.5543
5. Correct
P2  X  5  P1  X  4  PX  4  P0

 PX  0  PX  1  PX  2  PX  3  PX  4  P X  0
 00025  00147  00446  00892  01339  00025
 02824
Option(2)
Question 5
A random variable Y has the Poisson distribution with parameter λ = 3.8. The probability P(Y ≤ 5) is then equal
to
1. 0.1477
2. 0.8156
3. 0.1944
4. 0.8344
5. 0.1844
Solution
We are given:
The mean λ = 3.8, P(Y ≤ 5)?
P(Y ≤ 5) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3) + P(Y = 4) + P(Y = 5)
= 0.0224 + 0.0850 + 0.1615 + 0.2046 + 0.1944 + 0.1477
= 0.8156
Option (2)
Question 6
Tornadoes for January in Kansas follow a Poisson distribution with an average of 3.2 per month. The
probability that in the next January Kansas will experience exactly 2 tornadoes is:
1. 0.7913
2. 0.4076
3. 0.2087
4. 0.1304
5. 0.2226
Solution
The mean λ = 3.2, P(X = 2)?
P(X = 2) = 0.2087 using the statistics tables
Option (3)
Question 7
A cashier at Caster's cafeteria can total an average of 1.2 trays per minute. Which statement is incorrect?
1. The problem follows a Poisson distribution.
2. The variance of this problem is 1.2 trays per minute.
3. The probability that the cashier will total exactly zero trays per minute is 0.3012.
4. The probability that the cashier will total at least four trays per minute is 0.0338.
5. The probability that the cashier will total at most three trays per minute is 0.8795.
Solution
The mean λ = 1.2
1. Correct
The number of occurrences is given as an average of 1.2 per interval of time (one minute).
2. Correct
The variance is equal to the mean.
3. Correct
P(X = 0) = 0.3012
4. Correct
P(X ≥ 4) = 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]
= 1 − (0.3012 + 0.3614 + 0.2169 + 0.0867)
= 1 − 0.9662 = 0.0338
5. Incorrect
P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= 0.3012 + 0.3614 + 0.2169 + 0.0867
= 0.9662
Option (5)
Question 8
The Poisson distribution was used to model the number of faults that arise in the gearboxes of buses. Suppose the
faults occur at an average rate of 2.5 per month. The probability that at least one fault, P(X ≥ 1), is found in
a month is
1. 0.2052
2. 0.2873
3. 0.7127
4. 0.9179
5. 0.9197
Solution
The mean λ = 2.5
P(X ≥ 1) = 1 − P(X = 0)
= 1 − 0.0821
= 0.9179
Option (4)
Chapter 6
Continuous Probability Distribution
6.1 Normal Probability Distribution
SUMMARY
• A continuous probability distribution describes a random variable that is continuous.
• In this module we explore only the normal probability distribution, without discussing other continuous
probability distributions.
• A normal distribution is a symmetric, bell-shaped curve centred at its mean value.
• A normal distribution has two parameters: the mean μ and the variance σ². We denote a normal
random variable by X ~ N(μ, σ²).
• To calculate a probability for a normal random variable, we first have to standardise it.
Standardisation converts the values from a normal population so that they have a
standard mean and variance. By so doing we remove the unit of measurement of the variable (for
example kilograms, metres or seconds), so that the standardised values have no unit.
• By standardising a normal random variable X with mean μ and standard deviation σ, we create a
random variable called Z that has a standard normal distribution.
• The standard normal variable is a normal random variable with a mean equal to zero and a standard
deviation equal to 1. We denote the standard normal variable by Z ~ N(0, 1).
• To convert a normal variable X ~ N(μ, σ²) into a standard normal variable Z ~ N(0, 1), we use the
formula $Z = \dfrac{X - \mu}{\sigma}$
• The standard normal curve is symmetric, bell-shaped and asymptotic, and the total area under the
curve is equal to 1.
• We use the standard normal Z statistics tables to calculate the probabilities.
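Standardisation followed by a table look-up can be imitated in Python with the error function from the standard library. This is a sketch for checking answers, not a replacement for the Z tables; the function names phi and normal_cdf and the example values (μ = 100, σ = 20, x = 145) are our own assumptions.

```python
from math import erf, sqrt

def phi(z):
    """Area to the left of z under the standard normal curve, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2): standardise first, then use phi."""
    z = (x - mu) / sigma
    return phi(z)

# Illustrative values (assumed): X ~ N(100, 20^2), probability that X exceeds 145.
p_right_tail = 1 - normal_cdf(145, mu=100, sigma=20)   # P(Z > 2.25)

print(round(phi(1.65), 4), round(p_right_tail, 4))
```

The function phi plays the role of the positive and negative Z tables together: it returns the left-hand area for any z, and a right-hand area is obtained as 1 minus that value.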
Activities
Question 1
Using the standard normal table Z, calculate P(Z > −1.65), where Z is normally distributed with a mean
equal to 0 and a variance equal to 1. The correct answer is
1. 0.0495
2. 0.0548
3. 0.9452
4. 0.9505
5. 0.6139
Solution
P(Z > −1.65)?
Z is normally distributed with a mean equal to zero and a variance equal to 1.
[Figure: standard normal curve with the area to the right of Z = −1.65 shaded; shaded area = 0.9505]
Because we shade a big area, by symmetry P(Z > −1.65) = P(Z < 1.65) and we use the positive Z-standardised normal table.
Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
P(Z > −1.65) = 0.9505
Option (4)
Question 2
The long–distance calls made by the employees of a company are normally distributed with a mean of 63
minutes and a standard deviation of 22 minutes. Use the normal standardised table to calculate the probability
that a call last less than 7 minutes.
1. 06179
2. 06293
3. 03821
4. 04880
5. 05120
Solution
The population mean   63 The population standard deviation   22 PX  7?
X 
Let us convert the random variable X into Z, using the standardised formulae Z  

 
763
PX  7  P Z   PZ  003
22
0.5120
0 0.03 Z
Because we shade a big area under a normal curve, we use the positive Z-standardised tables
Z 000 001 002 003 004 0.05 006 007 008 009
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
01
02
3.0
PZ  003  05120
Option (5)
Question 3
Suppose Z is normally distributed with a mean μ = 0 and a variance σ² = 1. Calculate P(Z < −1.59).
1. 0.9441
2. 0.0559
3. 0.0668
4. 0.1469
5. 0.0559
Solution
P(Z < −1.59)?
[Figure: standard normal curve with the area to the left of Z = −1.59 shaded; shaded area = 0.0559]
Because we shade a small area under the normal curve, we use the negative Z-standardised tables
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
−3.0   0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
:
−1.5   0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
P(Z < −1.59) = 0.0559
Option (2)
Question 4
X is normally distributed with mean μ = 100 and standard deviation σ = 20. What is the probability
that X is greater than 145?
1. 0.9878
2. 0.0139
3. 0.9778
4. 0.9861
5. 0.0122
Solution
The population mean μ = 100. The population standard deviation σ = 20. P(X > 145)?
Let us convert the random variable X into Z, using the standardisation formula Z = (X − μ)/σ
P(X > 145) = P(Z > (145 − 100)/20) = P(Z > 2.25)
[Figure: standard normal curve with the area to the right of Z = 2.25 shaded; shaded area = 0.0122]
Because we shade a small area under the normal curve, by symmetry P(Z > 2.25) = P(Z < −2.25) and we use the negative Z-standardised tables.
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
−2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
P(Z > 2.25) = 0.0122
Option (5)
Question 5
The distribution that has a mean of zero and a standard deviation of one is called the
1. binomial probability distribution.
2. Poisson probability distribution.
3. standard normal distribution.
4. frequency distribution.
5. skewed distribution.
Solution
Option (3)
Question 6
If the Z-score is given as Z = 1.96, and X is normally distributed with a mean μ = 60
and a standard deviation σ = 6, then the x-value that this Z-score corresponds to is
Solution
Z = 1.96, σ = 6, μ = 60, X = ?
The standardised Z = (X − μ)/σ
1.96 = (X − 60)/6
6 × 1.96 = X − 60 (multiply both sides by 6 to solve the equation)
11.76 = X − 60
11.76 + 60 = X
71.76 = X
Option (1)
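Solving the standardisation formula for X gives X = μ + Zσ. The short Python sketch below, with function names of our own choosing, shows the conversion in both directions so that the answer can be checked by converting it back.

```python
def z_score(x, mu, sigma):
    """Standardise a raw value: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Reverse the standardisation: X = mu + Z * sigma."""
    return mu + z * sigma

x_value = x_from_z(1.96, mu=60, sigma=6)
print(round(x_value, 2))                         # the raw value for Z = 1.96
print(round(z_score(x_value, mu=60, sigma=6), 2))  # converting back returns Z
```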
Question 7
For a random variable Z from the standard normal distribution with a mean μ = 0 and a standard deviation
σ = 1, which one of the following is incorrect?
1. P(Z < −1.52) = 0.0643
2. P(Z < 1.48) = 0.9306
3. P(Z > −0.43) = 0.6664
4. P(Z > 0.74) = 0.7704
5. P(2.10 < Z < 2.34) = 0.0083
Solution
1. Correct
P(Z < −1.52) = 0.0643
[Figure: standard normal curve with the area to the left of Z = −1.52 shaded; shaded area = 0.0643]
Because we have shaded a small area, we use the negative Z-standardised normal table.
2. Correct
P(Z < 1.48) = 0.9306
[Figure: standard normal curve with the area to the left of Z = 1.48 shaded; shaded area = 0.9306]
We use the positive Z-standardised normal tables
3. Correct
P(Z > −0.43) = P(Z < 0.43) = 0.6664
[Figure: standard normal curve with the area to the right of Z = −0.43 shaded; shaded area = 0.6664]
We have used the positive Z-standardised normal tables
4. Incorrect
P(Z > 0.74) = P(Z < −0.74) = 0.2296, not 0.7704
[Figure: standard normal curve with the area to the right of Z = 0.74 shaded; shaded area = 0.2296]
We have used the negative Z-tables
5. Correct
P(2.10 < Z < 2.34) = P(Z < 2.34) − P(Z < 2.10)
= 0.9904 − 0.9821
= 0.0083
[Figure: standard normal curve with the area between Z = 2.10 and Z = 2.34 shaded; area to the left of 2.10 = 0.9821, area to the left of 2.34 = 0.9904]
In this case we read each value according to the sign of Z, that is, a negative Z value from
the negative Z-tables and a positive Z value from the positive Z-tables, then we subtract the two values
obtained. We have used the positive Z-tables because the two Z values, 2.10 and 2.34, are positive.
Option (4)
Question 8
For a particular group of scores, the calculated mean and standard deviation are 20 and 5 respectively. The Z
score for a raw score of 30 is
1. 55
2. 2
3. 26
4. 10
5. 2
Solution
μ = 20, σ = 5, X = 30
The standardised Z = (X − μ)/σ
Z = (30 − 20)/5 = 2
Option (2)
Question 9
A psychologist has been studying eye fatigue using a particular measure, which she administers to students
after they have worked for 1 hour writing on a computer. On this measure she has found that the distribution
follows a normal curve. Using a normal probabilities table, what is the probability of students having a Z-score
below 1.5?
1. 0.0668
2. 0.9394
3. 0.9332
4. 0.0606
5. 0.9345
Solution
P(Z < 1.5)?
[Figure: standard normal curve with the area to the left of Z = 1.5 shaded; shaded area = 0.9332]
Because we have shaded a big area, we have to use the positive Z-standardised tables.
P(Z < 1.5) = 0.9332
Option (3)
Question 10
The average high school teacher annual salary is R43 000. Let teacher salary be normally distributed with a
standard deviation of R18 000. Calculate P(X > R80 000).
1. 0.0228
2. 0.9803
3. 2.06
4. 0.9772
5. 0.0197
Solution
The population mean μ = 43 000. The population standard deviation σ = 18 000. P(X > 80 000)?
Let us convert the random variable X into Z, using the standardised Z = (X − μ)/σ
P(X > 80 000) = P(Z > (80 000 − 43 000)/18 000) = P(Z > 2.06)
[Figure: standard normal curve with the area to the right of Z = 2.06 shaded; shaded area = 0.0197]
Because we shade a small area, we have to use the negative Z-standardised tables.
P(Z > 2.06) = P(Z < −2.06) = 0.0197
Option (5)
Question 11
If the area to the right of a positive z₁ is 0.0869 then the value of z₁ must be
1. −1.36
2. 1.36
3. 0.5319
4. 0.08
5. 1.80
Solution
[Figure: standard normal curve with the area to the right of the unknown positive Z value shaded; shaded area = 0.0869]
We were given the value of the area (the probability), equal to 0.0869, and we have been asked to find the
corresponding value of Z. Because the value of the area is small, we use the negative Z-standardised tables,
knowing that the standard normal distribution is symmetric.
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
−1.3   0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
The area 0.0869 is found in row −1.3 under column 0.06, so by symmetry the value of Z is 1.3 + 0.06 = 1.36.
Option (2)
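Reading the table "backwards" (from an area to a Z value) corresponds to the inverse of the cumulative normal function. The sketch below uses NormalDist from Python's standard statistics module (Python 3.8 or later); the variable names are our own.

```python
from statistics import NormalDist

standard_normal = NormalDist()       # mean 0, standard deviation 1

right_tail_area = 0.0869
# The area to the LEFT of z1 is 1 - 0.0869, and inv_cdf expects a left-hand area.
z1 = standard_normal.inv_cdf(1 - right_tail_area)

print(round(z1, 2))
```

The same symmetry argument as in the solution applies: standard_normal.inv_cdf(0.0869) returns the negative value with that area to its left, and z1 is its mirror image.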
Question 12
Given that Z is a standard normal variable, the variance of Z
1. is always greater than 2.0
2. is always greater than 1.0
3. is always equal to 1.0
4. is always equal to zero
5. cannot be calculated
Solution
Option (3)
Question 13
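Suppose that Z is normally distributed with a mean μ = 0 and a variance equal to 1.
Which of the following statements is incorrect?
1. P(Z > 2.64) = 0.0041
2. P(Z < −0.87) = 0.1922
3. P(−1.4 < Z < 1.4) = 0.8384
4. P(Z < −2.8) = 0.9974
5. P(Z < −0.74) = P(Z > 0.74)
Solution
1. Correct
[Figure: standard normal curve with the area to the right of Z = 2.64 shaded; shaded area = 0.0041]
2. Correct
[Figure: standard normal curve with the area to the left of Z = −0.87 shaded; shaded area = 0.1922]
3. Correct
P(−1.4 < Z < 1.4) = P(Z < 1.4) − P(Z < −1.4)
= 0.9192 − 0.0808
= 0.8384
[Figure: standard normal curve with the area between Z = −1.4 and Z = 1.4 shaded; area to the left of −1.4 = 0.0808, area to the left of 1.4 = 0.9192]
4. Incorrect
[Figure: standard normal curve with the area to the left of Z = −2.8 shaded; shaded area = 0.0026]
P(Z < −2.8) = 0.0026
5. Correct
P(Z < −0.74) = 0.2296
P(Z > 0.74) = 0.2296
[Figure: two standard normal curves, one with the area to the left of Z = −0.74 shaded and one with the area to the right of Z = 0.74 shaded; each shaded area = 0.2296]
Option (4)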
Question 14
The owner of an appliance store uses a normal distribution with mean 10 and variance 9 to model the weekly
net sales. Calculate P(X < 3.5).
1. −2.167
2. 0.0062
3. 0.2358
4. 0.9850
5. 0.0150
Solution
The population mean μ = 10. The population standard deviation σ = √9 = 3. P(X < 3.5)?
Z = (X − μ)/σ
P(X < 3.5) = P(Z < (3.5 − 10)/3) = P(Z < −2.17) = 0.0150
[Figure: standard normal curve with the area to the left of Z = −2.17 shaded; shaded area = 0.0150]
Because we shade a small area under the normal curve, we use the negative Z-standardised tables
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
−2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
Option (5)
Question 15
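Consider that Z is normally distributed with mean μ = 0 and variance σ² = 1. Which statement is correct?
1. P(Z > −1.60) = 0.0548
2. P(Z > 1.55) = 0.0606
3. P(−1.40 < Z < 0.60) = 0.6494
4. P(Z < 2.09) = 0.0019
5. P(Z < −1.67) = 0.9525
Solution
1. Incorrect
P(Z > −1.60) = 0.9452
[Figure: standard normal curve with the area to the right of Z = −1.60 shaded; shaded area = 0.9452]
2. Correct
P(Z > 1.55) = 0.0606
[Figure: standard normal curve with the area to the right of Z = 1.55 shaded; shaded area = 0.0606]
3. Incorrect
P(−1.40 < Z < 0.60) = P(Z < 0.60) − P(Z < −1.40)
= 0.7257 − 0.0808
= 0.6449
[Figure: standard normal curve with the area between Z = −1.40 and Z = 0.60 shaded; area to the left of −1.40 = 0.0808, area to the left of 0.60 = 0.7257]
4. Incorrect
P(Z < 2.09) = 0.9817
[Figure: standard normal curve with the area to the left of Z = 2.09 shaded; shaded area = 0.9817]
5. Incorrect
P(Z < −1.67) = 0.0475
[Figure: standard normal curve with the area to the left of Z = −1.67 shaded; shaded area = 0.0475]
Option (2)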
Question 16
The distribution of weights of a large group of high school students is normally distributed with a mean of
55 kg and a standard deviation of 5 kg. What is the probability that the weight of a randomly selected student
from this group will be more than 63 kg?
1. 0.9452
2. 0.0458
3. 0.1446
4. 0.0548
5. 0.8554
Solution
The population mean μ = 55. The population standard deviation σ = 5. P(X > 63)?
Z = (X − μ)/σ
P(X > 63) = P(Z > (63 − 55)/5) = P(Z > 1.60) = 1 − P(Z < 1.60) = 1 − 0.9452 = 0.0548
[Figure: standard normal curve with the area to the right of Z = 1.60 shaded; shaded area = 0.0548]
Option (4)
Question 17
A courier service company has found that their delivery time of parcels to clients is normally distributed with
a mean of 45 minutes and a standard deviation of 8 minutes. The probability that a randomly selected parcel
will take less than 48 minutes to deliver is
1. 0.375
2. 0.32
3. 0.0648
4. 0.3520
5. 0.6480
Solution
The population mean μ = 45. The population standard deviation σ = 8. P(X < 48)?
Z = (X − μ)/σ
P(X < 48) = P(Z < (48 − 45)/8) = P(Z < 0.375) ≈ P(Z < 0.38) = 0.6480
[Figure: standard normal curve with the area to the left of Z = 0.38 shaded; shaded area = 0.6480]
Option (5)
Chapter 7
Sampling Distribution of the Mean and Proportions
SUMMARY
• We explore the concept of taking a sample from a population, and discuss the distributional properties
of the sample mean and sample proportion calculated from these samples.
• The sample mean is normally distributed when the underlying data are normally distributed.
• The idea behind probability sampling is random selection, and for this reason we typically call these
random samples.
• In this module we will not discuss the categories of probability samples such as simple random, systematic,
stratified and cluster.
• A sampling distribution describes how a sample statistic varies when calculated from all samples of size
n drawn from a single population.
7.1 Sampling Distribution of the Mean
SUMMARY
• The sample mean X̄ is an unbiased estimator of the population mean μ, because the mean (expected
value) of all sample means of size n selected from the population is equal to the population mean μ.
This is denoted by E(X̄) = μ.
• Unbiased means that the mean (expected value) of the sampling distribution of a statistic is equal to
the population parameter; that statistic is then said to be an unbiased estimator of that parameter. That is, on
average, the statistic estimates the correct value.
• If the variance of the variable X is σ², then the variance of the sample mean X̄ is
$\sigma^{2}_{\bar{X}} = \dfrac{\sigma^{2}}{n}$.
• The standard deviation of the sample mean X̄, which we call the standard error of X̄, is
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.
• If a random variable X is normally distributed with mean μ and variance σ², which is denoted
by X ~ N(μ, σ²), then the sample mean X̄ is also normally distributed with mean μ and
variance σ²/n; this is denoted by X̄ ~ N(μ, σ²/n).
• To calculate a probability for the sample mean X̄, we have to convert X̄ into the standardised
Z-value. The test statistic is
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
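The standard error and the test statistic above can be sketched in Python as follows. The function names and the illustrative values (μ = 1000, σ = 200, n = 16, X̄ = 1050) are our own assumptions; NormalDist from the standard statistics module stands in for the Z tables.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def z_for_sample_mean(xbar, mu, sigma, n):
    """Test statistic Z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / standard_error(sigma, n)

# Illustrative values (assumed): population N(1000, 200^2), sample of 16.
se = standard_error(200, 16)
z = z_for_sample_mean(1050, mu=1000, sigma=200, n=16)
p_above = 1 - NormalDist().cdf(z)    # P(sample mean > 1050) = P(Z > z)

print(se, z, round(p_above, 4))
```

The only difference from the single-observation formula of chapter 6 is the denominator: σ is replaced by the standard error σ/√n, which shrinks as the sample size grows.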
Activities
Question 1
A sample of n = 16 observations is drawn from a normal population with a mean μ = 1000 and a standard
deviation σ = 200. Use the standardised normal Z tables to calculate the probability P(X̄ > 1050).
1. 0.8413
2. 0.1587
3. 0.5398
4. 0.8438
5. 0.4602
Solution
The sample size n = 16. The population mean μ = 1000. The population standard deviation σ = 200.
P(X̄ > 1050)?
Step 1
Because we are given the random variable X̄, we need to convert X̄ into Z by using the formula of the test
statistic Z = (X̄ − μ)/(σ/√n) as developed below
If X̄ = 1050, what is the corresponding value of Z? Z = (1050 − 1000)/(200/√16) = 50/50 = 1
Step 2
P(X̄ > 1050) can be written as P(Z > 1), and this takes us back to the concepts of chapter 6. To solve this
probability we draw a normal graph as shown below
[Figure: standard normal curve with the area to the right of Z = 1.00 shaded; shaded area = 0.1587]
Because we shade a small area under the normal curve, we use the negative Z-table to calculate the probability.
P(Z > 1.00) = P(Z < −1.00) = 0.1587
Option (2)
Question 2
A normally distributed population has a mean of 40 and a standard deviation 12. The standard error of the
sample mean if the sample size is 100 equal to:
1. 4
2. 28
3. 12
4. 02
5. 12
Solution
The population mean μ = 40
The population standard deviation σ = 12
The standard error σ_X̄ = σ/√n = 12/√100 = 12/10 = 1.2
Option (3)
Question 3
For a random variable that is normally distributed, with a population mean μ = 80 and a population standard
deviation σ = 10, the probability that a simple random sample of 25 items will have a mean that is less than
85 is
1. 0.9798
2. 0.9938
3. 0.0062
4. 0.0054
5. 0.0202
Solution
The given information is as below
• The random variable is normally distributed
• The population mean μ = 80
• The population standard deviation σ = 10
• The sample size n = 25
• The question is P(X̄ < 85)?
Step 1
Because we are given the random variable X̄, we need to convert X̄ into Z by using the formula of the test
statistic Z = (X̄ − μ)/(σ/√n) as shown below
If X̄ = 85, what is the corresponding value of Z? Z = (85 − 80)/(10/√25) = 5/2 = 2.5
Step 2
P(X̄ < 85) can now be written as P(Z < 2.5), and this takes us back to the concepts of chapter 6. To solve
this probability we have a normal graph as shown below
[Figure: standard normal curve with the area to the left of Z = 2.5 shaded; shaded area = 0.9938]
Because we shade a big area under the normal curve, we use the positive Z-table.
P(Z < 2.5) = 0.9938
Option (2)
Question 4
Which statement is incorrect?
1. The sampling distribution of the mean will have the same mean as the original population from which the
samples were drawn.
2. The standard deviation of the sampling distribution of the mean is also called the standard error.
3. When the population mean μ = 160, the population standard deviation σ = 25 and n = 64, the standard
error is 3.125.
4. A confidence interval is an estimate for which there is a specified degree of certainty that the population
parameter will be in the interval.
5. We use the t-distribution for the statistical inference of the population mean when the population
standard deviation is known under the assumption that the population is normally distributed.
Solution
1. Correct
Because the mean (expected value) of all sample means of size n selected from the population is equal
to the population mean μ. This is denoted by E(X̄) = μ.
2. Correct
3. Correct
When μ = 160, σ = 25 and n = 64, the standard error σ_X̄ = ?
σ_X̄ = σ/√n = 25/√64 = 25/8 = 3.125
4. Correct
5. Incorrect
We use the t-distribution when the population standard deviation is unknown and is estimated by the
sample standard deviation.
Option (5)
Question 5
For a random variable that is normally distributed, with a population mean μ = 80 and a population standard
deviation σ = 10, the probability that a simple random sample of 25 items will have a mean that is between
79 and 85 is
1. 0.9938
2. 0.3085
3. 0.6853
4. 0.4997
5. 0.6835
Solution
We are given
The population mean μ = 80. The population standard deviation σ = 10. The sample size n = 25.
The question is P(79 < X̄ < 85)?
Step 1
Because the random variable X̄ is given, we need to convert X̄ into Z by using the formula of the test
statistic Z = (X̄ − μ)/(σ/√n) as shown below
When X̄ = 79, what is the corresponding value of Z? Z = (79 − 80)/(10/√25) = −1/2 = −0.5
When X̄ = 85, what is the corresponding value of Z? Z = (85 − 80)/(10/√25) = 5/2 = 2.5
Step 2
P(79 < X̄ < 85) can now be written as P(−0.5 < Z < 2.5), and this takes us back to the concepts of
chapter 6. To solve this probability we have to draw a standardised normal graph as shown below
Because the shaded area is between the two Z-values, the probability is equal to the difference between P(Z < 2.5)
and P(Z < −0.5), where each value is read according to the sign of the Z-value: a negative value from the negative Z
tables and a positive value from the positive Z tables.
P(−0.5 < Z < 2.5) = P(Z < 2.5) − P(Z < −0.5)
= 0.9938 − 0.3085
= 0.6853
[Figure: standard normal curve with the area between Z = −0.5 and Z = 2.5 shaded; area to the left of −0.5 = 0.3085, area to the left of 2.5 = 0.9938, shaded area = 0.6853]
Option (3)
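The "difference of two table values" step can be checked with a short Python sketch. Here the sampling distribution of X̄ is represented by a NormalDist object whose standard deviation is the standard error σ/√n; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 80, 10, 25

# Sampling distribution of the sample mean: N(mu, (sigma / sqrt(n))^2).
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

# P(79 < sample mean < 85) = P(sample mean < 85) - P(sample mean < 79)
p_between = sample_mean_dist.cdf(85) - sample_mean_dist.cdf(79)

print(round(p_between, 4))
```

Using the standard error as the spread of the distribution means that no separate standardisation step is needed in the code; cdf does the conversion to Z internally.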
Question 6
A random sample of size n = 12 was selected from a population and the data are as follows:
36 61 47 23 51 82 71 12 71 65 42 50
The sample mean = 50.92 and the sample standard deviation = 20.64
The standard error (or the standard deviation of the mean) is equal to
1. 1720
2. 35490
3. 0379
4. 20637
5. 59583
Solution
The data : 36 61 47 23 51 82 71 12 71 65 42 50 n  12

X 36  61  47  23  51  82  71  12  71  65  42  50 611
The sample mean X    
n 12 12
509166

2 XX2
n1
36  5091662 61  5091662 47  5091662 23  5091662

51  5091662 82  5091662 71  5091662 12  5091662
71  5091662 65  5091662 42  5091662 50  5091662
S2   4259015
12  1

The sample standard deviation is S  4259015  206374
S 206374
The standard error  X      59575
n 12
Option (5)
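The calculation in this solution can be reproduced from the raw data. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

data = [36, 61, 47, 23, 51, 82, 71, 12, 71, 65, 42, 50]

n = len(data)
sample_mean = mean(data)              # sum of the observations divided by n
sample_sd = stdev(data)               # sample standard deviation (divides by n - 1)
standard_error = sample_sd / sqrt(n)  # standard deviation of the sample mean

print(round(sample_mean, 4), round(sample_sd, 4), round(standard_error, 4))
```

Note that `statistics.stdev` uses the $n - 1$ divisor, which matches the formula for $S^2$ used in the solution; `statistics.pstdev` would divide by $n$ instead.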
Question 7
The amount of time it takes to complete a final examination is normally distributed with a mean of 75
minutes and a standard deviation of 8 minutes. If 64 students were randomly sampled, the probability that
the sample mean of the sampled students exceeds 76 minutes is
1. 0.8413
2. 0.4602
3. 0.1587
4. 0.5398
5. 0.1578
Solution
The mean $\mu = 75$, $\sigma = 8$, $n = 64$, $P(\bar{X} > 76) = ?$
$$P(\bar{X} > 76) = P\left(Z > \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \frac{76 - 75}{8 / \sqrt{64}}\right) = P\left(Z > \frac{1}{1}\right) = P(Z > 1) = 0.1587$$
[Figure: standard normal curve with the area to the right of Z = 1.00 shaded (0.1587).]
Option (3)
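An upper-tail probability for a sample mean can be checked in the same way. The sketch below assumes Python 3.8 or later; `prob_sample_mean_exceeds` is a hypothetical helper name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(mu, sigma, n, threshold):
    """Return P(sample mean > threshold) under a normal sampling distribution."""
    z = (threshold - mu) / (sigma / sqrt(n))  # standardise the threshold
    return 1 - NormalDist().cdf(z)            # area to the right of z


probability = prob_sample_mean_exceeds(mu=75, sigma=8, n=64, threshold=76)
print(round(probability, 4))
```

With $\mu = 75$, $\sigma = 8$ and $n = 64$ the Z-value is exactly 1, so the result is the upper-tail area beyond $Z = 1$, about 0.1587.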
Question 8
Employees in a large manufacturing plant worked an average of 62.0 hours of overtime last year, with a
standard deviation of 15.0 hours. For a random sample of 36 employees, the probability that the average
number of overtime hours will be greater than 58 hours is
1. 0.4452
2. 0.3936
3. 0.0548
4. 0.9452
5. 0.4364
Solution
The population mean $\mu = 62.0$, $\sigma = 15.0$, $n = 36$, $P(\bar{X} > 58) = ?$
$$P(\bar{X} > 58) = P\left(Z > \frac{58 - 62}{15 / \sqrt{36}}\right) = P\left(Z > \frac{-4}{2.5}\right) = P(Z > -1.6) = 1 - P(Z \le -1.6) = 1 - 0.0548 = 0.9452$$
[Figure: standard normal curve; the tail below Z = -1.6 has area 0.0548, so the area to the right of Z = -1.6 is 0.9452.]
Option (4)
Question 9
Given a normal population whose mean is 50 and whose standard deviation is 5, find the probability that a
random sample of 25 has a mean greater than 52.
1. 0.9772
2. 0.9778
3. 0.0228
4. 0.0222
5. 0.2280
Solution
$\mu = 50$, $\sigma = 5$, $n = 25$, $P(\bar{X} > 52) = ?$
$$P(\bar{X} > 52) = P\left(Z > \frac{52 - 50}{5 / \sqrt{25}}\right) = P\left(Z > \frac{2}{1}\right) = P(Z > 2) = 0.0228$$
[Figure: standard normal curve with the area to the right of Z = 2.00 shaded (0.0228).]
Option (3)
Question 10
A random sample of size $n = 12$ was selected from a population and the data are as follows:
36 61 47 23 51 82 71 12 71 65 42 50
Mean $\bar{x} = 50.92$, standard deviation $S = 20.64$
The standard error of the sample mean is equal to
1. 1.720
2. 35.490
3. 0.379
4. 20.637
5. 5.9583
Solution
The standard error of the sample mean is equal to
$$\frac{S}{\sqrt{n}} = \frac{20.64}{\sqrt{12}} = 5.9583$$
Option (5)
7.2 Sampling Distribution of the Proportion
SUMMARY
• The population proportion is denoted by $\pi$.
• The sample proportion is denoted by $p$.
• The sampling distribution of the sample proportion is approximated by the normal distribution with mean $\mu_p = \pi$.
• The standard deviation of the sampling distribution (also called the standard error of the proportion) when $\pi$ is known is given by
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
• The standard deviation of the sampling distribution (also called the standard error of the proportion) when $\pi$ is unknown is given by
$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$$
• The approximation for the sampling distribution of the proportion is valid if:
1. the population proportion of successes, $\pi$, is not too close to 0 or 1.
2. the sample size $n$ is reasonably large.
3. the products $n\pi$ and $n(1 - \pi)$ are at least 5.
• The standardised Z-value of the sample proportion is given by the formula commonly called the test statistic
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
• We can now conduct probability calculations for the sample proportion in the same way as for the
sample mean.
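The standard error formula and the validity condition in the summary can be sketched in a few lines of Python. The function names and the example values ($\pi = 0.2$, $n = 400$) are illustrative assumptions, not prescribed by the workbook.

```python
from math import sqrt


def standard_error_of_proportion(proportion, n):
    """sqrt(proportion * (1 - proportion) / n); pass pi if known, otherwise p."""
    return sqrt(proportion * (1 - proportion) / n)


def normal_approximation_is_valid(pi, n):
    """Condition 3 of the summary: n*pi and n*(1 - pi) must both be at least 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5


se = standard_error_of_proportion(0.2, 400)
print(round(se, 4), normal_approximation_is_valid(0.2, 400))
```

Only the third condition is checked in code, because "not too close to 0 or 1" and "reasonably large" are judgment calls rather than exact rules.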
Activities
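Question 1
In a binomial experiment with the sample size $n = 300$ and the sample proportion $p = 0.5$, the standard
error for the proportion $p$ is
1. 0.0283
2. 0.0008
3. 0.1238
4. 0.0016
5. 0.2886
Solution
The sample size $n = 300$, the sample proportion $p = 0.5$, the standard error for the proportion $p$ = ?
The standard error of the proportion is
$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.5(1 - 0.5)}{300}} = \sqrt{0.0008} = 0.0283$$
Here the quantity under the square root has been rounded to 0.0008; without this intermediate rounding the value is 0.0289.
Option (1)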
Question 2
Determine the probability that in a sample of 100 the sample proportion is less than 0.75 if $\pi = 0.80$:
1. -1.25
2. 0.04
3. 0.1056
4. 0.1469
5. 0.8944
Solution
$P(p < 0.75) = ?$, $\pi = 0.80$, $n = 100$
Step 1
Because the question is stated in terms of the sample proportion $p$, we convert $p$ into $Z$ by using the formula of the test statistic
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
When $p = 0.75$, the corresponding value of $Z$ is
$$Z = \frac{0.75 - 0.80}{\sqrt{\dfrac{0.80(1 - 0.80)}{100}}} = \frac{-0.05}{0.04} = -1.25$$
Step 2
$P(p < 0.75)$ can now be written as $P(Z < -1.25)$, which takes us back to the concepts of chapter 6. To
solve this probability we draw a standardised normal graph.
[Figure: standard normal curve with the area to the left of Z = -1.25 shaded (0.1056).]
Because the shaded area is the small left tail, we read the probability from the negative Z-table: $P(Z < -1.25) = 0.1056$.
Option (3)
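The two steps above (standardise, then read the left-tail area) can be sketched as follows, assuming Python 3.8 or later. The helper name `prob_sample_proportion_below` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_proportion_below(pi, n, threshold):
    """Return P(sample proportion < threshold) using the normal approximation."""
    standard_error = sqrt(pi * (1 - pi) / n)   # sqrt(pi(1 - pi)/n)
    z = (threshold - pi) / standard_error      # Step 1: convert p into Z
    return NormalDist().cdf(z)                 # Step 2: area to the left of Z


probability = prob_sample_proportion_below(pi=0.80, n=100, threshold=0.75)
print(round(probability, 4))
```

For $\pi = 0.80$, $n = 100$ and a threshold of 0.75 this gives the left-tail area below $Z = -1.25$, which matches the table value 0.1056 to four decimal places.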
Question 3
A random sample of size $n = 400$ was selected from a binomial population with the population proportion
$\pi = 0.2$. The number of observed successes in the sample is 96. Which one of the following statements is incorrect?
1. The sample proportion is $p = 0.24$
2. The standard error is 0.02
3. The test statistic is 2.02
4. $P(p > 0.24) = 0.0228$
5. This is a case of a sampling distribution for a proportion.
Solution
The sample size $n = 400$, the population proportion $\pi = 0.2$, the number of successes $X = 96$
1. Correct
The sample proportion is $p = \frac{X}{n} = \frac{96}{400} = 0.24$
2. Correct
The standard error is
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.2(1 - 0.2)}{400}} = \sqrt{0.0004} = 0.02$$
3. Incorrect
The test statistic is
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.24 - 0.2}{0.02} = \frac{0.04}{0.02} = 2$$
4. Correct
$P(p > 0.24) = ?$
We convert $p$ into $Z$ by using the formula of the test statistic:
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.24 - 0.2}{\sqrt{\dfrac{0.2(1 - 0.2)}{400}}} = \frac{0.04}{0.02} = 2$$
$P(p > 0.24) = P(Z > 2) = 0.0228$
5. Correct
Option (3)
Question 4
Consider a population proportion $\pi = 0.68$. The standard error of the proportion for $n = 20$ is
1. 0.1043
2. 0.0109
3. 0.2176
4. 0.1404
5. 0.034
Solution
$\pi = 0.68$, $n = 20$, the standard error $\sigma_p$ = ?
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.68(1 - 0.68)}{20}} = \sqrt{0.0109} = 0.1043$$
Option (1)
Question 5
A simple random sample with $n = 300$ is drawn from a binomial process in which $\pi = 0.4$. The test statistic
for the proportion of successes $p = 0.35$ is
1. 1.7668
2. 0.0283
3. 1.37843
4. -1.7668
5. 1.6786
Solution
$n = 300$, $\pi = 0.4$, $p = 0.35$, the test statistic $Z$ = ?
The test statistic is
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.35 - 0.4}{\sqrt{\dfrac{0.4(1 - 0.4)}{300}}} = \frac{-0.05}{\sqrt{0.0008}} = \frac{-0.05}{0.0283} = -1.7668$$
Option (4)
Question 6
In a random sample of 85 people from a population, X is the number of left-handed people. In the population
a proportion $\pi = 0.20$ of the people are left-handed. The standard error for the proportion is equal to:
1. 0.0019
2. 0.0434
3. 0.0024
4. 0.80
5. 0.4338
Solution
$n = 85$, the population proportion $\pi = 0.20$, the standard error $\sigma_p$ = ?
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.20(1 - 0.20)}{85}} = \sqrt{0.0019} = 0.0434$$
Option (2)
Question 7
The probability of success on any trial of a binomial experiment is 25%. Find the probability that the
proportion of successes in a sample of 500 is less than 22%.
1. 0.0606
2. 0.9332
3. 0.9394
4. 0.0668
5. 0.6060
Solution
The population proportion $\pi = 0.25$
The sample proportion $p = 0.22$
The sample size $n = 500$
$P(p < 0.22) = ?$
Step 1
Because the question is stated in terms of the sample proportion $p$, we convert $p$ into $Z$ by using the formula of the test
statistic
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
When $p = 0.22$, the corresponding value of $Z$ is
$$Z = \frac{0.22 - 0.25}{\sqrt{\dfrac{0.25(1 - 0.25)}{500}}} = \frac{-0.03}{0.0194} = -1.55$$
Step 2
$P(p < 0.22)$ can now be written as $P(Z < -1.55)$, which takes us back to the concepts of chapter 6. To
solve this probability we draw a standardised normal graph.
[Figure: standard normal curve with the area to the left of Z = -1.55 shaded (0.0606).]
Because the shaded area is the small left tail, we read the probability from the negative Z-table: $P(Z < -1.55) = 0.0606$.
Option (1)
Question 8
A random sample of 50 households was selected for a telephone survey. The key question asked was, “Do you
or any member of your household own a cellular telephone with a built–in camera?” Of the 50 respondents,
15 said yes and 35 said no.
The standard error of the proportion of households with cellular telephones with a built–in camera is
1. 0.3
2. 0.0042
3. 0.0648
4. 0.7
5. 0.1684
Solution
The number of successes $X = 15$
The sample proportion is $p = \frac{X}{n} = \frac{15}{50} = 0.3$
The standard error is
$$S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.3(1 - 0.3)}{50}} = \sqrt{0.0042} = 0.0648$$
Option (3)
Chapter 8
Point Estimations and Confidence Intervals
8.1 Point Estimations
• For many populations we do not know the value of the population mean or proportion. Fortunately, we
can use the information provided in a sample, namely the sample mean (or proportion), to provide an
estimate of the population value. This is called estimation.
• The objective of estimation is to determine the approximate value of a population parameter on the basis
of the sample statistic.
• To estimate the population value we can use a point estimate or an interval estimate.
8.2 Confidence Interval
• A confidence interval estimate provides a range of possible values that the true parameter value can assume,
along with the degree of confidence that the parameter value lies within the interval.
• We often talk about a 95%, 90% or 99% confidence interval for a parameter value.
• The confidence level is the probability value $1 - \alpha$ associated with a confidence interval;
that is, the interval contains the specified parameter with probability $(1 - \alpha)$.
• In this section we will discuss the confidence interval of the mean and the confidence interval of the
proportion.
8.2.1 Confidence Interval of the Mean
• The data have to be normally distributed.
• When the population standard deviation $\sigma$ is known, the confidence interval of the mean is given by
the formula
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
where $Z_{\alpha/2}$ is called the critical value of the confidence interval; this value is read from the
standardised normal Z-tables.
• When the population standard deviation $\sigma$ is unknown (but the sample standard deviation $S$ of
the data is given), the confidence interval of the mean is determined by the formula
$$\bar{X} \pm t_{(n-1;\ \alpha/2)} \cdot \frac{S}{\sqrt{n}}$$
where $t_{(n-1;\ \alpha/2)}$ is called the critical value of the confidence interval; this value is read
from the Student t-table with $n - 1$ degrees of freedom. In most cases the sample size $n < 30$.
Activities
Question 1
A statistics practitioner took a random sample of 50 observations from a population with a standard deviation
of 25 and a sample mean of 100. The 95% confidence interval of the population mean is
1. (93.07; 106.93)
2. 100 ± 6.9296
3. (93.1; 106.93)
4. (94.18; 105.82)
5. (39.07; 106.29)
Solution
The question concerns the confidence interval of the mean.
The sample mean $\bar{X} = 100$
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$100 \pm Z_{0.05/2} \cdot \frac{25}{\sqrt{50}}$$
$$100 \pm Z_{0.025} \cdot 3.5355$$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The table entry 0.0250 lies in row 1.9 and column 0.06, so the corresponding critical value is $Z = 1.9 + 0.06 = 1.96$
$100 \pm 1.96 \times 3.5355$
$100 \pm 6.9296$
$(100 - 6.9296;\ 100 + 6.9296)$
$(93.0704;\ 106.9296)$
Option (1)
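When $\sigma$ is known, the interval can also be computed directly. The sketch below assumes Python 3.8 or later and obtains the critical value from `NormalDist.inv_cdf` instead of the printed table; the function name `z_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z_critical * sigma / sqrt(n)             # Z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(sample_mean=100, sigma=25, n=50)
print(round(lower, 2), round(upper, 2))
```

The limits agree with 93.07 and 106.93 after rounding to two decimals; the fourth decimal can differ slightly from the hand calculation because the table value 1.96 is itself rounded.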
Question 2
From the information given below, determine the 95% confidence interval of the population mean:
$\bar{X} = 200$, $n = 100$, $S = 5$
1. (189.02; 209.8)
2. (109.008; 201.92)
3. (199.008; 200.992)
4. 200 ± 0.992
5. (198; 209)
Solution
The sample standard deviation $S = 5$
The confidence interval of the mean is given by the formula shown below with $\alpha = 0.05$
$$\bar{X} \pm t_{(n-1;\ \alpha/2)} \cdot \frac{S}{\sqrt{n}}$$
$$200 \pm t_{(100-1;\ 0.05/2)} \cdot \frac{5}{\sqrt{100}}$$
$$200 \pm t_{(99;\ 0.025)} \cdot 0.5$$
The Student t-table
Degrees of freedom   0.10    0.05    0.025   0.01    0.005
1
:
99                   1.290   1.660   1.984   2.365   2.626
The critical value (table value) at 99 degrees of freedom and $\alpha/2 = 0.025$ equals 1.984
$200 \pm 1.984 \times 0.5$
$200 \pm 0.992$
$(200 - 0.992;\ 200 + 0.992)$
$(199.008;\ 200.992)$
Option (3)
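The same arithmetic applies when $\sigma$ is unknown, except that the critical value comes from the t-table. The Python standard library has no Student t quantile function, so this minimal sketch takes the table value as an argument; the function name is a hypothetical choice.

```python
from math import sqrt


def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Confidence interval for the mean when sigma is unknown.

    t_critical is the table value t(n - 1; alpha/2), looked up by the reader.
    """
    margin = t_critical * sample_sd / sqrt(n)  # t * S / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Table value for 99 degrees of freedom and alpha/2 = 0.025, as used above.
lower, upper = t_confidence_interval(sample_mean=200, sample_sd=5, n=100, t_critical=1.984)
print(round(lower, 3), round(upper, 3))
```

Passing the critical value explicitly keeps the example tied to the table the workbook uses; a statistics package with a t-distribution could supply it instead.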
Question 3
A random sample of 25 was drawn from a normal distribution with a population standard deviation of 5. The
sample mean is 80. The 95% confidence interval estimate of the population mean is:
1. (78.04; 81.96)
2. (77.936; 82.064)
3. 80 ± 1.96
4. (78.40; 81.59)
5. 7804 8196
Solution
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$80 \pm Z_{0.05/2} \cdot \frac{5}{\sqrt{25}}$$
$$80 \pm Z_{0.025} \cdot 1$$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$80 \pm 1.96 \times 1$
$80 \pm 1.96$
$(80 - 1.96;\ 80 + 1.96)$
$(78.04;\ 81.96)$
Option (1)
Question 4
It is known that the ages are normally distributed, with a sample mean $\bar{x} = 43.75$ and a sample standard deviation
$S = 15.05$ for a random sample of 8 men in a bar. The 95% confidence interval of the mean is:
1. 43.75 ± 12.5841
2. (31.1659; 56.3341)
3. (13.1659; 65.3344)
4. 43.75 ± 10.4291
5. (33.3209; 54.1791)
Solution
The sample mean $\bar{X} = 43.75$
The sample standard deviation $S = 15.05$
The confidence interval of the mean is given by the formula shown below with $\alpha = 0.05$
$$\bar{X} \pm t_{(n-1;\ \alpha/2)} \cdot \frac{S}{\sqrt{n}}$$
$$43.75 \pm t_{(8-1;\ 0.05/2)} \cdot \frac{15.05}{\sqrt{8}}$$
$$43.75 \pm t_{(7;\ 0.025)} \cdot 5.3210$$
Degrees of freedom   0.10    0.05    0.025   0.01    0.005
1
:
7                    1.415   1.895   2.365   2.998   3.499
$43.75 \pm 2.365 \times 5.3210$
$43.75 \pm 12.5842$
$(43.75 - 12.5842;\ 43.75 + 12.5842)$
$(31.1658;\ 56.3342)$
Option (2)
Question 5
A random sample of 25 was drawn from a population. The sample mean and the sample standard deviation
are 510 and 125 respectively.
When calculating a confidence interval of the population mean, the t-test should be used because
1. the population standard deviation is known.
2. the population mean is known.
3. the sample standard deviation is known.
4. the population standard deviation is unknown.
5. the sample size is small, regardless of the population standard deviation.
Solution
Option (4)
Question 6
A simple random sample of 30 has been collected from a population for which it is known that the population
standard deviation is $\sigma = 10.0$. The sample mean has been calculated as 240. The 95% confidence interval
for the population mean is
1. (236.4215; 243.5785)
2. 240 ± 3.5785
3. (236.2664; 243.7336)
4. (236.2451; 243.7558)
5. (243.5785; 236.4251)
Solution
The population standard deviation $\sigma = 10.0$
The confidence interval of the mean is given by the formula shown below with $\alpha = 0.05$
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$240 \pm Z_{0.05/2} \cdot \frac{10}{\sqrt{30}}$$
$$240 \pm Z_{0.025} \cdot 1.8257$$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$240 \pm 1.96 \times 1.8257$
$240 \pm 3.5784$
$(240 - 3.5784;\ 240 + 3.5784)$
$(236.4216;\ 243.5784)$
Option (1)
Question 7
A random sample of 24 observations is used to estimate the population mean. The sample mean and the
sample standard deviation are calculated as 104.6 and 28.8 respectively. The critical value (table value) to
be used to construct a 95% confidence interval for the population mean is
1. 1.96
2. 1.645
3. 2.069
4. 2.064
5. 1.711
Solution
The sample standard deviation $S = 28.8$
The level of significance $\alpha = 0.05$
The critical value for the confidence interval equals
$$t_{(n-1;\ \alpha/2)} = t_{(24-1;\ 0.05/2)} = t_{(23;\ 0.025)} = 2.069$$
Degrees of freedom   0.10    0.05    0.025   0.01    0.005
1
:
23                   1.319   1.714   2.069   2.500   2.807
The critical value (table value) at 23 degrees of freedom and $\alpha/2 = 0.025$ equals 2.069.
Option (3)
Question 8
A research firm conducted a survey to determine the mean amount steady smokers spend on cigarettes during
a week. They found the distribution of amounts spent per week followed a normal distribution with a
population standard deviation of R5. A random sample of 49 steady smokers revealed that the sample mean
$\bar{X}$ = R20. Determine the 95% confidence interval for $\mu$.
1. (18.60; 21.40)
2. (19.37; 20.63)
3. (19.80; 20.20)
4. (19.83; 20.17)
5. (18.825; 21.175)
Solution
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$20 \pm Z_{0.05/2} \cdot \frac{5}{\sqrt{49}}$$
$$20 \pm Z_{0.025} \cdot 0.7143$$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$20 \pm 1.96 \times 0.7143$
$20 \pm 1.4000$
$(20 - 1.4000;\ 20 + 1.4000)$
$(18.6;\ 21.4)$
Option (1)
Question 9
A random sample of 40 men drank an average of 20 cups of coffee per week during a final examination,
with a sample standard deviation $S_x$ equal to 6 cups. The lower limit of an appropriate 90% confidence interval
for the population average number of cups of coffee drunk is
1. 20.000
2. 21.7674
3. 19.708
4. 18.2326
5. 18.014
Solution
The sample standard deviation $S = 6$
The confidence interval of the mean is given by the formula below with $\alpha = 0.10$
$$\bar{X} \pm t_{(n-1;\ \alpha/2)} \cdot \frac{S}{\sqrt{n}}$$
$$20 \pm t_{(40-1;\ 0.10/2)} \cdot \frac{6}{\sqrt{40}}$$
$$20 \pm t_{(39;\ 0.05)} \cdot 0.9487$$
Degrees of freedom   0.10    0.05    0.025   0.01    0.005
1
:
39                   1.303   1.683   2.021   2.423   2.704
$20 \pm 1.683 \times 0.9487$
$20 \pm 1.5967$
$(20 - 1.5967;\ 20 + 1.5967)$
$(18.4033;\ 21.5967)$
The lower limit is 18.4033 and the upper limit is 21.5967. Among the listed options, the closest lower limit is 18.2326.
Option (4)
Question 10
The principal purpose of a 95% confidence interval for a mean is
1. to estimate a sample mean.
2. to test a hypothesis about a sample mean.
3. to estimate a population mean.
4. to provide an interval that covers 95% of the individual values in the population.
5. to estimate a population proportion.
Solution
Option (3)
Question 11
The average cost per night of a hotel room in Port Elizabeth township is R273. Assume this estimate is based
on a sample of 46 hotels and that the sample standard deviation is R65. The 95% confidence interval of the
population mean cost per night is
1. (253.6984; 292.3016)
2. (256.7195; 289.2805)
3. (254.0083; 291.9917)
4. (265.1248; 287.2631)
5. (292.5285; 253.4715)
Solution
The sample mean $\bar{X} = 273$
The sample standard deviation $S = 65$
The sample size $n = 46$
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$
$$\bar{X} \pm t_{(n-1;\ \alpha/2)} \cdot \frac{S}{\sqrt{n}}$$
$$273 \pm t_{(46-1;\ 0.05/2)} \cdot \frac{65}{\sqrt{46}}$$
$$273 \pm t_{(45;\ 0.025)} \cdot 9.5837$$
Degrees of freedom   0.10    0.05    0.025   0.01    0.005
1
:
45                   1.301   1.679   2.014   2.412   2.690
The critical value (table value) at 45 degrees of freedom and $\alpha/2 = 0.025$ equals 2.014.
$273 \pm 2.014 \times 9.5837$
$273 \pm 19.3016$
$(273 - 19.3016;\ 273 + 19.3016)$
$(253.6984;\ 292.3016)$
Option (1)
Question 12
The mean $\bar{X} = 10$ for a sample of 100 and the population standard deviation is found to be 1. The upper limit of
the 90% confidence interval for the population mean is
1. 0.196
2. 10.196
3. 9.804
4. 10.1645
5. 0.0196
Solution
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$10 \pm Z_{0.10/2} \cdot \frac{1}{\sqrt{100}}$$
$$10 \pm Z_{0.05} \cdot 0.1$$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.6   0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
We can see that 0.05 lies halfway between the table entries 0.0505 and 0.0495: $\frac{0.0505 + 0.0495}{2} = 0.05$
In the same way, the corresponding value of $Z$ lies halfway between 1.64 and 1.65: $\frac{1.64 + 1.65}{2} = 1.645$
$10 \pm 1.645 \times 0.1$
$10 \pm 0.1645$
$(10 - 0.1645;\ 10 + 0.1645)$
$(9.8355;\ 10.1645)$
The upper limit equals 10.1645
The lower limit equals 9.8355
Option (4)
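The critical values that are read from the Z-table (and interpolated, as above) can also be obtained from the inverse of the standard normal distribution function. This is a minimal sketch assuming Python 3.8 or later; `z_critical_value` is an illustrative name.

```python
from statistics import NormalDist


def z_critical_value(confidence):
    """Two-sided critical value Z(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(z_critical_value(confidence), 3))
```

For 90%, 95% and 99% confidence the function returns values that round to 1.645, 1.960 and 2.576, in line with the table-based values used in this section (the table lookup for 99% is usually rounded to 2.58).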
Question 13
A simple random sample of 30 has been collected from a population for which it is known that the population
standard deviation $\sigma = 10.0$. The sample mean has been calculated as 240.0. The 95% confidence interval
for the population mean is
1. (236.4216; 243.5784)
2. 240 ± 3.5785
3. (236.2664; 243.7336)
4. (263.2451; 234.7558)
5. (243.5785; 263.4251)
Solution
$n = 30$, $\sigma = 10$, $\bar{X} = 240$, $\alpha = 0.05$
$$\bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
$$240 \pm Z_{0.05/2} \cdot \frac{10}{\sqrt{30}}$$
$$240 \pm Z_{0.025} \cdot 1.8257$$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$240 \pm 1.96 \times 1.8257$
$240 \pm 3.5784$
$(240 - 3.5784;\ 240 + 3.5784)$
$(236.4216;\ 243.5784)$
Option (1)
8.2.2 Confidence Interval of the Proportion
• The data have to be normally distributed.
• The confidence interval of the proportion is expressed by the formula
$$p \pm Z_{\alpha/2} \cdot \sqrt{\frac{p(1 - p)}{n}}$$
where $\sqrt{\frac{p(1 - p)}{n}}$ is used in place of $\sqrt{\frac{\pi(1 - \pi)}{n}}$ for estimating the population proportion.
• The sample proportion is $p = \dfrac{\text{number of successes}}{\text{total}}$
• The population proportion is $\pi$
• The critical value when estimating the confidence interval is $Z_{\alpha/2}$
• The standard error of the proportion is $S_p = \sqrt{\dfrac{p(1 - p)}{n}}$
• The test statistic is $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
Activities
Question 1
An airline has surveyed a simple random sample of travelers to find out whether they would be interested
in paying a higher fare in order to have access to e-mail during their flight. Of the 400 travelers surveyed,
80 said e-mail access would be worth a slight extra cost. The manager wants to construct a 95% confidence
interval for the population proportion of air travelers who are in favor of the airline’s e-mail idea.
Which one of the following statements is incorrect?
1. The sample proportion is $p = 0.2$
2. The critical value at $\alpha = 5\%$ is 1.645
3. The standard error for the proportion is 0.02
4. The lower confidence limit is 0.1608
5. The upper confidence limit is 0.2392.
Solution
This question is related to the confidence interval of the proportion.
The sample size $n = 400$
The number of successes $X = 80$
1. Correct
$p = \dfrac{\text{number of successes}}{\text{total}} = \dfrac{80}{400} = 0.2$
2. Incorrect
The critical value is $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$
The standardised normal Z-table
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
3. Correct
The standard error of the proportion is
$$S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.2(1 - 0.2)}{400}} = 0.02$$
4. Correct
5. Correct
The confidence interval of the proportion is
$$p \pm Z_{\alpha/2} \cdot \sqrt{\frac{p(1 - p)}{n}}$$
$0.2 \pm 1.96 \times 0.02$
$0.2 \pm 0.0392$
$(0.2 - 0.0392;\ 0.2 + 0.0392)$
$(0.1608;\ 0.2392)$
The lower confidence limit is 0.1608.
The upper confidence limit is 0.2392.
Option (2)
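The interval in this solution can be reproduced with a short function. The sketch assumes Python 3.8 or later and computes the critical value with `NormalDist.inv_cdf` rather than the printed table; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Confidence interval for a population proportion (normal approximation)."""
    p = successes / n                                  # sample proportion
    alpha = 1 - confidence
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    margin = z_critical * sqrt(p * (1 - p) / n)        # Z * standard error
    return p - margin, p + margin


lower, upper = proportion_confidence_interval(successes=80, n=400)
print(round(lower, 4), round(upper, 4))
```

For 80 successes out of 400 the limits round to 0.1608 and 0.2392, the values obtained by hand.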
Question 2
A random sample of 80 observations results in 50 successes. The 95% confidence interval for the population
proportion of successes is
1. 0.625 ± 0.1060
2. (0.536; 0.714)
3. (0.622; 0.628)
4. (0.519; 0.731)
5. (0.591; 0.371)
Solution
The sample proportion is $p = \dfrac{\text{number of successes}}{\text{total}} = \dfrac{50}{80} = 0.625$
The critical value is $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$
The standardised normal table
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The standard error of the proportion is
$$S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.625(1 - 0.625)}{80}} = 0.0541$$
The confidence interval of the proportion is
$$p \pm Z_{\alpha/2} \cdot \sqrt{\frac{p(1 - p)}{n}}$$
$0.625 \pm 1.96 \times 0.0541$
$0.625 \pm 0.1060$
$(0.625 - 0.1060;\ 0.625 + 0.1060)$
$(0.519;\ 0.731)$
Option (4)
Question 3
According to statistics reported on STATSA, a surprising number of motor vehicles are not covered by
insurance. The results, consistent with the STATSA report, showed 46 of 200 vehicles were not covered by
insurance. The 95% confidence interval estimate for the population proportion is
1. (0.1716; 0.2884)
2. (0.2317; 0.2283)
3. (0.23; 0.2884)
4. (0.2884; 0.1716)
5. (0.1810; 0.2790)
Solution
The sample proportion is $p = \dfrac{\text{number of successes}}{\text{total}} = \dfrac{46}{200} = 0.23$
The critical value is $Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The standard error of the proportion is
$$S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.23(1 - 0.23)}{200}} = 0.0298$$
The confidence interval of the proportion is
$$p \pm Z_{\alpha/2} \cdot \sqrt{\frac{p(1 - p)}{n}}$$
$0.23 \pm 1.96 \times 0.0298$
$0.23 \pm 0.0584$
$(0.23 - 0.0584;\ 0.23 + 0.0584)$
$(0.1716;\ 0.2884)$
Option (1)
Chapter 9
Hypothesis Testing of the Mean and Proportions
SUMMARY
• A hypothesis is a proposition or statement, usually made by a researcher or an analyst, concerning the
nature or value of some unknown population quantity.
• This statement needs to be substantiated using real data. We do not want to accept the
researcher's or analyst's statement at face value, so we collect data in an attempt to confirm
the hypothesis.
• The process of testing these hypotheses using sample data is called statistical hypothesis testing. It
uses probability to reach a conclusion about the stated hypothesis.
• The conclusion is about whether the hypothesis should be rejected or not.
• A series of five steps is used to determine whether or not we reject a null hypothesis on the basis of
sample data:
Step 1. Formulate the formal hypothesis statements, called the null hypothesis $H_0$ and the
alternative hypothesis $H_1$.
Step 2. Determine the appropriate test for the given hypothesis statement. This can be a
one-tail test or a two-tail test, depending on the problem being considered. This step has to be correctly
specified.
When the symbol $<$ or $>$ appears in the alternative hypothesis $H_1$, we have a one-tail test.
When the symbol $\ne$ appears in the alternative hypothesis $H_1$, we have a two-tail test.
Step 3. Specify the level of significance, denoted by $\alpha$. The level of significance is a fixed probability
of making the error of rejecting the null hypothesis even though it is true. It is specified by the
researcher and represents the degree of accuracy that the test should exhibit. In practice, we use $\alpha$
equal to 5%, 10% or 1%.
Step 4. Calculate the relevant test statistic. This is a quantity calculated from the sample data. It is
used to determine whether the null hypothesis $H_0$ should be rejected or not.
Step 5. Make a decision. We consider three different approaches in this module:
(1) By using the test statistic and the critical value.
When the test statistic is more extreme than the critical value (that is, it falls in the rejection
region), we reject the null hypothesis $H_0$; otherwise we fail to reject $H_0$.
The critical value, or table value, defines the range of possible values of the test statistic
for which we reject $H_0$.
(2) By using the p-value.
When the p-value is less than the level of significance $\alpha$, we reject the null hypothesis $H_0$;
otherwise we fail to reject $H_0$.
The p-value is the probability of getting a value of the test statistic as extreme as, or more extreme than,
that observed by chance alone, if the null hypothesis is true.
In making use of (1) and (2) above, our decision will be based on the nature of the alternative
hypothesis (whether it is a one-tail or two-tail test).
(3) By using the confidence interval.
When the hypothesised value (for example, zero when testing for no difference) lies between the two
confidence limits, we fail to reject $H_0$, and when it lies outside the confidence limits, we reject $H_0$.
• A Type 1 error occurs when the null hypothesis $H_0$ is rejected whereas it is in fact true.
• A Type 2 error occurs when the null hypothesis $H_0$ is not rejected whereas it is in fact false.
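The p-value approach of Step 5 can be sketched in code. This is a minimal illustration assuming Python 3.8 or later and a Z test statistic; the function names and the tail labels `"left"`, `"right"` and `"two"` are hypothetical conventions introduced here.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """p-value for a Z test statistic; tail is 'left', 'right' or 'two'."""
    standard_normal = NormalDist()
    if tail == "left":                                   # H1 uses <
        return standard_normal.cdf(z)
    if tail == "right":                                  # H1 uses >
        return 1 - standard_normal.cdf(z)
    if tail == "two":                                    # H1 uses "not equal"
        return 2 * (1 - standard_normal.cdf(abs(z)))     # tail area doubled
    raise ValueError("tail must be 'left', 'right' or 'two'")


def decide(p_value, alpha):
    """Reject H0 when the p-value is less than the level of significance."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


example_p = p_value_from_z(2.0, "two")
decision = decide(example_p, alpha=0.05)
print(round(example_p, 4), decision)
```

The example value $Z = 2.0$ is arbitrary; for a two-tail test it gives a p-value of about 0.0455, which is below $\alpha = 0.05$, so $H_0$ is rejected.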
9.1 Hypothesis Testing of the Mean
• The test statistic when the population standard deviation $\sigma$ is known is
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
• The test statistic when only the sample standard deviation $S$ is known is
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
• To calculate the p-value, we need the value of the test statistic $Z$ and the nature of the alternative
hypothesis $H_1$ (whether it is a one-tail or two-tail test). The p-value is then compared with the level of
significance $\alpha$. In the case of a two-tail test, the tail probability is multiplied by two.
• To calculate the critical value, we need the level of significance $\alpha$ and the standardised Z-table.
The calculation of the critical value is based on the nature of the alternative hypothesis $H_1$ (whether
it is a one-tail or two-tail test). In the case of a two-tail test, the level of significance $\alpha$ is divided
by two before reading the corresponding value of $Z$ from the table.
Activities
Question 1
Consider the following information
$H_0: \mu = 1000$ vs $H_1: \mu \ne 1000$
$\sigma = 200$, $n = 100$, $\bar{x} = 980$, $\alpha = 0.01$
Which one of the following statements is incorrect?
1. A two-tailed test is used.
2. The standard error is 20.
3. The test statistic is 0.1.
4. The critical value is 2.58
5. Reject $H_0$ when the p-value is less than $\alpha = 0.01$.
Solution
$H_0: \mu = 1000$ vs $H_1: \mu \ne 1000$
$\sigma = 200$, $n = 100$, $\bar{X} = 980$, $\alpha = 0.01$
1. Correct
Because $H_1$ uses the symbol $\ne$
2. Correct
The standard error is $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{200}{\sqrt{100}} = 20$
3. Incorrect
The test statistic is
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{980 - 1000}{20} = -1$$
4. Correct
The critical value depends on the nature of the alternative hypothesis $H_1$ and the level of significance $\alpha$.
Because this is a two-tail test: $\dfrac{\alpha}{2} = \dfrac{0.01}{2} = 0.005$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-2.5   0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
The nearest critical value is $Z = 2.58$
5. Correct
Option (3)
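The quantities in this question (standard error, test statistic, two-tail critical value) can be computed as in the sketch below, which assumes Python 3.8 or later. The variable names are illustrative, and the critical value comes from `NormalDist.inv_cdf` rather than the printed table.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, sample_mean, alpha = 1000, 200, 100, 980, 0.01

standard_error = sigma / sqrt(n)                  # 200 / sqrt(100)
z = (sample_mean - mu_0) / standard_error         # test statistic
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail critical value
reject_h0 = abs(z) > z_critical                   # critical value approach

print(standard_error, z, round(z_critical, 3), reject_h0)
```

The computed critical value is about 2.576, which the table lookup rounds to 2.58. Because the absolute value of the test statistic is smaller than the critical value, this sketch does not reject $H_0$.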
Question 2
Consider testing the hypotheses
$H_0: \mu = 50$ vs $H_1: \mu > 50$
If $\sigma = 10$, $n = 64$, $\bar{X} = 53.5$ and $\alpha = 0.025$, which one of the following statements is incorrect?
1. A one-tail test is appropriate in this situation.
2. The critical value at the 2.5% level is 1.96
3. The test statistic is 2.80
4. The p-value is 0.0052
5. $H_0$ is rejected at the 2.5% level of significance.
Solution
$H_0: \mu = 50$ vs $H_1: \mu > 50$
$\sigma = 10$, $n = 64$, $\bar{X} = 53.5$, $\alpha = 0.025$
1. Correct
Because $H_1$ uses the symbol $>$
2. Correct
Because this is a one-tail test: $\alpha = 0.025$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The critical value is $Z = 1.96$
3. Correct
The test statistic is
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{53.5 - 50}{10 / \sqrt{64}} = \frac{3.5}{1.25} = 2.8$$
4. Incorrect
The p-value depends on the nature of the hypothesis and the test statistic $Z$. (Because this is a one-tail test, the
p-value is not multiplied by two.)
p-value $= P(Z > 2.8)$. This takes us back to chapter 6.
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-2.8   0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
p-value $= P(Z > 2.8) = 0.0026$
5. Correct
Reject $H_0$ when the p-value is less than $\alpha = 0.025$. Since $0.0026 < 0.025$, $H_0$ is rejected at the 2.5% level of
significance.
Option (4)
Question 3
For a sample of 35 items from a population for which the population standard deviation is $\sigma = 20.5$, the
sample mean is $\bar{x} = 458.0$. At the 0.05 level of significance, the tutor wants to test $H_0: \mu = 450$ against
$H_1: \mu \ne 450$. Which one of the following statements is incorrect?
1. A two-tailed test is appropriate to this situation.
2. The critical value is 1.96
3. The test statistic $z = 2.31$
4. The p-value is 0.0104
5. $H_0$ is rejected at the 5% level of significance.
Solution
$H_0: \mu = 450$ vs $H_1: \mu \ne 450$
$\sigma = 20.5$, $n = 35$, $\bar{X} = 458$, $\alpha = 0.05$
1. Correct
Because $H_1$ uses the symbol $\ne$
2. Correct
Because this is a two-tail test: $\dfrac{\alpha}{2} = \dfrac{0.05}{2} = 0.025$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9   0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
3. Correct
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{458 - 450}{20.5 / \sqrt{35}} = \frac{8}{3.4651} = 2.3087 \approx 2.31$$
4. Incorrect
The p-value depends on the nature of the hypothesis and the test statistic $Z$. (Because this is a two-tail test, the
tail probability is multiplied by two.)
p-value $= 2 \times P(Z > 2.3087) = 2 \times P(Z > 2.31)$. This takes us back to chapter 6.
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-2.3   0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
p-value $= 2 \times 0.0104 = 0.0208$
5. Correct
Reject $H_0$ when the p-value is less than $\alpha = 0.05$. Since $0.0208 < 0.05$, $H_0$ is rejected at the 5% level of
significance.
Option (4)
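A two-tail p-value such as the one in this question can be checked without the table. The sketch below assumes Python 3.8 or later; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, sample_mean, alpha = 450, 20.5, 35, 458, 0.05

z = (sample_mean - mu_0) / (sigma / sqrt(n))   # test statistic, about 2.31
tail_area = 1 - NormalDist().cdf(abs(z))       # one-tail probability
p_value = 2 * tail_area                        # doubled for a two-tail test
reject_h0 = p_value < alpha

print(round(z, 4), round(p_value, 4), reject_h0)
```

Because the code uses the unrounded test statistic rather than 2.31, the p-value comes out close to 0.021, marginally above the table-based 0.0208; the decision to reject $H_0$ at the 5% level is the same.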
Question 4
A professor of statistics disputes the claim that the average student spends 3 hours studying for the exam.
Which one of the following pairs of hypotheses is used to test the claim?
1. H0 :   3 H1 :   3
2. H0 :   3 H1 :   3
3. H0 :   3 H1 :   3
4. H0 :   3 H1 :   3
5. H0 : x  3 H1 : x  3
Solution
Option (2)
Question 5
Consider testing the hypotheses
$H_0: \mu = 400$ vs $H_1: \mu \ne 400$
If the value of the test statistic $z$ equals 0.87, then the p–value is:
1. 0.1922
2. 0.8078
3. 0.3844
4. 0.4681
5. Need more information.
Solution
$H_0: \mu = 400$ vs $H_1: \mu \ne 400$
The test statistic $Z = 0.87$
The p-value depends on the nature of the hypothesis and the test statistic $Z$. (Because this is a two-tail test, the
tail probability is multiplied by two.)
p-value $= 2 \times P(Z > 0.87)$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-0.8   0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
p-value $= 2 \times 0.1922 = 0.3844$
Option (3)
Question 6
A laboratory tested a random sample of 30 chicken eggs and found that the mean amount of cholesterol
per egg is 235 milligrams and the standard deviation is 20 milligrams. Suppose $H_0: \mu = 230$ is tested against
$H_1: \mu \ne 230$ at the 5% significance level, with the assumption that the cholesterol of chicken eggs is normally
distributed, and suppose that the test statistic $Z$ is equal to 1.37.
What is the p–value of the test?
1. 0.0853
2. 0.9147
3. 0.8533
4. 0.6125
5. 0.1706
Solution
$H_0: \mu = 230$ vs $H_1: \mu \ne 230$
The p-value depends on the nature of the hypothesis and the test statistic $Z$.
p-value $= 2 \times P(Z > 1.37)$
Z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.3   0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
p-value $= 2 \times 0.0853 = 0.1706$
Option (5)
Question 7
Which of the following correctly describe a possible way to do hypothesis testing?
A. Calculate the test statistic and compare it to the significance level.
B. Calculate the p-value from the significance level, calculate the test statistic and compare the test statistic to the p-value.
C. Calculate the test statistic, find the corresponding p-value and compare the p-value to the significance level.
D. Find the critical Z-value from the significance level, calculate the test statistic and compare the test statistic to the critical Z-value.
E. Find the critical Z-value from the significance level, calculate the p-value from the critical Z-value, and compare the p-value to the significance level.
1. B only
2. B and D
3. E only
4. B and E
5. C and D
Solution
A. Incorrect
The test statistic is compared to the critical value, not to the significance level.
B. Incorrect
The p-value is calculated from the test statistic and is compared to the level of significance.
C. Correct
D. Correct
The p-value is calculated from the test statistic, and the critical value is found from the level of significance. We can therefore compare the p-value to the level of significance, or the test statistic to the critical value.
E. Incorrect
The p-value is calculated from the test statistic, not from the critical Z-value.
Option (5)
Question 8
In the hypothesis testing process, you should
1. define the statistical hypotheses $H_0$ and $H_1$.
2. determine the region of acceptance of $H_0$.
3. calculate the test statistic.
4. compare the test statistic to the critical value.
5. all of the above
Solution
Option (5)
Question 9
A bakery stated that the average number of loaves of bread sold daily is 3000. An employee thinks that the actual value might differ from this and wants to test this statement. The correct hypotheses are:
1. $H_0: \mu = 3000$, $H_1: \mu \neq 3000$
2. H0 :   3000 H1 :   3000
3. H0 :   3000 H1 :   3000
4. H0 :   3000 H1 :   3000
5. H0 :   3000 H1 :   3000
Solution
Option (1)
Question 10
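In testing the hypothesis $H_0: \mu = 15$ vs $H_1: \mu \neq 15$ the following information was given:
$\sigma = 5$, $\bar{X} = 18.1$, $n = 10$, $\alpha = 0.03$. The test statistic is
1. 0.025
2. 1.96
3. 0.975
4. 0.62
5. -1.96
Solution
$H_0: \mu = 15$ vs $H_1: \mu \neq 15$
$\sigma = 5$, $n = 10$, $\bar{X} = 18.1$, $\alpha = 0.03$
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{18.1 - 15}{5 / \sqrt{10}} = \dfrac{3.1}{1.5811} = 1.96$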
Option (2)
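The arithmetic in this solution can be reproduced with a short sketch. It assumes the standard library only, and the helper name `z_statistic` is hypothetical.

```python
import math

def z_statistic(sample_mean: float, hypothesised_mean: float,
                sigma: float, n: int) -> float:
    """Z test statistic for a mean when the population standard deviation is known."""
    standard_error = sigma / math.sqrt(n)  # sigma / sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# Question 10: sample mean 18.1, hypothesised mean 15, sigma 5, n 10.
z = z_statistic(18.1, 15.0, 5.0, 10)
print(round(z, 2))
```

The standard error here is 5/sqrt(10), roughly 1.5811, which is the denominator used in the worked solution.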
Question 11
A researcher wants to carry out a hypothesis test involving the mean for a sample of size $n = 18$. The population standard deviation is unknown, but she is reasonably sure that the underlying population is approximately normally distributed. The test statistic she should use in carrying out the analysis is:
1. Z–test
2. t–test
3. Binomial test
4. Poisson test
5. Normal test
Solution
Option (2)
9.2 Hypothesis Testing of the Proportion
The test statistic for a population proportion is
$Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
The population proportion is $\pi$.
The sample proportion is $p$.
Activities
Question 1
Calculate the p-value of the test of the following hypotheses, given that the sample proportion $p = 0.63$, $n = 100$ and the calculated test statistic $z = 0.05$. The null hypothesis and the alternative hypothesis are
$H_0: \pi = 0.60$ vs $H_1: \pi > 0.60$
1. 0.4801
2. 0.5000
3. 0.5199
4. 0.6915
5. 0.7088
Solution
$H_0: \pi = 0.60$ vs $H_1: \pi > 0.60$
$p = 0.63$, $n = 100$, $Z = 0.05$
The p-value depends on the nature of the hypotheses and on the test statistic Z. This is a one-tailed test, so the tail probability is not multiplied by two.
p-value $= P(Z > 0.05) = P(Z < -0.05)$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
p-value = 0.4801
Option (1)
Question 2
A statistics practitioner formulated the following hypotheses.
$H_0: \pi = 0.70$ vs $H_1: \pi > 0.70$
A random sample of 100 produced $p = 0.73$ with a test statistic of 0.66. The p-value is:
1. 0.03
2. 0.0003
3. 0.5239
4. 0.7454
5. 0.2546
Solution
$H_0: \pi = 0.70$ vs $H_1: \pi > 0.70$
$p = 0.73$, $n = 100$, $Z = 0.66$
The p-value depends on the nature of the hypotheses and on the test statistic Z. This is a one-tailed test.
p-value $= P(Z > 0.66) = P(Z < -0.66)$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
p-value = 0.2546
Option (5)
Question 3
A popular weekly magazine asserts that fewer than 40% of households in South Africa have changed their lifestyles because of escalating gas prices. A recent survey of 100 households finds that 67 households have made lifestyle changes due to escalating gas prices.
1. $H_0: \pi = 0.40$ vs $H_1: \pi < 0.40$
2. The test statistic is 5.5102
3. The critical value when $\alpha = 10\%$ is -1.28
4. The p-value equals 0.0000
5. Conclusion: $H_0$ is not rejected at the 10% level of significance.
Solution
Given the following:
$H_0: \pi = 0.40$ vs $H_1: \pi < 0.40$
The sample proportion $p = \dfrac{\text{number of successes}}{\text{total}} = \dfrac{67}{100} = 0.67$
1. Correct
Because the statement speaks of fewer than 40% of households, $H_1$ uses the symbol $<$.
2. Correct
The test statistic $Z = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \dfrac{0.67 - 0.40}{\sqrt{\dfrac{0.40(1 - 0.40)}{100}}} = \dfrac{0.27}{0.04900} = 5.5102$
3. Correct
The critical value depends on the nature of the alternative hypothesis $H_1$ and on the level of significance $\alpha = 0.10$.
The standardised normal (Z) table
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
4. Correct
The p-value depends on the nature of the hypotheses and on the test statistic Z.
p-value $= P(Z > 5.5102) \approx P(Z > 5.51) = 0.0000$
5. Incorrect
Rule: Reject $H_0$ when the p-value is less than the level of significance $\alpha$. Since 0.0000 < 0.10, we reject $H_0$ at the 10% level of significance.
Option (5)
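The test statistic for a proportion can be sketched as follows, assuming the standard library only; `proportion_z_statistic` is a hypothetical helper name. Note that the unrounded result is about 5.51, while the worked solution reports 5.5102 because it rounds the standard error to 0.0490 before dividing.

```python
import math

def proportion_z_statistic(successes: int, n: int, pi_0: float) -> float:
    """Z statistic for one proportion: (p - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)."""
    p = successes / n                                # sample proportion
    standard_error = math.sqrt(pi_0 * (1.0 - pi_0) / n)
    return (p - pi_0) / standard_error

# Question 3: 67 of 100 households, hypothesised proportion 0.40.
z = proportion_z_statistic(67, 100, 0.40)
print(round(z, 2))
```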
Question 4
In testing the hypothesis $H_0: \pi = 0.40$ vs $H_1: \pi > 0.40$ at the 5% significance level, if the sample proportion is 0.45 and the p-value is 0.0764, the appropriate conclusion would be:
1. to reject $H_0$
2. not to reject $H_0$
3. to reject $H_1$
4. need more information
5. to accept $H_0$ because the p-value is minimum
Solution
$H_0: \pi = 0.40$ vs $H_1: \pi > 0.40$, $\alpha = 0.05$
The sample proportion $p = 0.45$
The p-value = 0.0764
Rule: Fail to reject $H_0$ when the p-value is greater than the level of significance $\alpha$. Since 0.0764 > 0.05, we fail to reject $H_0$ at the 5% level of significance.
Option (2)
Chapter 10
Chi–squared Test
SUMMARY
• The chi-squared test is applied when dealing with data that are categorical (nominal or qualitative) in nature.
• The chi-squared test makes use of a contingency table for two or more qualitative variables.
• The chi-squared test operates by comparing the observed values in a frequency table or contingency table to the values that we would expect if a given hypothesis were true.
• A series of five steps is used to determine whether or not we reject a null hypothesis based on sample data:
Step 1. Formulate the formal hypothesis statements, called the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
$H_0$: The two variables are independent (the two variables are not related).
$H_1$: The two variables are dependent (the two variables are related).
Step 2. Determine the appropriate degrees of freedom df: the number of rows minus 1 times the number of columns minus 1, denoted by $(r - 1)(c - 1)$, for the given table of observed frequencies.
Step 3. Specify the level of significance, denoted by $\alpha$. The level of significance is a fixed probability of making the error of rejecting the null hypothesis even though it is true. It is specified by the researcher and represents the degree of accuracy that the test should exhibit. In practice, we use $\alpha$ equal to 5%, 10% or 1%.
Step 4. Calculate the relevant test statistic. This is a quantity calculated from the observed frequencies. It is used to determine whether the null hypothesis $H_0$ should be rejected or not.
The test statistic $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where O represents the observed frequency and E represents the expected frequency. The expected frequency of a cell is calculated as
$E = \dfrac{\text{Row total of the cell} \times \text{Column total of the cell}}{\text{Total number } (n)}$
Step 5. Make a decision by using the test statistic and the critical value.
Rule: When the test statistic is greater than the critical value, we reject the null hypothesis $H_0$.
• The critical value (or table value) defines the range of possible values of the test statistic for which we reject $H_0$; otherwise we fail to reject $H_0$.
Activities
Question 1
If a contingency table has 4 rows and 5 columns, how many degrees of freedom are there for the chi-square ($\chi^2$) test for independence?
1. 20
2. 12
3. 15
4. 9
5. 10
Solution
The degrees of freedom df $= (r - 1)(c - 1) = (4 - 1)(5 - 1) = 3 \times 4 = 12$
Option (2)
Question 2
Use the following contingency table:
        A    B    Total
Yes     40   25   65
No      35   45   80
Total   75   70   145
The professor wants to test the independence of the two variables given in the columns and in the rows at the 5% level of significance.
1. The null hypothesis $H_0$ is that the two variables are independent.
2. The alternative hypothesis is that the two variables are dependent.
3. The critical value is 3.841
4. The expected frequency for Yes and B is 25
5. Suppose that the calculated test statistic $\chi^2 = 4.5455$; the null hypothesis $H_0$ is rejected at the 5% level of significance.
Solution
The observed contingency table ($\alpha = 0.05$)
        A    B    Total
Yes     40   25   65
No      35   45   80
Total   75   70   145 = n
1. Correct
2. Correct
3. Correct
The critical value is determined by the degrees of freedom and the level of significance $\alpha = 0.05$.
The degrees of freedom df $= (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$
Using the chi-squared table:
df   0.10    0.05    0.025   0.01    0.005
1    2.706   3.841   5.024   6.635   7.879
:
Therefore the critical value is 3.841.
4. Incorrect
The expected frequency $= \dfrac{\text{Row total for Yes} \times \text{Column total of B}}{\text{Total number } (n)} = \dfrac{65 \times 70}{145} = 31.3793$
5. Correct
Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value. Since 4.5455 > 3.841, $H_0$ is rejected at the 5% level of significance.
Option (4)
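The five-step procedure can be sketched in code for the table in this question. This is a minimal illustration assuming the standard library only; the function names are hypothetical, and the critical value 3.841 is copied from the chi-squared table rather than computed. The statistic computed from the observed table comes out at about 4.54, close to the value the question supposes.

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_frequencies(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

observed = [[40, 25],   # Yes: A, B
            [35, 45]]   # No:  A, B
expected = expected_frequencies(observed)
chi2 = chi_squared_statistic(observed)
df = degrees_of_freedom(observed)
critical_value = 3.841            # table value for df = 1, alpha = 0.05
reject_h0 = chi2 > critical_value
print(round(expected[0][1], 4), round(chi2, 2), df, reject_h0)
```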
Question 3
A sport preference poll showed the following data for men and women:
                     Favorite sport
Gender   Basketball   Football   Golf   Tennis   Total
Male     24           17         30     18       89
Female   21           20         22     12       75
Total    45           37         52     30       164
Use the 5% level of significance and test to determine whether sport preferences depend on gender.
1. Gender and favorite sport are independent.
2. Gender and favorite sport are dependent.
3. The degrees of freedom is 3.
4. The critical value is 7.815.
5. Suppose that the calculated test statistic $\chi^2 = 3.30$; the conclusion is to reject the null hypothesis $H_0$.
Solution
1. Correct
2. Correct
3. Correct
The degrees of freedom df $= (r - 1)(c - 1) = (2 - 1)(4 - 1) = 3$
4. Correct
Using the chi-squared table:
df   0.10    0.05    0.025   0.01     0.005
1    2.706   3.841   5.024   6.635    7.879
:
3    6.251   7.815   9.348   11.345   12.838
:
Therefore the critical value is 7.815.
5. Incorrect
Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value. Since 3.30 < 7.815, $H_0$ is not rejected at the 5% level of significance.
Option (5)
Question 4
In a test for the independence of two variables, one of the variables has two possible categories and the other has three possible categories. Suppose that the calculated test statistic is $\chi^2 = 3.456$ at the 5% level of significance.
1. $H_0$: The two variables are independent.
2. $H_1$: The two variables are dependent.
3. The degrees of freedom is 6
4. The critical value is 5.991
5. Fail to reject $H_0$ at the 5% level of significance.
Solution
The number of rows equals 2.
The number of columns equals 3.
The value of the test statistic $\chi^2 = 3.456$
The level of significance $\alpha = 0.05$
1. Correct
2. Correct
3. Incorrect
The degrees of freedom df $= (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$
4. Correct
df   0.10    0.05    0.025   0.01     0.005
1    2.706   3.841   5.024   6.635    7.879
2    4.605   5.991   7.378   9.210    10.597
3    6.251   7.815   9.348   11.345   12.838
:
5. Correct
Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value. Since 3.456 < 5.991, $H_0$ is not rejected at the 5% level of significance.
Option (3)
Question 5
To perform a chi-squared test of independence you don’t require
1. Two or more nominal variables.
2. The distribution to be negatively skewed.
3. The degrees of freedom.
4. The level of significance.
5. A test of a contingency table.
Solution
Option (2)
Question 6
Two employees (Peter and John) are monitored to determine whether there is any difference in the proportions of acceptable parts produced by the employees. The sample of parts produced is given below:
                   Employee
Quality        Peter   John   Total
Acceptable     265     ?      950
Unacceptable   ?       ?      ?
Total          ?       700    1000
1. $H_0$: The two variables are dependent.
2. The observed frequency for acceptable and employee John is 665
3. The expected frequency for acceptable and employee Peter is 265
4. The degrees of freedom is 1
5. Suppose the calculated test statistic $\chi^2 = 4.0$; $H_0$ is not rejected at the 5% level of significance.
Solution
The completed table ($\alpha = 0.05$)
               Peter   John   Total
Acceptable     265     685    950
Unacceptable   35      15     50
Total          300     700    1000 = n
1. Incorrect
$H_0$: The two variables are independent.
2. Incorrect
The observed frequency is 685.
3. Incorrect
The expected frequency $= \dfrac{\text{Row total for acceptable} \times \text{Column total of Peter}}{\text{Total number } (n)} = \dfrac{950 \times 300}{1000} = 285$
4. Correct
The degrees of freedom df $= (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$
5. Incorrect
Rule: Reject $H_0$ when the value of the test statistic is greater than the critical value.
The critical value is determined by the degrees of freedom and the level of significance $\alpha = 0.05$.
df   0.10    0.05    0.025   0.01    0.005
1    2.706   3.841   5.024   6.635   7.879
:
Since the test statistic $\chi^2 = 4.0 > 3.841$, $H_0$ is rejected at the 5% level of significance.
Option (4)
Question 7
Let X = level of income and Y = political preference. Use the results shown in the table below and test at the 1% level of significance whether political preference and level of income are independent.
                  Political party
Level of income   A    B     C
1                 23   11    1
2                 40   75    31
3                 16   107   60
4                 2    14    10
Suppose that the calculated test statistic is 69.3875. Which statement is incorrect?
1. $H_0$: The variables are independent. $H_1$: The variables are dependent.
2. The degrees of freedom df = 12
3. The critical value = 16.812
4. There are 390 people.
5. $H_0$ is rejected. The variables are dependent.
Solution
        A    B     C     Total
1       23   11    1     35
2       40   75    31    146
3       16   107   60    183
4       2    14    10    26
Total   81   207   102   390 = n
Given: the calculated test statistic = 69.3875.
1. Correct
2. Incorrect
The degrees of freedom df $= (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$
3. Correct
df   0.10     0.05     0.025    0.01     0.005
1    2.706    3.841    5.024    6.635    7.879
:
6    10.645   12.592   14.449   16.812   18.458
4. Correct
5. Correct
Since the test statistic $\chi^2 = 69.3875 > 16.812$, $H_0$ is rejected at the 1% level of significance.
Option (2)
Question 8
The trustee of a company's pension plan has solicited the opinions of a sample of the company's employees about a proposed revision of the plan. A breakdown of the responses is shown in the table below. We want to test whether there is enough evidence to infer that the responses differ among the three groups of employees.
Responses   Blue collar   White collar   Managers   TOTAL
For         67            32             11         110
Against     63            18             9          90
TOTAL       130           50             20         200
Which option provides the correct expected frequencies needed to calculate the $\chi^2$ value?
1.
Responses   Blue collar   White collar   Managers
For         11            32             67
Against     9             18             63
2.
For         71.5          27.5           11
Against     58.5          22.5           9
3.
For         72            27             11
Against     59            22             9
4.
For         71            28             11
Against     58            23             9
5.
For         71            28             11
Against     59            22             9
Solution
The observed frequencies
                     Group of employees
Responses   Blue collar   White collar   Managers   Total
For         67            32             11         110
Against     63            18             9          90
Total       130           50             20         200
Expected frequencies
Responses   Blue collar   White collar   Managers   Total
For         71.5          27.5           11         110
Against     58.5          22.5           9          90
Total       130           50             20         200
The expected frequencies are calculated using the formula
$E = \dfrac{\text{Row total} \times \text{Column total}}{\text{Total number } (n)}$
For and Blue collar: $\dfrac{110 \times 130}{200} = 71.5$
For and White collar: $\dfrac{110 \times 50}{200} = 27.5$
For and Managers: $\dfrac{110 \times 20}{200} = 11$
Against and Blue collar: $\dfrac{90 \times 130}{200} = 58.5$
Against and White collar: $\dfrac{90 \times 50}{200} = 22.5$
Against and Managers: $\dfrac{90 \times 20}{200} = 9$
Option (2)
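The six cell-by-cell calculations above follow one pattern, which a short sketch makes explicit. It assumes the standard library only, and `expected_table` is a hypothetical helper name.

```python
def expected_table(observed):
    """Expected frequency for every cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: For, Against. Columns: Blue collar, White collar, Managers.
observed = [[67, 32, 11],
            [63, 18, 9]]
expected = expected_table(observed)
for label, row in zip(["For", "Against"], expected):
    print(label, row)
```

Each expected row sums to the same row total as the observed table (110 and 90), which is a quick sanity check on the arithmetic.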
Question 9
Consider the following EXCEL output testing for independence of two variables:
Contingency table
        Column 1   Column 2   Column 3   Total
Row 1   93         91         174        358
Row 2   907        909        1826       3642
Total   1000       1000       2000       4000
Chi-squared stat       0.331
df                     2
p-value                0.847
Chi-squared critical   5.992
1. Since 0.331 < 5.992, the null hypothesis of independence cannot be rejected.
2. Since the p-value 0.847 > 0.05, the two variables are dependent.
3. The degrees of freedom df $= (r - 1)(c - 1) = 2$
4. The observed frequency for row 1, column 3 is 174
5. The expected frequency for row 1, column 3 is 179
Solution
1. Correct
2. Incorrect
When the p-value is greater than the level of significance, we fail to reject $H_0$, and therefore we cannot conclude that the two variables are dependent.
3. Correct
4. Correct
5. Correct
The expected frequency $= \dfrac{\text{Row total of row 1} \times \text{Column total of column 3}}{\text{Total number } (n)} = \dfrac{358 \times 2000}{4000} = 179$
Option (2)
Question 10
A sample of 500 shoppers was selected in a large metropolitan area to determine various information concerning consumer behavior. Among the questions asked was, "Do you enjoy shopping for clothing?" The results are summarized in the following contingency tables:
Observed frequencies
                            Gender
Enjoy shopping   Male   Female   Total
Yes              12     23       35
No               10     36       46
Total            22     59       81
Expected frequencies
                            Gender
Enjoy shopping   Male    Female   Total
Yes              9.51    25.49    35
No               12.49   33.51    46
Total            22      59       81
The test statistic $\chi^2_{STAT}$ is
1. 1.5766
2. 2.3547
3. 5.3214
4. 15.7662
5. 0.2135
Solution
Observed frequencies, with expected frequencies in parentheses:
        Male         Female       Total
Yes     12 (9.51)    23 (25.49)   35
No      10 (12.49)   36 (33.51)   46
Total   22           59           81 = n
The test statistic $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
$\chi^2 = \dfrac{(12 - 9.51)^2}{9.51} + \dfrac{(23 - 25.49)^2}{25.49} + \dfrac{(10 - 12.49)^2}{12.49} + \dfrac{(36 - 33.51)^2}{33.51}$
$= 0.6520 + 0.2432 + 0.4964 + 0.1850$
$= 1.5766$
Option (1)
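When the observed and expected frequencies are both given, the statistic is a single sum. The sketch below assumes the standard library only and uses the rounded expected values printed in the question; the variable names are illustrative.

```python
# Cells listed in the order: Yes/Male, Yes/Female, No/Male, No/Female.
observed = [12, 23, 10, 36]
expected = [9.51, 25.49, 12.49, 33.51]

# Contribution of each cell: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print([round(c, 4) for c in contributions], round(chi2, 4))
```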
Question 11
Consider the following table
                       Car size bought
Buyers age   Small   Medium   Large   Total
Under 30     10      22       34      66
30 - 45      24      42       48      114
Above 45     45      35       40      120
Total        79      99       122     300
Which one of the following statements is incorrect?
1. The expected frequency of buyers above 45 who bought a medium car is 39.6
2. The observed frequency of buyers under the age of 30 who bought a large car is 34
3. The degrees of freedom is 4
4. The critical value is 9.488 at $\alpha = 0.05$
5. Suppose that the test statistic $\chi^2_{STAT}$ is 18.35; $H_0$ cannot be rejected at the 5% level of significance.
Solution
1. Correct
The expected frequency $= \dfrac{\text{Row total of above 45} \times \text{Column total of medium}}{\text{Total number } (n)} = \dfrac{120 \times 99}{300} = 39.6$
2. Correct
3. Correct
The degrees of freedom df $= (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$
4. Correct
The critical value is determined by the degrees of freedom and the level of significance $\alpha = 0.05$.
df   0.10    0.05    0.025    0.01     0.005
1    2.706   3.841   5.024    6.635    7.879
:
4    7.779   9.488   11.143   13.277   14.860
Therefore the critical value is 9.488.
5. Incorrect
Since the test statistic 18.35 is greater than the critical value 9.488, we reject $H_0$.
Option (5)
Question 12
Consider the following information:

                        Seasons
Sales (units)   Summer   Winter   Autumn   Spring   Total
Shoprite        98       114      123      105      440
Spar            199      84       210      67       560
Total           297      198      333      172      1000
1. $H_0$: The two variables are dependent.
2. $H_1$: The two variables are independent.
4. The critical value at the 5% level is equal to 15.507
5. The expected frequency for cell Autumn and Spar is equal to 186.48.
Solution
1. Incorrect
$H_0$: The two variables are independent.
2. Incorrect
$H_1$: The two variables are dependent.
3. Incorrect
The degrees of freedom df $= (r - 1)(c - 1) = (2 - 1)(4 - 1) = 1 \times 3 = 3$
4. Incorrect
The critical value is 7.815.
5. Correct
The expected value is
$\dfrac{\text{Row total of Spar} \times \text{Column total of Autumn}}{\text{Total number } (n)} = \dfrac{560 \times 333}{1000} = 186.48$
Option (5)
Question 13
Conduct a test to determine whether the two classifications L and M are independent, using the data given in the table below (use $\alpha = 0.05$).
        M1   M2    Total
L1      28   68    96
L2      56   36    92
Total   84   104   188
Which one of the following statements is incorrect:
1. $H_0$: The two classifications are independent.
2. $H_1$: The two classifications are dependent.
4. The critical value (table value) is 3.841.
5. Suppose that the test statistic $\chi^2_{cal} = 19.094$; $H_0$ is rejected at the 5% level of significance.
Solution
1. Correct
2. Correct
3. Incorrect
The degrees of freedom df $= (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$
4. Correct
5. Correct
Since 19.094 is greater than 3.841, $H_0$ is rejected at the 5% level of significance.
Option (3)
Chapter 11
Simple linear regression
SUMMARY
• We explore methods that describe possible relationships or associations between two interval (or ordinal) scaled variables.
• When dealing with two variables, we can visually explore whether there is an association by drawing a scatter plot of one variable against the other. This hints at the possible form of the association, which can be linear or non-linear. In this module we focus on linear relationships only.
• To measure the strength of the association, we calculate the correlation coefficient from sample data.
• We are interested in studying the association between two variables, known as the dependent variable, denoted by Y, and the independent variable, denoted by X.
• The independent variable provides the basis for calculating the value of the dependent variable.
• The linear regression model is given by the equation $Y = \beta_0 + \beta_1 X + \text{error}$
• The estimated regression model is given by the equation $\hat{Y} = b_0 + b_1 X$
• The least squares method provides the formulae for the estimators $b_0$ and $b_1$.
• The formulae are
$b_0 = \bar{Y} - b_1 \bar{X}$, where $\bar{Y} = \dfrac{\sum Y}{n}$ and $\bar{X} = \dfrac{\sum X}{n}$
$b_1 = \dfrac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}$ or $b_1 = \dfrac{\sum X_i Y_i - \frac{1}{n} \sum X_i \sum Y_i}{\sum X_i^2 - \frac{1}{n} \left(\sum X_i\right)^2}$
• The correlation coefficient is given by the formula
$r = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - n \bar{X}^2\right)\left(\sum Y^2 - n \bar{Y}^2\right)}}$
• The value of the correlation coefficient r lies between -1 and 1.
• If r is close to zero, there is little or no linear relationship between X and Y.
• If r is between 0.3 and 0.5 in absolute value, there is a weak to medium linear relationship between X and Y.
• If r is between 0.5 and close to 1 in absolute value, there is a strong linear relationship between X and Y.
• If r equals 1 or -1, there is a perfect linear relationship between X and Y.
• If r is greater than 0, we say that there is a positive relationship between X and Y.
• If r is less than 0, we say that there is a negative relationship between X and Y.
• The residual or error term $e_i = Y_i - \hat{Y}_i$
• The coefficient of determination $r^2 = \dfrac{SSR}{SST}$
$SSR = b_0 \sum Y_i + b_1 \sum XY - \dfrac{\left(\sum Y\right)^2}{n}$
$SST = \sum Y_i^2 - \dfrac{\left(\sum Y\right)^2}{n}$
Activities
Question 1
To analyse the relationship between advertising and sales, the manager of a furniture store recorded the monthly advertising budget (thousands of rands) and the sales for a sample of 6 months. The data are presented below:
Advertising X   23   46    60    54   28   33
Sales Y         96   113   128   98   89   125
The equation of the regression line of Y on X takes the form:
1. $\hat{y} = 35.8 + 1.32X$
2. $\hat{y} = 1.328 + 35.8X$
3. $\hat{y} = 40.66 + 89.4X$
4. $\hat{y} = 14.91 + 40.19X$
5. $\hat{y} = 0.48 + 0.23X$
Solution
        X     Y       X^2     Y^2        XY
        23    96      529     9216       2208
        46    113     2116    12769      5198
        60    128     3600    16384      7680
        54    98      2916    9604       5292
        28    89      784     7921       2492
        33    125     1089    15625      4125
Total   244   536.5   11034   56050.25   23282.5
$\sum Y = 536.5$, $\sum X = 244$, $\sum XY = 23282.5$, $\sum X^2 = 11034$, $\sum Y^2 = 56050.25$, $b_1 = 1.3176$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{536.5}{6} = 89.4167$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{244}{6} = 40.6667$
$b_0 = \bar{Y} - b_1 \bar{X}$
$= 89.4167 - 1.3176 \times 40.6667$
$= 35.8343$
The regression line is $\hat{Y} = b_0 + b_1 X$, therefore $\hat{Y} = 35.8343 + 1.3176X$
Option (1)
Question 2
A study was conducted to determine the effects of sleep deprivation on people's ability to solve problems. The results were obtained as follows:
Number of hours X    8     12     16     20      24
Number of errors Y   8.6   6.10   8.14   14.12   16.12
If the slope $b_1 = 0.5765$, the intercept $b_0$ is
1. 0.5765
2. 10.616
3. 1.392
4. 6.3246
5. 2.2022
Solution
$b_1 = 0.5765$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{8.6 + 6.10 + 8.14 + 14.12 + 16.12}{5} = \dfrac{53.08}{5} = 10.616$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{8 + 12 + 16 + 20 + 24}{5} = \dfrac{80}{5} = 16$
$b_0 = \bar{Y} - b_1 \bar{X}$
$= 10.616 - 0.5765 \times 16$
$= 1.392$
Option (3)
Question 3
Consider a data set containing the number of pages and the price for $n = 15$ books on a professor's bookshelf.
The correlation coefficient, r, was calculated as 0.474
1. Number of pages and price are directly related.
2. A scatter diagram of the data will show an upward slope.
3. The more pages a book has, the higher the price.
4. The coefficient of determination will lie between -1 and 1
5. If there is absolutely no relationship between number of pages and price, r will be equal to 0
Solution
1. Correct
Because the correlation coefficient r is positive.
2. Correct
Because r is positive, it means that when the variable X increases, the variable Y increases as well.
3. Correct
4. Incorrect
The coefficient of determination $r^2$ lies between 0 and 1.
5. Correct
Option (4)
Question 4
If all the points in a scatter diagram lie on the regression line, then the correlation coefficient r:
1. must be 1
2. must be -1
3. must be either 1 or -1
4. must be 0
5. need more information
Solution
This is a perfect linear relationship between X and Y.
Option (3)
Question 5
Consider the following data

X   4    6    8    9    12
Y   12   16   25   30   39
$\sum XY = 1082$, $\sum Y = 122$, $\sum X = 39$, $\sum X^2 = 341$, $\sum Y^2 = 3446$
$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 130.4$, $SS_X = \sum (X_i - \bar{X})^2 = 36.8$
1. The intercept is 3.54
2. The slope = -3.21
3. The regression line is $\hat{Y} = -3.21 + 3.54X$
4. When the coefficient of correlation is 0.99, the coefficient of determination is 0.9950
5. When $X = 10$, then the estimated $\hat{Y} = 35.4$
Solution
1. Incorrect
$b_1 = \dfrac{SS_{XY}}{SS_X} = \dfrac{130.4}{36.8} = 3.5435$, where $SS_{XY} = \sum XY - \dfrac{\sum X \sum Y}{n}$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{122}{5} = 24.4$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{39}{5} = 7.8$
$b_0 = \bar{Y} - b_1 \bar{X}$
$= 24.4 - 3.5435 \times 7.8$
$= -3.2393$
2. Incorrect
3. Correct
The regression line is $\hat{Y} = b_0 + b_1 X$, so $\hat{Y} = -3.21 + 3.54X$
4. Incorrect
When $r = 0.99$, then $r^2 = 0.99^2 = 0.9801$
5. Incorrect
When $X = 10$, then $\hat{Y} = -3.21 + 3.54X = -3.21 + 3.54 \times 10 = 32.19$
Option (3)
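The slope and intercept in this solution can be recomputed directly from the data. The sketch below assumes the standard library only; `least_squares_fit` is a hypothetical helper name. The unrounded prediction at X = 10 is about 32.2, while the worked solution obtains 32.19 from the rounded line.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) using b1 = SS_XY / SS_X and b0 = mean(Y) - b1 * mean(X)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

xs = [4, 6, 8, 9, 12]
ys = [12, 16, 25, 30, 39]
b0, b1 = least_squares_fit(xs, ys)
prediction = b0 + b1 * 10   # estimated Y when X = 10
print(round(b0, 4), round(b1, 4), round(prediction, 2))
```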
Question 6
1. If $r = 0.86$, it implies that the relationship between the two variables examined is strong.
2. If $r^2 = 0.70$, it implies that 70% of the variation in Y is explained by the regression line.
3. If the coefficient of correlation r is highly negative, it cannot be reliable.
4. If $r = 0.64$ then $r^2 = 0.4096$
5. r indicates the strength and the direction of the relationship.
Solution
1. Correct
2. Correct
3. Incorrect
A highly negative r is just as reliable as a highly positive one. It indicates that when one variable increases, the other one decreases.
4. Correct
$r = 0.64$, $r^2 = 0.64^2 = 0.4096$
Option (3)
Question 7
A production manager has compared the dexterity scores of five assembly line employees with their hourly productivity (units per hour). A least-squares regression equation is calculated as $\hat{Y} = 19.2 + 3.0X$. If a job applicant has a dexterity score of 15, his predicted productivity per hour will be
1. 19.2 units
2. 22.2 units
3. 64.2 units
4. 15.0 units
5. 45.0 units
Solution
The estimated equation is $\hat{Y} = 19.2 + 3.0X$
$\hat{Y} = 19.2 + 3.0 \times 15 = 64.2$
Option (3)
Question 8
The following data represent marks obtained in a mathematics test, X, and an economics test, Y:
X   22   14   17   7    10
Y   7    17   12   27   22
$\sum X = 70$, $\sum Y = 85$, $\sum XY = 1005$, $\sum X^2 = 1118$
Suppose that the value of $SS_{XY} = -185$; the slope is equal to
1. 1341
2. 35768
3. 134
4. 138
5. 134
Solution
   
 X Y
SS XY  XY 
n
 2
 X
SS X  X2 
n
702
 1118 
5
 1118  980
 138
SS XY 185
The slope equals to b1    13406
SS X 138
Option (5)
Question 9
Suppose that the least squares regression line $\hat{Y} = -34.68 - 4.20X$ was fitted to a random sample of 11 pairs of variables.
Which of the following statements is incorrect?
1. Y is called the dependent variable, X is called the independent variable.
2. The Y intercept for this regression line is equal to -34.68
3. The slope is equal to 4.20
4. Using the regression line, the estimated value of Y when $X = -20$ equals 49.32
5. For each increase of one unit of X, the Y-value is predicted to decrease by 4.20
Solution
1. Correct
2. Correct
3. Incorrect
The slope equals -4.20.
4. Correct
$\hat{Y} = -34.68 - 4.20 \times (-20) = 49.32$

5. Correct
Option (3)
Question 10
Consider the following data:
X:   81   75   71   61   96    56   85   18
Y:   80   82   83   57   100   30   68   56
Use the summary statistics below and calculate the coefficient of determination
$\sum X_i^2 = 40849$, $\sum X_i = 543$, $\sum Y_i^2 = 41922$
$\sum Y_i = 556$, $\sum X_i Y_i = 40068$, $SSR = 1359.063$
1. 0.4143
2. 0.6437
3. 0.5124
4. 0.1464
5. 0.4673
Solution
The coefficient of determination $r^2 = \dfrac{SSR}{SST}$
$SST = \sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}$
$= 41922 - \dfrac{556^2}{8}$
$= 41922 - 38642$
$= 3280$
$r^2 = \dfrac{SSR}{SST} = \dfrac{1359.063}{3280} = 0.4143$
Option (1)
Question 11
1. The coefficient of correlation r is a number that indicates the direction of the relationship between the dependent variable Y and the independent variable X
2. The coefficient of correlation r is a number that indicates the strength of the relationship between the dependent variable Y and the independent variable X
3. If the coefficient of correlation $r = 1$ then the best-fit linear equation will include all the data points.
4. If the coefficient of correlation $r = 0$ then there is a linear relationship between the dependent variable Y and the independent variable X
5. If the coefficient of determination $r^2 = 0.81$, the coefficient of correlation r can be 0.90 or -0.90
Solution
1. Correct
2. Correct
3. Correct
4. Incorrect
When $r = 0$, there is no linear relationship between X and Y
5. Correct
When $r^2 = 0.81$, then $r = \pm\sqrt{0.81}$, that is, $r = 0.9$ or $r = -0.9$.
Option (4)
Question 12
Consider the following sample data:
X   12   23   11   23   14
Y   28   43   21   40   33
Suppose that $b_0 = 9.6$, $b_1 = 1.41$, $\sum X = 83$, $\sum Y = 165$, $\sum XY = 2938$ and $SST = 318$; then the coefficient of determination is
1. 0.94
2. 0.89
3. 0.66
4. 0.78
5. 0.49
Solution
The coefficient of determination $r^2 = \dfrac{SSR}{SST}$
$SSR = b_0 \sum Y + b_1 \sum XY - \dfrac{\left(\sum Y\right)^2}{n}$
$= 9.6 \times 165 + 1.41 \times 2938 - \dfrac{165^2}{5}$
$= 1584 + 4142.58 - 5445$
$= 281.58$
$r^2 = \dfrac{SSR}{SST} = \dfrac{281.58}{318} = 0.8855$
Option (2)
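The two-step calculation of SSR and the coefficient of determination can be sketched as follows. It assumes the standard library only, takes the rounded estimates given in the question as inputs, and uses the hypothetical helper name `regression_sum_of_squares`.

```python
def regression_sum_of_squares(b0, b1, sum_y, sum_xy, n):
    """SSR = b0 * sum(Y) + b1 * sum(XY) - (sum(Y))^2 / n."""
    return b0 * sum_y + b1 * sum_xy - sum_y ** 2 / n

# Values given in Question 12.
ssr = regression_sum_of_squares(b0=9.6, b1=1.41, sum_y=165, sum_xy=2938, n=5)
sst = 318
r_squared = ssr / sst
print(round(ssr, 2), round(r_squared, 4))
```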
Question 13
The following are the numbers of minutes it took mechanics to assemble a piece of machinery in the morning, X, and in the afternoon, Y:
X   11.1   12.0   13.7   17.3   14.8   15.3
Y   14.2   21.5   21.1   19.3   19.0   17.4
The coefficient of determination is
1. 0.238
2. 0.1667
3. 0.057
4. 0.488
5. 0.78
Solution
From the scientific calculator we have obtained:
$\sum X = 84.2$, $\sum Y = 112.5$, $\sum X^2 = 1207.32$, $\sum Y^2 = 2145.35$, $\sum XY = 1586$
$n = 6$, $b_0 = 14.7932$, $b_1 = 0.2820$
$SSR = b_0 \sum Y + b_1 \sum XY - \dfrac{\left(\sum Y\right)^2}{n}$
$= 14.7932 \times 112.5 + 0.2820 \times 1586 - \dfrac{112.5^2}{6}$
$= 1664.235 + 447.252 - 2109.375$
$= 2.112$
$SST = \sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}$
$= 2145.35 - \dfrac{112.5^2}{6}$
$= 2145.35 - 2109.375$
$= 35.975$
$r^2 = \dfrac{SSR}{SST} = \dfrac{2.112}{35.975} = 0.0587$
Option (3)
Question 14
1. To draw a scatter diagram (plot), we need data for two variables.
2. If most of the points fall close to the line, we say that there is a linear relationship.
3. If one variable increases when the other does, we say that there is a negative linear relationship.
4. The objective addressed by the model is to analyze the relationship between two variables, X and Y.
5. If the coefficient of correlation $r = 0.9$, then the coefficient of determination $R^2 = 0.81$.
Solution
1. Correct
2. Correct
3. Incorrect
We say there is a positive relationship.
4. Correct
5. Correct
$r = 0.9$, $r^2 = 0.9^2 = 0.81$
Option (3)