STA1610/WB01/2018–2020
70690561
Florida
Contents
1.1 Variables
4 Introduction to Probability
iii STA1610/1
6 Continuous Probability Distribution
Preface
I have written this workbook to make Statistics accessible to everyone, including those with a limited
mathematics background. Statistics affects all aspects of our lives, and its applications are so numerous that,
in a sense, we are limited only by our own imagination in discovering new uses for it. This workbook
emphasises some of the important concepts of Statistics.
The applied nature of the discipline is reinforced by teaching students how to choose the correct statistical
procedure or formula with a clear understanding of the particular concept involved. To meet this objective,
the workbook has a focused scope of statistical concepts and a large number of activities.
Calculators are very good at providing the numerical results of statistical processes. They are used in this
workbook so that students come to understand the techniques and concepts by doing the calculations
by hand. Students should be aware that all assignments and examinations in this module are done by
hand, with the support of formulae and statistical tables. The approach adopted in the workbook is to divide
the solution of statistical problems into two stages and to include them in every appropriate activity:
It is important that you do the exercises on your own before looking at my solutions. Even if you cannot
do an exercise, you should at least read through it and try to do it. Each exercise is designed to test your
understanding of the work immediately preceding it.
The scope helps students determine whether or not a statistical method or a particular formula is appropriate.
To enrich students’ learning experience, each topic was chosen for its relatively straightforward presentation
and useful applications. Other topics that expand students’ knowledge and give a quick overview
include stem-and-leaf plots, statistical tables and graphs.
I hope that you enjoy studying this module as much as I have enjoyed compiling the workbook. In closing
I invite you to help me improve my presentation of this module. You can do this by bringing any errors,
obscurities, comments, suggestions or misprints to the attention of your lecturer.
Your Lecturer
Mr B J Kanyama
Chapter 1
1.1 Variables
SUMMARY
Qualitative variables or categorical variables are used to measure qualitative, descriptive or categorical
characteristics of subjects. The value of a qualitative variable cannot be described meaningfully using
numbers.
Quantitative variables are used for quantitative characteristics of subjects. A quantitative variable can be
expressed numerically.
Qualitative - nominal variable: Data values represent descriptions or classifications. The values have no
logical order.
Qualitative - ordinal variable: Data values are used to show order. The numbers used as labels show the
order only and cannot be interpreted as quantities.
Quantitative - discrete variable: Data occur as integers or whole numbers.
Quantitative - continuous variable: Continuous data occur as continuous numbers with any level of
accuracy. The difference between measured values makes sense but the data have no natural zero.
Quantitative - ratio scale data: The data are ordered, have a continuous scale and have a natural or
true zero.
Activities
Question 1
1. discrete variable
2. nominal variable
3. ordinal variable
4. continuous variable
Solution
The variable "customer satisfaction" (1 ’not satisfied’, 2 ’slightly satisfied’, 3 ’satisfied’ and 4 ’very
satisfied’) is a qualitative - ordinal variable: the values 1, 2, 3 and 4 show the order but cannot be
interpreted as quantities. We can even remove them and still read the data values as an ordered list.
Option (3)
Question 2
A. Gender
B. Marital status
D. Annual income
E. Rate the lecturer (very effective, effective, not too effective, not at all effective).
Determine the type of variable for each of the items given above. Which of them represents a discrete variable?
1. C and D
2. A, B and E
3. C and E
4. Only D
5. Only C
Solution
B. Marital status (single, married, divorced, widow): represents a qualitative - nominal variable.
D. Annual income (can be R200 000.00 or R600 000.00, and so on): represents a quantitative - continuous
variable.
E. Rate the lecturer (very effective, effective, not too effective, not at all effective): represents a qualitative
- ordinal variable.
Option (5)
Question 3
1. nominal variable
2. ordinal variable
3. discrete variable
4. continuous variable
Solution
The variable country (1 ’South Africa’, 2 ’USA’, 3 ’UK’, 4 ’Zimbabwe’) represents a qualitative -
nominal variable.
Option (1)
Question 4
1. The average mark for STA1610, whose values could be 75%, 74.8% and 74.89%, is a continuous variable.
Solution
1. Correct
The average marks for STA1610 are 0.75; 0.748; and 0.7489.
2. Correct
The number of children in a family can be, for example, 1, 10, 15 or 25, and so on: these are integers.
3. Incorrect
The median is the middle value, while an outlier is an extreme value in the dataset.
4. Correct
The mean is the sum of the values in the dataset divided by the number of values.
5. Correct
When there is one mode the distribution is unimodal, when there are two modes it is bimodal, and so on.
Option (3)
Question 5
5. The amount of money a person withdraws from an ATM today is a discrete variable.
Solution
1. Correct
2. Incorrect
3. Correct
4. Correct
5. Correct
The withdrawal can only be an integer, representing the number of bank notes withdrawn at a time.
Option (2)
Question 6
Which of the following represent quantitative variables?
1. Only B
2. B and D
3. B , C and D
4. C and D
5. A, B and E
Solution
A. A Panasonic television set (own or not own): represents a qualitative - nominal variable.
B. The status of the students (full time or part time): represents a qualitative - nominal variable.
C. The number of pupils who attended your primary school: represents a quantitative - discrete variable
E. Variable condition (poor, fair, good and excellent): represents a qualitative - ordinal variable.
Option (4)
Question 7
3. The number of times a mouse makes a wrong turn in a laboratory represents a continuous variable.
Solution
1. Correct
2. Correct
3. Incorrect
The number of times a mouse makes a wrong turn represents a discrete variable.
4. Correct
The position one finishes in a race can be 1st, 2nd, 3rd, ...; this represents a qualitative - ordinal
variable.
5. Correct
Option (3)
Question 8
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Option (5)
Question 9
E. Rate the availability of parking space as excellent, good, fair, poor or very poor.
1. A and C
2. C and D
3. Only D
4. A , C and D
5. B and E
Solution
E. The variable "rate the availability of parking" (excellent, good, fair, poor or very poor): represents a
qualitative - ordinal variable.
Option (4)
Question 10
The question “what is your marital status?” had the following responses
A. Married
B. Widowed
C. Divorced
D. Separated
E. Never married
1. Bar chart
2. Pie chart
3. Histogram
4. Stem–and–leaf
Solution
The variable "response to the question about marital status" (Married, widowed, divorced, separated, never
married): represents a categorical variable.
There are two graphical techniques for summarising data from a qualitative variable: the bar chart and the
pie chart. The better of the two is the bar chart.
Option (1)
Question 11
1. A and B
2. B , C and D
3. C and E
4. A , C , D and E
5. A , B , C , D and E
Solution
Option (4)
Question 12
3. The number of oil cans sold at a given petrol station is a continuous variable.
Solution
1. Correct
2. Correct
3. Incorrect
The number of cans sold is a countable characteristic of the data, therefore we can only obtain
integers. This is a discrete variable.
4. Correct
5. Correct
Option (3)
Question 13
Before leaving a particular restaurant, customers are asked to respond to the questions listed below.
D. Which of the following attributes of the restaurant do you find most attractive: service, prices, quality
of food, or varied menu?
1. Only B
2. B and C
3. B and D
4. A and C
5. B C and D
Solution
B. Qualitative - nominal variable; the variable is ’have you eaten previously’ (the response is yes or no).
D. Qualitative - nominal variable; the variable is ’which attribute of the restaurant do you find most attractive’
(service, prices, quality of food, or varied menu).
Option (3)
Question 14
The following information is collected from an application form for a car-loan to a certain bank
A. Marital status
1. A and B
2. B and C
3. C and D
4. A, B and E
5. B and E
Solution
A. Qualitative - nominal
B. Quantitative - continuous
C. Quantitative - discrete
D. Qualitative - nominal
E. Qualitative - nominal
Option (2)
Question 15
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Option (5)
Question 16
E. How would you rate the quality of instruction? (excellent, very good, good, fair, poor).
Identify the type of variable for each question. Which of them represent quantitative variables?
1. Only B
2. Only D
3. A, C and E
4. B and D
5. B, C, and D
Option (3)
1.2 Population and Sample
SUMMARY
Activities
Question 1
1. parameter
2. statistic
3. mean
4. mean X
5. size N
Solution
Option (1)
Question 2
A. The mean
B. The median
C. The mode
D. The range
1. Only A
2. C and D
3. A, B and D
4. C, D and E
5. B and C
Solution
Option (5)
Question 3
4. Statistical inference is used to draw conclusions or inferences about characteristic of populations based
on sample data.
5. A summary measure that is computed from a sample to describe a characteristic of the population is
called a statistic.
Solution
1. Correct
2. Correct
3. Incorrect
4. Correct
5. Correct
Option (3)
Question 4
3. In a pie chart, the size of segments varies according to the percentage in each category.
4. The difference between the histogram and the bar chart is that the bars of the histogram are joined together
whereas those of a bar chart are not.
Solution
1. Correct
2. Incorrect
Population is the entire or complete set of objects.
3. Correct
4. Correct
5. Correct
Option (2)
Chapter 2
SUMMARY
Visualising data involves using various tables and charts to help draw conclusions about data. The
tables and charts used depend on the type of data we have.
A stem-and-leaf display allows us to see how a data set is distributed and where the data are concentrated.
A histogram is a way of summarising data that are measured on a continuous or discrete scale.
A frequency polygon is a graph made by joining the centres of the tops of the columns of a frequency histogram.
Bar and pie charts are used for summarising data that are measured on a categorical scale.
Activities
Question 1
Given the following stem–and–leaf display for midterm exam score on information systems:
5 | 0
7 | 4 4 6
8 | 1 9
9 | 2
Solution
5 | 0
6 |
7 | 4 4 6
8 | 1 9
9 | 2
1. Correct
2. Correct
3. Correct
4. Incorrect
The median is 76
5. Correct
Option (4)
Question 2
Consider the nine e-mail receipts data of two-digit integers given below
11 33 28 32 13 24 28 22 17
1. 1 | 1 3 7
   2 | 8 4 8 2
   3 | 3 2
2. 1 | 1 3 7
   3 | 3 2
   2 | 8 4 8 2
3. 1 | 1 3 7
   2 | 2 4 8 8
   3 | 2 3
4. 1 | 1 3 7
   3 | 2 3
   2 | 2 4 8 8
5. 11 13 17
   22 24 28 28
   32 33
Solution
Option (3)
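The construction of the display in Question 2 can be checked with a short Python sketch. The function name `stem_and_leaf` is illustrative and not part of the workbook; the sketch assumes two-digit integers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit integers by stem (tens digit) with sorted leaves (units digit)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 28 -> stem 2, leaf 8
        rows[stem].append(leaf)
    return dict(rows)


emails = [11, 33, 28, 32, 13, 24, 28, 22, 17]
display = stem_and_leaf(emails)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

The printed rows list the stems in increasing order with the leaves sorted within each stem, which is the layout of option 3.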
Question 3
2 | 0 2 2 7
3 | 1 1 3 5 9
1. 0 2 2 7 1 1 3 5 9
3. 20 22 22 27 31 31 33 35 39
4. 20 22 27 31 33 35 39
5. 2 22 22 72 13 13 33 53 59
Solution
Given a stem-and-leaf
2 | 0 2 2 7
3 | 1 1 3 5 9
Option (3)
Question 4
The ages of a sample of 40 workers are shown below using a stem–and–leaf diagram
2 | 5 6 6 8 8 9
3 | 0 1 2 3 3 3 4 5 5 6 6 7 7 8 8 9 9
4 | 0 0 1 1 1 1 6 6 6 8 9
5 | 0 0 1 2 3
6 | 1
5. The median is 38
Solution
2 | 5 6 6 8 8 9
3 | 0 1 2 3 3 3 4 5 5 6 6 7 7 8 8 9 9
4 | 0 0 1 1 1 1 6 6 6 8 9
5 | 0 0 1 2 3
6 | 1
1. Correct
2. Incorrect
Because there are 15 values that are more than 40 and 25 values that are less than or equal to 40.
3. Correct
4. Correct
5. Correct
The median position: $\frac{n+1}{2} = \frac{40+1}{2} = 20.5$
Because the position 20.5 falls between the 20th and 21st values, the median is the average of
the 20th value and the 21st value in increasing order: $\frac{38 + 38}{2} = 38$
Option (2)
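As a check on the counts used in this solution, the following Python sketch rebuilds the 40 ages from the stem-and-leaf diagram. The dictionary `leaves` is just a transcription of the diagram above; the variable names are illustrative.

```python
import statistics

# Stem -> string of leaves, transcribed from the diagram in Question 4.
leaves = {
    2: "566889",
    3: "01233345566778899",
    4: "00111166689",
    5: "00123",
    6: "1",
}
ages = [10 * stem + int(leaf) for stem, digits in leaves.items() for leaf in digits]

n = len(ages)                                   # sample size
median_position = (n + 1) / 2                   # position rule used in the workbook
median = statistics.median(ages)                # average of the 20th and 21st values
more_than_40 = sum(1 for age in ages if age > 40)

print(n, median_position, median, more_than_40)
```

`statistics.median` averages the two middle values when the sample size is even, which matches the rule applied by hand above.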
Question 5
The following data gives the marks obtained in a statistics exam as a percentage:
3 | 5
4 | 3 8 8 9 9
5 | 4 5 5 9
6 | 1 3 6
7 | 3
9 | 5
3. About 33% of the marks lie between the marks 40 and 50
4. The median is 5.
Solution
3 | 5
4 | 3 8 8 9 9
5 | 4 5 5 9
6 | 1 3 6
7 | 3
9 | 5
1. Incorrect
2. Incorrect
3. Correct
Between 40 and 50, we have five values among the 15 values: 43, 48, 48, 49 and 49.
These values correspond to the proportion $\frac{5}{15} = 0.3333$ or 33.33%
4. Incorrect
35 43 48 48 49 49 54 55 55 59 61 63 66 73 95
5. Incorrect
The majority of the students have passed the statistics exam, as nine of the 15 students passed.
Option (3)
Question 6
Which one of the following statements is incorrect?
Solution
1. Incorrect
30 42 46 55 56 57 60 62 64 66 72 77 77 83
The median is the average of 60 and 62: $\frac{60 + 62}{2} = 61$
2. Correct
Between 50 and 70 we have 7 values, which correspond to the proportion $\frac{7}{14} = 0.5$ or 50%
3. Correct
4. Correct
5. Correct
The range equals the largest value minus the smallest value in increasing order: $83 - 30 = 53$
Option (1)
Chapter 3
SUMMARY
The central tendency or location defines the location of the middle or the centre of a distribution.
The central tendency allows us to assign a value to what is the most representative value of the group.
The measures of the central tendency are the mean, median and the mode.
The mean is a measure of the average of the data values for a given dataset: the sample mean
$\bar{X} = \frac{\sum x}{n}$ and the population mean $\mu = \frac{\sum x}{N}$
The mode is the most frequently occurring value in a set of discrete data.
A distribution is symmetrical when the values of the mean, median and mode are all equal. The
distribution is called normal (bell-shaped).
A distribution is positively skewed when the value of the mean is greater than the values of the median
and the mode. The tail is on the right-hand side.
A distribution is negatively skewed when the value of the mean is smaller than the values of the median
and the mode. The tail is on the left-hand side.
Dispersion is a degree to which values are spread out on the number line.
The range is the difference between the largest and the smallest data values. It shows how widely spread
the data are by considering the distance between the largest and smallest values.
The variance: Related to the sum of the squared distances of observations from their sample mean.
This sum of squared values measures the dispersion of the observations.
The sample variance $S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1}$. The population variance $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
The standard deviation is the square root of the variance: $\sqrt{\text{Variance}}$
Activities
Question 1
1. the value of the mean, median and mode are all equal.
4. the value of the mean, median and mode are not equal.
5. we are able to calculate the mean and the standard deviation through the calculator.
Solution
Option (1)
Question 2
31 27 26 30 28 31
1. The range is 5
Solution
1. Correct
2. Incorrect
The median is the average of 28 and 30, which equals $\frac{28 + 30}{2} = 29$
3. Incorrect
The mode is 31
4. Incorrect
The sample mean equals $\frac{31 + 27 + 26 + 30 + 28 + 31}{6} = \frac{173}{6} = 28.8333$
5. Incorrect
Because the mean 28.8333, the median 29 and the mode 31 are not all equal, the distribution is not
symmetrical.
Option (1)
Question 3
23 11 15 26 12
1. 17.4
2. 6.37
3. 6.73
4. 0.51
5. 45.3
Solution
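The sample variance $S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1}$
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{23 + 11 + 15 + 26 + 12}{5} = \frac{87}{5} = 17.4$
The sample variance $S^2 = \frac{181.2}{5 - 1} = 45.3$
The standard deviation $S = \sqrt{45.3} = 6.7305$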
Option (3)
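The three steps of this solution (mean, sample variance, standard deviation) can be reproduced with a minimal Python sketch. The helper name `sample_variance` is an illustrative choice; it simply applies the $n - 1$ formula from the chapter summary.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [23, 11, 15, 26, 12]
mean = sum(data) / len(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

The standard library function `statistics.stdev` uses the same $n - 1$ divisor, so it can serve as an independent check.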
Question 4
1. the value of the mean, median and mode, when they have all the same value.
Solution
The measures of central tendency (or location) are the mean, median and mode.
Option (5)
Question 5
Consider the data collected on the mass of six laboratory rats as follows:
27 31 28 30 26 27
Solution
1. The range $= 31 - 26 = 5$
2. The median is the average of 27 and 28, which is $\frac{27 + 28}{2} = 27.5$
3. The mean is $\bar{X} = \frac{27 + 31 + 28 + 30 + 26 + 27}{6} = \frac{169}{6} = 28.17$
4. The mode is 27
Option (4)
Question 6
24 27 36 48 52 52 53 59
Given that $\sum_{i=1}^{8} (x_i - \bar{x})^2 = 1202.875$, the sample variance is equal to:
1. 150.36
2. 13.11
3. 171.84
4. 43.875
5. 1784
Solution
24 27 36 48 52 52 53 59
If $\sum_{i=1}^{8} (x_i - \bar{X})^2 = 1202.875$
The variance is $S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} = \frac{1202.875}{8 - 1} = \frac{1202.875}{7} = 171.839$
Option (3)
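The given sum of squared deviations can be verified directly from the eight data values. The following Python sketch is only a check on the arithmetic; the variable names are illustrative.

```python
data = [24, 27, 36, 48, 52, 52, 53, 59]

n = len(data)
mean = sum(data) / n                                   # sample mean
sum_sq_dev = sum((x - mean) ** 2 for x in data)        # sum of squared deviations
sample_var = sum_sq_dev / (n - 1)                      # divide by n - 1 for a sample

print(mean, sum_sq_dev, round(sample_var, 3))
```

Dividing the same sum by $n = 8$ instead of $n - 1 = 7$ gives about 150.36, which is the population-style value offered as a distractor in option 1.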
Question 7
A. The mean
B. The median
C. The mode
D. The range
1. Only A
2. C and D
3. A , B and D
4. C, D and E
5. B and C
Solution
Option (1)
Question 8
For a sample of eight employees, the most recent hourly wage increases were 18 5 7 2 10 6 12 15 cents
per hour. The sample variance is
1. 29.13
2. 203.88
3. 5.40
4. 9213
5. 2391
Solution
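The sample variance $S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1}$
The sample mean $\bar{X} = \frac{18 + 5 + 7 + 2 + 10 + 6 + 12 + 15}{8} = \frac{75}{8} = 9.375$
$S^2 = \frac{(18 - 9.375)^2 + (5 - 9.375)^2 + (7 - 9.375)^2 + (2 - 9.375)^2 + (10 - 9.375)^2 + (6 - 9.375)^2 + (12 - 9.375)^2 + (15 - 9.375)^2}{8 - 1}$
$= \frac{203.875}{7}$
$= 29.125$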
Option (1)
Question 9
1. 5
2. 2
3. 2 and 5
4. 2 and 13
5. 13
Solution
Option (5)
Question 10
1. The middle score if you line the score up from the lowest to the highest.
Solution
Option (3)
Question 11
1. The mode
2. The mean
3. The median
5. The range
Solution
Option (4)
Question 12
A social psychologist asked 15 College students how many times they fell in love before they were eleven
years old. The number of times were as follows:
2 0 6 0 3 1 0 4 9 0 5 6 1 0 2
1. The range is 9
2. The median is 2
5. The value of the mode is smaller than the value of the median?
Solution
1. Correct
Range: $9 - 0 = 9$
2. Correct
The middle value is at position $\frac{n+1}{2} = \frac{15+1}{2} = 8$; the 8th value in the ordered array is 2.
3. Correct
The mean $\bar{X} = \frac{\sum x_i}{n} = \frac{0+0+0+0+0+1+1+2+2+3+4+5+6+6+9}{15} = \frac{39}{15} = 2.6$
Because the mean 2.6 is greater than the median 2, the distribution is positively skewed, with the tail
on the right-hand side.
4. Incorrect
Zero is also a value counted amongst the values that were given.
5. Correct
The mode is 0
Option (4)
Question 13
Solution
Option (1)
Question 14
The following performance scores have been recorded for 10 job applicants who have taken a pre-employment
aptitude test at UNISA:
22 19 22 24 21 22 17 20 17 21
3. The mode is 22
4. The range is 7
Solution
Given data 22 19 22 24 21 22 17 20 17 21
1. Correct
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{22 + 19 + 22 + 24 + 21 + 22 + 17 + 20 + 17 + 21}{10} = \frac{205}{10} = 20.5$
2. Incorrect
3. Correct
4. Correct
The range $= 24 - 17 = 7$
5. Correct
Because the mean, median and mode values are completely different.
Option (2)
Question 15
The following data gives the ages in years of a sample of 8 employees from a government department:
31 43 56 23 49 42 33 61
1. $\sum x_i = 410$
2. $\bar{X} = 41$
3. $\sum x_i^2 = 15540$
Solution
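1. $\sum x_i = 31 + 43 + 56 + 23 + 49 + 42 + 33 + 61 = 338$
2. $\bar{X} = \frac{\sum x_i}{n} = \frac{31 + 43 + 56 + 23 + 49 + 42 + 33 + 61}{8} = \frac{338}{8} = 42.25$
3. $\sum x_i^2 = 31^2 + 43^2 + 56^2 + 23^2 + 49^2 + 42^2 + 33^2 + 61^2 = 15450$
The standard deviation $S = \sqrt{167.0714} = 12.9256$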
Option (4)
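The solution states the standard deviation without showing the variance step. One way to reach it from the totals already computed is the shortcut form $S^2 = \frac{\sum x_i^2 - n\bar{X}^2}{n - 1}$, which is algebraically equivalent to the definition in the chapter summary. The shortcut is not written out in the workbook, so treat the Python sketch below as an assumption about the intended route.

```python
import math

ages = [31, 43, 56, 23, 49, 42, 33, 61]

n = len(ages)
sum_x = sum(ages)                       # sum of the values
sum_x2 = sum(x * x for x in ages)       # sum of the squared values
mean = sum_x / n

# Shortcut formula (assumed route): (sum of squares - n * mean^2) / (n - 1)
variance = (sum_x2 - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

print(sum_x, sum_x2, mean, round(variance, 4), round(std_dev, 4))
```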
Question 16
The following data gives the typing speeds (in words per minute) for several stenographers.
125 140 170 155 132 175 225 210 125 310
1. $\sum_{i=1}^{10} x_i = 1767$
2. $\sum_{i=1}^{10} x_i^2 = (1\,767)^2$
Solution
1. Correct
$\sum x_i = 125 + 140 + 170 + 155 + 132 + 175 + 225 + 210 + 125 + 310 = 1767$
2. Incorrect
$\sum x_i^2 = 125^2 + 140^2 + 170^2 + 155^2 + 132^2 + 175^2 + 225^2 + 210^2 + 125^2 + 310^2 = 342649$
3. Correct
4. Correct
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{125 + 140 + 170 + 155 + 132 + 175 + 225 + 210 + 125 + 310}{10} = \frac{1767}{10} = 176.7$
5. Correct
The range = the largest value - the smallest value $= 310 - 125 = 185$
6. Correct
The mode = 125
Option (2)
Question 17
Which one of the following statements is not true about the mean?
2. In a symmetric distribution, the mean, the median and the mode are all equal.
5. To calculate the mean, sum all values and divide by the count.
Solution
Option (3)
Question 18
A summary measure that is computed from a sample to describe a characteristic of the population is called
1. a parameter
2. a population
3. a statistic
4. inferential statistics
5. box–plot
Solution
Option (3)
Question 19
The statistics Department randomly selected 15 students and recorded their marks (in percentage) during the
last examination. The data of the 15 students are as follows:
50 72 63 60 80 79 41 48 23 72 96 34 43 55 51
5. The mode is 2
Solution
1. Correct
The mean $\bar{X} = \frac{\sum x_i}{n} = \frac{50 + 72 + 63 + 60 + 80 + 79 + 41 + 48 + 23 + 72 + 96 + 34 + 43 + 55 + 51}{15} = \frac{867}{15} = 57.8$
2. Correct
The median: The middle value when the data are arranged in order. The position of the median is
$\frac{n+1}{2} = \frac{15+1}{2} = 8$; this means that the median is the 8th value in increasing order.
23 34 41 43 48 50 51 55 60 63 72 72 79 80 96
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th
3. Correct
The range = the largest value - the smallest value in increasing order
$= 96 - 23 = 73$
4. Correct
66% of 15 $= 0.66 \times 15 = 9.9 \approx 10$ students pass the exam.
Alternatively, we count the number of students who pass the exam and divide by 15. This gives
$\frac{10}{15} = 0.6667$, which can be expressed as about 66%.
5. Incorrect
The mode is 72
Option (5)
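The summary measures in this solution can be checked with Python's standard `statistics` module. The pass mark of 50 used below is an assumption; it is not stated in the question, but it reproduces the count of 10 passing students used in the solution.

```python
import statistics

marks = [50, 72, 63, 60, 80, 79, 41, 48, 23, 72, 96, 34, 43, 55, 51]

mean = statistics.mean(marks)
median = statistics.median(marks)          # 8th value of the 15 ordered marks
mode = statistics.mode(marks)              # most frequently occurring mark
value_range = max(marks) - min(marks)

PASS_MARK = 50  # assumed pass mark, not given in the question
passed = sum(1 for mark in marks if mark >= PASS_MARK)

print(mean, median, mode, value_range, passed, round(passed / len(marks), 4))
```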
Question 20
In a survey of office workers in a large city, each worker in a random sample of 10 workers was asked to
report the number of times during the previous month he or she has eaten an evening meal at a restaurant.
1. 5
2. 10.22
3. 3.20
4. 283
5. 803
Solution
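The mean $\bar{X} = \frac{\sum x_i}{n} = \frac{5 + 0 + 9 + 1 + 9 + 8 + 5 + 5 + 2 + 6}{10} = \frac{50}{10} = 5$
The standard deviation $S = \sqrt{\text{Variance}} = \sqrt{10.2222} = 3.1972$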
Option (3)
Question 21
1. Sample mean
2. Sample mode
3. Sample median
4. Sample proportion
Solution
Option (5)
Question 22
2. When a distribution has more values to the left and tails to the right, we say it is negatively skewed.
3. When there is no difference in the values of the mean, median and mode, we say it is a normal
distribution.
4. When a distribution is bell–shaped with the left half identical to the right half, it is symmetrical.
5. For the following data values: 9 7 8 6 9 10 14 the mean, median and mode are all equal.
Solution
1. Correct
2. Incorrect
The distribution is skewed positively.
3. Correct
4. Correct
5. Correct
The sample mean $\bar{X} = \frac{\sum x_i}{n} = \frac{9 + 7 + 8 + 6 + 9 + 10 + 14}{7} = \frac{63}{7} = 9$
The median is the middle value. The ranked data are: 6 7 8 9 9 10 14
Therefore the median is 9
The mode is the most repeated value which is 9
Option (2)
3.3 Quartile and Coefficient of Variation
The first quartile $Q_1$ separates the smallest 25% of the values from the rest. The position of the first
quartile: $Q_1 = \frac{n+1}{4}$
The third quartile $Q_3$ separates the smallest 75% of the values from the rest. The position of the third
quartile: $Q_3 = 3 \times \frac{n+1}{4}$
The boxplot provides a graphical representation of the data based on the five-number summary: the
smallest value, the value of $Q_1$, the value of $Q_2$ = median, the value of $Q_3$ and the largest value.
Activities
Question 1
The bounced check fees (in Rand) for a sample of 10 banks are:
26 28 20 21 22 25 18 23 15 30
4. To calculate the value of $Q_3$, first rank the values from the smallest to the largest.
Solution
26 28 20 21 22 25 18 23 15 30
1. The position of $Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = \frac{11}{4} = 2.75$
The position of $Q_2 = 2 \times \frac{10+1}{4} = 2 \times 2.75 = 5.5$
To get the value of $Q_2$: because 5.5 falls between the 5th and 6th values, the value of $Q_2$ is the
average of the 5th value and the 6th value: $\frac{22 + 23}{2} = \frac{45}{2} = 22.5$
3. The position of $Q_3 = 3 \times \frac{n+1}{4} = 3 \times \frac{10+1}{4} = 3 \times 2.75 = 8.25$
4. For the value of $Q_3$, we round the position of $Q_3 = 8.25$ to 8 and take the eighth value in the
ordered array, which equals 26
For the value of $Q_1$, we round the position of $Q_1 = 2.75$ to 3 and take the third value in the
ordered array, which equals 20
Option (2)
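The rule applied in these solutions can be sketched in Python. The rounding convention below is inferred from the worked solutions (round the position to the nearest whole number, and average the two neighbouring values when the position ends in .5); it is an assumption about the workbook's method, not a general definition of quartiles, and the function name `quartile` is illustrative.

```python
import math


def quartile(values, k):
    """Value of the k-th quartile (k = 1, 2 or 3) using the position k * (n + 1) / 4."""
    data = sorted(values)
    position = k * (len(data) + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Position falls halfway between two values: average them.
        return (data[lower - 1] + data[lower]) / 2
    # Otherwise round the position to the nearest whole number (1-based).
    nearest = lower + 1 if fraction > 0.5 else lower
    return data[nearest - 1]


fees = [26, 28, 20, 21, 22, 25, 18, 23, 15, 30]
print(quartile(fees, 1), quartile(fees, 2), quartile(fees, 3))
```

Other textbooks and software interpolate between neighbouring values instead of rounding, so library functions may return slightly different quartiles for the same data.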
Question 2
51 50 47 33 37 43 61 55 44 41
4. The value of $Q_3 = 51$
Solution
1. The position of $Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = \frac{11}{4} = 2.75$
To calculate the value of $Q_1$, we round the position of $Q_1 = 2.75$ to 3 and take the third
value, which is 41
2. The position of $Q_2 = 2 \times \frac{n+1}{4} = 2 \times \frac{10+1}{4} = 2 \times 2.75 = 5.5$
The value of $Q_2$ is the average of the 5th and 6th values, which gives $\frac{44 + 47}{2} = \frac{91}{2} = 45.5$
3. The position of $Q_3 = 3 \times \frac{n+1}{4} = 3 \times \frac{10+1}{4} = 3 \times 2.75 = 8.25$
4. For the value of $Q_3$, we round the position of $Q_3$ and take the 8th value in the ranked data,
which is equal to 51
Option (1)
Question 3
Solution
1. The position of $Q_1 = \frac{n+1}{4} = \frac{13+1}{4} = \frac{14}{4} = 3.5$
For the value of $Q_1$, we calculate the average of the third and the fourth values in the ordered array,
which is $\frac{6 + 6}{2} = \frac{12}{2} = 6$
2. The position of $Q_2 = 2 \times \frac{n+1}{4} = 2 \times \frac{13+1}{4} = 2 \times 3.5 = 7$
3. The median is always equal to the value of $Q_2$, as both represent 50% of the values; alternatively,
you can show this with the calculation.
4. The position of $Q_3 = 3 \times \frac{n+1}{4} = 3 \times \frac{13+1}{4} = 3 \times 3.5 = 10.5$
For the value of $Q_3$, we calculate the average of the 10th and the eleventh values in the ordered array,
which is $\frac{9 + 10}{2} = \frac{19}{2} = 9.5$
Option (5)
Question 4
The following sample shows the starting salary for new university graduates (in thousands of rand):
Solution
307 288 291 311 301 297 307 300 306 305
The ranked data 288 291 297 300 301 305 306 307 307 311
1. The position of $Q_1 = \frac{n+1}{4} = \frac{10+1}{4} = \frac{11}{4} = 2.75$
2. The position of $Q_3 = 3 \times \frac{n+1}{4} = 3 \times \frac{10+1}{4} = 3 \times 2.75 = 8.25$
To calculate the value of $Q_3$ we round the position of $Q_3$ and take the 8th value in the
ranked data, which is equal to 307.
3. Correct
4. The position of $Q_3 = 3 \times \frac{n+1}{4} = 3 \times \frac{10+1}{4} = 3 \times 2.75 = 8.25$
To calculate the value of $Q_1$ we round the position of $Q_1 = 2.75$ to 3 and take the 3rd
value in increasing order, which is 297.
Option (3)
Question 5
The following data give the amount paid in rentals (in hundreds) for a random sample of 14 one–bedroomed
apartments in Arcadia Pretoria.
14 20 28 16 18 17 21 20 17 18 25 15 20 30
1. The location of the first quartile is $Q_1 = 3.75$
2. The location of the second quartile is $Q_2 = 7.5$ and the value of $Q_2 = 19.0$
4. The value of the second quartile represents the value of the median.
Solution
14 20 28 16 18 17 21 20 17 18 25 15 20 30
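1. The position of $Q_1 = \frac{n+1}{4} = \frac{14+1}{4} = \frac{15}{4} = 3.75$
2. The position of $Q_2 = 2 \times \frac{n+1}{4} = 2 \times \frac{14+1}{4} = 2 \times 3.75 = 7.5$
For the value of $Q_2$ we calculate the average of the 7th and the eighth values in the ordered array, which
is $\frac{18 + 20}{2} = \frac{38}{2} = 19$
3. The position of $Q_3 = 3 \times \frac{n+1}{4} = 3 \times \frac{14+1}{4} = 3 \times 3.75 = 11.25$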
4. Correct
Option (5)
Question 6
The sample size $n = 105$, the value of $Q_1 = 2100$, the value of $Q_3 = 2400$, the median = 22000 and the
sample mean $\bar{X} = 22238$.
1. symmetric
2. positively skewed
3. asymptotic
4. negatively skewed
5. normal
Solution
Because the value of the sample mean $\bar{X} = 22238$ is greater than the value of the median = 22000, the
distribution is positively skewed.
Option (2)
Chapter 4
Introduction to Probability
SUMMARY
Probability is a way of expressing the likely occurrence of a particular event as a number between 0
and 1.
Probability calculations allow us to quantify uncertainty, that is, to express our uncertainty
numerically.
In order to determine any probability, you first need to obtain data. We can obtain data through
experimentation, observation or experience.
Outcome: The potential result of a random experiment, where the exact value is unknown before the
experiment, but known after the experiment has been concluded.
An event is any subset of the collection of outcomes of an experiment. That means an event is a
subset of the sample space. In this module we are interested in determining the probability of an event
occurring.
The probability of each individual outcome lies between 0 and 1.
The sum of the probabilities of the outcomes in the sample space equals 1.
The relative frequency is a probability. This is because it is the ratio of the frequency of occurrence of
each outcome or event, X, to the total number of times the experiment was repeated, n.
Relative frequency of an event A $= \frac{X}{n}$
The complement rule: $P(A \text{ or } A^C) = 1$, which means $P(A) + P(A^C) = 1$, therefore $P(A^C) = 1 - P(A)$
Two events A and B are said to be mutually exclusive if they do not intersect in any way, which means
$P(A \text{ and } B) = 0$
The general additive rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Independent events: Two events are independent if the occurrence of one of the events has no influence
on the occurrence of the other event, which means $P(A \text{ and } B) = P(A) \times P(B)$
Conditional probabilities are the probabilities of events that are conditional on the outcomes of other
experiments.
$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)}$, where $P(A \text{ and } B) = P(B \text{ and } A)$
Activities
Question 1
$S = \{1, 2, 3, 4, 5, 6\}$
If the die is balanced, each simple event has the same probability. Let event A represent an even number and
let event B represent a number less than or equal to 4.
1. Event $A = \{2, 4, 6\}$
2. Event $B = \{1, 2, 3, 4\}$
4. $P(A) = \frac{3}{6}$, $P(B) = \frac{4}{6}$
5. $P(A \text{ and } B) = \frac{2}{4}$
Solution
B = a number less than or equal to 4 from the sample space S, which means $B = \{1, 2, 3, 4\}$
1. Correct
2. Correct
3. Correct
4. Correct
$P(A) = \frac{\text{number of outcomes of event } A}{\text{total number in } S} = \frac{3}{6} = 0.5$
5. Incorrect
$P(A \text{ and } B) = \frac{\text{number of outcomes of event } A \text{ and } B}{\text{total number in } S} = \frac{2}{6}$
Option (5)
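For equally likely outcomes, the probabilities in this question can be obtained by counting outcomes in sets. The following Python sketch uses exact fractions; the helper name `prob` is illustrative, not part of the workbook.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space of a balanced die
A = {2, 4, 6}                   # even number
B = {1, 2, 3, 4}                # number less than or equal to 4


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))


p_a_and_b = prob(A & B)         # outcomes in both A and B: {2, 4}
p_a_or_b = prob(A | B)          # outcomes in A or B (or both)

print(prob(A), prob(B), p_a_and_b, p_a_or_b)
```

Set intersection `A & B` plays the role of "A and B", and set union `A | B` the role of "A or B", so the general additive rule and the complement rule can be confirmed on the same example.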
Question 2
The following table lists the joint probabilities of achieving grades of A and not achieving grades of A in two
MBA courses.
3. The events "does not achieve a grade of A in marketing" and "does not achieve a grade of A in statistics"
are mutually exclusive events.
4. The events "achieves a grade of A in marketing" and "achieves a grade of A in statistics" are independent events.
5. The probability that a student achieves a grade of A in marketing, given that he or she does not achieve
a grade of A in statistics, is 0.2901
Solution
1. Correct
2. Correct
3. Incorrect
Two events are mutually exclusive if the probability of both events occurring together equals zero, i.e.
$P(A \text{ and } B) = 0$
$P(\text{does not achieve a grade of A in statistics and does not achieve a grade of A in marketing}) = \frac{0.580}{1.0} = 0.580$
Since the resulting probability 0.580 is different from zero, the two events are not
mutually exclusive.
4. Correct
Two events are independent if $P(A \text{ and } B) = P(A) \times P(B)$, where A represents the first event
and B the second event.
$P(A \text{ and } B) = \frac{0.053}{1.0} = 0.053$, $P(A) = 0.29$, $P(B) = 0.183$
Because $P(A \text{ and } B) = 0.053$ is approximately equal to $P(A) \times P(B) = 0.05307$, the two
events are independent.
5. Correct
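Conditional probability rule: $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$, where A represents the first event and B the
second event.
Here B is achieving a grade of A in marketing and the conditioning event A is not achieving a grade of A in
statistics, so $P(A \text{ and } B) = \frac{0.237}{1.0} = 0.237$ and $P(A) = 1 - 0.183 = 0.817$
$P(B \mid A) = \frac{P(B \text{ and } A)}{P(A)} = \frac{0.237}{0.817} = 0.2901$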
Option (3)
Question 3
Consider a sample space from an experiment in which a die is rolled. The sample space is $S = \{1, 2, 3, 4, 5, 6\}$.
Let A represent the event of rolling an odd number, $A = \{1, 3, 5\}$; let B represent the event of rolling
a number less than or equal to 4, $B = \{1, 2, 3, 4\}$; and let C be the event that a 5 or 6 is rolled, $C = \{5, 6\}$.
The probability that A and B both occur when the die is rolled is:
1. $\frac{3}{6}$
2. $\frac{4}{6}$
3. $\frac{2}{6}$
4. $\frac{1}{6}$
5. $\frac{5}{6}$
Solution
Option (3)
Question 4
       A1    A2
B1     0.4   0.3
B2     0.2   0.1
1. $P(A_1) = 0.6$
2. $P(B_1) = 0.7$
4. $P(A_1 \text{ and } A_2) = 1$
5. $P(B_1 \text{ and } B_2) = 0$
       A1    A2    Total
B1     0.4   0.3   0.7
B2     0.2   0.1   0.3
Total  0.6   0.4   1.0
1. Correct
2. Correct
3. Correct
$P(A_1 \text{ and } B_1) = \frac{0.4}{1.0} = 0.4$
4. Incorrect
$P(A_1 \text{ and } A_2) = 0$
5. Correct
Option (4)
Question 5
The following table represents gas well completions during 1986 in North and South America.
Complete the numbers missing in the above table and calculate $P(A \text{ and } C)$:
1. 14
2. 0.27
3. 0.41
4. 0.64
5. 0.20
Solution
$P(A \text{ and } C) = \frac{\text{number of outcomes of event } A \text{ and } C}{\text{total number}} = \frac{14}{70} = 0.2$
Option (5)
Question 6
1. $P(B) = 0.5$
2. $P(A \mid B) = 0.2$
3. $P(A \text{ or } B) = 0.8$
Solution
1. Correct
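$P(B) = 1 - P(B^C) = 1 - 0.5 = 0.5$
2. Correct
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$, where A represents the first event and B the second event, the event on
whose outcome the probability of A is conditional.
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0.1}{0.5} = 0.2$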
3. Correct
4. Incorrect
Because $P(A \text{ and } B) = 0.1$, the two events A and B are not mutually exclusive. Two events
are mutually exclusive when the probability of both events occurring together equals zero, i.e. $P(A \text{ and } B) = 0$
5. Correct
Option (4)
Question 7
1. Probability refers to a number between 0 and 1 which expresses the chance that an event will occur.
3. If the event of interest is A, then the event that A will not occur is the complement of A.
5. The sex of the students (males and females) cannot be used as an example of mutually exclusive events.
70
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Two events are mutually exclusive when P(A and B) = 0. In this case the probability that a student
is both male and female equals zero, so the two events are mutually exclusive.
Option (5)
Question 8
Solution
1. Correct
If A and B are mutually exclusive, then $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) = 0.5 + 0.3 - 0 = 0.8$.
The two events are mutually exclusive when P(A and B) = 0.
2. Incorrect
3. Correct
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{0}{0.3} = 0$, because the two events are mutually exclusive, so P(A and B) = 0.
4. Correct
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(A)\,P(B)}{P(B)} = P(A) = 0.5$
5. Correct
Option (2)
Question 9
The following table shows a sample of voters cross-classified according to place of residence and their
preference for two candidates for parliament:
                        Place of residence
Preferred candidate     Urban     Suburban
A                       100       30
B                       60        40
What is the probability that a voter picked at random will be a suburban dweller and will prefer candidate B?
1. 0.1739
2. 0.3043
3. 0.4348
4. 40
5. 0.7139
Solution
$P(\text{suburban and } B) = \dfrac{40}{230} = 0.1739$
Option (1)
Question 10
Two events A and B are independent such that $P(A) = \frac{1}{4}$ and $P(B) = \frac{3}{4}$. What is the value of $P(A \mid B)$?
1. 0.75
2. 0.25
3. 0.00
4. 1.0
Solution
Given that $P(A) = \frac{1}{4}$ and $P(B) = \frac{3}{4}$:
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{P(A)\,P(B)}{P(B)} = P(A) = \frac{1}{4} = 0.25$
Option(2)
Question 11
The physical science degrees conferred by a school between 1992 and 1995 were broken down as follows:
Gender
Major Male (M) Female (F) TOTAL
Physics (P) 25 25 50
Chemistry (C) 60 40 100
Geology (G) 30 20 50
TOTAL 115 85 200
Suppose that a person is selected at random from these graduates.
1. P(M) = 0.575
2. P(M|P) = 0.5
3. P(M and P) = 0.125
4. P(C or F) = 0.275
Solution
1. Correct
$P(M) = \dfrac{115}{200} = 0.575$
2. Correct
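$P(M \mid P) = \dfrac{P(M \text{ and } P)}{P(P)}$
$P(M \text{ and } P) = \dfrac{25}{200} = 0.125$
$P(P) = \dfrac{50}{200} = 0.25$
$P(M \mid P) = \dfrac{P(M \text{ and } P)}{P(P)} = \dfrac{0.125}{0.25} = 0.5$
3. $P(M \text{ and } P) = \dfrac{25}{200} = 0.125$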
4. Incorrect
$P(C \text{ or } F) = P(C) + P(F) - P(C \text{ and } F) = \dfrac{100}{200} + \dfrac{85}{200} - \dfrac{40}{200} = \dfrac{145}{200} = 0.725$
5. Correct
Because $P(M \text{ and } P) = \dfrac{25}{200} = 0.125$ and not zero
Option (4)
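The probabilities in Question 11 all come from the same table of counts, so they can be recomputed in a few lines. This is a sketch only; the dictionary layout and variable names are assumptions chosen for illustration.

```python
# Counts from the Question 11 table: major -> (male, female).
counts = {
    "Physics": (25, 25),
    "Chemistry": (60, 40),
    "Geology": (30, 20),
}
total = sum(m + f for m, f in counts.values())     # 200 graduates
males = sum(m for m, _ in counts.values())         # 115
females = total - males                            # 85

p_m = males / total                                # P(M)
p_m_and_p = counts["Physics"][0] / total           # P(M and P)
p_p = sum(counts["Physics"]) / total               # P(P)
p_m_given_p = p_m_and_p / p_p                      # P(M | P)

# Addition rule: P(C or F) = P(C) + P(F) - P(C and F)
p_c = sum(counts["Chemistry"]) / total
p_f = females / total
p_c_and_f = counts["Chemistry"][1] / total
p_c_or_f = p_c + p_f - p_c_and_f

print(p_m, p_m_given_p, p_m_and_p, p_c_or_f)
```

The last value is 0.725, which is why the statement P(C or F) = 0.275 is the incorrect one.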
Question 12
Given that P(A) = 0.7, P(B) = 0.6 and P(A and B) = 0.35, which one of the
following statements is incorrect?
1. P(B^C) = 0.4
3. P(B|A) = 0.50
4. P(A or B) = 0.95
Solution
1. Correct
2. Correct
035 042
3. $P(B \mid A) = \dfrac{P(B \text{ and } A)}{P(A)} = \dfrac{0.35}{0.7} = 0.5$
5. Incorrect
Events A and B are mutually exclusive when P(A and B) = 0, but because P(A and B) = 0.35, they are not mutually exclusive.
Option (5)
Question 13
Assume A and B are independent events with P(A) = 0.40 and P(B) = 0.30.
1. P(A^C) = 0.60
2. P(A and B) = 0.12
3. P(A or B) = 0.58
4. P(B|A) = 0.3
Solution
1. Correct
2. Correct
3. Correct
4. Correct
$P(B \mid A) = \dfrac{P(B \text{ and } A)}{P(A)} = \dfrac{0.12}{0.4} = 0.3$
5. Incorrect
Events A and B are independent, and P(A and B) = 0.12 is not zero, so they do not satisfy the rule for
mutually exclusive events.
Option (5)
Question 14
{1, 3, 5, 7}, Event B “Number greater than 4” = {5, 6, 7, 8} and Event C “1 or 2” = {1, 2}, then
P(A|B) is
1. 0.5
2. 0.25
3. 1.0
4. 0.75
5. 0.125
Solution
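$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
$P(A \text{ and } B) = \dfrac{\text{number of outcomes of event A and B}}{\text{total number in } S} = \dfrac{2}{8} = 0.25$
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)} = \dfrac{0.25}{0.5} = 0.5$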
Option (1)
Chapter 5
SUMMARY
A discrete probability distribution is the probability distribution of a discrete random variable.
The probabilities of all values of the random variable sum to 1. Mathematically, that means $\sum p(x_i) = 1$
The mean ($\mu$) or the expected value is denoted by $E(X) = \sum x_i\,p(x_i)$
The variance $\sigma^2 = \sum (x_i - \mu)^2\,p(x_i)$
The standard deviation $\sigma = \sqrt{\text{variance}}$
Activities
Question 1
1.
x       0       1       2       3
p(x)    0.512   0.384   0.096   0.008
2.
x       0       1       2       3
p(x)    0.1     0.3     0.4     0.1
3.
x       0       1       2       3
p(x)    0.01    0.01    0.01    0.98
4.
x       0       1       2       3
p(x)    0.25    0.46    0.04    0.24
5.
x       0       1       2       3
p(x)    0.15    0.25    0.5     0.3
Solution
x       0       1       2       3
P(x)    0.512   0.384   0.096   0.008
$\sum p(x_i) = 0.512 + 0.384 + 0.096 + 0.008 = 1$
Option (1)
Question 2
The number of pizzas delivered to university students each month is a random variable with the following
probability distribution
x       0     1     2     3
p(x)    0.1   0.3   ?     0.2
Which one of the following statements is incorrect?
1. p(2) = 0.4
2. P(0 ≤ X ≤ 2) = 0.8
3. P(1 < X < 3) = 0.4
4. P(X > 1) = 0.6
Solution
x       0     1     2     3
p(x)    0.1   0.3   ?     0.2
1.
5.
$\mu = \sum x_i\,p(x_i) = 0(0.1) + 1(0.3) + 2(0.4) + 3(0.2) = 0 + 0.3 + 0.8 + 0.6 = 1.7$
Option (5)
Question 3
Based on past experience, a researcher knows that the probability distribution for X= the number of students
who come to her office on Wednesdays is given as
x       0      1      2      3      4
p(x)    0.10   0.20   0.50   0.15   0.05
1. PX 2 02
2. PX 1 010
3. EX 185
4. P(1 < X ≤ 4) = 0.7
5. P(2 < X < 4) = 0.65
Solution
1. Incorrect
2. Incorrect
3. Incorrect
$E(X) = \sum x_i\,p(x_i) = 0(0.10) + 1(0.20) + 2(0.50) + 3(0.15) + 4(0.05) = 0 + 0.20 + 1 + 0.45 + 0.2 = 1.85$
4. Correct
P(1 < X ≤ 4) = 0.50 + 0.15 + 0.05 = 0.7
5. Incorrect
P(2 < X < 4) = P(3) = 0.15
Option (4)
Question 4
Suppose that the number of defective welds in a length of pipe has the probability distribution given below.
X, represents the number of defective welds
x       0      1      2      3      4      5      6
p(x)    0.60   0.20   0.10   0.05   0.03   0.01   0.01
1. 3.5
2. 0.17
3. 0.78
4. 1.38
5. 1.0
Solution
x       0      1      2      3      4      5      6
p(x)    0.60   0.20   0.10   0.05   0.03   0.01   0.01
The mean $\mu = \sum x_i\,p(x_i)$
$= 0(0.60) + 1(0.20) + 2(0.10) + 3(0.05) + 4(0.03) + 5(0.01) + 6(0.01)$
$= 0 + 0.20 + 0.20 + 0.15 + 0.12 + 0.05 + 0.06$
$= 0.78$
Option(3)
Question 5
Suppose X represents the number of the students in STA1610. The probability distribution of X is as follows:
x       1      2      3      4      5
p(x)    0.25   0.33   0.17   0.15   0.10
If the mean μ = 2.52, then the variance is
1. 1.6496
2. 1.6006
3. 3.0409
4. 1.7438
5. 1.2844
Solution
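x       1      2      3      4      5
p(x)    0.25   0.33   0.17   0.15   0.10
$\mu = 2.52$
The variance
$\sigma^2 = \sum (x_i - \mu)^2\,p(x_i)$
$= (1 - 2.52)^2(0.25) + (2 - 2.52)^2(0.33) + (3 - 2.52)^2(0.17) + (4 - 2.52)^2(0.15) + (5 - 2.52)^2(0.10)$
$= 0.5776 + 0.08923 + 0.03917 + 0.32856 + 0.61504$
$= 1.6496$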
Option (1)
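The mean and variance formulas from the chapter summary translate directly into code. The sketch below uses the distribution from Question 5; the variable names are illustrative.

```python
# Distribution from Question 5: values of X and their probabilities.
xs = [1, 2, 3, 4, 5]
ps = [0.25, 0.33, 0.17, 0.15, 0.10]

# A valid discrete distribution must have probabilities that sum to 1.
assert abs(sum(ps) - 1.0) < 1e-9

# Mean: sum of x_i * p(x_i)
mean = sum(x * p for x, p in zip(xs, ps))

# Variance: sum of (x_i - mean)^2 * p(x_i)
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))

# Standard deviation: square root of the variance
std_dev = variance ** 0.5

print(round(mean, 2), round(variance, 4), round(std_dev, 4))
```

Rounded to four decimals the variance is 1.6496, which agrees with the hand calculation.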
Question 6
The probability distribution of a discrete random variable X is shown below, where X represents the number
of cars owned by a family:
x       0      1      2      3
p(x)    0.25   0.40   0.20   0.15
1. PX 1 035
2. PX 2 085
3. P1 X 2 060
4. PX 1 025
5. P0 X 1 065
Solution
x       0      1      2      3
p(x)    0.25   0.40   0.20   0.15
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Option(5)
Question 7
After analyzing the frequency with which cross-country skiers participate in their sport, a sportswriter created
the following probability distribution for X = number of times per year cross-country skiers ski.
x       0      1      2      3      4      5      6      7      8
p(x)    0.04   0.09   0.19   0.21   0.16   0.12   0.08   0.06   0.05
1. P X 3 012
2. P X 5 019
3. P 5 X 7 014
4. P X 3 032
Solution
x       0      1      2      3      4      5      6      7      8
p(x)    0.04   0.09   0.19   0.21   0.16   0.12   0.08   0.06   0.05
1. Incorrect
P(X = 3) = 0.21
2. Incorrect
3. Incorrect
4. Incorrect
5. Correct
The mean $\mu = \sum X_i\,P(X_i) = 0(0.04) + 1(0.09) + 2(0.19) + 3(0.21) + 4(0.16) + 5(0.12) + 6(0.08) + 7(0.06) + 8(0.05) = 3.64$
Option (5)
Question 8
The mean length of stay in hospital is useful for planning purposes. Suppose that the following is the
distribution of the length of stay in a hospital after a minor operation:
Days          2      3      4      5      6
Probability   0.05   0.20   0.40   0.20   ?
1. 0.15
2. 0.17
3. 3.3
4. 4.0
5. 4.2
Solution
Days          2      3      4      5      6      Total
Probability   0.05   0.20   0.40   0.20   0.15   1
The mean $\mu = \sum x_i\,p(x_i) = 2(0.05) + 3(0.20) + 4(0.40) + 5(0.20) + 6(0.15) = 4.20$
Option (5)
Question 9
x       0       1       2     3
p(x)    0.512   0.382   ?     0.008
1. p(2) = 0.098
2. P(0 < X ≤ 2) = 0.48
3. P(1 ≤ X ≤ 3) = 0.488
4. P(X ≥ 0) = 1
5. E(X) = 0.0602
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
$E(X) = \sum x\,p(x) = 0(0.512) + 1(0.382) + 2(0.098) + 3(0.008) = 0.602$
Option (5)
5.2 Discrete Probability Distribution
SUMMARY
The binomial distribution is a discrete probability distribution that describes the number of successful
outcomes in n simple independent trials, each of which can either succeed or fail.
The binomial distribution is associated with questions that allow "yes" or "no" type answers. This means
we have two outcomes, "success" or "failure".
The probability of success for each simple trial is the same and is denoted by $\pi$.
To calculate a binomial probability we can use either the formula
$p(x) = \dfrac{n!}{x!\,(n-x)!}\,\pi^x (1-\pi)^{n-x}$ (where, for example, $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$)
or the binomial statistics tables enclosed at the end of the section 2.
The mean $\mu = n\pi$
The standard deviation $\sigma = \sqrt{\text{variance}} = \sqrt{n\pi(1-\pi)}$
2. Each trial results in one of two outcomes, which we can define as either a success or a failure.
4. The probability of success ($\pi$) is the same for each trial. On the other hand, the probability of failure
is $1 - \pi$.
5. The random variable equals the number of successes in the n trials, and can only take on whole
number values between 0 and n.
Activities
Question 1
Given a binomial random variable with n = 6 and π = 0.20 (Hint: use the formula or the binomial Table
1 to calculate the following probabilities).
1. P X 2 02458
2. P X 5 00005
3. P X 1 06553
4. P X 5 00016
Solution
1. Correct
P(X = 2)?
Using the formula: 6! = 720, 2! = 2, 4! = 24, $0.20^2 = 0.04$, $0.80^4 = 0.4096$, so
$p(2) = \dfrac{6!}{2!\,(6-2)!}\,(0.20)^2 (1 - 0.20)^{6-2} = \dfrac{720}{2 \times 24} \times 0.04 \times 0.4096 = 0.2458$
2. Incorrect
PX 5?
3. Correct
4. Correct
5. Correct
Option (2)
Question 2
A certain type of tomato seed germinates 90% of the time. A backyard farmer planted 25 seeds. The expected
number (mean) of seeds that germinate is
1. 2.25
2. 25.9
3. 0.036
4. 27.78
5. 22.5
Solution
$\pi = 0.90,\quad n = 25,\quad \mu = n\pi = 25 \times 0.90 = 22.5$
Option(5)
Question 3
Suppose that 10% of butterflies have damaged wings. If a random sample of 10 butterflies is selected, what
is the probability that more than four have damaged wings?
1. 0.0112
2. 0.0128
3. 0.0016
4. 0.9372
5. 0.9984
Solution
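1.
$P(X > 4) = P(X \ge 5)$
$= P(X=5) + P(X=6) + P(X=7) + P(X=8) + P(X=9) + P(X=10)$
$= 0.0015 + 0.0001 + 0.0000 + 0.0000 + 0.0000 + 0.0000$ (binomial tables)
$= 0.0016$
2.
$P(X > 4) = P(X \ge 5)$
$= 1 - P(X \le 4)$
$= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)]$
$= 1 - (0.3487 + 0.3874 + 0.1937 + 0.0574 + 0.0112)$ (binomial tables)
$= 1 - 0.9984$
$= 0.0016$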
Option (3)
Question 4
1. 0.1536
2. 0.9728
3. 0.1808
4. 0.0272
5. 0.8192
Solution
Option (5)
Question 5
In the Limpopo province about 30% of adults have four-year college degrees. Suppose five adults are
randomly selected. Calculate the expected value (mean) and the standard deviation of this binomial distribution.
Solution
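The mean $\mu = n\pi = 5 \times 0.30 = 1.5$
The standard deviation $\sigma = \sqrt{n\pi(1-\pi)} = \sqrt{5 \times 0.30 \times (1 - 0.30)} = \sqrt{1.05} = 1.0247$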
Option (3)
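The mean and standard deviation of a binomial distribution depend only on n and the success probability. A short sketch for Question 5 follows; the function name is illustrative.

```python
from math import sqrt

def binomial_mean_sd(n, pi):
    """Return (mean, standard deviation) of a binomial distribution."""
    mean = n * pi
    variance = n * pi * (1 - pi)
    return mean, sqrt(variance)

# Question 5: five adults, each with a 30% chance of holding a degree.
mean, sd = binomial_mean_sd(5, 0.30)
print(round(mean, 2), round(sd, 4))
```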
Question 6
According to a report from the center for studying health system change, 20% of South Africans delay or go
without medical care because of concerns about cost. Suppose six individuals are randomly selected. The
probability that more than 4 will delay or go without medical care is
1. 0.0015
2. 0.9984
3. 0.0154
4. 0.0016
5. 0.0017
Solution
Option (4)
Question 7
Suppose that an admission test for a certain university is designed so that the probability of passing it is 45%.
Find the probability that among 5 candidates who take the test, more than 3 will pass.
1. 0.2757
2. 0.1128
3. 0.1313
4. 0.0185
5. 0.0102
Solution
Option (3)
Question 8
The probability that a certain machine will produce a defective item is 0.25. If a random sample of 6 items is
taken from the output of this machine, what is the probability that there will be at least five defective items
in the sample?
1. 0.2373
2. 0.0044
3. 0.0046
4. 0.25
5. 1.5
Solution
Option (3)
Question 9
A motor company has purchased steel parts from a supplier for several years and has found that 10% of the
parts must be returned because they are defective. An order of 5 parts is received. What is the probability
1. 0.0004
2. 0.0001
3. 0.5905
4. 0.0729
5. 0.0086
Solution
Option (1)
5.2.2 Poisson Probability Distribution
SUMMARY
The Poisson distribution is a discrete probability distribution that describes the probability of X events
occurring during a specified interval (of time, distance, area or volume), if the average number of occurrences
is known and the events occur independently of the time since the last event.
To calculate the probability of a Poisson random variable, we can use either the formula or the Poisson
statistics tables provided.
The formula is $P(X = x) = \dfrac{e^{-\lambda}\,\lambda^x}{x!}$
x represents the number of occurrences of an event.
x! is the factorial of x.
$\lambda$ is a positive number that represents the expected number (or the mean $\mu$) of occurrences for a given
interval.
The mean and variance of a Poisson distribution are identical, which is why we can say
Variance(X) = mean(X) = $\lambda$.
Activities
Question 1
Calculate the probability that three bank robberies occur in a day, where the number of bank robberies that
occur in a large Gauteng city is Poisson distributed with a mean equal to 1.8 per day. (Hint: use the
Poisson tables to calculate the probability.)
1. 0.1653
2. 0.2975
3. 0.1607
4. 0.2768
5. 0.3329
Solution
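$P(X = 3) = \dfrac{e^{-1.8}\,(1.8)^3}{3!}$, where $e = 2.7182818$ and $e^{-1.8} = 0.1653$
$1.8^3 = 5.832$ and $3! = 3 \times 2 \times 1 = 6$
$P(X = 3) = \dfrac{0.1653 \times 5.832}{6}$
$P(X = 3) = 0.1607$
X \ λ   1.1      1.2      1.3      1.4      1.5      1.6      1.7      1.8      1.9      2.0
0
1
2
3       0.0738   0.0867   0.0998   0.1128   0.1255   0.1378   0.1496   0.1607   0.1710   0.1804
4
:
9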
Option (3)
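The Poisson formula can be evaluated directly as a check on the table lookup. This is a small sketch for Question 1 (mean of 1.8 robberies per day); the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam: e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

# Question 1: probability of exactly three robberies when the mean is 1.8.
p_three = poisson_pmf(3, 1.8)
print(round(p_three, 4))
```

The rounded result, 0.1607, matches the table entry in row 3 of the column for a mean of 1.8.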
Question 2
The number of accidents that occur at a busy intersection is Poisson distributed with a mean of 3.5 per week.
The probability of no accidents in one week is
1. 0.0302
2. 0.0000
3. 0.1359
4. 0.3020
5. 1.0000
Solution
X \ λ   3.1      3.2      3.3      3.4      3.5      3.6      3.7      3.8      3.9      4.0
0       0.0450   0.0408   0.0369   0.0334   0.0302   0.0273   0.0247   0.0224   0.0202   0.0183
Option (1)
Question 3
The Poisson distribution was used to model the number of faults in the gearboxes of buses. Suppose the faults
occur at an average of 2.5 per month. The probability that at least 2 faults are found in a month is
1. 0.7127
2. 0.0821
3. 0.2565
4. 0.9692
5. 0.7172
Solution
$P(X \ge 2) = P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)$
$= 0.2565 + 0.2138 + 0.1336 + 0.0668 + 0.0278 + 0.0099 + 0.0031 + 0.0009 + 0.0002 + 0.0000$
$= 0.7126$
Using the Poisson distribution tables
X \ λ   2.1      2.2      2.3      2.4      2.5      2.6      2.7      2.8      2.9      3.0
0       0.1225   0.1108   0.1003   0.0907   0.0821   0.0743   0.0672   0.0608   0.0550   0.0498
1                                           0.2052
2                                           0.2565
3                                           0.2138
4                                           0.1336
5                                           0.0668
6                                           0.0278
7                                           0.0099
8                                           0.0031
9                                           0.0009
10                                          0.0002
11                                          0.0000
12                                          0.0000
Alternatively, we know that the probabilities in each column sum to 1. Instead of adding the values from 2 up
to 12, it is easier to add the probabilities for 0 and 1 and subtract the result from 1. This means
$P(X \ge 2) = 1 - P(X \le 1)$
$= 1 - [P(0) + P(1)]$
$= 1 - (0.0821 + 0.2052)$
$= 1 - 0.2873$
$= 0.7127$
Option (1)
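The complement approach is usually the shorter calculation for "at least" questions. Here is a sketch for Question 3 with a mean of 2.5 faults per month; the helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_at_least(k, lam):
    """P(X >= k), computed as 1 - [P(0) + P(1) + ... + P(k - 1)]."""
    return 1.0 - sum(poisson_pmf(x, lam) for x in range(k))

lam = 2.5
p_at_least_2 = poisson_at_least(2, lam)   # 1 - [P(0) + P(1)]
p_at_least_1 = poisson_at_least(1, lam)   # 1 - P(0)
print(round(p_at_least_2, 4), round(p_at_least_1, 4))
```

Summing the rounded table values from 2 to 12 gives 0.7126, while the complement gives 0.7127; the small gap is only rounding in the table entries.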
Question 4
1. P X 0 00025
3. P X 3 0062
4. P X 5 05543
5. P 2 X 5 02824
Solution
X \ λ   5.1      5.2      5.3      5.4      5.5      5.6      5.7      5.8      5.9      6.0
0       0.0061   0.0055   0.0050   0.0045   0.0041   0.0037   0.0033   0.0030   0.0027   0.0025
1                                                                                        0.0149
2                                                                                        0.0446
3                                                                                        0.0892
4                                                                                        0.1339
5                                                                                        0.1606
6                                                                                        0.1606
7                                                                                        0.1377
8                                                                                        0.1033
9                                                                                        0.0688
10                                                                                       0.0413
11                                                                                       0.0225
12                                                                                       0.0113
13                                                                                       0.0052
14                                                                                       0.0022
15                                                                                       0.0009
16                                                                                       0.0003
17                                                                                       0.0001
18                                                                                       0.0000
1. Correct
PX 0
2. Incorrect
3. Correct
4. Correct
There are two ways of doing it: we can add the values from P(X = 6) = 0.1606 up to P(X = 18) = 0.0000
as indicated by the table above, or we can use the second procedure demonstrated below
$P(X > 5) = P(X \ge 6)$
$= 1 - P(X \le 5)$
$= 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)]$
$= 1 - (0.0025 + 0.0149 + 0.0446 + 0.0892 + 0.1339 + 0.1606)$
$= 1 - 0.4457 = 0.5543$
5. Correct
Option(2)
Question 5
A random variable Y has the Poisson distribution with parameter λ = 3.8. The probability P(Y ≤ 5) is then equal
to
1. 0.1477
2. 0.8156
3. 0.1944
4. 0.8344
5. 0.1844
Solution
We are given:
Option (2)
Question 6
Tornadoes for January in Kansas follow a Poisson distribution with an average of 3.2 per month. The
probability that in the next January Kansas will experience exactly 2 tornadoes is:
1. 0.7913
2. 0.4076
3. 0.2087
4. 0.1304
5. 0.2226
Solution
Option (3)
Question 7
A cashier at Caster’s cafeteria can total an average of 1.2 trays per minute.
3. The probability that the cashier will total exactly zero trays per minute is 0.3012.
4. The probability that the cashier will total at least four trays per minute is 0.0338.
5. The probability that the cashier will total at most three trays per minute is 0.8795.
Solution
1. Correct
The occurrence number is given as 1.2 per interval of time
2. Correct
The variance is equal to the mean
3. Correct
P(X = 0) = 0.3012
4. Correct
5. Incorrect
Option (5)
Question 8
The Poisson distribution was used to model the number of faults that arise in the gearboxes of buses. Suppose the
faults occur at an average rate of 2.5 per month. The probability that at least one fault is found in
a month, P(X ≥ 1), is
1. 0.2052
2. 0.2873
3. 0.7127
4. 0.9179
5. 0.9197
Solution
$P(X \ge 1) = 1 - P(X = 0)$
$= 1 - 0.0821$
$= 0.9179$
Option (4)
Chapter 6
SUMMARY
In this module we explore the concept of normal probability alone, without discussing other continuous
probability distributions.
A normal distribution is a symmetric, bell-shaped curve centred at its mean value.
A normal distribution has two parameters: the mean $\mu$ and the variance $\sigma^2$. We denote a normal
random variable by $N(\mu, \sigma^2)$.
To calculate a probability for a normal random variable, we have to standardise it.
Standardisation converts the values from a normal population so that they have a
standard mean and variance. By so doing we convert the variable from its unit of measurement (for
example kilograms, metres or seconds) to standardised values that have no unit.
By standardising a normal random variable X with mean $\mu$ and standard deviation $\sigma$, we create a
random variable called Z that has a standard normal distribution.
The standard normal variable is a normal random variable with a mean equal to zero and a standard
deviation equal to 1. We denote the standard normal variable by $Z(0, 1)$.
To convert a normal population $N(\mu, \sigma^2)$ into a standard normal Z, denoted
by $Z(0, 1)$, we use the formula $Z = \dfrac{X - \mu}{\sigma}$
The standard normal curve is symmetric, bell-shaped and asymptotic, and the total area under the
curve is equal to 1.
Activities
Question 1
Using the standardised normal Z table, calculate P(Z > -1.65), where Z is normally distributed with a mean
equal to 0 and a variance equal to 1. The correct answer is
1. 0.0495
2. 0.0548
3. 0.9452
4. 0.9505
5. 0.6139
Solution
P(Z > -1.65)?
Z is normally distributed with a mean equal to zero and a variance equal to 1.
[Figure: standard normal curve with Z = -1.65 marked; shaded area 0.9505]
Because we shade a big area, we use the positive Z-standardised normal table.
Z      0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0
0.1
0.2
1.6    0.9452   0.9463   0.9474   0.9484   0.9495   0.9505   0.9515   0.9525   0.9535   0.9545
Option (4)
Question 2
The long–distance calls made by the employees of a company are normally distributed with a mean of 63
minutes and a standard deviation of 22 minutes. Use the normal standardised table to calculate the probability
that a call last less than 7 minutes.
1. 06179
2. 06293
3. 03821
4. 04880
5. 05120
Solution
The population mean 63 The population standard deviation 22 PX 7?
X
Let us convert the random variable X into Z, using the standardised formulae Z
763
PX 7 P Z PZ 003
22
[Figure: standard normal curve with Z = 0.03 marked; shaded area 0.5120]
Because we shade a big area under a normal curve, we use the positive Z-standardised tables
Z      0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0    0.5000   0.5040   0.5080   0.5120   0.5160   0.5199   0.5239   0.5279   0.5319   0.5359
0.1
0.2
3.0
Option (5)
Question 3
Suppose Z is normally distributed with a mean μ = 0 and a variance σ² = 1. Calculate P(Z < -1.59).
1. 0.9441
2. 0.0559
3. 0.0668
4. 0.1469
5. 0.0559
Solution
P(Z < -1.59)?
[Figure: standard normal curve with Z = -1.59 marked; shaded area 0.0559]
Because we shade a small area under a normal curve, we use the negative Z-standardised tables
Z       0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
-3.0    0.0013   0.0013   0.0013   0.0012   0.0012   0.0011   0.0011   0.0011   0.0010   0.0010
-2.9
-2.8
:
-1.5    0.0668   0.0655   0.0643   0.0630   0.0618   0.0606   0.0594   0.0582   0.0571   0.0559
-0.0
Option (2)
Question 4
X is normally distributed with mean μ = 100 and standard deviation σ = 20. What is the probability
that X is greater than 145?
1. 0.9878
2. 0.0139
3. 0.9778
4. 0.9861
5. 0.0122
Solution
The population mean μ = 100, the population standard deviation σ = 20, P(X > 145)?
Let us convert the random variable X into Z, using the standardised formula $Z = \dfrac{X - \mu}{\sigma}$
$P(X > 145) = P\left(Z > \dfrac{145 - 100}{20}\right) = P(Z > 2.25)$
[Figure: standard normal curve with Z = 2.25 marked; shaded area 0.0122]
Because we shade a small area under a normal curve, we use the negative Z-standardised tables.
Z       0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
-3.1
-2.9
:
-2.2    0.0139   0.0136   0.0132   0.0129   0.0125   0.0122   0.0119   0.0116   0.0113   0.0110
Option (5)
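The table lookup can be cross-checked by computing the standard normal cumulative probability numerically. The sketch below uses the error function from Python's standard library; the function name `phi` is an illustrative choice and the tables remain the method used in this module.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z < z) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Question 4: X is normal with mean 100 and standard deviation 20.
mu, sigma = 100.0, 20.0
z = (145.0 - mu) / sigma          # standardise: (X - mean) / standard deviation
p_greater = 1.0 - phi(z)          # area to the right of z
print(z, round(p_greater, 4))
```

By symmetry, the area to the right of 2.25 equals the area to the left of -2.25, which is the value read from the negative Z table.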
Question 5
The distribution that has a mean of zero and a standard deviation of one is called the
4. frequency distribution.
5. skewed distribution.
Solution
Option (3)
Question 6
If the Z-score is given as Z = 1.96, and X is normally distributed with a mean μ = 60
and a standard deviation σ = 6, then the x-value that this Z-score corresponds to is
Solution
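Z = 1.96, σ = 6, μ = 60, X = ?
The standardised $Z = \dfrac{X - \mu}{\sigma}$
$1.96 = \dfrac{X - 60}{6}$
Multiply both sides by 6 to solve the equation:
$6 \times 1.96 = 6 \times \dfrac{X - 60}{6}$
$11.76 = X - 60$
$11.76 + 60 = X$
$71.76 = X$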
Option (1)
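Rearranging the standardisation formula gives X = mean + Z × standard deviation, which is easy to verify. A minimal sketch with illustrative function names:

```python
def z_to_x(z, mu, sigma):
    """Convert a Z-score back to the original scale: x = mu + z * sigma."""
    return mu + z * sigma

def x_to_z(x, mu, sigma):
    """Standardise a raw value: z = (x - mu) / sigma."""
    return (x - mu) / sigma

# Question 6: Z = 1.96, mean 60, standard deviation 6.
x = z_to_x(1.96, 60, 6)
print(round(x, 2))
```

Converting the result back with `x_to_z` returns the original Z-score, which is a quick way to check the algebra.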
Question 7
For a random variable Z from the standard normal distribution with a mean μ = 0 and a standard deviation
σ = 1, which one of the following is incorrect?
Solution
1. Correct
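P(Z < -1.52) = 0.0643
[Figure: standard normal curve with Z = -1.52 marked; shaded area 0.0643]
Because we have shaded a small area, we use the negative Z-standardised normal table.
2. Correct
P(Z < 1.48) = 0.9306
[Figure: standard normal curve with Z = 1.48 marked; shaded area 0.9306]
3. Correct
[Figure: standard normal curve with Z = -0.43 marked; shaded area 0.6664]
4. Incorrect
[Figure: standard normal curve with Z = 0.74 marked; shaded area 0.2296]
5. Correct
[Figure: standard normal curve with Z = 2.10 and Z = 2.34 marked; cumulative areas 0.9821 and 0.9904]
In this case we read the values according to the sign of the Z value: a negative Z value from
the negative Z-tables and a positive Z value from the positive Z-tables. Then we subtract the two values
obtained. We have used the positive Z-tables because the two Z values, 2.10 and 2.34, are both positive.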
Option (4)
Question 8
For a particular group of scores, the calculated mean and standard deviation are 20 and 5 respectively. The Z
score for a raw score of 30 is
1. 55
2. 2
3. 26
4. 10
5. 2
Solution
μ = 20, σ = 5, X = 30
The standardised $Z = \dfrac{X - \mu}{\sigma}$
$Z = \dfrac{30 - 20}{5} = 2$
Option (2)
Question 9
A psychologist has been studying eye fatigue using a particular measure, which she administers to students
after they have worked for 1 hour writing on a computer. On this measure she has found that the distribution
follows a normal curve. Using a normal probabilities table, what is the probability of students having a Z-score
below 1.5?
1. 0.0668
2. 0.9394
3. 0.9332
4. 0.0606
5. 0.9345
Solution
P(Z < 1.5)?
[Figure: standard normal curve with Z = 1.5 marked; shaded area 0.9332]
Because we have shaded a big area, we have to use the positive Z-standardised tables.
Option (3)
Question 10
The average high school teacher annual salary is R43 000. Let teacher salary be normally distributed with a
standard deviation of R18 000. Calculate P(X > R80 000).
1. 0.0228
2. 0.9803
3. 2.06
4. 0.9772
5. 0.0197
Solution
The population mean μ = 43000, the population standard deviation σ = 18000, P(X > 80000)?
Let us convert the random variable X into Z, using the standardised $Z = \dfrac{X - \mu}{\sigma}$
$P(X > 80000) = P\left(Z > \dfrac{80000 - 43000}{18000}\right) = P(Z > 2.06)$
[Figure: standard normal curve with Z = 2.06 marked; shaded area 0.0197]
Because we shade a small area, we have to use the negative Z-standardised tables.
Option (5)
Question 11
If the area to the right of a positive Z₁ is 0.0869, then the value of Z₁ must be
1. -1.36
2. 1.36
3. 0.5319
4. 0.08
5. 1.80
Solution
[Figure: standard normal curve with an unknown positive Z marked; shaded area to its right 0.0869]
We are given that the area (the probability) equals 0.0869 and are asked to find the
corresponding value of Z. Because the area is small, we use the negative Z-standardised tables,
knowing that the standard normal distribution is symmetric. The table gives Z = -1.36, so by symmetry Z₁ = 1.36.
Z       0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
-3.1
-2.9
-1.3    0.0968   0.0951   0.0934   0.0918   0.0901   0.0885   0.0869   0.0853   0.0838   0.0823
Option (2)
Question 12
5. Cannot be calculated
Solution
Option (3)
Question 13
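Suppose that Z is normally distributed with a mean equal to 0 and a variance equal to 1.
Solution
1. Correct
[Figure: standard normal curve with Z = 2.64 marked; shaded area 0.0041]
2. Correct
[Figure: standard normal curve with Z = -0.87 marked; shaded area 0.1922]
3. Correct
0.9192 - 0.0808 = 0.8384
[Figure: standard normal curve with Z = -1.4 and Z = 1.4 marked; cumulative areas 0.0808 and 0.9192]
4. Incorrect
[Figure: standard normal curve with Z = -2.8 marked; shaded area 0.0026]
5. Correct
[Figure: standard normal curve with Z = -0.74 marked; shaded area 0.2296]
[Figure: standard normal curve with Z = 0.74 marked; shaded area 0.2296]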
Option (4)
Question 14
The owner of an appliance store uses a normal distribution with mean 10 and variance 9 to model the weekly
net sales. Calculate P(X < 3.5).
1. 2.167
2. 0.0062
3. 0.2358
4. 0.9850
5. 0.0150
Solution
The population mean μ = 10, the population standard deviation σ = √9 = 3, P(X < 3.5)?
Let us convert the random variable X into Z, using the standardised $Z = \dfrac{X - \mu}{\sigma}$
$P(X < 3.5) = P\left(Z < \dfrac{3.5 - 10}{3}\right) = P(Z < -2.17) = 0.0150$
[Figure: standard normal curve with Z = -2.17 marked; shaded area 0.0150]
Because we shade a small area under a normal curve, we use the negative Z-standardised tables
Z       0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
-3.1
-2.9
-2.1    0.0179   0.0174   0.0170   0.0166   0.0162   0.0158   0.0154   0.0150   0.0146   0.0143
:
Option (5)
Question 15
1. P Z 160 00548
2. P Z 155 00606
4. P Z 209 00019
5. P Z 167 09525
Solution
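1. Incorrect
[Figure: standard normal curve with Z = -1.60 marked; shaded area 0.9452]
2. Correct
[Figure: standard normal curve with Z = 1.55 marked; shaded area 0.0606]
3. Incorrect
0.7257 - 0.0808 = 0.6449
[Figure: standard normal curve with Z = -1.40 and Z = 0.6 marked; cumulative areas 0.0808 and 0.7257]
4. Incorrect
[Figure: standard normal curve with Z = 2.09 marked; shaded area 0.9817]
5. Incorrect
[Figure: standard normal curve with Z = -1.67 marked; shaded area 0.0475]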
Option (2)
Question 16
The weights of a large group of high school students are normally distributed with a mean of
55 kg and a standard deviation of 5 kg. What is the probability that the weight of a randomly selected
student will be more than 63 kg?
1. 0.9452
2. 0.0458
3. 0.1446
4. 0.0548
5. 0.8554
Solution
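Let us convert the random variable X into Z, using the standardised $Z = \dfrac{X - \mu}{\sigma}$
$P(X > 63) = P\left(Z > \dfrac{63 - 55}{5}\right) = P(Z > 1.60) = 1 - 0.9452 = 0.0548$
[Figure: standard normal curve with Z = 1.60 marked; area to the left 0.9452, area to the right 0.0548]
Option (4)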
Question 17
A courier service company has found that their delivery time of parcels to clients is normally distributed with
a mean of 45 minutes and a standard deviation of 8 minutes. The probability that a randomly selected parcel
will take less than 48 minutes to deliver is
1. 0.375
2. 0.32
3. 0.0648
4. 0.3520
5. 0.6480
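Let us convert the random variable X into Z, using the standardised $Z = \dfrac{X - \mu}{\sigma}$
$P(X < 48) = P\left(Z < \dfrac{48 - 45}{8}\right) = P(Z < 0.375) \approx P(Z < 0.38) = 0.6480$
[Figure: standard normal curve with Z = 0.38 marked; shaded area 0.6480]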
Option (5)
Chapter 7
SUMMARY
We explore the concept of taking a sample from a population, and discuss the distributional properties
of the sample mean and sample proportion calculated from these samples.
The sample mean is normally distributed when the underlying data are normally distributed.
The idea behind probability sampling is random selection, and for this reason we typically call these
random samples.
In this module we will not discuss the categories of probability samples, such as simple random, systematic,
stratified and cluster samples.
A sampling distribution describes how a sample statistic varies when calculated from all samples of size
n drawn from a single population.
7.1 Sampling Distribution of the Mean
SUMMARY
A sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$, because the mean (expected
value) of all sample means of size n selected from the population is equal to the population mean, $\mu$.
This is denoted by $E(\bar{X}) = \mu$
Unbiased means that when the mean (expected value) of the sampling distribution of a statistic is equal to
the population parameter, that statistic is said to be an unbiased estimator of that parameter; that is, on
average, the statistic estimates the correct value.
If the variance of the variable X is $\sigma^2$, then the variance of the sample mean $\bar{X}$ is denoted by
$\sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n}$.
The standard deviation of the mean $\bar{X}$, which we call the standard error of $\bar{X}$, is denoted by
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
If a random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$, denoted
by $X \sim N(\mu, \sigma^2)$, then the random variable $\bar{X}$ is also normally distributed, with mean $\mu$ and
variance $\dfrac{\sigma^2}{n}$. This is denoted by $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$.
To calculate a probability for the random variable $\bar{X}$, we have to convert $\bar{X}$ into the standardised
Z-value given by the formula $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
The test statistic is $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Activities
Question 1
A sample of n = 16 observations is drawn from a normal population with a mean μ = 1000 and a standard
deviation σ = 200. Use the standardised normal Z tables to calculate the probability $P(\bar{X} > 1050)$.
1. 0.8413
2. 0.1587
3. 0.5398
4. 0.8438
5. 0.4602
Solution
The sample size n = 16, the population mean μ = 1000, the population standard deviation σ = 200,
$P(\bar{X} > 1050)$?
Step 1
Because we are given the random variable $\bar{X}$, we need to convert $\bar{X}$ into Z by using the formula of the test
statistic $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ as developed below.
If $\bar{X} = 1050$, what is the corresponding value of Z?
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{1050 - 1000}{200 / \sqrt{16}} = \dfrac{50}{50} = 1$
Step 2
$P(\bar{X} > 1050)$ can be written as $P(Z > 1)$, and this takes us back to the concept of chapter 6. To solve this
probability we draw a normal graph as shown below
[Figure: standard normal curve with Z = 1.00 marked; shaded area 0.1587]
Because we shade a small area under a normal curve, we use the negative Z-table to calculate the probability.
Option (2)
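The two steps above (standardise the sample mean, then find the tail area) can be sketched in code. The function names `phi` and `prob_sample_mean_above` are illustrative assumptions, and the error function stands in for the Z table.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z < z) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_sample_mean_above(x_bar, mu, sigma, n):
    """P(sample mean > x_bar) when the population is normal(mu, sigma)."""
    standard_error = sigma / sqrt(n)      # standard deviation of the sample mean
    z = (x_bar - mu) / standard_error     # Step 1: standardise
    return z, 1.0 - phi(z)                # Step 2: area to the right of z

# Question 1: n = 16, mean 1000, standard deviation 200.
z, p = prob_sample_mean_above(1050, 1000, 200, 16)
print(z, round(p, 4))
```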
Question 2
A normally distributed population has a mean of 40 and a standard deviation of 12. The standard error of the
sample mean if the sample size is 100 is equal to:
1. 4
2. 28
3. 1.2
4. 0.2
5. 12
Solution
The standard error $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{12}{\sqrt{100}} = 1.2$
Option (3)
Question 3
For a random variable that is normally distributed, with a population mean μ = 80 and a population standard
deviation σ = 10, the probability that a simple random sample of 25 items will have a mean that is less than
85 is
1. 0.9798
2. 0.9938
3. 0.0062
4. 0.0054
5. 0.0202
Solution
Step 1
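Because we are given the random variable $\bar{X}$, we need to convert $\bar{X}$ into Z by using the formula of the test
statistic $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ as shown below.
If $\bar{X} = 85$, what is the corresponding value of Z?
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{85 - 80}{10 / \sqrt{25}} = \dfrac{5}{2} = 2.5$
Step 2
$P(\bar{X} < 85)$ can now be written as $P(Z < 2.5)$, and this takes us back to the concept of chapter 6. To solve
this probability we have a normal graph as shown below
[Figure: standard normal curve with Z = 2.5 marked; shaded area 0.9938]
Because we shade a big area under a normal curve, we use the positive Z-table.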
Option (2)
Question 4
1. The sampling distribution of the mean will have the same mean as the original population from which the
samples were drawn.
2. The standard deviation of the sampling distribution of the mean is also called the standard error.
3. When the population mean μ = 160, the population standard deviation σ = 25 and n = 64, the standard
error is 3.125.
4. A confidence interval is an estimate for which there is a specified degree of certainty that the population
parameter will be in the interval.
5. We use the t–distribution for the statistical inference of the population mean when the population
standard deviation is known under the assumption that the population is normally distributed.
Solution
1. Correct
Because the mean (expected value) of all sample means of size n selected from the population is equal
to the population mean, $\mu$. This is denoted by $E(\bar{X}) = \mu$
2. Correct
3. Correct
4. Correct
5. Incorrect
We use the t-distribution when the population standard deviation is unknown but the sample standard
deviation is known.
Option (5)
Question 5
For a random variable that is normally distributed, with a population mean μ = 80 and a population standard
deviation σ = 10, the probability that a simple random sample of 25 items will have a mean that is between
79 and 85 is
1. 0.9938
2. 0.3085
3. 0.6853
4. 0.4997
5. 0.6835
Solution
We are given
The population mean μ = 80, the population standard deviation σ = 10, the sample size n = 25
Step 1
Because the random variable $\bar{X}$ is given, we need to convert $\bar{X}$ into Z by using the formula of the test
statistic $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ as shown below.
When $\bar{X} = 79$, what is the corresponding value of Z?
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{79 - 80}{10 / \sqrt{25}} = \dfrac{-1}{2} = -0.5$
When $\bar{X} = 85$, what is the corresponding value of Z?
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{85 - 80}{10 / \sqrt{25}} = \dfrac{5}{2} = 2.5$
Step 2
$P(79 < \bar{X} < 85)$ can now be written as $P(-0.5 < Z < 2.5)$, and this takes us back to the concept of
chapter 6. To solve this probability we have to draw a standardised normal graph as shown below.
Because the shaded area lies between the two Z-values, the probability is equal to the difference between P(Z < 2.5)
and P(Z < -0.5), each read according to the sign of the Z-value: a negative value from the negative Z table and a
positive value from the positive Z table.
$0.9938 - 0.3085 = 0.6853$
[Figure: standard normal curve with Z = -0.5 and Z = 2.5 marked; cumulative areas 0.3085 and 0.9938, shaded area between them 0.6853]
Option (3)
Question 6
A random sample of size n = 12 was selected from a population and the data are as follows:
36 61 47 23 51 82 71 12 71 65 42 50
The sample mean = 50.92 and the sample standard deviation = 20.64
The standard error (or the standard deviation of the mean) is equal to
1. 1.720
2. 35.490
3. 0.379
4. 20.637
5. 5.9583
Solution
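The data: 36 61 47 23 51 82 71 12 71 65 42 50, n = 12
The sample mean $\bar{X} = \dfrac{\sum X}{n} = \dfrac{36 + 61 + 47 + 23 + 51 + 82 + 71 + 12 + 71 + 65 + 42 + 50}{12} = \dfrac{611}{12} = 50.9166$
The sample variance $S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
The sample standard deviation is $S = \sqrt{425.9015} = 20.6374$
The standard error $S_{\bar{X}} = \dfrac{S}{\sqrt{n}} = \dfrac{20.6374}{\sqrt{12}} = 5.9575$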
Option (5)
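The same calculation can be reproduced from the raw data. This is a sketch using Python's `statistics` module, whose `stdev` function uses the n - 1 divisor, as the sample variance formula above does.

```python
import math
import statistics

# Data from Question 6.
data = [36, 61, 47, 23, 51, 82, 71, 12, 71, 65, 42, 50]
n = len(data)

sample_mean = statistics.mean(data)       # sum of the values divided by n
sample_sd = statistics.stdev(data)        # sample standard deviation (n - 1 divisor)
standard_error = sample_sd / math.sqrt(n)

print(round(sample_mean, 4), round(sample_sd, 4), round(standard_error, 4))
```

With the unrounded standard deviation the standard error is about 5.9575; using the rounded value 20.64 gives 5.9583, which is the figure in option (5).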
Question 7
The amount of time it takes to complete a final examination is normally distributed with a mean of 75
minutes and a standard deviation of 8 minutes. If 64 students were randomly sampled, the probability that
the sample mean of the sampled students exceeds 76 minutes is
1. 0.8413
2. 0.4602
3. 0.1587
4. 0.5398
5. 0.1578
Solution
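$P(\bar{X} > 76) = P\left(Z > \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \dfrac{76 - 75}{8 / \sqrt{64}}\right) = P\left(Z > \dfrac{1}{1}\right) = P(Z > 1) = 0.1587$
[Figure: standard normal curve with Z = 1.00 marked; shaded area 0.1587]
Option (3)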
Question 8
Employees in a large manufacturing plant worked an average of 62.0 hours of overtime last year, with a
standard deviation of 15.0 hours. For a random sample of 36 employees, the probability that the average
1. 0.4452
2. 0.3936
3. 0.0548
4. 0.9452
5. 0.4364
Solution
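$P(\bar{X} < 58) = P\left(Z < \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z < \dfrac{58 - 62}{15 / \sqrt{36}}\right) = P\left(Z < \dfrac{-4}{2.5}\right) = P(Z < -1.6) = 0.0548$
[Figure: standard normal curve with Z = -1.6 marked; shaded area 0.0548]
Option (3)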
Question 9
Given a normal population whose mean is 50 and whose standard deviation is 5, find the probability that a
random sample of 25 has a mean greater than 52.
1. 0.9772
2. 0.9778
3. 0.0228
4. 0.0222
5. 0.2280
Solution
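μ = 50, σ = 5, n = 25, $P(\bar{X} > 52)$?
$P(\bar{X} > 52) = P\left(Z > \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\right) = P\left(Z > \dfrac{52 - 50}{5 / \sqrt{25}}\right) = P\left(Z > \dfrac{2}{1}\right) = P(Z > 2) = 0.0228$
[Figure: standard normal curve with Z = 2.00 marked; shaded area 0.0228]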
Option (3)
Question 10
A random sample of size n = 12 was selected from a population and the data are as follows:
36 61 47 23 51 82 71 12 71 65 42 50
The mean is $\bar{x} = 50.92$ and the standard deviation is S = 20.64. The standard error of the sample mean is equal to
1. 1.720
2. 3.5490
3. 0.379
4. 20.637
5. 5.9583
Solution
The standard error of the sample mean is equal to $\frac{S}{\sqrt{n}} = \frac{20.64}{\sqrt{12}} = 5.9583$
Option (5)
7.2 Sampling Distribution of the Proportion
SUMMARY
The sampling distribution for the sample proportion is approximated using the normal distribution by
making use of the formula p
The approximation for the sampling distribution of the proportion is valid if:
The standardised sample proportion Z-value, commonly called the test statistic, is given by the formula
$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
We can now conduct probability calculations for the sample proportion in the same way as for the
sample mean.
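The standardisation of a sample proportion can be sketched in Python as follows. This is a minimal illustration that assumes the normal approximation holds; the names `proportion_z` and `prob_proportion_below` are hypothetical, and the example numbers (p = 0.75, population proportion 0.80, n = 100) are taken from Question 2 of the activities.

```python
import math
from statistics import NormalDist


def proportion_z(p, pi, n):
    """Z-value of a sample proportion p for population proportion pi and sample size n."""
    return (p - pi) / math.sqrt(pi * (1 - pi) / n)


def prob_proportion_below(p, pi, n):
    """P(sample proportion < p) under the normal approximation."""
    return NormalDist().cdf(proportion_z(p, pi, n))


print(round(proportion_z(0.75, 0.80, 100), 2))
print(round(prob_proportion_below(0.75, 0.80, 100), 4))
```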
Activities
Question 1
In a binomial experiment with the sample size n 300 and the sample proportion p 05 the standard
error for proportion p is
1. 00283
2. 00008
3. 01238
4. 00016
5. 02886
Solution
The sample size n 300 The sample proportion p 05 The standard error for proportion p ?
p 1 p 05 1 05
The standard error of the proportion p 00283
n 300
Option (1)
Question 2
Determine the probability that in a sample of 100 the sample proportion is less than 0.75 if $\pi = 0.80$:
1. -1.25
2. 0.04
3. 0.1056
4. 0.1469
5. 0.8944
Solution
Step 1
Because the random variable p is given, we need to convert p into Z by using the formula of the test statistic:
$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.75 - 0.80}{\sqrt{\frac{0.80(1 - 0.80)}{100}}} = \frac{-0.05}{0.04} = -1.25$
Step 2
$P(p < 0.75)$ can now be written as $P(Z < -1.25)$, and this takes us back to the concept of chapter 6. To
solve this probability we have to draw a standardised normal graph:
[Figure: standard normal curve with the area 0.1056 shaded to the left of Z = -1.25.]
$P(Z < -1.25) = 0.1056$
Option (3)
Question 3
A random sample of size n = 400 was selected from a binomial population with the population proportion
$\pi = 0.2$. The number of observed successes in the sample is 96.
4. $P(p > 0.24) = 0.0228$
Solution
The sample size n = 400. The population proportion $\pi = 0.2$. The number of successes X = 96.
1. Correct
The sample proportion $p = \frac{X}{n} = \frac{96}{400} = 0.24$
2. Correct
The standard error $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.2(1 - 0.2)}{400}} = \sqrt{0.0004} = 0.02$
3. Incorrect
$P(p > 0.24) = ?$
Let us convert p into Z by using the formula of the test statistic:
$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.24 - 0.2}{0.02} = 2$
5. Correct
Option (3)
Question 4
Consider a population proportion $\pi = 0.68$. The standard error of the proportion for n = 20 is
1. 0.1043
2. 0.0109
3. 0.2176
4. 0.1404
5. 0.034
Solution
The standard error $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.68(1 - 0.68)}{20}} = \sqrt{0.01088} = 0.1043$
Option (1)
Question 5
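A simple random sample with n = 300 is drawn from a binomial process in which $\pi = 0.4$. The test statistic
for the proportion of success p = 0.35 is
1. 1.7668
2. 0.0283
3. 1.37843
4. -1.7668
5. 1.6786
Solution
The test statistic $Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.35 - 0.4}{\sqrt{\frac{0.4(1 - 0.4)}{300}}} = \frac{-0.05}{0.0283} = -1.7668$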
Option (4)
Question 6
In a random sample of 85 people from a population, X is the number of left-handed people. In the population
a proportion p = 0.20 of the people are left-handed. The standard error for the proportion equals:
1. 0.0019
2. 0.0434
3. 0.0024
4. 0.80
5. 0.4338
Solution
The standard error $\sigma_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.20(1 - 0.20)}{85}} = \sqrt{0.00188} = 0.0434$
Option (2)
Question 7
The probability of success on any trial of a binomial experiment is 25%. Find the probability that the
proportion of success in a sample of 500 is less than 22%.
1. 0.0606
2. 0.9332
3. 0.9394
4. 0.0668
5. 0.6060
Solution
$P(p < 0.22) = ?$
Step 1
Because the random variable p is given, we need to convert p into Z by using the formula of the test
statistic:
$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.22 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{500}}} = \frac{-0.03}{0.0194} = -1.55$
Step 2
$P(p < 0.22)$ can now be written as $P(Z < -1.55)$, and this takes us back to the concept of chapter 6. To
solve this probability we draw a standardised normal graph:
[Figure: standard normal curve with the area 0.0606 shaded to the left of Z = -1.55.]
$P(Z < -1.55) = 0.0606$
Option (1)
Question 8
A random sample of 50 households was selected for a telephone survey. The key question asked was, “Do you
or any member of your household own a cellular telephone with a built–in camera?” Of the 50 respondents,
15 said yes and 35 said no.
The standard error of the proportion of households with cellular telephones with a built-in camera is
1. 0.3
2. 0.0042
3. 0.0648
4. 0.7
5. 0.1684
Solution
The sample proportion $p = \frac{X}{n} = \frac{15}{50} = 0.3$
The standard error $S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.3(1 - 0.3)}{50}} = \sqrt{0.0042} = 0.0648$
Option (3)
Chapter 8
In many populations, we do not know the value of the population mean or proportion. Fortunately, we
can use the information provided in a sample, namely the sample mean (or proportion), to provide an
estimate of the population value. This is called estimation.
The objective of estimation is to determine the approximate value of a population parameter on the basis
of the sample statistic.
To estimate the population value we can use a point estimate or an interval estimate.
A confidence interval estimate provides a range of possible values that the true parameter value can assume,
along with the degree of confidence that the parameter value lies within the interval.
We often talk about a 95%, 90% or 99% confidence interval for a parameter value.
The confidence level is represented by the probability value $1 - \alpha$ associated with a confidence interval;
that means the interval contains the specified parameter with probability $(1 - \alpha)$.
In this section we will discuss the confidence interval of the mean and the confidence interval of the
proportion.
When the population standard deviation $\sigma$ is known, the confidence interval of the mean is given by
the formula
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where $Z_{\alpha/2}$ is called the critical value of the confidence interval; this value is read from the
standardized normal Z-tables.
When the population standard deviation $\sigma$ is unknown (but the sample standard deviation S of
the data is given), the confidence interval of the mean is determined by the formula
$\bar{X} \pm t_{n-1,\,\alpha/2} \frac{S}{\sqrt{n}}$
where $t_{n-1,\,\alpha/2}$ is called the critical value of the confidence interval; this value is read from the
Student t-table. In most cases the sample size n < 30.
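The Z-based interval for a known population standard deviation can be sketched in Python as below. This is an illustrative check only: the function name `z_confidence_interval` is an assumption, and the critical value comes from `NormalDist().inv_cdf`, which gives about 1.95996 rather than the rounded table value 1.96, so results can differ from hand calculations in the fourth decimal place.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value Z_(alpha/2)
    margin = z_crit * sigma / math.sqrt(n)            # critical value times standard error
    return x_bar - margin, x_bar + margin


# Hypothetical usage: sample mean 100, sigma 25, n = 50, 95% confidence
lower, upper = z_confidence_interval(100, 25, 50)
print(round(lower, 2), round(upper, 2))
```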
Activities
Question 1
A statistics practitioner took a random sample of 50 observations from a population with a standard deviation
of 25 and a sample mean of 100. The 95% confidence interval of population mean is
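1. (93.07; 106.93)
2. 100 ± 6.9296
3. (93.1; 106.93)
4. (94.18; 105.82)
5. (39.07; 106.29)
Solution
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 100 \pm Z_{0.05/2} \frac{25}{\sqrt{50}}$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The tabulated area 0.0250 corresponds to $Z_{0.025} = 1.96$.
$100 \pm 1.96 \times 3.5355 = 100 \pm 6.9296$
(93.0704; 106.9296)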
Option (1)
Question 2
From the information given below, determine the 95% confidence interval of the population mean:
1. 18902 2098
2. 109008 20192
3. 199008 200992
4. 200 0992
5. 198 209
Solution
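The confidence interval of the mean is given by the formula shown below with $\alpha = 0.05$:
$\bar{X} \pm t_{n-1,\,\alpha/2} \frac{S}{\sqrt{n}} = 200 \pm t_{100-1,\,0.05/2} \frac{5}{\sqrt{100}}$
The critical value (table value) at 99 degrees of freedom and $\alpha = 0.05$ equals 1.984.
$200 \pm 1.984 \times 0.5 = 200 \pm 0.992$
(199.008; 200.992)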
Option (3)
Question 3
A random sample of 25 was drawn from a normal distribution with a population standard deviation of 5. The
sample mean is 80. The 95% confidence interval estimate of the population mean is:
1. (78.04; 81.96)
2. (77.936; 82.064)
3. 80 ± 1.96
4. (78.40; 81.59)
5. (78.04; 81.96)
Solution
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 80 \pm Z_{0.05/2} \frac{5}{\sqrt{25}} = 80 \pm Z_{0.025} \times 1$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The value corresponding to the area 0.0250 is Z = 1.9 + 0.06 = 1.96.
$80 \pm 1.96 \times 1 = 80 \pm 1.96$
$(80 - 1.96;\ 80 + 1.96)$
(78.04; 81.96)
Option (1)
Question 4
It is known that the ages are normally distributed, with a sample mean $\bar{x} = 43.75$ and a sample standard
deviation S = 15.05 for a random sample of 8 men in a bar. The 95% confidence interval of the mean is:
1. 43.75 ± 12.5841
2. (31.1659; 56.3341)
3. (13.1659; 65.3344)
4. 43.75 ± 10.4291
5. (33.3209; 54.1791)
Solution
The confidence interval of the mean is given by the formula shown below with $\alpha = 0.05$:
$\bar{X} \pm t_{n-1,\,\alpha/2} \frac{S}{\sqrt{n}} = 43.75 \pm t_{8-1,\,0.05/2} \frac{15.05}{\sqrt{8}}$
The critical value (table value) at 7 degrees of freedom and $\alpha = 0.05$ equals 2.365.
$43.75 \pm 2.365 \times 5.3210 = 43.75 \pm 12.5842$
(31.1658; 56.3342)
Option (2)
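A t-based interval can be checked with the sketch below. Because the Python standard library has no Student t-distribution, the critical value is passed in as read from the t-table; the function name `t_confidence_interval` and its parameter `t_crit` are illustrative assumptions, not part of the workbook.

```python
import math


def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t(n - 1, alpha/2), looked up by hand.
    """
    margin = t_crit * s / math.sqrt(n)  # critical value times estimated standard error
    return x_bar - margin, x_bar + margin


# Question 4: mean 43.75, S = 15.05, n = 8, table value 2.365 at 7 degrees of freedom
lower, upper = t_confidence_interval(43.75, 15.05, 8, t_crit=2.365)
print(round(lower, 4), round(upper, 4))
```

Passing the table value explicitly keeps the calculation identical to the hand method used in the assignments.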
Question 5
A random sample of 25 was drawn from a population. The sample mean and the sample standard deviation
are 510 and 125 respectively.
When calculating a confidence interval of the population mean, the t-test should be used because
1. the population standard deviation is known.
Solution
Option (4)
Question 6
A simple random sample of 30 has been collected from a population for which it is known that the population
standard deviation is $\sigma = 10.0$. The sample mean has been calculated as 240. The 95% confidence interval
for the population mean is
1. (236.4215; 243.5785)
2. 240 ± 3.5785
3. (236.2664; 243.7336)
4. (236.2451; 243.7558)
5. (243.5785; 236.4251)
Solution
The sample mean $\bar{X} = 240$
The confidence interval of the mean is given by the formula shown below with $\alpha = 0.05$:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 240 \pm Z_{0.05/2} \frac{10}{\sqrt{30}}$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$240 \pm 1.96 \times 1.8257 = 240 \pm 3.5784$
(236.4216; 243.5784)
Option (1)
Question 7
A random sample of 24 observations is used to estimate the population mean. The sample mean and the
sample standard deviation are calculated as 1046 and 288 respectively. The critical value (table value) to
be used to construct a 95% confidence interval for the population mean is
1. 1.96
2. 1.645
3. 2.069
4. 2.064
5. 1.711
Solution
The critical value for the confidence interval equals $t_{n-1,\,\alpha/2} = t_{24-1,\,0.05/2} = t_{23,\,0.025} = 2.069$
The critical value (table value) at 23 degrees of freedom and $\alpha = 0.05$ equals 2.069.
Option (3)
Question 8
A research firm conducted a survey to determine the mean amount steady smokers spend on cigarettes
during a week. They found the distribution of amounts spent per week followed a normal distribution with a
population standard deviation of R5. A random sample of 49 steady smokers revealed that the sample mean
$\bar{X}$ = R20. Determine the 95% confidence interval for $\mu$.
1. (18.60; 21.40)
2. (19.37; 20.63)
3. (19.80; 20.20)
4. (19.83; 20.17)
5. (18.825; 21.175)
Solution
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 20 \pm Z_{0.05/2} \frac{5}{\sqrt{49}} = 20 \pm Z_{0.025} \times 0.7143$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$20 \pm 1.96 \times 0.7143 = 20 \pm 1.4000$
$(20 - 1.4000;\ 20 + 1.4000)$
(18.6; 21.4)
Option (1)
Question 9
A random sample of 40 men drank with an average of 20 cups of coffee per week during a final examination,
with a sample standard deviation Sx equal to 6 cups. A lower limit of an appropriate 90% confidence interval
for the population average number of cups of coffee drunk, is
1. 20000
2. 217674
3. 19708
4. 182326
5. 18014
Solution
The confidence interval of the mean is given by the formulae as below with 010
S
X t n 1
2 n
010 6
20 t 40 1
2 40
The critical value table value at 39 degrees of freedom and 005 equals to 1683
20 1683 09487
20 17674
20 17674 20 17674
182326 217674
Option (4)
Question 10
4. to provide an interval that covers 95% of the individual values in the population.
Solution
Option (3)
Question 11
The average cost per night of a hotel room in Port Elizabeth township is R273. Assume this estimate is based
on a sample of 46 hotels and that the sample standard deviation is R65. The 95% confidence interval of the
population mean cost per night is
1. (253.6984; 292.3016)
2. (256.7195; 289.2805)
3. (254.0083; 291.9917)
4. (265.1248; 287.2631)
5. (292.5285; 253.4715)
Solution
The confidence interval of the mean is given by the formula below with $\alpha = 0.05$:
$\bar{X} \pm t_{n-1,\,\alpha/2} \frac{S}{\sqrt{n}} = 273 \pm t_{46-1,\,0.05/2} \frac{65}{\sqrt{46}}$
The critical value (table value) at 45 degrees of freedom and $\alpha/2 = 0.025$ equals 2.014.
$273 \pm 2.014 \times 9.5837 = 273 \pm 19.3016$
(253.6984; 292.3016)
Option (1)
Question 12
The mean is $\bar{X} = 10$ for a sample of 100, and the population standard deviation is found to be $\sigma = 1$. The upper limit of
the 90% confidence interval for the population mean is
1. 0.196
2. 10.196
3. 9.804
4. 10.1645
5. 0.0196
Solution
The confidence interval of the mean is given by the formula below with $\alpha = 0.10$:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 10 \pm Z_{0.10/2} \frac{1}{\sqrt{100}} = 10 \pm Z_{0.05} \times 0.1$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
We can see that $\alpha/2 = 0.05$ lies between the areas 0.0505 and 0.0495, which correspond to Z = 1.64 and Z = 1.65.
In the same way, the corresponding value of $Z_{\alpha/2}$ lies halfway between them and is 1.645.
$10 \pm 1.645 \times 0.1 = 10 \pm 0.1645$
$(10 - 0.1645;\ 10 + 0.1645)$
(9.8355; 10.1645)
Option (4)
Question 13
A simple random sample of 30 has been collected from a population for which it is known that the population
standard deviation is $\sigma = 10.0$. The sample mean has been calculated as 240.0. The 95% confidence interval
for the population mean is
1. (236.4216; 243.5784)
2. 240 ± 3.5785
3. (236.2664; 243.7336)
4. (263.2451; 234.7558)
5. (243.5785; 263.4251)
Solution
$n = 30$, $\sigma = 10$, $\bar{X} = 240$, $\alpha = 0.05$
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 240 \pm Z_{0.05/2} \frac{10}{\sqrt{30}}$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
$240 \pm 1.96 \times 1.8257 = 240 \pm 3.5784$
(236.4216; 243.5784)
Option (1)
8.2.2 Confidence Interval of the Proportion
The standard error of the proportion is $S_p = \sqrt{\frac{p(1 - p)}{n}}$
The test statistic $Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
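A confidence interval for a proportion combines the sample proportion, its standard error and the Z critical value. The sketch below is a minimal illustration under the normal approximation; the function name `proportion_confidence_interval` is hypothetical, and `NormalDist().inv_cdf` returns about 1.95996 for 95% confidence instead of the rounded table value 1.96.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Confidence interval for a population proportion (normal approximation)."""
    p = successes / n                                   # sample proportion
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z_crit * math.sqrt(p * (1.0 - p) / n)      # critical value times standard error
    return p - margin, p + margin


# Hypothetical usage: 80 successes out of 400, 95% confidence
lower, upper = proportion_confidence_interval(80, 400)
print(round(lower, 4), round(upper, 4))
```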
Activities
Question 1
An airline has surveyed a simple random sample of travelers to find out whether they would be interested
in paying a higher fare in order to have access to e-mail during their flight. Of the 400 travelers surveyed,
80 said e-mail access would be worth a slight extra cost. The manager wants to construct a 95% confidence
interval for the population proportion of air travelers who are in favor of the airline’s e-mail idea.
4. The lower confidence limit is 0.1608
Solution
1. Correct
The sample proportion $p = \frac{\text{number of successes}}{\text{Total}} = \frac{80}{400} = 0.2$
2. Incorrect
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
3. Correct
The standard error of the proportion is $S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.2(1 - 0.2)}{400}} = 0.02$
4. Correct
5. Correct
The confidence interval of the proportion is $p \pm Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
$0.2 \pm 1.96 \times 0.02 = 0.2 \pm 0.0392$
(0.1608; 0.2392)
Option (2)
Question 2
A random sample of 80 observations results in 50 successes. The 95% confidence interval for the population
proportion of successes is
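1. 0.625 ± 0.1060
2. (0.536; 0.714)
3. (0.622; 0.628)
4. (0.519; 0.731)
5. (0.591; 0.371)
Solution
The sample proportion $p = \frac{\text{number of successes}}{\text{Total}} = \frac{50}{80} = 0.625$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The standard error of the proportion is $S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.625(1 - 0.625)}{80}} = 0.0541$
The confidence interval of the proportion is $p \pm Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
$0.625 \pm 1.96 \times 0.0541 = 0.625 \pm 0.1060$
(0.519; 0.731)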
Option (4)
Question 3
According to statistics reported on STATSA, a surprising number of motor vehicles are not covered by
insurance. The results, consistent with the STATSA report, showed 46 of 200 vehicles were not covered by
insurance. The 95% confidence interval estimate for the population proportion is
1. (0.1716; 0.2884)
2. (0.2317; 0.2283)
3. (0.23; 0.2884)
4. (0.2884; 0.1716)
5. (0.1810; 0.2790)
Solution
The sample proportion $p = \frac{\text{number of successes}}{\text{Total}} = \frac{46}{200} = 0.23$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
The standard error of the proportion is $S_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.23(1 - 0.23)}{200}} = 0.0298$
The confidence interval of the proportion is $p \pm Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
$0.23 \pm 1.96 \times 0.0298 = 0.23 \pm 0.0584$
(0.1716; 0.2884)
Option (1)
Chapter 9
SUMMARY
This statement needs to be substantiated using real data. We do not want to simply accept the
researcher's or analyst's statement as it stands; we need to collect data in an attempt to confirm
the hypothesis.
The process of testing these hypotheses using sample data is called statistical hypothesis testing. This
involves the knowledge of probability to come to a conclusion about the stated hypothesis.
A series of five steps is used to determine whether or not we reject a null hypothesis based on
sample data, as follows:
Step 1. To formulate the formal hypothesis statements, called the null hypothesis H0 and the
alternative hypothesis H1.
Step 2. To determine the appropriate test for the given hypothesis statement. This can be a
one-tail test or a two-tail test, depending on the problem being considered. This step has to be correctly
specified.
When the symbol < or > appears in the alternative hypothesis H1, we refer to a one-tail test.
When the symbol $\neq$ appears in the alternative hypothesis H1, we refer to a two-tail test.
Step 3. To specify the level of significance, denoted by $\alpha$. The level of significance is a fixed probability
of making the error of rejecting the null hypothesis even though it is true. This is specified by the
researcher and represents the degree of accuracy that the test should exhibit. In practice, we use
$\alpha$ equal to 5%, 10% or 1%.
Step 4. To calculate the relevant test statistic. This is a quantity calculated from sample data. It is
used to determine whether the null hypothesis H0 should be rejected or not.
Step 5. To make a decision, using one of the following rules:
(1) When the test statistic is greater than the critical value, we reject the null hypothesis H0;
otherwise we fail to reject H0. The critical value, or table value, defines the range of possible values
of the test statistic for which we reject H0.
(2) When the p-value is less than the level of significance $\alpha$, we reject the null hypothesis H0;
otherwise we fail to reject H0.
The p-value is the probability of getting a value of the test statistic as extreme as or more extreme than
that observed by chance alone, if the null hypothesis is true.
In making use of (1) and (2) above, our decision will be based on the nature of the alternative
hypothesis (whether it is a one-tail or two-tail test).
(3) When the value zero lies between the two confidence limits, we fail to reject H0, and when the
value zero lies outside the confidence limits, we reject H0.
A Type 1 error occurs when the null hypothesis H0 is rejected when it is in fact true.
A Type 2 error occurs when the null hypothesis H0 is not rejected when it is in fact false.
9.1 Hypothesis Testing of the Mean
The test statistic when the population standard deviation is known equals $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
The test statistic when the sample standard deviation is known equals $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
To calculate the p-value, we need to know both
the value of the test statistic Z and the level of significance
The calculation of p-value will be based on the nature of the alternative hypothesis H1 (Whether it is
one-tail or two-tail test). In case of a two-tail test, the value of the p-value is multiplied by two.
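The p-value rule can be illustrated with a short Python sketch. It assumes a Z test statistic and, for a one-tail test, that the observed statistic lies in the direction stated by H1; the function name `z_p_value` and its `two_tailed` flag are illustrative assumptions.

```python
from statistics import NormalDist


def z_p_value(z, two_tailed=False):
    """p-value for a Z test statistic.

    The tail area beyond |z| is doubled for a two-tail test.
    """
    tail = 1.0 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return 2.0 * tail if two_tailed else tail


# One-tail test with Z = 2.8 and two-tail test with Z = 2.31
print(round(z_p_value(2.8), 4))
print(round(z_p_value(2.31, two_tailed=True), 4))
```

The result is then compared with the level of significance: H0 is rejected when the p-value is smaller.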
Activities
Question 1
$H_0: \mu = 1000$ vs $H_1: \mu \neq 1000$
Solution
$H_0: \mu = 1000$ vs $H_1: \mu \neq 1000$
1. Correct
2. Correct
The standard error $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{200}{\sqrt{100}} = 20$
3. Incorrect
4. Correct
The critical value depends on the nature of the alternative hypothesis H1 and the level of significance $\alpha$.
Because this is a two-tail test: $\frac{\alpha}{2} = \frac{0.01}{2} = 0.005$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
5. Correct
Option (1)
Question 2
$H_0: \mu = 50$ vs $H_1: \mu > 50$
Solution
$H_0: \mu = 50$ vs $H_1: \mu > 50$
$\sigma = 10$, $n = 64$, $\bar{X} = 53.5$, $\alpha = 0.025$
1. Correct
2. Correct
The critical value depends on the nature of the alternative hypothesis H1 and the level of significance $\alpha$.
Because this is a one-tail test: $\alpha = 0.025$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
3. Correct
The p-value depends on the nature of the hypothesis and the test statistic Z. (Because this is a one-tail test, the
p-value is not multiplied by two.)
p-value = $P(Z > 2.8)$. This takes us back to chapter 6.
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
5. Correct
Reject H0 when the p-value is less than $\alpha = 0.025$. Since 0.0026 < 0.025, H0 is rejected at the 2.5% level of
significance.
Option (4)
Question 3
For a sample of 35 items from a population for which the population standard deviation is $\sigma = 20.5$, the
sample mean is $\bar{x} = 458.0$. At the 0.05 level of significance, the tutor wants to test $H_0: \mu = 450$ against
$H_1: \mu \neq 450$.
Solution
$H_0: \mu = 450$ vs $H_1: \mu \neq 450$
1. Correct
2. Correct
The critical value depends on the nature of the alternative hypothesis H1 and the level of significance $\alpha$.
Because this is a two-tail test: $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
3. Correct
4. Incorrect
The p-value depends on the nature of the hypothesis and the test statistic Z. (Because this is a two-tail test, the
p-value is multiplied by two.)
The test statistic $Z = \frac{458 - 450}{20.5/\sqrt{35}} = 2.31$
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
p-value = $2 \times 0.0104 = 0.0208$
5. Correct
Reject H0 when the p-value is less than $\alpha = 0.05$. Since 0.0208 < 0.05, H0 is rejected at the 5% level of
significance.
Option (4)
Question 4
A professor of statistics refuses the claim that the average student spends 3 hours studying for the exam.
Which one of the following hypothesis is used to test the claim?
1. H0 : 3 H1 : 3
2. H0 : 3 H1 : 3
3. H0 : 3 H1 : 3
4. H0 : 3 H1 : 3
5. H0 : x 3 H1 : x 3
Solution
Option (2)
Question 5
If the value of the test statistic z equals 0.87 then the p-value is:
1. 0.1922
2. 0.8078
3. 0.3844
4. 0.4681
Solution
$H_0: \mu = 450$ vs $H_1: \mu \neq 450$
The p-value depends on the nature of the hypothesis and the test statistic Z. (Because this is a two-tail test, the
p-value is multiplied by two.)
p-value = $2 \times P(Z > 0.87) = 2 \times 0.1922 = 0.3844$
Option (3)
Question 6
A laboratory tested a random sample of 30 chicken eggs and found that the mean amount of cholesterol
per egg is 235 milligrams and the standard deviation is 20 milligrams. If $H_0: \mu = 230$ is tested against
$H_1: \mu \neq 230$ at the 5% significance level, with the assumption that the cholesterol of chicken eggs is normally
distributed, and the test statistic Z is equal to 1.37, then the p-value is
1. 0.0853
2. 0.9147
3. 0.8533
4. 0.6125
5. 0.1706
Solution
$H_0: \mu = 230$ vs $H_1: \mu \neq 230$
The p-value depends on the nature of the hypothesis and the test statistic Z.
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
Because this is a two-tail test, p-value = $2 \times 0.0853 = 0.1706$
Option (5)
Question 7
Which one of the following describe correctly a possible way to do hypothesis testing?
B. Calculate the p–value from the significance level, calculate the test statistic and compare the test sta-
tistic to the p–value.
C. Calculate the test statistic, find the corresponding p–value and compare the p–value to the significance
level.
D. Find the critical Z–value from the significance level, calculate the test statistic and compare the test
statistic to the critical Z–value.
E. Find the critical Z–value from the significance level, calculate the p–value from the critical Z–value,
and compare the p–value to the significance level.
1. B only
2. B and D
3. E only
4. B and E
5. C and D
Solution
A. Incorrect
B. Incorrect
The p-value is calculated from the test statistic and compare the p-value to the level of significance.
C. Correct
D. Correct
p-value is calculated from the test statistic and the critical value is calculated from the level of significance
and we can compare the test statistic and the critical value.
Option (5)
Question 8
Solution
Option (5)
Question 9
A bakery stated that the average number of breads sold daily is 3000. An employee thinks that the actual
value might differ from this and wants to test this statement. The correct hypotheses are:
1. H0 : 3000 H1 : 3000
2. H0 : 3000 H1 : 3000
3. H0 : 3000 H1 : 3000
4. H0 : 3000 H1 : 3000
5. H0 : 3000 H1 : 3000
Solution
Option (1)
Question 10
1. 0025
2. 196
3. 0975
4. 062
5. 196
Solution
H0 : 15 s H1 : 15
5 n 10 X 181 003
Option (2)
Question 11
A researcher wants to carry out a hypothesis test involving the mean for a sample of size n 18. The popula-
tion standard deviation is unknown, but she is reasonably sure that the underlying population is approximately
normally distributed. The test statistic she should use in carrying out the analysis is:
1. Z–test
2. t–test
3. Binomial test
4. Poisson test
5. Normal test
Solution
Option (2)
The test statistic for the proportion is $Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
Activities
Question 1
Calculate the p-value of the test of the following hypothesis given that the sample proportion p = 0.63, n =
100 and the calculated test statistic z = 0.05. The null hypothesis and the alternative hypothesis are
$H_0: \pi = 0.60$ vs $H_1: \pi > 0.60$
1. 0.4801
2. 0.5000
3. 0.5199
4. 0.6915
5. 0.7088
Solution
$H_0: \pi = 0.60$ vs $H_1: \pi > 0.60$
The p-value depends on the nature of the hypothesis and the test statistic Z. This is a one-tail test, so the
p-value is not multiplied by two.
The standardized Z normal table
Z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.0
:
-0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
p-value = 0.4801
Option (1)
Question 2
$H_0: \pi = 0.70$ vs $H_1: \pi > 0.70$
A random sample of 100 produced p = 0.73 with a test statistic of 0.66. The p-value is:
1. 0.03
2. 0.0003
3. 0.5239
4. 0.7454
5. 0.2546
Solution
$H_0: \pi = 0.70$ vs $H_1: \pi > 0.70$
The p-value depends on the nature of the hypothesis and the test statistic Z. (This is a one-tail test.)
p-value = $P(Z > 0.66) = 0.2546$
Option (5)
Question 3
A popular weekly magazine asserts that fewer than 40% of households in South Africa have changed their
lifestyles because of escalating gas prices. A recent survey of 100 households finds that 67 households have
made lifestyle changes due to escalating gas prices.
1. $H_0: \pi = 0.40$ vs $H_1: \pi < 0.40$
Solution
$H_0: \pi = 0.40$ vs $H_1: \pi < 0.40$
The sample proportion $p = \frac{\text{number of successes}}{\text{Total}} = \frac{67}{100} = 0.67$
1. Correct
Because the statement speaks about fewer than 40% of households, H1 uses the symbol <.
2. Correct
3. Correct
The critical value depends on the nature of the alternative hypothesis H0 and level of significance
010
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
- 3.0
:
-1.2 0.1151 0.1131 0.1093 0.1075 0.1076 0.1056 0.1038 0.1020 0.1003 0.0985
4. Correct
The p-value depends on the nature of hypothesis and the test statistic Z
5. Incorrect
Rule: Reject H0 when the p-value is less than the level of significance . Since 0.0000 0.10, we
reject H0 at 10% of level of significance.
Option (5)
Question 4
In testing the hypothesis H0 : 040 H1 : 040 at the 5% significance level, if the sample proportion
is 045 and the p–value is 00764 the appropriate conclusion would be:
1. to reject H0
2. not to reject H0
3. to reject H1
Solution
Rule: Fail to reject H0 when the p-value is greater than the level of significance $\alpha$. Since 0.0764 > 0.05,
we fail to reject H0 at the 5% level of significance.
Option (2)
Chapter 10
Chi–squared Test
SUMMARY
The Chi-squared test is applied when dealing with data that are categorical (nominal or qualitative) in
nature.
The Chi-squared test makes use of a contingency table for two or more qualitative variables.
The Chi-squared test operates by comparing the observed values in a frequency table or contingency
table to the values that we would expect if a given hypothesis is true.
A series of five steps is used to determine whether or not we reject a null hypothesis based on
sample data, as follows:
Step 1. To formulate the formal hypothesis statements, called the null hypothesis H0 and the
alternative hypothesis H1.
H0: The two variables are independent, or the two variables are not related.
H1: The two variables are dependent, or the two variables are related.
Step 2. To determine the appropriate degrees of freedom df: the number of rows minus 1 times the
number of columns minus 1, denoted by $(r - 1)(c - 1)$, for the given frequency table, called the observed
frequency table.
Step 3. To specify the level of significance, denoted by $\alpha$. The level of significance is a fixed probability
of making the error of rejecting the null hypothesis even though it is true. This is specified by the
researcher and represents the degree of accuracy that the test should exhibit. In practice, we use
$\alpha$ equal to 5%, 10% or 1%.
Step 4. To calculate the relevant test statistic. This is a quantity calculated from the observed frequencies.
It is used to determine whether the null hypothesis H0 should be rejected or not.
The test statistic $\chi^2 = \sum \frac{(O - E)^2}{E}$
Step 5. To make a decision by using the test statistic and the critical value.
Rule: When the test statistic is greater than the critical value, we reject the null hypothesis H0;
otherwise we fail to reject H0. The critical value, or table value, defines the range of possible values of
the test statistic for which we reject H0.
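Steps 2 and 4 can be sketched in Python as follows. This is an illustrative helper, not the workbook's method of presentation: the function name `chi_squared_test` is an assumption, expected frequencies are computed as row total times column total divided by n, and the critical value must still be read from the Chi-squared table. The example table is the pension plan data from Question 8 of this chapter.

```python
def chi_squared_test(observed):
    """Return (test statistic, degrees of freedom, expected frequencies).

    observed is a list of rows of observed frequencies, without totals.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: For, Against; columns: Blue collar, White collar, Managers
statistic, df, expected = chi_squared_test([[67, 32, 11], [63, 18, 9]])
print(df, expected)
```

The decision in Step 5 then compares `statistic` with the table value for `df` degrees of freedom at the chosen level of significance.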
Activities
Question 1
If a contingency table has 4 rows and 5 columns, how many degrees of freedom are there for the chi-square
($\chi^2$) test for independence?
1. 20
2. 12
3. 15
4. 9
5. 10
Solution
The degrees of freedom df = $(r - 1)(c - 1) = (4 - 1)(5 - 1) = 3 \times 4 = 12$
Option (2)
Question 2
A B Total
Yes 40 25 65
No 35 45 80
Total 75 70 145
The professor wants to test the independence for the two variables given in columns and in rows at the 5%
level of significance.
5. Suppose that the calculated test statistic is $\chi^2 = 4.5455$; the null hypothesis H0 is rejected at the 5% level of
significance.
Solution
        A     B     Total
Yes     40    25    65
No      35    45    80
Total   75    70    145 = n
$\alpha = 0.05$
1. Correct
2. Correct
3. Correct
The critical value is determined by knowing the value of the degrees of freedom and the level of
significance $\alpha = 0.05$.
The degrees of freedom df = $(r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$
Using the Chi-squared tables, the critical value is 3.841.
4. Incorrect
The expected frequency $= \frac{65 \times 70}{145} = 31.3793$
5. Correct
Rule: Reject H0 when the value of the test statistic is greater than the critical value. Since 4.5455 >
3.841, H0 is rejected at the 5% level of significance.
Option (4)
Question 3
A sport preference poll showed the following data for men and women:
Favorite sport
Gender Basketball Football Golf Tennis Total
Male 24 17 30 18 89
Female 21 20 22 12 75
Total 45 37 52 30 164
Use the 5% level of significance and test to determine whether sport preferences depend on gender.
5. Suppose that the calculated test statistic is $\chi^2 = 3.30$; the conclusion is to reject the null hypothesis H0.
Solution
1. Correct
2. Correct
3. Correct
4. Correct
5. Incorrect
Rule: Reject H0 when the value of the test statistic is greater than the critical value. Since 3.30 <
7.815, H0 is not rejected at the 5% level of significance.
Option (5)
Question 4
In a test for the independence of two variables, one of the variables has two possible categories and the other
has three possible categories. Suppose that the calculated test statistic is $\chi^2 = 3.456$ at the 5% level of
significance.
Solution
1. Correct
2. Correct
3. Incorrect
4. Correct
5. Correct
Rule: Reject H0 when the value of the test statistic is greater than the critical value. Since 3.456 <
5.991, H0 is not rejected at the 5% level of significance.
Option (3)
Question 5
Solution
Option (2)
Question 6
Two employees (Peter and John) are monitored to determine whether there is any difference in the proportions
of acceptance parts produced by the employees. The sample of parts produced is given below:
5. Suppose the calculated test statistic 2 40 H0 is not rejected at 5% level of significance.
Solution
1. Incorrect
The two variables are independent
2. Incorrect
The observed frequency is 685.
3. Incorrect
The expected frequency $= \frac{\text{Row total of the cell} \times \text{Column total of the cell}}{\text{Total number } n} = \frac{950 \times 300}{1000} = 285$
4. Correct
The degrees of freedom df = $(r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 \times 1 = 1$
5. Incorrect
Rule : Reject H0 when the value of the test statistic is greater than the critical value.
The critical value is determined by knowing the value of the degrees of freedom and the level of
significance 005
Using the Chi-squared tables
Option (4)
Question 7
Let X = level of income and Y = political preference. Use the results shown in the table below and test at
the 1% level of significance whether the political preference and the level of income are independent.
                   Political party
Level of income    A     B     C
1                  23    11    1
2                  40    75    31
3                  16    107   60
4                  2     14    10
Suppose that the calculated test statistic is 69.3875. Which statement is incorrect?
Solution
Level of income    A     B     C     Total
1                  23    11    1     35
2                  40    75    31    146
3                  16    107   60    183
4                  2     14    10    26
Total              81    207   102   390 = n
1. Correct
2. Incorrect
3. Correct
4. Correct
5. Correct
Option (2)
Question 8
The trustee of a company’s pension plan has solicited the opinions of a sample of the company’s employees
about a proposed revision of the plan. A breakdown of the responses is shown in the table below. We want
to test whether there is enough evidence to infer that the responses differ among the three groups of employees.
Which option provides the correct expected frequencies needed to calculate the $\chi^2$ value?
1.
Responses Blue collar White collar Managers
For 11 32 67
Against 9 18 63
2.
Responses Blue collar White collar Managers
For 71.5 27.5 11
Against 58.5 22.5 9
3.
Responses Blue collar White collar Managers
For 72 27 11
Against 59 22 9
4.
Responses Blue collar White collar Managers
For 71 28 11
Against 58 23 9
5.
Responses Blue collar White collar Managers
For 71 28 11
Against 59 22 9
Solution
Group of employees
Responses Blue collar White collar Managers Total
For 67 32 11 110
Against 63 18 9 90
Total 130 50 20 200
Expected frequencies
Responses Blue collar White collar Managers Total
For 71.5 27.5 11 110
Against 58.5 22.5 9 90
Total 130 50 20 200
The expected frequencies are calculated using the formula
$E = \dfrac{\text{Row total of the cell} \times \text{Column total of the cell}}{\text{Total number } n}$
For, Blue collar: $E = \dfrac{110 \times 130}{200} = 71.5$
For, White collar: $E = \dfrac{110 \times 50}{200} = 27.5$
For, Managers: $E = \dfrac{110 \times 20}{200} = 11$
Against, Blue collar: $E = \dfrac{90 \times 130}{200} = 58.5$
Against, White collar: $E = \dfrac{90 \times 50}{200} = 22.5$
Against, Managers: $E = \dfrac{90 \times 20}{200} = 9$
Option (2)
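The six calculations above follow one pattern, so the whole table of expected frequencies can be produced in a single pass. The following is a minimal sketch using the observed counts from the solution; the variable names are illustrative only.

```python
# Observed responses by group of employees (from the solution table).
observed = {
    "For": {"Blue collar": 67, "White collar": 32, "Managers": 11},
    "Against": {"Blue collar": 63, "White collar": 18, "Managers": 9},
}

# Row totals, column totals and the grand total n.
row_totals = {resp: sum(groups.values()) for resp, groups in observed.items()}
col_totals = {}
for groups in observed.values():
    for group, count in groups.items():
        col_totals[group] = col_totals.get(group, 0) + count
n = sum(row_totals.values())

# Expected frequency of each cell = row total * column total / n.
expected = {
    resp: {group: row_totals[resp] * col_totals[group] / n for group in col_totals}
    for resp in observed
}

for resp, groups in expected.items():
    print(resp, groups)
```

The expected frequencies keep the same row and column totals as the observed table, which is a quick way to check a hand calculation.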
Question 9
Consider the following EXCEL output testing for independence of two variables:
Contingency table
Column 1 Column 2 Column 3 Total
Row 1 93 91 174 358
Row 2 907 909 1826 3642
Total 1000 1000 2000 4000
Chi-squared stat 0.331
df 2
p-value 0.847
Chi-squared critical 5.992
Which one of the following statements is incorrect?
2. Since the p-value = 0.847 > 0.05, the two variables are dependent.
Solution
Contingency tables
Column 1 Column 2 Column 3 Total
Row 1 93 91 174 358
Row 2 907 909 1826 3642
Total 1000 1000 2000 4000
Chi-squared stat 0.331
df 2
p-value 0.847
Chi-squared critical value 5.992
1. Correct
2. Incorrect
When the p-value is greater than the level of significance, we fail to reject H0 and therefore the two
variables are independent.
3. Correct
4. Correct
5. Correct
The expected frequency $= \dfrac{\text{Row total of row 1} \times \text{Column total of column 3}}{\text{Total number } n} = \dfrac{358 \times 2000}{4000} = 179$
Option (2)
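The statistic and p-value in the EXCEL output can be reproduced by hand. The sketch below is an illustration under one stated assumption: it uses the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-squared distribution is exp(-x/2). That shortcut does not apply to other degrees of freedom, where tables or software are needed.

```python
import math

# Observed frequencies from the contingency table (2 rows, 3 columns).
observed = [
    [93, 91, 174],
    [907, 909, 1826],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-squared statistic: sum of (O - E)^2 / E with E = row total * column total / n.
stat = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        stat += (o - e) ** 2 / e

# Closed form valid only for df = 2 (assumption stated in the text above).
p_value = math.exp(-stat / 2)

alpha = 0.05
print(round(stat, 3), round(p_value, 3), p_value > alpha)
```

Because the p-value is greater than 0.05, H0 is not rejected, which is why statement 2 is the incorrect one.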
Question 10
A sample of 500 shoppers was selected in a large metropolitan area to determine various information
concerning consumer behavior. Among the questions asked was, “Do you enjoy shopping for clothing?” The
results are summarized in the following contingency table:
1. 1.5766
2. 23547
3. 53214
4. 157662
5. 0.2135
Solution
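Observed frequencies, with expected frequencies in parentheses:
         Male          Female        Total
Yes      12 (9.51)     23 (25.49)    35
No       10 (12.49)    36 (33.51)    46
Total    22            59            81 = n
The test statistic is
$\chi^2 = \sum \dfrac{(O - E)^2}{E} = \dfrac{(12 - 9.51)^2}{9.51} + \dfrac{(23 - 25.49)^2}{25.49} + \dfrac{(10 - 12.49)^2}{12.49} + \dfrac{(36 - 33.51)^2}{33.51} = 1.5766$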
Option (1)
Question 11
1. The expected frequency of buyers above 45 and bought medium car is 396
2. The observed frequency of buyers under the age of 30 and bought large car is 34
5. Suppose that the test statistic $\chi^2_{STAT}$ is 18.35; H0 cannot be rejected at the 5% level of significance.
Solution
1. Correct
3. Correct
4. Correct
The critical value is determined by the degrees of freedom and the level of
significance, $\alpha = 0.05$.
5. Incorrect
Since the test statistic = 18.35 is greater than the critical value, we reject H0
Option (5)
Question 12
5. The expected frequency for cell Autumn and Spar is equal to 18648.
Solution
1. Incorrect
H0: The two variables are independent.
2. Incorrect
H1: The two variables are dependent.
3. Incorrect
The degrees of freedom $df = (r - 1)(c - 1) = (2 - 1)(4 - 1) = 1 \times 3 = 3$
4. Incorrect
The critical value is 7.815
5. Correct
The expected value is
Option (5)
Question 13
Conduct a test to determine whether the two classifications L and M are independent, using the data given in
the table below (use $\alpha = 0.05$).
M1 M2 Total
L1 28 68 96
L2 56 36 92
Total 84 104 188
5. Suppose that the test statistic $\chi^2_{cal} = 19.094$; H0 is rejected at the 5% level of significance.
Solution
1. Correct
2. Correct
3. Incorrect
4. Correct
5. Correct
Option (3)
Chapter 11
SUMMARY
We explore methods for describing possible relationships or associations between two interval (or ordinal)
scaled variables.
When dealing with two variables, we can visually explore whether there is an association by drawing
a scatter plot of one variable against the other. This gives us hints about the possible form of the
association. The form of association can be linear or non-linear. In this module we focus on linear
relationships only.
To measure the strength of the association, we calculate the correlation coefficient from the
sample data.
We are interested in studying the association between two variables: the dependent
variable, denoted by Y, and the independent variable, denoted by X.
The independent variable provides the basis for calculating the value of the dependent variable.
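The estimated regression model is given by the equation $\hat{Y} = b_0 + b_1 X$
The least squares method provides the formulae for the estimators $b_0$ and $b_1$
The formulae are $b_0 = \bar{Y} - b_1 \bar{X}$, where $\bar{Y} = \dfrac{\sum Y}{n}$ and $\bar{X} = \dfrac{\sum X}{n}$
$b_1 = \dfrac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}$ or $b_1 = \dfrac{\sum X_i Y_i - \frac{1}{n} \sum X_i \sum Y_i}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}$
The correlation coefficient is
$r = \dfrac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$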
If r is close to zero, this indicates that there is little or no linear relationship between X and Y.
If r is between 0.3 and 0.5 in absolute value, this indicates a weak to moderate linear relationship
between X and Y.
If r is between 0.5 and close to 1 in absolute value, this indicates a strong linear relationship between X and Y.
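The residual or error term is $e_i = Y_i - \hat{Y}_i$
The coefficient of determination is $r^2 = \dfrac{SSR}{SST}$, where
$SSR = b_0 \sum Y_i + b_1 \sum XY - \dfrac{\left(\sum Y\right)^2}{n}$
$SST = \sum Y_i^2 - \dfrac{\left(\sum Y\right)^2}{n}$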
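The summary formulae can be collected into one small routine. The sketch below is an illustration only: the function name and the five data points are hypothetical and are not taken from the workbook. It computes b0, b1 and r from the sums, then obtains the coefficient of determination as SSR/SST so that it can be compared with the square of r.

```python
import math


def regression_summary(xs, ys):
    """Least squares estimates, correlation and r^2 from the summary formulae."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Slope and intercept of the estimated regression line.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n

    # Correlation coefficient.
    r = (sum_xy - sum_x * sum_y / n) / math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )

    # Coefficient of determination as SSR / SST.
    ssr = b0 * sum_y + b1 * sum_xy - sum_y ** 2 / n
    sst = sum_y2 - sum_y ** 2 / n
    return {"b0": b0, "b1": b1, "r": r, "r2": ssr / sst}


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
result = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 4) for name, value in result.items()})
```

For a least squares line, SSR/SST and the square of r agree, which gives a useful check on hand calculations.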
Activities
Question 1
To analyse the relationship between advertising and sales, the manager of a furniture store
recorded the monthly advertising budget and sales (thousands of rands) for a sample of 6 months. The data are presented
below:
Advertising X    23     46     60     54     28     33
Sales Y          96    113    128     98     89   12.5
1.
$\hat{y} = 35.8 + 1.32X$
2.
y 1328 358X
3.
y 4066 894X
4.
y 1491 4019X
5.
y 048 023X
Solution
         X       Y       X²        Y²         XY
        23      96      529      9216       2208
        46     113     2116     12769       5198
        60     128     3600     16384       7680
        54      98     2916      9604       5292
        28      89      784      7921       2492
        33    12.5     1089    156.25      412.5
Total  244   536.5    11034  56050.25    23282.5
$\sum Y = 536.5 \quad \sum X = 244 \quad \sum XY = 23282.5 \quad \sum X^2 = 11034$
$\sum Y^2 = 56050.25 \quad b_1 = 1.3176$
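$b_0 = \bar{Y} - b_1 \bar{X}$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{536.5}{6} = 89.4167$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{244}{6} = 40.6667$
$b_0 = \bar{Y} - b_1 \bar{X} = 89.4167 - 1.3176(40.6667) = 35.8343$
The regression line is $\hat{Y} = b_0 + b_1 X$, therefore $\hat{Y} = 35.8343 + 1.3176X$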
Option (1)
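The solution can be checked with a short calculation. The sketch below assumes that the last sales value is 12.5, which is what the column totals in the solution table imply. Unrounded arithmetic gives a slope close to 1.318 rather than the 1.3176 shown above; both lead to the same rounded line.

```python
# Advertising budget (X) and sales (Y).
# Assumption: the last Y value is 12.5, as implied by the totals in the solution.
xs = [23, 46, 60, 54, 28, 33]
ys = [96, 113, 128, 98, 89, 12.5]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least squares slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(f"y-hat = {b0:.1f} + {b1:.2f}X")
```

Printing the intermediate sums is an easy way to compare each column total with the table in the solution.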
Question 2
A study was conducted to determine the effects of sleep deprivation on people’s ability to solve problems.
The results were obtained as follows:
Number of hours X      8      12      16      20      24
Number of errors Y   8.6    6.10    8.14   14.12   16.12
1. 0.5765
2. 10.616
3. 1.392
4. 63246
5. 22022
Solution
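$b_1 = 0.5765$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{8.6 + 6.10 + 8.14 + 14.12 + 16.12}{5} = \dfrac{53.08}{5} = 10.616$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{8 + 12 + 16 + 20 + 24}{5} = \dfrac{80}{5} = 16$
$b_0 = \bar{Y} - b_1 \bar{X} = 10.616 - 0.5765 \times 16 = 1.392$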
Option (3)
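The slope given in this solution can be confirmed from the deviations about the means. The sketch below is an illustration that assumes the Y values are 8.6, 6.10, 8.14, 14.12 and 16.12, which is the reading that reproduces the total of 53.08 used above.

```python
# Hours without sleep (X) and number of errors (Y).
# Assumption: Y values read with the decimal points restored.
xs = [8, 12, 16, 20, 24]
ys = [8.6, 6.10, 8.14, 14.12, 16.12]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross products in deviation form.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_x = sum((x - x_bar) ** 2 for x in xs)

b1 = ss_xy / ss_x          # slope
b0 = y_bar - b1 * x_bar    # intercept

print(round(b1, 4), round(b0, 3))
```

The deviation form and the computational form with sums give the same slope; the deviation form is simply easier to follow for a small data set.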
Question 3
Consider a data set containing number of pages and price for n = 15 books on a professor’s bookshelf.
Solution
1. Correct
2. Correct
Because r is positive, it means that when the variable X increases, the variable Y increases as well.
3. Correct
4. Incorrect
5. Correct
Option (4)
Question 4
If all the points in a scatter diagram lie on the regression line, then the correlation coefficient r:
1. must be 1
2. must be 1
3. must be either 1 or −1
4. must be 0
Solution
Option (3)
Question 5
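$\sum XY = 1082 \quad \sum Y = 122 \quad \sum X = 39 \quad \sum X^2 = 341 \quad \sum Y^2 = 3446$
$SS_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 130.4 \quad SS_X = \sum (X_i - \bar{X})^2 = 36.8$
3. The regression line is $\hat{Y} = -3.21 + 3.54X$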
5. When X = 10, then the estimated $\hat{Y} = 35.4$
Solution
1. Incorrect
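$b_1 = \dfrac{SS_{XY}}{SS_X} = \dfrac{130.4}{36.8} = 3.5435$
$SS_{XY} = \sum XY - \dfrac{\sum X \sum Y}{n}$
$\bar{Y} = \dfrac{\sum Y}{n} = \dfrac{122}{5} = 24.4$
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{39}{5} = 7.8$
$b_0 = \bar{Y} - b_1 \bar{X} = -3.2393$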
2. Incorrect
3. Correct
The regression line is $\hat{Y} = b_0 + b_1 X$, that is, $\hat{Y} = -3.21 + 3.54X$
4. Incorrect
5. Incorrect
Option (3)
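When only summary statistics are given, as in this question, the slope and intercept follow directly from the sums. The sketch below is an illustration with n = 5 taken from the solution; the `predict` helper is a hypothetical name added for the example.

```python
# Summary statistics from the question, with n = 5 as used in the solution.
n = 5
sum_xy, sum_x, sum_y, sum_x2 = 1082, 39, 122, 341

# Computational forms of the sums of squares.
ss_xy = sum_xy - sum_x * sum_y / n   # 130.4
ss_x = sum_x2 - sum_x ** 2 / n       # 36.8

b1 = ss_xy / ss_x
b0 = sum_y / n - b1 * sum_x / n


def predict(x):
    """Estimated value of Y on the fitted line for a given X (hypothetical helper)."""
    return b0 + b1 * x


print(round(b1, 4), round(b0, 2), round(predict(10), 1))
```

With unrounded coefficients the estimate at X = 10 is about 32.2, so a statement that ignores the intercept would not match the fitted line.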
Question 6
1. If r = 0.86, it implies that the relationship between the two variables examined is strong enough.
2. If $r^2 = 0.70$, it implies that 70% of the variation in Y is explained by the regression line.
Solution
1. Correct
2. Correct
3. Incorrect
The relationship can also be negative: when one variable increases, the other one decreases.
4. Correct
5. Correct
Option (3)
Question 7
A production manager has compared the dexterity scores of five assembly line employees with their hourly
productivity (units per hour). A least-squares regression equation is calculated as $\hat{Y} = 19.2 + 3.0X$. If a job
applicant has a dexterity score of 15, his predicted productivity per hour will be
1. 19.2 units
2. 22.2 units
3. 64.2 units
4. 15.0 units
5. 45.0 units
Solution
The estimated equation is $\hat{Y} = 19.2 + 3.0X$. For $X = 15$: $\hat{Y} = 19.2 + 3.0(15) = 64.2$ units.
Option (3)
Question 8
The following data represent marks obtained in a mathematics test, X and an economics test, Y :
X 22 14 17 7 10
Y 7 17 12 27 22
$\sum X = 70 \quad \sum Y = 85 \quad \sum XY = 1005 \quad \sum X^2 = 1118$
Suppose that the value of $SS_{XY} = -185$; the slope is equal to
1. 1341
2. 35768
3. 134
4. 138
5. −1.34
Solution
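$SS_{XY} = \sum XY - \dfrac{\sum X \sum Y}{n}$
$SS_X = \sum X^2 - \dfrac{\left(\sum X\right)^2}{n} = 1118 - \dfrac{70^2}{5} = 1118 - 980 = 138$
The slope equals $b_1 = \dfrac{SS_{XY}}{SS_X} = \dfrac{-185}{138} = -1.3406$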
Option (5)
Question 9
4. Using the regression line, the estimated value of Y when X = 20 equals 4932
Solution
1. Correct
2. Correct
3. Incorrect
4. Correct
5. Correct
Option (3)
Question 10
X: 81 75 71 61 96 56 85 18
Y : 80 82 83 57 100 30 68 56
Use the summary statistics below and calculate the coefficient of determination
$\sum X_i^2 = 40849 \quad \sum X_i = 543 \quad \sum Y_i^2 = 41922$
$\sum Y_i = 556 \quad \sum X_i Y_i = 40068 \quad SSR = 1359.063$
1. 0.4143
2. 0.6437
3. 0.5124
4. 0.1464
5. 0.4673
Solution
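The coefficient of determination is $r^2 = \dfrac{SSR}{SST}$
$SST = \sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n} = 41922 - \dfrac{556^2}{8} = 41922 - 38642 = 3280$
$r^2 = \dfrac{SSR}{SST} = \dfrac{1359.063}{3280} = 0.4143$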
Option (1)
Question 11
1. The coefficient of correlation r is a number that indicates the direction of the relationship between the
dependent variable Y and the independent variable X
2. The coefficient of correlation r is a number that indicates the strength of the relationship between the
dependent variable Y and the independent variable X
3. If the coefficient of correlation r = 1, then the best-fit linear equation will include all the data points.
4. If the coefficient of correlation r = 0, then there is a linear relationship between the dependent variable
Y and the independent variable X
5. If the coefficient of determination $r^2 = 0.81$, the coefficient of correlation r can be 0.90 or −0.90
Solution
1. Correct
2. Correct
3. Correct
4. Incorrect
5. Correct
When $r^2 = 0.81$, then $r = \pm\sqrt{0.81}$, so $r = 0.9$ or $r = -0.9$.
Option (4)
Question 12
X 12 23 11 23 14
Y 28 43 21 40 33
Suppose that $b_0 = 9.6$, $b_1 = 1.41$, $\sum X = 83$, $\sum Y = 165$, $\sum XY = 2938$ and $SST = 318$; then the
coefficient of determination is
1. 0.94
2. 0.89
3. 0.66
4. 0.78
5. 0.49
Solution
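The coefficient of determination is $r^2 = \dfrac{SSR}{SST}$
$SSR = b_0 \sum Y + b_1 \sum XY - \dfrac{\left(\sum Y\right)^2}{n} = 9.6(165) + 1.41(2938) - \dfrac{165^2}{5} = 1584 + 4142.58 - 5445 = 281.58$
$r^2 = \dfrac{SSR}{SST} = \dfrac{281.58}{318} = 0.8855$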
Option (2)
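The arithmetic in this solution is easy to verify. The sketch below simply substitutes the values given in the question into the SSR formula; it uses the rounded coefficients 9.6 and 1.41 exactly as stated, so it reproduces the workbook's figures rather than a value recomputed from the raw data.

```python
# Values given in the question; n = 5 is the number of data pairs.
b0, b1 = 9.6, 1.41
n = 5
sum_y, sum_xy = 165, 2938
sst = 318

# SSR = b0 * sum(Y) + b1 * sum(XY) - (sum(Y))^2 / n
ssr = b0 * sum_y + b1 * sum_xy - sum_y ** 2 / n

# Coefficient of determination.
r2 = ssr / sst

print(round(ssr, 2), round(r2, 4))
```

Because SSR is a difference of large terms, it is sensitive to how b0 and b1 are rounded, so a value recomputed from the raw data can differ slightly in the third decimal of the result.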
Question 13
The following are the number of minutes it took mechanics to assemble a piece of machinery in the morning,
X and in the afternoon, Y :
X   11.1   12.0   13.7   17.3   14.8   15.3
Y   14.2   21.5   21.1   19.3   19.0   17.4
1. 0.238
2. 0.1667
3. 0.057
4. 0.488
5. 0.78
Solution
$r^2 = \dfrac{SSR}{SST} = \dfrac{2.112}{35.975} = 0.0587$
Option (3)
Question 14
2. If most of the points fall close to the line, we say that there is a linear relationship.
3. If one variable increases when the other does, we say that there is a negative linear relationship.
4. The objective addressed by the model is to analyze the relationship between two variables, X and Y .
Solution
1. Correct
2. Correct
3. Incorrect
4. Correct
5. Correct
Option (3)