1 of 131

STATISTICS

INTRODUCTION:
In its modern meaning, Statistics can be understood as the application of scientific methods to inductive
research.
The application of statistical methods in the textile field began shortly before the Second World War.
The manufacture of textiles is largely a system of mass production. For a mass-production unit to run an
effective quality control system, statistical techniques must be used to analyze the data collected by the
testing laboratory. The analysis can involve either simple techniques such as the average, standard deviation
and coefficient of variation, or advanced methods such as variance, correlation and regression. Simple
analysis is common practice in the quality control section of all textile mills, particularly for commodity
textiles. Advanced statistical analysis is used for a detailed and more accurate interpretation of test data,
and is often used for value-added technical textiles.
All forms of textiles have measurable qualities. These qualities are extremely important because they have a
profound effect upon the performance characteristics of the finished product.
In the last few decades, Statistical Methods have increasingly come to be seen as a body of propaedeutic
(preparatory) knowledge for the most varied fields of industrial technique. Their role has expanded from the
original sectors of Quality Control and Market Research to embrace more general problems of process
management in conditions of uncertainty (phenomena of wear, reliability, allocation and interference of
machinery, etc.). Statistical methods also represent an indispensable aid in the planning and correct
interpretation of experiments, and thus in applied research.
In particular, many specific applications have been developed in the textile sector. For some time now, both
the physical methods of measurement on fibres and the theory of fibre structure have been firmly based on
principles of Mathematical Statistics.
What is Statistics?
“Statistics is the science that deals with the collection, analysis, and interpretation of numerical
information”

“The science of collecting and analyzing data for the purpose of drawing conclusions and making
decisions.”
The Growth and Development of Modern Statistics:
Historically, the growth and development of modern statistics can be traced to three separate phenomena: the
needs of government to collect data on its citizenry, the development of the mathematics of probability
theory, and the advent of the computer.
Data have been collected throughout recorded history. During the ancient Egyptian, Greek, and Roman
civilizations, data were obtained primarily for the purposes of taxation and military conscription. In the
Middle Ages, church institutions often kept records concerning births, deaths, and marriages. In America,
various records were kept during colonial times, and beginning in 1790 the federal Constitution required the
taking of a census every ten years. In fact, the expanding needs of the census helped spark the development
of tabulating machines at the beginning of the twentieth century. This led to the development of large-scale
mainframe computers and eventually to the personal computer revolution.
These developments have profoundly changed the field of statistics in the last 30 years. Mainframe packages
such as SAS and SPSS became popular during the 1960s and 1970s. During the 1980s, statistics software
experienced a vast technical revolution. Besides the usual improvements manifested in periodic updates, the
availability of personal computers led to the development of new packages. In addition, personal computer
versions of existing packages such as SAS, SPSS, and Minitab quickly became available, and the increasing
use of popular spreadsheet packages such as Lotus 1-2-3 and Microsoft Excel led to the incorporation of
statistical features in these packages.

Descriptive and Inferential Statistics:


Statistics as a subject may be divided into descriptive statistics and inferential statistics.
Descriptive Statistics:
In descriptive statistics, techniques are provided for processing raw numerical data into usable forms.
These techniques include methods for collecting, organizing, summarizing, describing and presenting
numerical information.
Descriptive statistics covers the following topics: the collecting, organizing, graphing and describing of
numerical data, probability theory and sampling theory.
Inferential Statistics:
Inferential statistics is the study of procedures by which we draw conclusions and make decisions about a
population on the basis of a sample. It covers the topics of estimation and hypothesis testing.

Population (or Universe):


A population is the totality of items or observations under consideration,
e.g. all students at NTU.
Sample:
A sample is a portion of the population that is selected for analysis, e.g. the students in this room.

Why Statistical Sampling?

Sampling is the selection of part of an aggregate or totality known as population, on the basis
of which a decision concerning the population is made.
The following are the advantages and/or necessities for sampling in statistical decision-
making:

1. Cost: Cost is one of the main arguments in favor of sampling, because often a sample
can furnish data of sufficient accuracy and at much lower cost than a census.
2. Accuracy: Much better control over data collection errors is possible with sampling
than with a census, because a sample is a smaller-scale undertaking.
3. Timeliness: Another advantage of a sample over a census is that the sample produces
information faster. This is important for timely decision making.
4. Amount of Information: More detailed information can be obtained from a sample
survey than from a census, because it takes less time, is less costly, and allows us to
take more care in the data processing stage.
5. Destructive Tests: When a test involves the destruction of an item under study,
sampling must be used. Statistical sample-size determination can be used to find the
optimal sample size within an acceptable cost.

Parameter:
Any numerical value calculated from an entire population is called a parameter. A parameter is a constant
value and is represented by a Greek letter, e.g. the proportion of the population opposed to war.
Statistic:
Any numerical value calculated from a sample is called a statistic; it is a numerical function of the sample
used to estimate a population parameter. A statistic varies from sample to sample, which is why it may be
regarded as a variable. Statistics are denoted by letters of the English alphabet.
Variable:
A property or attribute of each unit, e.g. age, height.
Observation:
The values of all variables for an individual unit.
Precision:
The spread of the estimator of a parameter.
Accuracy:
How close the estimator is to the true value; the opposite of bias.
Bias:
Systematic deviation of the estimate from the true value.
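The distinction between a parameter and a statistic can be illustrated with a small sketch. The population below (the integers 1 to 100), the sample size of 10 and the fixed random seed are all assumptions made purely for illustration; they do not come from any data set in these notes.

```python
import random
from statistics import mean

# Hypothetical population: the integers 1..100 stand in for any measured quantity.
population = list(range(1, 101))

# Parameter: computed from the entire population, so it is a fixed constant.
mu = mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
rng = random.Random(0)  # fixed seed only so that the run is repeatable
sample_means = [mean(rng.sample(population, 10)) for _ in range(5)]

print("population mean (parameter):", mu)
print("sample means (statistics):  ", sample_means)
```

Each sample mean is an estimate of the same population mean; how widely the estimates scatter is what the notes call precision.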
Importance of Statistics:
Statistics is perhaps a subject that is used by everybody. The following functions and uses of statistics in
most diverse fields serve to indicate its importance.
i. Statistics assists in summarizing the larger sets of data in a form that is easily understandable.
ii. Statistics assists in the efficient design of laboratory and field experiments as well as surveys.
iii. Statistics assists in sound and effective planning in any field of inquiry.
iv. Statistics assists in drawing general conclusions and in making predictions of how much of a
thing will happen under given conditions.
v. Statistical techniques, being powerful tools for analyzing numerical data, are used in almost
every branch of learning. Textiles, the Biological and Physical Sciences, Genetics,
Agronomy, Anthropometry, Astronomy, Physics, Geology, etc. are the main areas where
statistical techniques have been developed and are increasingly used.
vi. A businessman, an industrialist and a research worker all employ statistical methods in their
work. Banks, Insurance companies and Governments all have their statistics departments.
vii. A modern administrator whether in public or private sector, leans on statistical data to
provide a factual basis for decision.
viii. A politician uses statistics advantageously to lend support and credence to his arguments
while elucidating the problems he handles.
ix. A social scientist uses statistical methods in various areas of socio-economic life of a nation.
It is sometimes said “A social scientist without an adequate understanding of statistics, is
often like a blind man groping in a dark room for a black cat that is not there”.
Statistical Studies:
Descriptive:
One group, e.g. a survey or poll.
Comparative:
Two or more groups, e.g. comparing the effectiveness of different teaching methods. (Textile)
Experimental:
The investigator actively intervenes to control the study conditions and looks at the relationship between
predictor (explanatory) and response (outcome) variables. Such studies can establish causation, e.g. a drug
trial.
Observational:
The investigator records data without intervening. It is difficult to distinguish the effects of predictors
from those of confounding (lurking) variables. Such studies can establish association, e.g. the Framingham
Heart Study.

Summation Notation:
The notation $\sum_{i=1}^{n} x_i$ is called summation notation, and is a symbolic representation of the
series $x_1 + x_2 + x_3 + \cdots + x_n$. The symbol $\Sigma$ is the capital letter sigma in Greek, and it
indicates that the sequence function to its right should be summed. If $x_i$ represents a measurement of
variable $X$, then in statistics an entire set of $n$ such measurements is typically summed from $x_1$ to
$x_n$. This series is indicated by $\sum_{i=1}^{n} x_i$. However, where it is clear that it is the entire set
being summed, the lower and upper limits of summation are often omitted. When this is clear, then
\[ \sum_{i=1}^{n} x_i = \sum x_i = \sum x \]
Examples:
\[ \sum_{x=2}^{4} (x - 1) = (2 - 1) + (3 - 1) + (4 - 1) = 6 \]
\[ \sum_{x=2}^{5} 3x = 3(2) + 3(3) + 3(4) + 3(5) = 42 \]
\[ \sum_{i=1}^{3} (x_i - a) = (x_1 - a) + (x_2 - a) + (x_3 - a) \]
\[ \sum_{i=1}^{2} (x - i + 1) = (x - 1 + 1) + (x - 2 + 1) = 2x - 1 \]
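The four examples above can be checked with a short sketch. The values chosen for x_1, x_2, x_3 and a in the third example, and for x in the fourth, are assumptions made only so that the sums can be evaluated numerically.

```python
# Sum of (x - 1) for x = 2, 3, 4 (range's upper bound is exclusive)
s1 = sum(x - 1 for x in range(2, 5))

# Sum of 3x for x = 2, ..., 5
s2 = sum(3 * x for x in range(2, 6))

# Sum of (x_i - a) for i = 1..3, with hypothetical values for x_i and a
xs = [4.0, 7.0, 10.0]
a = 2.0
s3 = sum(x_i - a for x_i in xs)

# Sum of (x - i + 1) for i = 1, 2 equals 2x - 1 for any fixed x (here x = 5)
x = 5
s4 = sum(x - i + 1 for i in range(1, 3))

print(s1, s2, s3, s4)
```

The first two results reproduce the values 6 and 42 worked out by hand, and the last agrees with 2x - 1 for the chosen x.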

Useful Theorems Relating to Sums:

Consider the summation $\sum_{x=1}^{3} 5$. The typical element is 5 and it does not change. The sequence is
therefore 5, 5, 5, and
\[ \sum_{x=1}^{3} 5 = 5 + 5 + 5 = 15 \]
Theorem: Let c be a constant and x be the variable of summation. Then
\[ \sum_{x=1}^{n} c = nc \]
e.g.
\[ \sum_{x=2}^{7} 3a = 6(3a) = 18a \]
Theorem: Let c be a constant. Then
\[ \sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i \]
e.g.
\[ \sum_{x=1}^{3} (3x - 5) = 3 \sum_{x=1}^{3} x - 3(5) = 3(1 + 2 + 3) - 15 = 18 - 15 = 3 \]
Theorem:
\[ \sum_{i=1}^{n} (x_i \pm y_i \pm z_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i \pm \sum_{i=1}^{n} z_i \]
e.g.
\[ \sum_{x=1}^{4} (x^2 + ax + 5) = \sum_{x=1}^{4} x^2 + \sum_{x=1}^{4} ax + \sum_{x=1}^{4} 5 \]
\[ = \sum_{x=1}^{4} x^2 + a \sum_{x=1}^{4} x + 4(5) \]
\[ = (1 + 4 + 9 + 16) + a(1 + 2 + 3 + 4) + 20 \]
\[ = 50 + 10a \]

\[ \sum_{i=1}^{4} (X^2 + 3i) = \sum_{i=1}^{4} X^2 + \sum_{i=1}^{4} 3i \]
\[ = 4X^2 + 3 \sum_{i=1}^{4} i \]
\[ = 4X^2 + 3(1 + 2 + 3 + 4) \]
\[ = 4X^2 + 30 \]

The double summation $\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij}$ means that we first sum over the subscript j,
using the theory for single summation, and then perform the second summation by allowing i to range from 1
to n.
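A quick numerical check of the three theorems and of the double summation is sketched below. The value assigned to the constant a and the small 2-by-3 table of x_ij values are assumptions for illustration only.

```python
n = 4
c = 5
a = 3                      # hypothetical value for the constant a
xs = [1, 2, 3, 4]          # x = 1, ..., 4

# Theorem 1: summing a constant c over n terms gives n * c.
const_sum = sum(c for _ in range(1, n + 1))

# Theorem 2: a constant factor can be taken outside the summation sign.
lhs2 = sum(a * x for x in xs)
rhs2 = a * sum(xs)

# Theorem 3: the sum of several terms is the sum of the separate sums.
lhs3 = sum(x**2 + a * x + 5 for x in xs)
rhs3 = sum(x**2 for x in xs) + a * sum(xs) + n * 5   # 50 + 10a

# Double summation: sum over j within each row i, then sum the row totals.
table = [[1, 2, 3],
         [4, 5, 6]]        # hypothetical x_ij, i = 1..2 (rows), j = 1..3 (columns)
double_sum = sum(sum(row) for row in table)

print(const_sum, lhs2, rhs2, lhs3, rhs3, double_sum)
```

With a = 3 the third check gives 80 on both sides, which agrees with 50 + 10a from the worked example.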


Exercise # 1
1. Write each of the following as summations:
a) $x_1^2 + x_2^2 + x_3^2 + \cdots + x_{10}^2$
b) $y_1 + y_2 + \cdots + y_{12}$
c) $y_5 z_5 + y_6 z_6 + y_7 z_7 + \cdots + y_n z_n$
d) $2a_1 + 2a_2 + 2a_3 + \cdots + 2a_n$
e) $(a_1 - b_1) + (a_2 - b_2) + \cdots + (a_n - b_n)$
f) $a_3^2 b_5 + a_4^2 b_6 + a_5^2 b_7 + \cdots + a_{10}^2 b_{12}$
2. Evaluate the following summations:
a) $\sum_{i=1}^{3} X_i^3$
b) $\sum_{i=1}^{3} 6$
c) $\sum_{x=2}^{3} (1 + 3x + x^2)$
d) $\sum_{x=1}^{4} (x + xy^2)$
e) $\sum_{i=1}^{5} (x^2 + 2i)$
f) $\sum_{i=4}^{12} (x_i - i)$
g) $\sum_{y=0}^{5} (x^2 + y^2)$
h) $\sum_{i=1}^{n} (x_i - a)$
i) $\sum_{x=0}^{5} (x^3 + 2ix)$
j) $\sum_{i=1}^{n} (x_i - a)^2$
k) $\sum_{i=1}^{6} \left( \frac{i^2}{4} + 3i^2 \right)$
l) $\sum_{x=1}^{50} x^3$
m) Evaluate $\sum_{j=3}^{7} \sum_{i=0}^{3} a_{ij}^2$
3. Let q + p = 1; then show that
a) $\sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} = 1$
b) $\sum_{x=0}^{n} x \binom{n}{x} p^x q^{n-x} = np$
c) $\sum_{x=0}^{n} x(x - 1) \binom{n}{x} p^x q^{n-x} = n(n - 1) p^2$
d) $\sum_{x=1}^{n} a r^{x-1} = \frac{a(1 - r^n)}{1 - r}$
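Before attempting the proofs in question 3, it can help to confirm the identities numerically. The sketch below does this for one arbitrary choice of n, p, a and r; these values are assumptions, and a numerical check of this kind is not a substitute for the proof the exercise asks for.

```python
from math import comb, isclose

n, p = 5, 0.3          # hypothetical values; any n >= 2 and 0 <= p <= 1 would do
q = 1 - p

def binom_term(x):
    """Return C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p**x * q**(n - x)

total = sum(binom_term(x) for x in range(n + 1))                    # part (a)
first = sum(x * binom_term(x) for x in range(n + 1))                # part (b)
second = sum(x * (x - 1) * binom_term(x) for x in range(n + 1))     # part (c)

a, r = 2.0, 0.5        # hypothetical first term and common ratio (r != 1)
geometric = sum(a * r**(x - 1) for x in range(1, n + 1))            # part (d)

print(isclose(total, 1.0),
      isclose(first, n * p),
      isclose(second, n * (n - 1) * p**2),
      isclose(geometric, a * (1 - r**n) / (1 - r)))
```

`isclose` is used instead of `==` because the binomial terms are floating-point numbers and may differ from the exact values in the last few digits.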

Types of Data:
Data: A set of counts or measurements
Character
  Nominal, e.g. colour: red, green, blue
  Binary, e.g. (M, F), (H, T), (0, 1)
  Ordinal, e.g. attitude to war: agree, neutral, disagree
Numeric
  Discrete, e.g. number of children
  Continuous, e.g. distance, time, temperature
also:
  Interval, e.g. Fahrenheit temperature
  Ratio (real zero), e.g. distance, number of children
There are basically two types of random variables that yield the observed outcomes or data: categorical
and numerical. Categorical random variables yield categorical responses, while numerical random variables
yield numerical responses. For example, the response to the question “Do you currently own U.S.
Government Savings Bonds?” is categorical. The choices are limited to “yes” or “no”. On the other hand,
responses to questions such as “To how many magazines do you currently subscribe?” or “How tall are you?”
are clearly numerical. In the first case, the numerical random variable may be considered discrete, while
in the second case it can be thought of as continuous.
Discrete data are numerical responses that arise from a counting process, while continuous data are
numerical responses that arise from a measuring process. “The number of magazines subscribed to” is an
example of a discrete numerical variable, since the response takes on one of a (finite) number of integers.
On the other hand, “The height of an individual” is an example of a continuous numerical variable, since
the response can take on any value within a continuum or interval, depending on the precision of the
measuring instrument. For example, a person whose height is reported as 67 inches may be measured as
67 1/4 inches, 67 7/32 inches, or 67 58/250 inches if more precise instrumentation is available. Therefore,
we can see that height is a continuous phenomenon that can take on any value within an interval.

Categorical Variable            Categories

Automobile ownership            Yes, No
Type of life insurance owned    Term, Endowment, Straight-Life
Political party affiliation     Democratic, Republican, Independent, Other
Product satisfaction            Very Unsatisfied, Fairly Unsatisfied, Neutral, Fairly Satisfied, Very Satisfied
Student grades                  A, B, C, D, E, F

Numerical Variable                                Level of Measurement

Temperature (in degrees Celsius or Fahrenheit)    Interval
Calendar time (Gregorian, Hebrew, or Islamic)     Interval
Height (in inches or centimeters)                 Ratio

GRAPHICAL REPRESENTATION:
Tabulation is a good method of condensing and representing data in a readily understandable form,
but many people have no taste for figures. They would prefer a way of representation where figures
could be avoided. This purpose is achieved by the presentation of statistical data in a visual form.
The visual display of statistical data in the form of points, lines, areas and other geometrical forms
and symbols is, in the most general terms, known as Graphical Representation. Statistical data can be
studied with this method without going through figures presented in the form of tables.
Such visual representations are described in the sections that follow. The basic difference between
a graph and a diagram is that a graph is a representation of data by a continuous curve, usually
shown on graph paper, while a diagram is any other one-, two- or three-dimensional form of visual
representation.

DIAGRAMS:
Diagrammatic representation is best suited to spatial series and data split into different categories.
Whenever a comparison of the same type of data at different places is to be made, diagrams are
the best way to do it. Diagrammatic representation has several advantages over tabular
representation of figures. Beautifully and neatly constructed diagrams are more attractive than simple
figures. Diagrams, being a visual display, leave a more effective and longer-lasting impression on the
mind of a reader. They make unwieldy data intelligible at a glance. Comparison is made easier with
diagrams. Diagrams have some disadvantages too: they are less accurate than tables, they cost
money and time, and the amount of information conveyed is limited. Nevertheless, this method of
representation is extensively used in business and administration.
Different types of diagrams or charts commonly used for displaying statistical data are described
below:

1) Linear or One-Dimensional Diagrams: They consist of Simple Bars, Multiple Bars and
Component Bar charts. Here the values are represented by only one dimension, generally the length
of the bar.
2) Areal or Two-Dimensional Diagrams: They consist of Rectangles, Sub-divided Rectangles and
Squares, the areas of which are proportional to the values of given quantities. This device is used to
represent data having moderately large variations.
3) Cubic or Three-Dimensional Diagrams: They are in the form of Cubes and Cylinders, whose
volumes are proportional to the values they represent. These diagrams are used when the variation
among the values of the data to be portrayed is so large that even the square roots of the values
concerned fail to reduce the variation appreciably.
4) Pie Diagrams: They are in the form of circles and sectors. Here the areas of Circles or Sectors
are in proportion to the values they represent or compare.
5) Pareto Diagrams: They are bar charts in which the bars are arranged in descending order of frequency.
6) Pictograms: They consist of pictures or small symbolic figures representing the statistical data.

Simple Bar Chart: Simple bar diagrams are made to represent geographical, historical, numerical
and qualitative data. Vertical or horizontal bars are used to represent the data when the
difference between the quantities is not very large. The quantities may be arranged in
ascending or descending order, but time series data are not rearranged; they are kept in chronological
order. (A time series consists of numerical data collected, observed or recorded at more or less regular
intervals of time: each hour, day, month, quarter or year.)

Suggestions for Constructing Bar Chart:


1. The bars should be constructed horizontally when the categorized observations are the
outcomes of a categorical variable. The bars should be constructed vertically when the
categorized observations are outcomes of a numerical variable.
2. All bars should have the same width so as not to mislead the reader. Only the length
should differ.
3. Spaces between bars should range from one-half the width of a bar to the width of a
bar.
4. Scales and guidelines are useful aids in reading a chart and should be included. The
zero point or origin should be indicated.
5. The axes of the chart should be indicated.
6. Any “keys” to interpreting the chart may be included within the body of the chart or
below the body of the chart.
7. Footnotes or source notes, when appropriate, are presented after the title of the chart
or at the bottom edge of the chart’s frame.

Example: Construct a simple bar diagram for the given data

School Location    Number of Schools    Percentage of Schools

Rural              8                    10.0
Suburban           23                   28.8
Urban              49                   61.2
Totals             80                   100.0

[Figure: horizontal bar chart of Percentage of Schools by location: Urban 61.2, Suburban 28.8, Rural 10.0]
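As a rough sketch of how such a chart is produced, the percentages can be computed from the school counts in the table and drawn as a text bar chart. The helper `text_bar_chart` and its `scale` and `symbol` parameters are hypothetical names introduced for this illustration; the notes themselves do not prescribe any software for drawing the chart.

```python
def text_bar_chart(data, scale=1.0, symbol="#"):
    """Return one line per category: label, a bar of symbols, and the value."""
    width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * round(value / scale)     # bar length proportional to the value
        lines.append(f"{label:<{width}} | {bar} {value}")
    return lines

# Number of schools by location, taken from the table above.
schools = {"Rural": 8, "Suburban": 23, "Urban": 49}
total = sum(schools.values())

# Percentage of schools in each category, to one decimal place.
percentages = {loc: round(100 * n / total, 1) for loc, n in schools.items()}

for line in text_bar_chart(percentages, scale=2):   # one symbol per 2 percentage points
    print(line)
```

All bars use the same symbol and differ only in length, in line with the suggestion that only the length of the bars should differ.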

Example: Draw a simple bar diagram of the frequency distribution of all faults present on 70 pieces of the
article, as given in the following table.

No. of faults    Frequency    f%

0                8            11.4
1                10           14.3
2                15           21.4
3                12           17.2
4                10           14.3
5                9            12.9
6                4            5.7
7                1            1.4
8                0            0
9                1            1.4

[Figure: vertical bar chart titled "Frequency Distribution of faults"; x-axis: No. of faults, y-axis: Frequency]

Example: Construct the bar diagram for the following data

Flaw                       f    f%
missing weft               4    25.00
incorrect interweaving     1    6.25
hole                       2    12.50
group of broken threads    9    56.25

[Figure: horizontal bar chart of flaw frequencies: group of broken threads 9, hole 2, incorrect interweaving 1, missing weft 4]

Example: Draw a simple bar diagram to represent the turnover of a company for 5 years
Years: 1965 1966 1967 1968 1969
Turnover (Rupees): 35,000 42,000 43,500 48,000 48,500

[Figure: "Bar Diagram Showing the Turnover of a Company for 5 Years"; x-axis: Year, y-axis: Turnover (thousand rupees); bars: 35, 42, 43.5, 48, 48.5]

Multiple Bar Chart: A multiple bar chart shows two or more characteristics
corresponding to the values of a common variable in the form of grouped bars, whose lengths
are proportional to the values of the characteristics, and each of which is shaded or coloured
differently to aid identification. This is a good device for the comparison of two or more
kinds of information. For example, the imports, exports and production of a country can be
compared from year to year by grouping the three bars together.
Example: Draw multiple bar charts to show the area and production of cotton in the
Punjab from the following data:

Year       Area (000 acres)    Production (000 bales)

1965-66    2866                1588
1970-71    3233                2229
1975-76    3420                1937

[Figure: multiple bar chart titled "Area & Production of Cotton in the Punjab"; x-axis: Year; grouped bars: Area and Production]

Component Bar Chart: A component bar chart is an effective technique in which
each bar is divided into two or more sections, proportional in size to the component parts of a
total being displayed by each bar. The various component parts shown as sections of the bar
are shaded or coloured differently to increase the overall effectiveness of the diagram.
Component bar charts are used to present the cumulation of the various components of data
and the percentages. They are also known as sub-divided bars.

Example: Draw a component bar chart for the following data

Division      Both Sexes    Male    Female

Peshawar      64            33      31
Rawalpindi    40            21      19
Sargodha      60            32      28
Lahore        65            35      30

[Figure: "Component Bar Chart Showing Population of 4 Divisions"; one bar per division, each split into Male and Female]

Example: Place the data for female and male hair colour together in a component
and multiple parts frequency bar chart and a consecutive-parts frequency bar chart.

Colour    Female    Male

Black     3         10
Blonde    8         14
Brown     4         20
Red       1         4

[Figure: "Component Bar Chart"; x-axis: Colour, y-axis: Frequency; one bar per hair colour, each split into Male and Female]

Rectangles and Sub-divided Rectangles: The area of a rectangle is equal to the
product of its length and breadth. To represent a quantity by a rectangle, both the length and
the breadth of the rectangle are used. Sub-divided rectangles are drawn for data in which totals
and their component parts are to be shown together; they are generally drawn to compare the
budgets of various families. In the construction of sub-divided rectangles, we are required to
i. Change each component into the percentage of the corresponding total,
ii. Draw one rectangle for each total, taking equal lengths (100 units) and breadths
proportional to the totals,
iii. Divide every rectangle so drawn into parts equal in number to the number of
components. Each part, shaded or coloured, will represent the percentage size of one
component.

Example: Compare the budgets of families A and B with a suitable diagram.

Items of Expenditure    Family A    Family B

Food                    24          60
Clothing                4           14
House Rent              4           16
Education               3           6
Litigation              2           10
Conventional Needs      1           6
Miscellaneous           2           8
Total                   40          120

The necessary computations required for the drawing of sub-divided rectangles are given
below and the diagram is shown:

                        Family A                   Family B
Items of Expenditure    Actual      Percentage     Actual      Percentage
                        Expenses    Expenses       Expenses    Expenses
Food                    24          60.0           60          50.0
Clothing                4           10.0           14          11.7
House Rent              4           10.0           16          13.3
Education               3           7.5            6           5.0
Litigation              2           5.0            10          8.3
Conventional Needs      1           2.5            6           5.0
Miscellaneous           2           5.0            8           6.7
Total                   40          100.0          120         100.0
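The percentage columns in the table above follow from step (i) of the construction. A minimal sketch of that computation is given below; the function name `percentage_breakdown` is a hypothetical helper, and the figures are the two family budgets from the example.

```python
budgets = {
    "Family A": {"Food": 24, "Clothing": 4, "House Rent": 4, "Education": 3,
                 "Litigation": 2, "Conventional Needs": 1, "Miscellaneous": 2},
    "Family B": {"Food": 60, "Clothing": 14, "House Rent": 16, "Education": 6,
                 "Litigation": 10, "Conventional Needs": 6, "Miscellaneous": 8},
}

def percentage_breakdown(items):
    """Express each component as a percentage of the family's total expenditure."""
    total = sum(items.values())
    return {name: round(100 * amount / total, 1) for name, amount in items.items()}

# Step (i): component percentages, which divide each rectangle's length of 100 units.
shares = {family: percentage_breakdown(items) for family, items in budgets.items()}

# Step (ii): the breadths are proportional to the totals (40 : 120, i.e. 1 : 3).
totals = {family: sum(items.values()) for family, items in budgets.items()}

for family in budgets:
    print(family, totals[family], shares[family])
```

Because both rectangles have the same length, the difference in total budget is carried entirely by the breadth, while the sub-divisions show how each family splits its spending.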

Diagram from p-34 of Ch. Sher M.

Pictographs: A pictograph or pictogram is a form of bar graph in which stylized,
easily recognizable figures are used in place of rectangular bars.

Example: The following table shows the number of employees in a certain Textile Mills.
Represent the data by means of a pictogram.

Year No. of Employees


1950 2,004
1955 2,990
1960 4,240
1965 5,380

Representing 1,000 employees by one picture, the pictogram is drawn below

Fig from Ch. Sher M. p-35

Pie Diagrams: A pie diagram, also known as a sector diagram, is a graphic device
consisting of a circle divided into sectors or pie-shaped pieces whose areas are
proportional to the various parts into which the whole quantity is divided. The sectors are
shaded or coloured differently to show the relationship of parts to the whole.
Procedure for construction of pie chart: Draw a circle of any convenient radius. As
a circle consists of 360°, the whole quantity to be displayed is equated to 360. The
proportion that each component part or category bears to the whole quantity will be the
corresponding proportion of 360°. These corresponding proportions, i.e. angles, are
calculated by
\[ \text{Angle} = \frac{\text{component part}}{\text{whole quantity}} \times 360^\circ \]
Then divide the circle into different sectors by constructing angles at the centre by
means of a protractor and draw the corresponding radii.
Example: Represent the total expenditure and expenditures on various items of a family
by a pie diagram.

Items:                   Food    Clothing    House Rent    Fuel and Light    Misc.
Expenditure (in Rs.):    50      30          20            15                35

The corresponding angles needed to draw the chart are computed below.

Items             Expenditure (in Rs.)    Angles of sectors (in degrees)
Food              50                      (50/150) × 360 = 120
Clothing          30                      72
House Rent        20                      48
Fuel and Light    15                      36
Miscellaneous     35                      84
Total             150                     360
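The angle for each sector follows directly from the formula in the procedure. The short sketch below applies it to the family expenditure figures; the variable names are illustrative only.

```python
# Family expenditure (in Rs.) from the example above.
expenditure = {"Food": 50, "Clothing": 30, "House Rent": 20,
               "Fuel and Light": 15, "Miscellaneous": 35}

whole = sum(expenditure.values())

# Angle = (component part / whole quantity) * 360 degrees.
# Multiplying before dividing keeps these particular results exact in floating point.
angles = {item: amount * 360 / whole for item, amount in expenditure.items()}

for item, angle in angles.items():
    print(f"{item:<15} {angle:6.1f} degrees")
print("Total", sum(angles.values()))
```

The angles add up to 360 degrees, which is a useful check before the sectors are marked off with a protractor.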

[Figure: pie chart with sectors Food 34%, Clothing 20%, House Rent 13%, Fuel and Light 10%, Misc. 23%]

Profit and Loss Chart: This is virtually a percentage component bar chart in which
profits can be shown above the normal base line and losses below the base line. Since the
bars are to be extended from the zero line to show losses, we start from the top. For an
illustration, the following data are represented:

Cost, Proceeds, Profit or Loss per Chair


Particulars 1960 1970
(i) Materials Rs. 10 Rs. 16
(ii) Wages 6 8
(iii) Polishing, etc. 2 4
Total cost 18 28
Proceeds 20 25
Profit (+) or loss (-) +2 -3

Profit and Loss Chart

Diagram from p-37 ch. Sher M.

The Stem-and-Leaf Display:


A stem-and-leaf display separates data entries into “leading digits” or “stems” and “trailing digits” or
“leaves.” For example, since the annual costs (in $000) in the private institution data set all have
two-digit integer parts, the tens and units columns are the leading digits, and the remaining
column (the tenths column) is the trailing digit. Thus, an entry of 26.4 (corresponding to
$26,400) has a stem of 26 and a trailing digit or leaf of 4.
The following figure depicts the stem-and-leaf display of the annual cost of attending the 50
sampled private colleges and universities.

Steps to follow in constructing a Stem and Leaf Display

1. Divide each observation in the data set into two parts, the Stem and the Leaf.
2. List the stems in order in a column, starting with the smallest stem and ending with the
largest.
3. Proceed through the data set, placing the leaf for each observation in the appropriate stem
row.

Depending on the data, a display can use one, two or five lines per stem. Among the different
stems, two-line stems are widely used.
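The three construction steps can be sketched in a few lines. The function below is an illustrative helper, not part of any statistics package, and it assumes one line per stem and values recorded to one decimal place; the data are the annual costs (in $000) for the 50 private colleges and universities listed later in this section.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group one-decimal values by integer part (stem) and tenths digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):                 # sorting lists the stems in order
        tenths = round(v * 10)               # e.g. 26.4 -> 264, avoids float truncation
        stem, leaf = divmod(tenths, 10)      # 264 -> stem 26, leaf 4
        stems[stem].append(leaf)
    return dict(stems)

costs = [17.5, 17.6, 17.6, 18.0, 18.8, 18.9, 19.5, 19.8, 20.6, 20.8,
         20.8, 21.3, 21.6, 21.8, 22.0, 22.2, 22.9, 23.4, 23.4, 23.7,
         23.8, 24.2, 24.3, 24.4, 24.8, 25.4, 25.6, 25.7, 26.4, 26.4,
         26.4, 26.6, 26.7, 26.7, 26.8, 26.8, 26.8, 27.3, 27.5, 27.5,
         27.6, 27.8, 27.8, 27.9, 27.9, 28.6, 28.6, 28.8, 28.9, 28.9]

display = stem_and_leaf(costs)
for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted first, the leaves in each row come out in increasing order; a display built by working through unsorted raw data, as in step 3, has the same leaves in each row but in order of occurrence.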

Example 2.5 The quantity of glucose in the blood of 100 persons is measured and recorded in
Table 2.0b (unit is mg%). Using SPSS we obtain the following stem-and-leaf display for this data
set.

70 79 80 83 85 85 85 85 86 86
86 87 87 88 89 90 91 91 92 92
93 93 93 93 94 94 94 94 94 94
95 95 96 96 96 96 96 97 97 97
97 97 98 98 98 98 98 98 100 100
101 101 101 101 101 101 102 102 102 103
103 103 103 104 104 104 105 106 106 106
106 106 106 106 106 106 106 107 107 107
107 108 110 111 111 111 111 111 112 112
112 115 116 116 116 116 119 121 121 126
Table 2.0b Quantity of glucose in blood of 100 students (unit: mg%)

GLUCOSE Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     1.00 Extremes    (=<70)
     1.00        7 .  9
     2.00        8 .  03
    11.00        8 .  55556667789
    15.00        9 .  011223333444444
    18.00        9 .  556666677777888888
    18.00       10 .  001111112223333444
    16.00       10 .  5666666666677778
     9.00       11 .  011111222
     6.00       11 .  566669
     2.00       12 .  11
     1.00 Extremes    (>=126)

 Stem width:  10
 Each leaf:   1 case(s)

The stem-and-leaf display above partitions the data set into 12 classes corresponding to 12
stems. Here two-line stems are used. The number of leaves in each class gives the class
frequency.

Advantages of a stem and leaf display over a frequency distribution (considered in the next
section):

1. The original data are preserved.
2. A stem and leaf display arranges the data in an orderly fashion and makes it easy to
determine certain numerical characteristics to be discussed in the following chapter.
3. The classes and numbers falling in them are quickly determined once we have selected the
digits that we want to use for the stems and leaves.

Disadvantage of a stem and leaf display:

Sometimes there is not much flexibility in choosing the stems.
Ordered array of the annual cost (in $000) per student attending 50 private colleges and
universities
17.5 17.6 17.6 18.0 18.8 18.9 19.5 19.8 20.6 20.8
20.8 21.3 21.6 21.8 22.0 22.2 22.9 23.4 23.4 23.7
23.8 24.2 24.3 24.4 24.8 25.4 25.6 25.7 26.4 26.4
26.4 26.6 26.7 26.7 26.8 26.8 26.8 27.3 27.5 27.5
27.6 27.8 27.8 27.9 27.9 28.6 28.6 28.8 28.9 28.9
The column of numbers to the left of the vertical line is called the “stem.” These numbers
correspond to the leading digits of the data. In each row the “leaves” branch out to the right of the
vertical line, and these entries correspond to the trailing digits.

17 | 665
18 | 809
19 | 85
20 | 688
21 | 836
22 | 209
23 | 4487
24 | 4238
25 | 467
26 | 488647478
27 | 98965853
28 | 96869

Pareto Diagram

Definition: A bar graph used to arrange information in such a way that priorities for process
improvement can be established.

A Pareto diagram is used to determine what characteristic is the major contributor in a
process. The diagram is constructed by ranking the data in frequency of occurrence and
plotting the bars in descending order.

Purposes:

To display the relative importance of data.
To direct efforts to the biggest improvement opportunity by highlighting the vital
few in contrast to the useful many.

Pareto diagrams are named after Vilfredo Pareto, an Italian sociologist and economist, who
invented this method of information presentation toward the end of the 19th century. The
chart is similar to the histogram or bar chart, except that the bars are arranged in decreasing
order from left to right along the abscissa. The fundamental idea behind the use of Pareto
diagrams for quality improvement is that the first few (as presented on the diagram)
contributing causes to a problem usually account for the majority of the result. Thus,
targeting these "major causes" for elimination results in the most cost-effective improvement
scheme.

How to Construct:
1. Determine the categories and the units for comparison of the data, such as frequency, cost,
or time.
2. Total the raw data in each category, then determine the grand total by adding the
totals of each category.
3. Re-order the categories from largest to smallest.
4. Determine the cumulative percent of each category (i.e., the sum of each category
plus all categories that precede it in the rank order, divided by the grand total and
multiplied by 100).
5. Draw and label the left-hand vertical axis with the unit of comparison, such as
frequency, cost or time.
6. Draw and label the horizontal axis with the categories. List from left to right in rank
order.
7. Draw and label the right-hand vertical axis from 0 to 100 percent. The 100 percent
should line up with the grand total on the left-hand vertical axis.
8. Beginning with the largest category, draw in bars for each category representing the
total for that category.
9. Draw a line graph beginning at the right-hand corner of the first bar to represent the
cumulative percent for each category as measured on the right-hand axis.
10. Analyze the chart. Usually the top 20% of the categories will comprise roughly 80%
of the cumulative total.

Tips:

Create before-and-after comparisons of Pareto charts to show the impact of improvement efforts.
Construct Pareto charts using different measurement scales: frequency, cost or time.
Pareto charts are useful displays of data for presentations.
Use objective data to perform Pareto analysis rather than team members' opinions.
If there is no clear distinction between the categories -- if all bars are roughly the
same height or half of the categories are required to account for 60 percent of the
effect -- consider organizing the data in a different manner and repeating the Pareto
analysis.
Pareto analysis is most effective when the problem at hand is defined in terms of
shrinking the PV to a customer target, for example reducing defects or eliminating
the non-value-added time in a process.

Exercise:

Construct a Pareto diagram from the data given in the table below.

Category          Frequency   Percent of total   Cumulative %
Wrong dose           100           50                50
Wrong time            70           35                85
Wrong medicine        15            7.5              92.5
Wrong patient          8            4                96.5
Medicine dc'd          4            2                98.5
Missed dose            3            1.5             100
Grand Total          200          100%              100%
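
The percent and cumulative percent columns of the table above can be checked with a short calculation. The following is a minimal Python sketch of steps 2 to 4 of the construction procedure; the function name `pareto_table` is an illustrative choice, not part of the original notes.

```python
# Medication-error counts copied from the exercise table above.
counts = {
    "Wrong dose": 100,
    "Wrong time": 70,
    "Wrong medicine": 15,
    "Wrong patient": 8,
    "Medicine dc'd": 4,
    "Missed dose": 3,
}


def pareto_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    grand_total = sum(counts.values())  # step 2: grand total
    # Step 3: re-order the categories from largest to smallest.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, frequency in ranked:
        running += frequency
        percent = 100 * frequency / grand_total
        cumulative = 100 * running / grand_total  # step 4: cumulative percent
        rows.append((category, frequency, percent, cumulative))
    return rows


rows = pareto_table(counts)
for category, frequency, percent, cumulative in rows:
    print(f"{category:<15} {frequency:>4} {percent:>6.1f} {cumulative:>6.1f}")
```

The bars of the Pareto diagram are the frequencies in this order, and the line graph is the cumulative percent column read against the right-hand axis.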

Error Category   Frequency   Percent of Total
Punctuation         22            44%
Grammar             15            30%
Spelling            10            20%
Typing               3             6%
TOTAL               50           100%

HISTOGRAM
A Histogram is used to display in bar graph format measurement data distributed by categories.
A HISTOGRAM IS USED FOR:

1. Making decisions about a process, product, or procedure that could be improved after examining the
variation (example: Should the school invest in a computer-based tutoring program for low achieving
students in Algebra I after examining the grade distribution?; are more shafts being produced out of
specification that are too big rather than too small?)
2. Displaying easily the variation in the process (example: Which units are causing the most difficulty
for students?; is the variation in a process due to parts that are too long or parts that are too short?)

STEPS IN CONSTRUCTING A HISTOGRAM:

1. Gather and tabulate data on a process, product, or procedure. This could be time, weight, size,
frequency of occurrences, test scores, GPA's, pass/fail rates, number of days to complete a cycle,
diameter of shafts built, etc.
2. Calculate the range of the data by subtracting the smallest number in the data set from the largest.
Call this value R.
3. Decide about how many bars (or classes) you want to display in your eventual histogram. Call this
number K. This number should never be less than four and seldom exceeds 12. With 100 numbers,
K=7 generally works well. With 1000 pieces of data, K=11 works well.
4. Determine the fixed width of each class by dividing the range R by the number of classes K. This
value should be rounded to a "nice" number, generally a number ending in a zero. For example, 11.3
would not be a "nice" number; 10 would be considered a "nice" number. Call this number i, for
interval width. It is important to use "nice" numbers, otherwise the histogram will have awkward
scales on the X axis.
5. Create a table of upper and lower class limits. Take the first "nice" number less than the lowest
value in the data set as the lower limit of the first class, and add the interval width i to it to
determine the upper limit of the first class. The upper limit of the first class becomes
the lower limit of the second class. Adding the interval width (i) to the lower limit of the second class
determines the upper limit for the second class. Repeat this process until the largest upper limit
exceeds the biggest piece of data. You should have approximately K classes or categories in total.
6. Sort, organize, or categorize the data in such a way that you can count or tabulate how many pieces
of data fall into each of the classes or categories in your table above. These are the frequency counts
and will be plotted on the Y axis of the histogram.
7. Create the framework for the horizontal and vertical axes of the histogram. On the horizontal axis
plot the lower and upper limits of each class determined above. The scale on the vertical axis should
run from zero to the first "nice" number greater than the largest frequency count determined above.
8. Plot the frequency data on the histogram framework by drawing vertical bars for each class. The
height of each bar represents the number or frequency of values occurring between the lower and
upper limits of that class.
9. Interpret the histogram for skew and clustering problems:

Interpreting skew problems:


Data may be skewed to the left or right. If the histogram shows a long tail of data on the left side of
the histogram, the data is termed left or negatively skewed. If a tail appears on the right side, the data is
termed right or positively skewed. Most process data should not typically appear skewed. Data that is
seriously skewed either to the left or right may be an indication that there are inconsistencies in the
process or procedures, etc. The team may need to decide whether the direction of the skew is
appropriate for the process.
It should be noted, however, that some process data is, by its very nature, skewed. This situation occurs
in arrival processes (for example, people arriving at a McDonalds within a fixed unit of time) and in
service processes (for example, the time it takes to wait on a customer in a bank).
Interpreting clustering problems:
Data may be clustered on opposite ends of the scale or display two or more peaks indicating serious
inconsistencies in the process or procedure or the measurement of a mixture of two or more distinct
groups or processes that behave very differently.

EXAMPLE
The data below are the spelling test scores for 20 students on a 50 word-spelling test. The scores
(number correct) are: 48, 49, 50, 46, 47, 47, 35, 38, 40, 42, 45, 47, 48, 44, 43, 46, 45, 42, 43, 47.
The largest number is 50 and the smallest is 35. Thus, the range R = 50 - 35 = 15. We will use 5 classes, so
K = 5. The interval width i = R/K = 15/5 = 3.
We will make the lower limit of the first class 35. Thus the first upper
limit is 35 + 3, or 38. The second class will have a lower limit of 38 and an upper limit of 41. In the
table, a score equal to a shared class limit is counted in the higher class, and the top score of 50 is
counted in the last class. The completed table (with frequencies tabulated) will look like the following:
Class   Lower Limit   Upper Limit   Frequency
1           35            38            1
2           38            41            2
3           41            44            4
4           44            47            5
5           47            50            8
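
The frequency column can be reproduced with a few lines of code. This is a minimal Python sketch of steps 2 to 6 for the spelling-test data; the helper names `class_limits` and `tally` are illustrative, and the boundary rule (a value on a shared limit goes to the higher class, the overall maximum goes to the last class) is the one the table above appears to follow.

```python
# Spelling-test scores from the example above.
scores = [48, 49, 50, 46, 47, 47, 35, 38, 40, 42,
          45, 47, 48, 44, 43, 46, 45, 42, 43, 47]


def class_limits(lowest, width, k):
    """Return k (lower, upper) pairs of equal width starting at `lowest`."""
    return [(lowest + j * width, lowest + (j + 1) * width) for j in range(k)]


def tally(data, limits):
    """Count the values in each class: lower <= x < upper.

    The upper limit of the last class is included so that the maximum
    value is not dropped.
    """
    counts = [0] * len(limits)
    last = len(limits) - 1
    for x in data:
        for j, (lower, upper) in enumerate(limits):
            if lower <= x < upper or (j == last and x == upper):
                counts[j] += 1
                break
    return counts


R = max(scores) - min(scores)  # range: 50 - 35 = 15
K = 5                          # chosen number of classes
i = R // K                     # interval width: 15 / 5 = 3
limits = class_limits(min(scores), i, K)
frequencies = tally(scores, limits)

for (lower, upper), f in zip(limits, frequencies):
    print(f"{lower}-{upper}: {'#' * f} ({f})")
```

The printed rows of `#` marks give a rough text version of the histogram, with one bar per class.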

In 1977, John Tukey published an efficient method for displaying a five-number data summary. The graph is
called a boxplot (also known as a box and whisker plot).
Box and whisker diagrams are often used to display statistically analyzed data. The
traditional box and whisker diagram displays the range (maximum to minimum) of the
data, the median, and the 1st and 3rd quartile about the median. The second quartile is also
the median. A review of quartiles is provided below. An alternate form of the box and
whisker diagram is to show the mean and 1 standard deviation about the mean. This latter
form of the box and whisker diagram is easier to compute. This document will describe
the two variations of the box and whisker diagram.

This simplest possible box plot displays the full range of variation (from min to max), the likely
range of variation (the IQR), and a typical value (the median). Not uncommonly real datasets will
display surprisingly high maximums or surprisingly low minimums called outliers. John Tukey has
provided a precise definition for two types of outliers:

• Outliers are either 3×IQR or more above the third quartile or 3×IQR or more below the first quartile.

• Suspected outliers are slightly more central versions of outliers: either 1.5×IQR or more above the
third quartile or 1.5×IQR or more below the first quartile.

If either type of outlier is present the whisker on the appropriate side is taken to 1.5×IQR from the quartile
(the "inner fence") rather than the max or min, and individual outlying data points are displayed as unfilled
circles (for suspected outliers) or filled circles (for outliers). (The "outer fence" is 3×IQR from the quartile.)
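
The fence rules above translate directly into code. The sketch below is a minimal Python illustration; the function names are hypothetical, and the quartiles 8.5 and 14 are borrowed from the fish-length example later in this section, while the test values 23 and 31 are made up to show each category.

```python
def tukey_fences(q1, q3):
    """Return the inner (1.5 x IQR) and outer (3 x IQR) fences."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify(x, q1, q3):
    """Label a value as 'outlier', 'suspected outlier' or 'regular'."""
    fences = tukey_fences(q1, q3)
    inner_low, inner_high = fences["inner"]
    outer_low, outer_high = fences["outer"]
    if x <= outer_low or x >= outer_high:
        return "outlier"
    if x <= inner_low or x >= inner_high:
        return "suspected outlier"
    return "regular"


# Quartiles from the fish example: Q1 = 8.5, Q3 = 14, so IQR = 5.5.
fences = tukey_fences(8.5, 14)
print(fences)                  # inner (0.25, 22.25), outer (-8.0, 30.5)
print(classify(20, 8.5, 14))   # largest fish in the example
print(classify(23, 8.5, 14))   # hypothetical value past the inner fence
print(classify(31, 8.5, 14))   # hypothetical value past the outer fence
```

With these quartiles the largest fish (20) lies inside the inner fence, so the whisker in that example runs all the way to the maximum.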

If the data happen to be normally distributed,

IQR ≈ 1.35σ

where σ is the standard deviation of the distribution.
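
For a normal distribution the IQR is about 1.35 standard deviations. This factor can be checked numerically; the following minimal sketch uses Python's standard `statistics.NormalDist` and also estimates what fraction of normal data falls beyond the inner fences. The variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

q1 = standard_normal.inv_cdf(0.25)   # first quartile, in units of sigma
q3 = standard_normal.inv_cdf(0.75)   # third quartile, in units of sigma
iqr = q3 - q1                        # about 1.35 sigma

# Inner fence above Q3, and the two-sided fraction of data beyond the fences.
inner_fence = q3 + 1.5 * iqr
fraction_suspected = 2 * (1 - standard_normal.cdf(inner_fence))

print(f"IQR = {iqr:.3f} sigma")
print(f"inner fence = {inner_fence:.3f} sigma")
print(f"fraction beyond inner fences = {fraction_suspected:.4f}")
```

A fraction of roughly 0.7% means that about one suspected outlier is expected for every 140 or so normally distributed data-points.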

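Suspected outliers are not uncommon in large normally distributed datasets (say more than 100
data-points), since about 0.7% of normally distributed data lie beyond the inner fences. Outliers
beyond the outer fences are far rarer (roughly 2 in a million normally distributed data-points), so
they are expected only in very large normally distributed datasets. Here is an example of 1000
normally distributed data displayed as a boxplot: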

Note that outliers are not necessarily "bad" data-points; indeed they may well be the most
important, most information rich, part of the dataset. Under no circumstances should they be
automatically removed from the dataset. Outliers may deserve special consideration: they may be
the key to the phenomenon under study or the result of human blunders.

For example, suppose you were to catch and measure the length of 13 fish in a lake:

A box and whisker plot is based on medians. The first step is to rewrite the data in order, from smallest
length to largest:

Now find the median of all the numbers. Notice that since there are 13 numbers, the middle one will be the
seventh number:

This must be the median (middle number) because there are six numbers on each side.

The next step is to find the lower median. This is the middle of the lower six numbers. The exact centre is
half-way between 8 and 9 ... which would be 8.5
Now find the upper median. This is the middle of the upper six numbers. The exact centre is half-way
between 14 and 14 ... which must be 14
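
The median-of-halves procedure described above can be sketched in Python. The 13 fish lengths did not survive in this copy of the notes, so the `lengths` list below is an assumed, illustrative data set chosen only to agree with the values quoted in the text (minimum 5, lower median 8.5, median 12, upper median 14, maximum 20).

```python
from statistics import median

# Illustrative lengths in centimetres (an assumption, not the original data).
lengths = [12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12]


def five_number_summary(data):
    """Return (min, lower median, median, upper median, max).

    The lower and upper medians are the medians of the values below and
    above the middle position; for an odd count the middle value itself
    is left out of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


summary = five_number_summary(lengths)
print(summary)
```

The five numbers returned are exactly the positions needed to draw the plot: the two whisker ends, the two box edges and the median line.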

Now you are ready to construct the actual box & whisker graph. First you will need to draw an ordinary
number line that extends far enough in both directions to include all the numbers in your data:

First, locate the main median 12 using a vertical line just above your number line:

Now locate the lower median 8.5 and the upper median 14 with similar vertical lines:

Next, draw a box using the lower and upper median lines as endpoints:

Finally, the whiskers extend out to the data's smallest number 5 and largest number 20:

This is a box & whisker plot!

The shading below, as an example, shows the quarter of the numbers that are between 12 and 14:

Here is a picture of the quarter of the data that is between 8.5 and 12. Notice that the data is more spread out
here:

This picture is showing where half the data numbers are. Half of all the fish caught had a length between
8.5 and 14 centimetres:
Q. Employees' salary data from the SPSS Data Editor are given here. Construct the box-and-whisker plot by
using the SPSS program.
Id No. Current Salary Beginning Salary
1 $57,000 $27,000
2 $40,200 $18,750
3 $21,450 $12,000
4 $21,900 $13,200
5 $45,000 $21,000
6 $32,100 $13,500 61 $22,500 $9,750
7 $36,000 $18,750 62 $48,000 $21,750
8 $21,900 $9,750 63 $55,000 $26,250
9 $27,900 $12,750 64 $53,125 $21,000
10 $24,000 $13,500 65 $21,900 $14,550
11 $30,300 $16,500 66 $78,125 $30,000
12 $28,350 $12,000 67 $46,000 $21,240
13 $27,750 $14,250 68 $45,250 $21,480
14 $35,100 $16,800 69 $56,550 $25,000
15 $27,300 $13,500 70 $41,100 $20,250
16 $40,800 $15,000 71 $82,500 $34,980
17 $46,000 $14,250 72 $54,000 $18,000
18 $103,750 $27,510 73 $26,400 $10,500
19 $42,300 $14,250 74 $33,900 $19,500
20 $26,250 $11,550 75 $24,150 $11,550
21 $38,850 $15,000 76 $29,250 $11,550
22 $21,750 $12,750 77 $27,600 $11,400
23 $24,000 $11,100 78 $22,950 $10,500
24 $16,950 $9,000 79 $34,800 $14,550
25 $21,150 $9,000 80 $51,000 $18,000
26 $31,050 $12,600 81 $24,300 $10,950
27 $60,375 $27,480 82 $24,750 $14,250
28 $32,550 $14,250 83 $22,950 $11,250
29 $135,000 $79,980 84 $25,050 $10,950
30 $31,200 $14,250 85 $25,950 $17,100
31 $36,150 $14,250 86 $31,650 $15,750
32 $110,625 $45,000 87 $24,150 $14,100
33 $42,000 $15,000 88 $72,500 $28,740
34 $92,000 $39,990 89 $68,750 $27,480
35 $81,250 $30,000 90 $16,200 $9,750
36 $31,350 $11,250 91 $20,100 $11,250
37 $29,100 $13,500 92 $24,000 $10,950
38 $31,350 $15,000 93 $25,950 $10,950
39 $36,000 $15,000 94 $24,600 $10,050
40 $19,200 $9,000 95 $28,500 $10,500
41 $23,550 $11,550 96 $30,750 $15,000
42 $35,100 $16,500 97 $40,200 $19,500
43 $23,250 $14,250 98 $30,000 $15,000
44 $29,250 $14,250 99 $22,050 $10,950
45 $30,750 $13,500 100 $78,250 $27,480
46 $22,350 $12,750 101 $60,625 $22,500
47 $30,000 $16,500 102 $39,900 $15,750
48 $30,750 $14,100 103 $97,000 $35,010
49 $34,800 $16,500 104 $27,450 $15,750
50 $60,000 $23,730 105 $31,650 $13,500
51 $35,550 $15,000
52 $45,150 $15,000
53 $73,750 $26,250
54 $25,050 $13,500
55 $27,000 $15,000
56 $26,850 $13,500
57 $33,900 $15,750
58 $26,400 $13,500
59 $28,050 $14,250
60 $30,900 $15,000

The output from the SPSS program:

[Figure: box-and-whisker plots of Beginning Salary and Current Salary on a vertical axis running from
$0 to $140,000. Outlying cases are labelled with their Id No., including 18, 29, 32, 34, 35, 66, 71, 88,
100 and 103.]

Fishbone diagram
Dr. Kaoru Ishikawa, a Japanese quality control statistician, invented the fishbone diagram. Therefore, it may
be referred to as the Ishikawa diagram. The fishbone diagram is an analysis tool that provides a systematic
way of looking at effects and the causes that create or contribute to those effects. Because of the function of
the fishbone diagram, it may be referred to as a cause-and-effect diagram. The design of the diagram looks
much like the skeleton of a fish. Therefore, it is often referred to as the fishbone diagram.
Whatever name you choose, remember that the value of the fishbone diagram is to assist teams in
categorizing the many potential causes of problems or issues in an orderly way and in identifying root
causes.
When should a fishbone diagram be used?
Does the team...

• Need to study a problem/issue to determine the root cause?


• Want to study all the possible reasons why a process is beginning to have difficulties, problems, or
breakdowns?
• Need to identify areas for data collection?
• Want to study why a process is not performing properly or producing the desired results?

How is a fishbone diagram constructed?


Basic Steps:

1. Draw the fishbone diagram....


2. List the problem/issue to be studied in the "head of the fish".
3. Label each "bone" of the "fish". The major categories typically utilized are:

▪ The 4 M’s:
• Methods, Machines, Materials, Manpower
▪ The 4 P’s:
• Place, Procedure, People, Policies
▪ The 4 S’s:
• Surroundings, Suppliers, Systems, Skills

Note: You may use one of the category sets suggested, combine them in any fashion or
make up your own. The categories are to help you organize your ideas.

4. Use an idea-generating technique (e.g., brainstorming) to identify the factors within
each category that may be affecting the problem/issue and/or effect being studied. The
team should ask... "What are the machine issues affecting/causing..."
5. Repeat this procedure with each factor under the category to produce sub-factors.
Continue asking, "Why is this happening?" and put additional segments under each factor
and subsequently under each sub-factor.
6. Continue until you no longer get useful information as you ask, "Why is that
happening?"
7. Analyze the results of the fishbone after team members agree that an adequate amount
of detail has been provided under each major category. Do this by looking for those
items that appear in more than one category. These become the "most likely causes".
8. For those items identified as the "most likely causes", the team should reach
consensus on listing those items in priority order, with the first item being the most
probable cause.
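
Step 7 above (looking for items that appear under more than one category) is easy to do by hand on a small diagram, but it can also be sketched in code. The following minimal Python example stores a fishbone as a dictionary of categories; the causes listed are hypothetical placeholders, and the function name `most_likely_causes` is an illustrative choice.

```python
from collections import defaultdict

# Hypothetical fishbone using the 4 M's as the main bones.
fishbone = {
    "Methods": ["no written procedure", "inadequate training"],
    "Machines": ["worn parts", "infrequent maintenance"],
    "Materials": ["variable raw material"],
    "Manpower": ["inadequate training", "infrequent maintenance"],
}


def most_likely_causes(categories):
    """Return the causes that are listed under more than one category."""
    seen = defaultdict(set)
    for category, causes in categories.items():
        for cause in causes:
            seen[cause].add(category)
    return {cause: sorted(cats) for cause, cats in seen.items() if len(cats) > 1}


likely = most_likely_causes(fishbone)
for cause, cats in likely.items():
    print(f"{cause}: appears under {', '.join(cats)}")
```

The team would still rank these candidates by consensus, as step 8 describes; the code only finds the repeated items.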

This tool is called a fishbone diagram because it looks like the skeleton of a fish. The purpose of the
fishbone diagram is to get to the main causes for something. A fishbone diagram can help to figure out why
a process works well. The fishbone diagram could also help to explain outcomes like grades; it is a way to
look at a process. This tool helps to show a cause effect relationship.

Fishbone diagrams work with cause and effect. They help to discover the root cause or main cause.
Fishbone diagrams help to get to the bottom of things. They solve the mystery of why?

By using the fishbone diagram, the problem will be addressed in a systematic, step-by-step way. This helps
to think through what needs to be done.

Suggestions for using fishbone diagrams:


• There should be no judgment about ideas when people say them. The point is to list as many
causes as possible.
• Everyone will have an opinion about what causes a problem. Organizing these ideas improves
the chance that good ideas can be tested.
• Label the main bones of the fishbone in ways that are best for your problem or event.
• You can use the fishbone diagram not only to get to the root of a problem, but also to help with
planning.

1. Total Length
2. Fork Length
3. Standard Length
4. Head Length
5. Snout Length
6. Caudal peduncle (where the body attaches to the tail)
7. Fin rays, spinous (unsegmented) and soft (segmented)
8. First (spinous) dorsal fin
9. Second (soft) dorsal fin
10. Pectoral fin
11. Pelvic (ventral) fin
12. Anal fin
13. Finlet
14. Caudal (tail) fin
15. Lateral line
16. Scutes (bone-like projections)
17. Opercle (gill cover)
18. Preopercle (cheek)
19. Interopercle
20. Adipose eyelid
21. Supramaxilla (rear portion of upper jaw bone)
22. Premaxilla (forward portion of upper jaw bone)

Fishbone Diagram

Purpose:

To identify all of the possible factors that contribute to a problem, i.e. the "effect."

Guidelines

Clearly describe the problem, i.e. the "effect," to be diagrammed. (For example: files out of
place, too many students in line, or job cost above estimate.)

Draw a box around the effect with an arrow heading to it.


EFFECT
1. Identify the major categories of factors that contribute to the problem. This will help the team
organize the causes.

Four often-used categories are people, equipment, methods and materials. These categories are only
suggestions. The team may use any category that helps them think creatively.

2. Draw a box around each category with an arrow pointing at the effect arrow.

3. Brainstorm the detailed factors that contribute to the problem (i.e. the "effect"). Ask for each factor,
"what causes this cause (i.e. factor)?" These are written on the diagram and connected to the
appropriate main category with arrows.

4. Each cause may have sub-causes, which should be shown on the diagram. Continue to ask "why" in
order to identify root causes.

Use the following criteria to evaluate your Fishbone Diagram:

1. Is the effect clearly stated?


2. Does it relate to the issue statement?
3. Are all potential causes listed?
4. Are all causes categorized?
5. Do causes actually reflect causes, not solutions?
6. Do all causes relate to the issue?
7. Is the diagram complete and understandable?

Exercise # 2
Q. 1 During 1995 a nationwide auto-leasing company sold 15,000 cars to the individuals who had originally leased
them. The types of cars involved are shown in the following frequency distribution.

Types of Car                            Number Sold
Manufactured by Japanese companies         3100
Manufactured by General Motors             4800
Manufactured by Chrysler Corp.             2000
Manufactured by Ford Motor Co.             1150
Manufactured by American Motors             850
Manufactured by German companies           2330
Manufactured by other companies             770

Draw the pie chart for these data.

Q. 2 It has been estimated that the distribution of barrels of oil in the Western Hemisphere, excluding Alaska, is given by the pie
chart shown in the given figure. Assuming that there are 130 billion barrels of oil in reserve, answer the following:
a. How many barrels are in reserve in the United States?
b. How many barrels are there in Mexico?
c. How many barrels are there in Canada?

Region            Share of reserves
United States         29.7%
South America         17.8%
Mexico                46.1%
Canada                 6.3%

Multiple Choices:
For Questions 1-3, refer to the following circle graph that gives the place of origin of the 150,000 immigrants
living in a large city in the southeast.

Place of origin     Percent
Russia                10%
China                 28%
Poland                 7%
Other countries        6%
France                23%
Italy                 14%
Belgium               12%

1. How many of the immigrants originated in Italy?


a. 21,028 b. 28,000 c. 2,100 d. 21,000 e. none of these
2. How many of the immigrants originated in France or Poland?
a. 34,500 b. 10,500 c. 45,000 d. 4,500 e. none of these
3. How many of the immigrants did not originate in Russia, China, or Belgium?
a. 72,000 b. 75,000 c. 42,000 d. 135,000 e. none of these

Rules for Constructing a Pie Chart or Circle Graph


1. Determine all the categories that are of interest from the data.
2. For each category determined in step 1, calculate its relative frequency.
3. Draw a circle and assign a slice of the circle to each category. The size of each slice should be
proportional to the fraction of observations in that category. Also, the central angle should be an angle
whose measure is 360° times the relative frequency for that category. The sum of the measures of all
the central angles should be 360° (except for possible rounding errors).
4. Place an appropriate label in each category and indicate the percentage of the total number of
observations in the category. This can be found by using the fact that for each category,
Percentage = relative frequency × 100
The sum of all the percentages must always be 100% (except for possible rounding errors).
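
The rules above amount to three small calculations per category. As a minimal Python sketch, the example below applies them to the car data from Q. 1 of this exercise; the function name `pie_slices` and the dictionary keys are illustrative choices.

```python
# Number of cars sold by manufacturer, from Q. 1 of this exercise.
cars_sold = {
    "Japanese companies": 3100,
    "General Motors": 4800,
    "Chrysler Corp.": 2000,
    "Ford Motor Co.": 1150,
    "American Motors": 850,
    "German companies": 2330,
    "Other companies": 770,
}


def pie_slices(counts):
    """Return relative frequency, percentage and central angle per category."""
    total = sum(counts.values())
    slices = {}
    for category, count in counts.items():
        relative_frequency = count / total
        slices[category] = {
            "relative_frequency": relative_frequency,
            "percentage": relative_frequency * 100,
            "central_angle": relative_frequency * 360,  # degrees
        }
    return slices


slices = pie_slices(cars_sold)
for category, s in slices.items():
    print(f"{category:<20} {s['percentage']:>5.1f}%  {s['central_angle']:>6.1f} degrees")

total_angle = sum(s["central_angle"] for s in slices.values())
total_percent = sum(s["percentage"] for s in slices.values())
```

Checking that the angles total 360° and the percentages total 100% is a quick way to catch arithmetic slips before drawing the chart.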
Q. 3 Different animals react differently to strangers. In a survey of how dogs and cats react to strangers, USA Today
obtained the following results:

Types of Reaction   Dogs (%)   Cats (%)
Friendly               70         51
Aloof                  13         35
Hostile                14          6
Neurotic                3          8

a. Draw a bar graph and pie chart to picture this information for both dogs and cats
b. Which graph makes comparisons easier? Explain your answer.
Q. 4 In the 1992 Gas Mileage Guide published by the U.S. Department of Energy, an estimate of 27 miles per gallon (mpg) is
given for the Ford Ranger pickup truck equipped with a 2.3-liter engine and two-wheel drive, five-speed transmission.
These estimates are based on repeated tests of the vehicles. In one survey of 50 Ford Ranger pickup trucks, the following
mpg results were obtained.

31 24 29 30 31 26 26 29 31
25 23 32 27 32 28 29 27 30
27 27 33 26 28 27 27 28
28 26 25 28 26 29 24 27
33 28 28 24 24 30 29 26
24 30 31 34 27 32 33 28

a. Draw a stem-and-leaf display for the above data.
b. Construct the frequency polygon for the above data. (Hint: first construct a frequency distribution by using
the tally bar method.)

Q. 5 A scientist from the Environmental Protection Agency took samples of the toxic substance polychlorinated biphenyl
(PCB) levels from the soil at 60 different waste disposal facilities located throughout the United States. The following
results (in 0.0001 grams per kilogram of soil) were obtained:

38.8 35.6 31.8 32.8 36.3 40.2 39.7 33.9 34.4 33.1 39.3 34.8
35.3 38.1 35.7 39.1 37.8 39.5 36.4 38.6 37.6 37.8 31.7 35.7
39.1 35.8 38.4 34.5 37.9 38.2 38.3 40.1 38.8 33.9 30.8 37.6
31.8 32.4 35.9 36.1 38.1 37.6 36.7 30.8 37.8 35.5 39.8 36.9
40.2 33.8 34.7 39.0 36.0 37.3 31.4 31.7 32.9 30.7 37.5 31.8

a. Draw a stem-and-leaf diagram for the data.
b. Construct the histogram for the data.

Data Array: The arrangement of raw data by observations in either ascending or descending order.
Data Point: A single observation from a data set.
Ogive: A graph of a cumulative frequency distribution.
Relative Frequency Distribution: The display of a data set that shows the fraction or percentage of the total data set that
falls into each of a set of mutually exclusive and collectively exhaustive classes.
Representative Sample: A sample that contains the relevant characteristics of the population in the same proportions, as
they are included in that population.
Q. 6
Circle the correct answer or fill in the blank.
1. In comparison to a data array, the frequency distribution has the advantage of representing data in compressed
form. T F
2. A histogram is a series of rectangles, each proportional in width to the number of items falling within a specific
class of data. T F
3. The classes in any relative frequency distribution are both all-inclusive and mutually exclusive. T F
4. When a sample contains the relevant characteristics of a certain population in the same proportions as they are
included in that population, the sample is said to be a representative sample. T F
5. A population is a collection of all the elements we are studying. T F
6. Before information is arranged and analyzed, using statistical methods, it is known as preprocessed data. T F
7. One disadvantage of the data array is that it does not allow us to easily find the highest and lowest value in the
data set. T F
8. Discrete data can be expressed only in whole numbers. T F
9. As a general rule, statisticians regard a frequency distribution as incomplete if it has fewer than 20 classes. T F
10. It is always possible to construct a histogram from a frequency polygon. T F
11. Arranging raw data in order of time of observation forms a data array. T F
12. A baseball player's batting average is computed using a sample. T F
13. The class widths of a frequency distribution are of equal size. T F
14. Which of the following represents the most accurate scheme of classifying data?
(a) Quantitative methods.
(b) Qualitative methods.
(c) A combination of quantitative and qualitative methods.
(d) A scheme can be determined only with specific information about the situation.
15. Which of the following is not an example of compressed data?
(a) Frequency distribution.
(b) Data array.
(c) Histogram.
(d) Ogive.
16. Why is it true that classes in frequency distributions are all-inclusive?
(a) No data point falls into more than one class.
(b) There are always more classes than data points.
(c) All data fit into one class or another.
(d) (a) and (c) but not (b).
17. When constructing a frequency distribution, the first step is
(a) Divide the data into at least five classes.
(b) Sort the data points into classes and count the number of points in each class.
(c) Decide on the type and number of classes for dividing the data.
(d) None of these.
18. As a general rule, statisticians tend to use which of the following number of classes when arranging data?
(a) Fewer than five.
(b) Between one and five.
(c) More than 30.
(d) Between 6 and 15.
19. A relative frequency distribution presents frequencies in terms of
(a) Fractions.
(b) Whole numbers.
(c) Percentages.
(d) All of the above.
(e) Both (a) and (c).
20. Graphs of frequency distributions are used because
(a) They have a long history in practical applications.
(b) They attract attention to data patterns.
(c) They account for biased or incomplete data.
(d) They allow for easy estimates of values.
(e) Both (b) and (d).
21. Continuous data are differentiated from discrete data in that
(a) Discrete data classes are represented by fractions.
(b) Continuous data classes may be represented by fractions.
(c) Continuous data can take only whole numbers.
(d) Discrete data can take on any real number.
22. Double counting is a result of ______________ or ________________ data.
23. It is found that 50 of 1,000 customers in a survey contain the relevant characteristics of all customers in the
survey. The 50 customers are a ____________________ sample.
24. The _________________ and the ______________________ are two methods of data arrangement.
25. A ____________________ is a collection of all the elements in a group. A collection of some, but not all, of
these elements is a _______________________.
26. Dividing data points into similar classes and counting the number of observations in each class will give a
________________ distribution.
27. If data can take on only a limited number of values, the classes of these data are
called _____________________. Otherwise, the classes are called ____________________.
28. A relative frequency distribution presents frequencies in terms of ________________ or _____________.
29. A graph of a cumulative frequency distribution is called a ________________.
30. If a collection of data is called a data set, a single observation would be called a ______________.

Q.7 The given table shows the 100 count-test results, which have been entered into a frequency distribution. Present
this frequency distribution as a frequency polygon and a histogram.

Class Interval   Tally Bars                   Class Frequency
59-60            II                                 2
61-62                                               0
63-64            III                                3
65-66            IIII I                             6
67-68            IIII IIII I                       11
69-70            IIII IIII IIII III                18
71-72            IIII IIII IIII IIII III           23
73-74            IIII IIII IIII I                  16
75-76            IIII IIII II                      12
77-78            IIII                               4
79-80            III                                3
81-82            II                                 2

Q.8 Circle the correct answer or fill in the blank.


1. The value of every observation in the data set is taken into account when we calculate its median. T F
2. When the population is either negatively or positively skewed, it is often preferable to use the median as the
best measure of location because it always lies between the mean and the mode. T F
3. Measures of central tendency in a data set refer to the extent to which the observations are scattered. T F
4. A measure of peakedness of a distribution curve is its skewness. T F
5. With ungrouped data, the mode is most frequently used as the measure of central tendency. T F
6. If we arrange the observations in a data set from highest to lowest, the data point lying in the middle of the
data set is the median. T F
7. When working with grouped data, we may compute an approximate mean by assuming that each value in a
given class is equal to its midpoint. T F
8. The value most often repeated in a data set is called the arithmetic mean. T F
9. If the curve of a certain distribution tails off toward the left end of the measuring scale on the horizontal axis,
the distribution is said to be negatively skewed. T F
10. After grouping a set of data into a number of classes, we may identify the median class as being the one that
has the largest number of observations. T F
11. A mean calculated from grouped data always gives a good estimate of the true value, although it is seldom
exact. T F
12. We can compute a mean for any data set once we are given its frequency distribution. T F
13. The mode is always found at the highest point of a graph of a data distribution. T F
14. The number of elements in a population is defined by n. T F
15. For a data array with 50 observations, the median will be the value of the 25th observation in the array.
T F
16. Extreme values in a data set have a strong effect on the median. T F
17. The difference between the largest and smallest observations in a data set is called the geometric mean. T
F
18. The dispersion of a data set gives insight into the reliability of the measure of central tendency. T F
19. The standard deviation is equal to the square root of the variance. T F
20. The difference between the highest and the smallest observations in a data set is called the geometric mean.
T F
21. The interquartile range is based on only two values taken from the data set. T F
22. The standard deviation is measured in the same units as the observations in the data set. T F
23. A fractile is a location in a frequency distribution that a given proportion (or fraction) of the data lies at or
above. T F
24. The variance, like the standard deviation, takes into account every observation in the data set. T F
25. The coefficient of variation is an absolute measure of dispersion. T F
26. The measure of dispersion most often used by statisticians is the standard deviation. T F
27. One of the advantages of dispersion measures is that any statistic that measures absolute variation also
measures relative variation. T F
28. One disadvantage of using the range to measure dispersion is that it ignores the nature of the variations among
most of the observations. T F
29. The variance indicates the average distance of any observation in the data set from the mean. T F
30. Every population has a variance, which is signified by s². T F
31. According to Chebyshev’s theorem, no more than 11 percent of the observations in a population can have
population standard scores greater than 3 or less than –3. T F
32. The interquartile range is a specific example of an interfractile range. T F
33. It is possible to measure the range of an open-ended distribution. T F
34. The interquartile range measures the average range of the lower fourth of a distribution. T F
35. When calculating the average rate of debt expansion for a company, the correct mean to use is the
a. Arithmetic mean
b. Weighted mean
c. Geometric mean
d. Either (a) or (c).
36. The mode has all the following disadvantages except
a. A data set may have no modal value.
b. Every value in a data set may be a mode.
c. A multimodal data set is difficult to analyze.
d. The mode is unduly affected by extreme values.
37. What is the major assumption we make when computing a mean from grouped data?
a. All values are discrete.
b. Every value in a class is equal to the midpoint.
c. No value occurs more than once.
d. Each class contains exactly the same number of values.
38. Which of the following statements is NOT correct?
a. Some data sets do not have means.
b. Calculation of a mean is affected by extreme data values.
c. A weighted mean should be used when it is necessary to take the importance of each value into account.
d. All these statements are correct.
39. Which of the following is the first step in calculating the median of a data set?
a. Average the middle two values of the data set.
b. Array the data.
c. Determine the relative weights of the data values in terms of importance.
d. None of these.
40. Which of the following is NOT an advantage of using a median?
a. Extreme values affect the median less strongly than they do the mean.
b. A median can be calculated for qualitative descriptions.
c. The median can be calculated for every set of data, even for all sets containing open-ended classes.
d. The median is easy to understand.
e. All these are advantages of using a median.
41. Why is it usually better to calculate a mode from grouped, rather than ungrouped, data?
a. The ungrouped data tend to be bimodal.
b. The mode for the grouped data will be the same, regardless of the skewness of the distribution.
c. Extreme values have less effect on grouped data.
d. The chance of an unrepresentative value being chosen as the mode is reduced.
42. In which of these cases would the mode be most useful as an indicator of central tendency?
a. Every value in a data set occurs exactly once.
b. All but three values in a data set occur once; three values occur 100 times each.
c. All values in a data set occur 100 times each.
d. Every observation in a data set has the same value.
43. Which of the following is an example of a parameter?
a. x̄.
b. n.
c. μ.
d. All of these
e. (b) and (c), but not (a)
44. Which of the following is NOT a measure of central tendency?
a. Geometric mean
b. Median
c. Mode
d. Arithmetic mean
e. All these are measures of central tendency.
45. When a distribution is symmetrical and has one mode, the highest point on the curve is called the
a. Range
b. Mode
c. Median
d. Mean
e. All of these
f. (b), (c), and (d), but not (a)
46. When referring to a curve that tails off the left end, you would call it
a. Symmetrical
b. Skewed right
c. Positively skewed
d. All of these
e. None of these
47. Disadvantages of using the range as a measure of dispersion include all of the following except
a. It is heavily influenced by extreme values
b. It can change drastically from one sample to the next
c. It is difficult to calculate
d. It is determined by only two points in the data set.
48. Why is it necessary to square the differences from the mean when computing the population variance?
a. So that extreme values will not affect the calculation.
b. Because it is possible that N could be very small.
c. Some of the differences will be positive and some will be negative.
d. None of these
49. Assume that a population has μ = 100 and σ = 10. If a particular observation has a standard score of 1, it
can be concluded that
a. Its value is 110
b. It lies between 90 and 110, but its exact value cannot be determined
c. Its value is greater than 110
d. Nothing can be determined without knowing N
50. Assume that a population has μ = 100 and σ = 10, and N = 1,000. According to Chebyshev’s theorem,
which of the following situations is NOT possible?
a. 150 values are greater than 130.
b. 93 values lie between 100 and 108.
c. 22 values lie between 120 and 125.
d. 70 values are less than 90.
e. All these situations are possible.
51. Which of the following is an example of a relative measure of dispersion?
a. Standard deviation.
b. Variance.
c. Coefficient of variation.
d. All of these.
e. (a) and (b), but not (c).
52. Which of the following is true?
a. The variance can be calculated for grouped or ungrouped data.
b. The standard deviation can be calculated for grouped or ungrouped data.
c. The standard deviation can be calculated for grouped or ungrouped data, but the variance can be calculated
only for ungrouped data.
d. (a) and (b), but not (c).
53. If one were to divide the standard deviation of a population by the mean of the same population and multiply
this value by 100, one would have calculated the
a. Population standard score.
b. Population variance.
c. Population standard deviation.
d. Population coefficient of variation.
e. None of these.
54. How does the computation of a sample variance differ from the computation of a population variance?
a. μ is replaced by x̄.
b. N is replaced by n-1.
c. N is replaced by n.
d. (a) and (c), but not (b).
e. (a) and (b), but not (c).
55. The square of the variance of a distribution is the
a. Standard deviation.
b. Mean.
c. Range.
d. Absolute deviation.
e. (a) and (d).
f. None of these.
56. Chebyshev’s theorem says that 99 percent of the values will lie within ±3 standard deviations from the mean
for
a. Bell shaped distributions
b. Positively skewed distributions.
c. Left-tailed distributions.
d. All distributions.
e. No distributions.
57. If a curve can be divided into two equal parts that are mirror images, it is _______________________. If it
cannot be divided in this way, it is _______________________.
58. The symbol x̄ denotes the mean of a _____________________. μ denotes the mean of a
______________________.
59. Assigning small-value consecutive integers to midpoints during calculation of the mean is called
__________________________.
60. When dealing with quantities that change over a period of time, it is better to calculate a
_____________________ mean than an ______________________ mean.
61. If two values in a group of data occur more often than any others, the distribution of the data is said to be
________________________.
62. The extent to which values in a distribution are grouped together is a measure of ___________________.
63. In a frequency distribution, the median is the 0.5 _______________________ because half of the data values
are less than or equal to this value.
64. The difference between the values of the first and third quartiles is the _____________________ range.
65. The measure of the average squared distance between the mean and each item in the population is the
_______________________. The positive square root of this value is the ______________________.
66. The expression of the standard deviation as a percentage of the mean is the ________________________.
67. The number of standard deviation units that an observation lies above or below the mean is called the
_______________________.
68. Fractiles that divide the data into 100 equal parts are called _________________________.

Answers:

1. F 10. F 19. T 28. T 37. b 46. e 55. f 64. interquartile


2. T 11. T 20. F 29. T 38. a 47. c 56. e 65. variance, standard deviation
3. F 12. F 21. T 30. F 39. b 48. c 57. symmetrical, skewed 66. coefficient of variation
4. F 13. T 22. T 31. T 40. c 49. a 58. sample, population 67. standard score
5. F 14. F 23. F 32. T 41. d 50. a 59. coding 68. percentiles
6. T 15. F 24. T 33. F 42. b 51. c 60. geometric, arithmetic
7. T 16. F 25. F 34. F 43. c 52. d 61. bimodal
8. F 17. F 26. T 35. c 44. e 53. d 62. dispersion
9. T 18. T 27. F 36. d 45. f 54. e 63. fractile

Summary Statistics

1. Last year a small statistical consulting company paid each of its five statistical clerks $22,000, two
statistical analysts $50,000 each, and the senior statistician/owner $270,000. The number of
employees earning less than the mean salary is:
a. 0
b. 4
c. 5
d. 6
e. 7 (e)
2. The following table represents the relative frequency of accidents per day in a city.

Accidents 0 1 2 3 4 or more
Relative 0.55 0.20 0.10 0.15 0
Frequency
Which of the following statements are true?

I. The mean and modal number of accidents are equal.


II. The mean and median number of accidents are equal.
III. The median and modal number of accidents are equal.

a. I only
b. II only
c. III only
d. I, II and III
e. I, II (c)
3. During the past few months, major league baseball players were in the process of negotiating with
the team owners for higher minimum salaries and more fringe benefits. At the time of the
negotiations, most of the major league baseball players had salaries in the $100,000 - $150,000 a
year range. However, there were a handful of players who, via the free agent system, earned nearly
three million dollars per year. Which measure of central tendency of players' salaries, the mean or the
median, might the players have used in an attempt to convince the team owners that they (the
players) were deserving of higher salaries and more fringe benefits?
a. Not enough information is given to answer the question.
b. Either one, since all measures of central tendency are basically the same.
c. Mean.
d. Median.
e. Both the mean and the median.

4. The book values in a financial analyst's sample of six companies were

$25, $7, $22, $33, $18, $15.

The sample mean and sample standard deviation are (approximately):

a. 20 and 79.2 respectively


b. 20 and 8.9 respectively.
c. 120 and 79.2 respectively.
d. 20 and 8.2 respectively.
e. 120 and 8.9 respectively. (not available)
5. A sample of underweight babies was fed a special diet and the following weight gains (lbs) were
observed at the end of three months.

6.7 2.7 2.5 3.6 3.4 4.1 4.8 5.9 8.3

The mean and standard deviation are:

a. 4.67, 3.82
b. 3.82, 4.67
c. 4.67, 1.95
d. 1.95, 4.67
e. 4.67, 1.84 (c)
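The result in question 5 can be checked with a minimal Python sketch that uses only the standard library. The variable names are illustrative, and the sketch assumes that the sample standard deviation (divisor n - 1) is the one intended; the population form (divisor n) is printed alongside for comparison.

```python
import statistics

# Weight gains (lbs) from question 5.
gains = [6.7, 2.7, 2.5, 3.6, 3.4, 4.1, 4.8, 5.9, 8.3]

mean_gain = statistics.mean(gains)        # sum of the values divided by n
sample_sd = statistics.stdev(gains)       # divisor n - 1
population_sd = statistics.pstdev(gains)  # divisor n

print(round(mean_gain, 2), round(sample_sd, 2), round(population_sd, 2))
```

The sample form reproduces option (c), while the population form gives the value in option (e), which shows why the divisor matters for small samples.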
6. The effect of acid rain upon the yield of crops is of concern in many places. In order to determine
baseline yields, a sample of 13 fields was selected, and the yield of barley (g/400m2) was
determined. The output from SAS appears below:

QUANTILES(DEF=4) EXTREMES
N 13 SUM WGTS 13 100% MAX 392 99% 392 LOW HIGH
MEAN 220.231 SUM 2863 75% Q3 234 95% 392 161 225
STD DEV 58.5721 VAR 3430.69 50% MED 221 90% 330 168 232
SKEW 2.21591 KURT 6.61979 25% Q1 174 10% 163 169 236
USS 671689 CSS 41168.3 0% MIN 161 5% 161 179 239
CV 26.5958 STD MEAN 16.245 1% 161 205 392

The mean, standard deviation, median, and the highest value are:

a. 220.231 3430.60 50% 225


b. 0.231 16.245 221 225
c. 220.231 58.5721 50% 392
d. 220.231 58.5721 221 392
e. 220.231 58.5721 234 392 (d)
7. The effect of salinity upon the growth of grasses is of concern in many places where excess irrigation
is causing salt to rise to the surface. In order to determine baseline yields, a sample of 24 fields was
selected, and the biomass of grasses in a standard sized plot was measured (kg). The output from
SAS appears below:

QUANTILES(DEF=4) EXTREMES
N 24 SUM WGTS 24 100% MAX 22.6 99% 22.6 LOW HIGH
MEAN 9.09 SUM 218.3 75% Q3 11.45 95% 22.52 0.7 15.1
STD DEV 6.64 VARIANCE 44.0 50% MED 8.15 90% 21.8 1 19.8
SKEWNE 0.924 KURTO -0.0209 25% Q1 3.775 10% 1.6 2.2 21.3
USS 2998 CSS 1012.73 0% MIN 0.7 5% 0.77 2.2 22.3
CV 72 STD MEAN 1.35 1% 0.7 2.8 22.6
T:MEAN=0 6.7153 PROB>|T| 0.0001 RANGE 21.9
SGN RANK 150 PROB>|S| 0.0001 Q3-Q1 7.675

The mean, standard deviation, tenth percentile, and the highest value are:

a. 9.09 44.0 10% 22.6


b. 9.09 6.64 1.6 15.1
c. 9.09 6.64 21.8 15.1
d. 9.09 6.64 1.6 22.6
e. 9.09 1.35 21.8 15.1 (d)
8. The heights in centimeters of 5 students are:

165, 175, 176, 159, 170.

The sample median and sample mean are respectively:

a. 170, 169
b. 170, 170
c. 169, 170
d. 176, 169
e. 176, 176 (a)
9. If most of the measurements in a large data set are of approximately the same magnitude except for
a few measurements that are quite a bit larger, how would the mean and median of the data set
compare and what shape would a histogram of the data set have?
a. The mean would be smaller than the median and the histogram would be skewed with a long
left tail.
b. The mean would be larger than the median and the histogram would be skewed with a long
right tail.
c. The mean would be larger than the median and the histogram would be skewed with a long
left tail.
d. The mean would be smaller than the median and the histogram would be skewed with a long
right tail.
e. The mean would be equal to the median and the histogram would be symmetrical. (b)
10. In measuring the centre of the data from a skewed distribution, the median would be preferred over
the mean for most purposes because:
a. the median is the most frequent number while the mean is most likely
b. the mean may be too heavily influenced by the larger observations and this gives too high an
indication of the centre
c. the median is less than the mean and smaller numbers are always appropriate for the centre
d. the mean measures the spread in the data
e. the median measures the arithmetic average of the data excluding outliers. (b)
11. In general, which of the following statements is FALSE?
a. The sample mean is more sensitive to extreme values than the median.
b. The sample range is more sensitive to extreme values than the standard deviation.
c. The sample standard deviation is a measure of spread around the sample mean.
d. The sample standard deviation is a measure of central tendency around the median.
e. If a distribution is symmetric, then the mean will be equal to the median. (d)
12. The frequency distribution of the amount of rainfall in December in a certain region for a period of
30 years is given below:

Rainfall Number
(in inches) of years
2.0 - 4.0 3
4.0 - 6.0 6
6.0 - 8.0 8
8.0 - 10.0 8
10.0 - 12.0 5

The mean amount of rainfall in inches is:

a. 7.30
b. 7.25
c. 7.40
d. 8.40
e. 6.50 (c)
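Question 12 applies the grouped-data mean, in which every value in a class is represented by the class midpoint. A minimal Python sketch of that calculation, with illustrative variable names:

```python
# Class limits (inches) and frequencies (years) from question 12.
classes = [(2.0, 4.0), (4.0, 6.0), (6.0, 8.0), (8.0, 10.0), (10.0, 12.0)]
freqs = [3, 6, 8, 8, 5]

# Each class is represented by its midpoint.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Grouped mean = sum(f * midpoint) / sum(f).
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)
print(round(grouped_mean, 2))
```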
13. A consumer affairs agency wants to check the average weight of a new product on the market. A
random sample of 25 items of the product was taken and the weights (in grams) of these items were
classified as follows:

Class Limits Frequency


74 - 77 3
77 - 80 6
80 - 83 9
83 - 86 3
86 - 89 4

The 3rd quartile of the weight in this sample is equal to:

a. 83.00
b. 75.00
c. 83.75
d. 18.75
e. 84.50 (c)
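Quartiles and percentiles of grouped data, as in question 13, are usually found by linear interpolation inside the class that contains the required position. The sketch below is one way to write this in Python; the function name `grouped_quantile` is hypothetical, and it assumes the values are spread evenly within each class.

```python
def grouped_quantile(classes, freqs, p):
    """Quantile p (0 < p < 1) by linear interpolation within its class."""
    target = p * sum(freqs)  # position of the quantile among the n values
    cum = 0                  # cumulative frequency below the current class
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("p must lie between 0 and 1")


# Class limits (grams) and frequencies from question 13.
classes = [(74, 77), (77, 80), (80, 83), (83, 86), (86, 89)]
freqs = [3, 6, 9, 3, 4]

q3 = grouped_quantile(classes, freqs, 0.75)
print(q3)
```

The same function can be applied to the other grouped percentile questions in this set by changing the class limits, frequencies and p.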
14. A random sample of 40 smoking people is classified in the following table:

Ages Frequency
10 - 20 4
20 - 30 6
30 - 40 12
40 - 50 10
50 - 60 8
Total 40

The mean age of this group of people is:

a. 4.5
b. 8.0
c. 34.5
d. 38.0
e. 1520.0 (not available)
15. A frequency distribution of weekly wages for a group of employees is given below:

Weekly wages Frequency


50.00 - 75.00 10
75.00 - 100.00 15
100.00 - 125.00 60
125.00 - 150.00 40
150.00 - 175.00 10

The mean for this group is:

a. $112.50
b. $125.00
c. $105.41
d. $117.13
e. $118.50 (not available)
16. Consider the following cumulative relative frequency distribution:

Less than
or equal to Cum. rel. freq.
5.0 0.23
10.0 0.34
15.0 0.41
20.0 1.00

If this distribution is based on 800 observations, then the frequency in the second interval is:

a. 34
b. 272
c. 80
d. 88
e. 456 (d)

The following information will be used in the next three questions.


A sample of 35 observations was classified as follows:

Class Frequency
0-5 8
5 -10 2
10-15 6
15-20 8
20-25 5
25-30 5
30-35 0
35-40 1

17. The class mark of the third class is:


a. 10.0
b. 12.5
c. 15.0
d. 7.5
e. 17.5 (b)
18. The sample mean of the above grouped data is:
a. 14.89
b. 14.23
c. 15.35
d. 15.11
e. 14.74 (c)
19. The 80th percentile of the above grouped data is:
a. 27
b. 22
c. 19
d. 23
e. 24 (e)
20. Recently, the City of Winnipeg has been criticized for its excessive discharges of untreated sewage
into the Red River. A microbiologist takes 45 samples of water downstream from the treated sewage
outlet and measures the number of coliform bacteria present. A summary table is as follows:

Number of Number of
Bacteria Samples
20-30 5
30-40 20
40-50 15
50-60 5
The 80th percentile is approximately:

a. 45
b. 47
c. 80
d. 48
e. 36 (b)
21. Recently, the City of Winnipeg has been criticized for its excessive discharges of untreated sewage
into the Red River. A microbiologist takes 50 samples of water downstream from the treated sewage
outlet and measures the number of coliform bacteria present. A summary table is as follows:

Number of Number of
Bacteria Samples
50-60 5
60-70 20
70-80 10
80-90 10

The mean number of bacteria per sample is:

a. 70
b. 71
c. 66
d. 76
e. 65 (c)
22. Using the same data as in the previous question, the 75th percentile is approximately:
a. 76.5
b. 77.5
c. 75.0
d. 78.5
e. 78.0 (b)
23. A sample of 99 distances has a mean of 24 feet and a median of 24.5 feet. Unfortunately, it has just
been discovered that an observation which was erroneously recorded as "30" actually had a value of
"35". If we make this correction to the data, then:
a. the mean remains the same, but the median is increased
b. the mean and median remain the same
c. the median remains the same, but the mean is increased
d. the mean and median are both increased
e. we do not know how the mean and median are affected without further calculations; but the
variance is increased. (c)
24. The term test scores of 15 students enrolled in a Business Statistics class were recorded in ascending
order as follows:

4, 7, 7, 9, 10, 11, 13, 15, 15, 15, 17, 17, 19, 19, 20

After calculating the mean, median, and mode, an error is discovered: one of the 15's is really a 17.
The measures of central tendency which will change are:

a. the mean only


b. the mode only
c. the median only
d. the mean and mode
e. all three measures (d)
25. Suppose a frequency distribution is skewed with a median of $75.00 and a mode of $80.00. Which
of the following is a possible value for the mean of distribution?
a. $86
b. $91
c. $64
d. $75
e. None of these (c)
26. Earthquake intensities are measured using a device called a seismograph which is designed to be
most sensitive for earthquakes with intensities between 4.0 and 9.0 on the open-ended Richter scale.
Measurements of nine earthquakes gave the following readings:

4.5 L 5.5 H 8.7 8.9 6.0 H 5.2

where L indicates that the earthquake had an intensity below 4.0 and a H indicates that the
earthquake had an intensity above 9.0. The median earthquake intensity of the sample is:

a. Cannot be computed since all of the values are not known


b. 8.70
c. 5.75
d. 6.00
e. 6.47 (d)
27. Earthquake intensities are measured using a device called a seismograph which is designed to be
most sensitive for earthquakes with intensities between 4.0 and 9.0 on the open-ended Richter scale.
Measurements of ten earthquakes gave the following readings:

4.5 L 5.5 H 8.7 8.9 6.0 H 5.2 7.2

where L indicates that the earthquake had an intensity below 4.0 and a H indicates that the
earthquake had an intensity above 9.0. One measure of central tendency is the x% trimmed mean
computed after trimming x% of the upper values and x% of the bottom values. The value of the 20%
trimmed mean is:

a. Cannot be computed since all of the values are not known


b. 6.00
c. 6.60
d. 6.92
e. 6.57 (d)
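The trimmed mean in question 27 can be computed even though some readings are only known to be below 4.0 or above 9.0, because those readings are exactly the ones that get trimmed. A minimal Python sketch follows; encoding L and H as minus and plus infinity is an assumption made here only so that they sort to the two ends, and `trimmed_mean` is a hypothetical helper name.

```python
import math

# Readings from question 27. L and H are censored values; they are encoded
# as -inf and +inf purely so that sorting places them at the two ends.
L, H = -math.inf, math.inf
readings = [4.5, L, 5.5, H, 8.7, 8.9, 6.0, H, 5.2, 7.2]


def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the values from each end."""
    ordered = sorted(values)
    k = round(fraction * len(ordered))  # number of values trimmed per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


print(round(trimmed_mean(readings, 0.20), 2))
```

With ten readings, 20% trimming removes two values from each end (L and 4.5 at the bottom, both H readings at the top), so only fully known values remain.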
28. When testing water for chemical impurities, results are often reported as bdl, i.e., below detection
limit. The following are the measurements of the amount of lead in a series of water samples taken
from inner city households (ppm).

5, 7, 12, bdl, 10, 8, bdl, 20, 6.

Which of the following is correct?

a. The mean lead level in the water is about 10 ppm.


b. The mean lead level in the water is about 8 ppm.
c. The median lead level in the water is 7 ppm.
d. The median lead level in the water is 8 ppm.
e. Neither the mean nor the median can be computed because some values are unknown. (c)
29. A clothing and textiles student is trying to assess the effect of a jacket's design on the time it takes
preschool children to put the jacket on. In a pretest, she timed 7 children as they put on her prototype
jacket. The times (in seconds) are provided below.
n n 65 92 n 43 39

The n's represent children who had not put the jacket on after 120 seconds (in which case the
children were allowed to stop). Which of the following would be the best value to use as the
"typical" time required to put on the jacket?

a. The mean time, which was 59.8 seconds.


b. The mean time, which was 85.6 seconds.
c. The median time, which was 54 seconds.
d. The median time, which was 92 seconds.
e. The missing times (the n's) mean we can't calculate any useful measures of central tendency. (d)
30. For the following histogram, what is the proper ordering of the mean, median, and mode? Note that
the graph is NOT numerically precise - only the relative positions are important.

a. I = mean II = median III = mode


b. I =mode II = median III = mean
c. I = median II = mean III = mode
d. I = mode II = mean III = median
e. I = mean II = mode III = median (a)
31. The following statistics were collected on two groups of cattle

Group A Group B
sample size 45 30
sample mean 1000 lbs 800 lbs
sample std. dev 80 lbs 70 lbs

Which of the following statements is correct?

a. Group A is less variable than Group B because Group A's std. deviation is larger.
b. Group A is relatively less variable than Group B because Group A's coefficient of variation
(the ratio of the standard deviation to the mean) is smaller
c. Group A is less variable than Group B because the std deviation per animal is smaller.
d. Group A is relatively more variable than Group B since the sample mean is larger.
e. Group A is more variable than Group B since the sample size is larger. (b)
32. "Normal" body temperature varies by time of day. A series of readings was taken of the body
temperature of a subject. The mean reading was found to be 36.5°C with a standard deviation of
0.3°C. When converted to °F, the mean and standard deviation are: (°F = °C(1.8) + 32).
a. 97.7, 32
b. 97.7, 0.30
c. 97.7, 0.54
d. 97.7, 0.97
e. 97.7, 1.80 (c)
33. A scientist is weighing each of 30 fish. She obtains a mean of 30 g and a standard deviation of 2 g.
After completing the weighing, she finds that the scale was misaligned, and always under-reported
every weight by 2 g, i.e. a fish that really weighed 26 g was reported to weigh 24 g. What are the mean
and standard deviation after correcting for the error in the scale? [Hint: recall that the mean measures
central tendency and the standard deviation measures spread.]
a. 28 g, 2 g
b. 30 g, 4 g
c. 32 g, 2 g
d. 32 g, 4 g
e. 28 g, 4 g (c)
34. A researcher wishes to calculate the average height of patients suffering from a particular disease.
From patient records, the mean was computed as 156 cm, and standard deviation as 5 cm. Further
investigation reveals that the scale was misaligned, and that all readings are 2 cm too large, e.g., a
patient whose height is really 180 cm was measured as 182 cm. Furthermore, the researcher would
like to work with statistics based on metres. The correct mean and standard deviation are:
a. 1.56m, .05m
b. 1.54m, .05m
c. 1.56m, .03m
d. 1.58m, .05m
e. 1.58m, .07m (b)
35. Rainwater was collected in water collectors at thirty different sites near an industrial basin and the
amount of acidity (pH level) was measured. The mean and standard deviation of the values are 4.60
and 1.10 respectively. When the pH meter was recalibrated back at the laboratory, it was found to be
in error. The error can be corrected by adding 0.1 pH units to all of the values and then multiplying the
result by 1.2. The mean and standard deviation of the corrected pH measurements are:
a. 5.64, 1.44
b. 5.64, 1.32
c. 5.40, 1.44
d. 5.40, 1.32
e. 5.64, 1.20 (b)
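Questions 32 to 35 all rest on the same rule: if every value is transformed as y = (scale)(x) + shift, the mean is transformed in the same way, while the standard deviation is only multiplied by the absolute value of the scale. A short Python sketch of the rule, with a hypothetical helper name `transform_summary`:

```python
def transform_summary(mean, sd, scale=1.0, shift=0.0):
    """Mean and SD of y = scale * x + shift, given the mean and SD of x."""
    return scale * mean + shift, abs(scale) * sd


# Question 32: F = 1.8 * C + 32.
mean_f, sd_f = transform_summary(36.5, 0.3, scale=1.8, shift=32)

# Question 35: corrected = 1.2 * (x + 0.1) = 1.2 * x + 0.12.
mean_ph, sd_ph = transform_summary(4.60, 1.10, scale=1.2, shift=0.12)

print(round(mean_f, 2), round(sd_f, 2))
print(round(mean_ph, 2), round(sd_ph, 2))
```

Adding a constant alone (questions 33 and 34) corresponds to scale = 1, which leaves the standard deviation unchanged.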
36. Which of the following statements is NOT true?
a. In a symmetric distribution, the mean and the median are equal.
b. The first quartile is equal to the twenty-fifth percentile.
c. In a symmetric distribution, the median is halfway between the first and the third quartiles.
d. The median is always greater than the mean.
e. The range is the difference between the largest and the smallest observations in the data set. (d)
37. An experiment was conducted where a person's heart rate was measured 4 times in the space of 10
minutes. This was repeated on a sample of 20 people. Which of the following is not correct?
a. The standard deviation within subjects refers to the repeated measurements of a single
person's heart rate.
b. The standard deviation among subjects refers to the variation in heart rates among different
people.
c. The variation among subjects was larger than the variation within subjects.
d. The variation in heart rates based on measurements taken for 30 seconds was larger than the
variation of heart rates based on measurements taken for 15 seconds.
e. The average of the heart rate computed from the 15 seconds measuring period was about the
same as the average of the heart rates computed from the 30 second measurement periods.
(d)
38. Here is a summary graph of complex carbohydrates for each of the three fibre groups in the cereal
dataset.

Which of the following is NOT correct?


a. The low fibre group is more variable than the medium fibre group because the central box is
larger.
b. About 25% of low fibre cereals have less than 12 g of complex carbohydrates per serving.
c. About 50% of medium fibre cereals have more than 15 g of complex carbohydrates per
serving.
d. The average amount of complex carbohydrates per serving for the high fibre group appears to
be much smaller than the other two groups.
e. About 25% of the medium fibre cereals have less than 10 g of complex carbohydrates. (e)
39. You are allowed to choose four numbers from 1 to 10 (inclusive, without repeats). Which of the
following is not correct?
a. The numbers 4, 5, 6, 7 have the smallest possible standard deviation.
b. The numbers 1, 2, 3, 4 have the smallest possible standard deviation.
c. The numbers 1, 5, 6, 10 have the largest possible standard deviation.
d. The numbers 1, 2, 9, 10 have the largest possible standard deviation.
e. The numbers 7, 8, 9, 10 have the smallest possible standard deviation. (c)
40. Which of the following is FALSE:
a. The numbers 3, 3, 3 have a standard deviation of 0.
b. The numbers 3, 4, 5 have the same standard deviation as 1003, 1004, 1005.
c. The standard deviation is a measure of spread around the centre of the data.
d. The numbers 1, 5, 9 have a smaller standard deviation than 101, 105, 109.
e. The standard deviation can only be computed for interval or ratio scaled data. (d)

There are three main topics related to data, namely:


• Data Collection
• Data Presentation
• Data Properties
We have already discussed the collection and presentation of statistical data. After the presentation of the
data, the next step in the statistical inquiry is data properties.
Properties of Numerical Data:
The three major properties that describe a set of numerical data are:
1. Central Tendency
2. Variation
3. Shape

Presentation of Data
The device of gathering data often results in a massive volume of statistical data, which are in the form of individual
measurements or counts. It is difficult to learn anything by examining the unorganized data, which is more often confusing than
clarifying. The mass of data is therefore to be organized and condensed into a form that can be more rapidly and easily understood
and interpreted. For this purpose, techniques of classification and graphic displays are used. Techniques of the presentation of data
are
• Individual Item Data
• Classification
• Tabulation
• Frequency Distribution
• Stem and Leaf Display
• Graphical Presentation
• Diagrams
• Graphs
Individual Item Data
In individual item data the observations are listed as per individual.
e.g. Let X represent the marks of a student out of 100 in Statistics

Student Name A B C D E F G
X 86 76 45 78 90 65 55
is an example of individual item data.

Frequency Distribution:

Measure of Central Tendency


Most sets of data show a distinct tendency to group or cluster about a certain point. Thus, for any particular set of data, it usually
becomes possible to select some typical value or average to describe the entire set. Such a descriptive typical value is a measure of
central tendency or location. There are five measures of central tendency, namely:
• Arithmetic Mean
• Geometric Mean
• Harmonic Mean
• Median
• Mode
Arithmetic Mean:
The arithmetic mean, also called the mean, is the most commonly used measure of central tendency or average. Although the word
average refers to any summary measure of central tendency, it is most often used as a synonym for the mean.
For a sample containing a set of n observations $x_1, x_2, x_3, \dots, x_n$, the arithmetic mean (denoted by the symbol $\bar{X}$, called "X-bar")
can be written as
$$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum X}{n}$$
(if the data set is individual item data).
If the data form a discrete frequency distribution or grouped data, the A.M. is given by
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fX}{\sum f}$$
The arithmetic mean is the sum of all the values divided by the total number of values.
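The two forms of the arithmetic mean can be sketched in Python as follows. The function names `mean_raw` and `mean_freq` are illustrative, and the data are those of parts i and iii of Example #1.

```python
def mean_raw(values):
    """Arithmetic mean of individual item data: sum(X) / n."""
    return sum(values) / len(values)


def mean_freq(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f * X) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Example #1, part i (individual item data).
x_raw = [5, 6, 8, 7, 9, 12, 15, 14, 17]

# Example #1, part iii (discrete frequency distribution).
x_vals = [4, 6, 8, 12, 15, 16, 18, 19]
f_vals = [5, 6, 12, 8, 10, 9, 8, 5]

print(round(mean_raw(x_raw), 3))            # 10.333
print(round(mean_freq(x_vals, f_vals), 5))  # 12.39683
```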
Example # 1:
Calculate the arithmetic mean of the following data by
1. Direct Method
2. Short-cut Method
i. X: 5, 6, 8, 7, 9, 12, 15, 14, 17
ii. Y: 2.3, 1.2, 1.5, 1.9, 2.5, 3.2, 1.6, 2.6, 1.8, 2.6
iii.
X    4    6    8    12    15    16    18    19
f    5    6    12   8     10    9     8     5
iv.
X    1.3    1.8    2.8    3.12    5.5    4.16    3.18    4.19
f    21     25     36     24      39     45      12      21

v. a.
Group       f
0 - 10      5
10 - 20     12
20 - 30     14
30 - 40     13
40 - 50     16
50 - 60     10
60 - 70     11
70 - 80     11
80 - 90     12

v. b.
Group         f
20 - 40       10
41 - 60       25
61 - 80       24
81 - 95       34
96 - 102      56
103 - 125     24
126 - 145     15
146 - 163     15
164 - 170     13

Solution:
1. Direct Method:
i. X: 5, 6, 8, 7, 9, 12, 15, 14, 17
n = 9.
$\sum X = 5+6+8+7+9+12+15+14+17 = 93$.
$$\bar{X} = \frac{\sum X}{n} = \frac{93}{9} = 10.333$$
ii. Y: 2.3, 1.2, 1.5, 1.9, 2.5, 3.2, 1.6, 2.6, 1.8, 2.6
n = 10.
$\sum Y = 2.3+1.2+1.5+1.9+2.5+3.2+1.6+2.6+1.8+2.6 = 21.2$.
$$\bar{Y} = \frac{\sum Y}{n} = \frac{21.2}{10} = 2.12$$
iii.

X        f      fX
4        5      20
6        6      36
8        12     96
12       8      96
15       10     150
16       9      144
18       8      144
19       5      95
Total    63     781

$\sum fX = 781$, $\sum f = 63$
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{781}{63} = 12.39683 \quad \text{Ans.}$$

iv.

X        f      fX
1.30     21     27.30
1.80     25     45.00
2.80     36     100.80
3.12     24     74.88
5.50     39     214.50
4.16     45     187.20
3.18     12     38.16
4.19     21     87.99
Total    223    775.83

$\sum fX = 775.83$, $\sum f = 223$
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{775.83}{223} = 3.4791$$
v. a.

Group       f      X      fX
0 - 10      5      5      25
10 - 20     12     15     180
20 - 30     14     25     350
30 - 40     13     35     455
40 - 50     16     45     720
50 - 60     10     55     550
60 - 70     11     65     715
70 - 80     11     75     825
80 - 90     12     85     1020
Total       104           4840

$\sum fX = 4840$, $\sum f = 104$
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{4840}{104} = 46.54$$
v. b.

Group         f      X        fX
20 - 40       10     30       300
41 - 60       25     50.5     1262.5
61 - 80       24     70.5     1692
81 - 95       34     88       2992
96 - 102      56     99       5544
103 - 125     24     114      2736
126 - 145     15     135.5    2032.5
146 - 163     15     154.5    2317.5
164 - 170     13     167      2171
Total         216             21047.5

$\sum fX = 21047.5$, $\sum f = 216$
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{21047.5}{216} = 97.44$$

2. Short-cut Method:

i) a = 10

X        D = X - 10
5        -5
6        -4
8        -2
7        -3
9        -1
12       2
15       5
14       4
17       7
Total    3

$$\bar{X} = a + \frac{\sum D}{n} = 10 + \frac{3}{9} = 10.333$$

ii) a = 2

Y        D = Y - 2
1.2      -0.8
1.9      -0.1
3.2      1.2
2.6      0.6
2.6      0.6
2.3      0.3
1.5      -0.5
2.5      0.5
1.6      -0.4
1.8      -0.2
Total    1.2

$$\bar{Y} = a + \frac{\sum D}{n} = 2 + \frac{1.2}{10} = 2.12$$
iii) a = 12

X        f      D = X - 12    fD
4        5      -8            -40
6        6      -6            -36
8        12     -4            -48
12       8      0             0
15       10     3             30
16       9      4             36
18       8      6             48
19       5      7             35
Total    63                   25

$$\bar{X} = a + \frac{\sum fD}{\sum f} = 12 + \frac{25}{63} = 12.39683$$

iv) a = 3

X        f      D = X - 3     fD
1.30     21     -1.70         -35.70
1.80     25     -1.20         -30.00
2.80     36     -0.20         -7.20
3.12     24     0.12          2.88
5.50     39     2.50          97.50
4.16     45     1.16          52.20
3.18     12     0.18          2.16
4.19     21     1.19          24.99
Total    223                  106.83

$$\bar{X} = a + \frac{\sum fD}{\sum f} = 3 + \frac{106.83}{223} = 3.4791$$

v. a) a = 40, h = 10

Group       f      X      D = X - 40    fD      u = D/h    fu
0 - 10      5      5      -35           -175    -3.5       -17.5
10 - 20     12     15     -25           -300    -2.5       -30
20 - 30     14     25     -15           -210    -1.5       -21
30 - 40     13     35     -5            -65     -0.5       -6.5
40 - 50     16     45     5             80      0.5        8
50 - 60     10     55     15            150     1.5        15
60 - 70     11     65     25            275     2.5        27.5
70 - 80     11     75     35            385     3.5        38.5
80 - 90     12     85     45            540     4.5        54
Total       104                         680                68

$$\bar{X} = a + \frac{\sum fD}{\sum f} = 40 + \frac{680}{104} = 46.5385$$

$$\bar{X} = a + \frac{\sum fu}{\sum f} \times h = 40 + \frac{68}{104} \times 10 = 46.5385$$

vi)
Group       f     X       D=X-95   fD
20 ⎯40      10    30      -65      -650
41 ⎯60      25    50.5    -44.5    -1112.5
61 ⎯80      24    70.5    -24.5    -588
81 ⎯95      34    88      -7       -238
96 ⎯102     56    99      4        224
103 ⎯125    24    114     19       456
126 ⎯145    15    135.5   40.5     607.5
146 ⎯163    15    154.5   59.5     892.5
164 ⎯170    13    167     72       936
Total       216           54       527.5

a = 95
$\bar{X} = a + \frac{\sum fD}{\sum f} = 95 + \frac{527.5}{216} = 97.44$
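
As a rough check of the short-cut (assumed mean) and coding (step deviation) methods, the sketch below recomputes the mean of the 0-10, ..., 80-90 grouped data with a = 40 and h = 10. The function name `shortcut_mean` is an illustrative assumption, not notation from the notes.

```python
def shortcut_mean(values, freqs, a, h=1):
    """Mean = a + h * sum(f * u) / sum(f), with u = (X - a) / h.

    With h = 1 this reduces to the plain assumed-mean formula a + sum(f*D)/sum(f).
    """
    u = [(x - a) / h for x in values]
    return a + h * sum(f * ui for f, ui in zip(freqs, u)) / sum(freqs)

midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85]
freqs = [5, 12, 14, 13, 16, 10, 11, 11, 12]

mean_assumed = shortcut_mean(midpoints, freqs, a=40)        # uses D = X - 40
mean_coded = shortcut_mean(midpoints, freqs, a=40, h=10)    # uses u = D / 10
print(round(mean_assumed, 4), round(mean_coded, 4))
```

Any choice of a gives the same mean; a value near the centre of the data simply keeps the hand arithmetic small.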

Properties of Arithmetic Mean


1. The arithmetic mean of a constant is the constant itself.
2. The sum of the deviations of the X's about the A.M. is always zero.
Mathematically, $\sum (X - \bar{X}) = 0$
Proof: L.H.S. $= \sum (X - \bar{X}) = \sum X - n\bar{X} = \sum X - n\frac{\sum X}{n} = 0$
If the data form a frequency distribution, then
$\sum f(X - \bar{X}) = 0$
Proof: L.H.S. $= \sum f(X - \bar{X}) = \sum fX - \bar{X}\sum f = \sum fX - \sum f \cdot \frac{\sum fX}{\sum f} = 0$
3. The sum of squared deviations of the X's about the A.M. is a minimum. In other words,
$\sum (X - \bar{X})^2 \le \sum (X - a)^2$, where a is an arbitrary value.

Proof: Considering
$\sum (X - a)^2 = \sum [(X - \bar{X}) + (\bar{X} - a)]^2$
$= \sum [(X - \bar{X})^2 + (\bar{X} - a)^2 + 2(\bar{X} - a)(X - \bar{X})]$
$= \sum (X - \bar{X})^2 + n(\bar{X} - a)^2 + 2(\bar{X} - a)\sum (X - \bar{X})$
$= \sum (X - \bar{X})^2 + n(\bar{X} - a)^2$, since $\sum (X - \bar{X}) = 0$
Because $n(\bar{X} - a)^2 \ge 0$, it follows that
$\sum (X - \bar{X})^2 \le \sum (X - a)^2$
The equality sign holds only when $\bar{X} = a$.
4. Combined A.M.
5. The arithmetic mean of several sets of data may be combined into a single arithmetic mean for the combined sets of data
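If k subgroups of data consisting of $n_1, n_2, n_3, \ldots, n_k$ observations ($\sum n_i = n$) have respective means
$\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$, then $\bar{X}_c$, the mean for all the data, is given by
$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + n_3\bar{X}_3 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + n_3 + \cdots + n_k} = \frac{\sum n_i\bar{X}_i}{\sum n_i} \quad (i = 1, 2, 3, \ldots, k)$$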
Question No. 1
If a sample of 22 items has a mean of 15 and another sample of 18 items has a mean of 20, find the mean of the combined sample.
$n_1 = 22$, $\bar{X}_1 = 15$
$n_2 = 18$, $\bar{X}_2 = 20$
$$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{22(15) + 18(20)}{22 + 18} = \frac{690}{40} = 17.25$$
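
The combined-mean formula is a weighted mean of the subgroup means, with the subgroup sizes as weights. A minimal sketch (the function name `combined_mean` is an illustrative assumption):

```python
def combined_mean(sizes, means):
    """Combined A.M. = sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Question No. 1: 22 items with mean 15 and 18 items with mean 20
combined = combined_mean([22, 18], [15, 20])
print(combined)
```

Note that the result is not the simple average of 15 and 20 (which would be 17.5), because the first sample is larger and therefore pulls the combined mean toward 15.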
Question No.2
The average salary of male employees in a firm was Rs. 520 and that of females was Rs. 420. The mean salary
6. If $Y = aX + b$, where a and b are any two numbers and $a \ne 0$, then
$\bar{Y} = a\bar{X} + b$
Proof: Considering
$Y = aX + b$
Summing over all values,
$\sum Y = \sum (aX + b) = a\sum X + nb$
Dividing by n,
$\frac{\sum Y}{n} = a\frac{\sum X}{n} + \frac{nb}{n}$
$\bar{Y} = a\bar{X} + b$
The Geometric Mean: G.M. or G.
The geometric mean, G, of a set of n positive values $x_1, x_2, x_3, \ldots, x_n$ is defined as the positive nth root of their product,
i.e. $G.M. = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$, where $x_i > 0$
Example: Find the geometric mean of 12, 13, 15, 16, 17, 20
Solution: $G.M. = (12 \times 13 \times 15 \times 16 \times 17 \times 20)^{1/6} = 15.28043764$
When n is large, the computation of the geometric mean becomes laborious, as we have to multiply all the values and then
extract the nth root. The arithmetic is simplified by using logarithms to the base 10. Thus, taking logarithms, we get
$$\log G = \frac{1}{n}[\log x_1 + \log x_2 + \cdots + \log x_n] = \frac{1}{n}\sum \log x_i = \frac{1}{n}\sum \log X$$
$$G = \text{antilog}\left[\frac{1}{n}\sum \log X\right]$$
It means the geometric mean is the anti-logarithm of the arithmetic mean of the logarithms of the values themselves.
For data organized into a grouped frequency distribution having k classes with class marks $x_1, x_2, x_3, \ldots, x_k$ and the corresponding
frequencies $f_1, f_2, f_3, \ldots, f_k$ (with $n = \sum f_i$), the formula for the geometric mean is given by
$$\log G = \frac{1}{n}[\log x_1^{f_1} + \log x_2^{f_2} + \cdots + \log x_k^{f_k}] = \frac{1}{n}[f_1\log x_1 + f_2\log x_2 + \cdots + f_k\log x_k] = \frac{1}{n}\sum f_i\log x_i$$
$$G = \text{antilog}\left[\frac{1}{n}\sum f\log X\right]$$
Example: The Birch Company, a manufacturer of electrical circuit boards, has manufactured the following number of
units over the past five years:
1992      1993      1994      1995      1996
12,500    13,250    14,310    15,741    17,630
Solution:

X        Log(X)
12500    4.0969
13250    4.1222
14310    4.1556
15741    4.1970
17630    4.2463
Total    20.8180

$$G = \text{antilog}\left[\frac{1}{n}\sum \log X\right] = \text{antilog}\left[\frac{20.81805}{5}\right] = 14575.0482$$
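
The logarithmic route to the geometric mean can be sketched as follows. This is a minimal illustration using base-10 logarithms as in the notes; the function name `geometric_mean` is an assumption made for the example.

```python
import math

def geometric_mean(values):
    """G = antilog of the arithmetic mean of the base-10 logarithms."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

units = [12500, 13250, 14310, 15741, 17630]   # Birch Company output, 1992-1996
g_units = geometric_mean(units)

small = [12, 13, 15, 16, 17, 20]              # the six-value example
g_small = geometric_mean(small)
# Cross-check against the definition: nth root of the product
g_small_direct = math.prod(small) ** (1 / len(small))

print(round(g_units, 2), round(g_small, 4))
```

Working with logarithms also avoids very large intermediate products when n is large.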

Example: Calculate the G.M. from the following data

a. b. c.
X f Group f Group f
1250 20 120 ⎯141 45 170 ⎯174 35
1360 23 141 ⎯161 54 175 ⎯179 53
1564 21 161 ⎯181 47 180 ⎯184 54
1235 24 181 ⎯196 48 185 ⎯189 65
1587 25 196 ⎯203 52 190 ⎯194 120
1689 26 203 ⎯226 35 195 ⎯199 85
1798 28 226 ⎯246 42 200 ⎯204 52
1598 15 246 ⎯264 32 205 ⎯209 40
1756 19 264 ⎯270 30 210 ⎯214 25
Solution:

a.
X       f     Log(X)    f log(X)
1250    20    3.0969    61.9382
1360    23    3.1335    72.07139
1564    21    3.1942    67.07897
1235    24    3.0917    74.20001
1587    25    3.2006    80.01442
1689    26    3.2276    83.91837
1798    28    3.2548    91.13411
1598    15    3.2036    48.05365
1756    19    3.2445    61.64597
Total   201             640.0551

$$G = \text{antilog}\left[\frac{1}{n}\sum f\log X\right] = \text{antilog}\left[\frac{640.0551}{201}\right] = 1528.8107$$

b.

Group       f     X         log(X)    f log(X)
120 ⎯141    45    130.50    2.1492    96.7149
141 ⎯161    54    151.00    2.2068    119.1686
161 ⎯181    47    171.00    2.2577    106.1109
181 ⎯196    48    188.50    2.2923    110.0283
196 ⎯203    52    199.50    2.3075    119.9898
203 ⎯226    35    214.50    2.3541    82.3938
226 ⎯246    42    236.00    2.3909    100.4193
246 ⎯264    32    255.00    2.4216    77.4913
264 ⎯270    30    267.00    2.4314    72.9409
Total       385                       885.2577

$$G = \text{antilog}\left[\frac{1}{n}\sum f\log X\right] = \text{antilog}\left[\frac{885.2577}{385}\right] = 199.2373$$

c.

Group       f      X      log(X)    f log(X)
170 ⎯174    35     172    2.2355    78.2435
175 ⎯179    53     177    2.2480    119.1426
180 ⎯184    54     182    2.2601    122.0439
185 ⎯189    65     187    2.2718    147.6697
190 ⎯194    120    192    2.2833    273.9961
195 ⎯199    85     197    2.2945    195.0296
200 ⎯204    52     202    2.3054    119.8783
205 ⎯209    40     207    2.3160    92.6388
210 ⎯214    25     212    2.3263    58.1584
Total       529                     1206.8009

$$G = \text{antilog}\left[\frac{1}{n}\sum f\log X\right] = \text{antilog}\left[\frac{1206.8009}{529}\right] = 191.1116$$

Harmonic Mean:
The harmonic mean, H.M. or H, of a set of n non-zero values $x_1, x_2, x_3, \ldots, x_n$ is defined as the reciprocal of the
arithmetic mean of the reciprocals of the values. The harmonic mean (H) is another specialized average, which is useful in
averaging variables expressed as rates per unit of time, such as miles per hour or number of units produced per day.
In symbols,
$$H.M. = H = \text{Reciprocal of } \left[\frac{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}}{n}\right] = \frac{n}{\sum \frac{1}{x_i}} = \frac{n}{\sum \frac{1}{X}}$$
Harmonic mean for the frequency distribution is given by
$$H.M. = \frac{\sum f}{\sum \frac{f}{X}}$$
Example: An automobile is running at the rate of 10 Km/hr during the first 60 Km; at 20 Km/hr during the second 60 Km; 30
Km/hr during the third 60 Km; 40 Km/hr during the fourth 60 Km and 50 Km/hr during the last 60 Km. What would be the
average speed?
Solution:
X        1/X
10       0.1000
20       0.0500
30       0.0333
40       0.0250
50       0.0200
Total    0.2283

$$H.M. = \frac{n}{\sum \frac{1}{x_i}} = \frac{5}{0.2283} = 21.90 \text{ Km/hr}$$
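
The reason the harmonic mean is the right average here is that equal distances are covered at each speed, so the average speed is total distance divided by total time. The sketch below checks this; the function name `harmonic_mean` is an illustrative assumption.

```python
def harmonic_mean(values):
    """H = n / sum(1 / x_i)."""
    return len(values) / sum(1 / x for x in values)

speeds = [10, 20, 30, 40, 50]          # Km/hr, each held for 60 Km
avg_speed = harmonic_mean(speeds)

# Cross-check: total distance / total time
total_distance = 60 * len(speeds)
total_time = sum(60 / s for s in speeds)
avg_speed_check = total_distance / total_time

print(round(avg_speed, 2))
```

Keeping the reciprocals unrounded gives about 21.90 Km/hr; rounding each reciprocal to two decimals before summing shifts the answer slightly, so it is safer to carry at least four decimals in hand calculation.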
Example: Calculate the Harmonic Mean from the following data

a. b. c.
X f Group f Group f
1250 20 120 ⎯141 45 170 ⎯174 35
1360 23 141 ⎯161 54 175 ⎯179 53
1564 21 161 ⎯181 47 180 ⎯184 54
1235 24 181 ⎯196 48 185 ⎯189 65
1587 25 196 ⎯203 52 190 ⎯194 120
1689 26 203 ⎯226 35 195 ⎯199 85
1798 28 226 ⎯246 42 200 ⎯204 52
1598 15 246 ⎯264 32 205 ⎯209 40
1756 19 264 ⎯270 30 210 ⎯214 25
Solution:
a.
X       f     1/X         f/X
1250    20    0.000800    0.016
1360    23    0.000735    0.016912
1564    21    0.000639    0.013427
1235    24    0.000810    0.019433
1587    25    0.000630    0.015753
1689    26    0.000592    0.015394
1798    28    0.000556    0.015573
1598    15    0.000626    0.009387
1756    19    0.000569    0.01082
Total   201               0.132698

$$H.M. = \frac{\sum f}{\sum \frac{f}{X}} = \frac{201}{0.132698} = 1514.7176$$
b.

Group       f     X        1/X           f/X
120 ⎯141    45    130.5    0.00766284    0.344827586
141 ⎯161    54    151      0.00662252    0.357615894
161 ⎯181    47    171      0.00584795    0.274853801
181 ⎯196    48    188.5    0.00530504    0.25464191
196 ⎯203    52    199.5    0.00501253    0.260651629
203 ⎯226    35    214.5    0.004662      0.163170163
226 ⎯246    42    236      0.00423729    0.177966102
246 ⎯264    32    255      0.00392157    0.125490196
264 ⎯270    30    267      0.00374532    0.112359551
Total       385                          2.071576832

$$H.M. = \frac{\sum f}{\sum \frac{f}{X}} = \frac{385}{2.07158} = 185.8485$$

c.

Group       f      X      1/X         f/X
170 ⎯174    35     172    0.005814    0.203488
175 ⎯179    53     177    0.005650    0.299435
180 ⎯184    54     182    0.005495    0.296703
185 ⎯189    65     187    0.005348    0.347594
190 ⎯194    120    192    0.005208    0.625000
195 ⎯199    85     197    0.005076    0.431472
200 ⎯204    52     202    0.004950    0.257426
205 ⎯209    40     207    0.004831    0.193237
210 ⎯214    25     212    0.004717    0.117925
Total       529                       2.772279

$$H.M. = \frac{\sum f}{\sum \frac{f}{X}} = \frac{529}{2.772279} = 190.8177$$
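
For a frequency distribution the three averages can be computed side by side, which also illustrates the ordering A.M. ≥ G.M. ≥ H.M. for positive data. The sketch below uses data set (a); the variable names are illustrative assumptions.

```python
import math

x = [1250, 1360, 1564, 1235, 1587, 1689, 1798, 1598, 1756]
f = [20, 23, 21, 24, 25, 26, 28, 15, 19]
n = sum(f)                                                  # 201

am = sum(fi * xi for xi, fi in zip(x, f)) / n               # sum(fX) / sum(f)
gm = 10 ** (sum(fi * math.log10(xi) for xi, fi in zip(x, f)) / n)
hm = n / sum(fi / xi for xi, fi in zip(x, f))               # sum(f) / sum(f/X)

print(round(am, 2), round(gm, 2), round(hm, 2))
```

The geometric and harmonic means agree with the worked solutions above to within rounding (about 1528.81 and 1514.72), and both lie below the arithmetic mean.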

Exercise # 1
Q.1 The deviation of a data about x = 22 are 0, 2, -3, -4, 6, 8 –1, 3, 0.
Q.2 A computer calculated a mean value of 42 from 20 observations. It was later discovered at the time of checking that he
had copied down two values as 45 and 38, whereas the correct values were 35 and 58. Find correct value of mean.
Q.3 The following table shows the diameters of rivets manufactured by a company.

Diameter (in inches) frequency


0.7247 ⎯ 0.7249    2
0.7250 ⎯ 0.7252    6
0.7253 ⎯ 0.7255    8
0.7256 ⎯ 0.7258    15
0.7259 ⎯ 0.7261    42
0.7262 ⎯ 0.7264    68
0.7265 ⎯ 0.7267    49
0.7268 ⎯ 0.7270    25
0.7271 ⎯ 0.7273    18
0.7274 ⎯ 0.7276    12
0.7277 ⎯ 0.7279    4
0.7280 ⎯ 0.7282    1
a) Prove that A.M. ≥ G.M. ≥ H.M.
b) Calculate mode and median.

Q.4 Find the arithmetic mean and geometric mean of the series 1, 2, 4, 8, 16, . . . , $2^n$. Find also the harmonic mean.
Q.5 Find (i) arithmetic mean, (ii) geometric mean, and (iii) harmonic mean of the series 1, 3, 9, 27, 81, . . . , $3^n$.
Q.6 The following data relate to sizes of shoes sold at a store during a given week. Find the median of the shoe. Also
calculate the quartiles, the 7th decile and the 64th percentile.
Size of Shoes    5    5½    6     6½    7     7½    8     8½    9    9½
No. of pairs     2    5     15    30    60    40    23    11    4    1

Q.7 In a group of 500 wage-earners, the weekly wages of 4% were under Rs. 60 and those of 15% were under Rs. 62.50. 15%
of the workers earned Rs. 95 and over, and 5% of them got Rs. 100 and over.
The median and quartile wages were Rs. 82.25, Rs. 72.75 and Rs. 90.50; the fourth and sixth decile wages were Rs.
78.75 and Rs. 85.25 respectively.
Put the above information in the form of a frequency distribution and estimate the mean wage of the 500 wage-earners
there from.
Q.8 The following table shows the distribution of the maximum loads in short tons supported by certain cables produced by a
company.

Max. Loads (Short tons)    9.8 ⎯10.2    10.3⎯10.7    10.8⎯11.2    11.3⎯11.7    11.8⎯12.2    12.3⎯12.7
No. of cables              7            12           17           14           6            4
Determine the mean, the median, and the mode.
Q.9 A professor has decided to use a weighted average in figuring grades for his seminar students. The homework average
will count for 20 percent of a student’s grade; the midterm, 25 percent; the final, 35 percent; the term paper, 10 percent;
and quizzes, 10 percent. From the following data, compute the final average for the five students in the seminar.

Student Homework Quizzes Paper Midterm Final


1 85 89 94 87 90
2 78 84 88 91 92
3 94 88 93 86 89
4 82 79 88 84 93
5 95 90 92 82 88
Q.10 The U.S. Postal Service handles seven basic types of letters and cards: third class, second class, first class, air mail,
special delivery, registered, and certified. The mail volume during 1977 is given in the following table:

Ounces Delivered (in


Type of Mailing Price per Ounce
millions)

1 85 89
2 78 84
3 94 88
4 82 79
5 95 90
Q.11 The growth in bad-debt expense for Johnston Office Company over the last few years follows. Calculate the average
percentage increase in bad-debt expense over this time period. If this rate continues, estimate the percentage increase in
bad-debts for 1997, relative to 1995.

1989 1990 1991 1992 1993 1994 1995


0.11 0.09 0.075 0.08 0.095 0.108 0.120
Q.12 Hayes Textile has shown the following percentage increase in net worth over the last 5 years:

1992    1993     1994    1995    1996
5%      10.5%    9.0%    6.0%    7.5%

Q.13 A test “Stein Strength Test in Pounds” is made and the readings are as given below:
Construct discrete frequency distribution and also compute A.M. by short-cut method
Stein Strength Test in Pounds
73 72 75 74 72 75 73 73 76 78 74 73
74 75 73 76 75 73 75 76 76 75 74 76
72 75 75 74 76 75 76 75 75 73 76 74
76 71 76 73 74 74 74 72 72 76 77 74
75 76 74 74 77 71 72 76 75 75 77 75
77 73 77 75 73 75 74 75 74 72 74 74
74 77 75 75 74 76 73 73 73 75 74 73
73 74 73 71 78 73 75 75 76 74 75 76
76 73 75 74 76 74 74 71 75 75 73 74
70 72 74 73 74 72 78 75 77 75 73 76
75 75 77 74 77 74 75 74 74 74 76 76
73 74 75 76 74 77 76 73 72 77 75 79
74 78 73 74 76 74 75 75 76 72 75 73

Q.14
Circle the correct answer or fill in the blank.
69. The value of every observation in the data set is taken into account when we calculate its median. T F
70. When the population is either negative or positively skewed, it is often preferable to use the median as the best measure of
location because it always lies between the mean and the mode. T F
71. Measures of central tendency in a data set refer to the extent to which the observations are scattered. T F
72. A measure of peakedness of a distribution curve is its skewness. T F
73. With ungrouped data, the mode is most frequently used as the measure of central tendency. T F
74. If we arrange the observations in a data set from highest to lowest, the data point lying in the middle is the median of the data set. T F
75. When working with grouped data, we may compute an approximate mean by assuming that each value in a given class is
equal to its midpoint. T F
76. The value most often repeated in a data set is called the arithmetic mean. T F
77. If the curve of a certain distribution tails off toward the left end of the measuring scale on the horizontal axis, the
distribution is said to be negatively skewed. T F
78. After grouping a set of data into a number of classes, we may identify the median class as being the one that has the largest
number of observations. T F
79. A mean calculated from grouped data always gives a good estimate of the true value, although it is seldom exact. T F
80. We can compute a mean for any data set once we are given its frequency distribution. T F
81. The mode is always found at the highest point of a graph of a data distribution. T F
82. The number of elements in a population is defined by n. T F
83. For a data array with 50 observations, the median will be the value of the 25th observation in the array. T F
84. Extreme values in a data set have a strong effect on the median. T F
85. The difference between the largest and smallest observations in a data set is called the geometric mean. T F
86. The dispersion of a data set gives insight into the reliability of the measure of central tendency. T F
87. The standard deviation is equal to the square root of the variance. T F
88. The difference between the highest and the smallest observations in a data set is called the geometric mean. T F
89. The interquartile range is based on only two values taken from the data set. T F
90. The standard deviation is measured in the same units as the observations in the data set. T F
91. A fractile is a location in a frequency distribution that a given proportion (or fraction) of the data lies at or above. T F
92. The variance, like the standard deviation, takes into account every observation in the data set. T F
93. The coefficient of variation is an absolute measure of dispersion. T F
94. The measure of dispersion most often used by statisticians is the standard deviation. T F
95. One of the advantages of dispersion measures is that any statistic that measures absolute variation also measures relative
variation. T F
96. One disadvantage of using the range to measure dispersion is that it ignores the nature of the variations among most of the
observation. T F
97. The variance indicates the average distance of any observation in the data set from the mean. T F
98. Every population has a variance, which is signified by $s^2$. T F
99. According to Chebyshev’s theorem, no more than 11 percent of the observations in a population can have population
standard scores greater than 3 or less than –3. T F
100. The interquartile range is a specific example of an interfractile range. T F
101. It is possible to measure the range of an open-ended distribution. T F
102. The interquartile range measures the average range of the lower fourth of a distribution. T F
103. When calculating the average rate of debt expansion for a company, the correct mean to use is the
a. Arithmetic mean
b. Weighted mean
c. Geometric mean
d. Either (a) or (c).
104. The mode has all the following disadvantages except
a. A data set may have no modal value.
b. Every value in a data set may be a mode.
c. A multimodal data set is difficult to analyze.
d. The mode is unduly affected by extreme values.
105. What is the major assumption we make when computing a mean from grouped data?
a. All values are discrete.
b. Every value in a class is equal to the midpoint.
c. No value occurs more than once.
d. Each class contains exactly the same number of values.
106. Which of the following statements is NOT correct?
a. Some data sets do not have means.
b. Calculation of a mean is affected by extreme data values.
c. A weighted mean should be used when it is necessary to take the importance of each value into account.
d. All these statements are correct.
107. Which of the following is the first step in calculating the median of a data set?
a. Average the middle two values of the data set.
b. Array the data.
c. Determine the relative weights of the data values in terms of importance.
d. None of these.
108. Which of the following is NOT an advantage of using a median?
a. Extreme values affect the median less strongly than they do the mean.
b. A median can be calculated for qualitative descriptions.
c. The median can be calculated for every set of data, even for sets containing open-ended classes.
d. The median is easy to understand.
e. All these are advantages of using a median.
109. Why is it usually better to calculate a mode from grouped, rather than ungrouped, data?
a. The ungrouped data tend to be bimodal.
b. The mode for the grouped data will be the same, regardless of the skewness of the distribution.
c. Extreme values have less effect on grouped data.
d. The chance of an unrepresentative value being chosen as the mode is reduced.
110. In which of these cases would the mode be most useful as an indicator of central tendency?
a. Every value in a data set occurs exactly once.
b. All but three values in a data set occur once; three values occur 100 times each.
c. All values in a data set occur 100 times each.
d. Every observation in a data set has the same value.
111. Which of the following is an example of a parameter?
a. x.
b. n.
c. 
d. All of these
e. (b) and (c), but not (a)
112. Which of the following is NOT a measure of central tendency.
a. Geometric mean
b. Median
c. Mode
d. Arithmetic mean
e. All these are measures of central tendency.
113. When a distribution is symmetrical and has one mode, the highest point on the curve is called the
a. Range
b. Mode
c. Median
d. Mean
e. All of these
f. (b), (c), and (d), but not (a)
114. When referring to a curve that tails off the left end, you would call it
a. Symmetrical
b. Skewed right
c. Positively skewed
d. All of these
e. None of these
115. Disadvantages of using the range as a measure of dispersion include all of the following except
a. It is heavily influenced by extreme values
b. It can change drastically from one sample to the next
c. It is difficult to calculate
d. It is determined by only two points in the data set.
116. Why is it necessary to square the differences from the mean when computing the population variance?
a. So that extreme values will not affect the calculation.
b. Because it is possible that N could be very small.
c. Some of the differences will be positive and some will be negative.
d. None of these
117. Assume that a population has $\mu = 100$ and $\sigma = 10$. If a particular observation has a standard score of 1, it can be
concluded that
a. Its value is 110
b. It lies between 90 and 110, but its exact value cannot be determined
c. Its value is greater than 110
d. Nothing can be determined without knowing N
118. Assume that a population has $\mu = 100$ and $\sigma = 10$, and N = 1,000. According to Chebyshev’s theorem, which of the
following situations is NOT possible?
a. 150 values are greater than 130.
b. 93 values lie between 100 and 108.
c. 22 values lie between 120 and 125.
d. 70 values are less than 90.
e. All these situations are possible.
119. Which of the following is an example of a relative measure of dispersion?
a. Standard deviation.
b. Variance.
c. Coefficient of variation.
d. All of these.
e. (a) and (b), but not (c).
120. Which of the following is true?
a. The variance can be calculated for grouped or ungrouped data.
b. The standard deviation can be calculated for grouped or ungrouped data.
c. The standard deviation can be calculated for grouped or ungrouped data, but the variance can be calculated only for
ungrouped data.
d. (a) and (b), but not (c).
121. If one were to divide the standard deviation of a population by the mean of the same population and multiply this value by
100, one would have calculated the
a. Population standard score.
b. Population variance.
c. Population standard deviation.
d. Population coefficient of variation.
e. None of these.
122. How does the computation of a sample variance differ from the computation of a population variance?
a. $\mu$ is replaced by $\bar{x}$.
b. N is replaced by n-1.
c. N is replaced by n.
d. (a) and (c), but not (b).
e. (a) and (b), but not (c).
123. The square of the variance of a distribution is the
a. Standard deviation.
b. Mean.
c. Range.
d. Absolute deviation.
e. (a) and (d).
f. None of these.
124. Chebyshev’s theorem says that 99 percent of the values will lie within ± 3 standard deviations from the mean for
a. Bell shaped distributions
b. Positively skewed distributions.
c. Left-tailed distributions.
d. All distributions.
e. No distributions.
125. If a curve can be divided into two equal parts that are mirror images, it is _______________________. If it cannot be
divided in this way, it is _______________________.

Measures of Dispersion:

Data are required to obtain the average dimensions and the degree of dispersion so that we can determine whether it is all right to
receive or ship the lot, and whether the production process used for manufacturing the lot was suitable, or if some action must be
taken.
Products from the same production line usually differ slightly in dimension, hardness or other qualities. If, after measuring ten
samples, they were all found to measure 10.0, 10.0, 10.0, . . . , 10.0, there would be cause for doubt. We would suspect that the
measuring instrument was wrong, or we might even wonder if they had ever been measured at all! We commute to work every
day, and even if we take the same route and the same vehicle we usually find that the trip takes a different amount of time on
different days. To make the trip in exactly the same time every day would require a good deal of effort. In this way, whenever we
look at a certain amount of data we can detect some dispersion; we actually live in a world of dispersion.
A second important property that describes a set of numerical data is variation. Variation is the amount of dispersion or spread in
the data.
There are two types of measures of dispersion:
1. Absolute Measures of Dispersion
a. Range
b. Quartile Deviation
c. Mean Deviation
d. Standard Deviation
e. Variance
2. Relative Measure of Dispersion
a. Coefficient of Range
b. Coefficient of Quartile Deviation
c. Coefficient of Mean Deviation (U%)
d. Coefficient of Variation (C.V.%)
Absolute measures are expressed in the same units as the data, whereas relative measures are ratios used for comparing different data sets.
Range:
It is the difference between the maximum and minimum values of the data set. Mathematically, the range is defined as
$R = X_m - X_o$, where
$X_m$ is the largest value of the data set and
$X_o$ is the smallest value of the data set.
Coefficient of Range $= \frac{X_m - X_o}{X_m + X_o}$
Quartile Deviation:
Quartile deviation is the semi-inter-quartile range, i.e.
$Q.D. = \frac{Q_3 - Q_1}{2}$, where $Q_3 - Q_1$ is the inter-quartile range.
Coefficient of Quartile Deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Mean Deviation:
Mean deviation is the average of the absolute deviations of the data from the A.M. Mathematically,
$M.D. = \frac{\sum |X - \bar{X}|}{n}$, for individual items
$M.D. = \frac{\sum f|X - \bar{X}|}{\sum f}$, for a frequency distribution
Coefficient of Mean Deviation $= \frac{M.D.}{\bar{X}}$
U% $= \frac{M.D.}{\bar{X}} \times 100$
Q. Show that
i. $\sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$
ii. $\sum (X - \bar{X})^2 = \sum D^2 - \frac{(\sum D)^2}{n}$, where $D = X - a$
iii. $\sum (X - \bar{X})^2 = \left[\sum u^2 - \frac{(\sum u)^2}{n}\right] h^2$, where $u = \frac{X - a}{h}$

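Solution:
i.
L.H.S. $= \sum (X - \bar{X})^2$
$= \sum (X^2 + \bar{X}^2 - 2X\bar{X})$
$= \sum X^2 + n\bar{X}^2 - 2\bar{X}\sum X$
$= \sum X^2 + n\left(\frac{\sum X}{n}\right)^2 - 2\frac{\sum X}{n}\sum X$
$= \sum X^2 + \frac{(\sum X)^2}{n} - 2\frac{(\sum X)^2}{n}$
$= \sum X^2 - \frac{(\sum X)^2}{n}$ = R.H.S.
Similarly,
$\sum f(X - \bar{X})^2 = \sum fX^2 - \frac{(\sum fX)^2}{\sum f}$

ii.
Since $D = X - a$, we have $X = D + a$ and $\bar{X} = \bar{D} + a$.
L.H.S. $= \sum (X - \bar{X})^2$
$= \sum [(D + a) - (\bar{D} + a)]^2$
$= \sum (D - \bar{D})^2$
$= \sum D^2 - \frac{(\sum D)^2}{n}$ = R.H.S. (applying result i to D)

iii.
Since $u = \frac{X - a}{h}$, we have $X = a + hu$ and $\bar{X} = a + h\bar{u}$.
L.H.S. $= \sum (X - \bar{X})^2$
$= \sum [(a + hu) - (a + h\bar{u})]^2$
$= h^2 \sum (u - \bar{u})^2$
$= \left[\sum u^2 - \frac{(\sum u)^2}{n}\right] h^2$ = R.H.S.
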
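Standard Deviation:
Standard deviation is the positive square root of the mean of the squared deviations about the A.M. Mathematically, it can be written as
$$S.D. = S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2}$$
where $D = X - a$.
For a frequency distribution,
$$S.D. = S = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}} = \sqrt{\frac{\sum fX^2}{\sum f} - \left(\frac{\sum fX}{\sum f}\right)^2} = \sqrt{\frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2} = h\sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2}$$
C.V.% = Coefficient of variation $= \frac{S.D.}{\bar{X}} \times 100$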
Variance:
Variance is the square of S.D.
$$\text{Variance} = Var(X) = S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2$$
For a frequency distribution,
$$\text{Variance} = Var(X) = S^2 = \frac{\sum f(X - \bar{X})^2}{\sum f} = \frac{\sum fX^2}{\sum f} - \left(\frac{\sum fX}{\sum f}\right)^2 = \frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2 = \left[\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2\right] h^2$$
Example:
Calculate Range, Q.D., M.D., S.D. and $S^2$ from the following data, using all the methods discussed above.
Also calculate their respective relative measures
i. X: 12, 15, 16, 18, 20, 32, 18, 19, 20, 22, 23
ii.

X 12 14 16 18 20 22 24 26 28
f 8 9 5 10 12 8 4 3 7

iii.
Diameter (in inches) Frequency
47 ⎯ 49 12
50 ⎯ 52 15
53 ⎯55 16
56⎯58 19
59⎯61 20
62⎯64 20
65⎯67 22
68⎯70 14
71⎯73 15
74⎯76 13

Solution:
Range:
i. X: 12, 15, 16, 18, 20, 32, 18, 19, 20, 22, 23
X m =32 X o =12 R = X m − X 0 =32-12=20
ii.
X    12    14    16    18    20    22    24    26    28
f    8     9     5     10    12    8     4     3     7
$X_m = 28$, $X_o = 12$, $R = X_m - X_o = 28 - 12 = 16$
iii.
Diameter (in inches)    Frequency    X
47 ⎯ 49                 12           48
50 ⎯ 52                 15           51
53 ⎯ 55                 16           54
56 ⎯ 58                 19           57
59 ⎯ 61                 20           60
62 ⎯ 64                 20           63
65 ⎯ 67                 22           66
68 ⎯ 70                 14           69
71 ⎯ 73                 15           72
74 ⎯ 76                 13           75

a. Using class marks: $X_m = 75$, $X_o = 48$, $R = X_m - X_o = 75 - 48 = 27$
b. Using class limits: $l_m = 76$, $l_o = 47$, $R = l_m - l_o = 76 - 47 = 29$
Both of the above methods are valid.
Quartile Deviation:
i.
X     Arranged X
12    12
16    15
20    16
18    18
20    18
23    19
15    20
18    20
32    22
19    23
22    32

$Q_1 = \frac{n + 1}{4}$th value $= \frac{11 + 1}{4}$th value = 3rd value = 16
$Q_3 = \frac{3(n + 1)}{4}$th value $= \frac{3(11 + 1)}{4}$th value = 9th value = 22
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{22 - 16}{2} = 3$

ii.
X       12    14    16    18    20    22    24    26    28
f       8     9     5     10    12    8     4     3     7
c.f.    8     17    22    32    44    52    56    59    66
$n = \sum f = 66$
$Q_1 = \frac{n + 1}{4}$th value $= \frac{66 + 1}{4}$th value = 16.75th value = 14
$Q_3 = \frac{3(n + 1)}{4}$th value $= \frac{3(66 + 1)}{4}$th value = 50.25th value = 22
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{22 - 14}{2} = 4$
iii.
Diameter (in inches)    Frequency    C.B.            C.F.
47 ⎯ 49                 12           46.5 ⎯ 49.5     12
50 ⎯ 52                 15           49.5 ⎯ 52.5     27
53 ⎯ 55                 16           52.5 ⎯ 55.5     43     Q1 group
56 ⎯ 58                 19           55.5 ⎯ 58.5     62
59 ⎯ 61                 20           58.5 ⎯ 61.5     82
62 ⎯ 64                 20           61.5 ⎯ 64.5     102
65 ⎯ 67                 22           64.5 ⎯ 67.5     124
68 ⎯ 70                 14           67.5 ⎯ 70.5     138    Q3 group
71 ⎯ 73                 15           70.5 ⎯ 73.5     153
74 ⎯ 76                 13           73.5 ⎯ 76.5     166
Total                   166

$\frac{n}{4} = \frac{166}{4} = 41.5$, so $Q_1$ lies in the class 52.5 ⎯ 55.5 (l = 52.5, h = 3, f = 16, c = 27):
$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) = 52.5 + \frac{3}{16}(41.5 - 27) = 55.22$
$\frac{3n}{4} = 124.5$, so $Q_3$ lies in the class 67.5 ⎯ 70.5 (l = 67.5, h = 3, f = 14, c = 124):
$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) = 67.5 + \frac{3}{14}(124.5 - 124) = 67.61$
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{67.61 - 55.22}{2} \approx 6.19$
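
The grouped-data quartile formula $Q = l + \frac{h}{f}(p - c)$, where p is the target cumulative frequency (n/4 or 3n/4), can be sketched as below. The function name `grouped_quantile` and its argument layout are assumptions made for illustration; equal class widths are assumed.

```python
def grouped_quantile(lower_boundaries, width, freqs, position):
    """Interpolate within the first class whose cumulative frequency reaches `position`.

    l = lower class boundary, f = class frequency, c = cumulative frequency
    of the preceding classes, so Q = l + (width / f) * (position - c).
    """
    cum = 0
    for l, f in zip(lower_boundaries, freqs):
        if cum + f >= position:
            return l + width / f * (position - cum)
        cum += f
    raise ValueError("position exceeds the total frequency")

lower = [46.5, 49.5, 52.5, 55.5, 58.5, 61.5, 64.5, 67.5, 70.5, 73.5]
freqs = [12, 15, 16, 19, 20, 20, 22, 14, 15, 13]
n = sum(freqs)                                        # 166

q1 = grouped_quantile(lower, 3, freqs, n / 4)         # position 41.5
q3 = grouped_quantile(lower, 3, freqs, 3 * n / 4)     # position 124.5
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```

With these data the sketch gives a first quartile of about 55.22 and a third quartile of about 67.61. A quick sanity check for any hand calculation is that each quartile must fall inside the boundaries of its own class and that $Q_1 < Q_3$.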

Mean Deviation:
i.

X        X − X̄     |X − X̄|
12       -7.55      7.55
16       -3.55      3.55
20       0.45       0.45
18       -1.55      1.55
20       0.45       0.45
23       3.45       3.45
15       -4.55      4.55
18       -1.55      1.55
32       12.45      12.45
19       -0.55      0.55
22       2.45       2.45
Total    215        0.00       38.55

n = 11, $\bar{X} = \frac{215}{11} = 19.5455$, $\sum |X - \bar{X}| = 38.55$
$M.D. = \frac{\sum |X - \bar{X}|}{n} = \frac{38.55}{11} = 3.50$
Coefficient of M.D. $= \frac{M.D.}{\bar{X}} = \frac{3.50}{19.5455} = 0.1791$
U% $= \frac{M.D.}{\bar{X}} \times 100 = 17.91\%$
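
The mean deviation for ungrouped data can be checked with a short sketch. The variable names are illustrative; no rounding is applied to intermediate values, so the last digits may differ slightly from a hand calculation that rounds the deviations to two decimals.

```python
x = [12, 16, 20, 18, 20, 23, 15, 18, 32, 19, 22]
n = len(x)
mean = sum(x) / n                                   # 215 / 11

# Mean deviation: average absolute deviation from the arithmetic mean
md = sum(abs(xi - mean) for xi in x) / n
coef_md = md / mean                                 # relative measure
u_percent = coef_md * 100                           # U%

print(round(mean, 4), round(md, 2), round(u_percent, 2))
```

Because the positive and negative deviations about the mean cancel exactly, the absolute values are essential here; without them the sum would always be zero.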
ii.
X        f     fX      X − X̄     |X − X̄|    f|X − X̄|
12       8     96      -7.21      7.21        57.70
14       9     126     -5.21      5.21        46.91
16       5     80      -3.21      3.21        16.06
18       10    180     -1.21      1.21        12.12
20       12    240     0.79       0.79        9.45
22       8     176     2.79       2.79        22.30
24       4     96      4.79       4.79        19.15
26       3     78      6.79       6.79        20.36
28       7     196     8.79       8.79        61.52
Total    66    1268                           265.58

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{1268}{66} = 19.21$
$M.D. = \frac{\sum f|X - \bar{X}|}{\sum f} = \frac{265.58}{66} = 4.0239$
Coefficient of M.D. $= \frac{M.D.}{\bar{X}} = \frac{4.0239}{19.21} = 0.2094$
U% $= \frac{M.D.}{\bar{X}} \times 100 = 20.94\%$

iii.
Diameter (in inches)    Frequency    X     fX      X − X̄       |X − X̄|    f|X − X̄|
47 ⎯ 49                 12           48    576     -13.5723     13.5723     162.8675
50 ⎯ 52                 15           51    765     -10.5723     10.5723     158.5843
53 ⎯ 55                 16           54    864     -7.5723      7.5723      121.1566
56 ⎯ 58                 19           57    1083    -4.5723      4.5723      86.8735
59 ⎯ 61                 20           60    1200    -1.5723      1.5723      31.4458
62 ⎯ 64                 20           63    1260    1.4277       1.4277      28.5542
65 ⎯ 67                 22           66    1452    4.4277       4.4277      97.4096
68 ⎯ 70                 14           69    966     7.4277       7.4277      103.9880
71 ⎯ 73                 15           72    1080    10.4277      10.4277     156.4157
74 ⎯ 76                 13           75    975     13.4277      13.4277     174.5602
Total                   166                10221                            1121.855

$\bar{X} = \frac{\sum fX}{\sum f} = \frac{10221}{166} = 61.5723$
$M.D. = \frac{\sum f|X - \bar{X}|}{\sum f} = \frac{1121.855}{166} = 6.7582$
Coefficient of M.D. $= \frac{M.D.}{\bar{X}} = \frac{6.7582}{61.5723} = 0.1098$
U% $= \frac{M.D.}{\bar{X}} \times 100 = 10.98\%$

Standard Deviation:

Direct Method
X        X − X̄     (X − X̄)²
12       -7.55      57.003
16       -3.55      12.603
20       0.45       0.2025
18       -1.55      2.4025
20       0.45       0.2025
23       3.45       11.903
15       -4.55      20.703
18       -1.55      2.4025
32       12.45      155
19       -0.55      0.3025
22       2.45       6.0025
Total    215        0          268.73

$S.D. = S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{268.73}{11}} = 4.94$
$C.V. = \frac{S.D.}{\bar{X}} \times 100 = \frac{4.94}{19.5455} \times 100 = 25.27\%$

X-method
X        X²
12       144
16       256
20       400
18       324
20       400
23       529
15       225
18       324
32       1024
19       361
22       484
Total    215      4471

$\bar{X} = \frac{\sum X}{n} = \frac{215}{11} = 19.5455$
$S.D. = S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{4471}{11} - \left(\frac{215}{11}\right)^2} = 4.94$
$C.V. = \frac{S.D.}{\bar{X}} \times 100 = \frac{4.94}{19.5455} \times 100 = 25.27\%$

D-method
X        D=X-18    D²
12       -6        36
16       -2        4
20       2         4
18       0         0
20       2         4
23       5         25
15       -3        9
18       0         0
32       14        196
19       1         1
22       4         16
Total    17        295

$\bar{X} = a + \frac{\sum D}{n} = 18 + \frac{17}{11} = 19.5455$
$S.D. = S = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2} = \sqrt{\frac{295}{11} - \left(\frac{17}{11}\right)^2} = 4.94$
$C.V. = \frac{S.D.}{\bar{X}} \times 100 = \frac{4.94}{19.5455} \times 100 = 25.27\%$
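
The three routes to the standard deviation (direct, X-method and D-method) should agree, which the sketch below confirms numerically. It uses the population form with divisor n, as in the formulas above; `statistics.pstdev` from the Python standard library is used only as an independent cross-check. Variable names are illustrative.

```python
import math
import statistics

x = [12, 16, 20, 18, 20, 23, 15, 18, 32, 19, 22]
n = len(x)
mean = sum(x) / n

# Direct method: root mean squared deviation about the mean
sd_direct = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)

# X-method: sqrt(sum(X^2)/n - (sum(X)/n)^2)
sd_raw = math.sqrt(sum(xi ** 2 for xi in x) / n - (sum(x) / n) ** 2)

# D-method with assumed mean a = 18
a = 18
d = [xi - a for xi in x]
sd_shift = math.sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)

variance = sd_direct ** 2
cv_percent = sd_direct / mean * 100

print(round(sd_direct, 4), round(variance, 4), round(cv_percent, 2))
```

Carrying full precision gives a C.V. of about 25.29%; the hand value is marginally lower because S.D. is rounded to 4.94 before dividing. Note that a sample standard deviation would use n − 1 in place of n and come out slightly larger.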

Exercise
Q.1 Find the Quartile Deviation from the following data (i) graphically, (ii) using an appropriate formula.

Income per week (Rs.)    41 − 50    51 − 60    61 − 70    71 − 80    81 − 90    91 − 100    Total
No. of Earners           30         36         43         104        73         14          300

Importance of C.V. % in Textile


C.V.% is an important statistical measure by which the range of variation in the properties of fibres, blow
room laps, slivers, rovings, yarns and fabrics can be compared to decide which product is of better
quality than another. For example, values of C.V.% and unevenness (U%) for intermediate
products of carded and combed cotton at the 5% level of the Uster Statistics are quoted below:
Particulars                                    Carded Cotton        Combed Cotton
                                               U%      C.V.%        U%      C.V.%
Carded Sliver, Ne 0.12 to 0.16                 2.1     2.6          2.1     2.6
Combed Sliver, Ne 0.12 to 2.0                  -       -            2.1     2.6
Breaker Draw Frame Sliver, Ne 0.12 to 0.18     2.3     2.9          1.7     2.1
Finisher Draw Frame Sliver, Ne 0.12 to 0.18    -       -            1.4     1.7
Finisher Draw Frame Sliver, Ne 0.12 to 0.24    2.4     3.0          -       -
Roving, Ne 1                                   4.3     4.5          2.8     3.5
Roving, Ne 2                                   4.7     5.9          2.9     3.6
Roving, Ne 3 to 4                              5.0     6.3          -       -
Roving, Ne 3 to 5                              -       -            3.0     3.7

Measures of Central Tendency:


Most sets of data show a distinct tendency to group or cluster about a certain central point. Thus, for any
particular set of data it usually becomes possible to select some typical

The Main Characteristics of the Mode, the Median, and the Mean

Fact 1
  The Mode:   It is the most frequent value in the distribution; it is the point of greatest density.
  The Median: It is the value of the middle point of the array (not the midpoint of the range), such that half the items are above and half below it.
  The Mean:   It is the value in a given aggregate which would obtain if all the values were equal.

Fact 2
  The Mode:   The value of the mode is established by the predominant frequency, not by the values in the distribution.
  The Median: The value of the median is fixed by its position in the array and does not reflect the individual values.
  The Mean:   The sums of the deviations on either side of the mean are equal; hence, the algebraic sum of the deviations is equal to zero.

Fact 3
  The Mode:   It is the most probable value, hence the most typical.
  The Median: The aggregate distance between the median point and all the values in the array is less than from any other point.
  The Mean:   It reflects the magnitude of every value.

Fact 4
  The Mode:   A distribution may have 2 or more modes. On the other hand, there is no mode in a rectangular distribution.
  The Median: Each array has one and only one median.
  The Mean:   An array has one and only one mean.

Fact 5
  The Mode:   The mode does not reflect the degree of modality.
  The Median: It cannot be manipulated algebraically: medians of subgroups cannot be weighted and combined.
  The Mean:   Means may be manipulated algebraically: means of subgroups may be combined when properly weighted.

Fact 6
  The Mode:   It cannot be manipulated algebraically: modes of subgroups cannot be combined.
  The Median: It is stable in that grouping procedures do not affect it appreciably.
  The Mean:   It may be calculated even when individual values are unknown, provided the sum of the values and the sample size n are known.

Fact 7
  The Mode:   It is unstable in that it is influenced by grouping procedures.
  The Median: Values must be ordered, and may be grouped, for computation.
  The Mean:   Values need not be ordered or grouped for this calculation.

Fact 8
  The Mode:   Values must be ordered and grouped for its computation.
  The Median: It can be computed when ends are open.
  The Mean:   It cannot be calculated from a frequency table when ends are open.

Fact 9
  The Mode:   It can be calculated when table ends are open.
  The Median: It is not applicable to qualitative data.
  The Mean:   It is stable in that grouping procedures do not seriously affect it.

Specialized Averages: The Geometric & Harmonic Means

The Geometric Mean: The geometric mean (G) of n non-negative numerical values is the nth
root of the product of the n values.

If some values are very large in magnitude and others are small, then the geometric
mean is a better representative of the data than the simple average. In a "geometric
series", the most meaningful average is the geometric mean (G). The arithmetic
mean is very biased toward the larger numbers in the series.

An Application: Suppose sales of a certain item increase to 110% in the first year
and to 150% of that in the second year. For simplicity, assume you sold 100 items
initially. Then the number sold in the first year is 110 and the number sold in the
second is 150% x 110 = 165. The arithmetic average of 110% and 150% is 130%, so
we would incorrectly estimate that the number sold in the first year is 130 and
the number in the second year is 169. The geometric mean of 110% and 150% is
G = (1.10 x 1.50)^(1/2) = (1.65)^(1/2), approximately 1.285, so we would correctly
estimate that we would sell 100 x G^2 = 165 items in the second year.
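
The sales example can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import geometric_mean, mean

growth_factors = [1.10, 1.50]   # 110% in year one, 150% of that in year two
initial_sales = 100

g = geometric_mean(growth_factors)   # square root of 1.10 * 1.50
a = mean(growth_factors)             # arithmetic average, 1.30

# Applying the same average factor twice should reproduce the second-year sales
sales_using_g = initial_sales * g ** 2
sales_using_a = initial_sales * a ** 2

print(f"G = {g:.4f}, second-year estimate = {sales_using_g:.1f}")
print(f"A = {a:.4f}, second-year estimate = {sales_using_a:.1f}")
```

Only the geometric mean reproduces the actual second-year figure of 165; the arithmetic mean overshoots to 169.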

The Harmonic Mean: The harmonic mean (H) is another specialized average, which
is useful in averaging variables expressed as a rate per unit of time, such as miles
per hour or number of units produced per day. The harmonic mean (H) of n non-zero
numerical values x(i) is: H = n / [Σ (1/x(i))], where the sum runs over i = 1, ..., n.

An Application: Suppose 4 machines in a machine shop are used to produce the
same part. However, the four machines take 2.5, 2.0, 1.5, and 6.0 minutes
to make one part, respectively. What is the average time taken per part?

The harmonic mean is: H = 4/[(1/2.5) + (1/2.0) + (1/1.5) + (1/6.0)] = 2.31 minutes.

If all machines work for one hour, how many parts will be produced? Since four
machines running for one hour represent 240 minutes of operating time, 240 /
2.31 = 104 parts will be produced.
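
The machine-shop figures can be reproduced with a short sketch based on the standard `statistics.harmonic_mean` function. It also counts the parts machine by machine as a cross-check; the variable names are illustrative.

```python
from statistics import harmonic_mean

minutes_per_part = [2.5, 2.0, 1.5, 6.0]   # one entry per machine

h = harmonic_mean(minutes_per_part)        # average minutes per part

# Four machines running for one hour give 240 machine-minutes
total_minutes = 60 * len(minutes_per_part)
parts_from_h = total_minutes / h

# Cross-check: add up what each machine produces in 60 minutes
parts_direct = sum(60 / t for t in minutes_per_part)

print(f"H = {h:.2f} minutes per part")
print(f"Parts in one hour: {parts_from_h:.0f} (direct count: {parts_direct:.0f})")
```

The two counts agree, which is why the harmonic mean, and not the arithmetic mean of the four times, is the appropriate average here.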

The Order Among the Three Means: If all three means exist (the values are positive), then the Arithmetic Mean is never less
than the other two and the Harmonic Mean is never larger than the other two, i.e. A ≥ G ≥ H.

Example # 1
Find the Arithmetic Mean of the following data
15.6, 17.0,
Probability:

Introduction:
The word probability has two basic meanings:
1. A quantitative measure of uncertainty
2. A measure of degree of belief in a particular statement or problem
The foundations of probability were laid by two French mathematicians of the seventeenth century, Blaise
Pascal (1623-1662) and Pierre de Fermat (1601-1665), in connection with gambling problems. Later on it
was developed by Jakob Bernoulli (1654-1705).
Visit sites:
http://www.maths.tcd.ie/pub/HistMath/People/Pascal/RouseBall/RB_Pascal.html
http://www.thocp.net/biographies/pascal_blaise.html
http://www.maths.tcd.ie/pub/HistMath/People/Fermat/RouseBall/RB_Fermat.html
http://scienceworld.wolfram.com/biography/BernoulliJakob.html

[Portraits: Blaise Pascal (1623-1662), Pierre de Fermat (1601-1665), Jakob Bernoulli (1654-1705)]

The modern treatment of probability theory, which consists of stating a few axioms, was developed during the
1920s and 1930s.
Today probability has a wider field of application and is used to make intelligent decisions in Economics,
Textile Engineering, Management, Operations Research, Sociology, Psychology, Astronomy, Physics,
Engineering and Genetics, where risk and uncertainty are involved.
Application:
Basically, the theory of probability deals with the study of uncertainty; it has therefore found wide
application in the following situations:
1. Insurance companies use it when they calculate insurance premiums and the probable life
expectancies of their policyholders.
2. It is used (formally or informally) by a gambler who decides to bet 10 to 1 on a particular horse.
3. Industry officials use it in determining the reliability of certain equipment.
4. Biologists use it in the study of genetics.
5. It is used by medical researchers who claim that smoking increases your chance of getting lung
cancer.
6. It is used by an investor who decides that a particular stock has more potential for future growth than any other
stock.
7. It is used by business managers in determining which products to manufacture, which products to
advertise, and which media to use in advertising television, radio, magazines, newspapers, subway
and bus advertisements, and so on.
8. It is used by the Psychologists in predicting reactions or behavioural patterns under certain stimuli.
9. It is used by the government economists in predicting that the inflation rate will increase or decrease
in the future.
10. Rutherford used the concept of probability in locating the position of electrons in an atom.
11. Probability is used in weather forecasting.
12. In textile testing it is used to measure the level of uncertainty. Textile tests are made on
samples rather than on the whole lot, which gives rise to uncertainty.

Q. “Use of this product may be hazardous to your health. This product contains saccharin, which has been
determined to cause cancer in laboratory animals.” How might probability have played a part in this
statement?
Answer: Extensive tests with animals indicated (with other factors held as constant as possible) that subjects
that consumed saccharin were more likely to develop cancer than those not exposed to saccharin.
Extrapolating these results to humans, it was concluded that consumption produces an increased risk of
cancer.
Q. A well-known soft drink company decides to alter the formula of its oldest and most popular product.
How might probability theory be involved in such a decision?
Answer: This decision involves estimates of consumer preference, brand loyalty, competitors' response and
numerous other factors, all involving uncertainty. Hence the estimates are based on probabilities.
Q. In textile testing, how might probability theory be involved?
Answer: In textile testing, the result of every test is based on a sample; e.g. in determining the count of yarn,
samples are tested, not the whole cone of yarn. The uncertainty that arises from sampling is
measured by using the concept of probability.
Basic Terminologies in Probability:
In general, probability is the chance that something will happen. Probabilities are expressed as fractions
(1/6, 1/2, 8/9) or as decimals (0.167, 0.500, 0.889) between zero and 1.
A probability of zero means that something can never happen; a probability of 1 indicates that
something will always happen.
Experiment:
The term experiment means a planned activity or process whose results yield a set of data.
Random Experiment:
An experiment, which produces different results even though it is repeated a large number of times
under essentially similar conditions is called a random experiment.
e.g.
1. Flipping of a fair coin
2. Throwing a balanced die
3. Drawing a card from a well-shuffled deck of 52 playing cards
http://jducoeur.org/game-hist/seaan-cardhist.html
4. Selecting a sample

Trial: A single performance of an experiment is called a trial.


Outcome: The result obtained from an experiment or a trial is called an outcome.
Sample Space: A set consisting of all possible outcomes that can result from a random experiment (real or
conceptual), is defined to be a sample space for the experiment and we usually call it S.
Each possible outcome is a member of the sample space and is called a sample point. Total sample points
are denoted by n(S).
e.g.
Sample Space when
i) One coin is tossed: S ={H, T}
ii) Two coins are flipped: S = {HH, HT, TH, TT}
iii) Three coins are flipped: S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
iv) One die is rolled: S = {1, 2, 3, 4, 5, 6}
v) Two dice are rolled:

S = { (1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
      (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
      (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
      (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
      (1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
      (1,6) (2,6) (3,6) (4,6) (5,6) (6,6) } ;  n(S) = 36

The above sample space can also be written as
S = {(i, j) | i = 1, 2, 3, 4, 5, 6; j = 1, 2, 3, 4, 5, 6}
vi) You are testing an electric bulb. There are two possible outcomes:
e1 = the bulb is not defective
e2 = the bulb is defective
so the sample space is S = {e1, e2}
vii) You are assigned a construction project, which is to be completed in 16 months. The
possible outcomes are:
e1 = the project is completed before time
e2 = the project is completed in 16 months
e3 = the project is completed in more than 16 months
e4 = the project is abandoned
S = {e1, e2, e3, e4}
viii) Suppose two electronic bulbs are examined. A bulb may be non-defective (N) or defective
(D). For two bulbs the outcomes may be listed as under: S = {(N, N), (N, D), (D, N), (D, D)}

ix)
Introduction to Playing Cards:
Total cards = 52
Total suits = 4

Suit         Ace   Number Cards                 Picture Cards   Total
1. Heart     A     2, 3, 4, 5, 6, 7, 8, 9, 10   J, Q, K         13
2. Diamond   A     2, 3, 4, 5, 6, 7, 8, 9, 10   J, Q, K         13
3. Club      A     2, 3, 4, 5, 6, 7, 8, 9, 10   J, Q, K         13
4. Spade     A     2, 3, 4, 5, 6, 7, 8, 9, 10   J, Q, K         13
Total        4     36                           12              52

Event:
“An event is an individual outcome or any number of outcomes of a random experiment or a trial” or
“Any subset of a sample space S of the experiment is called an event”, i.e. A ⊆ S.
It is customary to denote an event by a capital letter A, B, C, . . . etc.
Note:
1. Every set is a subset of itself. In particular S ⊆ S, so the sample space S is also an event. ‘S’
is called the sure event.
2. ∅ (the null set, called the impossible event) is an event because ∅ ⊆ S.
3. Let A be an event; then Ā or A^c is called the negation (complement) of A.
Example:
All possible events of the sample space S={H, T}:
A = ∅ (impossible event)
B={H}
C={T}
D={H, T} (sure event)
Rule:
A sample space consisting of n sample points can produce 2^n different events (subsets).
Q. How many events are possible for each of the given sample space
i) One coin is tossed: S ={H, T}
ii) Two coins are flipped: S = {HH, HT, TH, TT}
iii) Three coins are flipped: S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
iv) One die is rolled: S = {1, 2, 3, 4, 5, 6}
v) Two dice are rolled:

S = { (1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
      (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
      (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
      (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
      (1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
      (1,6) (2,6) (3,6) (4,6) (5,6) (6,6) } ;  n(S) = 36

Answer:
i) 2^2 = 4
ii) 2^4 = 16
iii) 2^8 = 256
iv) 2^6 = 64
v) 2^36 = 68,719,476,736
Is it possible to write all the events as described in the above example?
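
To get a feel for how quickly the number of events grows, the sketch below builds the sample spaces of the question with `itertools.product` and lists every event (subset) for the small ones. The helper name `all_events` is hypothetical and used only for this illustration.

```python
from itertools import chain, combinations, product

def all_events(sample_space):
    """Return every subset (event) of a finite sample space as a list of sets."""
    points = list(sample_space)
    subsets = chain.from_iterable(
        combinations(points, size) for size in range(len(points) + 1)
    )
    return [set(subset) for subset in subsets]

# Sample spaces from the question, built as Cartesian products
one_coin = ["H", "T"]
two_coins = ["".join(t) for t in product("HT", repeat=2)]
three_coins = ["".join(t) for t in product("HT", repeat=3)]
one_die = list(range(1, 7))
two_dice = list(product(range(1, 7), repeat=2))

# Listing the events is practical only for the small spaces
print(all_events(one_coin))
print(len(all_events(two_coins)))

# For the larger spaces just count the events: 2 ** n(S)
for name, space in [("three coins", three_coins), ("one die", one_die), ("two dice", two_dice)]:
    print(name, "n(S) =", len(space), "events =", 2 ** len(space))
```

Listing all events is feasible for one or two coins, but for two dice the count runs into the tens of billions, so in practice the events are counted and not written out.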

THE LAYMAN'S GUIDE TO PROBABILITY THEORY

Independent or mutually exclusive event?


One of the important steps you need to take when considering the probability of two or more
events occurring is to decide whether they are independent or related events.

Examples:-

Mutually Exclusive vs. Independent

It is not uncommon for people to confuse the concepts of mutually exclusive events and
independent events.

Definition of a mutually exclusive event

If event A happens, then event B cannot, and vice versa. The two events "it rained on Tuesday" and
"it did not rain on Tuesday" are mutually exclusive events. When calculating the probability that one or
the other of two mutually exclusive events occurs, you add the probabilities.

Independent events

The outcome of event A has no effect on the outcome of event B, for example "It rained on Tuesday"
and "My chair broke at work". When calculating the probability that two independent events both occur, you
multiply the probabilities. You are effectively asking what the chance is of both events happening,
bearing in mind that the two are unrelated.

To be or not to be.....?

So, if A and B are mutually exclusive (and each has a non-zero probability), they cannot be independent. If A and B are independent,
they cannot be mutually exclusive. However, if the events were "it rained today" and "I left my
umbrella at home", they are not mutually exclusive, but they are probably not independent either,
because one would think that you'd be less likely to leave your umbrella at home on days when it
rains. That fact aside, use the following examples to understand the definitions.

Example of a mutually exclusive event

What happens if we want to throw 1 and 6 in any order with two dice? This now means that we do not mind if
the first die is either 1 or 6, as we are still in with a chance. But with the first die, if 1 falls
uppermost, clearly it rules out the possibility of 6 being uppermost, so the two outcomes, 1 and 6,
are exclusive. One result directly excludes the other. In this case, the probability of throwing 1 or 6
with the first die is the sum of the two probabilities, 1/6 + 1/6 = 1/3.

The probability of the second die being favourable is still 1/6, as the second die can only be one
specific number: a 6 if the first die is 1, and vice versa.

Therefore the probability of throwing 1 and 6 in any order with two dice is 1/3 x 1/6 = 1/18. Note
that we multiplied the last two probabilities as they were independent of each other.

Example of an independent event

The probability of throwing a double three with two dice is the probability of throwing three with the first
die and three with the second die. There is one favourable outcome out of six for the first
event and one out of six for the second, therefore the probability is (1/6) * (1/6) = 1/36 or about 2.78%.

The two events are independent, since whatever happens to the first die cannot affect the throw of
the second; the probabilities are therefore multiplied, giving 1/36.
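
Both dice examples can be checked by listing all 36 equally likely outcomes and counting the favourable ones. This is a minimal sketch; the helper `prob` is a hypothetical name introduced for the illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for roll in rolls if event(roll))
    return Fraction(favourable, len(rolls))

# A 1 and a 6 in any order: (1, 6) or (6, 1)
one_and_six = prob(lambda roll: set(roll) == {1, 6})

# A double three: (3, 3) only
double_three = prob(lambda roll: roll == (3, 3))

print(one_and_six)    # 1/18
print(double_three)   # 1/36
```

Counting gives the same values as the addition and multiplication rules: 2 favourable outcomes out of 36 for "1 and 6 in any order" and 1 out of 36 for the double three.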

Converse (complementary) probabilities


Converse - "Something that has been reversed; an opposite."

Often you do not need the probability of an event occurring; you need the opposite, the probability that
the event will not occur. For example, the probability of throwing a 1 on a die is 1/6, therefore the
probability of a 'non-1' is (1 - 1/6), which equals 5/6.

Converse probabilities are used to work out problems such as, "What is the probability of
exactly one soccer match ending in a draw within a group of three separate matches?"

Let us assume the chance of a draw occurring in any match is 1/3 or 33.33%. To fulfill our target of
only one match ending in a draw we would require the other matches not to end in a draw, which has
probability (1 - (1/3)), i.e. 2/3 or 66.67%.

Therefore the probability of one particular match being drawn and the other two not is 1/3 x 2/3 x 2/3 =
4/27, or about 14.81%.

In our group of three matches there are three ways for only one match to draw, DXX, XDX, XXD,
therefore we need to add together all the probabilities, three in this case.

The final answer to the probability of exactly one match drawing is (4/27)+(4/27)+(4/27) = 4/9 or
(.1481+.1481+.1481) = 44.44%.
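
The soccer calculation can be verified by enumerating every result pattern for three matches, where D stands for a draw and X for any other result. The sketch assumes, as the text does, that each match is drawn with probability 1/3 independently of the others.

```python
from fractions import Fraction
from itertools import product

p_draw = Fraction(1, 3)
p_not_draw = 1 - p_draw     # converse probability, 2/3

exactly_one_draw = Fraction(0)
for pattern in product("DX", repeat=3):       # DDD, DDX, ..., XXX
    if pattern.count("D") == 1:               # DXX, XDX, XXD
        p = Fraction(1)
        for result in pattern:
            p *= p_draw if result == "D" else p_not_draw
        exactly_one_draw += p                 # the patterns are mutually exclusive, so add

print(exactly_one_draw, float(exactly_one_draw))
```

Multiplying within a pattern uses the independence of the matches; adding across patterns uses the fact that the three patterns cannot happen together.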

The Birthday Problem

Converse probabilities are used to work out the infamous birthday problem. Many people find the
answer puzzling, but it can be checked either by asking your personnel manager for birthday dates or by
flicking through the Who's Who in your reference library.

The question is:-

"How many people should be gathered in a room together before it is more likely than not that two
of them share the same birthday?"

Ignoring the issues of leap years the problem is solved as follows:-

When the first person enters the room and announces their birthday, the probability of the second
person sharing the same birthday is 1/365. Conversely, the probability of the second birthday
being different is 364/365. When two different birthdays are known, the
probability of the third being different from both is 363/365, as two of the 365 days are now
taken. The compound probability of birthday 2 being different from birthday 1, and of
birthday 3 being different from the other two, is the product:-

(364/365)*(363/365) = 0.991796 or a 99.2% chance that no two of the three people share the same
birthday.

Note the start of the sequence is (365/365). We have removed this as it does not affect the result
of the calculation.

All that is necessary now is to continue multiplying in further terms until the product falls below 1/2 or
50%, since as soon as the probability that all birthdays are different is less than 1/2, the probability
is clearly more than 1/2 that at least two are the same. In other words it is more likely than not that two
people in the room share the same birthday. The following chart shows the number of people
in the room and the probability that they DO NOT share a birthday.
People Chance %
2 99.7
3 99.2
4 98.4
5 97.3
6 96.0
7 94.4
8 92.6
9 90.5
10 88.3
11 85.9
12 83.3
13 80.6
14 77.7
15 74.7
16 71.6
17 68.5
18 65.3
19 62.1
20 58.9
21 55.6
22 52.4
23 49.3
24 46.2
50 3.0
100 3,254,690 to 1 on

The fraction drops to less than 1/2 with 23 iterations, so it is more likely than not that in any
gathering of 23 or more persons, two of them will share a birthday.

Only 50 people need be present for the 'coincidence' of two of them having the same birthday to
become, roughly, a 30-1 on chance.

In a company of 100 employees the odds are more than three million to one on that two share a
birthday.

The birthdays proposition is one where a gambler who can estimate probabilities can make money
from unsuspecting punters.
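
The chart can be regenerated with a few lines of code. This is a minimal sketch under the same assumptions as the text (365 equally likely birthdays, leap years ignored); the function names are illustrative.

```python
def prob_all_different(people, days=365):
    """Probability that `people` birthdays are all distinct, ignoring leap years."""
    p = 1.0
    for k in range(people):
        p *= (days - k) / days      # 365/365, 364/365, 363/365, ...
    return p

def smallest_group(days=365):
    """Smallest group size for which a shared birthday is more likely than not."""
    n = 1
    while prob_all_different(n, days) >= 0.5:
        n += 1
    return n

for n in (2, 3, 10, 22, 23, 50):
    print(n, round(100 * prob_all_different(n), 1))

print("More likely than not from", smallest_group(), "people")
```

The product first drops below one half at 23 people, which is the converse-probability argument of the text carried out term by term.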

Venn Diagram:
A diagram that represents sets by circular regions, parts of circular regions or their complements
with respect to a rectangle representing the space S is called a Venn diagram. It was devised by the English
logician John Venn (1834-1923). Venn diagrams are used to represent sets and subsets in a particular way and to
verify the relationships among sets and subsets.

[Figure: Venn diagrams showing a set A and its complement, the union, intersection and differences (A \ B, B \ A) of two sets A and B, and combinations of three sets A, B and C within the sample space S.]

This ability to represent a "sharing of conditions" makes Venn diagrams useful tools for solving complicated
problems. Consider the following example:

Example:

Twenty-four dogs are in a kennel. Twelve of the dogs are black, six
of the dogs have short tails, and fifteen of the dogs have long hair.
There is only one dog that is black with a short tail and long hair.
Two of the dogs are black with short tails and do not have long hair.
Two of the dogs have short tails and long hair but are not black. If
all of the dogs in the kennel have at least one of the mentioned
characteristics, how many dogs are black with long hair but do not
have short tails?

Solution:
Draw a Venn diagram to represent the situation described in the problem.
Represent the number of dogs that you are looking for with x.

• Notice that the number of dogs in each of the three categories is labeled OUTSIDE of the circle in a
colored box. This number is a reminder of the total of the numbers which may appear anywhere inside
that particular circle.
• After you have labeled all of the conditions listed in the problem, use this OUTSIDE box number to
help you determine how many dogs are to be labeled in the empty sections of each circle.

• Once you have EVERY section in the diagram labeled with a number or an expression, you are ready
to solve the problem.

• Add together EVERY section in the diagram and set it equal to the total number of dogs in the
kennel (24). Do NOT use the OUTSIDE box numbers.

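• The unlabeled sections follow from the circle totals: black only = 12 - (2 + 1 + x) = 9 - x;
short tail only = 6 - (2 + 1 + 2) = 1; long hair only = 15 - (2 + 1 + x) = 12 - x.

• (9 - x) + 2 + 1 + 1 + 2 + x + (12 - x) = 24

• 27 - x = 24

• x = 3 (There are 3 dogs which are black with long hair but do not have a short tail.)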
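
The same bookkeeping can be done in code: fill in the seven regions of the diagram as a function of x and keep the value of x for which the regions add up to 24. This is a sketch; the function and variable names are hypothetical.

```python
def region_counts(x):
    """Counts in the seven regions of the three-circle diagram.

    x is the unknown: dogs that are black with long hair but no short tail.
    """
    black_short_long = 1     # all three characteristics
    black_short_only = 2     # black, short tail, not long hair
    short_long_only = 2      # short tail, long hair, not black
    black_only = 12 - (black_short_long + black_short_only + x)
    short_only = 6 - (black_short_long + black_short_only + short_long_only)
    long_only = 15 - (black_short_long + short_long_only + x)
    return [black_only, short_only, long_only,
            black_short_only, short_long_only, x, black_short_long]

# Try every feasible x; keep those with no negative region and a total of 24 dogs
solutions = [x for x in range(13)
             if min(region_counts(x)) >= 0 and sum(region_counts(x)) == 24]

print(solutions)   # [3]
```

Note that the circle totals (12, 6 and 15) are used only to fill in the regions; the final sum runs over the seven regions, not over the circle totals.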

Mutually Exclusive Events:


“Two events A and B of a single experiment are said to be mutually exclusive or disjoint if and only if
they cannot both occur at the same time, i.e. they have no points in common.” In other words,
“Two events A and B of a single experiment are said to be mutually exclusive or disjoint if and only if
A ∩ B = ∅.”
With the help of a Venn diagram we can illustrate mutually exclusive events:

[Figure: two Venn diagrams. Left: overlapping circles A and B (not mutually exclusive). Right: separate circles A and B (A & B are mutually exclusive).]

In general, the events E1, E2, ... , En are said to be mutually exclusive if the occurrence of any
one of them automatically implies the non-occurrence of the remaining n-1 events. In other words,
no two of the events can occur together.

Examples

• A flipped coin coming up heads and the same coin coming up tails are mutually exclusive events.
• A student passing a test and failing it are mutually exclusive (though someone can fail a test, retake it, and
then pass).

Q. Which of the following pairs of events are mutually exclusive in the drawing of one card from a standard
deck of 52 cards or in the rolling of two dice?
i) A heart and a queen
ii) A club and red card
iii) An even number and a spade
iv) An ace and an even number
v) A total of 5 points and 5 on one die
vi) A total of 7 points and an even number of points on both dice
vii) A total of 9 points and a 2 on one die
viii) A total of 10 points and a 4 on one die

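Answer:
i) not m.e. (the queen of hearts is both)
ii) m.e. (clubs are black)
iii) not m.e. (e.g. the 2 of spades)
iv) m.e.
v) m.e. (if one die shows 5, the total is at least 6)
vi) m.e. (two even numbers give an even total, never 7)
vii) m.e. (a total of 9 with a 2 would need a 7 on the other die)
viii) not m.e. (4 + 6 = 10)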
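
The dice parts of this question can be tested mechanically: two events are mutually exclusive exactly when no outcome belongs to both. The sketch below checks parts v) to viii) over the 36 outcomes. It reads "an even number of points on both dice" as "each die shows an even number", which is an assumption about the wording, and the helper name is hypothetical.

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all (first die, second die) outcomes

def mutually_exclusive(event_a, event_b, outcomes):
    """True if no outcome satisfies both events."""
    return not any(event_a(o) and event_b(o) for o in outcomes)

checks = {
    "v":    mutually_exclusive(lambda r: sum(r) == 5,  lambda r: 5 in r, rolls),
    "vi":   mutually_exclusive(lambda r: sum(r) == 7,
                               lambda r: r[0] % 2 == 0 and r[1] % 2 == 0, rolls),
    "vii":  mutually_exclusive(lambda r: sum(r) == 9,  lambda r: 2 in r, rolls),
    "viii": mutually_exclusive(lambda r: sum(r) == 10, lambda r: 4 in r, rolls),
}

for part, result in checks.items():
    print(part, "m.e." if result else "not m.e.")
```

The same function works for the card questions if the outcomes are replaced by a list of the 52 cards.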

Equally Likely Outcomes:


Two events A and B are said to be equally likely if both have the same chance of occurrence. In flipping a
fair coin, head and tail have the same chance of occurrence.

Probability:
1. Quantitative Approach
2. Subjective Approach
Definition of Probability:
There are three basic approaches to defining quantitative probability:
1. Mathematical or Classical Definition of Probability
2. Axiomatic Definition of Probability
3. Relative Frequency Definition of Probability

Mathematical or Classical Definition of Probability:


If a random experiment can produce n mutually exclusive and equally likely outcomes and m out of these
outcomes are considered favourable to the occurrence of a certain event A, then the probability of the event
A, denoted by P(A), is defined as the ratio m/n. Symbolically, we write

P(A) = m/n = n(A)/n(S) = (Number of favourable outcomes)/(Total possible outcomes)

For example, in a deck of 52 cards, the probability of pulling one of the 13 hearts from the deck is much
higher than the likelihood of pulling out the ace of spades. To calculate an exact value for the probability of
drawing a heart from the deck, divide the number of hearts you could possibly draw by the total number of
cards in the deck:

P(heart) = 13/52 = 1/4

In contrast, the probability of drawing the single ace of spades from the deck is:

P(ace of spades) = 1/52

After looking at these examples, you should be able to understand the general formula for calculating
probability. Here’s another example:

Joe has 3 green marbles, 2 red marbles, and 5 blue marbles. If all the marbles are dropped into a dark bag, what is
the probability that Joe will pick out a green marble?

There are 3 ways for Joe to pick a green marble (since there are 3 different green marbles), but there are 10
total possible outcomes (one for each marble in the bag). Therefore, you can simply calculate the probability
of picking a green marble:

P(green) = 3/10

When calculating probabilities, always be careful to count all of the possible favorable outcomes among the
total possible outcomes. In the last example, you may have been tempted to leave out the three chances of
picking a green marble from the total possibilities, yielding the equation P = 3/7. If you did that, you’d be
wrong.
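
The classical definition translates directly into a one-line function. The sketch below uses exact fractions so that the results appear in lowest terms; the function name `classical_probability` is hypothetical.

```python
from fractions import Fraction

def classical_probability(favourable, total):
    """P(A) = n(A) / n(S) for equally likely outcomes."""
    return Fraction(favourable, total)

# Cards: 13 hearts, and a single ace of spades, in a deck of 52
p_heart = classical_probability(13, 52)
p_ace_of_spades = classical_probability(1, 52)

# Marbles: the total must include the green marbles themselves
marbles = {"green": 3, "red": 2, "blue": 5}
p_green = classical_probability(marbles["green"], sum(marbles.values()))

print(p_heart, p_ace_of_spades, p_green)   # 1/4 1/52 3/10
```

Taking the denominator from `sum(marbles.values())` avoids the mistake described above of leaving the favourable outcomes out of the total.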
The Range of Probability

The probability, P, of any event occurring will always be 0 ≤ P ≤ 1. A probability of 0 for an event means
that the event will never happen. A probability of 1 means the event will always occur. For example,
drawing a green card from a standard deck of cards has a probability of 0; getting a number less than seven
on a single roll of one die has a probability of 1.

The Probability That an Event Will Not Occur

Sometimes you need the probability that an event will not occur. In that case, just figure out the probability
of the event occurring, and subtract that number from 1: P(not A) = 1 - P(A).

Example # 1: Find the probability of getting a head if one coin is tossed.

Solution:
Sample space when one coin is tossed: S = {H, T}, n(S) = 2
Let A = one head appears. A = {H}, n(A) = 1
P(A) = n(A)/n(S) = 1/2

Example # 2: A pair of fair coins is tossed. Find the probability of getting
i) One head
ii) Two heads
iii) At least one tail
iv) At most one head
Solution:
Sample space when two coins are tossed: S = {HH, HT, TH, TT}, n(S) = 4

i) Let A = one head appears.
A = {HT, TH}, n(A) = 2
P(A) = n(A)/n(S) = 2/4 = 1/2

ii) Let B = two heads appear.
B = {HH}, n(B) = 1
P(B) = n(B)/n(S) = 1/4

iii) Let C = at least one tail.
C = {HT, TH, TT}, n(C) = 3
P(C) = n(C)/n(S) = 3/4

iv) Let D = at most one head.
D = {HT, TH, TT}, n(D) = 3
P(D) = n(D)/n(S) = 3/4

Example # 3: Three coins are tossed simultaneously. Find the probability of getting
i) One head
ii) At least one head
iii) At least two tails
iv) No head
Solution:
Sample space when three coins are tossed:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, n(S) = 8

i) Let A = one head appears.
A = {HTT, THT, TTH}, n(A) = 3
P(A) = n(A)/n(S) = 3/8

ii) Let B = at least one head appears.
B = {HTT, THT, TTH, HHT, HTH, THH, HHH}, n(B) = 7
P(B) = n(B)/n(S) = 7/8

iii) Let C = at least two tails.
C = {TTH, THT, HTT, TTT}, n(C) = 4
P(C) = n(C)/n(S) = 4/8 = 1/2

iv) Let D = no head.
D = {TTT}, n(D) = 1
P(D) = n(D)/n(S) = 1/8
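
The three-coin example can be reproduced by generating the sample space and counting. This is a minimal sketch; `S` and `P` mirror the notation of the text, but the code itself is only an illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coins: HHH, HHT, ..., TTT
S = ["".join(toss) for toss in product("HT", repeat=3)]

def P(event):
    """n(A) / n(S), counting the sample points that satisfy the event."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

p_one_head = P(lambda s: s.count("H") == 1)
p_at_least_one_head = P(lambda s: s.count("H") >= 1)
p_at_least_two_tails = P(lambda s: s.count("T") >= 2)
p_no_head = P(lambda s: "H" not in s)

print(p_one_head, p_at_least_one_head, p_at_least_two_tails, p_no_head)
```

Notice that "at least one head" and "no head" are complementary, so their probabilities add up to 1.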

Example # 4: When a fair die is rolled, what is the probability that the upper face shows
i) 6
ii) not 6
iii) an even integer
iv) less than 5
v) between 3 and 5
vi) either 4 or 6

Solution:
Sample space when one die is rolled: S = {1, 2, 3, 4, 5, 6}, n(S) = 6

i) Let A = 6 appears.
A = {6}, n(A) = 1
P(A) = n(A)/n(S) = 1/6

ii) Let B = not 6.
B = {1, 2, 3, 4, 5}, n(B) = 5
P(B) = n(B)/n(S) = 5/6

iii) Let C = even integer.
C = {2, 4, 6}, n(C) = 3
P(C) = n(C)/n(S) = 3/6 = 1/2

iv) Let D = less than 5.
D = {1, 2, 3, 4}, n(D) = 4
P(D) = n(D)/n(S) = 4/6 = 2/3

v) Let E = between 3 and 5.
E = {4}, n(E) = 1
P(E) = n(E)/n(S) = 1/6

vi) Let F = either 4 or 6.
F = {4, 6}, n(F) = 2
P(F) = n(F)/n(S) = 2/6 = 1/3

Example # 5 Two fair dice are rolled at the same time and the number of dots appearing on both dice is
counted. Find the probability that this sum is
i) 7
ii) Odd number greater than 6
iii) Less than 2
iv) More than 12
v) At least 4
vi) Between 2 & 12 inclusively
vii) At most 8
viii) Divisible by 4

Solution:
Sample space when two dice are rolled:

S = { (1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
      (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
      (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
      (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
      (1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
      (1,6) (2,6) (3,6) (4,6) (5,6) (6,6) } ;  n(S) = 36

i) Let A = sum is 7.
A = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}, n(A) = 6
P(A) = n(A)/n(S) = 6/36 = 1/6

ii) Let B = odd number greater than 6, i.e. the sum is 7, 9 or 11.
n(B) = 6 + 4 + 2 = 12
P(B) = n(B)/n(S) = 12/36 = 1/3

iii) Let C = less than 2.
C = { }, n(C) = 0
P(C) = n(C)/n(S) = 0/36 = 0

iv) Let D = more than 12.
D = { }, n(D) = 0
P(D) = n(D)/n(S) = 0/36 = 0

v) Let E = at least 4, i.e. 4 or greater than 4.
The sums 2 and 3 occur in 1 + 2 = 3 ways, so n(E) = 36 - 3 = 33
P(E) = n(E)/n(S) = 33/36 = 11/12

vi) Let F = between 2 and 12 inclusively.
n(F) = 36
P(F) = n(F)/n(S) = 36/36 = 1

vii) Let G = at most 8, i.e. 8 or less.
n(G) = 26
P(G) = n(G)/n(S) = 26/36 = 13/18

viii) Let H = divisible by 4, i.e. the sum is 4, 8 or 12.
n(H) = 3 + 5 + 1 = 9
P(H) = n(H)/n(S) = 9/36 = 1/4
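
Counting outcomes by hand for two dice is error-prone, so it can help to let the computer enumerate the 36 pairs. The sketch below computes the probability of each condition on the sum; the dictionary keys are only descriptive labels.

```python
from fractions import Fraction
from itertools import product

# The sum of the dots for each of the 36 equally likely outcomes
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

def P(condition):
    """Favourable outcomes / 36 for a condition on the sum."""
    return Fraction(sum(1 for s in sums if condition(s)), len(sums))

results = {
    "sum is 7": P(lambda s: s == 7),
    "odd and greater than 6": P(lambda s: s % 2 == 1 and s > 6),
    "less than 2": P(lambda s: 2 > s),
    "more than 12": P(lambda s: s > 12),
    "at least 4": P(lambda s: s >= 4),
    "between 2 and 12 inclusive": P(lambda s: s in range(2, 13)),
    "at most 8": P(lambda s: 8 >= s),
    "divisible by 4": P(lambda s: s % 4 == 0),
}

for label, value in results.items():
    print(f"{label}: {value}")
```

An impossible event comes out as 0 and a sure event as 1, in line with the range of probability.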

Permutations and Combinations

Permutations and combinations are counting tools. They have vast applications in probability, especially in
determining the number of successful outcomes and the number of total outcomes in a given scenario.
Questions about permutations and combinations on the Math IC will not be complex, nor will they require
advanced math. But you will need to understand how they work and how to work with them. Important to
both of these undertakings is a familiarity with factorials.

Factorials

The factorial of a number, n!, is the product of the natural numbers up to and including n:

n! = 1 x 2 x 3 x ... x (n - 1) x n

If you are ever asked to find the number of ways that the n elements of a group can be ordered, you simply
need to calculate n!. For example, if you are asked how many different ways 6 people can sit at a table with six
chairs, you could either list all of the possible seating arrangements or just answer 6! = 6 x 5 x 4 x 3 x 2 x
1 = 720.

Permutations

Permutation means arrangement of things. The word arrangement is used, if the order of things is
considered.

A permutation is an ordering of elements. For example, say you’re running for student council. There are six
different offices to be filled (president, vice president, secretary, treasurer, spirit coordinator, and
parliamentarian) and there are six candidates running. Assuming the candidates don’t care which office
they’re elected to, how many different ways can the student council be composed?

The answer is 6! because there are 6 students running for office, and thus, 6 elements in the set.
Say that due to budgetary costs, there are now only the three offices of president, vice president, and
treasurer to be filled. The same 6 candidates are still running. To handle this situation, we will now have to
change our method of calculating the number of permutations.

In general, the permutation, nPr, is the number of ordered subgroups of size r that can be taken from a set with n
elements:

nPr = n!/(n-r)!

For our example, we need to find 6P3:

6P3 = 6!/(6-3)! = 6!/3! = 6 x 5 x 4 = 120

Consider the following problem:

At a dog show, three awards are given: best in show, first runner-up, and second runner-up.
A group of 10 dogs are competing in the competition. In how many different ways can the
prizes be awarded?
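
These counts are easy to confirm with Python's standard `math` module, which provides `math.factorial` and, from Python 3.8 onward, `math.perm`. The variable names below are illustrative, and the dog-show figure is computed on the assumption that the three awards are distinct and no dog wins more than one.

```python
import math

# Six people in six chairs: 6!
seatings = math.factorial(6)

# Six candidates for six distinct offices, then for three distinct offices
council_six_offices = math.perm(6, 6)
council_three_offices = math.perm(6, 3)

# Three distinct awards among 10 dogs, order matters
dog_show_awards = math.perm(10, 3)

print(seatings, council_six_offices, council_three_offices, dog_show_awards)
```

`math.perm(n, r)` evaluates n!/(n-r)!, so `math.perm(6, 6)` reduces to 6!.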

Example: Suppose we have to form a three-digit number using the digits
1, 2, 3, 4. To form this number the digits have to be arranged. Different numbers will be formed
depending upon the order in which we arrange the digits. This is an example of permutation.

Now suppose that we have to make a team of 11 players out of 20 players. This is an example of
combination, because the order of players in the team will not result in a change in the team. No
matter in which order we list the players, the team will remain the same! For a different team to
be formed, at least one player will have to be changed.

Now let us look at two fundamental principles of counting:

Addition rule: If an experiment can be performed in ‘n’ ways, and another experiment can be
performed in ‘m’ ways, then either one of the two experiments (but not both) can be performed in
(m+n) ways. This rule can be extended to any finite number of experiments.

Example: Suppose there are 3 doors in a room, 2 on one side and 1 on the other side. A man
wants to go out of the room. Obviously he has ‘3’ options for it. He can come out by door ‘A’ or
door ‘B’ or door ‘C’.

Multiplication rule: If a work can be done in ‘m’ ways and another work can be done in ‘n’ ways, then
both of the operations can be performed in m x n ways. It can be extended to any finite number of
operations.

Example: Suppose a man wants to cross a room, which has 2 doors on one side and 1
door on the other side. He has 2 x 1 = 2 ways to do it.
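The two door examples can be enumerated directly. This is a minimal sketch; the door labels and variable
names are illustrative assumptions:

```python
from itertools import product

side_one = ["A", "B"]   # two doors on one side
side_two = ["C"]        # one door on the other side

# Addition rule: leave by a door on side one OR by a door on side two.
ways_to_leave = side_one + side_two

# Multiplication rule: enter by a door on side one AND leave by one on side two.
ways_to_cross = list(product(side_one, side_two))

print(len(ways_to_leave), len(ways_to_cross))
```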

Factorial n: The product of the first ‘n’ natural numbers is denoted by n!.

n! = n(n-1)(n-2) ... 3.2.1

Ex. 5! = 5 x 4 x 3 x 2 x 1 = 120

Note: 0! = 1

Proof: n! = n x (n-1)!

Or (n-1)! = [n x (n-1)!]/n = n!/n

Putting n = 1, we have

0! = 1!/1

or 0! = 1
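The recurrence n! = n x (n-1)! together with 0! = 1 is enough to compute any factorial. A minimal
recursive sketch (the function name is an assumption; Python's own math.factorial does the same job):

```python
def factorial(n: int) -> int:
    """Return n! using n! = n * (n-1)! with the base case 0! = 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n == 0:
        return 1  # base case: 0! = 1
    return n * factorial(n - 1)

print(factorial(5), factorial(1), factorial(0))
```

Without the base case 0! = 1 the recursion would have nowhere to stop, which is another way to see why
the convention is natural.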

Permutation

Number of permutations of ‘n’ different things taken ‘r’ at a time is given by:-

nPr = n!/(n-r)!

Proof: Say we have ‘n’ different things a1, a2……, an.

Clearly the first place can be filled up in ‘n’ ways. Number of things left after filling-up the first place
= n-1

So the second-place can be filled-up in (n-1) ways. Now number of things left after filling-up the
first and second places = n - 2

Now the third place can be filled-up in (n-2) ways.

Thus number of ways of filling-up first-place = n

Number of ways of filling-up second-place = n-1

Number of ways of filling-up third-place = n-2

Number of ways of filling-up r-th place = n – (r-1) = n-r+1


By the multiplication rule of counting, the total number of ways of filling up the first, second, ...,
r-th places together is:

n(n-1)(n-2) ... (n-r+1)

Hence:

nPr = n(n-1)(n-2) ... (n-r+1)

= [n(n-1)(n-2) ... (n-r+1)] x [(n-r)(n-r-1) ... 3.2.1] / [(n-r)(n-r-1) ... 3.2.1]

nPr = n!/(n-r)!

Number of permutations of ‘n’ different things taken all at a time is given by:-

nPn = n!

Proof:
Now we have ‘n’ objects and ‘n’ places.

Number of ways of filling-up first-place = n

Number of ways of filling-up second-place = n-1

Number of ways of filling-up third-place = n-2

Number of ways of filling-up n-th place, i.e. last place = 1

Number of ways of filling-up first, second, --- n-th place

= n (n-1) (n-2) ------ 2.1

nPn = n!

Concept.

We have nPr = n!/(n-r)!

Putting r = n, we have:-

nPn = n!/(n-n)! = n!/0!

But nPn = n!

Clearly this is possible only when 0! = 1

Hence it is proved that 0! = 1

Note: The factorial of a negative number is not defined. The expression (–3)! has no meaning.

Examples
Q. How many different signals can be made by 5 flags from 8-flags of different colours?

Ans. Number of ways of taking 5 flags out of 8 flags = 8P5

= 8!/(8-5)!

= 8 x 7 x 6 x 5 x 4 = 6720

Q. How many words can be made by using the letters of the word “SIMPLETON” taken all at a
time?

Ans. There are ‘9’ different letters in the word “SIMPLETON”.

Number of Permutations taking all the letters at a time = 9P9

= 9! = 362880.

Number of permutations of ‘n’ things, taken all at a time, in which ‘p’ are of one type, ‘q’ of them are
of a second type, ‘r’ of them are of a third type, and the rest are all different is given by:-

n!/(p! x q! x r!)

Example: In how many ways can the letters of the word “Pre-University” be arranged?

Ans. The word has 13 letters, in which R, E and I each occur twice. Number of arrangements

= 13!/(2! x 2! x 2!) = 778377600
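The repeated-letter formula can be applied mechanically by counting how often each letter occurs. The
helper below is a sketch under the assumption that only alphabetic characters count and that case is
ignored; its name is hypothetical:

```python
from collections import Counter
from math import factorial

def arrangements(word: str) -> int:
    """n! divided by the factorial of each letter's repeat count."""
    letters = [ch.upper() for ch in word if ch.isalpha()]
    total = factorial(len(letters))
    for repeat in Counter(letters).values():
        total //= factorial(repeat)
    return total

print(arrangements("Pre-University"))  # 13 letters; R, E, I occur twice each
print(arrangements("SIMPLETON"))       # 9 distinct letters, so 9!
```

The same helper can be used to check factorial-form answers for words with repeated letters.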

Number of permutations of ‘n’ things, taken ‘r’ at a time, when each thing can be repeated up to r times
is given by n^r.

Proof.

Number of ways of filling-up first-place = n

Since repetition is allowed,

Number of ways of filling-up second-place = n

Number of ways of filling-up third-place = n

Number of ways of filling-up r-th place = n

Hence total number of ways in which first, second ---- r-th places can be filled-up

= n x n x n ------------- r factors

= n^r

Example: A child has 3 pockets and 4 coins. In how many ways can he put the coins in his
pockets?

Ans. The first coin can be put in 3 ways; similarly the second, third and fourth coins can each be put
in 3 ways.

So total number of ways = 3 x 3 x 3 x 3 = 3^4 = 81
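The coin example can be enumerated to confirm the n^r count. In this sketch the pocket names are
hypothetical and the four coins are treated as distinguishable, as in the worked answer:

```python
from itertools import product

pockets = ["left", "right", "shirt"]   # hypothetical pocket names
coins = 4

# Each coin independently goes into one of the pockets (repetition allowed).
placements = list(product(pockets, repeat=coins))

print(len(placements), len(pockets) ** coins)
```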

Returning to the dog show problem above: it is a permutation, since the question asks us to order the top
three finishers among 10 contestants. There is more than one way that the same three dogs could get first
place, second place, and third place, and each arrangement is a different outcome. So, the answer is
10P3 = 10!/(10-3)! = 10!/7! = 720.

Permutations and Calculators

Graphing calculators and most scientific calculators have a permutation function, labeled nPr. In most cases,
you must enter n, then press the button for permutation, and then enter r. This will calculate a permutation
for you, but if n is a large number, the calculator often cannot calculate n!. If this happens to you, don’t give
up! In cases like this, your knowledge of the permutation function will save you. Since you know that 100P3
is 100!/(100-3)!, you can simplify it to 100!/97!, or 100 x 99 x 98 = 970,200.
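The same shortcut is easy to express in code: multiply only the r largest factors instead of forming n!.
The function name below is an assumption, and math.perm (Python 3.8+) is used only as a cross-check:

```python
import math

def falling_product(n: int, r: int) -> int:
    """n x (n-1) x ... x (n-r+1), i.e. nPr without computing n!."""
    result = 1
    for k in range(n, n - r, -1):
        result *= k
    return result

print(falling_product(100, 3), math.perm(100, 3))
```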

Combinations

Combination means selection of things. The word selection is used, when the order of things has no
importance.

A combination is an unordered grouping of a set. An example of a scenario in which order doesn’t matter is
a hand of cards: a king, an ace, and a five is the same as an ace, a five, and a king.

Combinations are represented as nCr, where unordered subgroups of size r are selected from a set of
size n. Because the order of the elements in a given subgroup doesn’t matter, nCr will be less
than nPr whenever r is at least 2. Any one combination of r elements can be turned into r! permutations.
nCr is calculated as follows:

nCr = n!/[r! x (n-r)!]

Here’s an example:

Suppose that a committee of 10 people must elect three leaders, whose duties are all the
same. In how many ways can this be done?

In this example, the order in which the leaders are assigned to positions doesn’t matter—the leaders aren’t
distinguished from one another in any way, unlike in the student council example. This distinction means
that the question can be answered with a combination rather than a permutation. We are looking for how
many different groups of three can be taken from a group of 10:

10C3 = 10!/(3! x 7!) = (10 x 9 x 8)/(3 x 2 x 1) = 120

There are only 120 different ways to elect three leaders, as opposed to 720 ways when their roles were
differentiated.
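The relationship between the two counts (each unordered group of three corresponds to 3! ordered ones) can
be checked with a short sketch. It assumes Python 3.8+ for math.comb and math.perm, and numbers the
committee members 0 to 9 purely for illustration:

```python
import math
from itertools import combinations

committee = range(10)                        # ten members, labelled 0..9
groups = list(combinations(committee, 3))    # unordered groups of three

unordered = math.comb(10, 3)
ordered = math.perm(10, 3)

print(len(groups), unordered, ordered, ordered // math.factorial(3))
```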

Combinations and Calculators


There should be a combination function on your graphing or scientific calculator labeled nCr. Use it the
same way you use the permutation key.

Exercise:
1. How many distinct arrangements are possible using all the letters in the word
DOODLED? (Express your answer in factorial notation.)

2. How many committees of 6 people can be selected from 10 girls and 4 boys?
(Give answer in factorial form.)

3. How many distinct arrangements are possible using all the letters in the word
TORONTO? (Give answer in factorial form.)

4. How many ways can four students be seated in a row if two of them must be
seated together?

5. How many distinct arrangements are possible using all the letters in the word
“wormwood”? Leave answer in factorial form.

6. In how many ways can 4 girls and 2 boys be seated at a round table? (Express
your answer in factorial notation.)

Multiple Choice:
1. Find the number of ways 8 different books can be arranged on a shelf if 3
particular books must be placed together.
a) 4320
b) 40 320
c) 241 920
d) 6720
Answer: a)

2. Harry, Peter and four girls sit in a row. Harry can’t sit in an end seat. In how
many ways can they be arranged?
a) 96
b) 600
c) 480
d) 240
Answer: c)

Long Answer:
1. Given the letters of the word ELEMENTS, how many 5 letter “words” can be
found in which vowels and consonants alternate?
Answer: 80

2. How many integers that do not contain repeated digits are there from 1 to 1000
inclusive?
Answer: 738

3. How many four digit multiples of five that are greater than 4000 can be formed
from the digits 0,1,2,3,5,9? (Repetitions are not allowed)
Answer: 36

4. There are 12 people to be seated around two circular tables, the first with seven
chairs and the second with five chairs. In how many ways can this be done?
Answer: 13 685 760 ways (12C7 x 6! x 4!)

5. How many four digit multiples of 5 can be formed using the digits 1, 2, 5, 7, 9, 0,
if no repetitions are allowed?
Answer: 108

6. How many different 3 letter “words” are possible using the letters in OCTOBER?
Answer: 135 arrangements

7. How many different two letter arrangements can be made from the letters in the
word SEEN?
Answer: 7

8. How many 4 digit numbers, greater than 7500 can be made using the digits
0,3,5,7,8,9, without repetition?
Answer: 156 numbers

9. A standard deck consists of 52 cards. How many 5 card hands can be made if a
hand must contain 2 pairs, each pair of a different value? The fifth card must
have a different value than either of the pairs.
Answer: 123 552 ways

10. Five boys and five girls are going for a picnic. Six can ride in one car and four in
another. In how many ways can they be distributed between the two cars?
Answer: 210 ways

11. How many 5 card hands can be formed from a deck of 52 cards if the hands
contain a pair and 3 of a kind?
Answer: 3744 ways

12. Four boys and four girls have decided to go out for dinner at a local restaurant.
There are two circular tables available, one near the window, and one in the centre
of the restaurant. Each table seats 4 people.
a) In how many ways can the eight people be seated randomly?
b) In how many ways can they be seated if all the boys must sit at one table and all
the girls at another table?
Answer: a) 2520
b) 36

13. A committee of 10 people is to be chosen from 10 boys and 10 girls. How many
ways can this be done if the committee must contain 9 boys and 1 girl?
Answer: 100 ways
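Several of the digit and letter answers above are small enough to confirm by brute force. The sketch below
is only a checking aid, not a substitute for the counting arguments; the helper names are assumptions:

```python
from itertools import permutations

def distinct_arrangements(letters: str, r: int) -> int:
    """Count distinct r-letter arrangements, merging duplicates from repeated letters."""
    return len(set(permutations(letters, r)))

def count_numbers(digits: str, length: int, keep) -> int:
    """Count numbers built from distinct digits (no leading zero) that satisfy keep()."""
    count = 0
    for p in permutations(digits, length):
        if p[0] == "0":
            continue
        if keep(int("".join(p))):
            count += 1
    return count

# Question 2: integers from 1 to 1000 with no repeated digit.
no_repeats = sum(1 for k in range(1, 1001) if len(set(str(k))) == len(str(k)))

print(no_repeats)                                                        # question 2
print(count_numbers("012359", 4, lambda v: v > 4000 and v % 5 == 0))     # question 3
print(count_numbers("125790", 4, lambda v: v % 5 == 0))                  # question 5
print(distinct_arrangements("OCTOBER", 3))                               # question 6
print(distinct_arrangements("SEEN", 2))                                  # question 7
print(count_numbers("035789", 4, lambda v: v > 7500))                    # question 8
```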

Problem 1

Mike, a DJ at a high-school radio station, needs to play two or three more songs before the end of the
school dance. If each composition must be selected from a list of the 10 most popular songs of the
year, how many song sequences are available for the remainder of the dance?

(A) 6
(B) 90
(C) 120
(D) 720
(E) 810

Members of a student parliament took a vote on a proposition for a new social event on Fridays. If all the
members of the parliament voted either for or against the proposition and if the proposition was accepted in
a 5-to-2 vote, in how many ways could the members vote?

(A) 7
(B) 10
(C) 14
(D) 21
(E) 42

A set consists of all integers between 600 and 1999, inclusive. If a number is selected at random from this
set, what is the probability that it is divisible by both 5 and 13?

(A) 65/1399
(B) 13/280
(C) 1/50
(D) 3/200
(E) 21/1399

Jake, Lena, Fred, John and Inna need to drive home from a corporate reception in an SUV that can seat 7
people. If only Inna or Jake can drive, how many seat allocations are possible?

(A) 30
(B) 42
(C) 120
(D) 360
(E) 720

Circular Permutations

There are two cases of circular-permutations:-


(a) If clockwise and anticlockwise orders are different, then the total number of circular
permutations is given by (n-1)!

(b) If clockwise and anticlockwise orders are taken as not different, then the total number of
circular permutations is given by (n-1)!/2

Proof (a):

Let’s consider 4 persons A, B, C and D sitting around a round table.

Shifting A, B, C, D one position at a time in the anticlockwise direction, we get the following
arrangements: ABCD, BCDA, CDAB, DABC.

Thus, we see that if 4 persons are sitting at a round table, they can be shifted four times, but
these four arrangements will be the same, because the sequence of A, B, C, D is the same. But if A,
B, C, D are sitting in a row and they are shifted, then the four linear arrangements will be different.

Hence if we have ‘4’ things, then for each circular arrangement the number of linear arrangements = 4

Similarly, if we have ‘n’ things, then for each circular arrangement, the number of linear arrangements
= n.

Let the total number of circular arrangements = p

Total number of linear arrangements = n.p

Total number of linear arrangements

= n x (number of circular arrangements)

Or number of circular arrangements = (1/n) x (number of linear arrangements) = n!/n

Circular permutations = (n-1)!

Proof (b): When clockwise and anticlockwise arrangements are not different, the arrangement can be
observed from both sides and will look the same. Here two permutations are counted as one, so the
total number of permutations is halved. Hence in this case:

Circular permutations = (n-1)!/2
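Both circular counts can be confirmed for small n by grouping linear arrangements that differ only by a
rotation (and, optionally, by a reflection). This is a brute-force sketch with a hypothetical function
name; it is practical only for small n:

```python
from itertools import permutations
from math import factorial

def circular_count(n: int, mirror_same: bool = False) -> int:
    """Count arrangements of n distinct things around a circle."""
    seen = set()
    for p in permutations(range(n)):
        variants = [p[i:] + p[:i] for i in range(n)]        # all rotations
        if mirror_same:
            q = p[::-1]                                     # mirror image
            variants += [q[i:] + q[:i] for i in range(n)]
        seen.add(min(variants))   # one canonical representative per class
    return len(seen)

print(circular_count(4), factorial(3))               # case (a)
print(circular_count(5, True), factorial(4) // 2)    # case (b)
```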

Note: Number of circular permutations of ‘n’ different things taken ‘r’ at a time:-

(a) If clockwise and anticlockwise orders are taken as different, then the total number of circular
permutations = nPr/r

(b) If clockwise and anticlockwise orders are taken as not different, then the total number of
circular permutations = nPr/(2r)

Example: How many necklaces of 12 beads each can be made from 18 beads of different colours?

Ans. Here clockwise and anticlockwise arrangements are the same.

Hence total number of circular permutations = 18P12/(2 x 12)

= 18!/(6! x 24)

Restricted – Permutations

(a) Number of permutations of ‘n’ things, taken ‘r’ at a time, when a particular thing is to be
always included in each arrangement

= r x (n-1)P(r-1)

(b) Number of permutations of ‘n’ things, taken ‘r’ at a time, when a particular thing is fixed in a
given place: = (n-1)P(r-1)

(c) Number of permutations of ‘n’ things, taken ‘r’ at a time, when a particular thing is never taken:
= (n-1)Pr

(d) Number of permutations of ‘n’ things, taken all at a time, when ‘m’ specified things always come
together = m! x (n-m+1)!

(e) Number of permutations of ‘n’ things, taken all at a time, when ‘m’ specified things never all
come together = n! - [m! x (n-m+1)!]

Example: How many words can be formed with the letters of the word ‘OMEGA’ when:

(i) ‘O’ and ‘A’ occupying end places.


(ii) ‘E’ being always in the middle

(iii) Vowels occupying odd-places

(iv) Vowels being never together.

Ans.

(i) When ‘O’ and ‘A’ occupy the end places

=> M.E.G. (OA)

Here (OA) are fixed, hence M, E, G can be arranged in 3! ways.

But (O, A) can be arranged among themselves in 2! ways.

=> Total number of words = 3! x 2! = 12.

(ii) When ‘E’ is fixed in the middle

=> O.M.(E), G.A.

Hence the four letters O, M, G, A can be arranged in 4! i.e. 24 ways.

(iii) Three vowels (O,E,A,) can be arranged in the odd-places (1st, 3rd and 5th) = 3! ways.

And two consonants (M,G,) can be arranged in the even-place (2nd, 4th) = 2 ! ways

=> Total number of ways= 3! x 2! = 12 ways.

(iv) Total number of words = 5! = 120

If all the vowels come together, then we have: (O.E.A.), M, G

These can be arranged in 3! ways.

But (O, E, A) can be arranged among themselves in 3! ways.

=> Number of ways when the vowels come together = 3! x 3!

= 36 ways

=> Number of ways when the vowels are never all together

= 120 - 36 = 84 ways.
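All four parts of the OMEGA example can be checked by listing the 120 arrangements and filtering. In this
sketch, "together" is read as all three vowels occupying consecutive places, which is the reading the
worked answer uses:

```python
from itertools import permutations

VOWELS = set("OEA")
words = ["".join(p) for p in permutations("OMEGA")]

ends_o_a = sum(1 for w in words if {w[0], w[-1]} == {"O", "A"})
e_middle = sum(1 for w in words if w[2] == "E")
vowels_odd = sum(1 for w in words if all(w[i] in VOWELS for i in (0, 2, 4)))
# All three vowels in three consecutive places, starting at index 0, 1 or 2.
together = sum(
    1 for w in words
    if any(set(w[i:i + 3]) == VOWELS for i in range(3))
)

print(ends_o_a, e_middle, vowels_odd, len(words) - together)
```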

Number of combinations of ‘n’ different things, taken ‘r’ at a time, is given by:-

nCr = n!/[r! x (n-r)!]

Proof: Each combination consists of ‘r’ different things, which can be arranged among themselves
in r! ways.

=> For one combination of ‘r’ different things, number of arrangements = r!

For nCr combinations, number of arrangements = r! x nCr

=> Total number of permutations = r! x nCr ---------------(1)

But the number of permutations of ‘n’ different things, taken ‘r’ at a time

= nPr -------(2)

From (1) and (2):

nPr = r! x nCr

or n!/(n-r)! = r! x nCr

or nCr = n!/[r! x (n-r)!]

Note: nCr = nC(n-r)

since nCr = n!/[r! x (n-r)!] and nC(n-r) = n!/[(n-r)! x (n-(n-r))!]

= n!/[(n-r)! x r!]

Restricted – Combinations

(a) Number of combinations of ‘n’ different things taken ‘r’ at a time, when ‘p’ particular things
are always included = (n-p)C(r-p)

(b) Number of combinations of ‘n’ different things, taken ‘r’ at a time, when ‘p’ particular things
are always to be excluded = (n-p)Cr

Example: In how many ways can a cricket eleven be chosen out of 15 players, if

(i) A particular player is always chosen,

(ii) A particular player is never chosen?

Ans:

(i) A particular player is always chosen; it means that 10 players are selected out of the
remaining 14 players.

=> Required number of ways = 14C10 = 14C4

= 14!/(4! x 10!) = 1001

(ii) A particular player is never chosen; it means that 11 players are selected out of 14 players.

=> Required number of ways = 14C11

= 14!/(11! x 3!) = 364
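The two cricket counts can be computed with math.comb (Python 3.8+). As a sanity check, every possible
eleven either contains the particular player or does not, so the two cases should add up to 15C11. The
variable names are illustrative:

```python
import math

players, team = 15, 11

always_chosen = math.comb(players - 1, team - 1)   # (n-p)C(r-p) with p = 1
never_chosen = math.comb(players - 1, team)        # (n-p)Cr with p = 1

print(always_chosen, never_chosen)
print(always_chosen + never_chosen == math.comb(players, team))
```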

(iii) Number of ways of selecting one or more things from ‘n’ different things is given by:- 2^n - 1

Proof: Number of ways of selecting one thing out of n things = nC1

Number of ways of selecting two things out of n things = nC2

Number of ways of selecting three things out of n things = nC3

Number of ways of selecting ‘n’ things out of ‘n’ things = nCn

=> Total number of ways of selecting one or more things out of n different things

= nC1 + nC2 + nC3 + ------------- + nCn

= (nC0 + nC1 + ----------------- + nCn) - nC0

= 2^n – 1 [since nC0 = 1 and nC0 + nC1 + ... + nCn = 2^n]

Example: John has 8 friends. In how many ways can he invite one or more of them to dinner?

Ans. John can select one or more than one of his 8 friends.

=> Required number of ways = 2^8 – 1 = 255.
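The sum nC1 + nC2 + ... + nCn can be checked for John's 8 friends by counting every non-empty group
directly. A minimal sketch, with the friends numbered 0 to 7 for illustration:

```python
from itertools import combinations

friends = list(range(8))   # eight friends, labelled 0..7

# Count every group of size 1, 2, ..., 8.
invitations = sum(
    1
    for size in range(1, len(friends) + 1)
    for _ in combinations(friends, size)
)

print(invitations, 2 ** len(friends) - 1)
```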

(iv) Number of ways of selecting zero or more things from ‘n’ identical things is given by:- n+1

Example: In how many ways can zero or more letters be selected from the letters AAAAA?

Ans. Number of ways of :

Selecting zero 'A's = 1

Selecting one 'A's = 1

Selecting two 'A's =1

Selecting three 'A's = 1

Selecting four 'A's = 1

Selecting five 'A's = 1

=> Required number of ways = 6 [5+1]

(V) Number of ways of selecting one or more things from ‘p’ identical things of one type, ‘q’
identical things of another type, ‘r’ identical things of a third type and ‘n’ different things is
given by:-

(p+1)(q+1)(r+1) x 2^n – 1

Example: Find the number of different choices that can be made from 3 apples, 4 bananas and 5
mangoes, if at least one fruit is to be chosen.

Ans:

Number of ways of selecting apples = (3+1) = 4 ways.

Number of ways of selecting bananas = (4+1) = 5 ways.

Number of ways of selecting mangoes = (5+1) = 6 ways.

Total number of ways of selecting fruits = 4 x 5 x 6

But this includes the case when no fruit, i.e. zero fruits, is selected.

=> Number of ways of selecting at least one fruit = (4 x 5 x 6) - 1 = 119

Note:- There was no fruit of a different type, hence here n = 0

=> 2^n = 2^0 = 1
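Because fruits of the same kind are identical, a choice is fully described by how many of each kind are
taken. The sketch below enumerates those triples and discards the empty choice; the variable names are
illustrative:

```python
from itertools import product

apples, bananas, mangoes = 3, 4, 5

# (a, b, m) = number of apples, bananas and mangoes taken; 0 is allowed for each.
choices = [
    (a, b, m)
    for a, b, m in product(range(apples + 1), range(bananas + 1), range(mangoes + 1))
    if a + b + m >= 1          # at least one fruit must be chosen
]

print(len(choices))
```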

(VI) Number of ways of selecting ‘r’ things from ‘n’ identical things is ‘1’.

Example: In how many ways can 5 balls be selected from ‘12’ identical red balls?

Ans. The balls are identical, total number of ways of selecting 5 balls = 1.

Example: How many numbers of four digits can be formed with digits 1, 2, 3, 4 and 5?

Ans. Here n = 5 [Number of digits]

And r = 4 [ Number of places to be filled-up]

Required number is 5P4 = 5!/1! = 5 x 4 x 3 x 2 x 1 = 120


Set A consists of all positive integers less than 100; Set B consists of 10 integers, the first four of which are
2, 3, 5, and 7. What is the difference between the median of Set A and the range of Set B?

(1) All numbers in Set B are prime numbers;


(2) Each element in Set B is divisible by exactly two factors;

(A) Statement (1) alone is sufficient, but statement (2) alone is not sufficient.
(B) Statement (2) alone is sufficient, but statement (1) alone is not sufficient.
(C) BOTH statements TOGETHER are sufficient, but NEITHER statement ALONE is sufficient.
(D) Each statement ALONE is sufficient.
(E) Statements (1) and (2) TOGETHER are NOT sufficient.

A university needs to select a nine-member committee on extracurricular life, whose members must belong
either to the student government or to the student advisory board. If the student government consists of 10
members, the student advisory board consists of 8 members, and 6 students hold membership in both
organizations, how many different committees are possible?

(A) 72
(B) 110
(C) 220
(D) 720
(E) 1096

Solution: http://www.projectgmat.com/solutions.html

Answers to the six multiple-choice problems above (in order): E, D, D, E, E, C

Example # 6 From a pack of 52 cards, two are drawn at random. Find the probability that one is a king and
the other is a queen.

Solution:

King   Queen   Other   Total
4      4       44      52

Total possible cases if two cards are drawn = 52C2 = 1326

P(one king & other queen) = (4C1 x 4C1 x 44C0)/52C2 = 16/1326 = 8/663
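The probability can also be obtained by listing every two-card draw from a full deck. This sketch builds
the deck from hypothetical rank and suit labels and uses exact fractions to avoid rounding:

```python
from fractions import Fraction
from itertools import combinations

ranks = "A 2 3 4 5 6 7 8 9 10 J Q K".split()
suits = "SHDC"                                   # hypothetical suit labels
deck = [(rank, suit) for rank in ranks for suit in suits]

draws = list(combinations(deck, 2))              # all unordered two-card draws
favourable = sum(1 for a, b in draws if {a[0], b[0]} == {"K", "Q"})

probability = Fraction(favourable, len(draws))
print(len(draws), favourable, probability)
```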

Example # 7 A box contains 4 red, 4 white and 5 green balls. Three are drawn from the box together. Find
the probability that they may be
i) All of different colours
ii) All of the same colour
Solution:
Red   White   Green   Total
4     4       5       13
Drawn = 3

i) P(all of different colours) = (4C1 x 4C1 x 5C1)/13C3 = (4 x 4 x 5)/286 = 80/286 = 40/143

ii) P(all of same colour) = P(3 red or 3 white or 3 green)
= P(3 red) + P(3 white) + P(3 green)
= (4C3 x 4C0 x 5C0)/13C3 + (4C0 x 4C3 x 5C0)/13C3 + (4C0 x 4C0 x 5C3)/13C3
= 4/286 + 4/286 + 10/286 = 18/286 = 9/143
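The same two probabilities follow directly from math.comb (Python 3.8+) and exact fractions. This is a
sketch with illustrative variable names:

```python
from fractions import Fraction
from math import comb

red, white, green = 4, 4, 5
total = comb(red + white + green, 3)        # 13C3 possible draws

# One ball of each colour.
p_different = Fraction(red * white * green, total)

# Three red, or three white, or three green (mutually exclusive cases).
p_same = Fraction(comb(red, 3) + comb(white, 3) + comb(green, 3), total)

print(total, p_different, p_same)
```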
Example: Given the following structure formed by attaching four equilateral triangles (taking a large
equilateral triangle and drawing lines from the midpoint of each side to the midpoint of every other side); if
two rods are painted white and the rest are painted black, how many distinct patterns are possible?

The Divisibility Rules

These rules let you test if one number can be evenly divided by another, without having to do too
much calculation!

A number is divisible by:

2 if the last digit is even (0, 2, 4, 6, 8).
  Examples: 128 is; 129 is not.

3 if the sum of the digits is divisible by 3.
  Examples: 381 (3+8+1=12, and 12÷3 = 4) Yes; 217 (2+1+7=10, and 10÷3 = 3 1/3) No.

4 if the last 2 digits are divisible by 4.
  Examples: 1312 is (12÷4=3); 7019 is not.

5 if the last digit is 0 or 5.
  Examples: 175 is; 809 is not.

6 if the number is divisible by both 2 and 3.
  Examples: 114 (it is even, and 1+1+4=6 and 6÷3 = 2) Yes; 308 (it is even, but 3+0+8=11 and
  11÷3 = 3 2/3) No.

7 if you double the last digit and subtract it from the rest of the number and the answer is 0 or
  divisible by 7. (Note: you can apply this rule to that answer again if you want.)
  Examples: 672 (Double 2 is 4, 67-4=63, and 63÷7=9) Yes; 905 (Double 5 is 10, 90-10=80, and
  80÷7=11 3/7) No.

8 if the last three digits are divisible by 8.
  Examples: 109816 (816÷8=102) Yes; 216302 (302÷8=37 3/4) No.

9 if the sum of the digits is divisible by 9. (Note: you can apply this rule to that answer again
  if you want.)
  Examples: 1629 (1+6+2+9=18, and again, 1+8=9) Yes; 2013 (2+0+1+3=6) No.

10 if the number ends in 0.
  Examples: 220 is; 221 is not.

11 if you sum every second digit and then subtract all other digits and the answer is 0 or
  divisible by 11.
  Examples: 1364 ((3+4) - (1+6) = 0) Yes; 3729 ((7+9) - (3+2) = 11) Yes; 25176 ((5+7) - (2+1+6) = 3) No.

12 if the number is divisible by both 3 and 4.
  Examples: 648 (6+4+8=18 and 18÷3=6, also 48÷4=12) Yes; 916 (9+1+6=16, 16÷3 = 5 1/3) No.
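A few of these rules are sketched below as small functions and compared against ordinary remainder
arithmetic. The function names are assumptions, and the rule for 7 is applied repeatedly until the number
is below 100, as the note in the table allows:

```python
def digits(n: int) -> list:
    return [int(d) for d in str(n)]

def by_3(n: int) -> bool:
    return sum(digits(n)) % 3 == 0

def by_4(n: int) -> bool:
    return (n % 100) % 4 == 0            # only the last two digits matter

def by_7(n: int) -> bool:
    while n >= 100:                      # double the last digit, subtract from the rest
        n = abs(n // 10 - 2 * (n % 10))
    return n % 7 == 0                    # small enough to check directly

def by_8(n: int) -> bool:
    return (n % 1000) % 8 == 0           # only the last three digits matter

def by_11(n: int) -> bool:
    d = digits(n)
    return (sum(d[1::2]) - sum(d[0::2])) % 11 == 0

rules = {3: by_3, 4: by_4, 7: by_7, 8: by_8, 11: by_11}
mismatches = [
    (k, n)
    for k, rule in rules.items()
    for n in range(1, 20000)
    if rule(n) != (n % k == 0)
]
print(mismatches)
```

An empty list of mismatches means each rule agreed with direct division on every number tested.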

Exercise # 4
Q.1 In a single throw of two fair dice, find the probability that the product of the numbers is
i) Between 8 and 16
ii) Divisible by 4
iii) Divisible by 4 or 6

Q.2 A card is selected from an ordinary deck of 52 cards. What is the probability of getting
i) A queen
ii) A diamond
iii) Picture card
iv) The king of clubs
Q.3 A bag contains 6 white and 4 black identical balls. If a ball is selected at random, what is the
probability that the selected ball is white?

Q.4 A bag contains 3 white and 2 black balls. If two balls are selected at random, what is the probability
that
i) Both balls are white
ii) Both are of different colours

Q.5 From a group of 6 men and 8 women 5 people are chosen at random. Find the probability that there are
more men than women.

Q.6 Three applicants are to be selected at random out of 4 boys and 6 girls. What is the probability of
selecting:
i) All girls
ii) At least one boy

Q.7 A bag contains 14 identical balls, of which 4 are red, 5 black and 5 white. Six balls are drawn from the bag.
Find the probability that:
i) 3 are white
ii) at least two are white

Q.8 Three distinct integers are chosen at random from the first 20 positive integers. Compute the
probability that:
i) Their sum is even
ii) Their product is even
Q.9 Of 12 eggs in a refrigerator 2 are bad. From those 4 eggs are chosen at random to make a cake. What is
the probability that
i) Exactly one is bad
ii) At least one is bad

Q.10 The face cards are removed from a full pack. Out of the remaining 4 are drawn at random. What is
the probability that they belong to different suits.

Q.11 A normal pack of 52 cards contains four aces and 48 other cards. Find the probability that a random
hand of 13 cards contains
i) Four aces
ii) No ace
iii) At least one ace
iv) At least 2 aces

Q.12 If a symmetrical 6 sided die is thrown 4 times. What is the probability that at least one six appears?

Q.13 A store receives 5 red shirts and 10 green shirts. A random sample of 5 shirts is selected. Determine
the probability that:
i) It contains 3 red shirts
ii) It contains 1 red shirt
iii) What percentage of the samples contain 3 green shirts

Example: If there is a party and every person shakes hands with each other once, and there are 45
handshakes, how many people are there at the party?

Solution: If there are n people at the party, then each person will shake hands with n-1 other people. So
with n people each making (n-1) handshakes, it appears at first sight that there are n(n-1) handshakes.

However, each handshake will have been counted twice, i.e. A->B and
B->A, so we must divide by 2.

Total number of handshakes = n(n-1)/2

Now we are given that there were 45 handshakes in all, so we must solve the equation:

n(n-1)/2 = 45

n(n-1) = 90

n^2 - n - 90 = 0

(n-10)(n+9) = 0. From this, n = 10 or -9

Clearly the -9 has no meaning in this question, so we conclude that n = 10.

Number at the party = 10
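The same reasoning works for any number of handshakes h: solve n^2 - n - 2h = 0 and keep the positive
root. The sketch below uses the quadratic formula with integer arithmetic; the function name is an
assumption, and it rejects handshake totals that no whole number of guests can produce:

```python
import math

def party_size(handshakes: int) -> int:
    """Positive root of n^2 - n - 2h = 0, i.e. n = (1 + sqrt(1 + 8h)) / 2."""
    discriminant = 1 + 8 * handshakes
    root = math.isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("no whole number of guests gives this many handshakes")
    return (1 + root) // 2

print(party_size(45), math.comb(10, 2))
```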



Disjunctive Probabilities/Conjunctive Probabilities


There are basically two ways in which individual probability values can be linked together mathematically,
and these in turn correspond to two basic kinds of logical linkage. The first is associated with the common-
sense meaning of the word "and," and the second with the common-sense meaning of the word "or." In
formal logic, the relationships denoted by these words are spoken of as conjunction and disjunction,
respectively. Conjunctive probability questions take the general form, "What is the probability of having A
and B occur?" and disjunctive questions take the form, "What is the probability of having A or B occur?"
(Compound probabilities are sometimes described in the language of set theory. In this case, conjunction
will be spoken of as "intersection" and disjunction will be described as "union.")
Disjunctive Probabilities: 'A or B', 'A or B or C', etc.
Conjunctive Probabilities: 'A and B', 'A and B and C', etc.

Axiom: established or accepted principle, self-evident truth


Description: Axiom, in logic and mathematics, a basic principle that is assumed to be true without
proof. The use of axioms in mathematics stems from the ancient Greeks, most probably during the 5th
century BC, and represents the beginnings of pure mathematics as it is known today... An online
encyclopedia

Axiom, in mathematics and logic, general statement accepted without proof as the basis for logically
deducing other statements (theorems). Examples of axioms used widely in mathematics are those related to
equality (e.g., "Two things equal to the same thing are equal to each other"; "If equals are added to equals,
the sums are equal") and those related to operations (e.g., the associative law and the commutative law). A
postulate, like an axiom, is a statement that is accepted without proof; however, it deals with specific subject
matter (e.g., properties of geometrical figures) and thus is not so general as an axiom. It is sometimes said
that an axiom or postulate is a "self-evident" statement, but the truth of the statement need not be evident
and may in some cases even seem to contradict common sense. Moreover, a statement may be an axiom or
postulate in one deductive system and may instead be derived from other statements in another system. A set
of axioms on which a system is based is often wished to be independent; i.e., no one of its members can be
deduced from any combination of the others. (Historically, the development of non-Euclidean geometry
grew out of attempts to prove or disprove the independence of the parallel postulate of Euclid.) The axioms
should also be consistent; i.e., it should not be possible to deduce contradictory statements from them.
Completeness is another property sometimes mentioned in connection with a set of axioms; if the set is
complete, then any true statement within the system described by the axioms may be deduced from them.

Definition: The event A is said to imply the event B if every element of A is also an element of B,
written $A \subset B$ or $A \subseteq B$. If A implies B and B implies A, then A and B are equal, or A = B.

Definition: The union of A and B (that is, A or B), denoted $A \cup B$, is the set of all elementary events
that are either in A or B (or both): $A \cup B = \{x \in S : x \in A \text{ or } x \in B \text{ or } x \in A \cap B\}$

Definition: The intersection of A and B (that is, A and B), denoted $A \cap B$ or, more conveniently, AB,
is the set of all elementary events that are in both A and B: $A \cap B = \{x \in S : x \in A \text{ and } x \in B\}$

Definition: Given two sets A and B, the set difference of A and B is denoted A-B or A\B, and is defined
as $A - B = \{x \in S : x \in A \text{ and } x \notin B\}$

Definition: A collection of sets $\{A_i,\ i = 1, 2, \ldots, n\}$ (we will permit $n = \infty$) is said to be disjoint or mutually
exclusive if $A_i \cap A_j = \emptyset$ for all i, j such that $i \ne j$.

Definition: A collection of sets $\{A_i,\ i = 1, 2, \ldots, n\}$ (we will permit $n = \infty$) is said to be collectively
exhaustive if $\bigcup_{i=1}^{n} A_i = S$
Axiomatic Definition of Probability:


This definition, introduced in 1933 by the Russian mathematician Andrei N. Kolmogorov (1903-1987), is based
on a set of axioms, where an axiom is a statement that is assumed to be true. Let S be a sample space with the
sample points E1, E2, E3, . . . , Ei, . . . , En. To each sample point, we assign a real number, denoted by the
symbol P(Ei), and called the probability of Ei, that must satisfy the following basic axioms:
1. For any event Ei, $0 \le P(E_i) \le 1$
2. P(S) = 1 for the sure event S
3. If A and B are mutually exclusive events then $P(A \cup B) = P(A) + P(B)$

Note:
The sum of the probabilities of all possible outcomes is equal to one, i.e. $\sum_{i=1}^{n} P(E_i) = 1$

Laws of Probability:
Prove the following probability laws:
a. If $\emptyset$ is the impossible event, then $P(\emptyset) = 0$
b. $P(\bar{A}) = 1 - P(A)$
If A and B are any two events defined in a sample space
c. If $A \subset B$, then $P(A) \le P(B)$
d. $P(\bar{A} \cap B) = P(B) - P(A \cap B)$
e. $P(A \cap \bar{B}) = P(A) - P(A \cap B)$
f. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
   a. If A and B are mutually exclusive then $P(A \cup B) = P(A) + P(B)$
g. If A, B and C are any three events in a sample space S, then the probability of at least one of them
occurring is given by
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$
In general, the formula for k events is
$P(A_1 \cup A_2 \cup \ldots \cup A_k) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<l} P(A_i \cap A_j \cap A_l) - \ldots + (-1)^{k+1} P(A_1 \cap A_2 \cap \ldots \cap A_k)$
h. $P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$

Proof:
1. If $\emptyset$ is the impossible event, then $P(\emptyset) = 0$
Proof: $S \cup \emptyset = S$
Applying probability, we get
$P(S \cup \emptyset) = P(S)$
$P(S) + P(\emptyset) = P(S)$, since S and $\emptyset$ are mutually exclusive events
$\therefore P(\emptyset) = 0$. Hence proved.
2. $P(\bar{A}) = 1 - P(A)$
Proof: $A \cup \bar{A} = S$
Applying probability, we get
$P(A \cup \bar{A}) = P(S)$
$P(A) + P(\bar{A}) = 1$, since A and $\bar{A}$ are mutually exclusive events and P(S) = 1
$\therefore P(\bar{A}) = 1 - P(A)$. Hence proved.
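3. If $A \subset B$, then $P(A) \le P(B)$
Proof: $A \subset B \Rightarrow A \cap B = A$
From the Venn diagram (shaded area = $\bar{A} \cap B$),
$B = A \cup (\text{shaded area}) = A \cup (\bar{A} \cap B)$
Now applying probability, we get
$P(B) = P\{A \cup (\bar{A} \cap B)\}$
$= P(A) + P(\bar{A} \cap B)$, since A and $\bar{A} \cap B$ are mutually exclusive
$\therefore P(A) \le P(B)$, since $P(\bar{A} \cap B) \ge 0$
Hence proved.

4. $P(\bar{A} \cap B) = P(B) - P(A \cap B)$
Proof:
From the Venn diagram (shaded area = $\bar{A} \cap B$), we can write
$B = (\text{shaded area}) \cup (A \cap B)$
$= (\bar{A} \cap B) \cup (A \cap B)$
$P(B) = P\{(\bar{A} \cap B) \cup (A \cap B)\}$
$= P(\bar{A} \cap B) + P(A \cap B)$,
since $(\bar{A} \cap B)$ and $(A \cap B)$ are mutually exclusive
$\therefore P(\bar{A} \cap B) = P(B) - P(A \cap B)$. Hence proved.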

5. Left as an exercise; the proof is similar to that of 4.
6. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$; also,
if A and B are mutually exclusive then $P(A \cup B) = P(A) + P(B)$
Proof:
From the Venn diagram (shaded area = $\bar{A} \cap B$), we can write
$A \cup B = A \cup (\text{shaded area})$
$= A \cup (\bar{A} \cap B)$
$P(A \cup B) = P(A) + P(\bar{A} \cap B)$ ------- (i)
since $(\bar{A} \cap B)$ and A are mutually exclusive
Now considering
$B = (\text{shaded area}) \cup (A \cap B)$
$= (\bar{A} \cap B) \cup (A \cap B)$
$P(B) = P\{(\bar{A} \cap B) \cup (A \cap B)\}$
$= P(\bar{A} \cap B) + P(A \cap B)$, since $(\bar{A} \cap B)$ and $(A \cap B)$ are mutually exclusive
$\therefore P(\bar{A} \cap B) = P(B) - P(A \cap B)$
Substituting in (i): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Hence proved.
a. If A and B are mutually exclusive then $P(A \cap B) = 0$, therefore
$P(A \cup B) = P(A) + P(B)$
7. If A, B and C are any three events in a sample space S, then the probability of at least one of
them occurring is given by:
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$

Proof: L.H.S. = $P(A \cup B \cup C)$. Let $B \cup C = D$.
$P(A \cup B \cup C) = P(A \cup D)$
$= P(A) + P(D) - P(A \cap D)$
$= P(A) + P(B \cup C) - P\{A \cap (B \cup C)\}$, substituting back the value of D
$= P(A) + P(B) + P(C) - P(B \cap C) - P\{(A \cap B) \cup (A \cap C)\}$
$= P(A) + P(B) + P(C) - P(B \cap C) - \{P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)\}$
$= P(A) + P(B) + P(C) - P(B \cap C) - P(A \cap B) - P(A \cap C) + P(A \cap B \cap C)$
Hence proved.
This result may be written as
$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$
$= \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + P(A_1 \cap A_2 \cap A_3)$

In general, the formula for k events is
$P(A_1 \cup A_2 \cup \ldots \cup A_k) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<l} P(A_i \cap A_j \cap A_l) - \ldots + (-1)^{k+1} P(A_1 \cap A_2 \cap \ldots \cap A_k)$

8. $P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$
Proof: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cap B) = P(A) + P(B) - P(A \cup B)$
$= 1 - P(\bar{A}) + 1 - P(\bar{B}) - P(A \cup B)$
$= 1 - P(\bar{A}) - P(\bar{B}) + 1 - P(A \cup B)$
$\therefore P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$, since $1 - P(A \cup B) \ge 0$
Hence proved
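
The laws above can be spot-checked on a small sample space. The following sketch is an illustration only: it assumes one roll of a fair die, and the events A, B, C and the helper P are hypothetical choices made for this check, not part of the original notes. Exact fractions are used so that equalities hold without rounding.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space: one roll of a fair die


def P(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))


A = {2, 4, 6}     # the roll is even
B = {4, 5, 6}     # the roll is greater than 3
C = {1, 2, 3, 4}  # the roll is at most 4

# Law g: inclusion-exclusion for three events.
lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))

# Law b (complement) and law h (lower bound on the intersection).
complement_ok = P(S - A) == 1 - P(A)
bound_ok = P(A & B) >= 1 - P(S - A) - P(S - B)

print(lhs, rhs, complement_ok, bound_ok)  # 1 1 True True
```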

Problem For each of the following, state whether it is always true, always
false, or neither.
(a) P(A) < P(A ∩ B)
(b) P(A) > P(A ∪ B)
(c) P(A ∩ B) ≤ P(A)
(d) P(A|B) = 1 − P(Ā|B), where P(B) > 0 and Ā is the
complement of A (i.e. the set of outcomes that are not in A).
Solution
(a) Always false. Since A ∩ B ⊂ A, we always have P(A) ≥ P(A ∩ B).
More intuitively, it can never be strictly more likely for A and B to
occur than for A to occur.
(b) Always false. Since A ⊂ A ∪ B, we always have P(A) ≤ P(A ∪ B). As
before, it can never be strictly more likely for A to occur than for A or
B to occur.
(c) Always true. We showed this in part (a). In particular, whenever A
and B happen, then A must have happened, so A is at least as probable
as A and B.
(d) Always true. P(A|B) + P(Ā|B) = P(Ω|B) = 1, where Ω denotes the sample
space. Subtracting P(Ā|B) from both sides gives P(A|B) = 1 − P(Ā|B). If I
tell you that B happens, then you can divide this into two disjoint pieces: when B
and A happen and when B happens and A doesn't happen (that is,
when B and Ā happen). And these pieces make up everything, so their
probabilities must sum to 1.

Example: What is the probability of being dealt a bridge hand void in a specified suit from an ordinary
deck of 52 playing cards.

Solution:
The probability of an event is the number of ways it can happen divided by the total number of possible ways.
There are a total of

$^{52}C_{13} = \frac{52!}{13!\,(52-13)!} = 635013559600$ different possible bridge hands.

So we just need to count how many have a void in a specific suit, let's say spades. There are 39 non-spade
cards, so the number of ways to get a hand with 13 of those 39 cards is

$^{39}C_{13} = \frac{39!}{13!\,(39-13)!} = 8122425444$.

So we divide and get approximately

$\frac{8122425444}{635013559600} = 0.01279$, or about a 1.279 percent chance.
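
The two counts and their ratio are easy to verify by machine. This is a minimal sketch using Python's standard math.comb (Python 3.8 or later); the variable names are illustrative assumptions.

```python
import math

total_hands = math.comb(52, 13)  # all 13-card hands from 52 cards
void_hands = math.comb(39, 13)   # hands drawn only from the 39 non-spades

p_void = void_hands / total_hands
print(total_hands)       # 635013559600
print(void_hands)        # 8122425444
print(round(p_void, 5))  # 0.01279
```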

Conditional Probability:
Conditional probability represents the chance that one event will occur given that a second event has already
occurred.
Conditional probability can be defined as the probability of an event A if we assume that another event B
has occurred. We write the conditional probability of A, given B, as P(A/B) and compute it
by the formula
$P(A/B) = \frac{P(A \cap B)}{P(B)}$; provided that $P(B) \ne 0$
Similarly, the conditional probability of B, given A, can be written as
$P(B/A) = \frac{P(A \cap B)}{P(A)}$; provided that $P(A) \ne 0$
Example: A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the
class passed the first test. What percent of those who passed the first test also passed the second test?
Solution: Let A = the students pass the first test &
B = the students pass the second test
Given that
P(A) = 0.42, $P(A \cap B) = 0.25$
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{0.25}{0.42} = 0.595 \approx 0.60 = 60\%$
Example: At National Textile University, the probability that a student takes Technology and Computer
is 0.087. The probability that a student takes Technology is 0.68. What is the probability that a student takes
Computer given that the student is taking Technology?

Solution: Let A = the student takes Technology


B = the student takes Computer
P(B/A) = ?
Given: $P(A \cap B) = 0.087$ and P(A) = 0.68
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{0.087}{0.68} = 0.128 = 12.8\%$
Example: Two coins are tossed. What is the conditional probability that two heads result given that
there is at least one head?

Solution: Sample space when two coins are tossed:


S = {HH, HT, TH, TT}
Let A = Two heads result ={HH}
B = At least one head={HH, HT, TH}
P(A/B) = ?
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = \frac{1}{3}$
Example: A man tosses two fair dice. What is the conditional probability that the sum of the two dice
will be 7 given that
i. The sum is odd
ii. The sum is greater than 6
iii. The two dice had the same outcomes

Solution: Sample space when two dice are tossed (outcomes on the left, the corresponding sums on the right):

(1,1) (2,1) (3,1) (4,1) (5,1) (6,1)        2  3  4  5  6  7
(1,2) (2,2) (3,2) (4,2) (5,2) (6,2)        3  4  5  6  7  8
(1,3) (2,3) (3,3) (4,3) (5,3) (6,3)        4  5  6  7  8  9
(1,4) (2,4) (3,4) (4,4) (5,4) (6,4)        5  6  7  8  9 10
(1,5) (2,5) (3,5) (4,5) (5,5) (6,5)        6  7  8  9 10 11
(1,6) (2,6) (3,6) (4,6) (5,6) (6,6)        7  8  9 10 11 12

A = the sum of the two dice is 7
i. Let B = the sum is odd
P(A/B) = ?
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{6/36}{18/36} = \frac{1}{3}$
ii. Let C = the sum is greater than 6
$P(A/C) = \frac{P(A \cap C)}{P(C)} = \frac{6/36}{21/36} = \frac{2}{7}$
iii. Let D = the two dice had the same outcomes
$P(A/D) = \frac{P(A \cap D)}{P(D)} = \frac{0/36}{6/36} = 0$
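
The three conditional probabilities can be confirmed by listing all 36 outcomes. The sketch below is an illustration under the stated assumption of two fair dice; the helper names P and cond are hypothetical. It applies the definition P(A/B) = P(A and B) / P(B) directly.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two fair dice.
S = list(product(range(1, 7), repeat=2))


def P(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in S if event(o)), len(S))


def cond(a, b):
    """Conditional probability P(a / b) = P(a and b) / P(b)."""
    return P(lambda o: a(o) and b(o)) / P(b)


def sum_is_7(o):
    return o[0] + o[1] == 7


p_given_odd = cond(sum_is_7, lambda o: (o[0] + o[1]) % 2 == 1)
p_given_gt6 = cond(sum_is_7, lambda o: o[0] + o[1] > 6)
p_given_same = cond(sum_is_7, lambda o: o[0] == o[1])

print(p_given_odd, p_given_gt6, p_given_same)  # 1/3 2/7 0
```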
Example: What is the probability that a randomly selected poker hand contains exactly 3 aces given
that it contains at least 2 aces

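Solution: Total cards = 52
Cards in poker hand = 5
Total possible poker hands = $^{52}C_5 = \frac{52!}{5!\,47!} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2598960$
Let A = exactly 3 aces
B = at least 2 aces
$A \cap B$ = exactly 3 aces
P(A/B) = ?

Aces  Others  Total
4     48      52

$n(A) = {}^{4}C_3 \cdot {}^{48}C_2 = 4512$
$n(B) = {}^{4}C_2 \cdot {}^{48}C_3 + {}^{4}C_3 \cdot {}^{48}C_2 + {}^{4}C_4 \cdot {}^{48}C_1 = 108336$
$n(A \cap B) = {}^{4}C_3 \cdot {}^{48}C_2 = 4512$
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)} = \frac{4512}{108336} = 0.0416$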
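
The counts in this solution can be reproduced with Python's standard math.comb. This is a sketch for checking the arithmetic only; the variable names n_A and n_B simply mirror n(A) and n(B) above.

```python
import math
from fractions import Fraction

# Hands with exactly 3 aces: choose 3 of the 4 aces and 2 of the 48 other cards.
n_A = math.comb(4, 3) * math.comb(48, 2)

# Hands with at least 2 aces: exactly 2, 3 or 4 aces, the rest non-aces.
n_B = sum(math.comb(4, k) * math.comb(48, 5 - k) for k in range(2, 5))

# A is contained in B, so n(A and B) = n(A) and P(A/B) = n(A) / n(B).
p = Fraction(n_A, n_B)
print(n_A, n_B, round(float(p), 4))  # 4512 108336 0.0416
```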
Law of Multiplication:
The relation $P(A/B) = \frac{P(A \cap B)}{P(B)}$ is also useful for calculating $P(A \cap B)$, or simply P(AB):
$P(A/B) = \frac{P(A \cap B)}{P(B)} \Rightarrow P(A \cap B) = P(B) \cdot P(A/B)$
Also $P(B/A) = \frac{P(A \cap B)}{P(A)} \Rightarrow P(A \cap B) = P(A) \cdot P(B/A)$
Example: Suppose that five good fuses and two defective ones have been mixed up. To find the defective
fuses, we test them one by one, at random, and without replacement. What is the probability that we are
lucky and find both of the defective fuses in the first two tests?

Solution: Let D1 and D2 be the events of finding a defective fuse in the first and second tests, respectively.
We are interested in $P(D_1 \cap D_2)$.

Defectives  Good  Total
2           5     7

$P(D_1 \cap D_2) = P(D_1) \cdot P(D_2/D_1) = \frac{2}{7} \times \frac{1}{6} = \frac{1}{21}$
This can be generalized:
For three events A, B, C
$P(A \cap B \cap C) = P(A) \cdot P(B/A) \cdot P(C/A \cap B)$

Example: Suppose that five good and two defective fuses have been mixed up. To find the defective ones,
we test them one by one at random, and without replacement. What is the probability that we find both of
the defective fuses in exactly three tests?

Solution: Let D1, D2 and D3 be the events that the first, second and third fuses tested are defective,
respectively. Let G1, G2 and G3 be the events that the first, second and third fuses tested are good,
respectively. We are interested in the probability of the event $(G_1 \cap D_2 \cap D_3) \cup (D_1 \cap G_2 \cap D_3)$.

Defectives  Good  Total
2           5     7

$P\{(G_1 \cap D_2 \cap D_3) \cup (D_1 \cap G_2 \cap D_3)\} = P(G_1 \cap D_2 \cap D_3) + P(D_1 \cap G_2 \cap D_3)$
$= P(G_1) \cdot P(D_2/G_1) \cdot P(D_3/G_1 \cap D_2) + P(D_1) \cdot P(G_2/D_1) \cdot P(D_3/D_1 \cap G_2)$
$= \frac{5}{7} \times \frac{2}{6} \times \frac{1}{5} + \frac{2}{7} \times \frac{5}{6} \times \frac{1}{5} = \frac{2}{21} \approx 0.095$
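
Both fuse examples can be cross-checked by brute force instead of the multiplication law. The sketch below is an illustration, not the method used in the notes: it assumes all 7! testing orders are equally likely and counts the favourable ones; the names fuses, orders and prob are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

fuses = ["D", "D", "G", "G", "G", "G", "G"]  # 2 defective, 5 good
orders = list(permutations(range(7)))        # 5040 equally likely testing orders


def prob(event):
    """Fraction of testing orders whose D/G sequence satisfies the event."""
    hits = sum(1 for order in orders if event([fuses[i] for i in order]))
    return Fraction(hits, len(orders))


# Both defective fuses found in the first two tests.
both_in_first_two = prob(lambda seq: seq[:2] == ["D", "D"])

# Second defective fuse found on exactly the third test:
# the third is defective and exactly one of the first two is defective.
exactly_three_tests = prob(lambda seq: seq[2] == "D" and seq[:2].count("D") == 1)

print(both_in_first_two, exactly_three_tests)  # 1/21 2/21
```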
Exercise:
Q.1 A box contains 15 items, 4 of which are defective and 11 are good. Two items are selected. What is the
probability that the first is good and the second defective? Ans. 0.16

Q.2 Two cards are dealt from a pack of ordinary playing cards. Find the probability that the second card
dealt is a heart. Ans. 0.25

Q.3 Box A contains 5 green and 7 red balls. Box B contains 3 green, 3 red and 6 yellow balls. A box is
selected at random and a ball is drawn from it. What is the probability that the ball drawn is green? Ans. 1/3

Q.4 An urn contains 10 white and 3 black balls. Another urn contains 3 white and 5 black balls. Two balls
are transferred from first urn and placed in the second and then one ball is taken from the latter. What is the
probability that it is a white ball

Q.5 Three Urns of the same appearance are given as follows


Urn Red White Total
A 5 7 12
B 4 3 7
C 3 4 7
An urn is selected at random and a ball is drawn from the urn
i) What is the probability that the ball drawn is red
ii) If the ball drawn is red, what is the probability that it came from urn A
Ans. 0.4722, 0.294

Q.6 In throwing two fair dice, what is the probability of sum 5 if they land on different numbers?

Q.7 If eight defective and 12 nondefective items are inspected one by one, at random, and without
replacement, what is the probability that (a) the first four items inspected are defective; (b) from the first
three items at least two are defective? Ans. 0.0144, 0.344

Independent Events:
Two events A and B in the same sample space S are said to be independent if the occurrence of one event
does not affect the occurrence or non-occurrence of the other, that is, P(A/B) = P(A) and P(B/A) = P(B). It
then follows that
two events A and B are independent if and only if $P(A \cap B) = P(A) \cdot P(B)$
or
independent events are two events in which the outcome of the second is not affected by the outcome of the first.

Some other examples of independent events are:

• Landing on heads after tossing a coin AND rolling a 5 on a single 6-sided die.
• Choosing a marble from a jar AND landing on heads after tossing a coin.
• Choosing a 3 from a deck of cards, replacing it, AND then choosing an ace as the
second card.
• Rolling a 4 on a single 6-sided die, AND then rolling a 1 on a second roll of the die.

Important Note:
1. Two mutually exclusive events A and B are independent if and only if P(A).P(B) = 0, which is true
only if either P(A) = 0 or P(B) = 0.
2. If both events A and B have nonzero probabilities and are independent then they can never be
mutually exclusive.
3. Two events that are mutually exclusive and have nonzero probabilities are also dependent.
For m.e. events A and B: $P(A \cap B) = 0 \Rightarrow P(A \cap B) \ne P(A) \cdot P(B)$, since $P(A) \cdot P(B) > 0$
$\Rightarrow$ A & B are dependent
4. Two events that are not mutually exclusive may be either independent or dependent.
For events A and B that are not m.e.: $P(A \cap B) \ne 0 \Rightarrow$ either $P(A \cap B) = P(A) \cdot P(B)$ or $P(A \cap B) \ne P(A) \cdot P(B)$
$\Rightarrow$ A & B are either independent or dependent.

Example: A drawer contains 3 red paperclips, 4 green paperclips, and 5 blue paperclips. One paperclip is
taken from the drawer and then replaced. Another paperclip is taken from the drawer. What is the
probability that the first paperclip is red and the second paperclip is blue?

Solution: Because the first paper clip is replaced, the sample space of 12 paperclips does not change from
the first event to the second event. The events are independent.
P(red then blue) = P(red) x P(blue) = 3/12 • 5/12 = 15/144 = 5/48.

Example: A dresser drawer contains one pair of socks of each of the following colors: blue, brown, red,
white and black. Each pair is folded together in matching pairs. You reach into the sock drawer and choose a
pair of socks without looking. The first pair you pull out is red -the wrong color. You replace this pair and
choose another pair. What is the probability that you will choose the red pair of socks twice?
Solution: $P(\text{red}) = \frac{1}{5}$
$P(\text{red and red}) = P(\text{red}) \cdot P(\text{red}) = \frac{1}{5} \times \frac{1}{5} = \frac{1}{25}$
Example: A card is chosen at random from a deck of 52 cards. It is then replaced and a second card is
chosen. What is the probability of choosing a jack and an eight?

Solution: $P(\text{jack}) = \frac{4}{52}$
$P(8) = \frac{4}{52}$
$P(\text{jack and 8}) = P(\text{jack}) \cdot P(8) = \frac{4}{52} \times \frac{4}{52} = \frac{1}{169}$

Theorem: If A and B are two independent events in a sample space, then show that
i. A and $\bar{B}$ are independent
ii. $\bar{A}$ and B are independent
iii. $\bar{A}$ and $\bar{B}$ are independent
Solution:
Since A & B are independent, $P(A \cap B) = P(A) \cdot P(B)$
i. The events $A \cap B$ and $A \cap \bar{B}$ are m.e. and their union is A, i.e. $A = (A \cap B) \cup (A \cap \bar{B})$.
Therefore $P(A) = P(A \cap B) + P(A \cap \bar{B})$.
$P(A \cap \bar{B}) = P(A) - P(A \cap B)$
$= P(A) - P(A) \cdot P(B)$ [since A & B are independent]
$= P(A)[1 - P(B)]$
$= P(A) P(\bar{B})$
Hence A and $\bar{B}$ are independent
ii. Similarly $P(B) = P(A \cap B) + P(\bar{A} \cap B)$.
$P(\bar{A} \cap B) = P(B) - P(A \cap B)$
$= P(B) - P(A) \cdot P(B)$ [since A & B are independent]
$= P(B)[1 - P(A)]$
$= P(B) P(\bar{A})$
$= P(\bar{A}) P(B)$
Hence $\bar{A}$ and B are independent
iii. Using De Morgan's law, $\bar{A} \cap \bar{B} = \overline{A \cup B}$
$P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$
$= 1 - P(A \cup B)$
$= 1 - [P(A) + P(B) - P(A \cap B)]$
$= 1 - [P(A) + P(B) - P(A) \cdot P(B)]$
$= \{1 - P(A)\} - P(B) + P(A) \cdot P(B)$
$= P(\bar{A}) - P(B)\{1 - P(A)\}$
$= P(\bar{A}) - P(B) P(\bar{A})$
$= P(\bar{A})[1 - P(B)]$
$= P(\bar{A}) P(\bar{B})$
Hence $\bar{A}$ and $\bar{B}$ are independent
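
The theorem can be illustrated numerically. The sketch below is an assumption-laden example rather than a proof: it takes the experiment "toss a fair coin and roll a fair die", with A = heads and B = rolling a 5, and checks the product rule for A and B and for their complements. The helper names P and independent are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: one coin toss together with one roll of a fair die.
S = set(product("HT", range(1, 7)))


def P(event):
    return Fraction(len(event), len(S))


def independent(X, Y):
    """Product rule: P(X and Y) == P(X) * P(Y)."""
    return P(X & Y) == P(X) * P(Y)


A = {o for o in S if o[0] == "H"}  # the coin lands heads
B = {o for o in S if o[1] == 5}    # the die shows 5
not_A, not_B = S - A, S - B

checks = [independent(A, B), independent(A, not_B),
          independent(not_A, B), independent(not_A, not_B)]
print(checks)  # [True, True, True, True]
```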

Exercise:
Q.1 Two cards are drawn from a well-shuffled ordinary deck of 52 cards. Find the probability that they
are both aces if the first card is (i) replaced, (ii) not replaced.
Ans. 1/169, 1/221
Q.2 A pair of fair dice is thrown twice. What is the probability of getting totals of 5 and 11?
Ans. 1/81
Q.3 The probability that a man will be alive in 25 years is 3/5, and the probability that his wife will be
alive in 25 years is 2/3. Find the probability that (i) both will be alive, (ii) only the man will be alive, (iii)
only the wife will be alive, (iv) at least one will be alive and (v) neither will be alive in 25 years.
Ans. 2/5, 1/5, 4/15, 13/15, 2/15

Q.4 Three missiles are fired at a target. If the probabilities of hitting the target are 0.4, 0.5 and 0.6,
respectively, and if missiles are fired independently, what is the probability
i. That all the missiles hit the target?
ii. That at least one of the three hits the target?
iii. That exactly one hits the target?
iv. That exactly 2 hit the target? Ans. 0.12, 0.88, 0.38, 0.38

Q.5 A committee of three, A, B and C, is to make a decision on the basis of majority vote. What is the
probability of a wrong decision by the committee if the probabilities of a wrong decision by each
member are 0.05, 0.04, and 0.10 respectively?

Multiple Choice Questions:


Probability - General

1. The probability that the Red River will flood in any given year has been estimated from 200 years of
historical data to be one in four. This means:
a. The Red River will flood every four years.
b. In the next 100 years, the Red River will flood exactly 25 times.
c. In the last 100 years, the Red River flooded exactly 25 times.
d. In the next 100 years, the Red River will flood about 25 times.
e. In the next 100 years, it is very likely that the Red River will flood exactly 25 times.
2. The chances that you will be ticketed for illegal parking on campus are about 1/3. During the last nine
days, you have illegally parked every day and have NOT been ticketed (you lucky person)! Today, on
the 10th day, you again decide to park illegally. The chances that you will be caught are:
a. greater than 1/3 since you were not caught in the last nine days.
b. less than 1/3 since you were not caught in the last nine days.
c. still equal to 1/3 since the last nine days do not affect the probability.
d. equal to 1/10 since you were not caught in the last nine days.
e. equal to 9/10 since you were not caught in the last nine days.
3. The chance that a person will contract AIDS after a sexual contact with an infected partner has been
estimated to be 1/4. This means:
a. A person will be infected after exactly 4 sexual contacts with infected partners.
b. Of 1000 people having sexual contacts with infected partners, exactly 250 will become
infected.
c. Of 200 people having sexual contacts with infected partners, about 50 will become infected.
d. In exactly 25% of all sexual contacts with infected partners, the infection will spread.
e. Of 20 people having sexual contacts with infected partners, it is very likely that exactly 5
people will become infected.
4. A random variable Y has the following distribution:

Y    | -1   0    1    2
P(Y) | 3C   2C   0.4  0.1

The value of the constant C is:

a. 0.10
b. 0.15
c. 0.20
d. 0.25
e. 0.75
5. A random variable X has a probability distribution as follows:

x      | 0    1    2    3
P(X=x) | 2k   3k   13k  2k

Then Pr(X < 2.0) is equal to

a. .90
b. .25
c. .65
d. .15
e. 1.00
6. Suppose that the allele for tallness (T) is dominant over shortness (t); that for Yellow (Y) is dominant
over green (y); and that for roundness (W) is dominant over wrinkled(w). Suppose we cross two plants
with genotypes TTYyWw and TtYyWw. The probability of a Tall, Yellow, Round plant is:
a. 9/16
b. 3/32
c. 1/16
d. 9/32
e. 3/16

Answers: 1. d 2.c 3.c 4.a 5.b 6.a

PRACTICE PROBLEMS--PERMUTATIONS AND COMBINATIONS


1. Permutations/Combinations--Executive Decision

1. A company president is deciding how to fill three vice-presidencies in the company: VP-Marketing,
VP-Finance, and VP-Production. Twelve executives are eligible and qualified for promotion, and
each could fill any of the three positions. In how many ways can the positions be filled?
See if you can do the problem two (slightly) different ways, which might actually represent two
different methods of thinking that the decision-maker might use.

2. Imaginary Lottery Game

The Lottery Commission is considering a new game in which five balls would be withdrawn
from a box containing 10 balls, numbered 0 to 9. The five balls would come out of the box at nearly
the same time, as they do in the current Lotto game, in which six balls come out of a box into a tube
at nearly the same time.

In this new game, however, the winning ticket must have the five lucky numbers in the same
order as they came out of the box.

What is the chance of winning with a single five-number ticket?

3. Permutations/Combinations--Dinner Party

a. "I forgot to buy vegetables for our dinner party tonight. Will you go back
to the store and get three bags of frozen vegetables?" If the store has 10 bags each of 20 different frozen
vegetables, in how many ways, with respect to kinds of vegetables, can the errand be performed?

b. "I forgot to buy vegetables for our dinner party tonight. Will you go back
to the store and get three bags of different frozen vegetables?" If the store has 10 bags each of 20
different frozen vegetables, in how many ways, with respect to kinds of vegetables, can the errand be
performed?

c. In setting the dinner table, including place cards, for eight people, how
many seating arrangements are possible?

d. What if it is a round table and it really does not matter exactly where
people sit, but it does matter who is sitting next to whom? How many seating arrangements are possible
with 8 people?

4. Card Game

a. If you are dealt a hand of five cards from a standard deck of 52 cards, what
is the probability that your hand contains four aces? (This can be done using a permutation/combination
computation, but there is another way.)

b. What is the probability that your hand contains a straight flush--a sequence
of five consecutive cards, all of the same suit? (A-K-Q-J-10 is the highest sequence in each suit, and 6-
5-4-3-2 is the lowest.)
SOLUTIONS

1. Permutations/Combinations--Executive Decision

Duplicates are not possible.

Method One: The decision-maker first selects three people from among the twelve, not yet
thinking about their job assignments (order not important). This can be done C(12,3) = 220 ways.
Then the decision-maker assigns the three chosen people to the three jobs (order important). This
can be done P(3,3) = 6 ways. So the total number of ways is 220 x 6 = 1,320.

Method Two: The decision maker selects a person and immediately assigns him/her to one of the three
positions (order important). This is repeated two more times. The number of ways is P(12,3) = 1,320.

Method Three--formula-free sequential method: 12 people can be selected for the first position, 11 for
the second, and 10 for the third. 12 x 11 x 10 = 1,320.

2. Imaginary Lottery Game

Order is important, duplicates are not possible. P(10,5) = 30,240.

Formula-free sequential method: the first number has 10 possibilities, the second number has 9, the third
number has 8, the fourth number has 7, and the fifth number has 6. 10 x 9 x 8 x 7 x 6 = 30,240.

So the probability of winning is 1 / 30,240 = 0.000033069.

3. Permutations/Combinations--Dinner Party

a. Order is not important, duplicates are possible. C-bar (20,3) = 1,540.

Note: As long as there are three or more bags of each vegetable, the actual number of bags of each
vegetable is not relevant. The question asked how many ways "with respect to kinds of vegetables"
can the errand be performed. Therefore "n" is the number of kinds of vegetables, not the total
number of bags.

b. Order is not important, duplicates are not possible. C(20,3) = 1,140.

c. Order is important, duplicates are not possible. P(8,8) = 40,320.

Formula-free: going around the table, there are 8 choices for the first seat, 7 choices for the second
seat, etc. 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 40,320.

d. The first person can sit anywhere. There are then 7 choices for the person
to the first person's right, 6 choices for the person to that person's right, etc. 7 x 6 x 5 x 4 x 3 x 2 x 1 =
5,040. This is also equal to P(7,7).
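
The counts in solutions 1 to 3 can be reproduced with Python's standard math module (math.perm and math.comb, available from Python 3.8). This is a sketch for checking the arithmetic; the variable names are illustrative, and the "C-bar" count (combinations with repetition) is computed with the standard identity C-bar(n, r) = C(n + r - 1, r).

```python
import math

# 1. Executive decision: choose 3 of 12, then assign the 3 jobs.
method_one = math.comb(12, 3) * math.perm(3, 3)
method_two = math.perm(12, 3)

# 2. Lottery: ordered draw of 5 balls from 10 without replacement.
lottery_orders = math.perm(10, 5)
p_win = 1 / lottery_orders

# 3a. Three bags from 20 kinds, repeats allowed: C-bar(20, 3) = C(22, 3).
veg_with_repeats = math.comb(20 + 3 - 1, 3)
# 3b. Three bags of different kinds.
veg_distinct = math.comb(20, 3)
# 3c. Eight people in eight labelled seats; 3d. round table, rotations ignored.
seats_in_a_row = math.perm(8, 8)
round_table = math.factorial(7)

print(method_one, method_two, lottery_orders)                     # 1320 1320 30240
print(veg_with_repeats, veg_distinct, seats_in_a_row, round_table)  # 1540 1140 40320 5040
```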
4. Card Game

a. Method One: There are C(52,5) = 2,598,960 possible hands. 48 of these contain 4 aces (4 aces and
the king of spades, 4 aces and the queen of spades, 4 aces and the jack of spades, etc.).

48 / 2,598,960 = 0.000018469, or one chance in 54,145.

Or, the "sequential approach." As you pick up your cards, what is the probability of picking up four
aces, followed by a non-ace? By the multiplicative rule, this is:

4/52 x 3/51 x 2/50 x 1/49 x 48/48 = 0.0000036937852.

But you do not have to get the four aces first and the non-ace last. You could get the non-ace fourth:

4/52 x 3/51 x 2/50 x 48/49 x 1/48 = 0.0000036937852.

Or you could get the non-ace third:

4/52 x 3/51 x 48/50 x 2/49 x 1/48 = 0.0000036937852.

Or you could get the non-ace second:

4/52 x 48/51 x 3/50 x 2/49 x 1/48 = 0.0000036937852.

Or you could get the non-ace first, followed by the four aces:

48/52 x 4/51 x 3/50 x 2/49 x 1/48 = 0.0000036937852.

By the addition rule, the union of all of the "or's" above would be the sum of the five individual
probabilities, which is 0.000018469, in exact agreement with the first method.

b. There are C(52,5) = 2,598,960 possible hands. Among these, how many are "straight flushes?"
There are nine straight flushes in each suit: A-K-Q-J-10, K-Q-J-10-9, Q-J-10-9-8, J-10-9-8-7,
10-9-8-7-6, 9-8-7-6-5, 8-7-6-5-4, 7-6-5-4-3, and 6-5-4-3-2. There are four suits, so there are 36 possible
straight flushes.

36 / 2,598,960 = 0.000013852 or one chance in 72,193.

Note that this is rarer than four-of-a-kind, even four aces. So, in poker, any straight flush will beat
any four-of-a-kind.
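
The agreement between the counting method and the "sequential approach" in solution 4(a) can be checked with exact fractions. In this sketch the function name ordering_probability is hypothetical; it multiplies the five draw-by-draw probabilities for a given position of the non-ace, and the straight-flush figure uses the 9-per-suit count stated in the problem.

```python
import math
from fractions import Fraction


def ordering_probability(non_ace_position):
    """Probability of drawing four aces with the non-ace at the given draw (0-4)."""
    aces_left, others_left, cards_left = 4, 48, 52
    p = Fraction(1)
    for draw in range(5):
        if draw == non_ace_position:
            p *= Fraction(others_left, cards_left)
            others_left -= 1
        else:
            p *= Fraction(aces_left, cards_left)
            aces_left -= 1
        cards_left -= 1
    return p


# Addition rule: the five orderings are mutually exclusive.
sequential = sum(ordering_probability(k) for k in range(5))
counting = Fraction(48, math.comb(52, 5))

# Straight flushes as defined in the problem: 9 sequences in each of 4 suits.
straight_flush = Fraction(9 * 4, math.comb(52, 5))

print(sequential, counting)              # 1/54145 1/54145
print(round(float(straight_flush), 9))   # 1.3852e-05
```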

Exercise –Mutually and not Mutually Exclusive Events


Q.1 In a group of 20 adults, 4 out of the 7 women and 2 out of the 13 men wear glasses. What is
the probability that a person chosen at random from the group is a woman or someone who
wears glasses? Ans. 9/20
Q.2 One integer is chosen at random from the numbers 1, 2, 3, …, 50. What is the
probability that the chosen number is divisible by 6 or by 8? Ans. 6/25
Q.3 Let A and B be events with $P(A \cup B) = \frac{3}{4}$, $P(\bar{A}) = \frac{2}{3}$, $P(A \cap B) = \frac{1}{4}$.
Then find i) P(A), P(B), (

Textile Related Examples:


1. The following are the lengths (in cm) of a sample of six garment blanks chosen at random from a
large batch of similar blanks:

54.5 53.0 55.7 51.8 54.2 52.4


Find the mean and variance
2. The data shown in the table were collected by counting the number of end-breaks per hour on one
side of a certain spinning frame during 100 consecutive hours. As they appear in the table

No. of end-breaks per hour on one side of a certain spinning frame


0 0 3 2 1 1 1 4 1 0
1 2 2 1 1 0 0 1 0 0
1 1 0 0 0 1 1 1 1 1
2 4 1 0 1 0 1 1 1 1
1 1 0 0 1 1 3 1 0 0
0 1 5 1 2 1 1 0 2 0
0 0 0 3 1 2 0 2 1 0
1 1 0 1 1 0 1 3 1 0
2 0 1 2 0 0 1 0 1 2
1 2 2 0 0 1 2 1 0 1
Find frequency distribution, bar-chart, frequency polygon for the data

3. The following data are the results of an extensive series of standard linear density tests carried out on
a large delivery of worsted yarn.

Results of linear-density tests (tex) on a large consignment of worsted yarn

31.3 31.3 31.5 31.3 31.3 32.0 31.9 31.8 33.1 30.6
30.2 31.2 29.6 32.7 32.7 31.8 30.2 31.8 30.5 30.5
31.4 30.6 31.4 31.5 30.1 30.3 31.2 30.7 30.9 31.9
30.9 30.1 32.4 32.8 31.6 31.8 31.7 29.5 30.7 31.6
30.6 31.4 31.0 31.0 30.5 30.5 31.0 29.1 30.2 31.1
29.8 30.6 32.2 30.4 32.1 31.7 31.5 31.7 31.4 30.4
31.5 30.4 31.3 31.9 31.1 31.9 32.0 31.6 30.3 32.1
31.0 31.4 33.1 30.6 31.2 32.2 32.6 31.9 32.2 31.3
30.7 30.9 30.7 32.3 32.7 31.3 32.5 31.3 31.3 31.5
31.9 31.0 31.0 32.3 31.5 29.8 32.4 31.7 31.6 32.0
30.6 30.8 31.1 32.1 29.9 31.6 30.6 30.6 31.1 30.0
32.4 31.1 29.7 31.2 30.6 31.5 31.0 31.1 31.2 31.6
31.1 30.8 30.9 31.6 30.6 30.4 30.9 29.7 30.2 30.1
30.3 29.4 30.0 30.0 32.8 31.9 30.7 31.7 31.8 31.5
31.0 30.8 32.1 30.8 31.1 32.5 31.7 30.5 30.5 30.5
31.1 31.2 31.4 29.5 31.5 31.2 31.4 30.1 32.2 30.5
31.2 30.9 30.6 31.2 30.3 30.6 31.8 31.4 30.6 31.3
30.9 31.2 30.2 29.6 31.2 29.9 30.5 31.1 30.8 31.8
31.4 29.3 31.2 31.1 31.1 31.0 31.0 30.7 31.3 30.7
31.0 30.2

4. The following are the results of counting the number of warp breakages during the weaving of 92
standard lengths of a certain kind of cloth.
Construct a frequency distribution for these data, calculate the relative frequencies, and draw a
frequency polygon.

No. of warp breakages


2 3 0 1 0 1 2 1 1 2
1 0 1 2 3 4 3 4 2 3
4 1 3 4 1 2 0 3 0 1
3 1 2 2 5 0 3 2 3 1
1 2 3 0 2 1 1 3 4 2
0 2 1 3 1 2 2 1 3 3
2 4 0 1 2 1 3 0 5 2
1 3 2 3 0 3 5 4 1 0
2 1 3 2 1 0 1 0 4 1
0 1
5. The data below are the results of breaking strength tests (in gf) on a yarn

Breaking strength (in gf)


491 507 501 512 501 508 508 513 511 496 505
507 491 503 503 499 499 501 493 490 498 511
514 493 494 507 499 499 501 491 499 513 485
488 496 512 508 508 508 523 496 523 502 499
498 504 510 500 501 501 498 513 505 508 514
493 501 498 509 524 524 497 506 517 490 516
506 480 505 482 498 498 505 513 502 483 504
526 501 513 519 490 490 497 508 491 499 506
511 501 487 512 511 511 500 478 510 502 490
512 516 512 497 506 506 509 493 516 475 503
Construct a frequency table and draw a histogram.

6. Rayon yarn is wound on metal spools that are made to a specified mass of 226 g with a tolerance range of
±3 g. A random sample of 100 spools was found to have the following masses (g):

206 210 231 235 225 225 223 210 212 218
227 211 208 230 228 223 230 228 208 226
209 228 210 208 206 210 227 215 213 210
218 208 226 227 207 207 226 226 232 226
227 225 228 227 209 225 234 209 223 210
233 217 227 210 228 210 225 229 210 231
228 226 208 224 216 210 217 227 226 219
207 208 225 212 210 224 208 209 223 230
232 230 209 220 223 206 206 226 209 222
209 227 211 218 227 207 209 226 229 225
Construct a frequency distribution with a class interval of 3 g and draw the corresponding histogram.
Comment on the performance of the spool-making machines and on the size of the tolerance range.
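The grouping step can be sketched as follows. The exercise fixes only the class interval (3 g), so the starting point of the first class, 205 g, is an assumption made here for illustration; a different starting point would shift every class. Masses are whole grams, so each class covers three consecutive values (205-207, 208-210, and so on).

```python
from collections import Counter

# Spool masses (g), copied row by row from the table (100 spools).
masses_g = [
    206, 210, 231, 235, 225, 225, 223, 210, 212, 218,
    227, 211, 208, 230, 228, 223, 230, 228, 208, 226,
    209, 228, 210, 208, 206, 210, 227, 215, 213, 210,
    218, 208, 226, 227, 207, 207, 226, 226, 232, 226,
    227, 225, 228, 227, 209, 225, 234, 209, 223, 210,
    233, 217, 227, 210, 228, 210, 225, 229, 210, 231,
    228, 226, 208, 224, 216, 210, 217, 227, 226, 219,
    207, 208, 225, 212, 210, 224, 208, 209, 223, 230,
    232, 230, 209, 220, 223, 206, 206, 226, 209, 222,
    209, 227, 211, 218, 227, 207, 209, 226, 229, 225,
]

CLASS_WIDTH = 3    # class interval required by the exercise
FIRST_LOWER = 205  # assumed lower limit of the first class (not given in the text)

def class_lower(mass):
    """Return the lower limit of the 3 g class that contains this mass."""
    return FIRST_LOWER + ((mass - FIRST_LOWER) // CLASS_WIDTH) * CLASS_WIDTH

counts = Counter(class_lower(m) for m in masses_g)

print("class (g)  freq")
for lower in range(FIRST_LOWER, max(masses_g) + 1, CLASS_WIDTH):
    upper = lower + CLASS_WIDTH - 1
    f = counts.get(lower, 0)
    print(f"{lower}-{upper}    {f:>3}  {'#' * f}")
```

Because all classes have the same width, bar heights equal to the class frequencies give a valid histogram.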
7. Yarn is wound on large spools, which are run at high speeds. At times the spools run erratically, and
when this occurs the operation is stopped and the spools are doffed short of their intended load. The
following data refer to a sample of 147 spools, in which the frequency of doffing is given for the
percentage of the intended load at doffing, arranged in classes as shown
Class (%)    0-5   5-15   15-25   25-35   35-45   45-55   55-65   65-75   75-85   85-95   95-100
Frequency     45      4      10       5       2      11      10       6       2       1       51

Construct the appropriate histogram and comment on the data.
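The classes here are not all the same width (the first and last span 5 percentage points, the rest span 10), so a histogram drawn with heights equal to the raw frequencies would misrepresent the end classes. A common remedy is to plot frequency density, that is frequency divided by class width, so that the area of each bar is proportional to its frequency. The sketch below computes those heights; the variable names are illustrative.

```python
# (lower %, upper %, frequency) for each class in the table.
classes = [
    (0, 5, 45), (5, 15, 4), (15, 25, 10), (25, 35, 5), (35, 45, 2),
    (45, 55, 11), (55, 65, 10), (65, 75, 6), (75, 85, 2), (85, 95, 1),
    (95, 100, 51),
]

total = sum(f for _, _, f in classes)

# Bar height = frequency per percentage point of class width.
densities = [(lo, hi, f, f / (hi - lo)) for lo, hi, f in classes]

print(f"total spools = {total}")
print("class (%)  width  freq  density")
for lo, hi, f, d in densities:
    print(f"{lo:>3}-{hi:<3}    {hi - lo:>5}  {f:>4}  {d:>7.2f}")
```

On this scale the two end classes (0-5 and 95-100) have heights of 9.0 and 10.2 per percentage point, far above any of the middle classes.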



Permutation and Combination


Exercise
1. How many different arrangements of 9 letters can be formed from the letters in the word
"SEVENTEEN"?
Answer:
The word has 9 letters, with E repeated 4 times and N repeated 2 times.
(number of letters)! / (product of the factorials of the repeat counts) = 9! / (4! 2!) = 7560
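The same count can be reproduced for any word with a short helper. This is a sketch only: the function name `distinct_arrangements` is hypothetical, and it applies the rule n! divided by the factorial of each letter's repeat count.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """Number of distinct orderings of all the letters in `word`."""
    total = factorial(len(word))
    # Divide out the orderings that only swap identical letters.
    for repeat_count in Counter(word).values():
        total //= factorial(repeat_count)
    return total

print(distinct_arrangements("SEVENTEEN"))  # 9! / (4! 2!) = 7560
```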

2. Joslyn's dance group consists of 8 girls and 5 boys. 4 people are to be chosen at random to perform at
the next recital. If the group of performers must include at least 2 girls, in how many ways can the
performers be chosen?
Answer:

Girls   Boys   Total
8       5      13

To be chosen = 4
Number of ways to choose a group of performers that includes at least 2 girls
(2 girls and 2 boys, or 3 girls and 1 boy, or 4 girls and 0 boys):
= C(8,2)C(5,2) + C(8,3)C(5,1) + C(8,4)C(5,0)
= (28)(10) + (56)(5) + (70)(1)
= 280 + 280 + 70
= 630
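As a cross-check on the total, the sketch below counts the groups two ways with `math.comb`, using the 8 girls and 5 boys stated in the exercise: directly (2, 3, or 4 girls) and by subtracting the unwanted cases (0 or 1 girl) from all possible groups of 4. The constant names are illustrative.

```python
from math import comb

GIRLS, BOYS, CHOSEN = 8, 5, 4

# Direct count: groups with exactly 2, 3, or 4 girls.
direct = sum(comb(GIRLS, g) * comb(BOYS, CHOSEN - g) for g in range(2, CHOSEN + 1))

# Complement: all groups of 4 minus those with 0 or 1 girl.
all_groups = comb(GIRLS + BOYS, CHOSEN)
too_few_girls = sum(comb(GIRLS, g) * comb(BOYS, CHOSEN - g) for g in range(0, 2))
by_complement = all_groups - too_few_girls

print(direct, by_complement)  # both 630
```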

3. There are 12 people on Kari's soccer team. Individual pictures are taken and 8 pictures are selected to be hung
in a row. How many different arrangements of pictures are possible if Kari's picture must be among those
hung?

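Answer:
Kari's picture must be one of the 8, so choose the other 7 pictures from the remaining 11 players and then
arrange all 8 pictures in a row:
C(11,7) x 8! = 330 x 40320 = 13,305,600
So there are 13,305,600 different arrangements.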

4. Six strangers arrive at a business seminar and each person shakes hands with every other person. How
many handshakes are there?
Answer:
Each of the 6 people shakes hands with 5 others, which gives a count of 6 x 5 = 30.
However, this counts every handshake twice: A shaking hands with B is the same as B with A. Therefore we
divide the total by two and get the answer:
15 handshakes
Another way is 5 + 4 + 3 + 2 + 1 = 15.
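A brute-force check of the handshake count, sketched with `itertools.combinations`: every handshake is an unordered pair of people, so listing all pairs and counting them should agree with C(6,2). Numbering the people 0 to 5 is an illustrative choice.

```python
from itertools import combinations
from math import comb

people = range(6)  # six strangers, labelled 0..5 for illustration

# Each handshake is an unordered pair, so (A, B) and (B, A) appear only once.
handshakes = list(combinations(people, 2))

print(len(handshakes))  # 15
print(comb(6, 2))       # 15, the same count from the formula
```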

5. How many different 4-digit PINs can be created using the digits 2, 3, 0, 6, 7, and 8 with repetition? I know
that with repetition it would be 6 x 6 x 6 x 6 = 1296.
However, for a PIN like 2222, rearranging the digits is useless because you don't get a new
number. The same goes for a number like 2223 or 2233, where rearranging seems pointless.
So, please help me figure out how to apply these restrictions to all the numbers.

Answer:
Your question says "with repetition", so the count is 6 * 6 * 6 * 6 = 1296.
This product already counts each PIN exactly once: 2222 appears once, and 2223 and 2232 are different
PINs because the order of the digits matters. No further adjustment is needed.
If the digits must all be different (no repetition), the count is
6 * 5 * 4 * 3 = 360
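Both counts can be confirmed by enumeration. The sketch below treats a PIN as an ordered string of four digits and assumes a leading 0 is allowed, since nothing in the question rules it out; `product` generates PINs with repetition and `permutations` generates PINs with all digits different.

```python
from itertools import permutations, product

DIGITS = "230678"  # the six digits allowed in the question

# Ordered 4-digit strings, digits may repeat.
with_repetition = ["".join(p) for p in product(DIGITS, repeat=4)]

# Ordered 4-digit strings, every digit different.
without_repetition = ["".join(p) for p in permutations(DIGITS, 4)]

print(len(with_repetition))       # 6**4 = 1296
print(len(set(with_repetition)))  # 1296 again: no PIN is listed twice
print(len(without_repetition))    # 6*5*4*3 = 360
```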

6. There are twenty tiles numbered 1-20. What is the probability of drawing two even-numbered tiles without
replacing them? Express the answer in permutation form.

Answer:
P(10,2)/P(20,2): there are 10 even-numbered tiles and 2 are drawn; there are 20 tiles in total and 2 are drawn.
= 90/380
= 9/38
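The permutation form can be checked against the step-by-step product of probabilities (10/20 for the first tile, then 9/19 for the second). The sketch below uses `math.perm` and exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import perm

# Permutation form: ordered draws of 2 even tiles over ordered draws of any 2 tiles.
by_permutations = Fraction(perm(10, 2), perm(20, 2))

# Sequential form: first tile even, then second tile even given the first was.
by_sequence = Fraction(10, 20) * Fraction(9, 19)

print(by_permutations)  # 9/38
print(by_sequence)      # 9/38
```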
