Chapter - 1: Introduction To Probability and Statistics For Civil Engineering

1.1 Definition of Statistics and Classification of Statistics
A. Definition of Statistics
Statistics can be defined in two senses: plural (as statistical data) and singular (as statistical
methods).
Plural sense: Statistics are collections of facts and figures. This meaning of the word is widely used
when reference is made to facts and figures on sales, employment or unemployment, accidents,
weather, deaths, education, etc. E.g.: sales statistics, labor statistics, employment statistics.
In this sense the word statistics simply means data. But not all numerical data are statistics.
Singular sense: Statistics is the science that deals with the methods of collecting,
organizing, presenting, analyzing and interpreting data. It refers to the subject area that is
concerned with extracting relevant information from available data with the aim of making sound
decisions. According to this meaning, statistics is concerned with the development and
application of these methods and techniques.
B. Classification of Statistics
Based on the scope of the decision, statistics can be classified into two: Descriptive and
Inferential Statistics.
Descriptive Statistics refers to the procedures used to organize and summarize masses of data.
It is concerned with describing or summarizing the most important features of the data. It deals
only with the characteristics of the collected data, without attempting to infer (conclude)
anything that goes beyond the data themselves. The methodology of descriptive
statistics includes the methods of organizing data (classification, tabulation, frequency distributions)
and presenting data (graphical and diagrammatic presentation), and the calculation of certain
indicators such as measures of central tendency and measures of dispersion (variation),
which summarize some important features of the data.
Inferential (Inductive) Statistics includes the methods used to find out something about a
population based on a sample. It is concerned with drawing statistically valid conclusions
about the characteristics of the population from information obtained from a sample. In this
form of statistical analysis, descriptive statistics is linked with probability theory in order to
generalize the results of the sample to the population. Performing hypothesis tests, determining
relationships between variables and making predictions also belong to inferential statistics.
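The difference between the two branches can be illustrated with a minimal Python sketch. The ages below are hypothetical values chosen for illustration, and the interval uses a rough normal approximation (the factor 1.96), which is an assumption and not a method given in this chapter.

```python
import math
import statistics

# Hypothetical sample: ages (in years) of 10 students drawn from a larger class.
sample_ages = [19, 20, 20, 21, 21, 21, 22, 22, 23, 21]

# Descriptive statistics: summarize only the data in hand.
sample_mean = statistics.mean(sample_ages)
sample_sd = statistics.stdev(sample_ages)  # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to say something about the population.
# Rough 95% interval for the population mean (normal approximation, an assumption).
standard_error = sample_sd / math.sqrt(len(sample_ages))
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"Descriptive: sample mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
print(f"Inferential: population mean is roughly between {lower:.2f} and {upper:.2f}")
```

The first two numbers describe only the ten observed students; the interval is a statement about the whole class and therefore carries uncertainty.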
Ex: Classify each of the following statements as descriptive or inferential statistics.
a. The average age of the students in this class is 21 years.
b. At least 5% of the killings reported last year in city X were due to tourists.
c. Of the students enrolled in Debre Markos University this year, 74% are male and 26%
are female.
d. The chance of winning the Ethiopian National Lottery on any day is 1 out of 167000.
1.1.2 Stages in Statistical investigation
According to the singular sense definition of statistics, a statistical study (statistical
investigation) involves five stages: Collection of Data, Organization of Data, Presentation of
Data, Analysis of Data and Interpretation of Data.
1. Collection of Data: This is the first stage in any statistical investigation and involves the
process of obtaining (gathering) a set of related measurements or counts to meet
predetermined objectives. The data collected may be primary data (data collected directly by
the investigator) or secondary data (data obtained from intermediate sources such
as newspapers, journals, official records, etc.).
2. Organization of Data: It is usually not possible to derive any conclusion about the main
features of the data from direct inspection of the observations. This stage of statistical
investigation describes the properties of the data in summary form and helps to give a clear
understanding of the information gathered. It includes
editing (correcting), classifying and tabulating the collected data in a systematic manner.
The first step in the organization of data is editing, which means correcting (adjusting)
omissions, inconsistencies, irrelevant answers and wrong computations in the collected data.
The second step is classification, that is, arranging the collected
data according to some common characteristics. The last step is tabulation, that is,
presenting the classified data in tabular form, using rows and columns.
3. Presentation of Data: The purpose of data presentation is to give an overview of what the data
actually look like and to facilitate statistical analysis. Data can be presented using
graphs and diagrams, which are easy to remember and facilitate comparison.
4. Analysis of Data: The analysis of data is the extraction of a summarized and comprehensive
numerical description in order to reach conclusions or provide answers to a problem. The
problem may require simple or sophisticated mathematical expressions.
5. Interpretation of Data: This is the last stage of a statistical investigation. Interpretation
involves drawing conclusions from the data collected and analyzed in order to make decisions.
1.1.3 Definition of some Statistical terms

Sampling: The process of selecting a sample from the population.
Population: The totality of things, objects, people, etc. about which information
is being collected. It is the totality of observations with which the researcher is concerned.
Sample: A subset or part of a population selected to draw conclusions about the
population.
Census survey: The process of examining the entire population. It is the total count of the
population.
Parameter: A descriptive measure (value) computed from the population. It is the
population measurement used to describe the population. Example: population mean and
population standard deviation.
Statistic: A measure used to describe the sample. It is a value computed from the sample.
Sampling frame: A list of people, items or units from which the sample is taken.
Data: A collection of related facts and figures from which conclusions may be drawn.
Variable: A characteristic that changes from object to object and from time to time.
Sample size: The number of elements or observations to be included in the sample.
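The relationship between population, sample, parameter and statistic can be sketched in a few lines of Python. The population below is simulated and purely hypothetical (cube strengths in MPa are an assumed illustration, not data from this chapter).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: compressive strength (MPa) of all 200 cubes in a batch.
population = [round(random.gauss(30, 3), 1) for _ in range(200)]

# Parameter: a value computed from the whole population (what a census would give).
population_mean = statistics.mean(population)

# Sampling: select a sample of the chosen size; the list itself plays the role
# of the sampling frame here.
sample_size = 20
sample = random.sample(population, sample_size)

# Statistic: a value computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The statistic usually differs a little from the parameter, and it changes from sample to sample, which is why inferential methods are needed.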
1.1.4 Applications, Uses and Limitations of statistics
Applications of Statistics in Engineering
In modern times, statistical information plays a very important role in a wide range of fields,
and statistics is applied in almost all fields of human endeavor.
• In Scientific Research: Statistics is used as a tool in scientific research. Statistical
formulas and concepts are applied to data that result from experiments.
• In Quality Control: Statistical methods help to check whether a product satisfies a given
standard.
• For Decision Making: Statistics helps to enhance the power of decision making in the
face of uncertainty by providing sufficient information.
• Reliability Engineering: the study of the ability of a system or component to perform
its required functions under stated conditions for a specified period of time.
• Statistical Mechanics: the application of probability theory, which includes mathematical tools for dealing with
large populations, to the field of mechanics, which is concerned with the motion of
particles or objects when subjected to a force.
• In general, the field of statistics deals with the collection, presentation, analysis and use of data to
make decisions, solve problems, and design products and processes. It is the
science of learning information from data.
Uses of Statistics in Engineering
1. Design of Experiments (DOE) uses statistical techniques to test and construct models of
engineering components and systems.
2. Quality control and process control use statistics as a tool to manage conformance to
specifications of manufacturing processes and their products.
3. Time and methods engineering uses statistics to study repetitive operations in manufacturing
in order to set standards and find optimum (in some sense) manufacturing procedures.
4. Reliability engineering uses statistics to measure the ability of a system to perform its
intended function (for its intended time) and has tools for improving performance.
5. Probabilistic design applies probability in product and system design.
6. Structural design, safety factors, hydrological analysis, mechanical
analysis and even the choice of materials are all based on statistics. The results of
the analysis are projected to other conditions, together with the probability that several conditions occur together
(for example, earthquake, wind and maximum load, or the highest flow and rain).
7. Statistics condenses and summarizes masses of data and presents facts in numerical and definite form.
8. Statistics facilitates comparison: statistical devices such as averages, percentages, ratios, etc. are used
for this purpose.
9. Statistics is used in formulating and testing hypotheses.
10. Forecasting: Statistical methods help in studying past data and predicting future trends.
Limitations of Statistics
• It cannot deal with a single observation; rather, it deals with aggregates of facts.
• Statistical methods are not directly applicable to qualitative characteristics; they deal with quantitative
characteristics.
• Statistical results are true on average, i.e. for the majority of cases. Laws of statistics are not
universally true like the laws of physics, chemistry and mathematics.
• Statistics are liable to be misused or misinterpreted. This may be due to incomplete
information, inadequate and faulty procedures during data collection and sample selection,
and mainly due to ignorance (lack of knowledge).
1.1.5 Types of variables and Measurement Scales

Variable: A characteristic or an attribute that can assume different values.
E.g.: Height, Family size, Gender
Based on the values that they assume, variables can be classified as follows.
1. Qualitative variables: do not assume numeric values.
E.g.: Gender
2. Quantitative variables: assume numeric values. These variables are numeric in
nature.
E.g.: Height, Family size
• Discrete variable: takes whole number values and consists of distinct
recognizable individual elements that can be counted. It is a variable that
assumes a finite or countable number of possible values. These values are
obtained by counting (0, 1, 2, ...).
E.g.: Family size, number of children in a family, number of cars at the
traffic light
• Continuous variable: takes any value, including decimals. Such a variable
can theoretically assume an infinite number of possible values. These values
are obtained by measuring.
E.g.: Height, Weight, Time, and Temperature
In general, the values of a variable are obtained by counting for discrete variables, by
measuring for continuous variables, and by making categories for qualitative variables.
Ex: Classify each of the following as qualitative or quantitative; if it is quantitative, classify
it as discrete or continuous.
a. Color of automobiles in a dealer’s show room.
b. Number of seats in a movie theater.
c. Classification of patients based on nursing care needed (complete, partial or seafarer)
d. Number of tomatoes on each plant on a field.
e. Weight of newly born babies.
Scales of Measurements/Levels of Measurements
Consider the following two cases.
• Mr. A wears shirt number 5 when he plays football.
• Mr. B wears shirt number 6 when he plays football.
Who plays better?
What is the average shirt number?
• Mr. A scored 5 in a statistics quiz.
• Mr. B scored 6 in a statistics quiz.
Who did better?
What is the average score?
The numbers on the shirts do not make it possible to judge whether Mr. B plays better, but the
quiz scores do show that Mr. B did better in the quiz. Likewise, the average shirt number is
meaningless because the numbers on the shirts are simply codes, whereas the average quiz score
can be computed and interpreted.
Therefore, the scale of measurement of a variable
• shows the information contained in the values of the variable, and
• shows which mathematical operations and statistical analyses are
permissible on the values of the variable.
• Nominal scale: qualitative variables that show the category of
individuals. They reflect classification into categories (names of groups) where there is no
particular order or qualitative difference between the labels. Numbers may be assigned to the
values simply for coding purposes. It is not possible to compare individuals based on the
numbers assigned to them. The only mathematical operation permissible on these variables is
counting.
These variables
• have mutually exclusive (non-overlapping) and exhaustive categories, and
• have no ranking or order between (among) the values of the variable.
E.g.: Gender, Religion, ID No, Ethnicity, Color
• Ordinal scale: qualitative variables whose values can be ordered
and ranked. Ranking and counting are the only mathematical operations that can be done on the
values of these variables. There is no precise difference between the values (categories) of
the variable.
E.g.: Academic qualifications (B.Sc., M.Sc., Ph.D.), Strength (very weak, weak, strong, very
strong), Health status (very sick, sick, cured)
• Interval scale: quantitative variables for which there is a precise difference between the
units of measurement (levels) but no true zero. A value of zero does not show absence of the
characteristic; it indicates a low value rather than emptiness.
E.g.: Temperature; 0 °C does not mean that there is no temperature, only that it is cold.
• Ratio scale: quantitative variables for which a value of
zero shows absence of the characteristic.
E.g.: Height, Weight, Income, Amount of yield, Expenditure, Consumption.
All mathematical operations are allowed on the values of these variables.
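The link between the scale of measurement and the permissible summaries can be sketched in Python. Apart from the two quiz scores taken from the text, all values below are hypothetical, and the choice of mode, median and mean as representative summaries is an illustrative assumption.

```python
import statistics

# Nominal: shirt numbers are only codes, so counting (and the mode) is all that makes sense.
shirt_numbers = [5, 6, 5, 10, 6, 5]                       # hypothetical codes
most_common_shirt = statistics.mode(shirt_numbers)

# Ordinal: categories can be ranked, so the middle (median) category is meaningful.
order = ["very weak", "weak", "strong", "very strong"]
strength = ["weak", "strong", "very strong", "weak", "strong"]   # hypothetical ratings
ranks = [order.index(s) for s in strength]
median_strength = order[statistics.median_low(ranks)]

# Interval: differences are meaningful, ratios are not (0 °C is not "no temperature").
temps_c = [10.0, 20.0]                                    # hypothetical readings
difference = temps_c[1] - temps_c[0]

# Ratio: a true zero exists, so ratios are meaningful as well.
beam_lengths_m = [3.0, 6.0]                               # hypothetical lengths
ratio = beam_lengths_m[1] / beam_lengths_m[0]

# Quantitative scores: the mean can be computed (Mr. A and Mr. B from the text).
quiz_scores = [5, 6]
average_score = statistics.mean(quiz_scores)

print(most_common_shirt, median_strength, difference, ratio, average_score)
```

Python will happily compute the mean of the shirt numbers too; the scale of measurement, not the software, decides whether the result means anything.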
1.2 Methods of data collection and presentation

1.2.1 Methods of data collection
We have already explained what is meant by statistical data. Numerical facts or
measurements obtained in the course of an enquiry into a phenomenon marked by uncertainty
constitute statistical data. The statistical data may be already available or may have to be
collected by an investigator or an agency. Data are termed primary when they are
collected for the first time by the investigator, and secondary when they are
taken from records or data already available.
Based on the source, data can be classified into two: Primary Data and Secondary Data.
Method of primary data collection
In primary data collection, you collect the data yourself using methods such as interviews,
observations, laboratory experiments and questionnaires. The key point here is that the data you
collect are unique to you and your research and, until you publish, no one else has access to them.
There are many methods of collecting primary data. The main methods include:
Questionnaire: A popular means of collecting data, but difficult to design; it often
requires many rewrites before an acceptable questionnaire is produced.
Interviewing: A technique that is primarily used to gain an understanding of the underlying
reasons and motivations for people’s attitudes, preferences or behavior. Interviews can be
undertaken on a personal one-to-one basis or in a group. They can be conducted at work, at
home, in the street, in a shopping center, or at some other agreed location.
Observation: It involves recording the behavioral patterns of people, objects and events in a
systematic manner.
Diaries: A diary is a way of gathering information about the way individuals spend their time
on professional activities. Diaries in this sense are not records of engagements or personal journals of
thought. They can record either quantitative or qualitative data, and in management
research they can provide information about work patterns and activities.
Laboratory experiment: Conducting laboratory experiments in fields such as the chemical and biological
sciences.
Methods of secondary data collection

Secondary data analysis can be literally defined as second-hand analysis. It is the analysis of
data or information that was either gathered by someone else (e.g., researchers, institutions, other
NGOs, etc.) or gathered for some purpose other than the one currently being considered, or often a
combination of the two.
Some of the sources of secondary data are government documents, official statistics, technical
reports, scholarly journals, trade journals, review articles, reference books, research institutes,
universities, hospitals, libraries, library search engines, computerized databases and the World Wide
Web.
1.2.3 Methods of Data Presentation
So far you know how to collect data. What do we do with the collected data next? The collected
data, also known as raw data, are always in an unorganized form and need to be organized and
presented in a meaningful and readily comprehensible form in order to facilitate further
statistical analysis. This section introduces tabular and graphical methods commonly used to
summarize both qualitative and quantitative data. Tabular and graphical summaries of data can
be found in annual reports, newspaper articles and research studies. Everyone is exposed to
these types of presentations, so it is important to understand how they are prepared and how they
should be interpreted. Modern statistical software packages provide extensive capabilities for
summarizing data and preparing graphical presentations.
Class: a description of a group of similar numbers in a data set.
Frequency: the number of times a variable value is repeated.
Class frequency: the number of observations belonging to a certain class.
There are three types of frequency distributions (FDs): categorical, ungrouped (discrete or frequency
array) and grouped (continuous) frequency distributions.
1. Categorical FD: a FD in which the data are qualitative, i.e. either nominal or ordinal. Each
category of the variable represents a single class, and the number of times each category repeats
represents the frequency of that class (category).
E.g. 1: The blood types of 25 students are given below.
A  B  B  AB O  A
O  O  B  AB B  A  B
B  B  O  A  O  AB
A  O  O  O  AB O

Class (Blood type)    Frequency (Number of students)
A                     5
B                     7
AB                    4
O                     9
Total                 25
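The tally above can be reproduced with a short Python sketch using the standard library's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Blood types of the 25 students, row by row as listed above.
blood_types = (
    "A B B AB O A "
    "O O B AB B A B "
    "B B O A O AB "
    "A O O O AB O"
).split()

# Each category is one class; its count is the class frequency.
frequency = Counter(blood_types)

print(f"{'Class':<8}{'Frequency':>9}")
for blood_type in ["A", "B", "AB", "O"]:
    print(f"{blood_type:<8}{frequency[blood_type]:>9}")
print(f"{'Total':<8}{sum(frequency.values()):>9}")
```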
E.g. 3:-construct FD for the following letter grade of 25 students

A B C C C
C B B A D
A C C A B
F C C A B
2. Ungrouped FD (Frequency Array): a FD of numerical (quantitative) data in which each
value of a variable represents a single class (i.e. the values of the variable are not grouped) and
the number of times each value repeats represents the frequency of that class.
E.g.: Number of children for 21 families.
2 3 5 4 3 3 2
3 1 0 4 3 2 2
1 1 1 4 2 2 2

Class (Number of children)    Frequency (Number of families)
0                             1
1                             4
2                             7
3                             5
4                             3
5                             1
Total                         21
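A frequency array for numerical data can be built by counting each distinct value and listing the classes in increasing order. A minimal sketch for the 21 families above:

```python
from collections import Counter

# Number of children in each of the 21 families, as listed above.
children = [2, 3, 5, 4, 3, 3, 2,
            3, 1, 0, 4, 3, 2, 2,
            1, 1, 1, 4, 2, 2, 2]

# Each distinct value is its own class; sort the classes by value.
frequency_array = sorted(Counter(children).items())

for value, count in frequency_array:
    print(f"{value:>2} children: {count} families")
print("Total:", sum(count for _, count in frequency_array))
```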
3. Grouped (Continuous) FD: a FD of numerical data in which several values of a variable are
grouped into one class. The number of observations belonging to the class is the frequency of the
class.
E.g.: Consider the following age groups and numbers of persons.

Class Limits      Class Boundaries    Frequency
(Age in years)    (Age in years)      (Number of persons)
1-25              0.5-25.5            20
26-50             25.5-50.5           15
51-75             50.5-75.5           25
76-100            75.5-100.5          10
Total                                 70
Class Limits: The lowest and highest values that can be included in a class are called class
limits. The lowest value is called the Lower Class Limit (LCL) and the highest value is called the
Upper Class Limit (UCL).
Class limits for the first class: 1-25
Lower class limit: 1
Upper class limit: 25
Class Boundaries: class limits adjusted so that there is no gap between the UCL of one class and
the LCL of the next class. The lowest value is called the Lower Class Boundary (LCB) and the
highest value is called the Upper Class Boundary (UCB).
Class boundaries for the first class: 0.5-25.5
Lower class boundary: 0.5
Upper class boundary: 25.5
Class Width (Class Size): the difference between the UCB and the LCB of a class. It is also the
difference between the lower limits of two consecutive classes, or the difference between the
upper limits of two consecutive classes.
$W = UCB - LCB$ or $W = LCL_i - LCL_{i-1}$ or $W = UCL_i - UCL_{i-1}$
For the above example, $W = 25.5 - 0.5 = 25$ or $W = 26 - 1 = 25$ or $W = 50 - 25 = 25$.
Class Mark (Class Midpoint): the point halfway between the class limits or the class boundaries.
$CM = \frac{LCL + UCL}{2}$ or $CM = \frac{LCB + UCB}{2}$
Note that $W = CM_i - CM_{i-1}$.
Class Limits    Class Boundaries    Class Mark    Frequency
1-25            0.5-25.5            13            20
26-50           25.5-50.5           38            15
51-75           50.5-75.5           63            25
76-100          75.5-100.5          88            10
Total                                             70
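The class boundaries, class width and class marks in the table can be derived from the class limits alone. The sketch below assumes a unit of measurement of 1 year (ages recorded as whole years), which gives the half-unit adjustment of 0.5 seen in the table.

```python
# Class limits from the age example above.
class_limits = [(1, 25), (26, 50), (51, 75), (76, 100)]
U = 1  # assumed unit of measurement: ages recorded in whole years

rows = []
for lcl, ucl in class_limits:
    lcb = lcl - U / 2          # lower class boundary
    ucb = ucl + U / 2          # upper class boundary
    width = ucb - lcb          # class width, W = UCB - LCB
    mark = (lcl + ucl) / 2     # class mark, CM = (LCL + UCL) / 2
    rows.append((lcl, ucl, lcb, ucb, width, mark))
    print(f"{lcl}-{ucl}: boundaries {lcb}-{ucb}, width {width}, class mark {mark}")
```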
Relative frequency: the ratio of the class frequency to the total frequency (total number of
observations).
Percentage frequency: relative frequency × 100.

Class Limits    Class Boundaries    Class Mark    Frequency    Relative frequency    Percentage frequency
1-25            0.5-25.5            13            20           20/70                 28.57
26-50           25.5-50.5           38            15           15/70                 21.43
51-75           50.5-75.5           63            25           25/70                 35.71
76-100          75.5-100.5          88            10           10/70                 14.29
Total                                             70           70/70 = 1             100
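As a quick check of the relative and percentage frequencies, the following sketch recomputes them from the class frequencies of the age example; rounding to two decimals is an illustrative choice.

```python
# Class frequencies of the age example (classes 1-25, 26-50, 51-75, 76-100).
frequencies = [20, 15, 25, 10]
total = sum(frequencies)

# Relative frequency = class frequency / total frequency.
relative = [f / total for f in frequencies]

# Percentage frequency = relative frequency x 100.
percentage = [round(100 * r, 2) for r in relative]

for f, r, p in zip(frequencies, relative, percentage):
    print(f"f = {f:>2}, relative = {r:.4f}, percentage = {p}")
```

The relative frequencies always add up to 1 and the percentage frequencies to 100 (up to rounding).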
Cumulative frequency: the sum of frequencies (total number of observations) below or above
a certain value.
Less than Cumulative Frequency: the total number of values of a variable below a certain
UCB.
More than Cumulative Frequency: the total number of values of a variable above a certain
LCB.

Class Limits    Class Boundaries    Class Mark    Frequency    Less than Cum. Freq.    More than Cum. Freq.
1-25            0.5-25.5            13            20           20                      10+25+15+20 = 70
26-50           25.5-50.5           38            15           20+15 = 35              10+25+15 = 50
51-75           50.5-75.5           63            25           20+15+25 = 60           10+25 = 35
76-100          75.5-100.5          88            10           20+15+25+10 = 70        10
Total                                             70
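Both cumulative columns are running sums of the class frequencies, taken from the top for "less than" and from the bottom for "more than". A minimal sketch using `itertools.accumulate`:

```python
from itertools import accumulate

# Class frequencies of the age example, in class order.
frequencies = [20, 15, 25, 10]

# Less than cumulative frequency: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# More than cumulative frequency: running total from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print("less than:", less_than)
print("more than:", more_than)
```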
Construction of Grouped Frequency Distribution

1. Arrange the data in an array form (increasing or decreasing order).
2. Find the unit of measurement (U).
U is the smallest difference between any two distinct values of the data.
3. Find the range (R).
R is the maximum numerical difference in the data set, i.e. the difference
between the largest and the smallest values of the variable.
4. Determine the number of classes (K) using Sturges' rule:
$K = 1 + 3.322 \log_{10} N$, where N is the total number of observations.
5. Specify the class width (W):
$W = \frac{R}{K}$
6. Put the smallest value of the data set as the LCL of the first class. To obtain the LCL of
the second class, add the class width W to the LCL of the first class. Continue adding
until you get K classes.
Let X be the smallest observation. Then
$LCL_1 = X$
$LCL_i = LCL_{i-1} + W$ for $i = 2, 3, \ldots, K$.
7. Obtain the UCLs of the FD by adding W - U to the corresponding LCLs:
$UCL_i = LCL_i + (W - U)$ for $i = 1, 2, \ldots, K$.
8. Generate the class boundaries:
$LCB_i = LCL_i - \frac{1}{2}U$ and $UCB_i = UCL_i + \frac{1}{2}U$ for $i = 1, 2, \ldots, K$.
Example 1: Marks of 50 students (out of 40)
16 21 26 24 11 17 25 26 13 27 24 26 3 27 23 24 15 22 22 12 22 29 18 22 28 25 7
17 22 28 19 23 23 22 3 19 13 31 23 28 24 9 20 33 30 23 20 8 21 24
Construct grouped frequency distribution.
Solution
1. The array form of the data (increasing order):
3 3 7 8 9 11 12 13 13 15 16 17 17 18 19 19 20 20 21 21 22 22 22 22 22 22
23 23 23 23 23 24 24 24 24 24 25 25 26 26 26 27 27 28 28 28 29 30 31 33
2. $U = 9 - 8 = 1$
3. $R = L - S = 33 - 3 = 30$
4. $K = 1 + 3.322 \log_{10} N = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$
5. $W = R/K = 30/6.64 = 4.5 \approx 5$
6. $W - U = 5 - 1 = 4$, so the first class is 3-7.
Class Limits    Class Boundaries    Class Mark    Frequency    Relative Frequency    Percentage Frequency
3-7             2.5-7.5             5             3            3/50 = 0.06           6
8-12            7.5-12.5            10            4            4/50 = 0.08           8
13-17           12.5-17.5           15            6            6/50 = 0.12           12
18-22           17.5-22.5           20            13           13/50 = 0.26          26
23-27           22.5-27.5           25            17           17/50 = 0.34          34
28-32           27.5-32.5           30            6            6/50 = 0.12           12
33-37           32.5-37.5           35            1            1/50 = 0.02           2
Total                                             50           1                     100

CB           F     Class    LCF    Class    MCF
2.5-7.5      3     <7.5     3      >2.5     50
7.5-12.5     4     <12.5    7      >7.5     47
12.5-17.5    6     <17.5    13     >12.5    43
17.5-22.5    13    <22.5    26     >17.5    37
22.5-27.5    17    <27.5    43     >22.5    24
27.5-32.5    6     <32.5    49     >27.5    7
32.5-37.5    1     <37.5    50     >32.5    1
Total        50
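The construction steps can be sketched as a small Python function and checked against the marks of Example 1. The function name `grouped_fd` is hypothetical, and two details are assumptions rather than rules stated in the text: K and W are rounded up to the next whole number (which agrees with 6.64 ≈ 7 and 4.5 ≈ 5 in the solution), and the data are taken to be whole numbers.

```python
import math

marks = [16, 21, 26, 24, 11, 17, 25, 26, 13, 27, 24, 26, 3, 27, 23, 24, 15, 22,
         22, 12, 22, 29, 18, 22, 28, 25, 7, 17, 22, 28, 19, 23, 23, 22, 3, 19,
         13, 31, 23, 28, 24, 9, 20, 33, 30, 23, 20, 8, 21, 24]

def grouped_fd(data):
    """Return a list of (LCL, UCL, frequency) rows for whole-number data."""
    values = sorted(data)                                     # step 1: array form
    distinct = sorted(set(values))
    u = min(b - a for a, b in zip(distinct, distinct[1:]))    # step 2: unit of measurement
    r = values[-1] - values[0]                                # step 3: range
    k_exact = 1 + 3.322 * math.log10(len(values))             # step 4: Sturges' rule
    k = math.ceil(k_exact)                                    # assumption: round K up
    w = math.ceil(r / k_exact)                                # step 5; assumption: round W up
    rows = []
    for i in range(k):
        lcl = values[0] + i * w                               # step 6: lower class limits
        ucl = lcl + (w - u)                                   # step 7: upper class limits
        freq = sum(lcl <= x <= ucl for x in values)
        rows.append((lcl, ucl, freq))
    return rows

for lcl, ucl, freq in grouped_fd(marks):
    print(f"{lcl}-{ucl}: {freq}")
```

For these marks the sketch gives the same seven classes and frequencies as the table in the solution. Other data sets may need a different rounding choice so that the last class still covers the largest value.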
Exercise: In a survey, the ages of 44 women at marriage were reported as follows. Construct the
appropriate FD for these data.
24 25 27 26 22 23 24 25 24 23 26 28 24 25 23 24 25 25 25 22 27 28
27 24 25 24 25 28 26 25 24 28 24 25 25 24 25 24 26 27 27 25 28 26
1.2.3.2 Diagrammatic and/or graphical presentation of data: Bar charts, pie chart,
pictogram, histogram, frequency polygon, ogive curve, stem and leaf plot
1. Histogram: A graph in which the classes are marked on the X axis (horizontal axis) and
the frequencies are marked along the Y axis (vertical axis).
• The height of each bar represents the class frequency and the width
of the bar represents the class width.
• The bars are drawn adjacent to each other.
2. Frequency Polygon: A graph that consists of line segments connecting the
points where the class marks and the frequencies intersect.
• It can be constructed from a histogram by joining the mid-points of the tops of the
bars.
3. Frequency curve: a smooth freehand curve drawn through the frequency polygon.
Diagrams
1. Bar Diagram: It is the simplest and most commonly used diagrammatic
representation of a frequency distribution. It is appropriate for presenting qualitative data
(nominal or ordinal). It uses a series of separated and equally spaced bars in which the
width of the bars is constant and the height of each bar corresponds to the frequency of the
category. The bars are separated by a constant distance.
1.1 Simple Bar Diagram: a diagram in which the categories of a variable are
marked on the X axis and the frequencies of the categories are marked on the Y
axis.
It is applicable for discrete variables, that is, for data given according to some
period, places and timings. These periods and timings are represented on the
base line (X axis) at regular intervals and the corresponding frequencies are
represented on the Y axis.
• The width of the rectangles represents nothing (it is meaningless), but it
should be equal for all rectangles.
• Each rectangle is separated from the next by an equal space.
• It can also represent some magnitude (on the Y axis) over time, space,
groups, etc. (on the X axis).
Example 1:
Marital Status    Number of individuals
Single            100
Married           70
Divorced          30
Total             200
[Figure: simple bar diagram with marital status (Single, Married, Divorced) on the X axis and frequency on the Y axis]
Example 2:
Year               1983    1984    1985    1986    1987
Crop Production    1.5     2.4     1.2     3       2.5
1.2 Component Bar Diagram: used when there is a desire to show how a total or
aggregate is divided into its component parts. The bars represent the total value of
a variable, with each total broken into its component parts, and different colors
are used for identification. In this type of diagram, a bar is subdivided into
parts in proportion to the size of each subdivision. These subdivided rectangles
are shaded differently by lines, dots and colors so that the components are easy to
compare.
Sometimes the volumes of different attributes may be greatly different. To
make meaningful comparisons, the components of the attributes are reduced
to percentages. In that case each attribute will have 100 as its maximum
volume. This sort of component bar diagram is known as a percentage bar
diagram.
• Each rectangle represents the total value of a variable and is broken into its
component parts.
Example:
Marital Status    Male    Female    Total
Single            90      10        100
Married           30      40        70
Divorced          1       29        30
[Figure: component bar diagram with one bar each for Male, Female and Total, subdivided into Single, Married and Divorced]
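For a percentage bar diagram, each bar's components are rescaled so that the bar totals 100. The sketch below does this for the Male and Female bars of the example; rounding to one decimal place is an illustrative choice.

```python
# Counts from the marital status example: one bar per gender, one component per status.
data = {
    "Male":   {"Single": 90, "Married": 30, "Divorced": 1},
    "Female": {"Single": 10, "Married": 40, "Divorced": 29},
}

percent = {}
for bar, parts in data.items():
    total = sum(parts.values())   # height of the bar in the ordinary component diagram
    percent[bar] = {status: round(100 * n / total, 1) for status, n in parts.items()}

for bar, parts in percent.items():
    print(bar, parts)
```

After rescaling, both bars have the same height, so the shares of each marital status can be compared directly even though there are more men (121) than women (79).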
1.3 Multiple Bars Diagram: used to display data on more than one variable. In
the multiple bars diagram, two or more sets of inter-related data are represented.
Example:
Year    Coffee    Butter    Sugar    Total
1997    120       127       75       322
1998    25        98        87       210
1999    100       120       75       295
2000    198       98        60       356
[Figure: multiple bars diagram showing Coffee, Butter, Sugar and Total for each of the four years]
Pie chart: A pie chart is popularly used in practice to show the percentage breakdown of data. It
is a circle representing the total, cut into sectors (slices) whose sizes are proportional to the
number of items in the categories that make up the total. It therefore shows the proportional
sizes of different data groups as slices of a pie or circle.
Example:
Marital Status    Number of individuals    Percentage    Degree
Single            100                      50            180
Married           70                       35            126
Divorced          30                       15            54
Total             200                      100           360
[Figure: pie chart with sectors for Single, Married and Divorced]
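The percentage and degree columns follow from the counts: each category receives the same share of 100% and of 360° as it has of the total. A minimal sketch:

```python
# Counts from the marital status example.
counts = {"Single": 100, "Married": 70, "Divorced": 30}
total = sum(counts.values())

sectors = {}
for category, n in counts.items():
    percentage = 100 * n / total   # share of the total, in percent
    degrees = 360 * n / total      # central angle of the sector, in degrees
    sectors[category] = (percentage, degrees)
    print(f"{category:<9} {percentage:5.1f}%  {degrees:6.1f} degrees")
```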
Histogram
A histogram is a special type of bar graph in which the horizontal scale represents classes
of data values and the vertical scale represents frequencies. The heights of the bars
correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).
We can construct a histogram after we have first completed a frequency distribution table
for a data set. The x axis is reserved for the class boundaries.
Example 2.4: Consider the following set of data and construct the frequency distribution.
11, 29, 6, 33, 14, 21, 18, 17, 22, 38, 31, 22, 27, 19, 22, 23, 26, 39, 34, 27
A relative frequency histogram has the same shape and horizontal (x-axis) scale as a histogram, but the
vertical (y-axis) scale is marked with relative frequencies instead of actual frequencies.
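A text-only sketch of a histogram for the data of Example 2.4 is given below. The classes are an assumption made for illustration (five classes of width 7 starting at the smallest value, 6); the text does not specify them, and a different choice of classes would give a different picture.

```python
data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38, 31, 22, 27, 19, 22, 23, 26, 39, 34, 27]

# Assumed classes (not given in the text): width 7, starting at the smallest value.
width, start, n_classes = 7, min(data), 5

histogram = []
for i in range(n_classes):
    lcl = start + i * width
    ucl = lcl + width - 1
    freq = sum(lcl <= x <= ucl for x in data)
    # Bars span the class boundaries, so adjacent bars touch.
    histogram.append((lcl - 0.5, ucl + 0.5, freq))

for lcb, ucb, freq in histogram:
    rel = freq / len(data)   # relative frequency for the relative frequency histogram
    print(f"{lcb:>4}-{ucb:<4} {'#' * freq:<6} f={freq}  rf={rel:.2f}")
```

Each row of `#` symbols plays the role of one bar; replacing the frequency by the relative frequency changes only the vertical scale, not the shape.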
Frequency Polygon
A frequency polygon uses line segments connected to points located directly above the class midpoint
values. The heights of the points correspond to the class frequencies, and the line segments are
extended to the left and right so that the graph begins and ends on the horizontal axis, at the
points where the previous and next class midpoints would be located.
Ogive
An ogive (pronounced "oh-jive") is a line that depicts cumulative frequencies, just as the
cumulative frequency distribution lists cumulative frequencies. Note that the ogive uses class
boundaries along the horizontal scale, and the graph begins with the lower boundary of the first class
and ends with the upper boundary of the last class. An ogive is useful for determining the number of
values below some particular value. There are two types of ogive: the less than ogive and the
more than ogive. The difference is that the less than ogive uses the less than cumulative frequency and
the more than ogive uses the more than cumulative frequency on the vertical axis.
[Figure: ogives for the data of Example 2.4 above]
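The points that an ogive joins can be listed directly from a grouped frequency distribution. The sketch below uses the class boundaries and frequencies of the marks example (Example 1, 50 students) from earlier in this chapter; drawing the curve itself is left to any plotting tool.

```python
from itertools import accumulate

# Class boundaries and frequencies from the marks example (50 students).
boundaries = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5]
frequencies = [3, 4, 6, 13, 17, 6, 1]
total = sum(frequencies)

# Less than ogive: starts at 0 at the first lower boundary and rises to the total.
cumulative = [0] + list(accumulate(frequencies))
less_than_ogive = list(zip(boundaries, cumulative))

# More than ogive: starts at the total and falls to 0 at the last upper boundary.
more_than_ogive = [(b, total - c) for b, c in less_than_ogive]

print("less than:", less_than_ogive)
print("more than:", more_than_ogive)
```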
Pictograph
A pictograph is a way of representing statistical data using symbolic figures to match the
frequencies of different kinds of data. It is a visual presentation of data using icons, pictures, symbols,
etc., in place of or in addition to common graph elements (bars, lines, points). Pictographs use
relative sizes or repetitions of the same icon, picture, or symbol to show comparison.
A pictograph is also called a pictogram, pictorial chart, pictorial graph, or picture graph.
Stem and leaf plot
A stem-and-leaf diagram, also called a stem-and-leaf plot, is a diagram that quickly summarizes
data while maintaining the individual data points. In such a diagram, the "stem" is a column of
the unique values that remain after removing the last digit of each data value. The final digits
("leaves") of the data values are then placed in a row next to the appropriate stem and sorted in
numerical order.
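A stem-and-leaf plot for the data of Example 2.4 can be produced with the short sketch below. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole numbers, with the tens as stems and the units digits as leaves.

```python
from collections import defaultdict

data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38, 31, 22, 27, 19, 22, 23, 26, 39, 34, 27]

def stem_and_leaf(values):
    """Map each stem (value without its last digit) to its sorted leaves (last digits)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Every original value can be read back from the plot (stem 2 with leaf 9 is 29), which is what distinguishes it from a histogram.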
