
EG303 - Engineering Statistics

College of Engineering
Department of Petroleum Engineering
1 Hussein Jasim Mohammed
Students who pass this course will be able to:
1. Understand the definition of Statistics and recognize its importance.
2. Differentiate between descriptive and inferential Statistics.
3. Recognize the methods of random sample selection.
4. Specify the level of data measurement.
5. Construct frequency distributions for sets of data.
6. Understand pictorial descriptions of data and use them to display data.
7. Represent data graphically as graphs and charts.
8. Understand the concept of probability and compute probabilities for events.
9. Understand binomial, Poisson and normal distributions and use them to
calculate probabilities.
10. Solve correlation and regression problems.

Chapter 1

Introduction to Statistics

What is Statistics?
• Statistics is defined as the branch of applied Mathematics that involves
collecting, analyzing, summarizing, and presenting data to help in the
decision-making process.

• In many situations, Statistics helps decision makers to make the right decisions and
solve problems they face efficiently by conducting statistical studies.

• Statistics is used as a powerful tool to find solutions to a wide range of problems that
arise in all fields of life, such as:

• Business      • Weather forecasting   • Economics
• Education     • Sports                • Pandemic prediction
• Industry      • Psychology            • Prediction of natural disasters
• Health        • Biology               • Transportation
• Research      • Insurance             • Politics, etc.

What is Statistics? (cont.)
• The mathematical theories behind statistics heavily use:
• Differential and Integral Calculus
• Linear Algebra
• Probability Theory

• Statistics is divided into two branches:
1- Descriptive Statistics
2- Inferential Statistics

What is Statistics? (cont.)
[Figure: Using Statistics in Decision Making. The flowchart contains the boxes
Determine the Objective; Collect Relevant Data; Analyze Data; Summarize Data;
Present Data; Generalize Data; Make Decision; Set and Implement a Plan; Achieve the
Objective. The boxes are grouped under two labels: Descriptive Statistics and
Inferential Statistics.]


Examples of Applying Statistics in Real Life
1- Quality Control
• Quality testing is an important application of statistics in every industry.
• Quality tests are conducted to ensure that purchased or produced parts and
materials meet the required standards and give the best return on what was spent.
• A sample of the purchased materials or produced parts is tested. If the sample test
results conform to standards, the materials are accepted; otherwise, they are
rejected.
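
The accept-or-reject rule above can be sketched in Python. This is only an illustration: the lots of bolt diameters, the tolerance limits (9.9 mm to 10.1 mm) and the sample size are made-up assumptions, not values from the course.

```python
import random

def accept_lot(lot, sample_size, lower, upper, seed=0):
    """Test a random sample from the lot; accept only if every sampled item is within limits."""
    rng = random.Random(seed)              # fixed seed so the sketch is repeatable
    sample = rng.sample(lot, sample_size)  # items chosen at random, without replacement
    conforming = all(lower <= x <= upper for x in sample)
    return conforming, sample

# Hypothetical lot: 200 bolt diameters in mm, all inside the assumed standard
good_lot = [10.0 + 0.001 * (i % 50 - 25) for i in range(200)]
# Hypothetical lot in which every bolt is oversized
bad_lot = [10.5] * 200

accepted_good, _ = accept_lot(good_lot, sample_size=20, lower=9.9, upper=10.1)
accepted_bad, _ = accept_lot(bad_lot, sample_size=20, lower=9.9, upper=10.1)

print("good lot accepted:", accepted_good)
print("bad lot accepted: ", accepted_bad)
```

Real acceptance plans usually tolerate a small number of nonconforming items in the sample; the all-or-nothing rule here is a simplification.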

2- Health Insurance
• Statistics and probability are frequently used by health insurance firms to estimate the
likelihood that specific individuals will incur a particular annual healthcare expense.
• For example, a statistician at a health insurance company may use variables like
age, pre-existing conditions, and current health status to calculate the likelihood that a
specific person will spend $20,000 or more on healthcare in a given year.

Examples of Applying Statistics in Real Life (cont.)
3- Weather Forecasting
• Perhaps the most common real-life example of using probability is weather
forecasting.
• Probability is used by weather forecasters to assess how likely it is that there will be
rain, snow, clouds, etc. on a given day in a certain area.
• Forecasters will regularly say things like “there is an 80% chance of rain today
between 2 p.m. and 5 p.m.” to indicate that there’s a high likelihood of rain during
certain hours.

4- Investing
• Investors use probability to assess how likely it is that a certain investment will pay
off.
• For example, a given investor might determine that there is a 1% chance that the
stock of company A will increase 100x during the upcoming year.

Examples of Applying Statistics in Real Life (cont.)
5- Natural Disasters
• The environmental departments of countries often use probability to determine how
likely it is that a natural disaster like a hurricane, tornado, earthquake, etc. will strike
the country in a given year.
• If the probability is quite high, then the department will make decisions about
housing, resource allocation, etc. that will minimize the effects caused by the natural
disaster.

6- Politics
• Political forecasters use probability to predict the chances that certain candidates will
win various elections.
• For example, a forecaster might say that candidate A has a 60% chance of winning,
candidate B has a 20% chance of winning, candidate C has a 10% chance of winning,
etc. to give voters an idea of how likely it is that each candidate will win.
• Note: A real-life example of a site that uses probability to perform political
forecasting is FiveThirtyEight.

Examples of Applying Statistics in Real Life (cont.)
7- Traffic
• Ordinary people use probability every day when they decide to drive somewhere.
• Based on the time of day, location in the city, weather conditions, etc. we all tend to
make probability predictions about how bad traffic will be during a certain time.
• For example, if you think there’s a 90% probability that traffic will be heavy from 4
p.m. to 5:30 p.m. in your area, then you may decide not to drive anywhere during
that time.

8- Sports
• Statistics has many uses in sports. Every sport relies on statistics to measure and
improve performance.
• Statistics help an athlete or a team to understand their performance in a
particular sport.
• For example, by the end of each football match, a very detailed report about the
match statistics is issued. This report and all other relevant reports can be used to
achieve better performance and better results.

Sample and Population
• It is often difficult, impractical, or even impossible to observe an entire group,
especially if it is large.

• For example, it is difficult and impractical to collect data on the heights
and weights of all students in a university.
• It is also difficult and impractical to gather data about the numbers of defective and
non-defective bolts produced in a factory on a given day.

• Instead of examining the entire group, called the population, a small part of the group
is examined, called a sample.

Sample and Population (cont.)

• Population is the entire (whole) group of members (people, objects or records)
about which a statistical study is conducted and conclusions are drawn.

• Sample is the part of the population that is randomly selected in order to conduct the
statistical study about the population from which it is taken.

• Census is the process of gathering data about every member of a population to do
a statistical study about this population.

• Sampling is the process of randomly selecting people, objects or records from a
larger group (population) about which a certain study will be conducted.

Sample and Population (cont.)

[Figure: a sample shown as a part of the population]

Sample and Population (cont.)
Examples of populations are:
• All the crude oil extracted from the Rumaila oilfield last year.
• All petroleum engineers who work in the Ministry of Oil.
• All Iraqi citizens who are currently over 40 years old.
• Grades of all students in Calculus II last academic year in the College of Engineering
at the University of Baghdad.

Examples of samples are:
• 10 samples of the crude oil extracted from the Rumaila oilfield, taken on a monthly basis
last year.
• 20 petroleum engineers who work in the Ministry of Oil, selected as 4 engineers from
each of the 5 companies that belong to the ministry.
• 500 Iraqi people who are currently over 40 years old, selected randomly from the Iraqi
population.
• Grades of 50 students in Calculus II, randomly selected from all students who
studied in the College of Engineering at the University of Baghdad last academic year.

Reasons for Using Samples in Statistical Studies
The reasons for using samples instead of a census in statistical studies are:

1- Less time and resources: Analyzing an entire population can be time-consuming and
resource-intensive, especially when dealing with large datasets. Sampling allows
analysts to work with a manageable subset.

2- Lower cost: Collecting data from an entire population can be expensive. Sampling
can significantly reduce data collection costs.

3- Practicality: In some cases, it’s simply impractical to access or analyze an entire
population, such as when dealing with historical records or extensive databases.

4- Destructive Testing: When testing or analyzing requires the destruction of the tested
objects, as in product testing or medical trials, sampling is essential to avoid excessive
waste.

Descriptive and Inferential Statistics
• The two major areas of statistics are known as descriptive statistics, which describes
the properties of sample and population data, and inferential statistics, which uses
those properties to test hypotheses and draw conclusions.

• Descriptive statistics is the phase of statistics that only deals with collecting,
analyzing, summarizing and presenting data in ways that help in its interpretation
and subsequent analysis without drawing any conclusions or inferences about the
larger group of data.

• Inferential statistics is the phase when the data gathered from a sample is analyzed
and the results of analysis are used to reach conclusions about the population from
which the sample was taken.

Descriptive and Inferential Statistics (cont.)
Parameter vs. Statistic
• A parameter is a descriptive measure of the population. Parameters are usually denoted
by Greek letters.

• Examples of parameters are:
• Population mean ( μ )
• Population variance ( σ² )
• Population standard deviation ( σ )

• A statistic is a descriptive measure of a sample. Statistics are usually denoted by
Roman letters.

• Examples of statistics are:
• Sample mean ( x̄ )
• Sample variance ( s² )
• Sample standard deviation ( s )
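
As a rough illustration of the distinction, the sketch below computes the three parameters on a small made-up population and the three statistics on a sample drawn from it, using Python's standard `statistics` module. The data values and the sample size are assumptions for illustration only. In that module, `pvariance` and `pstdev` use the population formulas (dividing by N), while `variance` and `stdev` use the sample formulas (dividing by n - 1).

```python
import random
import statistics

# Hypothetical population: daily output (barrels) of 8 wells
population = [120, 135, 150, 110, 160, 145, 130, 150]

mu = statistics.mean(population)           # population mean, μ
sigma2 = statistics.pvariance(population)  # population variance, σ² (divides by N)
sigma = statistics.pstdev(population)      # population standard deviation, σ

rng = random.Random(1)                     # fixed seed so the run is repeatable
sample = rng.sample(population, 4)         # random sample of n = 4 members

x_bar = statistics.mean(sample)            # sample mean, x̄
s2 = statistics.variance(sample)           # sample variance, s² (divides by n - 1)
s = statistics.stdev(sample)               # sample standard deviation, s

print(f"mu = {mu}, sigma^2 = {sigma2}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar}, s^2 = {s2:.2f}, s = {s:.2f}")
```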

Descriptive and Inferential Statistics (cont.)
• The basis for inferential statistics, then, is the ability to make decisions about
parameters without having to complete a census of the population.

1- Population: μ (parameter)
2- Select a random sample
3- Sample: x̄ (statistic)
4- Use x̄ to estimate μ

[Figure: Process of Inferential Statistics to Estimate a Population Mean (μ)]
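
The four steps of this process can be sketched in Python as follows. The population here is simulated, so unlike in a real study μ is known and can be compared with the estimate. The population size, its distribution and the sample size are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# 1- Population (simulated): heights in cm of 10,000 hypothetical students
population = [rng.gauss(170, 8) for _ in range(10_000)]
mu = statistics.mean(population)   # parameter; normally unknown without a census

# 2- Select a random sample of 100 members
sample = rng.sample(population, 100)

# 3- Compute the statistic from the sample
x_bar = statistics.mean(sample)

# 4- Use the sample mean to estimate the population mean
print(f"population mean mu = {mu:.2f}")
print(f"sample mean x_bar  = {x_bar:.2f}")
print(f"estimation error   = {abs(x_bar - mu):.2f}")
```

Rerunning with a different seed gives a different sample and a slightly different estimate, which is why inferential conclusions are stated with some uncertainty.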

Data Collection Methods
• Depending on the type of data source, the data collection methods can be divided
into two categories: primary data collection methods, and secondary data collection
methods.

• Primary data is data that is collected by the researcher for the first time while
he/she is conducting the statistical study.

• Secondary data is second-hand data that was already collected and recorded before
the researcher started his/her statistical study.

Data Collection Methods (cont.)
Data collection methods are:

• Observation: a way of gathering data by recording behavior, events, and phenomena as they
occur in their natural setting, providing a rich and real-time source of data.

• Surveys: a method of data collection done by interviewing people personally or by taking
their opinions through written questionnaires.

• Experiments: a method of data collection that uses designed experiments under controlled
conditions to generate the required data.

• Using archived data: a method of selecting and using data that has already been collected and
documented in records and databases. These records and databases can be in either paper or
electronic form.

Data Collection Methods (cont.)
Data Sources
• Primary Sources: Observation, Surveys, Experiments
• Secondary Sources: Using Previously Recorded Data (Paper or Electronic Records)

Data Collection Methods

Observation, examples:
• Measuring rainfall levels
• Observing traffic volume
• Observing people's behavior
• Observing the stock market

Surveys, examples:
• Telephone surveys
• Written questionnaires
• Personal interviews

Experiments, examples:
• Testing new medicines
• Designing new alloys and materials
• Controlling plant growth

Using Previously Recorded Data (Paper or Electronic Records), examples:
• University students’ database
• Patients' records in a hospital
• Civil records of births, marriages and deaths of a country
• Weather records of Iraq

Sampling Methods
• In a statistical study, sampling methods refer to how we select members from the
population to be in the study.
• If a sample is not randomly selected, it will probably be biased in some way and the
data may not be representative of the population.
• There are many ways to select a sample; some of them are good and some are bad.

• Bad sampling methods are:
1- Convenience sample
2- Voluntary response sample

• Good sampling methods are:
1- Simple random sample
2- Stratified random sample
3- Cluster random sample
4- Systematic random sample

Sampling Methods (cont.)
Bad methods to sample:

1- Convenience Sample: The researcher chooses a sample that is readily available in
some non-random way. Convenience sampling involves using respondents who are
“convenient” to the researcher. There is no pattern whatsoever in acquiring these
respondents; they may be recruited merely by asking people who are present in the street,
in a public building, or in a workplace.

• Example: A researcher polls people as they walk by on the street. The sample is
probably biased, since the location, the time of day and other factors may make the
passers-by unrepresentative of the population.

2- Voluntary Response Sample: a sample made up of individuals who volunteer to be
included in the sample.
• The drawback (disadvantage) of this sampling method is that the individuals who
voluntarily respond will likely have stronger opinions (positive or negative) than the
rest of the population, which makes them an unrepresentative sample.

Sampling Methods (cont.)
• Example: A radio host asks listeners to go online and take a survey on his
website about their opinion of his show. Each individual listener can voluntarily
decide to take the survey or not.
• To illustrate this problem, suppose the green circles represent
people who think highly of the radio show while the red circles represent people
who dislike the show:
[Figure: listeners drawn as green and red circles, with the survey sample marked]
• Notice how most of the people who think highly of the show are included in the
sample, yet the sample is not representative of the larger population. The results
of the survey would show that most people like the show, when in fact this is not
true.
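
The effect can be imitated with a small simulation. All numbers below are assumptions chosen for illustration: 30% of listeners like the show, fans answer the survey with probability 0.5, and everyone else answers with probability 0.05.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 listeners: True means "likes the show"
population = [rng.random() < 0.30 for _ in range(10_000)]

# Voluntary response: each listener decides whether to take the survey.
# Assumed response rates: 0.5 for fans, 0.05 for everyone else.
respondents = [likes for likes in population
               if rng.random() < (0.5 if likes else 0.05)]

true_share = sum(population) / len(population)
survey_share = sum(respondents) / len(respondents)

print(f"share who like the show in the population: {true_share:.2f}")
print(f"share who like the show in the survey:     {survey_share:.2f}")
```

Under these assumptions the survey share comes out far above the population share, because the people with the strongest positive opinion are the ones most likely to respond.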

Sampling Methods (cont.)
Good methods to sample:
1- Simple random sample: Every member, and every set of members of the same size, has an
equal chance of being included in the sample. Computerized random number generators or
some other chance process can be used to get a simple random sample.
• The advantage of simple random samples is that they are usually
representative, since they don't favor certain members.
• Example: A teacher puts students' names in a box and draws names without looking to
get a sample of students.

2- Stratified random sample: The population is first split into groups. The overall
sample consists of some members from every group. The members from each group are
chosen randomly.
• A stratified sample guarantees that members from each group will be represented in
the sample, so this sampling method is good when we want some members from
every group.
• Example: A student council surveys students by getting random samples of first-year,
second-year, third-year, and fourth-year students.

Sampling Methods (cont.)
3- Cluster random sampling: The population is first split into groups. The overall
sample consists of every member from some of the groups. The groups are selected at
random.
• The advantage of this method is that the sample gets every member from some of
the groups, so it is most useful when each group reflects the population as a whole.
• Example: An airline company wants to survey its customers one day, so they
randomly select flights that day and survey every passenger on those flights.
4- Systematic random sampling: The sample elements are chosen from the target population
by selecting a random starting point and then selecting sample members at a fixed
sampling interval.
• Systematic sampling has many of the randomization benefits of
simple random sampling but is slightly easier to conduct.
• Example: A supermarket wants to study its customers' buying habits. With
systematic random sampling, it can choose every 10th or 15th customer entering
the supermarket and then conduct the study on this sample.
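
The four good sampling methods can be compared side by side in a short Python sketch. The population of 40 students, the group sizes and the sample sizes are hypothetical. For simplicity, the academic year serves both as the stratum and as the cluster here, although in practice clusters work best when each one resembles the whole population.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 40 students, 10 in each of 4 academic years
population = [(f"student_{i:02d}", year) for year in (1, 2, 3, 4)
              for i in range((year - 1) * 10, year * 10)]

# 1- Simple random sample: every set of 8 students is equally likely
simple = rng.sample(population, 8)

# 2- Stratified random sample: 2 students chosen at random from every year
stratified = []
for year in (1, 2, 3, 4):
    stratum = [p for p in population if p[1] == year]
    stratified.extend(rng.sample(stratum, 2))

# 3- Cluster random sample: choose 2 years at random, keep every student in them
chosen_years = rng.sample([1, 2, 3, 4], 2)
cluster = [p for p in population if p[1] in chosen_years]

# 4- Systematic random sample: random starting point, then every 5th student
interval = 5
start = rng.randrange(interval)
systematic = population[start::interval]

for name, s in [("simple", simple), ("stratified", stratified),
                ("cluster", cluster), ("systematic", systematic)]:
    print(f"{name:<11} n={len(s):>2}  years={sorted({year for _, year in s})}")
```

Only the stratified sample is guaranteed to contain every year, and only the cluster sample contains whole years.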

Data Types
• In statistics, data is mainly classified into two groups: qualitative and quantitative data.

• Qualitative data: data that cannot be counted, measured or expressed using numbers; it
only describes qualities or characteristics.

• Quantitative data: data that can be represented numerically, including
anything that can be counted, measured, or given a numerical value. Quantitative data is
divided into two groups: discrete and continuous data.

Data Types (cont.)

Data

• Qualitative (Categorical)
  Examples: Marital status, Political party, Eye color, Nationality, Academic discipline

• Quantitative (Numerical)
  - Discrete (Counted items)
    Examples: Number of children, Number of defects per hour, Number of workers in a company
  - Continuous (Measured characteristics)
    Examples: Weight, Voltage, Height, Time, Temperature

Levels of Data Measurement
• Choosing the appropriate data analysis procedure depends on the level of
measurement of the data gathered.
• There are four common levels of data measurement:
1- Nominal Level
2- Ordinal Level
3- Interval Level
4- Ratio Level

Levels of Data Measurement (cont.)
1- Nominal Level
• The lowest level of data measurement is the nominal level.
• The nominal level usually deals with non-numeric variables or with numbers that
carry no quantitative value.
• Numbers representing nominal level data (the word level often is omitted) can be
used only to classify or categorize.
• Nominal data cannot be ordered.
• Nominal data is qualitative data.
• Examples of nominal level data include:
• Nationality
• Blood type
• Gender
• Religion
• Geographic location
• Place of birth
• Political party
• Employee identification numbers

Levels of Data Measurement (cont.)
• Employee identification numbers are an example of nominal data. The numbers
are used only to differentiate employees and not to make a value statement about
them.
• Many demographic questions in surveys result in data that are nominal because
the questions are used for classification only.
• The following question is an example of such a question that would result in
nominal data:
Which of the following employment classifications best describes your field of work?
1. Educator
2. Construction worker
3. Manufacturing worker
4. Lawyer
5. Doctor
6. Other
• Suppose that, for computing purposes, an educator is assigned a 1, a construction
worker is assigned a 2, a manufacturing worker is assigned a 3, and so on. These
numbers should be used only to classify respondents. The number 1 does not denote the
top classification. It is used only to differentiate an educator (1) from a lawyer (4).

Levels of Data Measurement (cont.)
2- Ordinal Level
• Ordinal-level data measurement is higher than the nominal level.
• In addition to the nominal level capabilities, ordinal-level measurement can be
used to rank or order objects.
• With ordinal data, the distances or spacing represented by consecutive numbers
are not always equal.
• Ordinal data is qualitative data.

• For example, using ordinal data, a supervisor can evaluate three employees by
ranking their productivity with the numbers 1 through 3. The supervisor could
identify one employee as the most productive, one as the least productive, and one
as somewhere between by using ordinal data. However, the supervisor could not
use ordinal data to establish that the intervals between the employees ranked 1 and
2 and between the employees ranked 2 and 3 are equal; that is, he/she could not
say that the differences in the amount of productivity between workers ranked 1,
2, and 3 are necessarily the same.

Levels of Data Measurement (cont.)
• Some questionnaire Likert-type scales are considered by many researchers to be
ordinal in level.
• The following is an example of one such scale:
Evaluate the importance of Statistics subject delivered to year 3 students:
1. Very unimportant
2. Unimportant
3. Moderately important
4. Important
5. Very important
• When this survey question is coded for the computer, only the numbers 1 through 5
will remain, not the adjectives.
• Virtually everyone would agree that a 5 is higher than a 4 on this scale and that ranking
responses is possible.
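
A minimal sketch of this coding step in Python is shown below. The six responses are made up for illustration. The codes allow counting and ranking, but the sketch deliberately does not treat the gaps between codes as equal.

```python
from collections import Counter

# Coding of the five answer choices, as on the questionnaire
likert_codes = {
    "Very unimportant": 1,
    "Unimportant": 2,
    "Moderately important": 3,
    "Important": 4,
    "Very important": 5,
}

# Hypothetical answers from six students
responses = ["Important", "Very important", "Moderately important",
             "Important", "Unimportant", "Very important"]

coded = [likert_codes[r] for r in responses]      # only the numbers remain
counts = Counter(responses)                       # counting is meaningful
ranked = sorted(responses, key=likert_codes.get)  # ranking is meaningful

print("coded responses:", coded)
print("counts:", dict(counts))
print("lowest to highest:", ranked)
```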

Levels of Data Measurement (cont.)
3- Interval Level
• Interval-level data measurement is the second-highest level of data measurement. The
data is always numerical, and the distances between consecutive numbers
have meaning.
• The distances represented by the differences between consecutive numbers are
equal (interval data have equal intervals).
• In the interval scale, there is no true zero point or fixed beginning. The zero point
is a matter of convention and not a natural or fixed zero point.
• Zero is just another point on the scale and does not mean the absence of the
phenomenon.

• Examples of interval data include:
• Temperature in Celsius or Fahrenheit: The Celsius and Fahrenheit temperature
scales are typical examples of interval scales. They have equal intervals
(the difference between 20°C and 30°C is the same as between 30°C and
40°C), but no true zero point, as 0°C and 0°F are valid temperatures and do not
signify the absence of temperature.
0°C = 32°F

Levels of Data Measurement (cont.)
• IQ test: according to psychological studies, one cannot have zero IQ. The
average IQ is, by definition, 100; scores above 100 indicate a higher-than-average
IQ, and scores below 100 indicate a lower-than-average IQ. Theoretically, scores
can take any value below or above 100. In practice, however, they do
not meaningfully go much below 50 or above 150.

• Time: clock time is a good example of interval data if measured during the day
using a 12-hour clock. The numbers on a wall clock are on an interval scale
since they are equidistant and measurable. For example, the difference between 1
o’clock and 2 o’clock is the same as that between 2 o’clock and 3 o’clock.

Levels of Data Measurement (cont.)
4- Ratio Level
• Ratio-level data measurement is the highest level of data measurement.
• Ratio data have the same properties as interval data, but ratio data have an
absolute zero, and the ratio of two numbers is meaningful.
• The notion of absolute zero means that zero is fixed, and the zero value in the
data represents the absence of the characteristic being studied.

• Examples of ratio data are height, weight, elapsed time, volume, and Kelvin temperature.
• With ratio data, a researcher can state that 180 pounds of weight is twice as much
as 90 pounds or, in other words, make a ratio of 180:90.
• Many of the data gathered by machines in industry are ratio data.
• Because interval- and ratio-level data are usually gathered by precise instruments
often used in production and engineering processes, in national standardized
testing, or in standardized accounting procedures, they are quantitative data.
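
The difference between the interval and ratio levels can be checked numerically. The sketch below is only an illustration with assumed example values: it compares 20°C with 40°C on three temperature scales, and 90 pounds with 180 pounds in two units of weight.

```python
def celsius_to_kelvin(c):
    return c + 273.15

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

cool_c, warm_c = 20.0, 40.0

# Interval scale: the "ratio" changes with the arbitrary zero point of the scale
ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
# Kelvin has an absolute zero, so this is the physically meaningful ratio
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale: the ratio of two weights is the same in any unit
LB_TO_KG = 0.45359237  # kilograms per pound
ratio_lb = 180 / 90
ratio_kg = (180 * LB_TO_KG) / (90 * LB_TO_KG)

print(f"40 vs 20 degrees: Celsius {ratio_c:.3f}, Fahrenheit {ratio_f:.3f}, Kelvin {ratio_k:.3f}")
print(f"180 vs 90 pounds: in lb {ratio_lb:.3f}, in kg {ratio_kg:.3f}")
```

The three temperature ratios disagree, so saying that 40°C is "twice as hot" as 20°C is not meaningful, while the weight ratio is 2 in both units.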

Levels of Data Measurement (cont.)
• Each higher level of data can be analyzed by any of the techniques used on
lower levels of data, and it also supports additional operations.

Data Level    Meaningful Operations

Nominal       Classifying and Counting

Ordinal       Classifying, Counting and Ranking

Interval      Classifying, Counting, Ranking, Addition and Subtraction

Ratio         Classifying, Counting, Ranking, Addition, Subtraction,
              Multiplication and Division
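
The cumulative structure of this table can be expressed as a small lookup in Python. The function and variable names are hypothetical; the sketch encodes only what the table states.

```python
LEVELS = ["nominal", "ordinal", "interval", "ratio"]  # lowest to highest

# Operations that each level adds on top of the levels below it
ADDED_OPERATIONS = {
    "nominal": ["classifying", "counting"],
    "ordinal": ["ranking"],
    "interval": ["addition", "subtraction"],
    "ratio": ["multiplication", "division"],
}

def meaningful_operations(level):
    """Return every operation that is meaningful at the given level."""
    ops = []
    for name in LEVELS[: LEVELS.index(level) + 1]:
        ops.extend(ADDED_OPERATIONS[name])
    return ops

def is_meaningful(operation, level):
    """Check whether an operation is meaningful for data at the given level."""
    return operation in meaningful_operations(level)

for level in LEVELS:
    print(f"{level:<8} {', '.join(meaningful_operations(level))}")

print("division on interval data:", is_meaningful("division", "interval"))
```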

Statistical Analysis Software
1- IBM SPSS Statistics

Statistical Analysis Software (cont.)
2- Minitab Statistics

Statistical Analysis Software (cont.)
3- Microsoft Excel
