
Engineering Data Analysis


EnggMath 3

Probability
- derived from the verb ‘probe’, meaning ‘to search into’ or ‘to look for’

Statistics
-a field of study that deals with the collection,
organization, presentation, summarization, analysis
and interpretation of numerical data.

CATEGORIES OF STATISTICS
DESCRIPTIVE STATISTICS
o Concerned with the organization, classification, and presentation of collected data.
o Involves techniques to describe or characterize a set of gathered data without any attempt to make inferences or draw conclusions about them.
o Also concerned with measuring the relationship between two or more variables.

INFERENTIAL STATISTICS
o Refers to the technique of interpreting values obtained from sample data to draw conclusions, make generalizations, or make predictions or inferences about the population.
o The inference about the population is based on values computed using the methods of descriptive statistics.

“Statistical methodology may be looked upon as being of three types: descriptive, correlational, and inferential.”

Downie and Heath (1984)

REVIEW EXERCISES
Determine the category of statistics (whether descriptive or inferential) involved in each of the following:
1. A newspaper article reports the average salaries of health
practitioners based on the average salaries obtained from samples
in different health centers and hospitals
2. A social psychologist is interested in determining whether
individuals who graduate from technical vocational schools earn
more than those who finished a four-year college degree. He
gathered data from 150 randomly selected graduates of technical
vocational schools and 150 randomly selected graduates of a four-
year college degree and presented the results he obtained using
tables and graphs.
3. A study is conducted on a sample to determine significant differences in the extent of utilization of the World Wide Web between the freshmen and the seniors in a certain university. The perceptions of 200 freshmen and 150 seniors were obtained, presented, and compared using tables and graphs.
4. A study of 250 patients admitted to a hospital during the past year
revealed that, on the average, the patients lived 15 miles from the
hospital. This was used to determine the average distance of
residences of patients in the population.

BASIC TERMS in STATISTICS

EXPERIMENT
- It is a systematic, planned, and controlled activity aimed at obtaining results that yield a set of data.

POPULATION
- It refers to the collection of people, objects, individuals, or scores
that can be described as having a unique combination of qualities.

SAMPLE
- a part of a population
- a collection of some elements of a population that is representative of the entire population.

VARIABLE
- is any property or characteristic of interest about each
individual unit of a population or of a sample

Types of Variables

• Independent Variable – frequently referred to as the input or predictor variable, because it is systematically manipulated by the investigator and is used to predict the outcome.
• Dependent Variable – the quantitative variable that the investigation measures to determine the effect of the independent variables. It is often referred to as the output or response variable.

PARAMETER
- a summary measure calculated on an entire population's data
- it quantifies the characteristics of the population under investigation.
Symbols:
• µ – population mean
• π – population proportion
• σ – population standard deviation

STATISTIC
- a summary measure or value calculated on sample data.
- it quantifies the characteristics of the sample, which represents the population.
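To make the parameter/statistic distinction concrete, here is a small Python sketch. The ten scores are hypothetical and treated as an entire population; a random sample of four is then drawn from them. The names `population`, `sample`, `mu`, and `x_bar` are illustrative only.

```python
import random
import statistics

# Hypothetical population: scores of every member of a small group (N = 10)
population = [30, 35, 28, 26, 32, 32, 29, 32, 33, 31]

# Parameters: summary measures computed on the entire population
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation (divisor N)

# Statistics: the same kind of summary measures computed on a sample
rng = random.Random(1)                  # seeded so the draw is repeatable
sample = rng.sample(population, 4)      # n = 4, drawn without replacement
x_bar = statistics.mean(sample)         # sample mean, an estimate of mu
s = statistics.stdev(sample)            # sample standard deviation (divisor n - 1)

print(f"Parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Drawing a different sample (a different seed) changes `x_bar` and `s`, while `mu` and `sigma` stay fixed; this is why a statistic is used to estimate a parameter rather than being equal to it.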

DATA
- referred to as the raw material of statistics
- It is a set of values collected for the variable from each of the
elements of the sample.

The Statistical Data and Sampling

Framework of Statistical Analysis
Planning → Collection of Data → Organization and Presentation of Data → Analysis of Collected Data → Interpretations and Conclusions → Recommendations based on conclusions

Statistical data are the raw material of statistical investigations, and they arise when measurements are made or recorded.

• Planning – starts with a concise and clear definition of the problem. There should be a clear vision of priorities and of how to achieve them.
• Data Collection – refers to the process of acquiring measurements, counts, or raw data.
• Organization and Presentation of Data – summarizing, organizing, and presenting data is another phase of statistical study.
• Data Analysis – includes conversion of the data into relevant information that leads to the formulation of clear, summarized, and comprehensible numerical descriptions.
• Conclusion and Interpretation of Results – intelligent conclusions are drawn from the analysis of data.
• Recommendations – based on the interpretations and conclusions, recommendations are made.

NATURE OF DATA
Data are classified according to source as primary or secondary and according to type as qualitative or quantitative.

PRIMARY DATA
- gathered directly from an original source.
Advantages:
- accuracy, reliability, and relevance to the study because of the researcher's direct participation in gathering the information or data

SECONDARY DATA
- information gathered from published or unpublished
materials that have been previously obtained by other
individuals or agencies.

QUALITATIVE DATA
- measure a quality, an attribute, or a characteristic of each experimental unit.
- they are labels indicating the category or class into which an individual, object, or process falls.

QUANTITATIVE DATA
- measure a numerical quantity or amount of each experimental unit.
Categories:
• Discrete variables – can assume a finite or countable number of values.
• Continuous variables – can assume infinitely many values corresponding to the points on a line interval; measurable, expressed on a continuous scale.
Examples:
Discrete Variables
- number of defective items
- number of orders per day (for a certain product)
- number of times you visit a doctor
- number of family members
Continuous Variables
- height
- weight
- time
- volume
- serum chl

Review Exercises
Identify which of the following represent continuous variables
and which represent discrete variables.

1. number of male students in statistics class
2. how many kinds of fruits you have eaten last week
3. life span of a sample of batteries
4. number of words you can encode in one minute
5. speed of the horses in a race
6. height of the grade 6 pupils in your school
7. reaction time of the subjects in an experiment

Answers:
1. Discrete
2. Discrete
3. Continuous
4. Discrete
5. Continuous
6. Continuous
7. Continuous

Methods of Data Collection


1. INTERVIEW METHOD
- a direct method of investigation because the collection of
information and data is face-to-face or through a direct verbal
interaction between the interviewer and the interviewee

➢ Mail interview
➢ Computer interview
➢ Telephone interview
➢ Personal interview

1. INTERVIEW METHOD: ADVANTAGES
• An interview provides consistent and more precise information, since the interviewee can clarify some issues if needed.
• The researcher is able to witness for himself the reactions or emotions portrayed by the interviewee or respondent.

1. INTERVIEW METHOD: DISADVANTAGES
• Time-consuming and more expensive in terms of travel expenses.
• Some persons are uncomfortable with, and sometimes afraid of, being interviewed.
• Limited field coverage.
• Respondents' feeling that they are participants in research might lead them to expect some sort of return or reward for participating.
• Analyzing information gathered through interviews is rather difficult because it is difficult to quantify.
• Interviewers need training in the art of questioning to be able to conduct a reliable and productive interview.

2. QUESTIONNAIRE METHOD
- an indirect method of investigation
- the respondents are asked to provide responses to a prepared and well-planned list of questions

2. QUESTIONNAIRE METHOD: ADVANTAGES
• Less expensive, since it does not entail much travel to meet one individual at a time.
• It can cover a wide area in a shorter span of time.
• Respondents or informants may feel a greater sense of freedom to express their opinions and views, since their anonymity is maintained.

2. QUESTIONNAIRE METHOD: DISADVANTAGES
• Strong possibility of non-response, especially when the questionnaires are mailed.
• Some respondents are too lazy to read and answer the questionnaire.
• Questions that are not easily understood may not be answered.

3. REGISTRATION METHOD
- usually enforced by law
Examples:
▪Registration of births, marriages, and deaths with the
Philippine Statistics Authority
▪Registration of motor vehicles and securing drivers’ licenses
from the Land Transportation Office

3. REGISTRATION METHOD: ADVANTAGES
• Information is kept systematized and is always made available to the public.

4. OBSERVATION METHOD
- the investigator collects information on the characteristics
of the units under study by actual measurements or by
observing the behavior of persons or organizations and their
outcomes

4. OBSERVATION METHOD: ADVANTAGES
• The recording of behavior at the appropriate time and situation is made possible, since the investigator personally observes it.

5. EXPERIMENTATION METHOD
- used to describe any process that generates a set of data
- used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions, such as in scientific research.

Classification of Measurement of Data

Measurement is the assignment of values or numbers to objects or events according to some rules or criteria set by the investigator.

NOMINAL SCALE
- consists of labels or names used to classify the observed elements into the categories to which they belong
Examples:
- gender distribution of 100 adults (55 male and 45 female)
- marital status (married, single, widowed, separated)
- outcome of tossing a coin (head or tail)

ORDINAL OR RANKING SCALE
- elements are arranged in some meaningful kind of natural order which corresponds to their relative position or size, but it gives no information about the difference between adjacent positions.
Examples:
- academic performance of students (poor, fair, good, very good, outstanding)
- choice of SIM (most preferred, next preferred, least preferred)
- size of 100 shirts (25 are small, 25 are medium, 25 are large, and 25 are extra large)

INTERVAL SCALE
- elements can be differentiated according to characteristics
- can be ranked or ordered
- the arithmetic difference between elements is meaningful
This scale of measurement is more informative than the nominal or ordinal scale.
Examples:
- temperatures in degrees Celsius
- mental ability scores
- blood pressure readings

RATIO SCALE
- here, we have not only the order property, a unit of measurement, and a meaningful difference between elements, but also a fixed origin or zero point, as opposed to the arbitrary origin of the interval scale.
Examples:
- height of pine trees in Camp John Hay
- volume of helium gas in balloons
- time (in minutes) of each runner in a marathon
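One way to see why the fixed zero point matters is to compare ratios and differences of two Celsius readings. The sketch below is a hypothetical illustration in Python; Kelvin is brought in only as a temperature scale with an absolute zero (a ratio scale), and the readings of 10 and 20 degrees are arbitrary.

```python
def celsius_to_kelvin(c):
    """Kelvin has an absolute zero, so it behaves as a ratio scale for temperature."""
    return c + 273.15

cold_c, warm_c = 10.0, 20.0

# Ratios on the interval (Celsius) scale depend on where the arbitrary zero sits
ratio_celsius = warm_c / cold_c                                        # 2.0
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)   # about 1.035

# Differences do not depend on the zero point, so they are meaningful on both scales
diff_celsius = warm_c - cold_c
diff_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

print(f"ratio: {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
print(f"difference: {diff_celsius:.2f} (Celsius) vs {diff_kelvin:.2f} (Kelvin)")
```

The difference of 10 degrees survives the change of scale, but "20 °C is twice as hot as 10 °C" does not, which is what separates interval data from ratio data.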

• Ratio: Named + Ordered + Proportionate intervals between variables + can accommodate an absolute zero
• Interval: Named + Ordered + Proportionate intervals between variables
• Ordinal or Ranking: Named + Ordered variables
• Nominal: Named variables

The figure shows the classification of the different measurement scales.
Measurement Scale
• Qualitative Scale (Non-numeric): Nominal, Ordinal
• Quantitative Scale (Numeric): Interval, Ratio

Example: A telecommunications company manufactures and distributes cellular phones. Observations on the phone output could be taken and recorded in any of the following ways:
a) Classify the product by some dichotomy, such as defective or not defective, type of Subscriber Identification Module (micro or nano), acceptable or not acceptable, internet speed (3G or 4G).
b) Classify the product into three or more categories according to some characteristic, such as good, better, best.
c) Inspect each phone to determine the storage capacity: 32 GB, 64 GB, or 128 GB.
d) Count the number of units produced per day for a given number of days to determine the average daily production and measure its variation.
e) Measure some characteristics precisely, such as the settings and features.
f) Rate the phone's connectivity as excellent, average, or poor.

Answers:
a) Nominal
b) Ordinal
c) Ratio
d) Ratio
e) Nominal
f) Ordinal

Sampling
The act, process, or technique of selecting suitable sample, or a
representative part of a population for the purpose of
determining parameters or characteristics of the whole
population

- A researcher/experimenter has a population about which to draw inferences.
- A set of n objects (called a sample) is to be selected from the population for study.
[Figure: a sample selected from a population]

- Cochran (1963) developed the following formula to yield a representative sample for proportions:

$$n_0 = \frac{z^2 p q}{e^2}$$

where:
$n_0$ = sample size
$e$ = desired level of precision (margin of error)
$p$ = estimated proportion of an attribute present in the population
$q = 1 - p$
$z$ = value found in the statistical (standard normal) table for the chosen level of confidence, e.g., 1.96 for 95% confidence

- If the researcher is not familiar with the behavior of the population, Taro Yamane's formula (1967) may be used:

$$n = \frac{N}{1 + N e^2}$$

where:
$n$ = sample size
$N$ = population size
$e$ = margin of error

Margin of error (e) is the maximum allowable error between the sample result and the true population value.
Level of confidence, taken here as (100 − e)% (e.g., 95% for e = 5%), is the probability of getting the correct result.

Illustration: In a population of 22,000 students enrolled at Saint Louis University in a particular semester, what sample size is needed to get an accurate result for a study using a margin of error of:
a) 1%
b) 2.5%
c) 5%

Answers (using Yamane's formula, rounded up):
a) n = 6,875
b) n = 1,492
c) n = 393
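Both sample-size formulas can be checked against the illustration with a short Python sketch. It rounds up to the next whole respondent, which reproduces the answers above, and it uses `fractions.Fraction` so that the rounding-up step is not thrown off by floating-point error. The values z = 1.96 and p = 0.5 in the Cochran call are illustrative assumptions, not values given in the text.

```python
import math
from fractions import Fraction

def cochran_n0(z, p, e):
    """Cochran (1963): n0 = z^2 * p * q / e^2, rounded up to a whole respondent."""
    z, p, e = Fraction(str(z)), Fraction(str(p)), Fraction(str(e))
    q = 1 - p
    return math.ceil(z**2 * p * q / e**2)

def yamane_n(N, e):
    """Yamane: n = N / (1 + N * e^2), rounded up to a whole respondent."""
    e = Fraction(str(e))   # exact decimal value of the margin of error
    return math.ceil(N / (1 + N * e**2))

# Illustration above: N = 22,000 students
for e in (0.01, 0.025, 0.05):
    print(f"e = {e}: n = {yamane_n(22_000, e)}")

# Cochran with z = 1.96 (95% confidence) and p = 0.5 (maximum variability assumed)
print("Cochran n0 =", cochran_n0(1.96, 0.5, 0.05))
```

Choosing p = 0.5 maximizes the product pq, so it gives the largest (most conservative) Cochran sample size when nothing is known about the proportion.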
Sampling Techniques
Sampling is the procedure of gathering sampling units or observations from the population.

Probability Sampling

Simple Random Sampling
Sometimes called lottery sampling or raffle sampling.
In simple random sampling, a sample of size n is selected from a population of size N such that each member of the population has an equal and independent chance of being drawn and included in the sample.
Illustration: The president of a company wishes to select 5 of his 40 equally qualified employees to be sent to a convention. This can be done by recording the name of each employee on a separate slip of paper, mixing the slips of paper thoroughly, and then drawing five slips.
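The lottery procedure in the illustration can be mimicked with Python's `random` module. The employee names below are hypothetical placeholders for the 40 slips of paper, and the seed is there only so the draw can be repeated.

```python
import random

# Hypothetical roster standing in for the 40 slips of paper
employees = [f"Employee {i:02d}" for i in range(1, 41)]

rng = random.Random(2024)                # seeded only so the draw can be repeated
delegates = rng.sample(employees, k=5)   # 5 distinct names, each equally likely to be chosen
print(delegates)
```

`sample` draws without replacement, just as a slip that has been pulled out is not put back into the container.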

Systematic Random Sampling
This method consists of randomly selecting one unit and choosing additional elements at equal intervals until the desired sample size is reached.
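A minimal sketch of systematic random sampling, assuming the sampling interval is taken as k = N // n (N divided by n, rounded down) and that 1 ≤ n ≤ N. The list of 100 numbered households is hypothetical.

```python
import random

def systematic_sample(population, n, rng=random):
    """Systematic random sampling: random start, then every k-th unit.

    Assumes 1 <= n <= len(population); k = N // n is the sampling interval.
    """
    k = len(population) // n           # sampling interval
    start = rng.randrange(k)           # random start within the first interval (0 .. k-1)
    return [population[start + j * k] for j in range(n)]

# Example: 10 units from a hypothetical list of 100 numbered households
households = list(range(1, 101))
picked = systematic_sample(households, 10, random.Random(7))
print(picked)   # ten households spaced 10 apart
```

Only the starting unit is random; once it is fixed, every other selected unit follows from the interval.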

Stratified Sampling
Under this method, the researcher selects simple random samples from each of the subpopulations, or strata, of the population.
Steps:
1. Divide the population into subpopulations (strata).
2. From each stratum, obtain a simple random sample whose size is proportional to the size of the stratum.
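The two steps above can be sketched in Python as follows. The strata and their members are hypothetical, and rounding each stratum's share to the nearest whole member is an assumption of this sketch.

```python
import random

def stratified_sample(strata, fraction, rng=random):
    """Proportional stratified sampling.

    strata   -- dict mapping stratum name -> list of its members
    fraction -- overall sampling fraction n / N
    Each stratum contributes a simple random sample of about fraction * N_i members.
    """
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

# Hypothetical strata of unequal size (10%, 30%, 60% of a 300-member population)
strata = {
    "30-44": [f"A{i}" for i in range(30)],
    "20-29": [f"B{i}" for i in range(90)],
    "10-19": [f"C{i}" for i in range(180)],
}
sample = stratified_sample(strata, fraction=0.10, rng=random.Random(3))
print({name: len(chosen) for name, chosen in sample.items()})
```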
Stratified Sampling: Illustration B
The following table shows the share of each stratum (age group) if the desired sample size, n, is 1500.

Stratum (age range) | Population | % share | Sample (n_i)
30 – 44 | 1,500 | 1500 / 15000 = 10% | 0.1 x 1500 = 150
20 – 29 | 4,500 | 4500 / 15000 = 30% | 0.3 x 1500 = 450
10 – 19 | 9,000 | 9000 / 15000 = 60% | 0.6 x 1500 = 900
Total | 15,000 | 100% | 1500

Stratified Sampling: Illustration A
First, determine the population. Then, using a margin of error of, say, 5%, determine the sample size using Yamane's formula: n = 363 (rounded up).
Next, determine the number of respondents per group:
- Determine the proportion of the sample size to the population size: p = n/N = 362.79/3900 = 0.09302.
- Multiply each subpopulation size (N_i) by the computed proportion.

Department | Number of Students | Sample
Business Administration | 1,500 | 140
Management | 1,200 | 112
Finance | 850 | 79
Entrepreneurship | 200 | 19
Culinary Arts | 150 | 14
Total (N) | 3,900 | 364
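The arithmetic of Illustration A can be reproduced with a short Python sketch: Yamane's formula for n, the proportion p = n/N, and one rounded share per department. Rounding each share to the nearest whole student is an assumption that matches the table.

```python
import math
from fractions import Fraction

departments = {
    "Business Administration": 1500,
    "Management": 1200,
    "Finance": 850,
    "Entrepreneurship": 200,
    "Culinary Arts": 150,
}

N = sum(departments.values())            # 3,900
e = Fraction("0.05")                     # 5% margin of error
n_exact = N / (1 + N * e**2)             # Yamane: 362.79...
n = math.ceil(n_exact)                   # 363, rounded up
p = n_exact / N                          # proportion n/N, about 0.09302

# Each stratum gets N_i * p respondents, rounded to the nearest whole person
allocation = {dept: round(N_i * p) for dept, N_i in departments.items()}

for dept, n_i in allocation.items():
    print(f"{dept:<25} {departments[dept]:>6,} {n_i:>4}")
print(f"{'Total':<25} {N:>6,} {sum(allocation.values()):>4}")
```

Because each department's share is rounded separately, the allocations add up to 364 rather than 363, the same total shown in the table.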

Cluster Sampling
Cluster – any group of persons or experimental units having similar characteristics.
Cluster sampling is useful when the members of the population are scattered geographically.
Examples:
1) A salesman who markets pharmaceutical products uses different locations of clusters of houses (instead of individuals) in marketing his product.
2) A researcher studying the work experience of nurses uses hospitals instead of individual nurses.
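A minimal Python sketch of cluster sampling, following the nurses example: hospitals are the units that get randomly selected, and every nurse in a selected hospital is included. The hospital names and nurse labels are hypothetical.

```python
import random

def cluster_sample(clusters, m, rng=random):
    """Cluster sampling: randomly choose m whole clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), m)     # sample cluster labels, not individuals
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: nurses grouped by the hospital where they work
hospitals = {
    "Hospital A": ["A1", "A2", "A3"],
    "Hospital B": ["B1", "B2"],
    "Hospital C": ["C1", "C2", "C3", "C4"],
    "Hospital D": ["D1", "D2"],
}
picked = cluster_sample(hospitals, 2, random.Random(5))
print(picked)   # two hospitals, each with all of its nurses
```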

Multi-stage Sampling
This method uses several stages or phases in getting random samples from the general population. It is useful in conducting nationwide surveys or any survey involving a very large population.
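A two-stage version is the simplest case of multi-stage sampling and can be sketched as below; surveys of very large populations may add more stages (for example, province, then town, then household). The provinces and household labels are hypothetical.

```python
import random

def two_stage_sample(population, m_groups, n_per_group, rng=random):
    """Stage 1: randomly choose m_groups primary units (e.g. provinces).
    Stage 2: draw a simple random sample of n_per_group units inside each chosen group."""
    groups = rng.sample(sorted(population), m_groups)
    return {g: rng.sample(population[g], n_per_group) for g in groups}

# Hypothetical population: 5 provinces with 20 households each
population = {f"Province {p}": [f"P{p}-HH{h:02d}" for h in range(1, 21)]
              for p in range(1, 6)}
sample = two_stage_sample(population, m_groups=2, n_per_group=4, rng=random.Random(11))
print(sample)
```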
Non-Probability Sampling
In non-probability sampling, the selection of units is solely determined by rules or guidelines set by the researcher/investigator.

Purposive Sampling
The researcher lays down the criteria, and subjects that satisfy the criteria are included in the sample.

Quota Sampling
The interviewer's aim is just to fill the prescribed quota, provided he follows the given definite instructions about the section of the public he is to question.

Judgment Sampling
The samples are selected according to the opinion of someone who is familiar with the relevant characteristics of the population. It is often used when the required sample is small or when the population is highly heterogeneous.

Snowball Sampling
Snowball sampling is especially useful when populations are inaccessible or hard to find.
In snowball sampling, the investigator begins by identifying someone who meets the criteria for inclusion in the study, then asks that person to recommend others they know who also meet the criteria.
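The referral chain can be modeled as a breadth-first walk through a "who knows whom" list. This is a hypothetical sketch: the referral network, the eligibility set, and the rule that only qualifying people are asked for further referrals are all assumptions made for illustration.

```python
from collections import deque

def snowball_sample(seed, referrals, meets_criteria, target):
    """Start from one person who meets the criteria and follow referrals
    breadth-first until `target` qualifying people have been recruited."""
    recruited, queue, seen = [], deque([seed]), {seed}
    while queue and len(recruited) < target:
        person = queue.popleft()
        if not meets_criteria(person):
            continue                        # referred, but does not qualify
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:         # avoid asking the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network and eligibility list
referrals = {"P1": ["P2", "P3"], "P2": ["P4"], "P3": ["P4", "P5"], "P5": ["P6"]}
eligible = {"P1", "P2", "P3", "P5", "P6"}
print(snowball_sample("P1", referrals, lambda p: p in eligible, target=4))
# ['P1', 'P2', 'P3', 'P5']
```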

Methods of Data Presentation


Presentation of Data
Stem-and-Leaf Plots
The Frequency Distribution Table
Graphical Presentation of Frequency Distribution

Data may be presented in various ways:
▪ Textual Presentation – the use of words, statements, and paragraphs to present data or information.
▪ Graphical Presentation – a method wherein the set of data is presented in visual forms called graphs (e.g., pie charts, bar charts).
▪ Tabular Presentation – the use of tables, one of which is the frequency distribution table.

Stem-and-leaf Plots

• Data are sorted according to a pattern which involves separating a number into
two parts, usually the first digit and the other digits.
Steps:
1. Decide what units will be used for the stems and for the leaves.
2. Place the stems in a column (from smallest stem to largest).
3. Enter the leaf from each measurement into the row corresponding to the appropriate stem.
Stem-and-leaf Plots: Example
Data of the daily price quotations for a certain stock over a period of 20 days:
10 11 15 23 27
40 41 44 45 46
28 38 38 39 39
46 52 57 58 65

Stem-and-leaf plot for the given data:
Stem | Leaves | Frequency (f)
1 | 0 1 5 | 3
2 | 3 7 8 | 3
3 | 8 8 9 9 | 4
4 | 0 1 4 5 6 6 | 6
5 | 2 7 8 | 3
6 | 5 | 1
Total | | N = 20
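The three steps above can be sketched in Python for the stock-price data. The sketch assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Split each value into stem (tens digit) and leaf (units digit), leaves in order."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)

prices = [10, 11, 15, 23, 27, 40, 41, 44, 45, 46,
          28, 38, 38, 39, 39, 46, 52, 57, 58, 65]

plot = stem_and_leaf(prices)
# List every stem from smallest to largest, even one with no leaves
for stem in range(min(plot), max(plot) + 1):
    leaves = plot.get(stem, [])
    print(f"{stem} | {''.join(map(str, leaves)):<8} | {len(leaves)}")
```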

Frequency Distribution Table
A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

A sample of rural county arrests gave the following set of offenses with which individuals were charged:
rape theft burglary
robbery arson murder
burglary burglary murder
arson theft theft
murder robbery theft
robbery rape theft theft
manslaughter theft manslaughter
arson theft manslaughter

Frequency Table
Offense | Tally | Frequency
Rape | II | 2
Robbery | III | 3
Burglary | III | 3
Arson | III | 3
Murder | III | 3
Theft | IIIII-III | 8
Manslaughter | III | 3
Total | | 25
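Counting categories is exactly what `collections.Counter` does, so the frequency table for the offenses can be sketched in a few lines of Python. The `tally_marks` helper is a hypothetical addition that writes tallies in groups of five, as in the table.

```python
from collections import Counter

offenses = """rape theft burglary robbery arson murder burglary burglary murder
arson theft theft murder robbery theft robbery rape theft theft
manslaughter theft manslaughter arson theft manslaughter""".split()

def tally_marks(f):
    """Tally in groups of five, written IIIII-III as in the table."""
    groups = ["IIIII"] * (f // 5)
    if f % 5:
        groups.append("I" * (f % 5))
    return "-".join(groups)

freq = Counter(offenses)
order = ["rape", "robbery", "burglary", "arson", "murder", "theft", "manslaughter"]
for offense in order:
    f = freq[offense]
    print(f"{offense.capitalize():<13} {tally_marks(f):<10} {f}")
print("Total:", sum(freq.values()))
```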

Frequency Distribution Table for Ungrouped Data
- a tabular arrangement of data. Large masses of data can be analyzed better and more quickly when organized and arranged in some meaningful order, as in a frequency distribution table.
➢ Relative Frequency – shows the proportion of each frequency to the total frequency; it may be expressed as a decimal or as a percentage:

$$RF = \frac{f}{N}$$

➢ Cumulative Frequency – the cumulative frequency of each score equals the sum of its frequency and the frequencies of all the scores below it.
Example: A class of 20 students received the following scores on a 35-point quiz:
30 35 28 26 32
32 29 32 33 31
28 29 29 32 33
29 29 27 31 31

Relative Frequency and Cumulative Frequency Distribution Table
Score | Tally | f | RF (%) | CF
35 | I | 1 | 5 | 20
34 | | 0 | 0 | 19
33 | II | 2 | 10 | 19
32 | IIII | 4 | 20 | 17
31 | III | 3 | 15 | 13
30 | I | 1 | 5 | 10
29 | IIIII | 5 | 25 | 9
28 | II | 2 | 10 | 4
27 | I | 1 | 5 | 2
26 | I | 1 | 5 | 1
Total | | 20 | 100 |
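The f, RF, and CF columns can be computed from the raw quiz scores with a short Python sketch. Listing every score from the lowest to the highest (so that 34 appears with f = 0) follows the table; the variable names are illustrative.

```python
from collections import Counter

scores = [30, 35, 28, 26, 32,
          32, 29, 32, 33, 31,
          28, 29, 29, 32, 33,
          29, 29, 27, 31, 31]

N = len(scores)
freq = Counter(scores)

# Build rows from the lowest score upward so the running total is the
# cumulative frequency (the score's own frequency plus all scores below it).
rows, running = [], 0
for score in range(min(scores), max(scores) + 1):
    f = freq[score]                 # Counter gives 0 for scores nobody got, e.g. 34
    running += f
    rows.append((score, f, 100 * f / N, running))

# Display highest score first, matching the layout of the table
print("Score  f  RF(%)  CF")
for score, f, rf, cf in reversed(rows):
    print(f"{score:>5} {f:>2} {rf:>6.0f} {cf:>3}")
```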
Interpretation:
▪ 5% of the class got a perfect score of 35 points.
▪ Half of the class got more than 30 points: the CF of the score 30 is 10, which is half of 20, so 10 students scored 30 or below and the other 10 scored above 30.
▪ The highest percentage is at the score 29, which means 25% of the class obtained a score of 29, followed by the score 32, which 20% of the class obtained.

Frequency Distribution Table for Grouped Data
- an arrangement of class intervals and their corresponding frequencies in a table.
➢ Interval Width – the number of units from the lower class limit to the upper class limit.
➢ Daniel (1999) cited Sturges's rule (1926) as a guide in deciding how many class intervals are needed:

$$k = 1 + 3.322 \log_{10} n$$

where:
k – the number of class intervals
n – the number of values or observations in the data set

Steps:
1. Compute the range, $R = X_H - X_L$ (highest value minus lowest value).
2. Compute the number of intervals, $k = 1 + 3.322 \log_{10} n$.
3. Compute the class interval width, $i = \frac{R}{k}$.
4. Identify the limits of each interval.
5. Tally the observations into their corresponding class intervals.
6. Compute the frequency of each class interval.
7. Compute the class midpoint (mark), $M_k = \frac{\text{Upper Limit} + \text{Lower Limit}}{2}$.
8. Identify the class boundaries (CB), or true limits (±0.5).
Example: The following data are the time, in minutes, it took a group of volunteer workers to perform a given task. Construct a frequency distribution table.

11 12 12 13 15 15 16
17 20 21 21 21 22 22
22 23 24 26 27 27 27
28 29 29 30 31 32 34
35 37 41 41 42 45 47
50 53 56 60 62 52 21

Frequency Distribution Table: Grouped Data
1. Compute the range, R:
$$R = X_H - X_L = 62 - 11 = 51$$
2. Compute the number of intervals, k (n = 42):
$$k = 1 + 3.322 \log_{10} 42 = 6.3924 \approx 6 \text{ (rounded off)}$$
3. Compute the class interval width, i:
$$i = \frac{R}{k} = \frac{51}{6} = 8.5 \approx 9 \text{ (rounded up)}$$

Steps 4 to 8, with relative frequency (RF) and cumulative frequency (CF):

Class Interval | Tally | Frequency (f) | Class Mark | Class Boundaries | RF (%) | CF
11-19 | IIIII-III | 8 | 15 | 10.5-19.5 | 19.05 | 8
20-28 | IIIII-IIIII-IIIII | 15 | 24 | 19.5-28.5 | 35.71 | 23
29-37 | IIIII-III | 8 | 33 | 28.5-37.5 | 19.05 | 31
38-46 | IIII | 4 | 42 | 37.5-46.5 | 9.52 | 35
47-55 | IIII | 4 | 51 | 46.5-55.5 | 9.52 | 39
56-64 | III | 3 | 60 | 55.5-64.5 | 7.14 | 42
Total | | 42 | | | 100 |

Interpretation: The highest percentage is 35.71%, which means that most of the volunteer workers (15 of 42) took 20 to 28 minutes to perform the given task.
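Steps 1 to 8 can be followed mechanically in Python. This is a sketch under the same choices as the worked example: k from Sturges's rule rounded off, i rounded up, classes starting at the lowest value, and boundaries at ±0.5 because the data are whole minutes.

```python
import math

times = [11, 12, 12, 13, 15, 15, 16,
         17, 20, 21, 21, 21, 22, 22,
         22, 23, 24, 26, 27, 27, 27,
         28, 29, 29, 30, 31, 32, 34,
         35, 37, 41, 41, 42, 45, 47,
         50, 53, 56, 60, 62, 52, 21]

n = len(times)                              # 42 observations
R = max(times) - min(times)                 # step 1: range = 51
k = round(1 + 3.322 * math.log10(n))        # step 2: Sturges's rule, 6.39 -> 6
i = math.ceil(R / k)                        # step 3: width 8.5 -> 9 (rounded up)

table, cf = [], 0
lower = min(times)
for _ in range(k):                          # steps 4-8, one class at a time
    upper = lower + i - 1                   # class limits, e.g. 11-19
    f = sum(lower <= t <= upper for t in times)
    cf += f
    table.append({
        "limits": (lower, upper),
        "f": f,
        "mark": (lower + upper) / 2,        # class midpoint
        "boundaries": (lower - 0.5, upper + 0.5),
        "rf_pct": round(100 * f / n, 2),
        "cf": cf,
    })
    lower = upper + 1

assert cf == n, "classes do not cover every observation"
for row in table:
    print(row)
```

The final check matters because k and i are both rounded; if k × i were smaller than the range plus one, the last class would miss the largest values.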

Graphical Presentation
HISTOGRAM
▪ A common graphic way of presenting interval or ratio data
▪ Presents the distribution of the data, and the distribution is an extremely important characteristic

POLYGONS
▪ A variation of the histogram in which vertical bars are replaced by dots that are connected to form a line graph

OGIVES
▪ A cumulative frequency graph
▪ Useful when we want to know how many scores are above or below some level
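The points that these graphs plot can be computed from the grouped table for the volunteer-worker data. This Python sketch assumes two common conventions that the slides do not spell out: the frequency polygon is closed with zero-frequency classes one width below the first class and one width above the last, and the ogive is a "less than" ogive, plotting cumulative frequency at each upper class boundary.

```python
# Class boundaries and frequencies from the grouped frequency table
boundaries = [10.5, 19.5, 28.5, 37.5, 46.5, 55.5, 64.5]
freqs = [8, 15, 8, 4, 4, 3]
width = boundaries[1] - boundaries[0]        # 9

# Frequency polygon: plot f at each class mark, closing the line with
# zero-frequency classes one width below the first and above the last.
marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
polygon = ([(marks[0] - width, 0)]
           + list(zip(marks, freqs))
           + [(marks[-1] + width, 0)])

# Less-than ogive: cumulative frequency plotted at each upper boundary,
# starting from 0 at the lowest boundary.
ogive, running = [(boundaries[0], 0)], 0
for upper, f in zip(boundaries[1:], freqs):
    running += f
    ogive.append((upper, running))

print(polygon)
print(ogive)
```

Reading the ogive at 28.5 minutes gives 23, meaning 23 of the 42 workers finished in under 28.5 minutes.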

[Figure: Frequency Distribution for the time (in minutes) it took a group of volunteer workers to perform a given task; x-axis: CLASSES (11-19, 20-28, 29-37, 38-46, 47-55, 56-64); y-axis: FREQUENCY]

[Figure: Frequency Distribution for the time (in minutes) it took a group of volunteer workers to perform a given task; x-axis: CLASSES (11-19, 20-28, 29-37, 38-46, 47-55, 56-64); y-axis: FREQUENCY; plotted frequencies 8, 15, 8, 4, 4, 3]