ENGINEERING DATA ANALYSIS

Roles of Statistics and the Data Analysis Process
Statistics
 defined as the branch of science that deals with
collection, presentation, organization, and
interpretation of data
 widely used in many different fields of science
and technology, as well as in our everyday lives.
Examples of applications of statistics include (but are
not limited to):
 Population census (as implemented by the
government);
 Public choices/responses (e.g., surveys related
to elections and government service
satisfaction);
 Product advertisements (e.g., comparison of
Brand X vs Brand Y);
 Teaching and instruction (e.g., students'
performance, examination item analysis, etc.);
 Scientific observations and experiments (e.g.,
clinical trials for medicines and vaccines,
development of new technologies, etc.); and
 Engineering data collection (e.g., engineering
soil properties, testing of materials, etc.)
Uncertainty and Variability
Uncertainty
 Occurs when the true value of a certain
quantity at a single instance is unknown
 Uncertainty is derived from theoretical
information (i.e., it is expressed in terms of
probabilities)
Variability
 Occurs when the said quantity is measured at
multiple instances and there are considerable
differences between the said measurement
trials.
 Derived from data extracted from observations
and experiments (i.e., it is expressed in terms of
frequencies).
 Uncertainty emerges because of variability
SOURCES OF UNCERTAINTY
Aleatory uncertainties
 Those that are caused by natural randomness. It is
natural in reality that the properties of several
samples of a certain object may be different.
 An example of aleatory uncertainty is the
determination of the compressive strength of
sandstone from a single source. If
experimentation is done, assuming that there
are no mistakes in the experiment set-up, it can be
observed that different values of strength
come up for different samples.
Epistemic Uncertainties
 Those that are caused by an “incomplete”
understanding of reality. To simplify the
analysis of some physical phenomena,
assumptions must be made. However, these
assumptions may not capture what is
actually happening in reality.
 An example of epistemic uncertainty is the
determination of beam deflections. An
important assumption in calculating beam
deflections is that the beam material is
perfectly elastic. However, most materials
are elastic only up to certain stresses; that is,
at some large magnitude of load applied on
the beam, the beam material may behave
differently from a perfectly elastic material.
If experimentation is done for this, it is
expected that there will be a difference
between the expected value (calculated
value) and the observed value (measured
value).
DATA ANALYSIS PROCESS
1. Understanding the nature of the problem.
Effective data analysis requires an
understanding of the research problem. We
must know the goal of the research and what
questions we hope to answer. It is important to
have a clear direction before gathering data to
ensure that we will be able to answer the
questions of interest using the data collected.
2. Deciding what to measure and how to
measure it. The next step in the process is
deciding what information is needed to answer
the questions of interest. In some cases, the
choice is obvious.
3. Data collection. The data collection step is very
important. The researcher must first decide
whether an existing data source is adequate or
whether new data must be collected. If a
decision is made to use existing data, it is
important to understand how the data were
collected and for what purpose, so that any
resulting limitations are also fully understood. If
new data are to be collected, a careful plan
must be developed, because the type of
analysis that is appropriate and the conclusions
that can be drawn depend on how the data are
collected.
4. Data summarization and preliminary analysis.
After the data are collected, the next step is
usually a preliminary analysis that includes
summarizing the data graphically and
numerically. This initial analysis provides insight
into important characteristics of the data and
provides guidance in selecting appropriate
methods for further analysis.
5. Formal data analysis. The data analysis step
requires the researcher to select appropriate
statistical methods.
6. Interpretation of results. The interpretation
step often leads to the formulation of new
research questions. These new questions lead
back to the first step. In this way, good data
analysis is often an iterative process.

 A population refers to the entire collection of
individuals or objects about which information
is desired, while a sample is a subset of the
population, selected to represent it.
 A summary measure that describes a specific
characteristic of a sample is called a statistic,
while a summary measure that describes a
specific characteristic of a population is called a
parameter. This means that the summary
measures computed from a sample in
descriptive statistics are statistics, while
inferential statistics uses them to estimate
parameters.
 Designed research that provides information
needed to solve a certain research problem is
called a statistical inquiry.
DATA
 A data set is a collection of observations on one
or more variables.
 A variable is a characteristic whose value may
change from one observation to another.
CLASSIFICATION OF DATA
 Data may be classified depending on its nature
and on the number of variables involved.
Depending on its nature, data may be classified
as either:
1. Categorical (qualitative): if the individual
observations are categorical responses; or
2. Numerical (quantitative): if the individual
observations are expressed as numbers.
 Numerical data may be further classified as
follows:
1. Discrete: if possible values of the variable/s
correspond to isolated points on the
number line; and
2. Continuous: if possible values of the
variable/s correspond to all points inside an
interval on the number line.
 Depending on the number of variables involved,
data may be classified as either:
1. Univariate: if a data set consists of
observations on a single variable; or
2. Multivariate: if a data set consists of
observations on two or more variables.
DATA COLLECTION & SAMPLING TECHNIQUES
Data Collection Methods
 Use of Documented Data: Available research
data may be used, such as government data and
data from related studies.
 However, caution must be exercised when
using documented data, especially
secondary data (i.e., data documented by
entities other than those who actually
collected the data).
 Surveys: A survey is a method of collecting data
on the variable of interest by asking people to
answer a set of carefully written questions
called a questionnaire.
 A survey comprising an entire population is
called a census while a survey comprising
only a sample of the population is called a
sample survey.
 Surveys are usually performed if the study
involves human behavior such as consumer
studies and election surveys.
 Experiments: An experiment is a method of
collecting data where there is a direct human
intervention on the conditions that may affect
the values of the variable of interest.
 Variables that may be directly manipulated
are called independent variables, while
variables that cannot be manipulated
directly but whose values change in
response are called dependent variables.
 Most scientific studies with multivariate
data involve experimentation.
 Observations: An observation is a method of
collecting data on the phenomenon of interest
by recording the observations made about the
phenomenon as it actually happens.
 Examples of studies involving observations
include weather and climate, earthquake,
and astronomical studies.
Sampling
The process of obtaining or selecting samples from a
population related to a study is called sampling.

Sampling Bias
 Sampling must be carefully performed, for
improper sampling may cause bias, which is the
tendency for samples to differ from the
corresponding population in some systematic
way.
 Bias results either from the sampling itself or
from the way in which data are obtained once the
sample has been chosen.
 The three most common types of bias in
sampling are as follows:
 Selection Bias: Tendency for samples to
differ from the population as a result of a
systematic exclusion of some part of the
population.
 Measurement or Response Bias: Tendency
for samples to differ from the population
because the method of observation tends
to produce values that differ from the true
values.
 Nonresponse Bias: Tendency for samples to
differ from the population because data are
not obtained from all individuals selected
for inclusion in the sample.
Sampling Methods
 Random Sampling: It is a sampling method that
ensures that every different sample of a certain
size has an equal chance of being chosen as the
sample. Random sampling may be done as
either sampling without replacement (once
chosen, cannot be chosen again) or sampling
with replacement (once chosen, may be
chosen again).
 Stratified Random Sampling: It is a type of
sampling method that divides the population
into a set of non-overlapping subgroups, and
then random sampling is done for each of these
subgroups.
 Cluster Sampling: It is a type of sampling
method that involves dividing the population of
interest into non-overlapping subgroups called
clusters, and then these clusters are selected at
random, with all individuals in the selected
clusters included in the sample.
 Systematic Sampling: It is a sampling method
that can be used when it is possible to view the
population of interest as consisting of a list or
some other sequential arrangement. A starting
point is chosen at random, and every k-th
individual after it is then included in the sample.
DESIGN OF EXPERIMENTS
 An experiment is a method of collecting data
where there is a direct human intervention on
the conditions that may affect the values of the
variable of interest.
 It is a study in which one or more
explanatory variables are manipulated in
order to observe the effect on a response
variable.
 Explanatory variables are independent
variables or factors, those that have values that
are controlled by the experimenter.
 Response variables are dependent variables,
those that are thought to be related to the
explanatory variables in an experiment. These
are measured as part of the experiment, but
not controlled by the experimenter.
 An experimental study involves several set-ups,
called experimental conditions or treatments,
to observe the relationship between the
independent and the dependent variables.
 The main goal of an experiment is to determine
the effects of independent variables on the
dependent variables.
 A well-designed experiment requires not just
manipulating the independent variables, but
also eliminating the effects of other variables
not involved in the study on the dependent
variables.
 These variables that are not included as
independent variables but may affect the
dependent variables are called extraneous
variables.
 If extraneous variables are left alone, they
may be confounded with the independent
variables, i.e., their effects on the
dependent variable cannot be distinguished
from one another.
Strategies for design of experiments which can control
the effect of extraneous variables may be employed, as
follows:
 Random Assignment: Randomly assigning
subjects to treatments (or treatments to
trials) to ensure that the experiment does not
systematically favor one experimental condition
(treatment) over another.
 Blocking: Using extraneous variables to create
groups (blocks) that are similar. All
experimental conditions (treatments) are then
tried in each block.
 Direct Control: Holding extraneous variables
constant so that their effects are not confounded
with those of the experimental conditions
(treatments).
 Replication: Ensuring that there is an adequate
number of observations for each experimental
condition.
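
The strategies above can be sketched in code. The following is a minimal illustration only, assuming a hypothetical experiment with 12 specimens taken from two batches (the blocking variable) and three treatments; the function name, specimen labels, batch labels, and treatment labels are all made up for this example.

```python
import random
from collections import defaultdict


def blocked_random_assignment(units, block_of, treatments, seed=None):
    """Randomly assign treatments to units separately within each block.

    units: list of unit identifiers
    block_of: dict mapping each unit to its block label
    treatments: list of treatment labels
    Returns a dict mapping each unit to a treatment.
    """
    rng = random.Random(seed)

    # Blocking: group units that share the same extraneous-variable value.
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of[unit]].append(unit)

    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)  # random assignment within the block
        for i, unit in enumerate(members):
            # Cycling through the treatments tries every treatment in
            # every block and replicates it when the block is large enough.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical data: 12 specimens, 6 from each of two batches.
units = [f"S{i:02d}" for i in range(1, 13)]
block_of = {u: ("batch A" if i < 6 else "batch B") for i, u in enumerate(units)}
treatments = ["T1", "T2", "T3"]

assignment = blocked_random_assignment(units, block_of, treatments, seed=1)
for unit in units:
    print(unit, block_of[unit], assignment[unit])
```

With these assumed inputs, every treatment appears twice in each batch, so batch-to-batch differences cannot systematically favor one treatment. Direct control is not shown because it is a matter of experimental procedure (holding a condition constant), not of assignment.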
DATA PRESENTATION & ORGANIZATION
Excerpt taken from the business section of the
Philippine Star:
1. “The 30-company Philippine Stock Exchange
Index finished down 10.21 points, or 0.5
percent, at 1921.33, after falling 0.8 percent
Tuesday following a seven-day rally that
boosted the main index by 6.4 percent.
Weighing on the index were losses incurred by
Globe Telecom, down by 2.4 percent at Php 830,
Ayala Land, off 1.2 percent at Php 8, and
Jollibee, lower by 3.4 percent at Php 28.50 on
profit taking.”

2. “Partly offsetting the market’s losses were Ayala
Corporation, up 1.4 percent at 6.30, and Bank of
the Philippine Islands, which rose one percent
at Php 50 on bargain hunting. Ayala unit Manila
Water rose 1.7 percent to Php 6.10 after the
water utility Tuesday posted a 57-percent
year-on-year rise in first-quarter net profit.”

3. “All sectoral indicators ended lower, except the
oil sub index, which finished higher. Declines led
gainers 46 to 21, while 51 stocks were
unchanged.”
GRAPHICAL
Portrays numerical figures or relationships among
variables in pictorial form.
Raw Data and Array: Raw data are data in their original
form while an array is an ordered arrangement of data
according to magnitude (also called sorted data or
ordered data).
Frequency Distribution:
 It is a way of summarizing data by showing the
number of observations that belong in the
different categories or classes (also called
grouped data).
 Frequency may be expressed as absolute
frequency (actual frequency) or relative
frequency (normalized frequency; for each
class, relative frequency is equal to the absolute
frequency of the class divided by the total
absolute frequency for all classes).
 Frequency distribution may be presented in
graphical form as either a frequency histogram
(vertical bar chart), a frequency polygon (line
chart), or an ogive (line chart showing
cumulative frequency distribution).

LINE CHART
It is a way of presenting data by connecting data points
with line segments. Line charts are usually used for
temporal data (data sets with time as one of the
variables). More than one line chart may be drawn on a
single plot area, especially if comparisons are to be
made.

BAR CHART
It is a way of presenting data by using either vertical
bars or columns (vertical/ column bar charts) or
horizontal bars (horizontal bar charts). Bar charts are
usually used for frequency histograms and categorical
data sets.

PIE CHART
It is a way of presenting data by utilizing the area of a
circle for easier comparison by dividing it into several
sectors depending on the relative frequencies. Pie
charts are usually used for categorical data.

PICTOGRAPH
It is a way of presenting data similar to a horizontal bar
chart but using symbols or pictures to represent the
magnitudes of data. Pictographs are usually used for
categorical data.
SCATTERPLOT
It is a way of presenting data by plotting the data as a
set of scattered points. It is used to determine the
relationship between two variables for bivariate data
sets.

STATISTICAL MAP
It is a way of presenting data that makes use of maps
and colors or shades to represent magnitudes of data.
Statistical maps are used in representing
geographic data.

DOTPLOT
It is a way of representing data similar to a column bar
chart that makes use of dots to represent the data
magnitudes. Dotplots are used for numerical data with
a small number of observations.

STEM-AND-LEAF DISPLAY
It is an effective and compact way to summarize raw
numerical data by using the first few digits as “stems”
and the last digits as “leaves”. Stem-and-leaf displays
are used for numerical data with a small to moderate
number of observations.

CONSTRUCTING A FREQUENCY DISTRIBUTION
Constructing a frequency distribution may be done with
the following step-by-step procedure:
1. Arrange the raw data into an array.
2. Divide the data into classes. It is preferred to
have equal interval sizes for each class. The
interval size may be chosen freely, but avoid an
interval size that is too small or too large. If one
has no idea what interval size should be used,
Sturges’ rule may be used, i.e., the
recommended number of classes K should be
equal to K = 1 + 3.322 log n, where n is the
number of observations and log is the base-10
logarithm.
3. Determine the range of the data (equal to the
difference between the largest and the smallest
values of observation) and then determine the
interval size by dividing the range by the
number of intervals.
4. Make a table listing the class intervals, the class
marks (midpoint of each class) and the number
of observations falling within their respective
intervals. You may use either absolute or
relative frequencies.
5. Plot the frequency histogram or frequency
polygon of the said frequency distribution.
6. If ogives are to be plotted, add another column
for cumulative frequency distributions (either
increasing/less than ogive or
decreasing/greater than ogive).
SAMPLE PROBLEM
Consider the following data for the annual rainfall
intensity recorded in a certain place over a period of 29
years.
43.30 53.02 63.52 45.93 48.26 50.51 49.57 43.93 46.77
59.12 54.49 47.38 40.78 45.05 50.37 54.91 51.28 39.91
53.29 67.59 58.71 42.96 55.77 41.31 58.83 48.21 44.67
67.72 43.11
Make a frequency distribution and plot its frequency
histogram, frequency polygon, and ogives for the said
data.
1. Rearrange the raw data into an array. (This step
is skipped and left as an exercise for students.)
2. Divide the data into classes. Since there are
n = 29 observations, the recommended number
of classes according to Sturges’ rule is equal to
K = 1 + 3.322 log n = 1 + 3.322 log 29 = 5.858,
which is rounded up to 6 classes.
3. Determine the class size. In this example, the
largest observation value is 67.72 while the
smallest observation value is 39.91. Therefore,
the range is equal to 27.81. The class size is
27.81/6 = 4.635.
4. Make a table showing the frequency
distribution of the given data. The following
table is from the MS Excel worksheet made for
this example.

The plots for frequency histograms (top), frequency
polygons (middle), and ogives (bottom) are shown,
using absolute (left) and relative (right) frequencies. MS
Excel is also used for plotting.
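
The same procedure can also be sketched in Python as a cross-check on the hand calculation. This is only an illustration and not the worksheet used above: it assumes that the first class starts at the smallest observation, that each class includes its lower limit but not its upper limit, and that the last class also includes the largest observation. The class limits in the original worksheet may have been chosen differently.

```python
import math

# Annual rainfall values copied from the sample problem (n = 29).
rainfall = [
    43.30, 53.02, 63.52, 45.93, 48.26, 50.51, 49.57, 43.93, 46.77,
    59.12, 54.49, 47.38, 40.78, 45.05, 50.37, 54.91, 51.28, 39.91,
    53.29, 67.59, 58.71, 42.96, 55.77, 41.31, 58.83, 48.21, 44.67,
    67.72, 43.11,
]

n = len(rainfall)
k = math.ceil(1 + 3.322 * math.log10(n))  # Sturges' rule, rounded up
low, high = min(rainfall), max(rainfall)
width = (high - low) / k                  # class size = range / classes

# Assumption: classes are [lower, upper), except the last class, which
# also includes the largest observation.
counts = [0] * k
for x in rainfall:
    index = min(int((x - low) / width), k - 1)
    counts[index] += 1

relative = [c / n for c in counts]
less_than = [sum(counts[: i + 1]) for i in range(k)]  # increasing ogive
greater_than = [sum(counts[i:]) for i in range(k)]    # decreasing ogive

print("class interval     mark    freq  rel.   <cum  >cum")
for i in range(k):
    lower = low + i * width
    upper = lower + width
    mark = (lower + upper) / 2            # class mark (midpoint)
    print(f"{lower:6.3f} - {upper:6.3f}  {mark:6.3f}  {counts[i]:4d}  "
          f"{relative[i]:.3f}  {less_than[i]:4d}  {greater_than[i]:4d}")
```

Under these assumptions the six classes contain 7, 7, 6, 3, 3, and 3 observations, which sum to 29. The `counts` column gives the histogram and frequency polygon heights, while `less_than` and `greater_than` give the points for the two ogives.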
