22/01/2020

1: Introduction to Statistics and Data Analysis
EM 7: Engineering Data Analysis
Pamantasan ng Lungsod ng Valenzuela

The Challenge

• Advances in science and engineering occur in large part through the
collection and analysis of data. Proper analysis of data can be challenging,
because scientific data are subject to random variation.
• How can one draw conclusions from the results of an experiment
when those results could have come out differently?
• The methods of statistics allow scientists and engineers to design valid
experiments and to draw reliable conclusions from the data they
produce.

The Basic Idea

• The basic idea behind all statistical methods of data analysis is to
make inferences about a population by studying a relatively small
sample from it.
• For example, consider a machine that makes steel balls for ball
bearings used in clutch systems. The specification for the diameter of
the balls is 0.65 ± 0.03 cm. During the last hour, the machine has
made 2000 balls. The quality engineer (QE) wants to know how many of
these balls meet the specification. He does not have time to measure all
2000 balls, so he draws a random sample of 80 balls, 72 of which
(90%) meet the specification. (How can he be sure that 90% of the
whole population meets the specification?)

Two fields of statistics

• INFERENTIAL STATISTICS is the process of using data analysis to make
predictions (“inferences”) from the data.
• DESCRIPTIVE STATISTICS are used to describe the basic features of the data in a
study, in the form of charts, graphs, plots, etc.

Sampling

[Figure: Population and Sample]

For example, suppose we wish to study the heights of students at PLV by
measuring a sample of 100 students.
• How should we choose the 100 students to measure?

DEFINITION
• A population is the entire collection of objects or outcomes about which
information is sought.
• A sample is a subset of a population, containing the objects or outcomes that
are actually observed.

Think of a lottery with 10,000 tickets from which 5 winners will be
chosen. What is the fairest way to choose the winners?

A simple random sample of size n is a sample chosen by a method in
which each collection of n population items is equally likely to comprise
the sample, just as in a lottery.
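The lottery above can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the function name and the seed are assumptions, and `random.sample` from the standard library is used because it draws distinct items in such a way that every possible set of n items is equally likely.

```python
import random


def draw_simple_random_sample(population, n, seed=None):
    """Return n distinct items; every size-n subset is equally likely."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(population, n)


# Lottery from the text: tickets numbered 1 to 10,000, 5 winners.
tickets = range(1, 10_001)
winners = draw_simple_random_sample(tickets, 5, seed=2020)  # seed value is arbitrary
print(sorted(winners))
```

The same call with `n=100` and a list of student ID numbers would select the 100 students for the height study.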

Sampling

EXAMPLE: A utility company wants to conduct a survey to measure the
satisfaction level of its customers in a certain town. There are 10,000
customers in the town, and utility employees want to draw a sample of
size 200 to interview personally. They obtain a list of all 10,000
customers, and number them from 1 to 10,000. They use a computer
random number generator to generate 200 random integers between 1
and 10,000 and then contact the customers who correspond to those
numbers. Is this a simple random sample?

EXAMPLE: A quality engineer wants to inspect electronic microcircuits
in order to obtain information on the proportion that are defective. She
decides to draw a sample of 100 circuits from a day’s production. Each
hour for 5 hours, she takes the 20 most recently produced circuits and
tests them. Is this a simple random sample?

Sampling

EXAMPLE: A construction engineer has just received a shipment of
1000 concrete blocks, each weighing approximately 25 kilograms. The
blocks have been delivered in a large pile. The engineer wishes to
investigate the compressive strength of the blocks by measuring the
strengths in a sample of blocks. What is the most appropriate method
of selecting a random sample?

DEFINITION
• A sample of convenience is a sample that is not drawn by a well-defined
random method.

Suppose, for example, that a quality inspector draws a random sample of 40 bolts
from a large shipment, measures the length of each, and finds that 32
of them (80%) meet a length specification. By chance, a second
inspector gets a few more good bolts, about 90% in her sample. The
proportion of good bolts in the population is likely to be close to 80%
or 90%, but it is not likely to be exactly equal to either value.

DEFINITION
• Sampling variation is the phenomenon that two or more different samples
from the same population will differ from each other.
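Sampling variation can be seen in a small simulation. The sketch below is illustrative only: the shipment size (10,000 bolts) and the true proportion of good bolts (85%) are made-up assumptions, not values from the example. Two inspectors each draw 40 bolts from the same population, and their sample proportions will typically differ.

```python
import random

rng = random.Random(1)  # fixed seed (arbitrary) so the sketch is repeatable

# Hypothetical shipment: 10,000 bolts, 85% of which meet the specification.
population = [True] * 8500 + [False] * 1500


def sample_proportion(population, n, rng):
    """Draw a simple random sample of size n and return the fraction of good bolts."""
    sample = rng.sample(population, n)
    return sum(sample) / n


first = sample_proportion(population, 40, rng)
second = sample_proportion(population, 40, rng)
print(f"inspector 1: {first:.1%}, inspector 2: {second:.1%}")
```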

Tangible vs. Conceptual Populations

DEFINITION
• A tangible population is a population consisting of actual physical objects;
it is countable and always finite. After an item in a tangible population is sampled,
the population size decreases by 1.
• A conceptual population consists of all the values that might possibly
have been observed. A simple random sample may consist of values obtained
from a process under identical experimental conditions; such a sample comes
from a conceptual population.

Example: A geologist weighs a rock several times on a sensitive scale.
Each time, the scale gives a slightly different reading. Under what
conditions can these readings be thought of as a simple random
sample? What is the population?

Independence

DEFINITION
• The items in a sample are said to be independent if knowing the values of some
of them does not help to predict the values of the others.

For example, if we draw a simple random sample of 2 items from the
population {0 0 1 1}, the sampled items are dependent. (Why?)
However, if we draw 2 items from the population {one million 0’s,
one million 1’s}, the sampled items are practically independent. (Why?)
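One way to explore the two "Why?" questions in the independence example is to compute how much the first draw changes the chances for the second draw when sampling without replacement. The helper below is a hypothetical sketch (its name and arguments are not from the slides); it uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def prob_second_is_one(zeros, ones, first=None):
    """P(second item = 1) when drawing without replacement.

    first=None gives the unconditional probability;
    first=0 or first=1 conditions on the value of the first item drawn.
    """
    total = zeros + ones
    if first is None:
        return Fraction(ones, total)
    remaining_ones = ones - (1 if first == 1 else 0)
    return Fraction(remaining_ones, total - 1)


# Small population {0 0 1 1}: knowing the first item changes the odds a lot.
small_given_one = prob_second_is_one(2, 2, first=1)    # 1/3
small_given_zero = prob_second_is_one(2, 2, first=0)   # 2/3

# Large population: the first item barely changes the odds.
large_given_one = prob_second_is_one(10**6, 10**6, first=1)
print(small_given_one, small_given_zero, float(large_given_one))
```

In the small population the conditional probabilities (1/3 and 2/3) are far from the unconditional 1/2, while in the large population they differ from 1/2 by less than one part in a million.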

Sampling

DEFINITION
• It is possible to make a population behave as though it were infinitely large by
replacing each item after it is sampled. This method is known as sampling with
replacement.

OTHER SAMPLING METHODS
• Weighted sampling is when some items are given a greater chance of being
selected than others (e.g., a lottery in which some people have more tickets than
others).
• Stratified random sampling is when the population is divided into
subpopulations known as strata, and a simple random sample is drawn from
each stratum.
• Cluster sampling is when items are drawn from the population in groups or
clusters.

Types of Data

DEFINITION
• When a numerical quantity designating how much or how many is assigned to
each item in a sample, the resulting set of values is called numerical or
quantitative.
• If sample items are placed into categories, and category names
are assigned to the sample items, the data are categorical or qualitative.

Example: In a loading test of column-to-beam welded connections,
data may be collected both on the torque applied at failure and on the
location of the failure (weld or beam).

Quantitative variable: Torque
Qualitative variable: Location (weld or beam)

Summary Statistics

Sample Mean

The sample mean, also known as the “arithmetic mean” or the
“average”, is the sum of the numbers in a sample divided by how many
there are.

DEFINITION
Let $X_1, \ldots, X_n$ be a sample. The sample mean is:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
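As a minimal sketch, the sample mean can be computed directly from its definition. The five diameters below are made-up illustrative values (they are not data from these notes), and the function name is an assumption.

```python
import math


def sample_mean(xs):
    """Sum of the sample values divided by the number of values."""
    if len(xs) == 0:
        raise ValueError("the sample must contain at least one value")
    return sum(xs) / len(xs)


# Hypothetical ball diameters in cm, for illustration only.
diameters = [0.64, 0.66, 0.65, 0.63, 0.67]
mean_diameter = sample_mean(diameters)
print(round(mean_diameter, 4))
```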

Sample Variance and Standard Deviation

The sample standard deviation is a quantity that measures the degree
of spread in a sample. The square of the sample standard deviation is
the sample variance.

DEFINITION
Let $X_1, \ldots, X_n$ be a sample. The sample variance is the quantity:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

An equivalent formula can be used:

$$s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)$$

DEFINITION
Let $X_1, \ldots, X_n$ be a sample. The sample standard deviation is the quantity:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$

An equivalent formula can be used:

$$s = \sqrt{\frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)}$$
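The two forms of the variance can be checked against each other numerically. The sketch below uses a small made-up sample (1, 2, 3, 4, 5) and hypothetical function names; Python's standard `statistics.variance` is included only as a cross-check, since it also divides by n − 1.

```python
import math
import statistics


def sample_variance(xs):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Equivalent form: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)


data = [1, 2, 3, 4, 5]                      # made-up sample for illustration
s2 = sample_variance(data)                  # 10 / 4 = 2.5
s2_short = sample_variance_shortcut(data)   # (55 - 45) / 4 = 2.5
s = math.sqrt(s2)                           # sample standard deviation
print(s2, s2_short, round(s, 4))
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose accuracy when the mean is large relative to the spread, so the definition form is usually the safer choice in code.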

Outliers

Sometimes a sample may contain a few points that are much larger or
smaller than the rest. Such points are called outliers. Outliers may result
from data entry errors; such points need to be scrutinized and should be
corrected or deleted.

Sample Median

The median is a measure of center.

DEFINITION
If n numbers are ordered from smallest to largest:
• If n is odd, the sample median is the number in position $\frac{n+1}{2}$.
• If n is even, the sample median is the average of the numbers in positions
$\frac{n}{2}$ and $\frac{n}{2} + 1$.
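The two cases of the definition translate directly into code. This is a minimal sketch with a hypothetical function name and made-up test values; note that the positions in the definition count from 1, while Python lists index from 0.

```python
def sample_median(xs):
    """Median following the odd/even rule in the definition above."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    if n % 2 == 1:
        # Position (n + 1) / 2, shifted by 1 for zero-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Average of positions n/2 and n/2 + 1 (zero-based: n/2 - 1 and n/2).
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_median = sample_median([5, 1, 3])       # ordered: 1 3 5
even_median = sample_median([4, 1, 3, 2])   # ordered: 1 2 3 4
print(odd_median, even_median)
```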

Quartiles

Just as the median divides the sample in half, quartiles divide it as nearly as
possible into quarters. A sample has 3 quartiles.

Let n represent the sample size. The quartiles are located at the following
positions in the ordered sample:
First quartile: 0.25(𝑛 + 1)
Second quartile: 0.50(𝑛 + 1)
Third quartile: 0.75(𝑛 + 1)

Note that the second quartile is the same as the median.

Example: In the article “Evaluation of Low-Temperature Properties of
HMA Mixtures” (P. Sebasly, A. Lake, and J. Epps, Journal of
Transportation Engineering, 2002-578-583), the following values of
fracture stress (in MPa) were measured for a sample of 22 mixtures of
hot-mixed asphalt (HMA).

30 75 79 80 80 105 126 138 149 179 191
223 232 236 240 242 245 247 254 274 384 470

Find the first and third quartiles.
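A worked sketch for the fracture stress example, assuming the rule given for percentiles in these notes (when the position is not an integer, average the two sample values on either side of it). With n = 22:

```latex
\begin{align*}
\text{First quartile position: } & 0.25(22 + 1) = 5.75
  && \Rightarrow\ Q_1 = \frac{X_{(5)} + X_{(6)}}{2} = \frac{80 + 105}{2} = 92.5 \\
\text{Third quartile position: } & 0.75(22 + 1) = 17.25
  && \Rightarrow\ Q_3 = \frac{X_{(17)} + X_{(18)}}{2} = \frac{245 + 247}{2} = 246
\end{align*}
```

Here $X_{(k)}$ denotes the k-th smallest value in the ordered sample; this notation is introduced only for the sketch.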

Percentiles

The pth percentile of a sample, for a number p between 0 and 100,
divides the sample so that as nearly as possible p% of the sample
values are less than the pth percentile and (100 − p)% are greater.
Let n represent the sample size. The pth percentile is located at the
following position in the ordered sample:

pth percentile: $\frac{p}{100}(n + 1)$

Note that the 25th percentile is the 1st quartile, the median is the 50th
percentile and 2nd quartile, and the 75th percentile is the 3rd quartile. If
this quantity is an integer, the sample value in that position is the percentile;
otherwise, take the average of the two sample values on either side.

Example: In the article “Evaluation of Low-Temperature Properties of
HMA Mixtures” (P. Sebasly, A. Lake, and J. Epps, Journal of
Transportation Engineering, 2002-578-583), the following values of
fracture stress (in MPa) were measured for a sample of 22 mixtures of
hot-mixed asphalt (HMA).

30 75 79 80 80 105 126 138 149 179 191
223 232 236 240 242 245 247 254 274 384 470

Find the 65th percentile.
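The percentile rule above can be sketched as a short function. The function name is hypothetical, and the sketch assumes p is a whole number so that integer arithmetic can decide whether the position p(n + 1)/100 is an integer. Statistical software often uses other interpolation rules, so results from libraries may differ slightly.

```python
def percentile(ordered, p):
    """pth percentile (whole-number p) of an already ordered sample.

    Uses the position p(n + 1)/100: if it is an integer, return the value
    in that position; otherwise average the two neighbouring values.
    """
    n = len(ordered)
    position, remainder = divmod(p * (n + 1), 100)  # floor of position, leftover
    if position < 1 or position > n or (remainder and position == n):
        raise ValueError("percentile position falls outside the sample")
    if remainder == 0:
        return ordered[position - 1]            # positions count from 1
    return (ordered[position - 1] + ordered[position]) / 2


# Fracture stress values (MPa) from the example, already sorted.
fracture_stress = [30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 191,
                   223, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470]

p65 = percentile(fracture_stress, 65)   # position 14.95: average of 236 and 240
print(p65)
```

For this sample the position is 0.65(23) = 14.95, so the 65th percentile is the average of the 14th and 15th values, 238.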

Graphical Summaries

Stem-and-leaf Plot

Example: The table below shows data from a study of the bioactivity of a certain
antifungal drug. The drug was applied to the skin of 48 subjects. After 3
hours, the amount of drug remaining in the skin was measured in
units of ng/cm². The list has been sorted in numerical order.

3 4 4 7 7 8 9 9 12 12
15 16 16 17 17 18 20 20 21 21
22 22 22 23 24 25 26 26 26 26
27 33 34 34 35 36 36 37 38 40
40 41 41 51 53 55 55 74
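A stem-and-leaf plot for two-digit data like these splits each value into a stem (the tens digit) and a leaf (the ones digit). The sketch below shows one way to build the rows; the function name is an assumption, and it handles only non-negative whole numbers with a stem width of 10.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    rows = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} {leaves}".rstrip())
    return rows


drug_remaining = [3, 4, 4, 7, 7, 8, 9, 9, 12, 12,
                  15, 16, 16, 17, 17, 18, 20, 20, 21, 21,
                  22, 22, 22, 23, 24, 25, 26, 26, 26, 26,
                  27, 33, 34, 34, 35, 36, 36, 37, 38, 40,
                  40, 41, 41, 51, 53, 55, 55, 74]

rows = stem_and_leaf(drug_remaining)
print("\n".join(rows))
```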

Stem-and-leaf Plot

Stem-and-leaf plot of the antifungal drug data:

Stem  Leaf
0     34477899
1     22566778
2     001122234566667
3     34456678
4     0011
5     1355
6
7     4

Dotplot

A dotplot is a graph that can be used to give a rough impression of the
shape of a sample. It is useful when the sample size is not too large and
when the sample contains some repeated values.

Histogram

A histogram is a graphic that gives an idea of the “shape” of a sample,
indicating regions where sample points are concentrated and regions
where they are sparse.

Example: The table below shows PM emissions (g/gal) of 62 vehicles driven at high
altitude.

7.50 6.28 6.07 5.23 5.54 3.46 2.44 3.01 13.63 13.02 23.38 9.24 3.22
2.06 4.04 17.11 12.26 19.91 8.50 7.81 7.18 6.95 18.64 7.10 6.04 5.66
8.86 4.40 3.57 4.35 3.84 2.37 3.81 5.32 5.84 2.85 4.68 1.85 9.14
8.67 9.52 2.68 10.14 9.20 7.31 2.09 6.32 6.53 6.32 2.01 5.91 5.60
5.61 1.50 6.46 5.29 5.64 2.07 1.11 3.32 1.83 7.56

Construct a frequency table. The data are counted into several
class intervals. There is no hard and fast rule for deciding
how many class intervals to use.

Class interval (g/gal)   Frequency   Relative frequency
 1 ≤ x < 3               12          0.1935
 3 ≤ x < 5               11          0.1774
 5 ≤ x < 7               18          0.2903
 7 ≤ x < 9                9          0.1452
 9 ≤ x < 11               5          0.0806
11 ≤ x < 13               1          0.0161
13 ≤ x < 15               2          0.0323
15 ≤ x < 17               0          0.0000
17 ≤ x < 19               2          0.0323
19 ≤ x < 21               1          0.0161
21 ≤ x < 23               0          0.0000
23 ≤ x < 25               1          0.0161
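The frequency table above can be reproduced with a short counting routine. This is a sketch under the same choices as the table (12 classes of width 2 starting at 1 g/gal); the function name and its parameters are assumptions made for illustration.

```python
emissions = [
    7.50, 6.28, 6.07, 5.23, 5.54, 3.46, 2.44, 3.01, 13.63, 13.02, 23.38, 9.24, 3.22,
    2.06, 4.04, 17.11, 12.26, 19.91, 8.50, 7.81, 7.18, 6.95, 18.64, 7.10, 6.04, 5.66,
    8.86, 4.40, 3.57, 4.35, 3.84, 2.37, 3.81, 5.32, 5.84, 2.85, 4.68, 1.85, 9.14,
    8.67, 9.52, 2.68, 10.14, 9.20, 7.31, 2.09, 6.32, 6.53, 6.32, 2.01, 5.91, 5.60,
    5.61, 1.50, 6.46, 5.29, 5.64, 2.07, 1.11, 3.32, 1.83, 7.56,
]


def frequency_table(values, start, width, n_classes):
    """Count values into equal-width classes [lower, upper) and add relative frequencies."""
    counts = [0] * n_classes
    for x in values:
        k = int((x - start) // width)          # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"value {x} falls outside the class intervals")
        counts[k] += 1
    n = len(values)
    return [(start + i * width, start + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]


table = frequency_table(emissions, start=1, width=2, n_classes=12)
for lower, upper, count, rel in table:
    print(f"{lower:>2} <= x < {upper:<2}  {count:>2}  {rel:.4f}")
```

Drawing one rectangle per row, with height equal to either the frequency or the relative frequency, gives the histogram.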

Histogram

To construct a histogram: (1) determine the number of classes to use and construct intervals of equal
width; (2) compute the frequency and relative frequency for each class; and (3) draw a rectangle for each
class. The heights of the rectangles may be set equal to the frequencies or to the relative frequencies.

Skewness

Skewness refers to the asymmetry of a histogram; a symmetric histogram has its right
half a mirror image of its left half. A histogram skewed to the left, or negatively skewed,
has a long left-hand tail. On the other hand, a histogram skewed to the right, or
positively skewed, has a long right-hand tail.

Histogram Modes

A mode of a histogram is a “peak”, or local maximum, in the histogram. A histogram is
said to be unimodal if it has only one peak or mode, and bimodal if it has two clearly
distinct modes.
A bimodal histogram can indicate that the sample can be divided into two subsamples that
differ from each other in some scientifically important way.
