
Probability, Statistics, and Data Analysis

Notes # 1

Statistics

Engineers and scientists are constantly exposed to collections of facts, or data, both in
their professional capacities and in everyday activities.

Statistics is a branch of mathematics that deals with the collection, classification,
description, analysis, and interpretation of data obtained from the conduct of surveys and
experiments. Its fundamental purpose is to draw inferences about the numerical
properties of a population.

Two important terms in studying statistics are population and sample.

Populations, Samples, and Processes

An investigation will typically focus on a well-defined collection of objects constituting a
population of interest. For example, the population might consist of all gelatin capsules
of a particular type produced during a specified period.

When desired information is available for all objects in the population, we have a census.

Constraints on time, money, and other scarce resources usually make a census impractical
or infeasible. Instead, a subset of the population—a sample—is selected in some
prescribed manner. Thus we might obtain a sample of bearings from a particular
production run as a basis for investigating whether bearings are conforming to
manufacturing specifications, or we might select a sample of last year’s engineering
graduates to obtain feedback about the quality of the engineering curricula.

As mentioned previously, because of the difficulties of obtaining information about all
units in a population, it is common to use a small, representative subset of the population,
called a sample. The actual value of a numerical characteristic of the population is called
a parameter. An estimate of a parameter computed from a sample is called a statistic. The
goal of statistics is precisely this: to estimate a parameter using a statistic. Because our
results are merely estimates, they will usually differ from the true value. The difference
between a statistic and the parameter it estimates is called sampling error.
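
The relationship between a parameter, a statistic, and sampling error can be sketched in
Python. This is only an illustration under stated assumptions: the population of 10,000
capsule wall thicknesses is simulated (hypothetical values, not real data), and the sample
size of 50 is arbitrary.

```python
import random
from statistics import mean

# Hypothetical population: wall thicknesses (in mm) of 10,000 capsules.
# The values are simulated purely for illustration.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
population = [rng.gauss(0.10, 0.01) for _ in range(10_000)]

# Parameter: the true population mean (usually unknown in practice).
parameter = mean(population)

# Statistic: the mean of a sample of 50 units drawn without replacement.
sample = rng.sample(population, 50)
statistic = mean(sample)

# Sampling error: the difference between the statistic and the parameter.
sampling_error = statistic - parameter

print(f"parameter (population mean): {parameter:.5f}")
print(f"statistic (sample mean):     {statistic:.5f}")
print(f"sampling error:              {sampling_error:+.5f}")
```

Drawing a different sample would give a different statistic and therefore a different
sampling error, while the parameter stays fixed.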

Variables

We are usually interested only in certain characteristics of the objects in a population.
Each such characteristic is called a variable; examples are the number of flaws on the
surface of each casing, the thickness of each capsule wall, the gender of an engineering
graduate, the age at which the individual graduated, and so on.

A variable may be qualitative (categorical), such as gender or type of malfunction, or it
may be quantitative (numerical) in nature. Quantitative variables can be further
classified as discrete or continuous. Discrete variables assume values that can
be counted (for example, number of books, number of defects, etc.). Continuous
variables can assume all values between any two specific values (for example, length,
time, etc.).

Scales of Measurement

Data can be measured at different scales depending on the type of variable and the
amount of detail that is collected. A widely used method for categorizing the different
types of measurement breaks them down into four groups, from the lowest level to the
highest level. Qualitative or categorical data are on either the nominal or the ordinal
scale, whereas quantitative or numerical data are on either the interval or the ratio scale.

Nominal data are categorical data on which no ordering or ranking can be imposed
(e.g. gender: male or female).

Ordinal data are categorical data that can be ordered or ranked (e.g. rating scale: poor,
good, excellent; size: small, medium, large).

Interval data are numerical data that can be ranked and for which differences between
values are meaningful; however, there is no true starting point or absolute zero (e.g.
elevation relative to sea level, temperature in degrees Celsius or Fahrenheit, angle
measure, time, etc.).

Ratio data are numerical data that can be ranked and for which both differences and ratios
between values are meaningful, because there exists a true starting point or absolute zero
(e.g. length, mass, volume, etc.).

Branches of Statistics

An investigator who has collected data may wish simply to summarize and describe
important features of the data. This entails using methods from descriptive statistics.

Some of these methods are graphical in nature; histograms, bar charts, pie charts, line
graphs, and frequency distribution tables are some of the primary examples.

Other descriptive methods involve calculation of numerical summary measures, such as
means, standard deviations, and correlation coefficients. The wide availability of
statistical software packages has made these tasks much easier to carry out than they used
to be.

Having obtained a sample from a population, an investigator would frequently like to use
sample information to draw some type of conclusion (make an inference of some sort)
about the population. Techniques for generalizing from a sample to a population are
gathered within the branch of our discipline called inferential statistics.

Applications of Statistics

These days statistical methodology is employed by investigators in virtually all
disciplines, including such areas as molecular biology (analysis of microarray data),
ecology (describing quantitatively how individuals in various animal and plant
populations are spatially distributed), materials engineering (studying properties of
various treatments to retard corrosion), marketing (developing market surveys and
strategies for marketing new products), public health (identifying sources of diseases and
ways to treat them), and civil engineering (assessing the effects of stress on structural
elements and the impacts of traffic flows on communities).

Collecting Data

Statistics deals not only with the organization and analysis of data once it has been
collected but also with the development of techniques for collecting the data. If data is
not properly collected, an investigator may not be able to answer the questions under
consideration with a reasonable degree of confidence.

When data collection entails selecting individuals or objects from a frame, the simplest
method for ensuring a representative selection is to take a simple random sample. This is
one for which any particular subset of the specified size (e.g., a sample of size 100) has
the same chance of being selected.

For example, if the frame consists of 1,000 serial numbers, the numbers 1, 2, . . . , up to
1,000 could be placed on identical slips of paper. After placing these slips in a box and
thoroughly mixing, slips could be drawn one by one until the requisite sample size has
been obtained.

Alternatively (and much to be preferred), a table of random numbers or a scientific
calculator’s random number generator could be employed.
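
A software random number generator serves the same purpose. The sketch below uses Python's
standard `random` module to draw a simple random sample of size 100 from a frame of 1,000
serial numbers; the seed value is an arbitrary assumption, included only so the result can
be reproduced.

```python
import random

# Frame: serial numbers 1 through 1,000, as in the slips-of-paper example.
frame = list(range(1, 1001))

# Fixed seed (an arbitrary choice) so the same sample is drawn on every run.
rng = random.Random(42)

# Draw 100 serial numbers without replacement; every subset of size 100
# has the same chance of being selected.
srs = rng.sample(frame, 100)

print(sorted(srs)[:10])  # first ten selected serial numbers, in order
```

Sampling without replacement mirrors drawing slips from the box one by one: no serial
number can be selected twice.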

Sometimes alternative sampling methods can be used to make the selection process
easier. One such method is systematic random sampling, in which a starting point is
chosen at random and subsequent selections are made at regular intervals. For example,
suppose you want to sample 8 houses from a street of 120 houses. Since 120/8 = 15, every
15th house is chosen after a random starting point between 1 and 15. If the random
starting point is 11, then the houses selected are 11, 26, 41, 56, 71, 86, 101, and 116.
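
The house example can be reproduced with a short Python sketch. The function name
`systematic_sample` and its parameters are hypothetical, chosen for illustration, and the
sketch assumes the population size is an exact multiple of the sample size, as it is here.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the 1-based positions chosen by systematic random sampling.

    Assumes population_size is an exact multiple of sample_size.
    """
    interval = population_size // sample_size  # 120 // 8 = 15
    if start is None:
        # Random starting point between 1 and the interval, inclusive.
        start = random.randint(1, interval)
    # Take every `interval`-th unit after the starting point.
    return [start + k * interval for k in range(sample_size)]

# With the starting point fixed at 11, as in the example above:
print(systematic_sample(120, 8, start=11))
# [11, 26, 41, 56, 71, 86, 101, 116]
```

Calling `systematic_sample(120, 8)` without `start` picks the starting house at random.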

There are many more sampling techniques employed in practice. Whatever the technique,
the sample we choose should be representative of the population we are studying.
Employing some degree of randomness in the selection process helps achieve that goal by
guarding against systematic bias in which units are selected.

Measures of Center for Ungrouped Data

One way of describing a set of data is by stating a single numerical value associated with
it. This value is where all the other values in a distribution tend to cluster. There are three
measures of center: the mean, the median, and the mode.

The mean is the sum of the values divided by the number of values. It is also called the
average. Thus if x_1, x_2, . . ., x_n denote the n values, then the mean x̄ is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 1: Find the mean of 19, 13, 15, 25, and 18
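
One way to check Example 1 is to apply the definition directly: add the values and divide
by how many there are. The Python sketch below does exactly that; the variable names are
illustrative.

```python
values = [19, 13, 15, 25, 18]

total = sum(values)   # 19 + 13 + 15 + 25 + 18 = 90
n = len(values)       # 5 values
x_bar = total / n     # 90 / 5

print(x_bar)  # 18.0
```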

The median x̃ refers to the midpoint in a sequence of numbers. To find the median,
arrange the numbers from smallest to largest. If the number of values is odd, then the
median is the middle value. If the number of values is even, then the median is the
average of the two middle values.

Example 2: Find the median of 19, 29, 36, 15, and 20

Example 3: Find the median of 67, 28, 92, 37, 81, 75
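
The procedure for the median (sort, then take the middle value or the average of the two
middle values) can be sketched in Python as follows. The function name `median` is an
illustrative choice, and the sketch assumes a non-empty list of numbers.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the middle value is the median.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Example 2: sorted values are 15, 19, 20, 29, 36
print(median([19, 29, 36, 15, 20]))      # 20

# Example 3: sorted values are 28, 37, 67, 75, 81, 92
print(median([67, 28, 92, 37, 81, 75]))  # 71.0
```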

The mode x̂ of a set of values is the value that occurs most often. A set of values may
have more than one mode or no mode.

Example 4: Find the mode of 15, 21, 26, 25, 21, 23, 28, 21

Example 5: Find the mode of 12, 15, 18, 26, 15, 9, 12, 27

Example 6: Find the mode of 4, 8, 15, 21, 23
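
Examples 4 to 6 cover the three possible cases: one mode, more than one mode, and no mode.
The Python sketch below counts how often each value occurs. It assumes the convention
suggested by these notes, that a set of values has no mode when no value is repeated; the
function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)        # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([15, 21, 26, 25, 21, 23, 28, 21]))  # [21]       (Example 4)
print(modes([12, 15, 18, 26, 15, 9, 12, 27]))   # [12, 15]   (Example 5)
print(modes([4, 8, 15, 21, 23]))                # []         (Example 6)
```

Other conventions exist: Python's built-in `statistics.multimode`, for instance, returns
every value when none is repeated, so its answer for Example 6 would differ from the one
shown here.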

Exercises

For each set of numbers, calculate the mean, the median, and the mode.

1. 18, 24, 17, 21, 24, 16, 29, 18

2. 75, 87, 49, 68, 75, 84, 98

3. 55, 47, 38, 66, 56, 64, 44, 63, 39
