Notes # 1
Statistics
Engineers and scientists are constantly exposed to collections of facts, or data, both in
their professional capacities and in everyday activities.
When desired information is available for all objects in the population, we have a census.
Constraints on time, money, and other scarce resources usually make a census impractical
or infeasible. Instead, a subset of the population—a sample—is selected in some
prescribed manner. Thus we might obtain a sample of bearings from a particular
production run as a basis for investigating whether bearings are conforming to
manufacturing specifications, or we might select a sample of last year’s engineering
graduates to obtain feedback about the quality of the engineering curricula.
Variables
We are usually interested only in certain characteristics of the objects in a population.
Each such characteristic is called a variable; for example, the number of flaws on the
surface of each casing, the thickness of each capsule wall, the gender of an engineering
graduate, the age at which the individual graduated, and so on.
A variable may be qualitative (categorical), such as gender or type of malfunction, or it
may be quantitative (numerical) in nature. Quantitative variables can be further
classified as discrete or continuous. Discrete variables assume values that can
be counted (for example, number of books, number of defects, etc.). Continuous
variables can assume all values between any two specific values (for example: length,
time, etc.).
Scales of Measurement
Data can be measured at different scales depending on the type of variable and the
amount of detail that is collected. A widely used classification divides the scales of
measurement into four levels, listed here from lowest to highest: nominal, ordinal,
interval, and ratio. Qualitative or categorical data are measured on the nominal or
ordinal scale, whereas quantitative or numerical data are measured on the interval or
ratio scale.
Nominal data are categorical data on which no ordering or ranking can be imposed (e.g.
gender: male or female). Ordinal data are categorical data that can be ordered or ranked
(e.g. a rating scale of poor, good, excellent; size: small, medium, large). Interval data
are numerical data that can be ranked and for which differences between values are
meaningful; however, there is no true starting point or absolute zero, so ratios of
values are not meaningful (e.g. elevation relative to sea level, temperature in degrees
Celsius or Fahrenheit, angle measure, clock time). Ratio data are numerical data that
can be ranked, for which both differences and ratios between values are meaningful, and
for which a true starting point or absolute zero exists (e.g. length, mass, volume).
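The difference between the interval and ratio scales can be checked with a little arithmetic. The following Python sketch is an illustration only; the temperatures, lengths, and function name are assumptions chosen for the example and do not come from the notes. It shows that a ratio of Celsius temperatures changes when the unit changes, while a ratio of lengths does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Illustrative values (assumed for this example).
warm_c, cool_c = 20.0, 10.0

# Interval scale: the ratio depends on the unit, so it carries no meaning.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Ratio scale: lengths have a true zero, so the ratio survives a unit change.
long_m, short_m = 2.0, 0.5
ratio_metres = long_m / short_m
ratio_centimetres = (long_m * 100) / (short_m * 100)

print(ratio_celsius, ratio_fahrenheit)    # the two ratios disagree
print(ratio_metres, ratio_centimetres)    # the two ratios agree
```

Saying that 20 degrees Celsius is "twice as hot" as 10 degrees Celsius is therefore not justified, whereas saying that a 2 m rod is four times as long as a 0.5 m rod is.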
Branches of Statistics
An investigator who has collected data may wish simply to summarize and describe
important features of the data. This entails using methods from descriptive statistics.
Some of these methods are graphical in nature; the construction of histograms, bar charts,
pie charts, line graphs, and frequency distribution tables are some of the primary examples.
Having obtained a sample from a population, an investigator would frequently like to use
sample information to draw some type of conclusion (make an inference of some sort)
about the population. Techniques for generalizing from a sample to a population are
gathered within the branch of our discipline called inferential statistics.
Applications of Statistics
Collecting Data
Statistics deals not only with the organization and analysis of data once it has been
collected but also with the development of techniques for collecting the data. If data is
not properly collected, an investigator may not be able to answer the questions under
consideration with a reasonable degree of confidence.
When data collection entails selecting individuals or objects from a frame (a list of the
individuals or objects available for selection), the simplest method for ensuring a
representative selection is to take a simple random sample. This is a sample chosen so
that every subset of the specified size (e.g., a sample of size 100) has the same chance
of being selected.
For example, if the frame consists of 1,000 serial numbers, the numbers 1, 2, . . . , up to
1,000 could be placed on identical slips of paper. After placing these slips in a box and
thoroughly mixing, slips could be drawn one by one until the requisite sample size has
been obtained.
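The slips-in-a-box procedure can be imitated in software. The sketch below is a minimal illustration in Python using the standard library's random.sample, which draws without replacement; the function name, the sample size of 100, and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct items from frame without replacement."""
    rng = random.Random(seed)  # seed is optional; fixing it makes the draw repeatable
    return rng.sample(frame, sample_size)

# Frame of 1,000 serial numbers, as in the example above.
serial_numbers = list(range(1, 1001))

# Sample size and seed are assumed here for illustration.
sample = simple_random_sample(serial_numbers, 100, seed=42)
print(sorted(sample)[:10])  # first few selected serial numbers
```

Because the draw is without replacement, no serial number can appear twice, just as a slip cannot be drawn twice from the box.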
Sometimes alternative sampling methods can be used to make the selection process
easier. One such method is systematic random sampling, in which a starting point is
chosen at random and subsequent selections are made at regular intervals. For example,
suppose you want to sample 8 houses from a street of 120 houses. Since 120/8 = 15, a
random starting point between 1 and 15 is chosen, and every 15th house after it is
selected. If the random starting point is 11, then the houses selected are 11, 26, 41,
56, 71, 86, 101, and 116.
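The house example can be reproduced with a short Python sketch. It is an illustration only: the function name and parameters are assumptions, and it assumes the population size is an exact multiple of the sample size, as it is here (120 = 8 x 15).

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start, where k = population_size // sample_size.

    Assumes population_size is an exact multiple of sample_size.
    """
    interval = population_size // sample_size
    if start is None:
        # Random starting point between 1 and the sampling interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    return [start + k * interval for k in range(sample_size)]

# Reproduce the example: 8 houses from 120, with the starting point fixed at 11.
houses = systematic_sample(120, 8, start=11)
print(houses)  # [11, 26, 41, 56, 71, 86, 101, 116]
```

Leaving start unset lets the function pick the starting point at random, which is how the method would be used in practice.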
Measures of Center for Ungrouped Data
One way of describing a set of data is by stating a single numerical value around which
the values in the distribution tend to cluster. There are three measures of center: the
mean, the median, and the mode.
The mean is the sum of the values divided by the number of values. It is also called the
average. Thus if x1, x2, . . . , xn denote the n values, then the mean x̄ is

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
The median x̃ is the middle value of an ordered sequence of numbers. To find the median,
arrange the numbers from smallest to largest. If the number of values is odd, then the
median is the middle value. If the number of values is even, then the median is the
average of the two middle values.
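The definitions of the mean and the median translate directly into code. The following Python sketch is one possible implementation; the function names and the two small data sets are made up for illustration and are not taken from the notes.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

# Made-up data sets, one with an odd count and one with an even count.
odd_data = [12, 3, 8, 10, 7]
even_data = [10, 3, 8, 7]

print(mean(odd_data), median(odd_data))    # 8.0 8
print(mean(even_data), median(even_data))  # 7.0 7.5
```

Sorting inside median means the caller does not need to arrange the numbers from smallest to largest first.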
The mode x̂ of a set of values is the value that occurs most often. A set of values may
have more than one mode or no mode.
Example 4: Find the mode of 15, 21, 26, 25, 21, 23, 28, 21
Example 5: Find the mode of 12, 15, 18, 26, 15, 9, 12, 27
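Examples 4 and 5 can be checked with a short Python sketch that counts how often each value occurs. The function name is an assumption, and so is the convention that a data set in which every value occurs exactly once has no mode; other conventions exist.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)

example_4 = [15, 21, 26, 25, 21, 23, 28, 21]
example_5 = [12, 15, 18, 26, 15, 9, 12, 27]

print(modes(example_4))  # [21]
print(modes(example_5))  # [12, 15]
```

In Example 4 the value 21 occurs three times, so it is the only mode. In Example 5 both 12 and 15 occur twice, so the data set has two modes.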
Exercises
For each set of numbers, calculate the mean, the median, and the mode.