Engineering Data Analysis
Point Estimation
• 1. Descriptive Statistics
• 2. Exploratory Data Analysis
• 3. Order Statistics
• 4. Maximum Likelihood Estimation
• 5. A Simple Regression Problem
• 6. Asymptotic Distributions of Maximum Likelihood Estimators
• 7. Sufficient Statistics
• 8. Bayesian Estimation
• 9. More Bayesian Concepts
6.1. Descriptive Statistics

Descriptive statistics refers to a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution. These methods provide an overview of the data and help identify patterns and relationships.

In the previous chapter, we considered probability distributions of random variables whose space S contains a countable number of outcomes: either a finite number of outcomes or outcomes that can be put into a one-to-one correspondence with the positive integers. Such a random variable is said to be of the discrete type, and its distribution of probabilities is of the discrete type.

Of course, many experiments or observations of random phenomena do not have integers or other discrete numbers as outcomes, but instead are measurements selected from an interval of numbers. For example, you could find the length of time that it takes when waiting in line to buy frozen yogurt. Or the weight of a “1-pound” package of hot dogs could be any number between 0.94 pounds and 1.25 pounds. The weight of a miniature Baby Ruth candy bar could be any number between 20 and 27 grams. Even though such times and weights could be selected from an interval of values, they are generally rounded off so that the data often look like discrete data. If, conceptually, the measurements could come from an interval of possible outcomes, we call them data from a distribution of the continuous type or, more simply, continuous-type data.

Given a set of continuous-type data, we shall group the data into classes and then construct a histogram of the grouped data. This will help us better visualize the data. The following guidelines and terminology will be used to group continuous-type data into classes of equal length (these guidelines can also be used for sets of discrete data that have a large range).

1. Determine the largest (maximum) and smallest (minimum) observations. The range is the difference, R = maximum − minimum.

2. In general, select from k = 5 to k = 20 classes, which are nonoverlapping intervals, usually of equal length. These classes should cover the interval from the minimum to the maximum.
3. Each interval begins and ends halfway between two possible values of the measurements, which have been rounded off to a given number of decimal places.

4. The first interval should begin about as much below the smallest value as the last interval ends above the largest.

5. The intervals are called class intervals and the boundaries are called class boundaries. We shall denote these k class intervals by (c0, c1], (c1, c2], ..., (ck−1, ck].

6. The class limits are the smallest and the largest possible observed (recorded) values in a class.

7. The class mark is the midpoint of a class.

A frequency table is constructed that lists the class intervals, the class limits, a tabulation of the measurements in the various classes, the frequency fi of each class, and the class marks. A column is sometimes used to construct a relative frequency (density) histogram. With class intervals of equal length, a frequency histogram is constructed by drawing, for each class, a rectangle having as its base the class interval and a height equal to the frequency of the class. For the relative frequency histogram, each rectangle has an area equal to the relative frequency fi/n of the observations for the class. That is, the function defined by

h(x) = fi / [n(ci − ci−1)],  for ci−1 < x ≤ ci,  i = 1, 2, ..., k,

is called a relative frequency histogram or density histogram, where fi is the frequency of the ith class and n is the total number of observations. Clearly, if the class intervals are of equal length, the relative frequency histogram h(x) is proportional to the frequency histogram fi, for ci−1 < x ≤ ci, i = 1, 2, ..., k. The frequency histogram should be used only in those situations in which the class intervals are of equal length. A relative frequency histogram can be treated as an estimate of the underlying pdf.
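To make the construction concrete, the grouping guidelines and the definition of h(x) can be turned into a few lines of code. The following Python sketch is ours, not the text's; the function name relative_frequency_histogram and its parameters (the left boundary c0, the number of classes k, and the common class width) are choices made for illustration:

    def relative_frequency_histogram(data, k, c0, width):
        """Group data into k equal-length class intervals
        (c0, c0 + width], (c0 + width, c0 + 2*width], ... and return,
        for each class, its boundaries (lo, hi], its frequency f_i,
        and the density-histogram height h(x) = f_i / (n * width)."""
        n = len(data)
        table = []
        for i in range(k):
            lo = c0 + i * width                          # class boundary c_{i-1}
            hi = lo + width                              # class boundary c_i
            f_i = sum(1 for x in data if lo < x <= hi)   # tally this class
            table.append(((lo, hi), f_i, f_i / (n * width)))
        return table

Because each rectangle has area fi · width / (n · width) = fi/n, the areas sum to 1 whenever the classes cover all n observations, which is exactly the property that lets h(x) be read as an estimate of a pdf.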
Example 6.1-1 The weights in grams of 40 miniature Baby Ruth candy bars, with the weights ordered, are given in Table 6.1-1.

Table 6.1-1 Candy bar weights

20.5 20.7 20.8 21.0 21.0 21.4 21.5 22.0 22.1 22.5
22.6 22.6 22.7 22.7 22.9 22.9 23.1 23.3 23.4 23.5
23.6 23.6 23.6 23.9 24.1 24.3 24.5 24.5 24.8 24.8
24.9 24.9 25.1 25.1 25.2 25.6 25.8 25.9 26.1 26.7

We shall group these data and then construct a histogram to visualize the distribution of weights. The range of the data is R = 26.7 − 20.5 = 6.2. The interval (20.5, 26.7) could be covered with k = 8 classes of width 0.8 or with k = 9 classes of width 0.7. (There are other possibilities.) We shall use k = 7 classes of width 0.9. The first class interval will be (20.45, 21.35) and the last class interval will be (25.85, 26.75). The data are grouped in Table 6.1-2.

Table 6.1-2 Frequency table of candy bar weights

Class Interval   Class Limits   Frequency (fi)   h(x)   Class Mark
(20.45, 21.35)   20.5–21.3      5                5/36   20.9
(21.35, 22.25)   21.4–22.2      4                4/36   21.8
(22.25, 23.15)   22.3–23.1      8                8/36   22.7
(23.15, 24.05)   23.2–24.0      7                7/36   23.6
(24.05, 24.95)   24.1–24.9      8                8/36   24.5
(24.95, 25.85)   25.0–25.8      5                5/36   25.4
(25.85, 26.75)   25.9–26.7      3                3/36   26.3

A relative frequency histogram of these data is given in Figure 6.1-1. Note that the total area of this histogram is equal to 1. We could also construct a frequency histogram in which the heights of the rectangles would be equal to the frequencies of the classes. The shape of the two histograms is the same. Later we will see the reason for preferring the relative frequency histogram. In particular, we will be superimposing on the relative frequency histogram the graph of a pdf.
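Reusing the relative_frequency_histogram sketch above on the 40 weights of Table 6.1-1, with c0 = 20.45, k = 7, and width 0.9 as in the example, reproduces the frequency and h(x) columns of Table 6.1-2 (the heights are fi/36 because n · width = 40 × 0.9 = 36):

    weights = [20.5, 20.7, 20.8, 21.0, 21.0, 21.4, 21.5, 22.0, 22.1, 22.5,
               22.6, 22.6, 22.7, 22.7, 22.9, 22.9, 23.1, 23.3, 23.4, 23.5,
               23.6, 23.6, 23.6, 23.9, 24.1, 24.3, 24.5, 24.5, 24.8, 24.8,
               24.9, 24.9, 25.1, 25.1, 25.2, 25.6, 25.8, 25.9, 26.1, 26.7]

    for (lo, hi), f_i, h in relative_frequency_histogram(weights, k=7,
                                                         c0=20.45, width=0.9):
        print(f"({lo:.2f}, {hi:.2f}]  f_i = {f_i}  h(x) = {h:.4f}")
    # Frequencies 5, 4, 8, 7, 8, 5, 3 sum to 40, and the total area is 1.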
Suppose that we now consider the situation in which
we actually perform a certain random experiment n times,
obtaining n observed values of the random variable—say,
x1,x2,...,xn. Often the collection is referred to as a sample. It
is possible that some of these values might be the same, but
we do not worry about this at this time. We artificially
create a probability distribution by placing the weight 1/n
on each of these x-values. Note that these weights are
positive and sum to 1, so we have a distribution we call the
empirical distribution, since it is determined by the data
x1,x2,...,xn. The mean of the empirical distribution is
which is the arithmetic mean of the observations
x1,x2,...,xn. We denote this mean by x and call it the sample
mean (or mean of the sample x1, x2, ... , xn). That is, the
sample mean is
which is, in some sense, an estimate of μ if the latter is unknown. Likewise, the variance of the empirical distribution is

v = (1/n) Σ_{i=1}^{n} (x_i − x̄)²,

which can be written as

v = (1/n) Σ_{i=1}^{n} x_i² − x̄².

The sample variance, however, is defined with n − 1 rather than n in the denominator, because we will see later that, in some sense, s² is a better estimate of an unknown σ² than is v. Thus, the sample variance is

s² = [n/(n − 1)] v = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)².

REMARK It is easy to expand the sum of squares; we have

Σ_{i=1}^{n} (x_i − x̄)² = Σ_{i=1}^{n} x_i² − 2x̄ Σ_{i=1}^{n} x_i + n x̄² = Σ_{i=1}^{n} x_i² − n x̄²,

so that

s² = (Σ_{i=1}^{n} x_i² − n x̄²) / (n − 1).

Many find that the right-hand expression makes the computation easier than first taking the n differences, x_i − x̄, i = 1, 2, ..., n; squaring them; and then summing. There is another advantage when x̄ has many digits to the right of the decimal point. If that is the case, then x_i − x̄ must be rounded off, and that creates an error in the sum of squares. In the easier form, that rounding off is not necessary until the computation is completed. Of course, if you are using a statistical calculator or statistics package on the computer, all of these computations are done for you.

The sample standard deviation, s = √(s²) ≥ 0, is a measure of how dispersed the data are from the sample mean. At this stage of your study of statistics, it is difficult to get a good understanding or meaning of the standard deviation s, but you can roughly think of it as the average distance of the values x1, x2, ..., xn from the mean x̄. This is not true exactly, for, in general,

(1/n) Σ_{i=1}^{n} |x_i − x̄| ≠ s,

but it is fair to say that s is somewhat larger, yet of the same magnitude, as the average of the distances of x1, x2, ..., xn from x̄.
There is an alternative way of computing s², because s² = [n/(n − 1)] v and v = (1/n) Σ_{i=1}^{n} x_i² − x̄², so that

s² = Σ_{i=1}^{n} x_i² / (n − 1) − (Σ_{i=1}^{n} x_i)² / [n(n − 1)] = (Σ_{i=1}^{n} x_i² − n x̄²) / (n − 1).
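The remark and the alternative form can be checked numerically. Here is a minimal Python sketch (the helper names are ours, not the text's) computing s² both from the definition and from the shortcut (Σ xi² − n x̄²)/(n − 1):

    def sample_variance(xs):
        """s^2 by the definition: sum of squared deviations over n - 1."""
        n = len(xs)
        xbar = sum(xs) / n
        return sum((x - xbar) ** 2 for x in xs) / (n - 1)

    def sample_variance_shortcut(xs):
        """s^2 by the expanded form: (sum of x_i^2 - n*xbar^2) / (n - 1)."""
        n = len(xs)
        xbar = sum(xs) / n
        return (sum(x * x for x in xs) - n * xbar * xbar) / (n - 1)

    data = [20.5, 20.7, 20.8, 21.0, 21.0]
    print(sample_variance(data), sample_variance_shortcut(data))  # both about 0.045

One caveat: the rounding advantage described above concerns hand computation. In floating-point arithmetic the shortcut form can lose precision through cancellation when x̄ is large relative to the spread of the data, which is why statistical software usually computes s² from the deviations directly.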
Example 6.1-2 Rolling a fair six-sided die five times could result in the following sample of n = 5 observations:

x1 = 3, x2 = 1, x3 = 2, x4 = 6, x5 = 3.

In this case,

x̄ = (3 + 1 + 2 + 6 + 3)/5 = 3

and

s² = [(3 − 3)² + (1 − 3)² + (2 − 3)² + (6 − 3)² + (3 − 3)²]/(5 − 1) = 14/4 = 3.5.

It follows that s = √3.5 ≈ 1.87. We had noted that s can roughly be thought of as the average distance that the x-values are away from the sample mean x̄. In this example, the distances from the sample mean, x̄ = 3, are 0, 2, 1, 3, 0, with an average of 1.2, which is less than s ≈ 1.87. In general, s will be somewhat larger than this average distance.
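The arithmetic in Example 6.1-2 is easy to verify directly; here is a quick check in Python (only the standard math module is needed):

    import math

    xs = [3, 1, 2, 6, 3]
    n = len(xs)
    xbar = sum(xs) / n                               # 3.0
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # 14/4 = 3.5
    s = math.sqrt(s2)                                # about 1.87
    avg_dist = sum(abs(x - xbar) for x in xs) / n    # 1.2
    print(xbar, s2, round(s, 2), avg_dist)           # 3.0 3.5 1.87 1.2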