
# Populations, Samples, and Processes.

## Populations and Samples

Random Variables and Statistical Populations
Branches of statistics.
Types of data.
Describing data by tables and graphs.
Measures of Location
Measures of Variability

## STAT400. Chapter 1. Overview and Descriptive Statistics

Natalia Tchetcherina

January 26, 28

## Populations, Samples, and Processes.

Populations and Samples
Random Variables and Statistical Populations
Branches of statistics.
Types of data.
Categorical data
Numerical data.
Describing data by tables and graphs.
Categorical Data.
Discrete Data.
Continuous data.
Measures of Location
Measures of Variability

Statistics.
Statistics as a subject provides a body of principles and
methodology for designing the process of data collection,
summarizing and interpreting the data, and drawing conclusions or
generalities.
Examples.
Employment. Monthly, as part of the Current Population Survey,
the Bureau of Census collects information about employment
status from a sample of about 65,000 households. Households are
contacted on a rotating basis with three-fourths of the sample
remaining the same for any two consecutive months.
The survey data are analyzed by the Bureau of Labor Statistics,
which reports monthly unemployment rates.

Examples.
Gallup Poll. The best known of the national polls, it produces
estimates of the percentage of the popular vote for each candidate
based on interviews with a minimum of 1500 adults. Beginning
several months before the presidential election, results are regularly
published. These reports help predict winners and track changes in
voter preferences.

Medical research studies. Heart disease is the most
common cause of death in the industrialized nations. In the US
and Canada nearly 30% of deaths each year are due to heart
disease, mainly heart attacks. Does regular aspirin intake reduce
deaths from heart attacks? The Harvard Medical School
conducted a landmark study to investigate. The people
participating in the study regularly took either aspirin or a placebo (a
tablet with no active ingredient). Of those who took aspirin, 0.9%
suffered heart attacks during the study. Of those who took the placebo,
1.7% had heart attacks. Could we conclude that it is beneficial for
people to take aspirin?

## Populations, Units and Characteristics

A population is a well-defined collection of objects or subjects.
Studies involve the investigation of certain characteristic(s) of
members (called units) of population(s).

- All items of a certain manufactured product (that have been, or will be, produced). Characteristic: proportion of defectives.
- All students enrolled in Big Ten universities during the 2013-14 academic year. Characteristics: favorite type of music; political affiliation.
- Two types of cleaning products. Characteristic: cleaning effectiveness.

## Variable = a Numerical Characteristic

In most examples considered above, the characteristics we
considered are quantities that can be measured and expressed as
numbers, e.g. thermal expansion of a metal, hardness of cement,
mercury concentration. Such characteristics are called
quantitative.
Examples of non-quantitative characteristics are gender, make of
car, eye color, strength category, political affiliation. Such
characteristics are called categorical or qualitative.
Because statistical procedures are applied to numerical data sets,
the categories of a categorical characteristic are labeled with
arbitrarily chosen numbers (e.g. male = -1, female = +1).
A characteristic expressed as a number is called a variable.
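
As a minimal sketch of labeling categories with arbitrary numbers, the snippet below encodes a categorical characteristic as a variable. The observations and the particular codes are hypothetical, chosen only for illustration; the numbers carry no order or magnitude.

```python
# Hypothetical categorical observations (illustrative only, not course data).
eye_colors = ["brown", "blue", "green", "brown", "blue", "brown"]

# Arbitrary numeric labels: any distinct numbers would serve equally well.
codes = {"brown": 0, "blue": 1, "green": 2}

# The encoded list is the "variable": the characteristic expressed as numbers.
encoded = [codes[color] for color in eye_colors]
print(encoded)  # [0, 1, 2, 0, 1, 0]
```

Because the labels are arbitrary, averages of such codes are generally not meaningful; counts per category are.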

## Types of Variables

- Multivariate, e.g. age, income category, education level, race, gender.

## Branches of statistics

- Descriptive statistics. Summarizing and describing the prominent features of data.
- Inferential statistics. Evaluation of the information present in data (drawing conclusions).

## Categorical (qualitative) data

When the characteristic under study concerns a qualitative trait
that is only classified in categories and not numerically measured,
the resulting data are called categorical data.
Examples.

- Employment status: employed, unemployed
- Blood type: O, A, B, AB

## Numerical (measurement) data

If the characteristic is measured on a numerical scale, the resulting
data consist of a set of numbers and are called measurement data.
We will use the term (numerical) variable to refer to a
characteristic that is measured on a numerical scale.
Examples.

- The number of offspring in an animal litter.

## Discrete and continuous variables.

If the measurement scale is made up of distinct numbers with gaps
in between, the variable is called discrete.
Some variables can ideally take any value in an interval. Since the
measurement scale does not have gaps, such variables are called
continuous.
Examples.

- The number of offspring in an animal litter: discrete

## Frequency table

$$\text{Relative frequency} = \frac{\text{Frequency in the category}}{\text{Total number of observations}}$$

Opinion poll on new dorm regulations.

| Responses | Frequency | Relative Frequency |
|---|---|---|
| Support | 152 | 152/280 = .543 |
| Neutral | 77 | 77/280 = .275 |
| Oppose | 51 | 51/280 = .182 |
| Total | 280 | 1.000 |
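
The relative frequencies in the poll table can be reproduced with a few lines of Python. This is a small sketch using the counts from the table; the variable names are illustrative assumptions.

```python
# Counts from the dorm-regulation opinion poll.
frequencies = {"Support": 152, "Neutral": 77, "Oppose": 51}

total = sum(frequencies.values())

# Relative frequency = frequency in the category / total number of observations.
relative = {category: count / total for category, count in frequencies.items()}

for category, count in frequencies.items():
    print(f"{category:<8} {count:>4} {relative[category]:.3f}")
print(f"{'Total':<8} {total:>4} {sum(relative.values()):.3f}")
```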

Daily numbers (x) of internet system crashes.
Data: 1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0, 1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0.

| Value x | Frequency | Relative Frequency |
|---|---|---|
| 0 | 9 | .300 |
| 1 | 10 | .333 |
| 2 | 5 | .167 |
| 3 | 3 | .100 |
| 4 | 2 | .067 |
| 5 | 0 | .000 |
| 6 | 1 | .033 |
| Total | 30 | 1.000 |
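
For discrete data, the frequency table can be tallied directly from the raw observations. The following sketch uses the crash data listed above with Python's standard `collections.Counter`; the layout of the printed table is an illustrative choice.

```python
from collections import Counter

# Daily numbers of internet system crashes (the 30 observations listed above).
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]

counts = Counter(crashes)
n = len(crashes)

# Walk every value from min to max so that values with zero frequency
# (here x = 5) still get a row; Counter returns 0 for missing keys.
table = [(x, counts[x], counts[x] / n)
         for x in range(min(crashes), max(crashes) + 1)]

for x, freq, rel in table:
    print(f"{x:>3} {freq:>4} {rel:.3f}")
```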

## Histogram and line diagram

## Constructing a Frequency Distribution for a Continuous Variable

1. Find the minimum and the maximum values in the data set.
2. Choose intervals or cells of equal length that cover the range between the minimum and the maximum without overlapping. These are called class intervals, and their endpoints class boundaries.
3. Count the number of observations in the data that belong to each class interval. The count in each class is the class frequency or cell frequency.
4. Calculate the relative frequency of each class by dividing the class frequency by the total number of observations in the data:

$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Total number of observations}}$$
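
The construction steps can be sketched in Python as follows. This is one possible implementation under stated assumptions: the function name `frequency_distribution`, the choice of where the first boundary starts, and the sample data are all hypothetical, and classes are taken as left-closed, right-open intervals.

```python
import math

def frequency_distribution(data, width, start=None):
    """Return (left, right, frequency, relative frequency) for equal-width
    classes [left, right): left endpoint included, right endpoint excluded."""
    lo, hi = min(data), max(data)            # step 1: minimum and maximum
    if start is None:
        # Assumed convention: first boundary is the multiple of `width`
        # at or below the minimum.
        start = math.floor(lo / width) * width
    if start > lo:
        raise ValueError("start must not exceed the smallest observation")
    # Step 2: enough non-overlapping classes to cover the maximum.
    n_classes = math.floor((hi - start) / width) + 1
    counts = [0] * n_classes
    for value in data:                       # step 3: class frequencies
        counts[math.floor((value - start) / width)] += 1
    n = len(data)
    # Step 4: relative frequency = class frequency / total observations.
    return [(start + k * width, start + (k + 1) * width, c, c / n)
            for k, c in enumerate(counts)]

# Hypothetical dollar amounts, made up for illustration.
sales = [12.5, 130.0, 249.99, 250.0, 310.4, 499.0, 88.0, 601.3, 375.0, 420.0]
result = frequency_distribution(sales, width=125)
for left, right, freq, rel in result:
    print(f"[{left}, {right}) {freq:>3} {rel:.3f}")
```

Note that the value 250.0 lands in the class [250, 375), not [125, 250), which is the left-included, right-excluded convention.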

Example.
Frequency Distribution for Bookstore Sales Data
(left endpoints included, but right endpoints excluded)

| Class Interval ($) | Frequency | Relative Frequency |
|---|---|---|
| 0-125 | 5 | 5/40 = .125 |
| 125-250 | 8 | 8/40 = .200 |
| 250-375 | 13 | 13/40 = .325 |
| 375-500 | 11 | 11/40 = .275 |
| 500-625 | 3 | 3/40 = .075 |
| Total | 40 | 1.000 |

## Population Averages and Sample Averages

Consider a population of $N$ units, and let $v_1, v_2, \ldots, v_N$ denote the
statistical population corresponding to some variable. Then the
population average or population mean, denoted by $\mu$, is the
arithmetic average of all values in the statistical population. Thus,

$$\mu = \frac{1}{N}\sum_{i=1}^{N} v_i.$$

If the random variable $X$ denotes the value of the variable of a
randomly selected population unit, then a synonymous terminology
for the population mean is expected value of $X$, or mean value
of $X$, and is denoted by $\mu_X$ or $E(X)$.

EXAMPLE: In a population of 500 tin plates, the number of plates
with 0, 1 and 2 scratches is $N_0 = 190$, $N_1 = 160$ and
$N_2 = 150$. Thus, in the statistical population $v_1, \ldots, v_{500}$, 190 $v_i$
equal 0, 160 equal 1, and 150 equal 2. The population mean is

$$\mu = \frac{1}{500}\sum_{i=1}^{500} v_i = \frac{0 \cdot N_0}{500} + \frac{1 \cdot N_1}{500} + \frac{2 \cdot N_2}{500} = 0.92.$$

If a tin plate is selected at random and $X$ is the rv denoting the
number of scratches, the mean value of $X$ is 0.92. (We write
$\mu_X = 0.92$, or $E(X) = 0.92$.)
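
The tin plate population mean can be checked numerically. The sketch below works from the counts per scratch value rather than listing all 500 units; the variable names are illustrative.

```python
# Number of plates for each scratch count, from the tin plate example.
population_counts = {0: 190, 1: 160, 2: 150}

N = sum(population_counts.values())

# Grouped form of the average: sum of (value * count) divided by N.
mu = sum(value * count for value, count in population_counts.items()) / N
print(N, mu)
```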

If a sample of size $n$ is taken, and $x_1, x_2, \ldots, x_n$ denote the variable
values of the sample units, then the sample average or sample
mean, denoted by $\bar{x}$, is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Under simple random (s.r.) sampling, a sample mean approximates, but in general
is different from, the population mean.
EXAMPLE: If a s.r. sample of $n = 100$ is taken from the 500 tin
plates, it could be that there are $n_0 = 40$, $n_1 = 34$ and $n_2 = 26$
plates with 0, 1 and 2 scratches. In this case, $\bar{x} = 0.86$.
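
A short sketch of the sample mean for the counts in this example, followed by a simulated simple random sample from the 500-plate population (190, 160 and 150 plates with 0, 1 and 2 scratches). The seed is an arbitrary assumption; a different seed gives a different sample and usually a different sample mean, which is the point of the remark above.

```python
import random

# Sample counts from the example: 40, 34 and 26 plates with 0, 1 and 2 scratches.
sample_counts = {0: 40, 1: 34, 2: 26}
n = sum(sample_counts.values())
x_bar = sum(value * count for value, count in sample_counts.items()) / n
print(n, x_bar)

# Simulate drawing our own simple random sample of 100 plates, without replacement.
population = [0] * 190 + [1] * 160 + [2] * 150
random.seed(0)  # arbitrary seed, only so the run is repeatable
draw = random.sample(population, 100)
draw_mean = sum(draw) / len(draw)
print(draw_mean)  # typically near, but not equal to, the population mean 0.92
```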

Median

The sample median of a set of $n$ measurements $x_1, x_2, \ldots, x_n$ is the
middle value when the measurements are arranged from smallest
to largest. It is denoted by $\tilde{x}$.

How to compute the median

1. Order the data from smallest to largest.
2. When the number of observations n is ODD, the median is the
middle observation of the ordered sample.
3. When the number of observations n is EVEN, two
observations from the ordered sample fall in the middle, and
the median is their average.
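
The three steps translate directly into code. This is a minimal sketch; the function name `sample_median` and the example data are assumptions made for illustration.

```python
def sample_median(values):
    """Median by the odd/even rule: middle value, or average of the two middle values."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:                    # step 2: n odd, take the middle observation
        return ordered[mid]
    # Step 3: n even, average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = sample_median([7, 3, 9, 1, 5])   # ordered: 1 3 5 7 9
even_result = sample_median([7, 3, 9, 1])     # ordered: 1 3 7 9
print(odd_result, even_result)  # 5 5.0
```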

The median is not affected by a few very small or very large
observations, whereas the presence of such extremes can have a
considerable effect on the mean. For extremely asymmetrical
distributions, the median is likely to be a more sensible measure of
center than the mean.
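
A small illustration of this robustness, using Python's standard `statistics` module on made-up numbers: replacing one observation with an extreme value moves the mean a great deal and leaves the median unchanged.

```python
import statistics

# Hypothetical measurements (illustrative only).
typical = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 500]   # same data, one value replaced by an extreme one

mean_typical = statistics.mean(typical)
mean_outlier = statistics.mean(with_outlier)
median_typical = statistics.median(typical)
median_outlier = statistics.median(with_outlier)

print(mean_typical, median_typical)   # 3.4 3
print(mean_outlier, median_outlier)   # 102.4 3
```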

Percentiles

The sample 100p-th percentile is a value such that after the data
are ordered from smallest to largest, at least 100p% of the
observations are at or below this value and at least 100(1 - p)%
are at or above this value.

Calculating the Sample 100p-th Percentile.

1. Order the data from smallest to largest.
2. Determine the product (sample size) × (proportion) = np.
3. If np is not an integer, round it up to the next integer and
find the corresponding ordered value.
4. If np is an integer, say k, calculate the average of the kth and
(k + 1)st ordered values.
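
The percentile rule above can be sketched as follows. The function name `sample_percentile` and the data are hypothetical. Note that library routines such as `statistics.quantiles` use different interpolation conventions, so their results may differ slightly from this rule.

```python
import math

def sample_percentile(values, p):
    """Sample 100p-th percentile by the np rule, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)                  # step 1
    position = len(ordered) * p               # step 2: np
    k = round(position)
    if math.isclose(position, k):             # step 4: np is an integer k
        # Average of the k-th and (k+1)-st ordered values (1-based positions).
        return (ordered[k - 1] + ordered[k]) / 2
    # Step 3: np not an integer, round up and take that ordered value.
    return ordered[math.ceil(position) - 1]

# Hypothetical sample of 10 values; ordered it is 11, 12, ..., 20.
data = [12, 15, 11, 18, 20, 14, 17, 13, 19, 16]
p25 = sample_percentile(data, 0.25)   # np = 2.5, so the 3rd ordered value
p50 = sample_percentile(data, 0.50)   # np = 5, so average of 5th and 6th values
print(p25, p50)  # 13 15.5
```

The `math.isclose` check is a design choice to guard against floating-point products such as `n * p` landing a hair away from an integer.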

Sample Quartiles

- Lower (first) quartile (designated $Q_1$) = 25th percentile
- Second quartile (median) (designated $Q_2$) = 50th percentile
- Upper (third) quartile (designated $Q_3$) = 75th percentile

## Population Variance and Sample Variance

Let $v_1, v_2, \ldots, v_N$ be a statistical population with mean $\mu$.
DEFINITION: The population variance, $\sigma^2$, is defined as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (v_i - \mu)^2.$$

The standard deviation is the positive square root of the
variance: $\sigma = \sqrt{\sigma^2}$.
If the rv $X$ denotes a randomly selected value from the statistical
population, then a synonymous terminology for the population
variance is variance of $X$, and is denoted by $\sigma_X^2$, or $\mathrm{Var}(X)$.
Similarly, the standard deviation of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.
A simpler computational formula for the variance is

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} v_i^2 - \mu^2.$$

EXAMPLE: Consider the tin plate example, so the statistical
population $v_1, \ldots, v_{500}$ has 190 $v_i$ equal 0, 160 equal 1, 150 equal
2, and $\mu = 0.92$. Then,

$$\sigma^2 = \frac{0 \cdot 190}{500} + \frac{1 \cdot 160}{500} + \frac{4 \cdot 150}{500} - 0.92^2 = 0.6736.$$
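
As a check on the tin plate calculation, the sketch below computes the population variance both from the definition and from the computational formula and confirms that they agree. Variable names are illustrative.

```python
import math

# Number of plates for each scratch count, from the tin plate example.
population_counts = {0: 190, 1: 160, 2: 150}

N = sum(population_counts.values())
mu = sum(v * c for v, c in population_counts.items()) / N

# Definition: average squared deviation from the population mean.
var_definition = sum(c * (v - mu) ** 2 for v, c in population_counts.items()) / N

# Computational formula: average of squares minus the squared mean.
var_shortcut = sum(c * v ** 2 for v, c in population_counts.items()) / N - mu ** 2

sigma = math.sqrt(var_definition)
print(round(var_definition, 4), round(var_shortcut, 4))  # 0.6736 0.6736
```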

If $x_1, x_2, \ldots, x_n$ denotes a sample from the statistical population,
the sample variance and its computational formula are:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right].$$

The sample standard deviation is $S = \sqrt{S^2}$. Under s.r.
sampling, a sample variance approximates, but in general is
different from, the population variance.
EXAMPLE: Consider the s.r. sample of $n = 100$ tin plates, which
has 40, 34 and 26 plates with 0, 1 and 2 scratches. Then,

$$S^2 = \frac{1}{99}\left[138 - 73.96\right] = 0.647.$$
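
The sample variance in this example can be verified numerically. The sketch below uses the computational formula with the grouped counts and cross-checks it against `statistics.variance` from the Python standard library, which also divides by n - 1. Variable names are illustrative.

```python
import statistics

# Sample counts: 40, 34 and 26 plates with 0, 1 and 2 scratches.
sample_counts = {0: 40, 1: 34, 2: 26}

n = sum(sample_counts.values())
sum_x = sum(v * c for v, c in sample_counts.items())        # 86
sum_x2 = sum(c * v ** 2 for v, c in sample_counts.items())  # 138

# Computational formula: [sum of squares - (sum)^2 / n] / (n - 1).
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Cross-check by expanding the counts into the 100 individual observations.
observations = [v for v, c in sample_counts.items() for _ in range(c)]
s2_library = statistics.variance(observations)

print(round(s2, 3), round(s2_library, 3))  # 0.647 0.647
```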

Five-number summary: (minimum, $Q_1$, $Q_2$ (median), $Q_3$, maximum).
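
As an illustration, the five-number summary of the internet crash data from the frequency table earlier in this chapter can be computed with the np percentile rule. The helper `percentile` is a compact, hypothetical implementation of that rule; other software may use a different quartile convention and return slightly different values.

```python
import math

def percentile(ordered, p):
    """100p-th percentile of already-sorted data by the np rule."""
    position = len(ordered) * p
    k = round(position)
    if math.isclose(position, k):          # np integer: average k-th and (k+1)-st
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(position) - 1]  # otherwise round np up

# Daily numbers of internet system crashes (30 observations).
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]
ordered = sorted(crashes)

summary = (ordered[0],                 # minimum
           percentile(ordered, 0.25),  # Q1
           percentile(ordered, 0.50),  # Q2 (median)
           percentile(ordered, 0.75),  # Q3
           ordered[-1])                # maximum
print(summary)  # (0, 0, 1.0, 2, 6)
```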

Boxplot
