
Lesson 1: Getting started with Statistics

Objectives
• Learn basic vocabulary of statistics.
• Distinguish between population and sample.
• Distinguish between the two types of populations.
“The goal of the branch of mathematics(?) called statistics is to provide information so that
informed decisions can be made.” Statistics is not a branch of mathematics. It certainly uses
numbers and formulas, but so does physics, and physics is not a branch of mathematics either.
Statistics is a science. It proceeds by inductive reasoning: it tries to learn about the
larger whole based on a smaller sample. Mathematics, on the other hand, works from the general
down to the specific, which is deductive reasoning.
Statistics is a branch of science that enables you to filter the information you encounter so that you can be better
prepared for the decisions you make in your daily life.

STATISTICS

-Statistics is the science of gathering, describing, and analyzing data


-Statistics are the actual numerical descriptions of sample data (functions of your data).
• A target population is a particular group of interest.
• A sampled population is a group from which the sample is taken.
• A sampling frame is a physical list of all members of the sampled population.
• A sample is a subset of the population from which data are collected.
A census is a study in which data are obtained from every member of the population.
Intra-Lecture Questions:
Q1. What is the difference between a population and a sample?
A variable is a value or characteristic that changes among members of the population.
Data are the counts, measurements, or observations gathered about a specific variable in a
population in order to study it.
A parameter is a numerical description of a population characteristic.
A sample statistic is a (numerical) description of a sample characteristic.
Q2. What is the difference between a parameter and a statistic?
“It is essential that you are mindful of the relationship between a population and a sample. The
next figure is a picture to help you visualize this relationship. The large oval represents the entire
population, and the smaller oval represents the sample chosen from the population.”

[Figure: nested ovals labeled Target Population, Sampled Population, and Sample]

Table 1.1: Population vs. Sample

Population                               Sample
Whole group                              Part of the group
Group we want to know about              Group we do know about
Characteristics are called parameters    Characteristics are called statistics
Parameters are generally unknown         Statistics are always known
Parameters are fixed                     Statistics change with the sample
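
The contrast between a fixed parameter and a changing statistic can be illustrated with a small simulation. This is only a sketch: the population of 1,000 ages is made up, and the names `population`, `population_mean`, and `sample_mean` are assumptions introduced for illustration, not part of the lesson.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up data for illustration).
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: a numerical description of the whole population. It is fixed.
population_mean = statistics.mean(population)

def sample_mean(size, seed):
    """Draw a simple random sample without replacement and return its mean."""
    sample = random.Random(seed).sample(population, size)
    return statistics.mean(sample)

# Statistics: numerical descriptions of samples. They change with the sample.
sample_means = [sample_mean(50, seed) for seed in range(5)]

print("Parameter (population mean):", population_mean)
print("Statistics (sample means):  ", sample_means)
```

In a real study the population mean is usually unknown, which is exactly why a sample statistic is computed in its place.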

Example 1.1: Identifying Population and Sample


a. In a survey, 359 college students at the University of Jackson were asked if they had tried the
October flavor of the month at the campus coffee shop. Eighty-three of the students
surveyed said yes.
b. A survey of 1125 households in the United States found that 24% subscribe to satellite radio.
Example 1.2: Identifying Population, Sample, Parameters, and Statistics

Read each of the shortened survey reports below. For each report:
a. Identify the population.
b. Identify the sample.
c. Determine whether the highlighted value is a parameter or statistic.
1. After an airplane security scare on Christmas day, 2009, the Gallup organization
interviewed 542 American air travelers about increased security measures at airports. The
report stated that 78% of American air travelers are in favor of United States airports
using full-body-scan imaging on airline passengers.
2. Rasmussen Reports also conducted a survey in response to the airport security scare on
Christmas day, 2009. The national telephone survey of 1000 adult Americans found that
59% of Americans surveyed favor racial profiling as a means of determining which
passengers to search at airport security checkpoints.
Two Branches of Statistics

The branch of descriptive statistics, as a science, gathers, sorts, summarizes, and displays
the data.
The branch of inferential statistics, as a science, involves using descriptive statistics to
estimate population parameters.

Q3. What is the difference between descriptive and inferential statistics?


Two types of Analysis

Exploratory analysis uses data to estimate parameters


Confirmatory analysis uses statistics to test claims about reality (hypothesis)

Example 1.3: Identifying Descriptive and Inferential Statistics


• In a news report on the state of the media, Tom Rosenstiel and Amy Mitchell
write the following: “AOL had 900 journalists, 500 of them at its local Patch news
operation… By the end of 2011, Bloomberg expects to have 150 journalists and analysts
for its new Washington operation, Bloomberg Government.”

Source: Rosenstiel, Tom and Amy Mitchell. “Overview.” The State of the News Media: An Annual Report on
American Journalism, Pew Research Center’s Project for Excellence in Journalism. 2011.
http://stateofthemedia.org/2011/overview-2/ (12 Dec. 2011).

Identify the descriptive and inferential statistics used in this excerpt from their article.
SECTION 1.2
Data Classification

Objectives:
o Classify data as
   • Qualitative or quantitative;
   • Discrete, continuous, or neither; and
   • Nominal, ordinal, interval, or ratio.
Qualitative vs. Quantitative Data

Qualitative data, also known as categorical data, consist of labels or descriptions of traits.
Quantitative data, also known as numeric data, consist of counts or measurements.

Qualitative                Quantitative
Descriptions and labels    Counts and measurements

Example 1.4: Classifying Data as Qualitative or Quantitative

Classify the following as either qualitative or quantitative. What are you assuming about the
described variables?
a. Shades of red paint in a home improvement store
b. Rankings of the most popular paint colors for the season
c. Amount of red primary dye necessary to make one gallon of each shade of red paint
d. Numbers of paint choices available at several stores
Q1. Is the variable “number of toes” a qualitative (categorical) or a quantitative (numeric)
variable?
Continuous vs. Discrete Data

Discrete data are quantitative data that can take on only particular values and are usually counts.
Continuous data are quantitative data that can take on any value in a given interval and are
usually measurements.
Example 1.5: Classifying Data as Continuous or Discrete

Determine whether the following data are continuous or discrete.


a. Temperatures in Fahrenheit of cities in South Carolina
b. Numbers of houses in various neighborhoods in a city
c. Numbers of elliptical machines in every YMCA in your state
d. Heights of doors
Q2. Is the variable “Grade Point Average” discrete or continuous?
Levels of Measurement
The level of measurement of a variable describes the amount of information that variable
contains. The four levels are
• Nominal – description
• Ordinal – ordering
• Interval – differences between levels
• Ratio – true zero
Data at the nominal level of measurement are qualitative data consisting of labels or names.
Nominal comes from the Latin word “nomen,” which means name.
Example 1.6: Understanding the Nominal Level of Measurement

a. Suppose all students in a statistics class were asked what pizza topping is their favorite.
Explain why these data are the nominal level of measurement.
b. Suppose instead that you wish to know the number of students whose favorite pizza
topping is sausage. Explain why this data value is not at the nominal level of
measurement.
Data at the ordinal level of measurement are qualitative data that can be arranged in a
meaningful order, but calculations such as addition or division do not make sense.
Example 1.7: Classifying Data as Nominal or Ordinal

Determine whether the data are nominal or ordinal.


a. The seat numbers on your concert tickets, such as A23 and A24.
b. The genres of the music performed at the 2013 Grammys.
Data at the interval level of measurement are quantitative data that can be arranged in a
meaningful order, and differences between data entries are meaningful.
Example 1.8: Classifying Data by the Level of Measurement

The birth years of your classmates are collected. What level of measurement are these data?
Data at the ratio level of measurement are quantitative data that can be ordered, differences
between data entries are meaningful, and the zero point indicates the absence of something.
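
The practical difference between the interval and ratio levels is whether ratios make sense. The sketch below uses two made-up temperatures (an assumption for illustration). Celsius has an arbitrary zero, so only differences are meaningful, while Kelvin has a true zero, so ratios are meaningful as well.

```python
# Interval level: Celsius temperatures. Differences are meaningful,
# but 0 degrees C does not mean "no temperature", so ratios are not meaningful.
temp_a_c, temp_b_c = 10.0, 20.0
difference_c = temp_b_c - temp_a_c   # 10 degrees warmer: meaningful
naive_ratio = temp_b_c / temp_a_c    # 2.0, but 20 C is NOT "twice as hot" as 10 C

# Ratio level: Kelvin has a true zero, so ratios are meaningful.
temp_a_k = temp_a_c + 273.15
temp_b_k = temp_b_c + 273.15
true_ratio = temp_b_k / temp_a_k     # about 1.035

print(difference_c, naive_ratio, round(true_ratio, 3))
```

The difference of 10 degrees is the same on both scales, but the ratio changes when the zero point moves, which is why ratios are reserved for ratio-level data.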
Example 1.9: Classifying Data by the Level of Measurement

Consider the ages in whole years of US presidents when they were inaugurated. What level of
measurement are these data?
Q3: Give an example of a ratio-level variable not provided in the slides or text.
Example 1.10: Classifying Data

Determine the following classifications for the given data sets: qualitative or quantitative;
discrete, continuous, or neither; and level of measurement.
a. Finishing times for runners in the Labor Day 10k race.
b. Colors contained in a box of crayons.
c. Boiling points (on the Celsius scale) for various caramel candies.
d. The top ten Spring Break destinations as ranked by MTV.
Section 1.3
The process of a Statistical Study
Objectives:
o Describe the process of a statistical study.
o Understand the primary sampling schemes.
o Identify various types of studies.

Conducting a Statistical Study


1. Determine the design of the study.
a. State the question to be studied.
b. Determine the population and variables.
c. Determine the sampling method.
2. Collect the data.
3. Organize the data.
4. Analyze the data to answer the question.
Example 1.11: Identifying Population and Variables.

Neurologists want to study the effect of vitamin C on nerve disorders. The goal of the study
is to see if taking an intravenous dose of vitamin C will reduce the amount of nerve pain
reported by patients. Identify the population of interest and the variables in this study.
• An observational study observes data that already exist.
• An experiment generates data to help identify cause-and-effect relationships.
Note: these are the “proper” definitions as used by scientists. A statistician will refer to any “theoretical” data
collection as an experiment. This difference in terminology comes from the fact that statisticians will experiment to
better understand their field of study.

Example 1.12: Identifying Observational Studies and Experiments

Which type of study would you conduct: an observational study or an experiment?


a. You want to determine the average age of college students across the nation.
b. Researchers wish to determine if flu shots actually help prevent severe cases of the flu.
Observational Studies
Representative sample- has the same relevant characteristics as the population and does not
favor one group from the population over another.
Note that a sample could be representative for one characteristic of the population (parameter) but not for another.

Here is an interesting question: How do you know if a sample is representative of the population?

Q1. My sample is all females in this class. Is it a representative sample?


Sampling Techniques
• Simple Random Sample
-every sample of a given size from the population has an equal chance of being selected.
• Stratified Sample
-the population is divided into subgroups called strata. The grouping variable is
correlated with the measurement variable. A sample is drawn from each stratum.
• Cluster Sample
-the population is divided into subgroups, called clusters. The grouping variable is not
correlated with the measurement variable. A sample is drawn from at least one of the
clusters. E.g., rice field.
• Systematic Sample
-every nth member of the population is selected.

• Convenience Sample
-the sample is convenient for the researcher to select.

Note: this is an ethical sample.
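
The sampling techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population of 100 members, the group labels A to D, and the sample sizes are all made up, and the cluster sample here takes every member of the chosen clusters, which is only one way of drawing a sample from a cluster.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 members, each tagged with a subgroup label.
population = [{"id": i, "group": "ABCD"[i % 4]} for i in range(100)]

# Simple random sample: every possible sample of size 10 is equally likely.
simple_random = rng.sample(population, 10)

# Systematic sample: every 10th member, starting from a random position.
start = rng.randrange(10)
systematic = population[start::10]

# Group the population by its subgroup label.
groups = {}
for member in population:
    groups.setdefault(member["group"], []).append(member)

# Stratified sample: draw some members from EVERY stratum.
stratified = [m for stratum in groups.values() for m in rng.sample(stratum, 3)]

# Cluster sample: pick whole clusters at random and use all of their members.
chosen_clusters = rng.sample(sorted(groups), 2)
cluster = [m for name in chosen_clusters for m in groups[name]]

print(len(simple_random), len(systematic), len(stratified), len(cluster))
```

Note how the stratified sample touches all four groups, while the cluster sample uses only the two groups that happened to be chosen.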


Example 1.13: Identifying Sampling Methods

Identify the type of sampling used in each of the following scenarios.


a. A pollster surveys 50 people in each of a senator’s 12 voting precincts.
b. The quality control department at a cereal manufacturer measures the weight of every 10th box off
of the assembly line.
c. A female student walks down the halls in her dorm asking students how much money they would
spend in a food court in the dorm lobby in an effort to persuade the administration to offer such
an option.
d. An educator chooses 5 of the school districts in the Chicago area and asks each household in
those districts how many school-age children are in the home.
e. To determine who will win a $100,000 shopping spree at the mall, the manager draws a name out
of a box of entries.
Q2. What is the primary difference between cluster and stratified sampling?

Two types of Observational Studies

• Cross-sectional study – data are collected at a single point in time.
• Longitudinal study – data are gathered by following a particular group over a period of time.

Example 1.14: Classifying Studies as Cross-Sectional or Longitudinal

Categorize the following studies as either cross-sectional or longitudinal.


a. A group of 220 patients is followed for 15 years in order to determine the long-term health effects
resulting from gastric surgery.
b. A gastroenterologist surveys 130 of his patients six months after having gastric bypass surgery to
determine the average amount of weight lost.
EXPERIMENTS

• Treatment – is some condition that is applied to a group of subjects in an experiment.
• Subjects (participants) – are people or things being studied in an experiment.
• Response variable – is the variable in an experiment that responds to the treatment. (dependent)
• Explanatory variable – is the variable in an experiment that causes the change in the response
variable. (independent)
3 Principles of Experimental Design
1. Randomize the control and treatment groups.
2. Control for outside effects on the response variable.
3. Replicate the experiment a significant number of times to see meaningful patterns.
• Control Group – is a group of subjects to which no treatment is applied in an experiment.
• Treatment Group – is a group of subjects to which researchers apply a treatment in an
experiment.
• Confounding Variables – are unmeasured factors other than the treatment that cause an effect on
the subjects of an experiment.
Q3. How do we know if there are confounding variables in a statistical study?

• Placebo – is a substance that appears identical to the actual treatment but contains no intrinsic
beneficial elements.
• Placebo Effect – is a response to the power of suggestion, rather than the treatment itself, by
participants of an experiment.
• Single-Blind experiment – subjects do not know if they are in the control group or the treatment
group, but the people interacting with the subjects in the experiment know in which group each
subject has been placed.
• Double-Blind experiment – neither the subjects nor the people interacting with the subjects
know to which group each subject belongs.
Example 1.16: Analyzing an Experiment

Consider the study from Example 1.11, in which neurologists want to determine if taking an
intravenous dose of vitamin C will reduce the amount of nerve pain reported by patients. Suppose that
the study was narrowed to focus only on patients with the nerve disorder, multiple sclerosis (MS).
After study approval, the neurologists solicit volunteers who are patients with MS who are reporting
nerve pain. The participants are then randomly assigned to two groups, each having 20 participants.
Participants in Group A are administered intravenous doses of vitamin C, and their nerve pain is
tracked. Participants in Group B are administered intravenous doses of saline (which has no active
ingredients) and their pain levels are also tracked. The patients are not told which of the two groups
they are in; however, the nurses administering the IVs are aware of the group assignments. After a
predetermined length of time, the amounts of pain reported by the separate groups are compared to
determine if an intravenous dose of vitamin C will reduce the amount of nerve pain.
a. Identify the explanatory and response variables.
b. What is the treatment?
c. Which group is the treatment group and which group is the control group?
d. What is the purpose of administering saline to Group B?
e. Is this a single-blind or double-blind study?
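
The random assignment described in Example 1.16 can be sketched as follows. The participant IDs and the seed are hypothetical, and shuffling followed by splitting is just one simple way to randomize; the example does not say how the neurologists actually assigned the groups.

```python
import random

# Hypothetical participant IDs for the 40 volunteers in Example 1.16.
participants = [f"P{n:02d}" for n in range(1, 41)]

rng = random.Random(2024)   # fixed seed only so the sketch is repeatable
shuffled = participants[:]  # copy, so the original list is left untouched
rng.shuffle(shuffled)

# The first 20 after shuffling receive vitamin C; the remaining 20 receive saline.
group_a = shuffled[:20]  # vitamin C
group_b = shuffled[20:]  # saline

print(len(group_a), len(group_b))
```

Because the split depends only on the shuffle, no characteristic of a participant can influence which group that participant lands in.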
Institutional Review Boards

An Institutional Review Board (IRB) is a group of people who review the design of the study to make
sure that it is appropriate and that no unnecessary harm will come to the subjects involved.
Informed Consent involves completely disclosing to participants the goals and procedures involved in a
study and obtaining their agreement to participate.

SECTION 2.1
FREQUENCY DISTRIBUTIONS

Objectives
o Construct a frequency distribution.
o Create an ungrouped frequency distribution.
o Create a grouped frequency distribution.

A Distribution is a way to describe the structure of a particular data set or population.


A Frequency Distribution is a display of the values that occur in a data set and how often each value, or
range of values, occurs.
Frequencies (f) are the numbers of data values in the categories of a frequency distribution.
A Class is a category of data in a frequency distribution.
An ordered array is an ordered list of the data from largest to smallest or vice versa.
A Probability distribution is a theoretical distribution used to predict the probabilities of particular data
values occurring in a population.
An ungrouped frequency distribution is a frequency distribution where each category or class
represents a single value.
A grouped frequency distribution is a frequency distribution where the classes are ranges of possible
values.
Constructing a Frequency Distribution (Ungrouped)

To create an ungrouped frequency distribution,

• Determine the levels of the categorical variable.
• Count the number of observed values in each level.
Example:
The eye colors of my research students this term are as follows:
(blue, brown, brown, blue, brown, brown, brown, green)
The frequency distribution is

Color   Frequency
Blue    2
Brown   5
Green   1
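
The same tally can be produced in code. This is a minimal sketch using Python's standard `collections.Counter`; the variable names are assumptions chosen for illustration.

```python
from collections import Counter

eye_colors = ["blue", "brown", "brown", "blue", "brown", "brown", "brown", "green"]

# Counter tallies how many times each level of the variable occurs.
frequencies = Counter(eye_colors)

print(f"{'Color':<8}Frequency")
for color in sorted(frequencies):
    print(f"{color:<8}{frequencies[color]}")
```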

Q1: What is the difference between a grouped and an ungrouped frequency distribution?

Constructing a Frequency Distribution (Grouped)


1. Decide how many classes should be in the distribution. There are typically between 5 and 20 classes in
a frequency distribution. Several different methods can be used to determine the number of classes that
will show the data most clearly, but in this textbook, the number of classes for a given data set will be
suggested.
2. Choose an appropriate class width. In some cases, the data set easily lends itself to natural divisions,
such as decades or years. At other times, we must choose divisions for ourselves. When starting a
frequency distribution from scratch, one method of finding an appropriate class width is
to begin by subtracting the lowest number in the data set from the highest number and dividing the difference by the number of
classes.
3. Find the class limits. The lower class limit is the smallest number that can belong to a particular class,
and the upper class limit is the largest number that can belong to a class. Using the minimum data value,
or a smaller number, as the lower limit of the first class is a good place to begin. However, judgment is
required. You should choose the first lower limit so that reasonable classes will be produced, and it
should have the same number of decimal places as the largest number of decimal places in the data.
4. Determine the frequency of each class. Make a tally mark for each data value in the appropriate class.
Count the marks to find the total frequency for each class.
Class width – is the difference between the lower limits or upper limits of two consecutive classes of a
frequency distribution.
Lower class limit – is the smallest number that can belong to a particular class.
Upper class limit – is the largest number that can belong to a particular class.

Example 2.1: Constructing a Frequency Distribution

Create a frequency distribution using five classes for the list of 3-D TV prices given in Table 2.2.

Table 2.2: 3-D TV Prices (in Dollars and in an Ordered Array)


1595 1599 1685 1699 1699
1699 1699 1757 1787 1799
1799 1885 1888 1899 1899
1899 1984 1999 1999 1999

Solution
Because we were told how many classes to include, we will begin by deciding on a class width.
Subtract the lowest data value from the highest and divide the difference by the number of classes, as shown
below.
(1999 - 1595) / 5 = 404 / 5 = 80.8 ≈ 81
This would give us a class width of $81.
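
The class width arithmetic can be checked with a short sketch. Rounding up with `math.ceil` is an assumption made here so that five classes are sure to cover the whole range; the text itself only says the result is approximately 81.

```python
import math

lowest, highest = 1595, 1999  # minimum and maximum prices in Table 2.2
number_of_classes = 5

raw_width = (highest - lowest) / number_of_classes  # 404 / 5 = 80.8
class_width = math.ceil(raw_width)                   # round up to 81

print(raw_width, class_width)
```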
We will stop here and consider some options. Choosing a class width of $81 does seem perfectly
reasonable from a theoretical point of view. However, one should consider the impression
created by having TV prices grouped in intervals of $81. Can you imagine presenting this data to
a client? Instead, it would be more reasonable to group TV prices by intervals of $100.
Therefore, we will choose our class width to be $100.
Now let’s continue building the class limits. The lowest price is $1595, so a round first lower class limit
of $1500 is a natural choice. Adding the class width of $100 to $1500, we obtain
a second lower class limit of $1600. We continue in this fashion until we have five lower class
limits, one for each of our five classes.
Finally, we need to determine appropriate upper class limits. Again, be reasonable. Remember,
too, that the classes are not allowed to overlap.
Because the data are in whole dollar amounts, it makes sense to choose upper class limits that are
one dollar less than the next lower limit. The classes we have come up with are as follows.
3-D TV Prices
Class Frequency
$1500-$1599
$1600-$1699
$1700-$1799
$1800-$1899
$1900-$1999

Note that the last upper class limit is also the maximum value in the data set. This will not
necessarily occur in every frequency table. However, we have included all the data values in our
range of classes, so no adjustments to the classes are necessary.
Tabulating the number of data values that occur in each class produces the following frequency
table.
3-D TV Prices
Class Frequency
$1500-$1599 2
$1600-$1699 5
$1700-$1799 4
$1800-$1899 5
$1900-$1999 4

Note that the sum of the frequency column should equal the number of data values in the set.
Check for yourself that this is true.
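
The whole construction can be reproduced in code as a check. This is a minimal sketch, assuming the $1500 starting limit and $100 class width chosen in the solution; the variable names are illustrative.

```python
prices = [
    1595, 1599, 1685, 1699, 1699,
    1699, 1699, 1757, 1787, 1799,
    1799, 1885, 1888, 1899, 1899,
    1899, 1984, 1999, 1999, 1999,
]

first_lower_limit = 1500
class_width = 100
number_of_classes = 5

# Build (lower limit, upper limit) pairs; each upper limit is one dollar
# below the next lower limit so the classes do not overlap.
classes = [
    (first_lower_limit + i * class_width,
     first_lower_limit + (i + 1) * class_width - 1)
    for i in range(number_of_classes)
]

# Count the data values that fall inside each class.
frequencies = [
    sum(lower <= price <= upper for price in prices)
    for lower, upper in classes
]

for (lower, upper), f in zip(classes, frequencies):
    print(f"${lower}-${upper}  {f}")

# The frequencies should add up to the number of data values.
assert sum(frequencies) == len(prices)
```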

Characteristics of a Frequency Distribution

Class Boundary – is the value that lies halfway between the upper limit of one class and the
lower limit of the next class. After finding one class boundary, add (or subtract) the class width to
find the next class boundary. The boundaries of a class are typically given in interval form:
lower boundary–upper boundary.
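
As a worked illustration, the class boundaries for the 3-D TV price classes from Example 2.1 can be computed as follows. This is a sketch under the assumption that the classes are the five $100-wide classes from that example.

```python
# Class limits from Example 2.1 (whole-dollar 3-D TV prices).
classes = [(1500, 1599), (1600, 1699), (1700, 1799), (1800, 1899), (1900, 1999)]
class_width = 100

# A boundary lies halfway between one class's upper limit and the next lower limit.
first_boundary = (classes[0][1] + classes[1][0]) / 2  # (1599 + 1600) / 2 = 1599.5

# Subtract the class width once to get the lowest boundary, then keep adding it.
lowest_boundary = first_boundary - class_width         # 1499.5
boundaries = [lowest_boundary + i * class_width for i in range(len(classes) + 1)]

# Print each class in interval form: lower boundary-upper boundary.
for lower, upper in zip(boundaries, boundaries[1:]):
    print(f"{lower}-{upper}")
```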
