CH 225: Data Analysis and Interpretation
THE NATURE OF STATISTICS
Statistics (the numerical science) is the art of learning from data. It is concerned with the
collection of data, their subsequent description, and their analysis, which often leads to the
drawing of conclusions.
Descriptive Statistics:
The part of statistics concerned with the description and summarization of data
Inferential Statistics:
The part of statistics concerned with the drawing of conclusions from data
To be able to draw logical conclusions from data, it is usually necessary to make some
assumptions about the chances (or probabilities) of obtaining the different data values. The
totality of these assumptions is referred to as a probability model for the data.
Statistical inference starts with the assumption that important aspects of the phenomenon
under study can be described in terms of probabilities, and then it draws conclusions by
using data to make inferences about these probabilities.
Population: The total collection of all the elements that we are interested in
Sample: A subgroup of the population that will be studied in detail
Random Sample: A sample of k members of a population is said to be a random sample
(sometimes called a simple random sample) if the members are chosen in such a way
that all possible choices of the k members are equally likely.
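A minimal Python sketch of drawing a simple random sample, assuming the population fits in a list: the standard library's `random.sample` picks k distinct members without replacement, which makes every possible set of k members equally likely. The helper name `simple_random_sample`, the ten-member population, and the seed are hypothetical, for illustration only.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that every k-member subset is equally likely."""
    rng = random.Random(seed)  # seeded generator so the illustration is reproducible
    return rng.sample(population, k)

# Hypothetical population of 10 labelled members.
population = [f"member_{i}" for i in range(1, 11)]
sample = simple_random_sample(population, k=3, seed=42)
print(sample)
```

Dropping the seed gives a fresh random sample on each run.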
DESCRIBING DATA SETS
For a large data set, it is very important that the numerical findings of any study be
presented clearly and concisely, in a manner that enables one to quickly obtain a feel for the
essential characteristics of the data.
Bar Graph
Frequency Polygon
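The counts behind a bar graph (and behind a frequency polygon, which joins the same value/frequency points with straight lines) form a frequency table. A minimal Python sketch, using hypothetical data and illustrative helper names, builds the table with `collections.Counter` and renders a rough text bar graph:

```python
from collections import Counter

def frequency_table(data):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(data).items())

def text_bar_graph(table, symbol="#"):
    """Render one bar per distinct value; the bar length equals its frequency."""
    return "\n".join(f"{value:>3} | {symbol * freq}" for value, freq in table)

# Hypothetical data: number of children in each of 12 households.
data = [0, 1, 2, 2, 1, 3, 2, 0, 1, 2, 4, 2]
table = frequency_table(data)
print(table)
print(text_bar_graph(table))
```

A plotting library would draw the same (value, frequency) pairs as vertical bars, or as the vertices of a frequency polygon.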
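Relative Frequency Graph: plots the relative frequency f/n of each data value, where f is the
value's frequency and n is the total number of observations.
The relative frequencies of all the data values sum to 1.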
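As a small check of the f/n definition, the sketch below (hypothetical data, illustrative function name) computes relative frequencies with Python's `fractions.Fraction`, so the sum comes out to exactly 1 rather than a floating-point approximation:

```python
from collections import Counter
from fractions import Fraction

def relative_frequencies(data):
    """Map each distinct value to f/n, where f is its frequency and n = len(data)."""
    n = len(data)
    return {value: Fraction(f, n) for value, f in sorted(Counter(data).items())}

# Hypothetical observations (n = 8).
data = [3, 5, 3, 4, 3, 5, 3, 4]
rel = relative_frequencies(data)
print(rel)                # {3: Fraction(1, 2), 4: Fraction(1, 4), 5: Fraction(1, 4)}
print(sum(rel.values()))  # 1
```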
Pie Charts: A pie chart is often used to plot relative frequencies when the data are
nonnumeric.
A circle is constructed and then sliced into distinct sectors, one for each different data
value.
If the relative frequency of a data value is f/n, then the area of its sector is the fraction f/n of
the total area of the circle.
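Because a sector's area is proportional to its central angle, a value with relative frequency f/n gets a sector of 360 × f/n degrees. A minimal sketch with hypothetical nonnumeric data (the helper `pie_sectors` is illustrative, not from the slides):

```python
from collections import Counter

def pie_sectors(data):
    """Return {value: (relative frequency, sector angle in degrees)}.

    A sector covers the fraction f/n of the circle's area exactly when its
    central angle is the same fraction of 360 degrees.
    """
    n = len(data)
    return {value: (f / n, 360 * f / n) for value, f in Counter(data).items()}

# Hypothetical nonnumeric data: preferred mode of transport for 20 people.
data = ["bus"] * 9 + ["car"] * 6 + ["bike"] * 5
for value, (rel, angle) in pie_sectors(data).items():
    print(f"{value:<5} f/n = {rel:.2f}  angle = {angle:.0f} degrees")
```

Here the sectors are 162, 108, and 90 degrees, which together fill the 360-degree circle.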
Problems: p. 25
Frequency Table of Blood Cholesterol Levels
A bar graph of the data with the bars placed adjacent to each other is called a
histogram. The vertical axis of a histogram can represent either the class frequency or the
relative class frequency.
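To draw a histogram, the data are first grouped into class intervals and the observations in each class are counted. The sketch below assumes equal-width classes and the left-end inclusion convention (a value on a boundary belongs to the class that starts there); the function name and the values are hypothetical, not the cholesterol levels from the slides.

```python
def class_frequencies(data, start, width, num_classes):
    """Count observations in the classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        i = int((x - start) // width)  # index of the class containing x
        if not 0 <= i < num_classes:
            raise ValueError(f"{x} lies outside the class range")
        counts[i] += 1
    return counts

# Hypothetical measurements (n = 12), grouped into classes of width 10 from 170 to 250.
data = [172, 185, 190, 201, 204, 210, 215, 219, 222, 230, 238, 245]
counts = class_frequencies(data, start=170, width=10, num_classes=8)
n = len(data)
for i, c in enumerate(counts):
    low = 170 + i * 10
    print(f"{low}-{low + 10}: frequency {c}, relative frequency {c / n:.3f}")
```

Either printed column can serve as the vertical axis of the histogram.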
Frequency Histogram
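The importance of a histogram is that it enables us to organize and present data graphically so
as to draw attention to certain important features of the data. For instance, a histogram can
often indicate how symmetric the data are, how spread out they are, whether there are gaps in
the data, and whether some values lie far from the rest.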
Problem Set: p. 39
Per Capita Personal Income (Dollars per Person), 2002
The following stem-and-leaf plot represents the weights of 80 attendees at a sporting
convention. The stem represents the tens digit, and the leaves are the ones digits.
Each stem could be broken into two stem lines: on the top line of each pair we could
include all leaves having values 0 through 4, and on the bottom line all leaves
having values 5 through 9.
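A minimal sketch of building a stem-and-leaf display, assuming nonnegative integer data: the stem is every digit except the ones digit (`x // 10`) and the leaf is the ones digit (`x % 10`). The hypothetical `split` flag gives each stem two lines, leaves 0 through 4 first and 5 through 9 second. The weights are made up, not the 80 attendees from the slides.

```python
def stem_and_leaf(data, split=False):
    """Return the lines of a stem-and-leaf display for nonnegative integers.

    stem = x // 10 (every digit except the ones digit), leaf = x % 10.
    With split=True each stem gets two lines: leaves 0-4, then leaves 5-9.
    """
    values = sorted(data)  # sorting keeps each line's leaves in increasing order
    halves = (0, 1) if split else (0,)
    lines = []
    # Include every stem in the range, so empty stems show up as gaps.
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        for half in halves:
            leaves = [x % 10 for x in values
                      if x // 10 == stem and (not split or (x % 10 >= 5) == bool(half))]
            lines.append(f"{stem:>3} | " + "".join(map(str, leaves)))
    return lines

# Hypothetical weights (in pounds) of ten people.
weights = [132, 138, 141, 144, 147, 153, 155, 156, 160, 168]
print("\n".join(stem_and_leaf(weights)))
print()
print("\n".join(stem_and_leaf(weights, split=True)))
```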
For paired data, first consider each part of the pairs separately and plot the relevant
histograms or stem-and-leaf plots for each.
[Figures: IQ Scores; Salaries]
Is there a relationship between IQ and salary? To answer this, we need a plot that shows both
values of each pair together.
Scatter Diagram
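Plotting each variable separately discards which salary belongs to which IQ score. The sketch below uses made-up numbers (not the data from the slides) and a hypothetical helper `scatter_points` to show two pairings whose separate lists of values are identical, so their individual histograms would match, while the (x, y) points a scatter diagram plots differ:

```python
# Hypothetical paired data: IQ score and salary (thousands of dollars) for five people.
iq = [95, 100, 105, 110, 120]
salary_a = [40, 45, 52, 60, 75]  # pairing A: higher IQ goes with higher salary
salary_b = [60, 52, 75, 40, 45]  # pairing B: the same salaries, assigned differently

def scatter_points(xs, ys):
    """Points a scatter diagram would plot: one (x, y) per individual."""
    return list(zip(xs, ys))

# Looking at the salaries on their own cannot tell the two pairings apart...
same_marginals = sorted(salary_a) == sorted(salary_b)
# ...but the points of the scatter diagram differ.
points_a = scatter_points(iq, salary_a)
points_b = scatter_points(iq, salary_b)
print(same_marginals)  # True
print(points_a)
print(points_b)
```

In pairing A the points rise from left to right; in pairing B they show no such steady rise, which only a plot of the pairs reveals.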