
CH 225: Data Analysis and Interpretation

The role of statistics. Graphical and numerical methods for describing and summarizing data. Probability. Population distributions. Sampling variability and sampling distributions. Estimation using a single sample. Hypothesis testing using a single sample. Comparing two populations or treatments. Simple linear regression and correlation. Case studies.


- 80% attendance is mandatory
- 15% Quizzes (in-class performance)
- 5% Class participation, assessed through attendance
  (1/2 mark deducted for each absence, up to a maximum of 5 marks)
- 30% mid-semester exam and 50% final exam

THE NATURE OF STATISTICS

Statistics...the most important science in the whole world

The Blind Men and the Elephant: an Indian fable

Statistics are numbers measured for some purpose.

Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty.

Turning Data into Information



Statistics (a numerical science) is the art of learning from data. It is concerned with the
collection of data, their subsequent description, and their analysis, which often leads to the
drawing of conclusions.
Descriptive Statistics:
Part of statistics concerned with the description and summarization of data

Inferential Statistics:
Part of statistics concerned with the drawing of conclusions from data

To be able to draw logical conclusions from data, it is usually necessary to make some
assumptions about the chances (or probabilities) of obtaining the different data values. The
totality of these assumptions is referred to as a probability model for the data.

Statistical inference starts with the assumption that important aspects of the phenomenon
under study can be described in terms of probabilities, and then it draws conclusions by
using data to make inferences about these probabilities.

Population: The total collection of all the elements that we are interested in
Sample: A subgroup of the population that will be studied in detail
Random Sample: A sample of k members of a population is said to be a random
sample, sometimes called a simple random sample, if the members are chosen in such a way
that all possible choices of the k members are equally likely
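A simple random sample can be drawn with Python's standard `random` module. The sketch below is only an illustration; the population of 50 worker labels is hypothetical. `random.Random.sample` draws without replacement, so every possible set of k members is equally likely to be chosen.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Choose k members so that every possible k-member subset is equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, k)  # k distinct members, drawn without replacement

# Hypothetical population: labels for 50 workers
workers = [f"worker_{i}" for i in range(1, 51)]
chosen = simple_random_sample(workers, 5, seed=1)
print(chosen)
```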

Stratified Random Sample: First the population is divided into subpopulations (strata), and then the appropriate number of elements is randomly chosen from each subpopulation. As a result, the proportions of the sample members that belong to each subpopulation are exactly the same as the proportions for the total population.
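A minimal sketch of proportional stratified sampling, assuming a hypothetical population of 1000 people split into 600 "urban" and 400 "rural" members. Each stratum gets n × (stratum size) / N sample slots, and a simple random sample is drawn inside each stratum. The function raises an error when a stratum's share is not a whole number, because only then can the sample proportions match the population proportions exactly.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified random sample.

    strata maps each subpopulation (stratum) name to its list of members;
    n is the total sample size.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = n * len(members)
        # Exact proportions are only possible when each stratum's share is a whole number
        if share % total != 0:
            raise ValueError(f"stratum {name!r} cannot be sampled in exact proportion")
        k = share // total
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample

# Hypothetical population of 1000: 600 urban and 400 rural residents
strata = {
    "urban": [f"u{i}" for i in range(600)],
    "rural": [f"r{i}" for i in range(400)],
}
result = stratified_sample(strata, 50, seed=0)
print({name: len(members) for name, members in result.items()})  # {'urban': 30, 'rural': 20}
```

With n = 50, the 60%/40% split of the population gives 30 urban and 20 rural sample members, the same 60%/40% split.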

DESCRIBING DATA SETS
For a large data set, it is very important that the numerical findings of any study be presented clearly and concisely, and in a manner that lets one quickly get a feel for the essential characteristics of the data.

Frequency Table and Graph: for a relatively small data set

Number of days of sick leave taken by 50 workers in the last six weeks:
2, 2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1, 7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5, 1, 3, 0, 1, 0, 2, 4, 5, 0, 5, 7, 5, 1

The frequencies must sum to 50.

How many workers had at least 1 day of sick leave?
How many workers had between 3 and 5 days of sick leave?
How many workers had more than 5 days of sick leave?
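The frequency table and the three questions can be checked with `collections.Counter`. This is a sketch; it reads "between 3 and 5" as including both 3 and 5, which is an assumption about the intended meaning.

```python
from collections import Counter

# Days of sick leave for the 50 workers listed above
leave = [2, 2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1, 7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
         1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5, 1, 3, 0, 1, 0, 2, 4, 5, 0, 5, 7, 5, 1]

freq = Counter(leave)
# Frequency table over every value from the minimum to the maximum (absent values get 0)
table = {v: freq[v] for v in range(min(leave), max(leave) + 1)}
for value, f in table.items():
    print(value, f)

assert sum(table.values()) == len(leave) == 50  # the frequencies must sum to 50

at_least_1 = sum(f for v, f in table.items() if v >= 1)
between_3_and_5 = sum(f for v, f in table.items() if 3 <= v <= 5)  # inclusive reading
more_than_5 = sum(f for v, f in table.items() if v > 5)
print(at_least_1, between_3_and_5, more_than_5)  # 38 17 8
```

The table shows 12 workers with no leave, so 50 − 12 = 38 had at least one day.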

Bar Graph

Frequency Polygon


Bar Graphs and Symmetry

Relative Frequency Graph: plot the relative frequency f/n of each value, where f is the value's frequency and n is the total number of observations.
The relative frequencies sum to 1.

The graph looks similar to the frequency graph, except that the scale of the y-axis is different.
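As a sketch of the relative frequency computation, the sick-leave data from the frequency-table example are reused below. `fractions.Fraction` keeps each f/n exact, so the check that the relative frequencies sum to 1 holds without floating-point rounding; the use of exact fractions is a choice made for this illustration.

```python
from collections import Counter
from fractions import Fraction

# Days of sick leave for the 50 workers in the frequency-table example
leave = [2, 2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1, 7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
         1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5, 1, 3, 0, 1, 0, 2, 4, 5, 0, 5, 7, 5, 1]

n = len(leave)  # total number of observations
freq = Counter(leave)
# Relative frequency of each observed value: its frequency f divided by n
rel_freq = {v: Fraction(freq[v], n) for v in sorted(freq)}

for value, rf in rel_freq.items():
    print(value, float(rf))

assert sum(rel_freq.values()) == 1  # exact arithmetic, so the sum is exactly 1
```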


Pie Charts: A pie chart is often used to plot relative frequencies when the data are nonnumeric.

A circle is constructed and then is sliced up into distinct sectors, one for each different data
value.

If the relative frequency of a data value is f/n, then the area of its sector is the fraction f/n of the total area of the circle; equivalently, the sector's central angle is f/n × 360°.
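A small sketch of the sector computation, using hypothetical counts for a nonnumeric survey response (the categories and counts are invented for illustration). Since a sector's area is the fraction f/n of the circle, its central angle is the same fraction of 360 degrees.

```python
def pie_sector_angles(freq):
    """Central angle (in degrees) of each sector: the fraction f/n of 360 degrees."""
    n = sum(freq.values())
    return {label: 360 * f / n for label, f in freq.items()}

# Hypothetical nonnumeric data: counts of each response to a survey question
responses = {"Agree": 45, "Neutral": 30, "Disagree": 25}
angles = pie_sector_angles(responses)
print(angles)  # {'Agree': 162.0, 'Neutral': 108.0, 'Disagree': 90.0}
```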

PROBLEMS P 25

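Grouped Data and Histograms

- For large data sets
- Divide the values into groups called class intervals
- Typically use 5 to 10 class intervals
- Left-end inclusion convention: an interval such as 10-20 contains its left endpoint 10 but not its right endpoint 20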

Blood Cholesterol Levels of 40 Subjects

Arrange in Increasing Order
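The cholesterol values themselves are not reproduced in these notes, so the sketch below groups the ten values from the stem-and-leaf example instead; the starting point 110 and class width 10 are choices made for illustration. Each class is the half-open interval [lo, lo + width), which is the left-end inclusion convention: a value equal to the right endpoint is counted in the next class.

```python
def grouped_frequency_table(data, start, width, num_classes):
    """Frequency table for class intervals [lo, lo + width) under left-end inclusion."""
    values = sorted(data)  # arrange the data in increasing order
    table = []
    for i in range(num_classes):
        lo = start + i * width
        hi = lo + width
        # Left-end inclusion: lo belongs to this class, hi belongs to the next one
        count = sum(1 for x in values if lo <= x < hi)
        table.append((lo, hi, count))
    return table

data = [125, 135, 124, 115, 145, 133, 132, 122, 124, 129]
for lo, hi, count in grouped_frequency_table(data, 110, 10, 4):
    print(f"{lo}-{hi}: {count}")
```

For example, with classes 10-20 and 20-30, the value 20 is counted in 20-30, not in 10-20.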

Frequency Table of Blood Cholesterol Levels

A bar graph plot of the data, with the bars placed adjacent to each other, is called a
histogram. The vertical axis of a histogram can represent either the class frequency or the
relative class frequency.
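A rough text-only sketch of a histogram, turned sideways so that each class interval is a row and the bars sit adjacent to one another. The class counts below are hypothetical (they match a grouping of the ten stem-and-leaf values into classes of width 10). Switching to relative frequencies only changes the labels, count / n, not the shape of the bars.

```python
def text_histogram(classes, relative=False):
    """One row per class interval; bar length is the class frequency.

    classes is a list of (lo, hi, count) tuples. With relative=True the label
    shows the relative class frequency count / n instead of the raw count;
    the bars keep the same shape, only the scale changes.
    """
    n = sum(count for _, _, count in classes)
    rows = []
    for lo, hi, count in classes:
        label = f"{count / n:.2f}" if relative else str(count)
        rows.append(f"{lo}-{hi} | {'#' * count} {label}")
    return rows

# Hypothetical class counts for intervals [110, 120), [120, 130), ...
classes = [(110, 120, 1), (120, 130, 5), (130, 140, 3), (140, 150, 1)]
print("\n".join(text_histogram(classes)))
print("\n".join(text_histogram(classes, relative=True)))
```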


Frequency Histogram

The importance of a histogram is that it enables us to organize and present data graphically so
as to draw attention to certain important features of the data. For instance, a histogram can
often indicate

Frequency Histogram

Problem Set P 39

STEM-AND-LEAF PLOTS: For small to moderate data sets

Divide each data value into two parts: a stem and a leaf.
For example, the following data can be written as a stem-and-leaf plot as shown (here the stem is formed by the first two digits and the leaf by the last digit):
125, 135, 124, 115, 145, 133, 132, 122, 124, 129

Stem | Leaf
11   | 5
12   | 2, 4, 4, 5, 9
13   | 2, 3, 5
14   | 5

* The values of the leaves are put in the plot in increasing order.
* The choice of stems should always be made so that the resulting stem-and-leaf plot is informative about the data (not too many values in either the stem or the leaf).
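The stem-and-leaf plot for the ten values can be rebuilt with a short sketch, assuming nonnegative integers whose leaf is the ones digit. Sorting first puts the leaves in increasing order, and stems with no leaves are still listed so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves (the last digit)."""
    plot = defaultdict(list)
    for x in sorted(data):  # sorting puts the leaves in increasing order
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    # Include every stem between the smallest and largest, even if it has no leaves
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

data = [125, 135, 124, 115, 145, 133, 132, 122, 124, 129]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {', '.join(map(str, leaves))}")
```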

Per Capita Personal Income (Dollars per Person), 2002

The following stem-and-leaf plot represents the weights of 80 attendees at a sporting convention. The stem represents the tens digit, and the leaves are the ones digits.

The numbers in parentheses on the right represent the number of values in each stem class.

Stem-and-leaf plots are quite useful in showing all the data values in a clear representation that can be the first step in describing, summarizing, and learning from the data.

They are most helpful for moderate-size data sets.


Sometimes a stem-and-leaf plot appears to have too many leaves per stem line and, as a result, looks cluttered.

In that case, each stem line can be broken into two stem lines: the top line of the pair includes all leaves with values 0 through 4, and the bottom line includes all leaves with values 5 through 9.
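A sketch of the two-lines-per-stem layout, assuming nonnegative integers with the ones digit as the leaf. The ten values from the earlier stem-and-leaf example are reused only to show the mechanics; that plot is not actually cluttered. Each stem gets a first line for leaves 0 through 4 and a second line for leaves 5 through 9, and empty half-lines between the first and last values are kept.

```python
def split_stem_and_leaf(data):
    """Rows of (stem, leaves): leaves 0-4 on a stem's first line, 5-9 on its second."""
    lines = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1  # 0: leaves 0-4, 1: leaves 5-9
        lines.setdefault((stem, half), []).append(leaf)
    first, last = min(lines), max(lines)
    rows = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            # Skip empty half-lines before the first value and after the last one
            if (stem, half) < first or (stem, half) > last:
                continue
            rows.append((stem, lines.get((stem, half), [])))
    return rows

data = [125, 135, 124, 115, 145, 133, 132, 122, 124, 129]
for stem, leaves in split_stem_and_leaf(data):
    print(f"{stem} | {' '.join(map(str, leaves))}")
```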

Problem Set P47
SETS OF PAIRED DATA
Salaries vs IQ for 30 Workers


First consider each part of the paired data separately, and then plot the relevant histograms or stem-and-leaf plots for each part.

IQ Scores Salaries

Is there a relationship between IQ and salary? Separate plots of each part cannot answer this, because they discard the pairing; the paired data must be plotted differently.
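The reason separate plots are not enough can be shown with a small sketch on hypothetical (IQ, salary) pairs, invented for illustration and not the 30 workers on the slide. Re-pairing the same salaries in reverse order leaves each part, and therefore each part's histogram, unchanged, yet the relationship between IQ and salary flips from rising to falling.

```python
# Hypothetical (IQ, salary in thousands of dollars) pairs, for illustration only
pairs = [(95, 40), (105, 48), (110, 55), (120, 62), (130, 70)]
iqs, salaries = zip(*pairs)  # separate the two parts of each pair

# Re-pair the same salaries in reverse order
repaired = list(zip(iqs, reversed(salaries)))

def salary_rises_with_iq(data):
    """True if, after sorting by IQ, each salary exceeds the previous one."""
    ordered = sorted(data)
    return all(b[1] > a[1] for a, b in zip(ordered, ordered[1:]))

# Each part, taken separately, is unchanged, so its histogram is unchanged...
assert sorted(s for _, s in repaired) == sorted(salaries)
assert sorted(i for i, _ in repaired) == sorted(iqs)
# ...yet the relationship between the two parts is reversed
print(salary_rises_with_iq(pairs), salary_rises_with_iq(repaired))  # True False
```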

Scatter Diagram

Problem Set P54 & P63
