(Stat 2161)
Chapter 1
Introduction
Chapter Goals
After completing this chapter, you will be
able to:
• Explain the reasons for studying statistics
Data → Statistical tools → Information
Classification of Statistics
o Descriptive statistics: mainly concerned with the methods and techniques used in the collection, organization,
presentation, and analysis of a set of data, without drawing any conclusions or inferences.
o Inferential statistics: deals with the methods of inferring, or drawing conclusions about, the characteristics
of a population based on the results of a sample.
Descriptive Statistics
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Summarize data
– e.g., Sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
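The sample mean formula can be checked numerically. The following is a minimal Python sketch, assuming the sample is given as a plain list of numbers. The function name sample_mean is hypothetical. The ages come from the Jimma Health Center example in Chapter Two.

```python
def sample_mean(values):
    """Return X-bar = (X_1 + X_2 + ... + X_n) / n for a non-empty sample."""
    if len(values) == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(values) / len(values)


# Ages (in years) of the 20 women from the Jimma Health Center example in Chapter Two
ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41]
print(sample_mean(ages))  # sum is 666, so the mean is 666 / 20 = 33.3
```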
Inferential Statistics
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is 65 kg
1.2 Stages in Statistical Investigation
➢ There are five basic stages in any statistical investigation.
1. Collection of Data: the process of collecting observations (measurements, survey responses, etc.).
2. Organization of Data: the arrangement of data in a suitable form. It comprises editing, classification and tabulation.
3. Presentation of Data: the process of displaying data in a precise manner using tables, graphs and diagrams.
4. Analysis of Data: the process of systematically applying statistical and/or logical techniques to describe, illustrate, and
evaluate data.
5. Interpretation of Data: drawing conclusions from the analysis, including generalizing characteristics from the sample to the population.
1.3: Application, Uses and Limitations of Statistics
• Applications
• Statistics is applied in almost all areas of research, such as:
• Industry – control charts and inspection plans.
• Commerce – demand and supply.
• Agriculture – mean comparison (ANOVA).
• Economics – index numbers, time series and estimation.
• Education – formulation of policies for starting new courses.
• Planning – data related to production and consumption.
• Medicine – testing the efficacy of a new drug.
• Modern applications, for example, software engineering.
1.3: Application, Uses …
•Uses of Statistics
Limitations
1.4 Types of Variables and Measurement Scales
• Variable: a characteristic that can take on different values.
• Value: a specific amount (or category) that a variable can take.
Types of Variables
o Qualitative Variables:
o Attributes, categories
o Examples: male/female, registered to vote/not, ethnicity, eye color, etc.
o Quantitative Variables
– Discrete variable: can assume only a countable number of values.
– Continuous variable: can take on any value within an interval (measurements: "how much").
Scales of Measurement
Quantitative Variable
• Ratio Scale: differences between measurements are meaningful and a true zero exists.
• Interval Scale: differences between measurements are meaningful but there is no true zero.
Qualitative Variable
• Ordinal Scale: ordered categories (rankings, order, or scaling) but no exact difference between categories.
• Nominal Scale: categories with no ordering or direction.
Example
• Nominal: marital status, eye color, gender, race
• Ordinal: stage of disease, severity of pain, level of satisfaction
• Interval: temperature
• Ratio: distance, length, time until death, weight
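The difference between the interval and ratio scales can be seen by changing units. The following is a small Python sketch, assuming temperature is recorded in degrees Celsius (interval scale) and weight in kilograms (ratio scale). The helper names and the conversion factor 2.20462 lb per kg are illustrative assumptions. Differences stay consistent after changing units on both scales, but ratios stay consistent only on the ratio scale.

```python
def celsius_to_kelvin(c):
    """Kelvin has a true zero (absolute zero), so it is a ratio scale."""
    return c + 273.15


def kg_to_lb(kg):
    """Changing the unit of weight only rescales it (approximate factor)."""
    return kg * 2.20462


t_low, t_high = 10.0, 20.0   # temperatures in degrees Celsius (interval scale)
w_low, w_high = 10.0, 20.0   # weights in kilograms (ratio scale)

# Differences survive a change of units on both scales.
temp_diff_c = t_high - t_low
temp_diff_k = celsius_to_kelvin(t_high) - celsius_to_kelvin(t_low)

# Ratios do not survive for temperature: "20 C is twice as hot as 10 C" is not meaningful ...
temp_ratio_c = t_high / t_low
temp_ratio_k = celsius_to_kelvin(t_high) / celsius_to_kelvin(t_low)

# ... but "20 kg is twice 10 kg" stays true in pounds.
weight_ratio_kg = w_high / w_low
weight_ratio_lb = kg_to_lb(w_high) / kg_to_lb(w_low)

print(f"Temperature difference: {temp_diff_c:.2f} C vs {temp_diff_k:.2f} K")
print(f"Temperature ratio: {temp_ratio_c:.3f} in C vs {temp_ratio_k:.3f} in K")
print(f"Weight ratio: {weight_ratio_kg:.3f} in kg vs {weight_ratio_lb:.3f} in lb")
```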
Chapter Two
Presentation
Chapter Goals
After completing this chapter, you are expected to:
• Explain why we collect data
• Identify sources of data
• Describe the various methods of data collection
• Create and interpret diagrams to describe categorical
variables:
– frequency distribution, bar chart, pie chart
• Create and interpret graphs to describe numerical
variables:
– frequency distribution, histogram, ogive, stem-and-leaf
plot
2.1 Methods of Data Collection
• Why do we collect data?
– To answer questions,
– To make decisions, and
– To gain a deeper understanding of some
phenomena.
• Examples
– Does lowering the speed limit reduce the number of
fatal traffic accidents?
– What fraction of students in a college have
blood group O?
• Data: a plural noun (singular: datum) meaning a
set of known or given facts.
• Data can be collected using surveys or experiments.
2.1.1 Sources of Data
• Primary
– Data generated by the immediate user(s) of the data.
– Surveys, experiments and observational studies are the
most common methods.
– Tend to require more time and expense than secondary
data.
• Secondary
– Data gathered from another source for a similar or
different purpose.
• Internal sources within the researcher’s organization
• External sources, including governmental, trade,
commercial and internet sources.
Sources of Data . . .
• Example: To find the average CGPA of students at a university,
data can be obtained from the registrar's office of that
university.
• Advantages of Secondary Data
• Secondary data save time and cost compared with primary
data.
• They are less subject to intentional bias.
• Secondary data may be the only option when the information
cannot be collected directly.
• Drawback of Secondary Data
• They may not meet all the requirements of the current study.
2.1.2 Types of Data
Data
• Categorical (Qualitative): defined categories or groups
◼ Examples: data on marital status, cause of death, eye color
• Numerical (Quantitative)
– Discrete
– Continuous
Data Collection …
5. Indirect Oral Interview: The researcher contacts third
parties, called witnesses, who are capable of supplying the
necessary information.
– Recommended if the information is of a complex
nature or the informants are not inclined to respond.
6. Mailed Enquiry Method: Letters with a set of
questions are sent to the respondents, and the responses are
collected afterwards.
– Recommended if the survey covers a large area and the
respondents are widely scattered.
7. Old Records: The researcher uses data collected by
others and stored in forms such as books,
newspapers, almanacs (handbooks) or even
unpublished sources.
2.2 Methods of Data Presentation
• Data in raw form are usually not easy to use for
decision making.
Frequency Distribution …
Frequency Distribution
– Qualitative
– Quantitative: Ungrouped or Grouped
• Frequency Distribution: a table that presents data in
classes and shows the number of observations in each class.
• Qualitative FD: a frequency distribution in which the data
presented are nominal or ordinal.
• Ungrouped FD: a frequency distribution in which each distinct value
in the dataset forms its own class.
• Grouped FD: a frequency distribution in which several values are grouped into one class.
Frequency Distributions . . .
• Categorical Frequency Distribution
– The categorical frequency distribution is used for data which can be placed
in specific categories such as nominal or ordinal level data
– The major components of categorical frequency distribution are class, tally and
frequency (or proportion).
• Percentages can also be used.
Frequency Distributions . . .
• Example: Data on the smoking status, by gender, of a sample of 20 health workers
in Jimma Hospital (1986 E.C.) are given below. Construct a categorical frequency
distribution.
Observation     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
Gender          M  F  M  M  F  F  F  M  M  M  F  F  F  F  M  F  M  F  M  M
Smoking status  Y  N  N  Y  N  N  Y  N  N  N  N  N  N  Y  Y  Y  N  N  Y  Y

Characteristics   Tally          Frequency
Gender
  Male            //// ////      10
  Female          //// ////      10
Smoking status
  No              //// //// //   12
  Yes             //// ///       8
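The counting behind a categorical frequency distribution can be sketched in Python with the standard library's collections.Counter. The following is a minimal sketch using the 20 observations above. The function name categorical_fd is hypothetical, and tallies are replaced by frequencies and percentages.

```python
from collections import Counter

# Observations 1-20 from the Jimma Hospital example
gender = "M F M M F F F M M M F F F F M F M F M M".split()
smoking = "Y N N Y N N Y N N N N N N Y Y Y N N Y Y".split()


def categorical_fd(observations):
    """Return (category, frequency, percentage) rows, most frequent first."""
    n = len(observations)
    counts = Counter(observations)
    return [(category, f, 100 * f / n) for category, f in counts.most_common()]


for name, data in [("Gender", gender), ("Smoking status", smoking)]:
    print(name)
    for category, f, pct in categorical_fd(data):
        print(f"  {category}: {f} ({pct:.0f}%)")
```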
Frequency Distributions . . .
• Ungrouped Frequency Distribution
– It lists individual data values along with their
frequencies.
– It is often constructed for a small set of numerical data on a discrete variable
when the range of the data is small.
– For a large mass of data, an ungrouped frequency distribution becomes unwieldy;
as a result, we use a grouped frequency distribution.
– The major components of this type of frequency distribution are class, tally,
frequency, and cumulative frequency (less than/more than).
– Cumulative frequency is used to determine the number of observations that lie
above (or below) a particular value in a data set.
Frequency Distributions . . .
Example: The ages (in years) of 20 women who attended health education at Jimma
Health Center in 1986 are given below. Construct an ungrouped frequency
distribution.
30 25 23 41 39 27 41 24 32 29 29 35 31 36 33 36 42 35 37 41
Age (xj)       23 24 25 27 29 30 31 32 33 35 36 37 39 41 42
Tally          /  /  /  /  // /  /  /  /  // // /  /  /// /
Frequency (f)  1  1  1  1  2  1  1  1  1  2  2  1  1  3  1
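The same ungrouped frequency distribution, extended with less than and more than cumulative frequencies, can be computed in Python. The following is a minimal sketch. The function name ungrouped_fd is hypothetical. The less than cf of a value counts observations at or below it, and the more than cf counts observations at or above it.

```python
from collections import Counter
from itertools import accumulate


def ungrouped_fd(values):
    """Return rows (x, f, less-than cf, more-than cf), one row per distinct value."""
    counts = Counter(values)
    xs = sorted(counts)                                       # each distinct value is its own class
    fs = [counts[x] for x in xs]
    n = len(values)
    less_than = list(accumulate(fs))                          # observations <= x
    more_than = [n - lt + f for lt, f in zip(less_than, fs)]  # observations >= x
    return list(zip(xs, fs, less_than, more_than))


ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41]

print(f"{'Age':>4} {'f':>3} {'<= cf':>6} {'>= cf':>6}")
for x, f, lt, mt in ungrouped_fd(ages):
    print(f"{x:>4} {f:>3} {lt:>6} {mt:>6}")
```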
Steps in the Construction of Grouped FD
1. Find the difference between the largest and smallest
values in the raw data and denote it by R (the range).
2. Set the number of classes (K), usually between 5 and
20, or use Sturges' rule $K = 1 + 3.322\log_{10} n$.
3. Estimate the class width $W = R/K$; round the estimate
(usually up) to a convenient value.
4. Determine the lower class limit (LCL) of the first class by selecting a
convenient number that is ≤ the lowest data value.
Then add the class width to it to get the lower class
limit of the second class. Keep adding until the
desired number of classes is reached.
5.1. If the observations are whole numbers (e.g., 12, 23, 78,
etc.), subtract ONE from the lower class limit of the second
class to get the upper class limit of the first class.
5.2. If the observations have one decimal place (e.g., 1.2,
2.3, 7.8, etc.), subtract 0.1 from the lower class
limit of the second class to get the upper class
limit of the first class.
5.3. If the observations have two decimal places (e.g., 1.32,
2.35, 7.84, etc.), subtract 0.01 from the lower
class limit of the second class to get the upper
class limit of the first class.
6. Count the number of observations (frequency) in each class and record
it with the corresponding class.
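Steps 1 to 6 can be sketched in Python. The following is a minimal sketch under stated assumptions. It handles only whole-number data (step 5.1). K from Sturges' rule is rounded to the nearest integer, and W is rounded up. The first LCL defaults to the smallest value. Classes are added until the largest value is covered, which normally gives the K classes of step 2. The function name grouped_fd is hypothetical.

```python
import math


def grouped_fd(values, k=None, width=None, first_lcl=None):
    """Grouped frequency distribution for whole-number data, following steps 1-6.

    Returns a list of (lower class limit, upper class limit, frequency).
    """
    lo, hi = min(values), max(values)
    r = hi - lo                                          # step 1: range R
    if k is None:
        k = round(1 + 3.322 * math.log10(len(values)))  # step 2: Sturges' rule
    if width is None:
        width = max(1, math.ceil(r / k))                 # step 3: W = R/K, rounded up
    lcl = lo if first_lcl is None else first_lcl         # step 4: first LCL <= smallest value
    classes = []
    while lcl <= hi:                                     # add classes until the largest value is covered
        ucl = lcl + width - 1                            # step 5.1: next LCL minus one
        f = sum(lcl <= x <= ucl for x in values)         # step 6: count observations in the class
        classes.append((lcl, ucl, f))
        lcl += width
    return classes


ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41]
for lcl, ucl, f in grouped_fd(ages):
    print(f"{lcl}-{ucl}: {f}")
```

For the 20 ages, Sturges' rule gives K = 5 (1 + 3.322 × 1.30103 ≈ 5.32). The range is 19, so W = 4. This produces the classes 23-26, 27-30, 31-34, 35-38 and 39-42, with frequencies 3, 4, 3, 5 and 5.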
Relative and Cumulative FD
• Relative frequency table: a table showing the relative
frequency (class frequency divided by n) of each class.
– Relative frequency can also be expressed as a
percentage.
• Cumulative frequency (cf): the sum of the frequencies
preceding (or succeeding) a class k, including the frequency
of class k itself.
– The cumulative relative frequency expresses the same
information as a proportion (cf/n); multiplying it by 100% gives a percentage.
• The less than cf counts the number of observations less than
or equal to the upper class boundary of a class.
• The more than cf is obtained by adding the frequencies of
observations greater than the lower class boundary of a class.
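These definitions can be sketched in Python for a grouped table. The following is a minimal sketch, assuming whole-number data, so that class boundaries are the class limits minus/plus 0.5. The function name relative_and_cumulative is hypothetical. The example classes are an illustrative grouping of the 20 Jimma Health Center ages into classes of width 4.

```python
def relative_and_cumulative(classes):
    """classes: list of (LCL, UCL, f) for whole-number data.

    Returns (lower boundary, upper boundary, f, relative %, less-than cf, more-than cf).
    """
    n = sum(f for _, _, f in classes)
    rows = []
    running = 0
    for lcl, ucl, f in classes:
        more_than = n - running      # observations above the lower class boundary
        running += f                 # observations below the upper class boundary
        rows.append((lcl - 0.5, ucl + 0.5, f, 100 * f / n, running, more_than))
    return rows


# Illustrative grouping of the 20 ages into classes of width 4
classes = [(23, 26, 3), (27, 30, 4), (31, 34, 3), (35, 38, 5), (39, 42, 5)]
for lcb, ucb, f, pct, less, more in relative_and_cumulative(classes):
    print(f"{lcb}-{ucb}  f={f}  rel={pct:.0f}%  less than cf={less}  more than cf={more}")
```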
Example
• Consider the following data
30 40 41 33 70 51 37 10 31 21 60 44 63 72 23 37 65 14
25 28 64 39 17 74 53 34 51 27 43 45 33 16 23 68 47 32
36 19 48 49 67 60 45 54 44 30 15 38 22 46 61 25 29 55
48 49 35 13 37 36
• Prepare i) absolute frequency distribution;
ii) relative frequency distribution;
iii) less than and more than cumulative
frequency distributions.
Example …
$R = 74 - 10 = 64$, $n = 60$
Using Sturges' Rule:
$K = 1 + 3.322\log_{10} 60 = 1 + 3.322(1.778151) = 6.9070 \approx 7$
$W = 64/7 = 9.14 \approx 10$
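The calculation above can be checked, and the full table produced, with a short Python sketch. The following assumes, consistent with steps 4 and 5.1, that the first lower class limit is the smallest value (10). Under that assumption, the classes are 10-19, 20-29, ..., 70-79. The original solution table is not reproduced here, so those class limits are an assumption.

```python
import math

data = [30, 40, 41, 33, 70, 51, 37, 10, 31, 21, 60, 44, 63, 72, 23, 37, 65, 14,
        25, 28, 64, 39, 17, 74, 53, 34, 51, 27, 43, 45, 33, 16, 23, 68, 47, 32,
        36, 19, 48, 49, 67, 60, 45, 54, 44, 30, 15, 38, 22, 46, 61, 25, 29, 55,
        48, 49, 35, 13, 37, 36]

n = len(data)                                # 60
r = max(data) - min(data)                    # R = 74 - 10 = 64
k_exact = 1 + 3.322 * math.log10(n)          # Sturges' rule: 6.907...
k = round(k_exact)                           # K = 7
w = math.ceil(r / k)                         # W = 64/7 = 9.14, rounded up to 10

table = []
less_cf = 0
for i in range(k):
    lcl = min(data) + i * w                  # first LCL = smallest value (10)
    ucl = lcl + w - 1                        # whole-number data: next LCL minus one
    f = sum(lcl <= x <= ucl for x in data)
    more_cf = n - less_cf                    # observations in this class or above
    less_cf += f                             # observations in this class or below
    table.append((lcl, ucl, f, f / n, less_cf, more_cf))

for lcl, ucl, f, rel, less, more in table:
    print(f"{lcl}-{ucl}  f={f:2d}  rel={rel:.4f}  less than cf={less:2d}  more than cf={more:2d}")
```

With these classes, the absolute frequencies are 7, 9, 15, 13, 5, 8 and 3, which sum to n = 60.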
Example …
30 25 23 41 39 27 41 24 32 29 29 35 31 36 33 36 42
35 37 41
n=20
Solution:
Exercise
1. Given below are raw data on the ages of 40 employees of an
organization. Construct a frequency distribution including the
class boundaries, class marks, relative frequencies, and the less
than and more than cumulative frequencies.
62 58 53 27 30 31 26 34 49 47 48 41 50 61 40 47 41 43 50 45
43 32 37 31 35 38 29 65 58 43 44 41 37 27 62 65 36 42 63 50
Solution
2.2.2 Diagrammatic Presentation of Data
• It includes bar charts, pie diagrams and stem-and-leaf
plots.
• Bar charts are the simplest and most widely used
diagrams for data presentation.
• Bar charts display absolute or relative frequency
distributions for categorical variables.
Bar Chart
Simple Bar Chart
Simple Bar Chart . . .
Year No. of students
2000 3005
2001 3567
2002 3800
2003 4300
2004 3650
2005 5000
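A simple bar chart needs only one rule: each bar's length is proportional to its value. The following is a minimal text-only Python sketch of that rule, assuming no plotting library is available. In practice, a library such as matplotlib (matplotlib.pyplot.bar) would draw the chart. The function name text_bar_chart and the 40-character maximum bar width are illustrative choices.

```python
students = {2000: 3005, 2001: 3567, 2002: 3800, 2003: 4300, 2004: 3650, 2005: 5000}


def text_bar_chart(data, max_width=40, symbol="#"):
    """Return one text line per category; bar length is proportional to the value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        length = round(value / largest * max_width)   # scale so the largest bar has max_width symbols
        lines.append(f"{label} | {symbol * length} {value}")
    return lines


for line in text_bar_chart(students):
    print(line)
```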
Two Way Bar Chart
• To represent data having both negative and positive
values.
• Example
Year 1990 1991 1992 1993
Net Migration 50,000 -5,000 20,000 40,000
Multiple Bar Chart
• To compare two or more variables or groups.
• Example: A number of accounting firms were audited and
classified according to size (I [large], II [medium] and
III [small]) and the degree to which income-changing
accounting practices were used in preparing clients' tax
returns.
                 Degree of Change
Size      No changes   Some changes   Total
Large         23            36          59
Medium        52            61         113
Small         22            21          43
Total         97           118         215
Subdivided Bar Chart
• To show and compare the breakdown of one variable into
several components.
Year 2000 2001 2002 2003 2004
No. of females 800 824 856 768 900
No. of males 1389 2450 1245 1655 1445
Total 2189 3274 2101 2423 2345
Broken Chart
• To represent data with very large variations in value.
• One observation may be extremely large compared with the
others.
• If we use a scale proportional to the value (frequency), it
will be almost impossible to see the bars of the small values.
• Example
• Represent the data given below using a suitable chart.
Year 1990 1991 1992 1993 1994 1995
Value 899 543 787 35323 121 234
Broken Bars . . .
Figure: the data above shown as a simple bar chart and as a broken bar chart.
Pie Diagram
• Pie Chart
Stem-and-Leaf Plot . . .
• Construct a stem-and-leaf plot for the following data.
• 3.584, 3.615, 3.586, 3.712, 3.823, 3.616, 3.580, 3.888,
3.617, 3.584, 3.882, 3.912, 3.91, 3.712, 3.580, 3.917
Stem | Leaf
3.58 | 0 0 4 4 6
3.61 | 5 6 7
3.71 | 2 2
3.82 | 3
3.88 | 2 8
3.91 | 0 2 7
Key: 3.58|4 represents 3.584
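The plot above can be reproduced in Python. The following is a minimal sketch, assuming every value is recorded to at most three decimal places. Each value is scaled to an integer number of thousandths to avoid floating-point digit errors. The stem is all digits except the last, and the leaf is the last digit. The function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

data = [3.584, 3.615, 3.586, 3.712, 3.823, 3.616, 3.580, 3.888,
        3.617, 3.584, 3.882, 3.912, 3.91, 3.712, 3.580, 3.917]


def stem_and_leaf(values, scale=1000):
    """Map each stem (value in thousandths without its last digit) to its sorted leaves."""
    groups = defaultdict(list)
    for v in values:
        iv = round(v * scale)             # 3.584 -> 3584, 3.91 -> 3910
        groups[iv // 10].append(iv % 10)  # stem 358, leaf 4
    return {stem: sorted(leaves) for stem, leaves in sorted(groups.items())}


for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem / 100:.2f} | {' '.join(map(str, leaves))}")
```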
2.2.4 Graphical Presentation of Data
Histogram Example
Daily High Temperature   Frequency
10 but less than 20      3
20 but less than 30      6
30 but less than 40      5
40 but less than 50      4
50 but less than 60      2
Figure: Histogram of daily high temperature (x-axis: temperature in degrees, 0 to 70; y-axis: frequency). There are no gaps between the bars.
Frequency Polygon
• This is a line graph of class frequencies plotted against
class marks.
• The end points must be joined to the x-axis (y = 0) at the midpoints
of the empty classes: one before the first class and the
other after the last class.
• Frequency polygons serve the same purpose as histograms, but are
especially helpful for comparing sets of data.
• Example
• 1. Represent the following data using a frequency polygon.
Class 14.5-24.5 24.5-34.5 34.5-44.5 44.5-54.5 54.5-64.5
Frequency 3 4 8 6 7
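The points of a frequency polygon can be computed before plotting. The following is a minimal Python sketch, assuming equal class widths and classes given by their boundaries as in the table above. It pairs each class mark with its frequency. It also adds zero-frequency points at the midpoints of the empty classes before the first class and after the last class. The function name frequency_polygon_points is hypothetical.

```python
def frequency_polygon_points(boundaries, freqs):
    """boundaries: class boundaries [b0, b1, ..., bk]; freqs: the k class frequencies.

    Returns (class mark, frequency) points, closed at y = 0 at both ends.
    """
    width = boundaries[1] - boundaries[0]    # assumes equal class widths
    marks = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    start = (marks[0] - width, 0)            # midpoint of the empty class before the first
    end = (marks[-1] + width, 0)             # midpoint of the empty class after the last
    return [start] + list(zip(marks, freqs)) + [end]


bounds = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [3, 4, 8, 6, 7]
print(frequency_polygon_points(bounds, freqs))
```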
Frequency Polygon . . .
• 2. The following frequency distributions refer to the scores
of 28 students on two tests. Plot frequency polygons
for the two datasets.
Score   0-5   5-10   10-15   15-20   20-25
Test1   3     4      8       6       7
Test2   1     2      5       12      8
Ogive
o An ogive is a frequency polygon (line plot) of the
cumulative frequencies or the cumulative relative frequencies.
o Example
Price in Birr   Frequency   Less than cf   More than cf
10-20           2           2              26
20-30           3           5              24
30-40           6           11             21
40-50           8           19             15
50-60           5           24             7
60-70           2           26             2
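The two ogives can be traced from the table above. The following is a minimal Python sketch, assuming the price classes are read as class boundaries (10-20 means from 10 up to 20). It follows the usual convention that the less than ogive plots cf against upper boundaries and the more than ogive plots cf against lower boundaries. The function name ogive_points is hypothetical.

```python
from itertools import accumulate

# Price classes in Birr, read as class boundaries
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [2, 3, 6, 8, 5, 2]


def ogive_points(classes, freqs):
    """Return the (x, cf) points of the less-than and more-than ogives."""
    n = sum(freqs)
    less = list(accumulate(freqs))                       # 2, 5, 11, ...
    more = [n - lt + f for lt, f in zip(less, freqs)]    # 26, 24, 21, ...
    # Less-than ogive: starts at 0 on the first lower boundary, then cf at each upper boundary
    less_pts = [(classes[0][0], 0)] + [(ub, cf) for (_, ub), cf in zip(classes, less)]
    # More-than ogive: cf at each lower boundary, ending at 0 on the last upper boundary
    more_pts = [(lb, cf) for (lb, _), cf in zip(classes, more)] + [(classes[-1][1], 0)]
    return less_pts, more_pts


less_pts, more_pts = ogive_points(classes, freqs)
print("Less than ogive:", less_pts)
print("More than ogive:", more_pts)
```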