
Module 1

Two areas of Statistics:


1. Descriptive Statistics: consists of methods for organising, summarising and
presenting data in a convenient and usable form.

2. Inferential Statistics (Statistical Inference): consists of a body of methods for
drawing conclusions about characteristics of a population based on
information contained in a sample taken from that population.

Statistical Inference (Inferential Statistics)


In general, we never see the whole (population) but must make our decisions
based on information gathered from the part (sample).
Whenever we draw conclusions about the whole population based on sample
information we are practising Statistical Inference.

Some intuitive definitions:


• Confidence level – The chance we are correct in our inference.
• Significance level – The chance we are prepared to take of making a
wrong decision in our inference.

Types of data
1. Quantitative Data (Interval Data)
- are numerical observations. These data are said to have an interval
data scale.
Eg. Height, weight, time (values are real numbers)
Valid descriptive measure: mean (average), standard deviation,
median, mode, quartiles, percentiles.

2. Qualitative Data (Nominal Data)


- categorical data, values are arbitrary names of categories with no
particular order.
Eg. Male/ Female, Yes/No, Married/Divorced/Single
Valid descriptive measure: proportion in each category.

3. Ranked Data (Ordinal Data)


- categorical data with order.
Eg. Your satisfaction with a course, Age category
Valid descriptive measure: median, quartiles, percentiles, proportions.

Example: For each of the following examples, determine the data type.
a) The starting salaries of graduates from an M.B.A program.
Interval / Quantitative

b) Do you recommend this product to your patient?


Nominal / Qualitative
c) How often does your patient visit your surgery?
Interval / Quantitative

d) What age group is your patient?


- 12 or younger
- 13 to 45
- over 45
Ordinal / Ranked

Descriptive Statistics
Definition: Methods of organising, summarising and presenting numerical data.
Eg. Summary statistics (mean, standard deviation), tables, charts, graphs.

Numerical summary measures

Notation:
                     Sample statistic    Population parameter
Mean                 x̄                   µ
Standard deviation   s                   σ
Proportion           p̂                   p
Note: the sample value (known) estimates the population value (unknown).

1) Measures of Central Tendency


i) Mean
• simple average, i.e. the sum of all the measurement
values divided by the number of measurements.
• the best measure of central tendency for purposes of
statistical inference. One serious drawback is that it is
influenced by extreme values.
Eg. 3, 4, 6, 43 has a mean of 14.
* May be used for interval data.
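The example above can be checked with a minimal Python sketch using the standard library `statistics` module. The variable names are illustrative only and are not part of the notes. The second calculation drops the extreme value 43 to show how strongly it pulls the mean upward.

```python
from statistics import mean

values = [3, 4, 6, 43]

# Sum of all measurement values divided by the number of measurements.
average = mean(values)  # 56 / 4

# Leaving out the extreme value 43 shows how much it influences the mean.
average_without_extreme = mean(values[:-1])  # 13 / 3

print(average, round(average_without_extreme, 2))
```

With 43 included the mean is 14, although three of the four values are 6 or less.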

ii) Median (Me)


• the value that falls in the middle when the data are
arranged in ascending order.
Eg. 26, 60, 32, 30, 26, 29, 31
Arrange in ascending order,
26, 26, 29, 30, 31, 32, 60
Me = 30 (middle value)
*May be used for ordinal or interval data.
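As a small sketch of the same steps in Python (names are illustrative), the data are sorted and the middle position is read off. Picking a single middle position only works because the number of observations is odd. The result is cross-checked against `statistics.median`.

```python
from statistics import median

data = [26, 60, 32, 30, 26, 29, 31]

# Step 1: arrange in ascending order.
ordered = sorted(data)

# Step 2: take the middle value (valid here because the count is odd).
middle_value = ordered[len(ordered) // 2]

# Cross-check with the standard library.
assert middle_value == median(data)
print(ordered, middle_value)
```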

iii) Mode
• the value that occurs most frequently.
Eg. 31, 34, 36, 33, 28, 34, 34
Mode=34
*May be used for nominal, ordinal or interval data.
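A minimal sketch of finding the mode by counting, using `collections.Counter` from the Python standard library (variable names are illustrative):

```python
from collections import Counter

data = [31, 34, 36, 33, 28, 34, 34]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(data)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```

Because only counting is involved, the same approach works when the values are category labels rather than numbers, which is why the mode may be used for nominal data.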

2) Measures of Spread (variability)


i) Range: numerical difference between the largest and the smallest
value. It does not give information about the spread of scores between the
lowest and highest scores.

ii) Interquartile range (looks at the middle 50% of data when data are in
ascending order)
Eg. 2, 4, 4, 5, 7, 8, 10, 12, 17, 18, 18, 21, 27, 29, 30
(Of the 15 values, Q1 is the 4th, Me is the 8th and Q3 is the 12th.)

Me = 12
Lower quartile = Q1 = 5
Upper quartile = Q3 = 21
Interquartile range = IQR = Q3 – Q1 = 21-5 = 16
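The quartiles above can be reproduced with a short Python sketch. It assumes the "median of each half" convention, where the overall median is left out of both halves when the count is odd; that convention matches the values in this example. Other textbooks and software packages use slightly different quartile rules and may give different answers on other data sets.

```python
from statistics import median

data = sorted([2, 4, 4, 5, 7, 8, 10, 12, 17, 18, 18, 21, 27, 29, 30])
n = len(data)
half = n // 2

me = median(data)

# Split around the median; skip the median itself when n is odd.
lower_half = data[:half]
upper_half = data[half + 1:] if n % 2 else data[half:]

q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

print(q1, me, q3, iqr)
```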

iii) Variance: the average of the squared deviations from the mean. For a
sample, the sum of the squared deviations is divided by n − 1 rather than n.
It is very important in statistical inference.

Standard deviation is the positive square root of the variance.


Eg. Sample data: 3.4, 2.5, 4.1, 1.2, 2.8, 3.7

Mean:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = 2.95 \]

Variance:
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(3.4 - 2.95)^2 + (2.5 - 2.95)^2 + \dots + (3.7 - 2.95)^2}{6 - 1} = 1.075 \]

Standard deviation:
\[ s = \sqrt{1.075} = 1.037 \]


Note: the mean and standard deviation can be easily obtained with the
use of any scientific calculator without using the above formulae.
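Software can play the same role as a calculator. The following Python sketch works through the sample variance formula step by step and then cross-checks the result against `statistics.variance` and `statistics.stdev`, which also divide by n − 1. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]
n = len(sample)

# Mean: sum of the values divided by n.
x_bar = mean(sample)

# Sample variance: sum of squared deviations divided by n - 1.
squared_deviations = [(x - x_bar) ** 2 for x in sample]
s2 = sum(squared_deviations) / (n - 1)

# Standard deviation: positive square root of the variance.
s = sqrt(s2)

# Cross-check the hand calculation against the library functions.
assert abs(s2 - variance(sample)) < 1e-9
assert abs(s - stdev(sample)) < 1e-9

print(round(x_bar, 2), round(s2, 3), round(s, 3))
```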

Graphical data representation

Data may be presented in tabular or graphical form. Both of these methods
effectively summarize the data so that overall trends can be readily observed.

Quantitative / interval data: histogram, ogive, stem and leaf
Nominal, ordinal data: bar chart, column chart, pie chart

Interval data
Eg. The following data are the test scores of 30 students:

45, 48, 52, 54, 55, 58, 58, 59, 61, 61, 62, 64, 65, 66, 66, 67, 70, 73,
75, 77, 77, 78, 79, 80, 82, 83, 86, 86, 91, 94

Note: There is no hard and fast rule for choosing the number of classes;
anywhere between 5 and, say, 20 is usually sufficient. Use your judgement in
deciding on the number of classes or the class width. However, the class
intervals should:
i) be of equal size
ii) cover the full range of the data
iii) be mutually exclusive, i.e. a number can fit into only one class.

This data can now be summarised into a frequency distribution:

Class limits    Frequency
40 < 50 *       2
50 < 60         6
60 < 70         8
70 < 80         7
80 < 90         5
90 < 100        2
Total           30

* 40 up to 50 includes 40 but not 50.
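The tally can be sketched in Python as follows, assuming a class width of 10 starting at 40, as in the table. Integer division assigns each score to exactly one class, so the classes are of equal size, cover the full range and are mutually exclusive. Variable names are illustrative.

```python
scores = [45, 48, 52, 54, 55, 58, 58, 59, 61, 61, 62, 64, 65, 66, 66, 67,
          70, 73, 75, 77, 77, 78, 79, 80, 82, 83, 86, 86, 91, 94]

class_width = 10
lower_limits = range(40, 100, class_width)

# One counter per class, keyed by the lower class limit.
frequency = {lower: 0 for lower in lower_limits}

for score in scores:
    # A score of 50 falls in "50 < 60", not "40 < 50".
    lower = (score // class_width) * class_width
    frequency[lower] += 1

for lower, count in frequency.items():
    print(f"{lower} < {lower + class_width}: {count}")
print("Total:", sum(frequency.values()))
```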


Histogram
[Figure: histogram of the test scores. Horizontal axis: Test scores (tick labels 50, 60, ..., 100); vertical axis: Frequency (0 to 10).]

Graphical Techniques for Nominal data/Ordinal data


• The only allowable calculation on nominal data is to count the frequency
of each value of a variable.
• When the raw data can be naturally categorized in a meaningful manner,
we can display frequencies by
1. Bar charts – emphasize frequency of occurrences of the different
categories.
2. Pie chart – emphasize the proportion of occurrences of each category.

The Pie Chart


• The pie chart is a circle, subdivided into a number of slices that
represent the various categories.
• The size of each slice is proportional to the percentage corresponding
to the category it represents.

Example
• The student placement office at a university wanted to determine the
general areas of employment of last year school graduates.
• Data was collected, and the count of the occurrences was recorded for
each area.
• These counts were converted to proportions and the results were
presented as a pie chart and a bar chart.
[Figure: pie chart of employment areas, with the following slices]
Accounting            29%
Marketing             25%
Finance               21%
General management    14%
Other                 11%
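The conversion from counts to percentages can be sketched in Python. The counts are the bar labels from the bar chart of this example (73, 64, 52, 36, 28); pairing each count with an area is an assumption made here by matching the counts to the pie chart percentages. Variable names are illustrative.

```python
# Assumed pairing of bar-chart counts with areas (see the note above).
counts = {
    "Accounting": 73,
    "Finance": 52,
    "General management": 36,
    "Marketing": 64,
    "Other": 28,
}

total = sum(counts.values())

# Each slice is proportional to the category's share of the total.
percentages = {area: round(100 * count / total) for area, count in counts.items()}

for area, pct in percentages.items():
    print(f"{area}: {pct}%")
```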

The Bar Chart


• Rectangles represent each category.
• The height of the rectangle represents the frequency.
• The base of the rectangle is arbitrary

[Figure: bar chart. Horizontal axis: Area; vertical axis: Frequency (0 to 80).]

Area                  Frequency
Accounting            73
Finance               52
General management    36
Marketing             64
Other                 28
• Use bar charts also when the order in which nominal data are presented is
meaningful.
[Figure: bar chart. Total number of new products introduced in North America in the years 1989,…,1994. Horizontal axis: ‘89 to ‘94.]

Relationship between two variables
• Nominal / ordinal – use table
Eg. Grade and Gender
          Fail   Pass   Credit   Distinction   Total
Female    18     26     19       5             68
Male      13     32     11       7             63
Total     31     58     30       12            131
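The row and column totals of such a two-way table can be verified with a short Python sketch. The dictionary layout and variable names are illustrative choices, not part of the notes.

```python
grades = ["Fail", "Pass", "Credit", "Distinction"]

# Counts for each gender, in the same order as `grades`.
table = {
    "Female": [18, 26, 19, 5],
    "Male": [13, 32, 11, 7],
}

# Row totals: one per gender.
row_totals = {gender: sum(counts) for gender, counts in table.items()}

# Column totals: one per grade, summing down the rows.
column_totals = [sum(column) for column in zip(*table.values())]

grand_total = sum(row_totals.values())

print(row_totals, dict(zip(grades, column_totals)), grand_total)
```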
