
INTRODUCTION TO STATISTICS

PRESENTATION OF DATA
BUSINESS STATISTICS
Major topics:
1) Descriptive Methods: Tabular, graphical and numerical summaries.
2) Sampling and Design of Experiments
3) Relationship between Quantitative and Categorical Variables
4) Probability Theory: How to measure what is probable?
5) Probability Distributions: Binomial, Poisson and Normal probabilities.
6) Sampling Theory: Sampling distributions.
7) Estimation and Hypothesis Testing: Making inferences about populations using sample data.
8) Regression and Correlation: How to determine the relationship between variables?
9) Analysis of Variance: Comparison of population parameters.


STATISTICAL METHODS

Two basic categories


1) Descriptive Statistics
Methods used to summarize the mass of data available to us, so that
we can easily and clearly see the overall ‘picture’.

E.g. tabular, graphical and numerical summaries.

2) Inferential Statistics
Methods which facilitate the making of generalizations (i.e. inferences)
about population characteristics from information obtained from a
sample.

E.g. estimating the average income of all Australian households by
surveying a small number of households from around the country.

STATISTICAL TERMS
• Data: Individual pieces of information.
E.g. the number of Statistics students in semester 1, 2005 at Angell.
Many students, many characteristics (e.g. age, income, gender).

• Census: A complete survey carried out on population data.


• Population: The complete set of items/data that are of interest in a
statistical investigation. (E.g. the ages of all Statistics students at
ABSF.)
A population definition must state: 1. the population item, 2. the boundary, 3. the characteristic of interest.

• Sample: A ‘small’ subset of the population.
(E.g. the ages of Statistics students in tutorial group 1.)
• Parameter: A numerical summary measure of a population.
(E.g. average age of Statistics students.)
• Statistic: A numerical summary measure of a sample.
(E.g. average age of the students in tutorial group 1.)

Statistical data is usually stored in a computer file (e.g. a spreadsheet).
• Case: All responses from a person in a sample or census. Each row is a
case. E.g. one respondent’s answers to all questions on a questionnaire.
• Variable: All responses to a particular question in a sample or census.
Each column is a variable. E.g. the answers to a particular question on a
questionnaire filled in by all respondents.

Ex 1: A councilor who is running for the office of mayor of a city with 25 000
registered voters commissions a survey. In the survey, 48% of the 200
registered voters interviewed say they plan to vote for her.

a) What is the population of interest?


The intentions of the 25 000 registered voters to vote for her or not.
b) What is the sample?
The intentions of the 200 selected registered voters.
c) Is the value 48% a parameter or a statistic?
It is a statistic, since it refers to the proportion of registered voters in the
sample who intend to vote for this politician.
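
The figures in Ex 1 can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the numbers are those quoted in the exercise.

```python
# Figures quoted in Ex 1.
population_size = 25_000   # registered voters in the city (the population)
sample_size = 200          # registered voters interviewed (the sample)
sample_proportion = 0.48   # the statistic reported by the survey

# Number of interviewed voters who said they plan to vote for her.
supporters_in_sample = round(sample_proportion * sample_size)

# Share of the population that was actually interviewed.
sampling_fraction = sample_size / population_size

print(supporters_in_sample)        # 96
print(f"{sampling_fraction:.1%}")  # 0.8%
```

The 48% describes only these 200 voters. The corresponding parameter, the proportion among all 25 000 voters, remains unknown.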
Ex 2: You are shown a coin that its owner says is fair in the sense that it will
produce the same number of heads and tails when flipped repeatedly.

a) Describe an experiment to test this claim.


Flip the coin, e.g. 30 times, and observe the outcomes (H or T).
The claim is doubtful if the proportion of H (or T) is far from 50% of the trials.
Because of chance, even a fair coin rarely gives exactly 50% in a small number of flips.

b) What is the population in your experiment?


The set of outcomes (H or T) of a large number (e.g. 10 000) of possible
trials.

c) What is the sample?


The set of outcomes (H or T) of the 30 trials actually performed.

d) What is the parameter?


The proportion of H (or T) in the population.

e) What is the statistic?


The proportion of H (or T) in the sample.
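
The difference between the parameter and the statistic in Ex 2 can be illustrated with a simulated coin. The sketch below assumes a perfectly fair coin (probability of H equal to 0.5). The seed and the helper name `proportion_of_heads` are arbitrary choices made for the illustration.

```python
import random

def proportion_of_heads(n_flips, rng):
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Arbitrary seed, used only so that repeated runs print the same numbers.
rng = random.Random(2005)

# Ten separate samples of 30 flips: the statistic changes from sample to sample.
sample_proportions = [proportion_of_heads(30, rng) for _ in range(10)]
print(sample_proportions)

# A much larger sample usually lands closer to the parameter 0.5.
large_sample_proportion = proportion_of_heads(10_000, rng)
print(large_sample_proportion)
```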
TYPES OF DATA
• Qualitative: Data which indicate the category or group that an
object or item is in. Not measurable (e.g. gender of students). These
are nominal qualitative data. Occasionally the categories can be ranked
(e.g. BBus, Hon, Mas, PhD). These are ordinal data.

• Quantitative: Data which arise from a counting or measuring
activity. Numerical observations (e.g. exam results).
Data measured in real numbers are interval data.
– Discrete quantitative: Data measured in whole numbers. The possible
values can be listed and counted (e.g. number of
students in different tutorial classes).
– Continuous quantitative: Data can assume an infinite set of values within a
given interval (e.g. height, length, duration, volume). E.g. 2.5 lies
between 2.49 and 2.51, and so do 2.501 and 2.5011. The possible
values are too many to be listed or counted,
hence the term continuous.
For quantitative data, basic arithmetic operations (+, –, ×, ÷) make sense.
Ex 3: Information concerning a magazine’s readership is of interest both to the
publisher and to the magazine’s advertisers. A survey of 20 subscribers
included the following questions. For each, determine the data type of possible
responses.
a) What is your age?
Quantitative, theoretically continuous, but discrete in practice.
b) What is your gender?
Qualitative.
c) What is your marital status?
Qualitative.
d) Is your annual income less than $20 000, between $20 000 and
$40 000, or over $40 000?
Qualitative, ranked.
e) How many other magazines do you subscribe to?
Quantitative, discrete.

DESCRIPTIVE METHODS
Summarising Quantitative Data
Example 4
What was the average annual family income in Freiburg in 2005?
Population: annual incomes of all families living in Freiburg in 2005.

Survey a random sample of 5000 Freiburger families out of a
population of about 1 million and request their annual family income in
2005.
Let’s suppose that 4500 families return the questionnaire.

Raw data set (income is recorded as a discrete quantitative variable):

Family   Income ($’000)
1        29
2        53
3        48
4        Missing
:        :
5000     87
Example 4 continued:

In order to make it easier to interpret this list we can rank the
4500 valid observations in ascending order.
• Ordered array: a list of the observations in ascending or descending order.

Family   Income ($’000)
173      26   (smallest)
928      26
37       27
2027     27
:        :
318      113  (largest)

Range: largest – smallest
i.e. 113 – 26 = 87
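
As a small illustration, the same steps can be sketched in Python. Only the five raw entries visible in the excerpt of the data set are used here, with `None` standing in for the missing response. The range of this tiny excerpt is therefore not the 87 obtained from all 4500 valid observations.

```python
# Raw incomes ($'000) shown in the excerpt; None marks the missing response.
raw_incomes = [29, 53, 48, None, 87]

# Keep only the valid observations.
valid_incomes = [x for x in raw_incomes if x is not None]

# Ordered array: the valid observations in ascending order.
ordered_array = sorted(valid_incomes)

# Range: largest minus smallest observation.
data_range = ordered_array[-1] - ordered_array[0]

print(ordered_array)  # [29, 48, 53, 87]
print(data_range)     # 58
```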

This table is still too large. We can construct a summary table.

Examples of tabular summaries are the frequency distribution, the relative
frequency distribution and the cumulative (relative) frequency distribution.

• Frequency distribution: a table that groups the data into class
intervals (or categories) and records the corresponding number of
observations, i.e. frequencies.

• Relative frequency: the proportion of all observations in a given
class interval (often in percentage form).

• Cumulative (relative) frequency: the sum of (relative) frequencies
from the first class through a given class.

• Histogram: a graphical presentation of a (relative) frequency distribution.
– Each class is illustrated with a rectangle;
– The base of the rectangle corresponds to the class interval;
– The height of the rectangle is proportional to the
corresponding frequency.
Some rules and suggestions:

i. The class intervals must not overlap each other, but they must be
exhaustive.
(Each data point must belong to one and only one class.)

ii. Use class intervals of equal width.
(Class width: the difference between consecutive lower (or upper) class limits.)
However, if there are some extremely small or large observations, it is
better to leave the first and last intervals open (e.g. less than 30).

iii. Select a width and starting value that are convenient to work with
(i.e. integer, some multiple of 5, 10 etc. units).

iv. For numerical data, do not use fewer than 5, or more than 15 class
intervals.

There is a simple relationship between the number of classes, the range of the
data and the class width:
number of classes ≈ range ÷ class width

(Ex 4) If the common class width is 5, 87/5 = 17.4 → 18 classes (too many).
If the common class width is 10, 87/10 = 8.7 → 9 classes.
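
The rounding step in this calculation can be written out explicitly. The helper `number_of_classes` below is a hypothetical name used only for this sketch. It rounds the ratio up, since a fractional class is not possible.

```python
import math

def number_of_classes(data_range, class_width):
    """Round range / class width up to the next whole number of classes."""
    return math.ceil(data_range / class_width)

data_range = 113 - 26  # range of the incomes in Example 4 ($'000)

print(number_of_classes(data_range, 5))   # 18
print(number_of_classes(data_range, 10))  # 9
```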

Frequency distribution for annual family incomes:

Income ($’000)   Frequency
25 – < 35        230
35 – < 45        420
45 – < 55        475
55 – < 65        635
65 – < 75        930
75 – < 85        720
85 – < 95        590
95 – < 105       395
105 – < 115      105
Total            4500

E.g. 590 families have annual income between 85 and 95 ($’000).
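
How individual observations are assigned to classes can be sketched as follows, assuming the class limits used above (first lower limit 25, common width 10, lower limit included and upper limit excluded). The nine incomes are just the values visible in the raw-data and ordered-array excerpts, not the full data set, and `class_interval` is a hypothetical helper name.

```python
from collections import Counter

LOWER_LIMIT = 25  # lower limit of the first class ($'000)
WIDTH = 10        # common class width ($'000)

def class_interval(income):
    """Return (lower, upper) of the class with lower <= income < upper."""
    index = (income - LOWER_LIMIT) // WIDTH
    lower = LOWER_LIMIT + index * WIDTH
    return (lower, lower + WIDTH)

# Incomes ($'000) visible in the excerpts of Example 4.
incomes = [29, 53, 48, 87, 26, 26, 27, 27, 113]

# Frequency of each class: the number of observations falling into it.
frequencies = Counter(class_interval(x) for x in incomes)

for (lower, upper), count in sorted(frequencies.items()):
    print(f"{lower} - < {upper}: {count}")
```

Because each income falls into exactly one interval, the classes are non-overlapping and exhaustive, as the rules above require.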

Ex 4: Histogram and polygon of annual family incomes in Freiburg in 2005.

[Figure: histogram with frequency polygon. Horizontal axis: annual family income ($’000), classes 25-35 to 105-115. Vertical axis: frequency, 0 to 1000.]

Note: – The polygon is closed at both ends of the distribution.
– Histograms and polygons convey the same information.
From this table we can develop the relative frequency, cumulative frequency
and cumulative relative frequency distributions for annual family incomes:

(Freq. = frequency, R. freq. = relative frequency, C. freq. = cumulative
frequency, C. R. freq. = cumulative relative frequency.)

Income ($’000)   Freq.   R. freq.   C. freq.   C. R. freq.
25 – < 35        230     0.051      230        0.051
35 – < 45        420     0.093      650        0.144
45 – < 55        475     0.106      1125       0.250
55 – < 65        635     0.141      1760       0.391
65 – < 75        930     0.207      2690       0.598
75 – < 85        720     0.160      3410       0.758
85 – < 95        590     0.131      4000       0.889
95 – < 105       395     0.088      4395       0.977
105 – < 115      105     0.023      4500       1.000

Reading the 45 – < 55 row:
– 10.6% (= 475 / 4500 × 100) of the families have annual income between 45 and 55 ($’000).
– 1125 (= 230 + 420 + 475) families have income below 55 ($’000).
– 25.0% (= 1125 / 4500 × 100) of the families have income below 55
($’000).
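
The three derived columns can be recomputed from the frequencies alone. The following Python sketch uses the frequencies of Example 4. The variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for annual family incomes (Example 4), first class to last.
frequencies = [230, 420, 475, 635, 930, 720, 590, 395, 105]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative = [f / total for f in frequencies]

# Cumulative frequency: running sum from the first class through each class.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: cumulative frequency divided by the total.
cumulative_relative = [c / total for c in cumulative]

for f, r, c, cr in zip(frequencies, relative, cumulative, cumulative_relative):
    print(f"{f:>5} {r:>7.3f} {c:>6} {cr:>7.3f}")
```

The third row printed corresponds to the 45 – < 55 class: frequency 475, relative frequency 0.106, cumulative frequency 1125 and cumulative relative frequency 0.250.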

DESCRIPTIVE METHODS
Summarising Qualitative Data
Example 5
Let’s suppose a second question was asked in the previous survey:
What do you think of your current standard of living? Is it lower, higher
or the same as five years ago?
… the frequency distribution of the replies is tabulated as follows:

Standard of living   Frequency
Lower                1500
Same                 500
Higher               2500
Total                4500

Standard of living is a qualitative variable. Lower, Same and Higher are its
categories (not classes).
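
Relative frequencies work the same way for qualitative data. A short Python sketch using the reply counts of Example 5 (the dictionary layout is just one convenient choice):

```python
# Frequencies of the replies in Example 5.
replies = {"Lower": 1500, "Same": 500, "Higher": 2500}
total = sum(replies.values())

# Relative frequency of each category, as a share of all replies.
relative = {category: count / total for category, count in replies.items()}

for category, share in relative.items():
    print(f"{category:<7}{replies[category]:>6}{share:>8.1%}")
```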

Qualitative data can also be illustrated with graphs.

• Column/bar chart: a graphical presentation of a (relative)
frequency distribution of qualitative data.
– Each category is illustrated with a rectangle;
– Each rectangle has the same width;
– The rectangles do not touch each other.

[Figure: column chart of the standard-of-living replies. Horizontal axis: Lower, Same, Higher. Vertical axis: frequency, 0 to 2500.]

• Pie chart: an alternative type of graphical tool for (relative)
frequency distributions of qualitative data.
– The whole data set is illustrated with a circle;
– The circle is subdivided into slices that represent the
categories;
– The size of each slice is proportional to the corresponding
(relative) frequency.
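
Since the size of each slice is proportional to its frequency, the slice angles follow directly from the counts. A minimal sketch for the replies of Example 5, assuming the full circle is measured as 360 degrees:

```python
# Frequencies of the replies in Example 5.
replies = {"Lower": 1500, "Same": 500, "Higher": 2500}
total = sum(replies.values())

# Angle of each slice in degrees: the category's share of the full circle.
slice_angles = {category: 360 * count / total for category, count in replies.items()}

print(slice_angles)  # {'Lower': 120.0, 'Same': 40.0, 'Higher': 200.0}
```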

(Ex 5) [Figure: pie chart of the standard-of-living replies with slices Lower (1500), Same (500) and Higher (2500).]


Example 6:
The following table shows the frequency distribution of professional
degrees of employees at the Frankfurt headquarters of a
multinational company:

Contingency (or cross-classification) table for degree and gender:

Area          Males   Females
Science       45      25
Business      67      80
Engineering   30      10
Arts          34      15
Other         25      55
Total         201     185

This data set can be illustrated by a clustered column chart, a
stacked column chart or pie charts.
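
Before drawing any of these charts, the totals and within-gender shares can be computed from the table. The sketch below uses the counts of Example 6. Storing each row as a (males, females) pair is an arbitrary choice made for the illustration.

```python
# Counts from the contingency table of Example 6: area -> (males, females).
table = {
    "Science": (45, 25),
    "Business": (67, 80),
    "Engineering": (30, 10),
    "Arts": (34, 15),
    "Other": (25, 55),
}

# Column totals, as shown in the Total row.
male_total = sum(males for males, _ in table.values())
female_total = sum(females for _, females in table.values())

# Row totals: the heights of the stacked columns if degree areas are on the axis.
row_totals = {area: males + females for area, (males, females) in table.items()}

# Shares within each gender: the slice sizes of the two pie charts.
male_share = {area: males / male_total for area, (males, _) in table.items()}
female_share = {area: females / female_total for area, (_, females) in table.items()}

print(male_total, female_total)  # 201 185
for area in table:
    print(f"{area:<12}{male_share[area]:>7.1%}{female_share[area]:>7.1%}")
```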

[Figure, four panels, each with the legend Science, Business, Engineering, Arts, Other:
– Clustered bar chart: males and females, vertical axis 0 to 80;
– Stacked bar chart: males and females, vertical axis 0 to 250;
– Pie chart - Males;
– Exploded pie chart - Females.
Slice labels surviving in the extracted text: 12%, 22%, 17%, 15%, 34%.]