
BASIC CONCEPTS OF STATISTICS

Topic covers:
• What is statistics all about
• The main branches of statistics
• Types of data and measurement scales
• The role of statistics in decision making
By the end of this topic you should be able to:
• Define statistics
• Define and distinguish between descriptive and inferential statistics
• Define and describe the different types of data and measurement scales
• Describe the different sources of data
• Explain the role of statistics in decision making
Introduction

• Definition of statistics
A subject / field which deals with the following aspects of data, for decision-making purposes:
- Collection
- Organisation
- Analysis
- Critical evaluation
- Interpretation
Statistics in management
• Use in decision making – management decision support systems.
• Information – high-quality information (accurate, relevant, timely, adequate, easily accessible) needs to be generated from data.
• Data – more readily available from a variety of sources.
• Statistics – a large number of data values are collected, collated, summarised, analysed and presented in easily readable ways, so that useful and usable information for management decision making is generated. This is the role of statistics in management.
Def. Statistics
A set of mathematically based tools and
techniques to transform raw / unprocessed
data into a few summary measures that
represent useful and useable info to support
effective decision making.
The summary measures are used to describe
profiles / patterns of data, test relationships
between sets of data and identify trends in
data over time.
Important terms / concepts / symbols used in statistics
• A random variable – any attribute of interest on which data is collected and analysed.
• Data – the actual values or outcomes recorded on a random variable.
Examples of random variables and their data:
- the travel distances of delivery vehicles (34 km, 13 km, 20 km)
- the duration of machine downtime (40 mins, 25 mins, 6 mins)
- the brand of coffee preferred (Nescafe, Ricoffy, Frisco)
• Sampling unit – the object being measured, counted or observed with respect to the random variable under study, e.g. consumer, household, company, product.
An employee, for example, can be measured for age, gender, qualifications etc.
• A population – the collection of all possible data values that exist for the random variable under study.
• A population parameter – a measure that describes a characteristic of a population, e.g. population average, population proportion. It uses all the population data to compute its value.
• A sample – a subset of data values drawn from a population. It is used because it is often not possible to record every data value of the population (cost, time, possible item destruction).
• Sample statistic – a measure that describes a characteristic of a sample, e.g. sample average / mean, sample proportion.
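The difference between a population parameter and a sample statistic can be sketched in a few lines of Python. This is only an illustration: the 1,000 delivery distances are randomly generated stand-ins, not real data, and the names `population` and `sample` are chosen here for the example.

```python
import random
import statistics

# Hypothetical population: travel distances (km) of all 1,000 deliveries.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [round(rng.uniform(5, 60), 1) for _ in range(1000)]

# Population parameter: computed from every value in the population.
population_mean = statistics.mean(population)

# Sample statistic: computed from a random subset of the population.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"N = {len(population)}, population mean = {population_mean:.2f} km")
print(f"n = {len(sample)}, sample mean = {sample_mean:.2f} km")
```

The two means will usually be close but not identical, which is why sample statistics are treated as estimates of population parameters.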
Examples of populations and associated samples

Random variable                           Population                     Sampling unit                        Sample
Size of bank overdraft                    All current accounts with CBZ  A CBZ client with a current account  400 randomly selected clients' current accounts
Mode of daily commuter transport to work  All commuters to Harare CBD    A commuter to Harare CBD             600 randomly selected commuters to Harare CBD
TV programme preferences                  All TV viewers in Masvingo     A TV viewer in Masvingo              2000 randomly selected viewers in Masvingo
Symbolic notation for sample and population measures

Statistical measure    Sample statistic    Population parameter
Mean                   x̄                   μ
Standard deviation     s                   σ
Variance               s²                  σ²
Size                   n                   N
Proportion             p                   π
Correlation            r                   ρ
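As a hedged illustration of how the sample and population columns differ, the mean and variance are conventionally defined as follows. The symbol x_i (an individual data value) and the n − 1 divisor in the sample variance are standard conventions assumed here; the slides themselves do not state them. The population standard deviation is conventionally written σ, the square root of σ².

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\\[1ex]
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2
```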
Main Branches of Statistics

• Descriptive statistics / Deductive statistics
• Inferential statistics / Inductive statistics
• Statistical Modelling
Descriptive statistics
Definition: A branch of statistics that involves the collection, presentation and characterisation of a set of data in order to properly describe the features of that data set.
• Emphasis is on data presentation methods.
• Data is grouped or ungrouped, and the frequency with which data values occur is determined and illustrated graphically. The summarised data allow the user to identify profiles, patterns, relationships and trends.
• Characteristics of the data set are also identified by use of measures of central location or of variability.
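A minimal sketch of these descriptive steps, using Python's standard library. The data values are made up for illustration (they extend the coffee-brand and machine-downtime examples used earlier).

```python
from collections import Counter
import statistics

# Hypothetical raw data (illustrative values only).
coffee_preferences = ["Nescafe", "Ricoffy", "Nescafe", "Frisco", "Nescafe", "Ricoffy"]
downtime_minutes = [40, 25, 6, 18, 31]

# Frequency of how often each category occurs.
frequency = Counter(coffee_preferences)

# Measures of central location.
mean_downtime = statistics.mean(downtime_minutes)
median_downtime = statistics.median(downtime_minutes)

# Measure of variability (sample standard deviation).
stdev_downtime = statistics.stdev(downtime_minutes)

for brand, count in frequency.most_common():
    print(f"{brand}: {count}")
print(f"mean = {mean_downtime}, median = {median_downtime}, "
      f"std dev = {stdev_downtime:.2f} minutes")
```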
Inferential statistics
Definition: The process of using sample statistics to draw conclusions about the true / underlying population parameters.
It provides the means by which one is able to draw conclusions about the population using sample data.
Examples of questions answered by inferential statistics:
• Are two variables correlated?
• Do two samples come from populations with the same mean?
Estimation and hypothesis testing: aims
• Estimation – estimating population parameters using sample data, e.g. with a confidence interval.
• Hypothesis testing – checking the validity of a claim made in relation to a parameter or a population, i.e. developing tests for assessing whether a statement is supported by the sample data.
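Both aims can be sketched on one hypothetical sample. The figures below (120 overdrawn accounts among 400 sampled clients, and a claimed population proportion of 25%) are invented for illustration, and the normal approximation is one common choice, not the only possible method.

```python
from statistics import NormalDist

# Hypothetical survey result: 120 of 400 sampled clients are overdrawn.
n = 400
overdrawn = 120
p = overdrawn / n  # sample proportion (the sample statistic)

# Estimation: 95% confidence interval for the population proportion,
# using the normal approximation p +/- z * sqrt(p(1-p)/n).
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * (p * (1 - p) / n) ** 0.5
ci_lower, ci_upper = p - margin, p + margin

# Hypothesis testing: is the claim "25% of all clients are overdrawn"
# consistent with the sample? Two-sided z-test at the 5% level.
claimed = 0.25
z_stat = (p - claimed) / (claimed * (1 - claimed) / n) ** 0.5
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_claim = p_value < 0.05

print(f"sample proportion = {p:.3f}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}, reject claim: {reject_claim}")
```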
• Statistical Modelling
- Builds models of the relationships between related random variables. These models are used to estimate / predict future values, e.g. for forecasting decisions.
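One simple form of statistical model is a straight line fitted by least squares. The sketch below assumes a hypothetical relationship between advertising spend and sales; the data and the helper name `fit_line` are invented for illustration.

```python
# Hypothetical data: monthly advertising spend (x) and sales (y), in $000.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.9, 5.1, 7.0, 9.2, 10.8]


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ad_spend, sales)

# Use the fitted model to predict sales for a spend not yet observed.
forecast = intercept + slope * 6.0
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"forecast for ad_spend = 6.0: {forecast:.2f}")
```

A forecast made outside the range of the observed data, as here, should be treated with caution.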
Why do we study statistics in a business management course?
• To make decisions that affect profitability and continuity.
• Useful in research and development.
• Useful in general management, finance, production,
sales, advertising – all functional areas of business
• Need to interpret information contained in numbers in
order to make intelligent / informed decisions
• Statistics is about what information to collect, how to
collect it and how to operate effectively in modern
business.
Types of data and measurement scales
Data – measurements of one or more variables of a sampling unit drawn from a population.
Understanding the nature of data enables the user to assess data quality and to select the appropriate statistical method to analyse the data. Both affect the validity and reliability of statistical findings.
Data quality is influenced by:
• data type
• data source, and
• methods of data collection
Classification of data
Data come in a variety of types, affecting the
ways they can be measured and the
appropriate statistical methods for analysing
them.
2 types – qualitative and quantitative.
Qualitative data
Definition: data that cannot be measured numerically but falls into one or more non-numerical categories (categorical data, e.g. language, gender, profession).
1. Nominal-scaled data – qualitative data where codes are assigned purely for identification purposes, i.e. the codes are of equal importance, e.g. types of churches: A.F.M, Methodist, R.C.
- Weakest form of data to analyse: only a limited range of statistical methods can be applied.
2. Ordinal data – qualitative data where codes are assigned but are not of equal importance, e.g. t-shirt sizes (S, M, L, XL), company size (small, medium, large) or product usage category.
- Stronger than nominal data.
- Is categorical but has an implied ranking.
- Has the numeric property of order, but the distances between ranks are not equal.
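The practical consequence of the nominal / ordinal distinction can be sketched as follows. The observations and the rank codes in `size_rank` are assumptions made for this example.

```python
from collections import Counter

# Nominal data: categories can only be counted, so the mode is the
# only sensible "typical value".
churches = ["A.F.M", "Methodist", "R.C", "Methodist"]
mode_church = Counter(churches).most_common(1)[0][0]

# Ordinal data: categories have an implied ranking, so they can also
# be sorted and a middle (median) category can be found.
size_rank = {"S": 1, "M": 2, "L": 3, "XL": 4}  # codes express order only
orders = ["L", "S", "M", "XL", "M"]
sorted_orders = sorted(orders, key=size_rank.get)
median_size = sorted_orders[len(sorted_orders) // 2]

print(f"most common church: {mode_church}")
print(f"sizes in rank order: {sorted_orders}, median size: {median_size}")
```

Averaging the rank codes (for example, computing a "mean size" of 2.4) would not be meaningful, because the distances between ranks are not equal.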
Quantitative data

Definition – data that can be measured numerically. Quantitative random variables generate a numeric response, i.e. the variables can be expressed as numerical quantities, e.g.
• Height of a person
• Number of customers
It falls into 2 classes, i.e. discrete and continuous.
Discrete data
Discrete data – takes distinct / specific values on a given scale of measurement, i.e. assumes whole numbers, e.g. number of accidents on the Harare-Mutare road.
Discrete random variable – a random variable whose observations take discrete values.
Random variable – a numerically valued function whose value is determined by a random experiment, e.g.
• number of pages in a book
• number of students that graduated at CUZ in the past 2 years
• number of accidents that occurred on the Harare-Masvingo road last month
Continuous data
Continuous data – the observation can take any value in an interval.
Continuous random variable – a random variable that takes any value in an interval, i.e. whole or fractional, e.g. speed of a car, temperature in a room, height of a person, weight of an adult.
Measurement scales
To choose the appropriate statistical methods for summarising
and analysing data we need to distinguish among different
measurement scales or levels of measurement. All data
measurements are taken on one of four major scales :
nominal, ordinal, interval or ratio.
Definition of measurement: the assignment of numerals to objects or events according to rules. The four measurement scales are:
• nominal scale
• ordinal scale
• interval scale and
• ratio scale
Nominal scale
Definition: the weakest of the 4 scales.
Used when objects / events are distinguished from one another by naming, e.g. race, gender, employment status.
If we assign codes, e.g. 1 for male and 2 for female, these are labels on which we cannot do any arithmetic calculations.
Other examples: blood type, sex of a person, colour of cars.
They categorise data but do not rank them.
We can only say one object / event is different from the other.
Ordinal scale
Definition – a measurement has an ordinal scale if it also tells us when one individual has more or less of the property than another individual.
But it is not possible to tell how much more or less of the characteristic one object has than another.
Examples: size of shoes, size of accounts receivable, rate of typing by a typist, when recorded as ranked categories (e.g. small, medium, large).
Interval scale
The relative order of the numbers is important, and so is the difference between them, e.g. greater than or less than by what amount (order and distance).
Generated mainly from rating scales used to measure e.g. respondents' attitudes, motivations, preferences etc.
• It does not have a true origin.
• It assumes a hypothetical or arbitrary zero value, i.e. zero does not indicate / imply the absence of the attribute being measured, e.g. temperature (in °C or °F), the time at which an event happens, IQ tests.
• N.B. – the time at a moment is interval, but the time taken to do something is ratio.
Ratio scale
Ratio data consists of all real numbers associated with quantitative random variables.
- Ratio data has all the properties of numbers (order, distance and an absolute origin of zero) that allow such data to be manipulated using all arithmetic operations (addition, subtraction, multiplication and division).
- The zero origin property means that ratios can be computed (5 is half of 10, 4 is one-quarter of 16, 36 is twice as great as 18, for example).
Ratio data is the strongest data for statistical analysis.
It has both a unit of measurement and a true zero, where zero is the absence of the attribute being measured. The numbers tell us that one individual has so many times as much of the attribute as another individual, e.g. marks of a student, profits of a company, production output, age of an employee, height of a building, distance travelled.
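Why ratios are meaningful only on a ratio scale can be checked numerically. In this sketch the temperatures and distances are arbitrary illustrative values; the conversion formulas are the standard ones.

```python
KM_PER_MILE = 1.609344


def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32


# Interval scale: temperature has an arbitrary zero, so the ratio
# between two readings changes when the unit changes.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio scale: distance has a true zero, so the ratio between two
# values is the same in any unit.
km_short, km_long = 10.0, 20.0
ratio_km = km_long / km_short
ratio_miles = (km_long / KM_PER_MILE) / (km_short / KM_PER_MILE)

print(f"temperature ratio: {ratio_celsius:.2f} (°C) vs {ratio_fahrenheit:.2f} (°F)")
print(f"distance ratio: {ratio_km:.2f} (km) vs {ratio_miles:.2f} (miles)")
```

So 20 °C is not "twice as hot" as 10 °C, but 20 km is twice as far as 10 km.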

Summary of Classification of Data and Influence on Statistical Analysis (diagram not reproduced here)
Data sources
Data must be accurate and reliable for findings to be valid.
Internal sources – data from within the organisation (Financial, Production, HR, Marketing), e.g. marks of students, sales, salaries, production.
External sources – data sourced outside the organisation, e.g. from private institutions, government bodies, employee associations (central statistics, newspapers etc.)
Primary data
Definition – data that is captured at the point where it is generated, for the first time and with a specific purpose in mind.
Can be internal or external.
Examples: internal – as for internal sources, but also includes personal and market data; external – HR surveys, economic surveys, consumer surveys.
Advantage:
It is of high quality.
Disadvantage:
It can be time consuming and expensive to collect, particularly if sourced using surveys.
Secondary data
Secondary data – data which has been collected for purposes other than the problem at hand.
Exists in a processed format.
Examples – monthly stock report, absenteeism report, exports / employment statistics from CSO.
Advantages
Relatively short access time; generally inexpensive to acquire.
Disadvantages (what are they?)
Data Collection Methods
The method(s) used to collect data can introduce bias into the data and also affect data accuracy. There are 3 main methods.
1. Observation
Primary data is collected by observing a respondent / process in action, e.g. traffic surveys, in-store consumer behaviour, quality control inspections.
State the advantages and disadvantages.
2. Surveys
Surveys gather primary data through the direct questioning of respondents, using questionnaires to structure and record the data collection. They mainly yield attitudinal-type data.
• Examples – personal interviews, telephone surveys or e-surveys (replacing postal surveys).
• Personal interview – a face-to-face contact with a respondent during which a questionnaire is completed.
Advantages
• a higher response rate is generally achieved
• it allows probing for reasons
• the data is current and generally more
accurate
• it allows questioning of a technical nature
• non-verbal responses (body language and
facial expressions) can be observed and noted
• more questions can generally be asked
• the use of aided-recall questions and other
visual prompts is possible
Disadvantages
Time consuming & expensive
Telephone interviews (used especially in snap opinion polls)
Advantages
• It keeps the data current by allowing quicker contact with geographically dispersed (and often highly mobile) respondents (using mobile phone contacts).
• Call-backs can be made if the respondent is not available right away.
• The cost is relatively low.
• People are more willing to talk on the
telephone, from the security of their home.
• Interviewer probing is possible.
• Questions can be clarified by the interviewer.
• The use of aided-recall questions is possible.
• A larger sample of respondents can be
reached in a relatively short time.
Disadvantages
• the loss of respondent anonymity
• the inability of the interviewer to observe non-verbal
(body language) responses
• the need for trained interviewers, which increases costs
• the likelihood of interviewer bias
• the possibility of a prematurely terminated interview (and therefore loss of data) if the respondent puts down the telephone
• the possibility that sampling bias can be introduced into the results if a significant percentage of the target population does not have access to a telephone (i.e. landline or mobile phone).
An e-survey approach uses the technology of e-mails, the internet and mobile phones (e.g. SMS) to conduct surveys and gather respondent data.
It is suitable when the target population from which primary data is required is geographically dispersed and it is not practical to conduct personal interviews.
State the advantages and disadvantages.
3. Experimentation
Primary data can also be obtained by conducting
experiments.
-Analyst manipulates certain variables under
controlled conditions.
- Conscious effort is made to control the
effects of a number of influencing factors.
- Statistical methods called experimental design
models are used to analyse experimental data.
• Examples
• price manipulation of products to monitor
demand elasticity
• testing advertising effectiveness by changing
the frequency and choice of advertising media
• altering machine settings in a supervised
manner to examine the effects on product
quality.
State the advantages and disadvantages.
Topic summary
You are now able to:
• define statistics.
• distinguish between the main branches of statistics.
• define data.
• classify data.
• define and describe the four measurement scales of data.
• explain the various sources of data and their advantages and disadvantages.
Data preparation – data must be relevant, clean and in the correct format.
Relevancy – the random variables must be problem specific (to address the business problem under investigation).
Data cleaning – check for typing errors, out-of-range values and outliers to reduce poor-quality information for decision making.
Data enrichment – to make the data more relevant, more meaningful measures may be needed, e.g. turnover and store size combined to create turnover per square metre.
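The cleaning and enrichment steps can be sketched on a small set of records. The store records, the field names and the validity rule in `is_valid` are all hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical store records (illustrative values only).
stores = [
    {"store": "A", "turnover": 120000.0, "size_m2": 400.0},
    {"store": "B", "turnover": 90000.0, "size_m2": 0.0},    # out-of-range size
    {"store": "C", "turnover": -5000.0, "size_m2": 250.0},  # negative turnover
    {"store": "D", "turnover": 75000.0, "size_m2": 300.0},
]


def is_valid(record):
    """Data cleaning rule (assumed): turnover must be non-negative
    and store size must be positive."""
    return record["turnover"] >= 0 and record["size_m2"] > 0


# Data cleaning: drop records with out-of-range values.
clean = [r for r in stores if is_valid(r)]

# Data enrichment: combine two variables into a more meaningful measure.
enriched = [
    {**r, "turnover_per_m2": r["turnover"] / r["size_m2"]} for r in clean
]

for r in enriched:
    print(f"store {r['store']}: {r['turnover_per_m2']:.2f} per square metre")
```

In practice a suspect record would normally be investigated and corrected where possible rather than simply dropped.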
Role of statistics in decision making
Statistics is the technology of the scientific method, which consists of tools used to facilitate decision making under conditions of uncertainty.
Statistics provides the decision maker with relevant information and provides an estimate of the probability or consequences of making a wrong decision.
