
Makerere University Business School

BUSINESS STATISTICS

amugarura@mubs.ac.ug

Definition: Statistics is the science of planning and collecting data, organizing and classifying it, presenting it, and analyzing and interpreting it in order to draw meaningful and credible conclusions.

Statistics plays an important role in business. A successful businessperson must make quick and accurate decisions, must know what customers want, and must therefore know what to produce and sell, and in what quantities. Statistical methods help a business to plan production according to the tastes of its customers and to monitor the quality of its products. Most business activities are therefore based on statistical information: correct decisions about the location of the business, the marketing of products, financial resources, etc. all depend on statistical tools.

Statistics in the business environment

Statistics helps in:
- Dealing with uncertainty by forecasting seasonal, cyclical and general economic fluctuations
- Making sound decisions by providing accurate estimates or predictions of costs, demand, prices, sales, etc.
- Planning the business on the basis of sound predictions and assumptions
- Measuring variation in the performance of products, employees, business units, etc.
- Comparing two or more products, business units, sales teams, etc.
- Identifying relationships between variables and their effects
- Validating generalizations and theoretical concepts formulated by managers

Basic terms in statistics


1. Population
The set of all possible observations of a given phenomenon; that is, the totality of all things/elements/members/objects under study, or the entire group that you want to draw conclusions about. An inquiry that covers the whole population is referred to as a census.
2. Sample
A set of observations selected from a population; a part, portion or subset of the population. A sample is the specific group that you will collect data from.
3. Parameter
A numerical value that describes a characteristic of a population, e.g. the population mean (µ), population variance (σ²) or population standard deviation (σ).

5. Statistic
A numerical value that describes a characteristic of a sample, e.g. the sample mean (x̄), sample variance (s²) or sample standard deviation (s).

6. Census; total enumeration of the population under study


7. Survey; the collection of data or an inquiry done on a sample selected from a population

8. Data; these are raw facts, ideas or beliefs


9. Information; this is processed data

10. Sampling; the process of obtaining a sample


11. Sampling frame; a list of all sampling units/elements in the population, from which the sample is drawn.
12. Sampling errors; errors that occur because observations are taken from a sample rather than from the whole population.
13. Non-sampling errors; errors other than the sampling error, which can arise from numerous factors at any stage of the research process, from sampling up to report writing.
14. Variable; a characteristic of a unit being observed that may assume more than one of a set of values, to which a numerical measure or a category from a classification can be assigned. Examples of numerical measures are income, age, weight, temperature, number of students in a class, number of children born and kilos/bags of rice harvested; examples of categories are “occupation”, “industry”, “disease”, religion, race, region and education level.
15. Random variable; a variable associated with a random process that assigns a number to each outcome of a random event.

16. Quantitative and Qualitative variables


Variables can be classified as qualitative (categorical) or quantitative (numeric).

Qualitative variables are descriptions: they are described by shape, nature, colour or type, or take on names or labels, and they have no numerical value. The colour of a ball (e.g. red, green, blue) or the breed of a dog (e.g. collie, shepherd, terrier) are examples of qualitative (categorical) variables. Nominal and ordinal variables fall under this umbrella term.

Quantitative variables are variables that can be measured or counted, i.e. that have a numerical value associated with them. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, which is a measurable attribute of the city. Population, number of children, number of cars, age, length and distance are therefore quantitative variables. Discrete and continuous variables fall into this category.

17. Nominal variable: a categorical variable whose categories have no natural order. Examples: gender, race, etc.
18. Ordinal variable: a categorical variable whose categories have a clear order. For example, income levels of low, middle and high could be considered ordinal.
19. Discrete and continuous variables
Quantitative variables can be further classified as discrete (variables whose values are obtained by counting) or continuous (variables whose values are obtained by measuring).

20. Descriptive statistics

Descriptive statistics is the analysis of data that helps to describe, show or summarize data in a meaningful way; put simply, it consists of ways to describe our data. The measures used to describe a data set are measures of central tendency and measures of variability (dispersion).

21. Inferential Statistics


The branch of statistics dealing with the methods used to draw conclusions and make generalizations, predictions and estimates based on data from samples. These methods are used to infer from the sample data what is likely to be true of the population.
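
The difference between a parameter, a statistic and a sampling error (terms 3, 5 and 12 above) can be illustrated with a small sketch. The population below is made up purely for illustration (hypothetical monthly sales for 500 shops); it does not come from these notes.

```python
import random
import statistics

# Hypothetical population: monthly sales (in units) for 500 shops.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(50, 150) for _ in range(500)]

# Parameters describe the whole population.
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)

# Statistics describe a sample drawn from that population.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)            # sample mean
s2 = statistics.variance(sample)           # sample variance (divides by n - 1)

# The gap between the statistic and the parameter is the sampling error.
sampling_error = x_bar - mu

print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

Drawing a different sample (a different seed) would give a different sample mean, while the population mean stays fixed.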

Types of data
By source
Primary data: data collected directly from the field
Secondary data: data from journals, newspapers, textbooks, NGO reports, and reports of government departments and agencies (UBOS, URSB, UNRA, URA, etc.)
By nature
Qualitative data (Categorical)
Categorical data represents characteristics, such as a person’s gender, language, education level, religious affiliation, race, residence, etc.
Quantitative (Numerical) Data
Numerical values that are obtained by measuring or counting, e.g. age, temperature, distance, weight, number of students, etc. Numerical data can be put into two groups:
Discrete Data: data whose values are distinct and separate. This type of data is counted rather than measured. An example is the number of students in a class.
Continuous Data: data that represents measurements; its values are measured rather than counted. Examples are the height of a person, temperature, distance, weight, etc.

Functions of statistics
i. It presents the facts in the form of numerical data.
ii. It helps in prediction or forecasting.
iii. It helps in formulation of policies.
iv. It helps in formulating and testing hypotheses, for example about the correlation between variables.
v. It condenses numerical data.

Application of statistics in business


Statistics can be used in many areas of the business environment, for example:
1. Production. Production depends on demand, which must be predicted accurately using statistical techniques. Decisions on what to produce must be analyzed statistically.
2. Entrepreneurs. To ensure success in any new venture, past records and current market trends must be properly analyzed using statistical techniques.
3. Marketing. Data on population, disposable income, competition, quality of goods and other factors must be analyzed to build a good marketing strategy.
4. Purchasing. Statistics is used to make decisions on the purchasing of raw materials and other inputs; in most organizations a purchasing department is responsible for this.
5. Investment. To invest, you need reliable, well-analyzed facts about production, commodities or whatever investment you intend to make, so that you can rely on them when making decisions.
6. Banking. Most banks have research departments that collect and analyze data to establish facts about the businesses that are directly or indirectly involved with them, and about general economic conditions.
7. Quality control. For quality assurance, statistical techniques must be used to obtain reliable results from the data and to make decisions.
8. Research. This involves collecting, presenting and analyzing data in order to make reliable decisions.

DATA CLASSIFICATION AND COLLECTION
Data can be classified according to its source and its nature.
Sources
a) Primary source; this refers to collecting data directly from the field, such as data collected by population census enumerators, business survey enumerators, etc. The result is primary data.

b) Secondary source; this refers to collecting data from published or unpublished compilations, e.g. journals, newspapers, magazines, production records, textbooks, population secretariats, government departments and agencies, etc. The result is secondary data.

Nature
a) Qualitative data
b) Quantitative data

Levels/scales of measurement
In statistics, there are four data measurement scales: nominal, ordinal, interval and ratio. These
are simply ways to sub-categorize different types of data.

Nominal
Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales could simply be called “labels.” The categories of a nominal scale are mutually exclusive (no overlap) and none of them has any numerical significance. A good way to remember this is that “nominal” sounds a lot like “name”, and nominal scales are like “names” or labels.
Examples of variables measured on a nominal scale include: gender, marital status, race, religion, etc.

Ordinal
With ordinal scales, the order of the values is what is important and significant, but the differences between the values are not really known. For example, when satisfaction is rated on a numbered scale running from “Unhappy” through “OK” and “Happy” to “Very Happy”, we know that a #4 is better than a #3 or #2, but we don’t know, and cannot quantify, how much better it is. Is the difference between “OK” and “Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say.
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.
“Ordinal” is easy to remember because it sounds like “order”, and that is the key to remember with ordinal scales: it is the order that matters, but that is all you really get from these.
Examples; satisfaction, income level, age group, education level etc.

Interval
Interval scales are numeric scales in which we know both the order and the exact differences between the values. An interval scale can be defined as a quantitative measurement scale where values have an order, the difference between adjacent values is equal, and the zero point is arbitrary (it does not mean that the quantity is absent).
The classic example of an interval scale is Celsius temperature, because the difference between each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Other examples are test scores and IQ scores.

Ratio
Ratio scales are the most informative of the data measurement scales because they tell us about the order, they tell us the exact difference between values, and they also have an absolute (meaningful) zero, which allows a wide range of both descriptive and inferential statistics to be applied.
Unlike on an interval scale, a zero on a ratio scale means there is a total absence of the variable
you are measuring. Ratio scales provide a wealth of possibilities when it comes to statistical
analysis. These variables can be meaningfully added, subtracted, multiplied, divided (ratios).
Central tendency can be measured by mode, median, or mean; measures of dispersion, such as
standard deviation and coefficient of variation can also be calculated from ratio scales.
Examples; distance, weight, years of experience, time, etc.

In summary
Nominal scales are used to “name,” or label, a series of values. Ordinal scales provide good information about the order of choices, such as in a customer satisfaction survey. Interval scales give us the order of values plus the ability to quantify the difference between each one.
Finally, ratio scales give us everything: order, interval values, plus the ability to calculate ratios, since a “true zero” can be defined.
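
Why ratios are meaningful on a ratio scale but not on an interval scale can be checked with a little arithmetic. The sketch below is only an illustration: the Kelvin conversion and the example temperatures and weights are not part of these notes and are brought in as assumptions to show a temperature scale with a true zero.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful on the ratio scale.
ratio_c = warm_c / cool_c   # 2.0, but 20 degrees C is not "twice as hot" as 10
ratio_k = warm_k / cool_k   # about 1.035

# Weight is a ratio variable: 0 kg means no weight, so 80 kg is twice 40 kg.
weight_ratio = 80.0 / 40.0

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3), weight_ratio)
```

The Celsius ratio changes if the zero point is moved, which is exactly what "arbitrary zero" means; the weight ratio does not.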

Methods of data collection

There are various methods of data collection. The most popular are observation, interview, questionnaire, telephone survey and Focus Group Discussion.

1. Direct observation
This involves enumerators taking observations directly from the sampling unit of interest, e.g. in agricultural surveys enumerators observe and accurately measure the area under cultivation.
Advantages
a) It is free from errors due to memory lapse as enumerators record everything as they see it
b) Non response errors are never encountered
c) High response rate
d) Good for sensitive studies since it allows explanations
Disadvantages
a) It may be expensive and time consuming
b) Transport and communication are a problem where infrastructure is poor
c) It is not feasible when investigating human behavior
d) People’s attitudes about something cannot be obtained by mere observation

2. Personal interviews/contact
In this method, the enumerator meets the respondent face to face and asks him or her questions about the subject under study.
Advantages

a) On spot clarification is possible due to the interaction between enumerator and
respondent.
b) It is important where the respondent may not be sure of the kind of response required.
c) There is creation of rapport between the interviewer and respondent.
d) Non response errors are not common because of the interaction.
Disadvantages
a) It involves high expenses on transport and other field related exercises.
b) It is prone to interviewer bias in the event of leading or suggestive questions.
c) Memory lapse is common.
d) Time consuming

3. Focus Group Discussion method


Here the researcher sits with a small group of respondents (key informants), who discuss the matter under study.
Advantages.
a) First-hand information is obtained
b) Detailed data/in-depth information obtained
c) May be good for a sensitive study
d) High response rate
Disadvantages
a) Time consuming
b) Expensive as you sponsor the respondents
c) Very tiresome

4. Telephone Survey
The investigator contacts the respondents by telephone and collects the information from them over the phone.
Advantages
a) Most convenient method
b) Less time consumed
c) Good for remote areas
Disadvantages
a) Lower response rate
b) Incorrect or second-hand information can be collected
c) It excludes those without phones
d) More subjective in nature

5. Questionnaire method
A questionnaire is a document containing questions that are either open-ended or pre-coded (structured).

Types
- Self-administered: the researcher takes the questionnaires to the respondents in person, and the respondents fill them in.
- Mail questionnaire: the researcher sends the questionnaires to the respondents by mail (post or e-mail), and the respondents send them back through the same means after answering.
Advantages
i) Speed and cost reduction as it does not involve the movement of people
ii) It is very effective where sample units are scattered
iii) It’s possible to get correct answers to sensitive questions since they can be filled privately
iv) Errors due to interview bias are reduced
v) Correct information can be got since consultations can be made
Disadvantages
i) Assumes a high level of literacy among respondents, which is not always the case.
ii) Assumes a good and efficient postal system
iii) Follow ups are very difficult to conduct
iv) There is a high rate of non-responses
v) Response is usually slow
vi) A wrong person may fill the questionnaires thereby biasing the results
vii) Consultations may bias the results

Principles of Good Research

- Problem statement; there is a clear statement of the research aims, which defines the research question.
- There is an information sheet for participants, which sets out clearly what the research is about, what it will involve, and how consent is obtained.
- Clear methodology; the methodology is appropriate to the research question. So, if the research is into people’s perceptions, a more qualitative, unstructured interview may be appropriate.
- Unbiased research; the research should be carried out in an unbiased fashion. As far as possible the researcher should not influence the results of the research in any way. If such influence is likely, it needs to be addressed explicitly and systematically.
- Resources; from the beginning, the research should have appropriate and sufficient resources in terms of people, time, transport, money, etc. allocated to it.
- Trained interviewers/enumerators; the people conducting the research should be trained in research and research methods, and this training should provide:
  • Knowledge of appropriate information-gathering techniques
  • An understanding of research issues
  • An understanding of the research area
  • An understanding of the issues around dealing with vulnerable social care clients and housing clients, especially regarding risk, privacy and sensitivity and the possible need for support
- Those involved in designing, conducting, analyzing and supervising the research should have a full understanding of the subject area.
- Justification of research; if applicable, the information generated from the research will inform the policy-making process.
- Ethical considerations; all research should be ethical and not harmful in any way to the participants.
- Dissemination of results; after the report is written, the findings should be shared for public use rather than withheld. This helps others to realize the importance of the study.

Sampling
Sampling; this is the process or procedure of selecting a sample from the population. The sample is used to make estimates that represent the population.
A sample; this is a subset/portion/part of the population to be used in the study.

Reasons for sampling (why use a sample rather than a census of the whole population?)

- Resources available
- Time available
- Accurate data
- Quality
- Accessibility, e.g. where there is a destructive environment, etc.

Sampling techniques
These are grouped into two categories: probability and non-probability sampling techniques.
1. Probability sampling techniques.
Here selection into the sample depends on chance. Because selection is random, scientific measures such as the mean and standard deviation can be determined from the sample. These techniques are usually used when the size of the population is known. They include:
-Simple Random Sampling (SRS). This is a method in which each unit in the population has an equal chance of being selected into the sample. It is important to have a sampling frame before sampling. Its advantages are that it is easy to carry out, convenient and cheap; however, it is not very accurate.

Example
Follow these steps to draw a simple random sample of 100 employees out of 500.
i. Make a list of all the employees working in the organization. (Since there are 500 employees in the organization, the record must contain 500 names.)
ii. Assign a sequential number to each employee (1, 2, 3, ..., n). This is your sampling frame (the list from which you draw your sample).
iii. Decide what your sample size is going to be. (In this case, the sample size is 100.)
iv. Use a random number generator to select the sample. For example, if your sample size is 100 and your population is 500, generate 100 distinct random numbers between 1 and 500.
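
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: employees are represented by their sequential numbers (no real names are used), and the seed is fixed purely so that the run can be repeated.

```python
import random

# Steps i-ii: the sampling frame, here hypothetical employee numbers 1..500.
sampling_frame = list(range(1, 501))

# Step iii: the sample size.
sample_size = 100

# Step iv: draw without replacement, so no employee is selected twice.
rng = random.Random(2024)  # seed fixed only to make the run repeatable
selected = sorted(rng.sample(sampling_frame, sample_size))

print(selected[:10])  # first ten selected employee numbers
```

Drawing without replacement matters: generating 100 independent random numbers between 1 and 500 would usually produce some repeats, leaving fewer than 100 distinct employees in the sample.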

-Stratified random sampling. Here the population is divided, using a variable of interest, into a number of groups called strata, which are homogeneous within, and simple random sampling is used to select a few units from each stratum. However, this method needs highly qualified personnel, and it is costly and time consuming.
- Cluster sampling. The population is sub-divided, using administrative boundaries, into groups called clusters, which are heterogeneous within. The boundaries of the clusters must be well known, and simple random sampling is used to select a few clusters into the sample. It is very cheap but not very accurate.
- Systematic sampling. The units of the population are selected in a systematic manner. In this method, the first unit of the sample is obtained using simple random sampling and the rest of the units are obtained automatically (systematically) with the help of a pre-determined pattern. It is considered the most expensive method but the most accurate, because it requires a list of all the units in the population before sample selection.
-Multistage sampling. This is sampling in two or more stages using any of the methods mentioned above. E.g. with Uganda as the population, you may sample a few districts and then, from each district, select a few counties to form the sample.
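
Systematic and stratified sampling, as described in the list above, can be sketched as follows. The sampling frame is hypothetical: 500 employee numbers, each tagged with a made-up department that plays the role of the stratum. The function names are illustrative and not taken from any library.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable illustration

# Hypothetical sampling frame: 500 employees, each tagged with a department.
departments = ["Sales", "Production", "Finance", "Admin"]
frame = [(emp_id, departments[emp_id % 4]) for emp_id in range(1, 501)]

def systematic_sample(units, n):
    """Pick a random start, then take every k-th unit (k = N // n)."""
    k = len(units) // n
    start = rng.randrange(k)          # random start within the first interval
    return units[start::k][:n]

def stratified_sample(units, per_stratum):
    """Group units by stratum, then draw a simple random sample in each."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[1], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, per_stratum))
    return chosen

sys_sample = systematic_sample(frame, 100)   # every 5th employee
strat_sample = stratified_sample(frame, 25)  # 25 from each department
```

In the systematic sample only the starting point is random; in the stratified sample every department is guaranteed to be represented, which simple random sampling alone cannot promise.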

2. Non-probability sampling techniques


This is where the size of the population is not known and selection does not depend on chance.
Convenience Sampling; probably the most common of all sampling techniques. It is a non-probability sampling technique in which samples are selected from the population only because they are conveniently available or accessible to the researcher, i.e. easy to recruit into the sample. Many researchers rely on convenience sampling because of its speed, cost-effectiveness and the ease of availability of the sample.

Judgmental Sampling; more commonly known as purposive sampling. In this type of sampling, subjects are chosen to be part of the sample with a specific purpose in mind. The researcher believes that some subjects are better suited to the research than others, and chooses only those people deemed fit to participate in the study.

Snowball Sampling; usually done when the respondents of interest are rare, scarce or hard to find in the population. The researcher identifies a first respondent with all the characteristics of interest, who then leads to others, and so on. In this way, when the population is hard to access, participants are recruited via other participants.

Difference between non-probability sampling and probability sampling:

Non-probability sampling                                      Probability sampling

Sample selection is based on the subjective judgment          The sample is selected at random.
of the researcher.
Not everyone has an equal chance to participate.              Everyone in the population has an equal chance of getting selected.
The researcher does not consider sampling bias.               Used when sampling bias has to be reduced.
Useful when the population has similar traits.                Useful when the population is diverse.
The sample does not accurately represent the population.      Used to create an accurate sample.
Finding respondents is easy.                                  Finding the right respondents is not easy.

DATA PRESENTATION
(How do we present data?)
There are basically four ways of data presentation:
i) Text presentation (flow charts)
ii) Tabular presentation
iii) Graphical presentation (Histogram, Freq. polygon, Cum. Freq. curve)
iv) Diagrammatic presentation (Bar-chart, pie-charts)

a) Tabular presentation

Tabular method includes frequency distribution tables and relative frequency distributions.
Data can be presented in various forms depending on the type of data collected. A frequency
distribution is a table showing how often each value (or set of values) of the variable in question
occurs in a data set. Frequencies can also be presented as relative frequency tables, that is, the
percentage of the total number in the sample.

Example
The following are marks scored by students of BBC in a coursework test.
78, 58, 48, 83, 70, 66, 55, 45, 53, 24,
34, 20, 49, 58, 68, 77, 66, 55, 43, 47,
44, 38, 58, 89, 57, 56, 44, 34, 38, 88,
66, 54, 59, 87, 76, 66, 63, 61, 62, 56,
53, 45, 75, 64, 33, 68, 87, 56, 78, 88
o Construct a frequency distribution table starting with the class intervals 20-30, 30-40, ..., and hence a relative frequency table.

Solution
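There are two types of frequency distribution tables: exclusive frequency distributions and inclusive frequency distributions. In an exclusive distribution the upper limit of each class is excluded from that class (e.g. 20-30, 30-40); in an inclusive distribution both limits belong to the class (e.g. 10-19, 20-29).

Frequency distribution table; this is a table showing each value (or set of values) with its corresponding frequency of occurrence in the data set.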

Class    Tallies            Frequency (f)   Relative frequency (r.f), %   Cumulative frequency (c.f)
20-30    //                 2               (2/50)*100 = 4                2
30-40    /////              5               (5/50)*100 = 10               7
40-50    ///// ///          8               (8/50)*100 = 16               15
50-60    ///// ///// ///    13              (13/50)*100 = 26              28
60-70    ///// /////        10              (10/50)*100 = 20              38
70-80    ///// /            6               (6/50)*100 = 12               44
80-90    ///// /            6               (6/50)*100 = 12               50
Total                       50              100
The mark 70 is counted in the class 70-80, because in an exclusive distribution the upper limit of each class is excluded.
Note;
Read about these terms as used in statistics: modal class, median class, class limits, class boundaries, mid-mark (x), class interval, class width.
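
Tallying fifty marks by hand is error-prone, so it can help to check the counts mechanically. The sketch below builds the frequency, relative frequency and cumulative frequency columns for the marks in the example, assuming the exclusive convention (lower limit included, upper limit excluded), so that a mark of exactly 70 is counted in the class 70-80.

```python
marks = [
    78, 58, 48, 83, 70, 66, 55, 45, 53, 24,
    34, 20, 49, 58, 68, 77, 66, 55, 43, 47,
    44, 38, 58, 89, 57, 56, 44, 34, 38, 88,
    66, 54, 59, 87, 76, 66, 63, 61, 62, 56,
    53, 45, 75, 64, 33, 68, 87, 56, 78, 88,
]

lower_limits = range(20, 90, 10)  # classes 20-30, 30-40, ..., 80-90
width = 10
n = len(marks)

table = []
cumulative = 0
for lower in lower_limits:
    upper = lower + width
    # Exclusive convention (assumed): lower <= mark < upper.
    f = sum(1 for m in marks if lower <= m < upper)
    cumulative += f
    table.append((f"{lower}-{upper}", f, 100 * f / n, cumulative))

print(f"{'Class':<8}{'f':>4}{'r.f (%)':>10}{'c.f':>6}")
for label, f, rf, cf in table:
    print(f"{label:<8}{f:>4}{rf:>10.0f}{cf:>6}")
```

Changing the condition to `lower <= m <= upper` with limits such as 10-19, 20-29 would give the inclusive version of the table.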

b) Graphical method: Frequency distributions are usually illustrated graphically by plotting various types of graphs:

i. Histogram - A histogram is a way of summarizing data that are measured on an interval scale (either discrete or continuous). It is a graph showing frequency on the y-axis against class limits (classes)/class boundaries on the x-axis.

Example: The table below shows the marks scored by students of BCOM in the CW I test. Represent the data on a histogram.

Class limits    Frequency (f)
20-24           2
25-29           10
30-34           25
35-39           70
40-44           140
45-49           50
50+             5

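Solution

[Figure: histogram of the marks above, with frequency on the y-axis and class boundaries on the x-axis]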

ii. Frequency polygon; a graph with frequency on the y-axis and class mid-points on the x-axis. It can also be obtained from the histogram.
iii. Cumulative frequency curve (ogive); a graph showing cumulative frequencies on the y-axis against class boundaries on the x-axis.
iv. Line graph - A line graph is particularly useful when we want to show the trend of a variable over time. Time is displayed on the horizontal axis (x-axis) and the variable is displayed on the vertical axis (y-axis).
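
Before any of these graphs can be drawn, the plotting values have to be worked out from the class limits. The sketch below does this for the BCOM marks table in the histogram example: class boundaries for the histogram and ogive, mid-points for the frequency polygon, and cumulative frequencies for the ogive. It assumes whole-number marks, so boundaries lie 0.5 either side of the limits, and it leaves out the open-ended class 50+ because that class has no stated upper limit.

```python
# Class limits and frequencies from the example table; the open-ended
# class "50+" is left out because it has no upper limit.
classes = [(20, 24, 2), (25, 29, 10), (30, 34, 25),
           (35, 39, 70), (40, 44, 140), (45, 49, 50)]

rows = []
running_total = 0
for lower, upper, f in classes:
    lower_boundary = lower - 0.5      # halfway to the previous class limit
    upper_boundary = upper + 0.5      # halfway to the next class limit
    midpoint = (lower + upper) / 2    # x-value for the frequency polygon
    running_total += f                # y-value for the ogive
    rows.append((lower_boundary, upper_boundary, midpoint, f, running_total))

for lb, ub, mid, f, cf in rows:
    print(f"{lb:>5} - {ub:<5} mid={mid:<5} f={f:<4} c.f={cf}")
```

How the 50+ class is closed (for example, by giving it the same width as the others) is a judgment the analyst has to make and state before plotting.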

c) Diagrammatic methods; these are simple diagrams such as bar charts, pie charts and pictograms. Bar graphs include simple bar graphs, compound (combined) bar graphs and multiple bar graphs.

i. Bar graph - A bar graph is a way of summarizing a set of categorical data. It displays the
data using a number of rectangles, of the same width, each of which represents a
particular category. Bar graphs can be displayed horizontally or vertically and they
are usually drawn with a gap between the bars (rectangles).

Simple bar graph

Country     Exports (1000 $)
Uganda      45
Rwanda      60
Tanzania    75
Kenya       70
Burundi     40

A bar-graph showing total exports per country

Compound bar graph

Given the data below in the table, use it to construct/draw a compound bar graph

Country     Exports (1000 $)   GDP (1000 $)   Imports (1000 $)
Uganda      45                 20             60
Rwanda      60                 15             70
Tanzania    75                 35             40
Kenya       70                 40             35
Burundi     40                 15             65

Solution

Country     Exports (1000 $)   GDP (1000 $)   Imports (1000 $)   Total
Uganda      45                 20             60                 125
Rwanda      60                 15             70                 145
Tanzania    75                 35             40                 150
Kenya       70                 40             35                 145
Burundi     40                 15             65                 120

Figure showing a compound bar graph of exports, imports and GDP per country
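
In a compound bar graph each bar is built by stacking its components, so the quantity actually needed for drawing is where each segment starts and ends. The sketch below works this out from the table in the example; the stacking order (exports, then GDP, then imports) is an assumption made for illustration.

```python
# (exports, GDP, imports) in thousands of dollars, from the table above.
data = {
    "Uganda":   (45, 20, 60),
    "Rwanda":   (60, 15, 70),
    "Tanzania": (75, 35, 40),
    "Kenya":    (70, 40, 35),
    "Burundi":  (40, 15, 65),
}

stacks = {}
for country, values in data.items():
    base = 0
    segments = []
    for value in values:
        # Each segment starts where the previous one ended.
        segments.append((base, base + value))
        base += value
    stacks[country] = segments  # the last segment ends at the bar's total height

for country, segments in stacks.items():
    print(f"{country:<9} segments={segments} total={segments[-1][1]}")
```

The end of the last segment is the total height of the bar, which is why the solution table adds a Total column.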

Multiple bar graph

ii. Pie chart - A pie chart is used to display a set of categorical data. It is a circle, which is
divided into segments. Each segment represents a particular category. The area of
each segment is proportional to the number of cases in that category.

Example. The table below shows the results of an opinion poll conducted among 2500 students at the MUBS campus on their views and opinions about the presidential candidates in Uganda.

Candidate    No. of voters (yes)
Mayambala    98
Besigye      1041
Katumba      112
Karembe      201
Mao          741
Not sure     44

Solution

Candidate    No. of voters (yes)    Calculation           Percent (%)
Mayambala    98                     (98/2237)*100         4
Besigye      1041                   (1041/2237)*100       47
Katumba      112                    (112/2237)*100        5
Karembe      201                    (201/2237)*100        9
Mao          741                    (741/2237)*100        33
Not sure     44                     (44/2237)*100         2
Total        2237                                         100

A pie chart showing the opinion poll results from the MUBS students
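
To draw the pie chart by hand, each percentage also has to be turned into an angle out of 360 degrees. The sketch below computes both for the poll results in the example; rounding the angles to one decimal place is an arbitrary choice made here for readability.

```python
votes = {
    "Mayambala": 98, "Besigye": 1041, "Katumba": 112,
    "Karembe": 201, "Mao": 741, "not sure": 44,
}

total = sum(votes.values())  # 2237 respondents

slices = {}
for name, count in votes.items():
    share = count / total
    slices[name] = (round(100 * share), round(360 * share, 1))  # (%, degrees)

for name, (percent, angle) in slices.items():
    print(f"{name:<10} {percent:>3}%  {angle:>6} degrees")
```

Because each angle is rounded separately, the angles may add up to slightly more or less than 360 degrees; the small difference is usually absorbed into the largest segment when drawing.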

EXERCISE

a) Construct an inclusive frequency distribution table for the data below using the class intervals 10-19, 20-29, ...
b) Represent the data in (a) above on a:

i. Histogram
ii. Frequency polygon
iii. Cumulative frequency curve

78, 58, 48, 13, 70, 66, 55, 45, 53, 24,
34, 20, 49, 58, 68, 77, 66, 55, 43, 17,
44, 38, 58, 89, 57, 56, 44, 34, 38, 88,
66, 54, 59, 87, 76, 26, 63, 61, 62, 56,
53, 45, 75, 64, 33, 68, 87, 56, 78, 18
