Advanced Business Research Methods (MBA 522)

Chapter Six
Processing, Analyzing and Interpretation of Data
By: Andinet A. (Assist. Prof.)

Chapter objectives
• Identify the main issues that you need to
consider when preparing data for analysis;

• Learn how to organize and analyze data using various statistical tools and techniques.
5.1. Introduction
Flow diagram of data analysis and interpretation:
Data collection → Data processing → Data analysis → Interpretation of results → Discussion
• Data processing (getting data ready for analysis): editing data, handling blank spaces, coding data, categorizing data, creating a data file (programming)
• Data analysis: feel for data (mean, standard deviations, frequency distributions, correlations), goodness of data, hypotheses testing
• Interpretation of results and discussion: research question answered
5.2. Data Processing
Editing data is the process of checking and adjusting the data for omissions, legibility, and consistency.
– It detects errors and omissions, corrects them when possible, and certifies that minimum data quality standards are achieved.
– The purpose of editing is to ensure that data are:
1. accurate,
2. consistent with other information,
3. uniformly entered,
4. complete, and
5. arranged to simplify coding and tabulation.
Data processing cont’d ...

Editing is especially important for responses to open-ended questions in interviews and questionnaires, and for unstructured observations.
– It should be done on the same day that the data are collected, so that respondents can be contacted for further information and clarification as needed.
– Edited data should be marked with a different color of pencil or ink so that the original information is still available in case of later doubts.
– Whenever possible, follow up with the respondent to obtain the correct data while editing.
– If editing is not done properly, the validity and reliability of the study may be impaired.
Data processing-cont’d
Handling blank responses:
• Not all respondents answer every item in the questionnaire.
• Answers may have been left blank because the respondent:
– did not understand the question,
– did not know the answer,
– was unwilling to answer, or
– was simply indifferent to responding to the entire questionnaire.
• If a substantial number of questions (say, 25 percent of the items in the questionnaire) have been left unanswered, it may be advisable to discard the questionnaire and exclude it from the data set for analysis.
– The final report should state the number of returned questionnaires that were not used because of excessive missing data.
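The screening rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure. It assumes that a blank item is stored as `None` and treats "25 percent or more blank" as the cut-off. The questionnaires and the function name `split_by_missingness` are hypothetical.

```python
# Sketch: discard questionnaires whose share of blank items reaches a cut-off.
# Assumptions: None marks a blank item; the 25% cut-off is the text's example.
MISSING_THRESHOLD = 0.25


def split_by_missingness(questionnaires, threshold=MISSING_THRESHOLD):
    """Return (usable, discarded) lists based on the share of blank items."""
    usable, discarded = [], []
    for responses in questionnaires:
        blank_share = sum(r is None for r in responses) / len(responses)
        if blank_share >= threshold:
            discarded.append(responses)
        else:
            usable.append(responses)
    return usable, discarded


# Hypothetical returned questionnaires with 8 items each.
returned = [
    [4, 5, 3, 4, 4, 5, 2, 3],                  # no blanks
    [4, None, 3, 4, 4, 5, 2, 3],               # 1 of 8 blank (12.5%)
    [None, None, 3, None, 4, None, None, 3],   # 5 of 8 blank (62.5%)
]
usable, discarded = split_by_missingness(returned)
# The discarded count is the figure the final report should mention.
print(f"Returned: {len(returned)}, used: {len(usable)}, "
      f"unused due to missing data: {len(discarded)}")
```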
Data processing-cont’d
• If, however, only two or three items are left blank in a questionnaire with, say, 30 or more items, a decision must be made about how these blank responses are to be handled:
– Assign the midpoint of the scale for an interval-scaled item.
– Ignore the blank responses when the analyses are done. This, of course, reduces the sample size whenever that variable is involved; it is the best way of handling missing items when the sample size is large.
– Assign to the item the mean value of the responses of all those who have responded to that particular item.
• Treat ‘don’t know’ responses as missing items.
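The three options for a few blank items can be compared side by side. The sketch below is illustrative only. It assumes a 7-point interval-scaled item, with `None` marking a blank, and the response values are hypothetical.

```python
from statistics import mean

# Hypothetical responses to one 7-point interval-scaled item; None = blank.
item = [5, 6, None, 4, 7, None, 5, 3]
SCALE_MIN, SCALE_MAX = 1, 7


def impute_midpoint(values, lo=SCALE_MIN, hi=SCALE_MAX):
    """Option 1: replace blanks with the midpoint of the scale."""
    mid = (lo + hi) / 2
    return [mid if v is None else v for v in values]


def drop_blanks(values):
    """Option 2: ignore blanks (the effective sample size shrinks)."""
    return [v for v in values if v is not None]


def impute_item_mean(values):
    """Option 3: replace blanks with the mean of those who answered."""
    m = mean(drop_blanks(values))
    return [m if v is None else v for v in values]


print(impute_midpoint(item))   # blanks become 4.0
print(len(drop_blanks(item)))  # 6 of the 8 cases remain
print(impute_item_mean(item))  # blanks become 5, the mean of 5, 6, 4, 7, 5, 3
```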
Data processing- Cont’d
• The next step is to code the responses using numerical codes (coding can be planned when designing the questionnaire or done after data collection).
• Coding: the process of identifying and classifying each answer with a numerical score or other character symbol.
• The responses to demographic variables can be coded as follows:
1. Age (years): [1] under 25, [2] 25–35, [3] 36–45, [4] 46–55, [5] over 55
2. Education: [1] high school, [2] diploma, [3] bachelor’s degree, [4] master’s degree, [5] doctoral degree, [6] other (specify)
3. Job level: [1] manager, [2] supervisor, [3] clerk, [4] secretary, [5] technician, [6] other (specify)
4. Sex: [1] M, [2] F
5. Work shift: [1] first shift, [2] second, [3] third
6. Employment status: [1] part time, [2] full time
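A coding scheme like the demographic one above can be kept as a set of lookup tables, so that every raw answer is converted the same way. The sketch below covers only part of the scheme: sex, work shift, employment status and age. The function names and the input format are hypothetical. Ages are assumed to be whole years.

```python
# Sketch: part of the demographic coding scheme as lookup tables.
CODEBOOK = {
    "sex": {"M": 1, "F": 2},
    "shift": {"first": 1, "second": 2, "third": 3},
    "employment": {"part time": 1, "full time": 2},
}


def age_code(age_years):
    """Map an age in whole years to the codes [1] to [5]."""
    if age_years < 25:
        return 1
    if age_years <= 35:
        return 2
    if age_years <= 45:
        return 3
    if age_years <= 55:
        return 4
    return 5


def code_respondent(raw):
    """Turn one respondent's raw answers into numeric codes."""
    coded = {"age": age_code(raw["age"])}
    for variable, codes in CODEBOOK.items():
        coded[variable] = codes[raw[variable]]
    return coded


print(code_respondent({"age": 30, "sex": "F", "shift": "second",
                       "employment": "full time"}))
# {'age': 2, 'sex': 2, 'shift': 2, 'employment': 2}
```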
Data processing- Cont’d
Example: The following questions are used to measure the involvement and satisfaction variables.
• To what extent would you agree with the following statements, on a scale of 1 to 7, where 1 denotes very low agreement and 7 denotes very high agreement?
1 2 3 4 5 6 7
6. The major happiness of my life comes from my job
7. Time at work flies by quickly
8. I live, eat, and breathe my job
9. My work is fascinating
10. My work gives me a sense of accomplishment
11. My supervisor praises good work
12. The opportunities for advancement are very good here
13. My coworkers are very stimulating
14. People can live comfortably with their pay in this organization
15. I get a lot of cooperation at the workplace
16. My supervisor is not very capable
17. Most things in life are more important than work
18. Working here is a drag
19. The promotion policies here are very unfair
20. My pay is barely adequate to take care of my expenses
21. My work is not the most important part of my life
Data processing- Cont’d
• The purpose of coding responses to open-ended questions is to reduce the large number of individual responses to a few general categories of answers that can be assigned numerical scores.

Note: The usual reason for using open-ended questions is that the researcher has no clear hypothesis regarding the answers.

Categorization:
• It involves categorizing the variables so that the several items measuring a concept are all grouped together.
• Responses to negatively worded questions have to be reversed so that all answers point in the same direction.
– This can be done on the computer, for example with the RECODE command in SPSS.
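Reversing a negatively worded item on a 1-to-7 scale replaces each score x with 8 − x, so 1 becomes 7, 2 becomes 6, and 4 stays 4. The sketch below does the same job as a RECODE in plain Python. It assumes that items 16–21 of the involvement/satisfaction example are the negatively worded ones, which is how they read. Adjust `NEGATIVE_ITEMS` for any other instrument.

```python
# Assumption: items 16-21 are the negatively worded statements in the
# 7-point example above; adjust NEGATIVE_ITEMS for other instruments.
SCALE_MIN, SCALE_MAX = 1, 7
NEGATIVE_ITEMS = {16, 17, 18, 19, 20, 21}


def reverse_code(score, lo=SCALE_MIN, hi=SCALE_MAX):
    """Reverse a score: on a 1-7 scale, 1<->7, 2<->6, 3<->5 and 4 stays 4."""
    return lo + hi - score


def recode_negatives(responses, negative_items=NEGATIVE_ITEMS):
    """Return a copy of {item number: score} with negative items reversed."""
    return {item: reverse_code(score) if item in negative_items else score
            for item, score in responses.items()}


print(recode_negatives({9: 6, 18: 2, 21: 4}))  # {9: 6, 18: 6, 21: 4}
```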
Data processing- Cont’d

Entering data:
• Raw data can be entered (manually, or by scanning machine-readable answer sheets) into a software program, for instance the SPSS Data Editor:
– It looks like a spreadsheet.
– You can enter, edit, and view the contents of the data file.
– Each row of the editor represents a case, and each column represents a variable.
– System-missing values appear as a period (dot) in the cell.

Data cleaning: check for wrongly coded variables, i.e., make sure that all codes are legitimate. For example, if sex is coded 1 = male and 2 = female and a code of 3 is found, a mistake has obviously been made and must be corrected.
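The legitimate-code check can be automated once the allowed codes for each variable are written down. The sketch below is a minimal example under stated assumptions. It uses a hypothetical two-variable codebook and three made-up cases, and it treats `None` as missing rather than as an error.

```python
# Legitimate codes per variable (hypothetical subset of a codebook).
VALID_CODES = {
    "sex": {1, 2},          # 1 = male, 2 = female
    "shift": {1, 2, 3},     # first, second, third shift
}


def find_illegitimate_codes(cases, valid=VALID_CODES):
    """Yield (case number, variable, value) for every out-of-range code.

    Missing values (None) are skipped; they are handled separately.
    """
    for case_no, case in enumerate(cases, start=1):
        for variable, allowed in valid.items():
            value = case.get(variable)
            if value is not None and value not in allowed:
                yield case_no, variable, value


data = [{"sex": 1, "shift": 2}, {"sex": 3, "shift": 1}, {"sex": 2, "shift": None}]
print(list(find_illegitimate_codes(data)))  # [(2, 'sex', 3)]
```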
Data processing- Cont’d
Data summary code sheet
Respondent | Part I: Sex | Living together | Fulfilling needs | Part II: items 1–15 | Total
1 | 0 | 0 | 1 | 4 5 5 5 4 4 4 5 4 4 3 4 5 4 3 | 63
2 | 1 | 1 | 2 | 2 2 3 2 1 1 2 2 2 2 3 2 2 2 3 | 31
…
228 | …
229 | …
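For respondents 1 and 2, the Total column equals the sum of the 15 Part II item scores. That reading is inferred from the figures rather than stated on the sheet. A small script can recompute such totals and flag rows with the wrong number of items:

```python
# Part II item scores (items 1-15) for respondents 1 and 2, from the code sheet.
part_two = {
    1: [4, 5, 5, 5, 4, 4, 4, 5, 4, 4, 3, 4, 5, 4, 3],
    2: [2, 2, 3, 2, 1, 1, 2, 2, 2, 2, 3, 2, 2, 2, 3],
}

# Each row should hold exactly 15 item scores before totals are computed.
for respondent, scores in part_two.items():
    if len(scores) != 15:
        raise ValueError(f"respondent {respondent}: expected 15 items")

totals = {respondent: sum(scores) for respondent, scores in part_two.items()}
print(totals)  # {1: 63, 2: 31}
```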
5.3. Data Presentation
• Data in raw form are usually not easy to use for decision making.
• Some type of organization is needed:
– Table
– Graph
• Data presentation: the process of transforming a mass of raw data into tables and charts as part of making sense of the data.
• It refers to preparing data in a form that a general audience can use.

Tables:
– They can be used with just about all types of numerical data.

Graphical:
• The type of graph to use depends on the variable being summarized.
Data presentation:
The Frequency Distribution Table
Summarize data by category

Example: Hospital Patients by Unit

Hospital Unit | Number of Patients
Cardiac Care | 1,052
Emergency | 2,245
Intensive Care | 340
Maternity | 552
Surgery | 4,630

(Variables are categorical)
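A frequency distribution table for a categorical variable is simply a count per category. The sketch below builds one from raw records with `collections.Counter`. The eight patient records are hypothetical and much smaller than the hospital data.

```python
from collections import Counter

# Hypothetical raw records: one entry per patient, naming the hospital unit.
patients = ["Surgery", "Emergency", "Surgery", "Maternity", "Cardiac Care",
            "Surgery", "Emergency", "Intensive Care"]

freq = Counter(patients)  # category -> number of patients
for unit in sorted(freq):
    print(f"{unit:<15}{freq[unit]:>5}")
print(f"{'Total':<15}{sum(freq.values()):>5}")
```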
Graphical Presentation of Data
On the basis of types of variables:
• Categorical variables: frequency distribution, bar chart, pie chart
• Numerical variables: line chart, frequency distribution, histogram, scatter plot
Tables and Graphs for Categorical Variables
Categorical data:
• Tabulating data: frequency distribution table
• Graphing data: bar chart, pie chart
Data presentation-Cont’d
• Diagrammatic representation of data (bar charts, pie charts, histograms, line graphs, frequency polygons)
• Bar charts:
– A bar chart is a graph that shows the frequency distribution of a variable.
– It can be used with nominal and with discrete data.
– Bars should be of equal width, and the height of each bar is proportional to the frequency (or amount) for each separate category.
– For each category, a vertical bar is drawn.
– There is a gap between bars.

Types of bar charts: simple bar chart, multiple bar chart, component bar chart
– A simple bar chart shows the total of each category.
– A multiple bar chart is used when you are interested in changes in the components but the totals are of no interest.
– A component bar chart helps you compare totals and see how each total is made up of its components.
Simple Bar Chart Example
Hospital Unit | Number of Patients
Cardiac Care | 1,052
Emergency | 2,245
Intensive Care | 340
Maternity | 552
Surgery | 4,630

[Figure: simple bar chart "Hospital Patients by Unit"; x-axis: hospital unit (Cardiac Care, Emergency, Intensive Care, Maternity, Surgery); y-axis: number of patients per year (0–5,000)]
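The rule that bar height is proportional to frequency can be shown without any plotting library. The sketch below draws a plain-text bar chart of the hospital data. Bars run horizontally here for convenience. The scaling, in which the largest category gets 40 characters, is an arbitrary choice, and `text_bar_chart` is a hypothetical helper, not a library function.

```python
# Sketch: a plain-text (horizontal) bar chart; bar length is proportional to frequency.
units = {"Cardiac Care": 1052, "Emergency": 2245, "Intensive Care": 340,
         "Maternity": 552, "Surgery": 4630}


def text_bar_chart(freqs, width=40):
    """Return one line per category; the largest category gets `width` characters."""
    largest = max(freqs.values())
    lines = []
    for category, count in freqs.items():
        bar = "#" * round(count / largest * width)
        lines.append(f"{category:<15}|{bar} {count:,}")
    return lines


print("\n".join(text_bar_chart(units)))
```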
A multiple bar Chart Example
• Sales by quarter for three sales territories:
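Territory | 1st Qtr | 2nd Qtr | 3rd Qtr | 4th Qtr
East | 20.4 | 27.4 | 59 | 20.4
West | 30.6 | 38.6 | 34.6 | 31.6
North | 45.9 | 46.9 | 45 | 43.9

[Figure: multiple bar chart of sales by quarter; x-axis: quarter (1st–4th Qtr); y-axis: sales (0–60); one bar each for East, West and North]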
A Component bar Chart Example
• Sales by quarter for three sales territories:
Territory | 1st Qtr | 2nd Qtr | 3rd Qtr | 4th Qtr
East | 20.4 | 27.4 | 59 | 20.4
West | 30.6 | 38.6 | 34.6 | 31.6
North | 45.9 | 46.9 | 45 | 43.9
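In a component (stacked) bar chart, each territory's segment starts where the previous one ended, so the top of the last segment is the quarter's total. The sketch below computes these segment boundaries for the sales table. Stacking in the order East, West, North is an arbitrary choice.

```python
sales = {
    "East":  [20.4, 27.4, 59.0, 20.4],
    "West":  [30.6, 38.6, 34.6, 31.6],
    "North": [45.9, 46.9, 45.0, 43.9],
}
quarters = ["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"]


def stack(sales, q):
    """Return ([(territory, bottom, top), ...], total) for quarter index q."""
    bottom = 0.0
    segments = []
    for territory, values in sales.items():
        top = bottom + values[q]
        segments.append((territory, bottom, top))
        bottom = top  # the next segment starts where this one ends
    return segments, bottom


for q, quarter in enumerate(quarters):
    segments, total = stack(sales, q)
    parts = ", ".join(f"{t} {lo:.1f}-{hi:.1f}" for t, lo, hi in segments)
    print(f"{quarter}: {parts} (total {total:.1f})")
```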
Data Presentation cont’d ...

Pie chart:
• Presents data as segments of the whole pie.
• Each category is represented by a segment of a circle.
• The segments are presented in terms of percentages.
• The size of each segment reflects the frequency of that category and can be represented as an angle.
Pie Chart Example

Hospital Unit | Number of Patients | % of Total
Cardiac Care | 1,052 | 11.93
Emergency | 2,245 | 25.46
Intensive Care | 340 | 3.86
Maternity | 552 | 6.26
Surgery | 4,630 | 52.50

[Figure: pie chart "Hospital Patients by Unit": Surgery 53%, Emergency 25%, Cardiac Care 12%, Maternity 6%, Intensive Care 4% (percentages are rounded to the nearest percent)]
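Each segment's percentage is its frequency divided by the total, times 100. Its angle is the same share of 360 degrees. The sketch below reproduces the "% of Total" column for the hospital data (total 8,819 patients) and adds the angle of each segment.

```python
units = {"Cardiac Care": 1052, "Emergency": 2245, "Intensive Care": 340,
         "Maternity": 552, "Surgery": 4630}


def pie_slices(freqs):
    """Return {category: (percent of total, angle in degrees)}."""
    total = sum(freqs.values())
    return {c: (100 * n / total, 360 * n / total) for c, n in freqs.items()}


for unit, (pct, angle) in pie_slices(units).items():
    print(f"{unit:<15}{pct:6.2f}%  {angle:6.1f} degrees")
```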
Graphs to Describe Numerical Variables
Numerical data:
• Frequency distributions and cumulative distributions
• Histogram
Graphs for Time-Series Data
• A line chart (time-series plot) is used to show the values of a variable over time.
• Time is measured on the horizontal axis.
• The variable of interest is measured on the vertical axis.
Line Chart Example
[Figure: line chart "Magazine Subscriptions by Year"; x-axis: year (1990–2006); y-axis: thousands of subscribers (0–350)]
Data presentation-Cont’d

– A frequency distribution (or frequency table) is a table that records the number of times a particular value of a variable, or range of values of a variable, occurs.
– Example:

Amount Deposited (Rs.) | Frequency
Less than 50,000 | 6,700
50,000–100,000 | 1,240
Above 100,000 | 375
Total | 8,315
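A frequency table is often extended with relative (percentage) and cumulative frequencies. The sketch below adds both columns to the deposit example. `frequency_table` is a hypothetical helper name.

```python
classes = ["Less than 50,000", "50,000-100,000", "Above 100,000"]
freqs = [6700, 1240, 375]


def frequency_table(labels, counts):
    """Return rows of (class, frequency, relative %, cumulative frequency)."""
    total = sum(counts)
    rows, cumulative = [], 0
    for label, f in zip(labels, counts):
        cumulative += f
        rows.append((label, f, round(100 * f / total, 2), cumulative))
    return rows


for row in frequency_table(classes, freqs):
    print("{:<18}{:>6}{:>8}%{:>7}".format(*row))
```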
Histogram
• A graph of the data in a frequency distribution is called a histogram.
• The interval endpoints are shown on the horizontal axis.
• The vertical axis shows frequency, relative frequency, or percentage.
• Bars of the appropriate heights are used to represent the number of observations within each class.
• Differences from a bar chart:
– The horizontal axis is a continuous scale, just like a normal graph, so there should be no gaps between bars.
– It is the area of the bars that is being compared, not the heights.
Histogram Example

Interval | Frequency
10 but less than 20 | 3
20 but less than 30 | 6
30 but less than 40 | 5
40 but less than 50 | 4
50 but less than 60 | 2

[Figure: histogram "Daily High Temperature"; x-axis: temperature in degrees (0 to 70, in classes of 10); y-axis: frequency; the classes 0–10 and 60–70 have zero frequency; the bars touch (no gaps between bars) and a frequency polygon is drawn over them]
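The "a but less than b" classes, and the frequency polygon drawn over the histogram, can be computed from raw readings. The 20 temperatures below are hypothetical. They were chosen only so that they reproduce the frequencies in the table. The polygon joins the class midpoints and is closed at zero on the empty classes on either side.

```python
# Hypothetical daily high temperatures chosen to match the table's frequencies.
temps = [12, 15, 18, 21, 22, 25, 26, 28, 29, 31, 33, 35, 37, 39,
         41, 44, 46, 48, 52, 57]
edges = [10, 20, 30, 40, 50, 60]


def grouped_frequency(values, edges):
    """Count values in each class [edges[i], edges[i+1]), i.e. 'a but less than b'."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


counts = grouped_frequency(temps, edges)
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
# Frequency polygon: (midpoint, count) points, closed at zero on the
# empty classes 0-10 (midpoint 5) and 60-70 (midpoint 65).
polygon = [(5.0, 0)] + list(zip(midpoints, counts)) + [(65.0, 0)]
print(counts)   # [3, 6, 5, 4, 2]
print(polygon)
```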
Data presentation-Cont’d

Essentials for the presentation of tables and charts:

– Present enough information without ‘drowning’ the reader in information overload:
o a title;
o information about the units being represented in the columns of the table or the axes of the chart (sometimes this is placed by the axes, sometimes by the bars or lines, sometimes in a legend);
Note:
– the horizontal axis is the ‘x axis’ and is used for the independent variable;
– the vertical axis is the ‘y axis’ and is used for the dependent variable.
o the source of the data, if they were originally produced elsewhere.
– Help the reader to interpret the table or chart through visual clues and appropriate presentation.
– Use an appropriate type of table or chart for the purpose at hand.
Data presentation-Cont’d

Conventions important in constructing visual presentations:

1. Provide a descriptive title for each table. This applies as well to charts, graphs, and figures.
2. Label variables and variable categories. All rows and columns should also be fully labeled.
3. Indicate the sources of the data.
4. Define all terms open to interpretation in a footnote section below the table.
5.4. Data Analysis: Basic concepts
Some terminologies:
• Data analysis is the application of logic to understand and interpret the data that have been collected about a subject. It involves determining consistent patterns and summarizing the appropriate details revealed in the investigation.
– Statistical analysis may range from portraying a simple frequency distribution to very complex multivariate analysis, such as multiple regression.

• Univariate analysis is the examination of the distribution of cases on only one variable at a time. Example:

Awareness of HIV/AIDS | No. of respondents | Percent (%)
Aware | 60 | 60
Unaware | 40 | 40
Total | 100 | 100
Data Analysis: Basic concepts cont’d
• Bivariate analysis: an analysis of the relationship between two variables. Example:

Awareness of HIV/AIDS | Men | Women | Total
Aware | 50 | 10 | 60
Unaware | 15 | 25 | 40
Total | 65 | 35 | 100

• Multivariate analysis: statistical techniques involving more than two variables at once.
– Cross-tabulation (also known as a contingency table): a table or set of tables that shows the number or percentage of observations (i.e., frequencies) at every combination of the levels of two or more variables.
– Contingency table: values of the dependent variable are contingent on values of the independent variable.
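A cross-tabulation counts how many records fall into each combination of categories and then adds the marginal totals. The sketch below rebuilds one (sex, awareness) record per respondent from the bivariate HIV/AIDS table. It then recovers the same table with `collections.Counter`. The helper name `crosstab` is hypothetical.

```python
from collections import Counter

# One (sex, awareness) record per respondent, rebuilt from the bivariate table.
records = ([("Men", "Aware")] * 50 + [("Women", "Aware")] * 10 +
           [("Men", "Unaware")] * 15 + [("Women", "Unaware")] * 25)


def crosstab(pairs, rows, cols):
    """Count (col_value, row_value) pairs into a table with marginal totals.

    The last column holds row totals; the last row holds column totals
    and the grand total.
    """
    cells = Counter(pairs)
    table = [[cells[(c, r)] for c in cols] for r in rows]
    for row in table:
        row.append(sum(row))                         # row total
    table.append([sum(col) for col in zip(*table)])  # column totals
    return table


rows, cols = ["Aware", "Unaware"], ["Men", "Women"]
for label, counts in zip(rows + ["Total"], crosstab(records, rows, cols)):
    print(f"{label:<8}", *counts)
```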
Data Analysis: Basic concepts cont’d
• Tabulation refers to the orderly arrangement of data in a table or other summary format. Counting the number of responses to a question and putting them in a frequency distribution is a simple, or marginal, tabulation.

Example: Do you support a parliamentary system? (n = 450)

Responses | Frequency
Yes | 330
No | 120
Total | 450

• Tallying is tabulation done by hand; tabulation can also be done by computer.
• Cross-tabulation (also known as a contingency table) is a technique for comparing two classification variables.
– Most statistical analysis software (e.g., SPSS) allows you to add totals, and row and column percentages, when designing your table.
Data analysis: Basic concepts-Cont’d
Example: ‘Do you support the proposition that men and women should be treated equally in all regards?’ (n = 450)

Sex | Yes | No | Row Total
Men | 150 | 75 | 225
Women | 180 | 45 | 225
Column Total | 330 | 120 | 450

• The frequency counts of the responses to the question (Yes and No) are represented as column totals.
• The total numbers of men and women in the sample are presented as row totals.
• These row and column totals are often called marginals, because they appear in the table’s margin.
• In the above table there are four cells, each representing a specific combination of the two variables. E.g., the cell representing women who said they do not support the proposition has a frequency count of 45.
• Any cross-tabulation table may be classified according to the number of rows and columns (R by C). The table above is referred to as a “2 × 2” table because it has two rows and two columns.
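The marginals of the equal-treatment table, and row percentages that compare men and women on an equal footing, can be computed directly from the four cell counts. The sketch below is one way to do it. The helper names are hypothetical.

```python
# Cell counts from the 2 x 2 table: rows = sex, columns = response.
table = {"Men": {"Yes": 150, "No": 75}, "Women": {"Yes": 180, "No": 45}}


def marginals(table):
    """Return (row totals, column totals, grand total)."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return row_totals, col_totals, sum(row_totals.values())


def row_percentages(table):
    """Express each cell as a percentage of its row total (1 decimal)."""
    row_totals, _, _ = marginals(table)
    return {r: {c: round(100 * n / row_totals[r], 1) for c, n in cells.items()}
            for r, cells in table.items()}


print(marginals(table))        # ({'Men': 225, 'Women': 225}, {'Yes': 330, 'No': 120}, 450)
print(row_percentages(table))  # {'Men': {'Yes': 66.7, 'No': 33.3}, 'Women': {'Yes': 80.0, 'No': 20.0}}
```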
Data analysis: Basic concepts-Cont’d
• Descriptive analysis refers to the transformation of raw data into a form that makes them easy to understand and interpret.
• It is used to summarize and describe the data on the cases included in a study.
• Calculating averages, frequency distributions, and percentage distributions is the most common way of summarizing data, and is one form of analysis.

Type of measurement | Type of descriptive analysis
Nominal, two categories | Frequency table, proportion (percentage), mode
Nominal, more than two categories | Frequency table, category proportion (percentage), mode
Ordinal | Rank order, median + the above
Interval | Arithmetic mean + the above
Ratio | Index numbers, geometric mean + the above
Descriptive analysis-Cont’d
• Descriptive statistics provide numerical and graphical procedures to summarize a collection of data in a clear and understandable way.
• Descriptive statistics are a way of summarizing the complexity of the data with a single number.
• It is the simplest method of analysis.
• It presents the responses either as percentages or as actual numbers.
• It is concerned with the development of certain indices from the raw data.
• The important statistical measures that are used to summarize survey research data are:
A. Measures of central tendency or statistical averages
B. Measures of dispersion
C. Measures of asymmetry (skewness)
D. Measures of relationship
Descriptive analysis-Cont’d
A. Measures of central tendency (also known as statistical averages)
– The purpose of measures of central tendency is to determine the average value in a set of values.
– Mean, median, and mode are the most popular averages.
Mean
• Also known as the arithmetic average
• The most common measure of central tendency
• The average of all values in a set of data
• Calculated by adding all the values in the group and then dividing by the number of values
• Helps to summarize the essential features and enables comparison
Median
• The value of the middle item of a series when it is arranged in ascending or descending order
• It divides the series into two halves
• It is a positional average
Mode
• The mode is the most frequently occurring value in a series (the value with maximum frequency)
– The mode in a distribution is the item around which there is maximum concentration
Example: Measures of Central Tendency
(Arithmetic Mean)

Reflection: Calculate the Arithmetic Mean

Branch | Revenue (in million birr)
1 | 50
2 | 150
3 | 40
4 | 60
Example: Measures of Central Tendency
(Arithmetic Mean)
– The arithmetic mean is the average of all the values under consideration.

Branch | Revenue (birr)
1 | 50,000,000
2 | 150,000,000
3 | 40,000,000
4 | 60,000,000
Total | 300,000,000

Arithmetic Mean = 300,000,000 / 4 = 75,000,000
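The branch-revenue calculation can be checked in code. The mean is the sum of the values divided by how many values there are. Python's `statistics.mean` gives the same result.

```python
revenue_birr = [50_000_000, 150_000_000, 40_000_000, 60_000_000]


def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


print(sum(revenue_birr))              # 300000000
print(arithmetic_mean(revenue_birr))  # 75000000.0
```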
Example: Measures of Central Tendency
(Median)
Reflection: Calculate/identify the Median

Salesperson | Number of Sales Calls
1 | 4
2 | 3
3 | 2
4 | 5
5 | 3
6 | 3
7 | 1
8 | 5
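Example: Measures of Central Tendency
(Median)
– The median is the midpoint of the distribution of values under consideration.

Salesperson | Number of Sales Calls
1 | 4
2 | 3
3 | 2
4 | 5
5 | 3
6 | 3
7 | 1
8 | 5

Sorted values: 1 2 3 3 3 4 5 5
With 8 values (an even number), the median is the average of the 4th and 5th values: (3 + 3) / 2 = 3.
Median = 3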
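The median rule, with its separate cases for odd and even counts, is sketched below using the sales-call data. It is written out by hand to show the steps. Python's `statistics.median` implements the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2:                          # odd count: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even count: average of the two middle values


calls = [4, 3, 2, 5, 3, 3, 1, 5]
print(sorted(calls))  # [1, 2, 3, 3, 3, 4, 5, 5]
print(median(calls))  # 3.0
```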
Example: Measures of Central Tendency
(Mode)
Reflection: Identify the Mode

Salesperson | Number of Sales Calls
1 | 4
2 | 3
3 | 2
4 | 5
5 | 3
6 | 3
7 | 1
8 | 5
Example: Measures of Central Tendency
(Mode)
– The mode is the value that occurs most frequently in the distribution of values under consideration.

Salesperson | Number of Sales Calls
1 | 4
2 | 3
3 | 2
4 | 5
5 | 3
6 | 3
7 | 1
8 | 5

The value 3 occurs three times, more often than any other value.
Mode = 3
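Finding the mode means counting how often each value occurs and keeping the most frequent one. The sketch below returns every value tied for the highest count, because a distribution can have more than one mode. Python's `statistics.multimode` behaves similarly.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


calls = [4, 3, 2, 5, 3, 3, 1, 5]
print(modes(calls))  # [3]
```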
Descriptive analysis-Cont’d

B. Measures of dispersion
• An average can represent a series only as well as a single figure can; it certainly cannot reveal the entire story of the phenomenon under study.
• Dispersion shows the degree to which numerical data tend to spread around an average value (the mean).
• Averages do not tell us anything about the scatter of observations within the distribution.
• To measure the degree of scatter, the statistical devices called measures of dispersion are calculated.

Important measures of dispersion are:
– Range = highest value − lowest value
o It shows the difference between the highest and the lowest value; because it uses only these two values, it is the weakest measure of dispersion.
– Mean deviation
o First calculate the mean, then subtract the mean from each value in the group, take the absolute value of each difference, add these up, and divide the total by the number of values.
– Variance
o First calculate the mean, then subtract the mean from each value in the group, square each difference, add the squares, and divide the total by the number of values.
– Standard deviation
o The most reliable measure of the degree to which the data are spread around the mean.
o It is the square root of the variance.
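The verbal steps for the measures of dispersion correspond to the formulas below. Here x_1, …, x_n are the n values and x̄ is their mean. These use the "divide by the number of values" convention described above.

```latex
\text{Range} = x_{\max} - x_{\min}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\\[4pt]
\text{MD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert, \qquad
\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^{2}, \qquad
\sigma = \sqrt{\sigma^{2}}
\\[4pt]
s^{2} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^{2}
```

The last line is the sample variance, which divides by n − 1. Many statistical packages report this version (and its square root) by default. For large n the two versions differ very little.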
Justifications for ‘Dispersion’ measures
• Averages are representative of a frequency distribution, but they fail to give a complete picture of the distribution. Suppose that we have the yields (kg per plot) of two wheat varieties from 5 plots each. The distribution may be as follows:
Variety I: 45 42 42 41 40
Variety II: 54 48 42 33 30
• The mean yield of Variety I is 210 / 5 = 42 kg. The mean of Variety II, computed from the figures listed, is 207 / 5 = 41.4 kg, so the averages are almost the same. But we cannot say that the performance of the two varieties is the same. There is greater uniformity of yields in the first variety, whereas there is more variability in the yields of the second variety. The first variety may be preferred since it is more consistent in yield performance.
• From the above example, it is obvious that a measure of central tendency alone is not sufficient to describe a frequency distribution.
• In addition, we need a measure of the scatter of the observations.
• The scatter, or variation, of observations around their average is called dispersion.
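Applying each measure of dispersion to the two wheat varieties shows the difference in consistency in numbers. The sketch follows the text's convention of dividing by the number of values. With the figures as listed, Variety II's mean works out to 41.4 kg.

```python
from math import sqrt


def dispersion(values):
    """Mean, range, mean deviation, variance and standard deviation (divide by n)."""
    n = len(values)
    m = sum(values) / n
    deviations = [x - m for x in values]
    variance = sum(d * d for d in deviations) / n
    return {
        "mean": m,
        "range": max(values) - min(values),
        "mean_deviation": sum(abs(d) for d in deviations) / n,
        "variance": variance,
        "std_dev": sqrt(variance),
    }


variety_1 = [45, 42, 42, 41, 40]
variety_2 = [54, 48, 42, 33, 30]
for name, data in (("Variety I", variety_1), ("Variety II", variety_2)):
    stats = dispersion(data)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

Variety I's standard deviation (about 1.67 kg) is much smaller than Variety II's (about 8.98 kg), so Variety I's yields are the more consistent.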
Descriptive analysis-Cont’d
C. Measures of asymmetry (skewness): these measure the shape of a distribution.
• The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.
Descriptive analysis: Measures of asymmetry cont’d
• The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.

Positively Skewed Distribution
A positively skewed distribution (skewed to the right) has a tail that extends to the right, in the direction of positive values.
[Figure: frequency histogram of a positively skewed distribution; x-axis: values 1–9; y-axis: frequency (0–12)]

Negatively Skewed Distribution
A negatively skewed distribution (skewed to the left) has a tail that extends to the left, in the direction of negative values.
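The slides describe skewness by its shape. One common way to put a number on it is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third moments about the mean. This measure is chosen here for illustration and is not given in the text. It is positive for a right tail, negative for a left tail and zero for symmetric data. The data sets below are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 4, 9]      # long tail towards high values
left_tail = [10 - x for x in right_tail]   # mirror image: long tail towards low values

for name, data in (("symmetric", symmetric), ("right tail", right_tail),
                   ("left tail", left_tail)):
    print(f"{name:<11}{skewness(data):+.2f}")
```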
Descriptive analysis-cont’d
D. Measures of relationship:
• These are needed to determine whether there is a relationship between variables.
Correlation (cont’d)
• Magnitude (strength of the relationship)
• Direction (positive or negative)
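The slide names correlation without giving a formula. The Pearson correlation coefficient r is used below as a common example, an assumption about which measure is meant. Its sign gives the direction of the relationship, and its absolute value (0 to 1) gives the magnitude. The training-hours data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def describe(r):
    """Split r into direction (its sign) and magnitude (its absolute value)."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return direction, round(abs(r), 2)


# Hypothetical data: training hours vs. errors made by five employees.
hours = [2, 4, 6, 8, 10]
errors = [9, 7, 6, 3, 2]
print(describe(pearson_r(hours, errors)))  # ('negative', 0.99)
```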
Agreement Level Classification
Agreement level (range) | Meaning
• 1.00–1.80 Strongly disagree
• 1.81–2.60 Disagree
• 2.61–3.40 Neutral
• 3.41–4.20 Agree
• 4.21–5.00 Strongly agree
Source: Best (1977: 174)
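The ranges appear to come from dividing the spread of a 5-point scale into five equal classes: (5 − 1) / 5 = 0.8. That gives 1.00 + 0.8 = 1.80, then 2.60, 3.40, 4.20 and 5.00. The sketch below maps a mean item score to its verbal label. It assumes that a mean falling between two printed limits (e.g., 1.805) belongs to the higher class. The responses are hypothetical.

```python
from statistics import mean

# Upper bound of each class on the 5-point scale, as in the table.
CLASSES = [(1.80, "Strongly disagree"), (2.60, "Disagree"),
           (3.40, "Neutral"), (4.20, "Agree"), (5.00, "Strongly agree")]


def agreement_level(mean_score):
    """Map a mean score on a 1-5 scale to its verbal class."""
    if not 1.0 <= mean_score <= 5.0:
        raise ValueError("mean score must lie between 1 and 5")
    for upper, label in CLASSES:
        if mean_score <= upper:
            return label
    return CLASSES[-1][1]


item_scores = [4, 4, 3, 5, 4, 2, 5, 4]  # hypothetical responses to one item
print(round(mean(item_scores), 2), agreement_level(mean(item_scores)))
```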
Data Analysis-Cont’d: Inferential analysis
• Inferential analysis:
– Provides procedures to draw inferences about a population from a sample.
– It is concerned with the various tests of significance for testing hypotheses, in order to determine with what validity the data can be said to indicate some conclusion or conclusions.
– It is concerned with the estimation of population values.
– It is mainly on the basis of inferential analysis that the task of interpretation is performed.

Examples:
• Estimating the demand for a new product X from a sample survey conducted in region Y
• Forecasting the general election result from a representative survey of voters in electoral district Z

• Used for drawing conclusions about the population from a sample:
– Estimation
• Estimate the true value of the parameter from a sample
– Hypothesis testing
• Determine whether there is a difference in a parameter value between two groups.
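As an estimation example in the spirit of "demand for product X in region Y", suppose, hypothetically, that 120 of 400 sampled households say they would buy the product. A large-sample (normal-approximation) 95% confidence interval for the population proportion is p̂ ± z·√(p̂(1 − p̂)/n). The sketch below computes it. The figures and the function name are illustrative assumptions, and small samples would call for a different method.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(120, 400)
print(f"{low:.3f} to {high:.3f}")  # 0.255 to 0.345
```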
Summary of data analysis
Levels of quantitative analysis
– Descriptive statistics: variable frequencies, averages, ranges
o For nominal or ordinal data: proportions, percentages, ratios
o For interval or ratio data:
• Measures of central tendency: mean (the total sum of values divided by the number of cases), median (the value of the middle case), mode (the most frequently occurring value)
• Measures of dispersion: range (the difference between the highest and lowest values), standard deviation (the square root of the mean of the squared deviations from the mean)
– Inferential statistics: assessing the significance of your data and results
– Simple inter-relationships: cross-tabulation or correlation between two variables
– Multivariate analysis: studying the linkages between more than two variables
THANK YOU!
Kindly practice applying SPSS to data processing and analysis.