Chap 6 Advanced Business RMs
Chapter Six
Processing, Analyzing and Interpretation of Data
By: Andinet A. (Assist. Prof.)
Chapter objectives
• Identify the main issues that you need to
consider when preparing data for analysis;
5.1. Introduction
Flow diagram of data analysis and interpretation:
Data collection → Data processing → Data analysis → Interpretation of results → Discussion
• Data processing (getting data ready for analysis): editing data; handling blank spaces; coding data; categorizing data; creating a data file; programming
• Data analysis: feel for data (mean, standard deviations, frequency distributions, correlations); goodness of data; hypotheses testing
• Discussion: research question answered
5.2. Data Processing
Editing data: the process of checking and adjusting the
data for omissions, legibility, and consistency.
It detects errors and omissions, corrects them when possible, and
certifies that minimum data quality standards are achieved.
Data processing cont’d ...
• Ignore the blank responses when the analyses are done. This reduces the sample size whenever that variable is involved, so it is the best way of handling missing items only when the sample size is large.
• Assign to the item the mean value of the responses of all those who have responded to that particular item.
• Treat ‘don’t know’ responses as missing items.
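A minimal Python sketch of the three options above, assuming a single questionnaire item with hypothetical responses in which `None` marks a blank and the string `"DK"` marks a ‘don’t know’ answer (both encodings are illustrative choices, not part of the source):

```python
from statistics import mean

# Hypothetical responses to one item on a 1-7 scale.
# None = blank response, "DK" = "don't know" (illustrative encodings).
responses = [5, None, 6, "DK", 4, 7, None, 6]

# Treat "don't know" answers as missing items.
cleaned = [None if r in (None, "DK") else r for r in responses]

# Option 1: ignore blanks; the item's effective sample size shrinks.
answered = [r for r in cleaned if r is not None]

# Option 2: give each blank the mean of those who answered the item.
item_mean = mean(answered)
imputed = [item_mean if r is None else r for r in cleaned]

print(len(responses), len(answered))  # 8 5
print(item_mean)                      # 5.6
print(imputed)
```

Ignoring blanks leaves 5 usable answers out of 8, which is why that option suits large samples; mean substitution keeps all 8 cases but makes the item look less variable than it really is.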
Data processing- Cont’d
• The next step is to code the responses using numerical codes (coding can be done when the questionnaire is designed or after the data are collected).
• Coding: the process of identifying and classifying each answer with a numerical
score or other character symbol.
• The responses to demographic variables can be coded as follows:
1. Age (years): [1] under 25; [2] 25-35; [3] 36-45; [4] 46-55; [5] over 55
2. Education: [1] high school; [2] Diploma; [3] Bachelor’s degree; [4] Master’s degree; [5] doctoral degree; [6] Other (specify)
3. Job level: [1] manager; [2] supervisor; [3] Clerk; [4] Secretary; [5] Technician; [6] other (specify)
4. Sex: [1] M; [2] F
5. Work shift: [1] first shift; [2] second; [3] third
6. Employment status: [1] part time; [2] full time
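The coding scheme above can be sketched as a small Python codebook that turns one respondent's raw answers into numbers. Only four of the six items are shown, and the dictionary keys and answer labels (such as `"second"` or `"full time"`) are hypothetical spellings of the raw answers:

```python
# Codebook for some of the demographic items above: answer -> code.
# The item names and answer labels are hypothetical spellings.
CODEBOOK = {
    "sex": {"M": 1, "F": 2},
    "work_shift": {"first": 1, "second": 2, "third": 3},
    "employment_status": {"part time": 1, "full time": 2},
}

def code_age(years):
    """Return the age-band code: [1] under 25 ... [5] over 55."""
    if years < 25:
        return 1
    if years <= 35:
        return 2
    if years <= 45:
        return 3
    if years <= 55:
        return 4
    return 5

def code_respondent(raw):
    """Replace each raw answer with its numerical code."""
    coded = {"age": code_age(raw["age"])}
    for item, codes in CODEBOOK.items():
        coded[item] = codes[raw[item]]
    return coded

print(code_respondent({"age": 38, "sex": "F", "work_shift": "second",
                       "employment_status": "full time"}))
# {'age': 3, 'sex': 2, 'work_shift': 2, 'employment_status': 2}
```

Keeping the codes in one table, rather than scattered through the analysis, makes it easy to check that every possible answer has exactly one code.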
Data processing- Cont’d
Example: The following questions are used to measure the involvement and satisfaction variables.
• To what extent do you agree with the following statements, on a scale of 1 to 7, with 1 denoting very
low agreement and 7 denoting very high agreement?
1 2 3 4 5 6 7
6. The major happiness of my life comes from my job
7. Time at work flies by quickly
8. I live, eat, and breathe my job
9. My work is fascinating
10. My work gives me a sense of accomplishment
11. My supervisor praises good work
12. The opportunities for advancement are very good here
13. My coworkers are very stimulating
14. People can live comfortably with their pay in this organization
15. I get a lot of cooperation at the workplace
16. My supervisor is not very capable
17. Most things in life are more important than work
18. Working here is a drag
19. The promotion policies here are very unfair
20. My pay is barely adequate to take care of my expenses
21. My work is not the most important part of my life
Data processing- Cont’d
• The purpose of coding responses from open-ended questions is to
reduce the large number of individual responses to a few general
categories of answers that can be assigned numerical scores.
Note: The usual reason for using open-ended questions is that the researcher
has no clear hypothesis regarding the answers.
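In practice the categories come from reading the answers, and coders assign them by judgment. As a rough illustration only, the Python sketch below assigns hypothetical category codes by keyword matching; the question, categories, keywords and the "other" code 9 are all assumptions, and plain substring matching is crude (for example, "pay" would also match "repay"):

```python
# Hypothetical codes for answers to an open-ended question such as
# "What do you like most about your job?"
CATEGORIES = {
    1: ("pay", "salary", "bonus"),               # pay and benefits
    2: ("colleague", "coworker", "team"),        # people at work
    3: ("interesting", "challenging", "learn"),  # the work itself
}
OTHER = 9  # code for answers that fit no category

def code_answer(text):
    """Return the first category whose keywords appear in the answer."""
    lowered = text.lower()
    for code, keywords in CATEGORIES.items():
        if any(word in lowered for word in keywords):
            return code
    return OTHER

answers = ["Good salary", "My team is great",
           "The work is challenging", "Close to home"]
print([code_answer(a) for a in answers])  # [1, 2, 3, 9]
```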
Categorization:
• It involves categorizing the variables so that the several
items measuring a concept are all grouped together.
• Responses to negatively worded questions have to be reversed
so that all answers point in the same direction.
This can be done on the computer through a RECODE.
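A minimal Python sketch of reversing and grouping items, assuming the 1-7 scale used in the example above and assuming that items 16-21 are the negatively worded ones. Reversal on a 1-7 scale maps a score x to 8 - x, the same transformation a RECODE performs. The respondent's answers and the grouping of items 6, 7, 8, 17 and 21 into one concept are hypothetical:

```python
SCALE_MAX = 7                               # items use a 1-7 agreement scale
NEGATIVE_ITEMS = {16, 17, 18, 19, 20, 21}   # assumed negatively worded

def recode(item, score):
    """Reverse-score a negatively worded item (1<->7, 2<->6, ...)."""
    if item in NEGATIVE_ITEMS:
        return SCALE_MAX + 1 - score
    return score

# Hypothetical answers of one respondent: {item number: score}
answers = {6: 6, 7: 5, 8: 4, 17: 2, 21: 3}
recoded = {item: recode(item, score) for item, score in answers.items()}
print(recoded)  # {6: 6, 7: 5, 8: 4, 17: 6, 21: 5}

# Categorization: treat these five items (hypothetically) as measuring
# one concept and summarize them with their mean.
concept_score = sum(recoded.values()) / len(recoded)
print(concept_score)  # 5.2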
Data processing- Cont’d
Entering data:
• Raw data can be entered (manually or using a scanner sheet, a
machine-readable form) through any software program, for
instance, the SPSS Data Editor:
It looks like a spreadsheet
You can enter, edit, and view the contents of the data file
Each row of the editor represents a case, and each column
represents a variable
Missing (system-missing) values appear as a period (dot) in the cell.
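The same case-by-variable layout can be sketched as a plain CSV data file using Python's standard `csv` module. The variable names and values are hypothetical, and in this sketch a blank field (rather than the period shown in the SPSS Data Editor) stands for a missing value:

```python
import csv
import io

# Hypothetical coded data: one dict per case (respondent),
# one key per variable. None marks a value left blank.
variables = ["respondent", "sex", "item6", "item7"]
cases = [
    {"respondent": 1, "sex": 2, "item6": 5, "item7": 6},
    {"respondent": 2, "sex": 1, "item6": None, "item7": 4},
]

buffer = io.StringIO()  # stands in for a file on disk
writer = csv.DictWriter(buffer, fieldnames=variables)
writer.writeheader()        # first row: variable names
for case in cases:
    writer.writerow(case)   # csv writes None as an empty field

print(buffer.getvalue())
```

Each printed line after the header is one case, and the second case has an empty field where item6 is missing.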
Data processing- Cont’d
Data summary code sheet
Respondent   Part I: Sex   Living together   Fulfilling needs   Part II: items 1-15               Total
1            0             0                 1                  4 5 5 5 4 4 4 5 4 4 3 4 5 4 3     63
2            1             1                 2                  2 2 3 2 1 1 2 2 2 2 3 2 2 2 3     31
…
228
229
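For the two complete rows, summing the 15 Part II item scores reproduces the Total column, which suggests (an inference, not stated in the source) that the total is the sum of the Part II items. A short Python check:

```python
# Part II item scores (items 1-15) for respondents 1 and 2,
# copied from the code sheet above.
part_two = {
    1: [4, 5, 5, 5, 4, 4, 4, 5, 4, 4, 3, 4, 5, 4, 3],
    2: [2, 2, 3, 2, 1, 1, 2, 2, 2, 2, 3, 2, 2, 2, 3],
}

# Total for each respondent = sum of that respondent's Part II scores.
totals = {resp: sum(scores) for resp, scores in part_two.items()}
print(totals)  # {1: 63, 2: 31}
```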
5.3. Data Presentation
• Data in raw form are usually not easy to use for decision making
Tables:
They can be used with just about all types of numerical data.
Graphs:
• The type of graph to use depends on the type of variable being summarized
Data presentation:
The Frequency Distribution Table
• Summarizes data by category (the variables are categorical)
Graphical Presentation of Data
On the basis of types of variables: categorical variables and numerical variables
Tables and Graphs for Categorical Variables
Categorical data: frequency distribution table, bar chart, pie chart
Data presentation-Cont’d
• Diagrammatic representation of data (bar charts, pie charts,
histogram, line graphs, frequency polygon)
• Bar charts:
A bar chart is a graph that shows the frequency distribution of a variable.
Bar charts can be used with nominal and with discrete data.
Bars should be of equal width, with the height of each bar representing the
frequency (the height of the bar is proportional to the frequency) or the amount for each
separate category.
For each category a vertical bar is drawn.
There is a gap between each bar.
Types of bar charts: simple bar chart, multiple bar chart, component bar chart
A simple bar chart shows the total for each category.
A multiple bar chart is used when you are interested in changes in the
components but the totals are of no interest.
A component bar chart helps you compare totals and see how each total
is made up of its components.
Simple Bar Chart Example
Hospital unit      Number of patients
Cardiac Care       1,052
Emergency          2,245
Intensive Care     340
Maternity          552
Surgery            4,630
[Figure: bar chart "Hospital Patients by Unit", one vertical bar per unit (Cardiac Care, Emergency, Intensive Care, Maternity, Surgery); vertical axis: number of patients]
A Multiple Bar Chart Example
• Sales by quarter for three sales territories:
Territory   1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East        20.4      27.4      59        20.4
West        30.6      38.6      34.6      31.6
North       45.9      46.9      45        43.9
[Figure: multiple bar chart with the East, West and North bars side by side for each quarter (1st Qtr to 4th Qtr); vertical axis 0 to 60]
A Component Bar Chart Example
• Sales by quarter for three sales territories:
Territory   1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East        20.4      27.4      59        20.4
West        30.6      38.6      34.6      31.6
North       45.9      46.9      45        43.9
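In a component bar chart the full height of each bar is the quarter's total, and each segment is one territory's share of it. A short Python sketch computing both from the sales table above:

```python
# Sales by quarter for each territory, from the table above.
sales = {
    "East":  [20.4, 27.4, 59.0, 20.4],
    "West":  [30.6, 38.6, 34.6, 31.6],
    "North": [45.9, 46.9, 45.0, 43.9],
}
quarters = ["1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr"]

# Height of each whole bar = total sales in that quarter.
totals = [round(sum(row[q] for row in sales.values()), 1)
          for q in range(len(quarters))]

for q, quarter in enumerate(quarters):
    # Each segment's percentage share of its bar.
    shares = {t: round(100 * row[q] / totals[q], 1) for t, row in sales.items()}
    print(quarter, totals[q], shares)
```

The quarter totals come out as 96.9, 112.9, 138.6 and 95.9, so the component chart shows at a glance that the 3rd quarter was the largest, which a multiple bar chart does not show directly.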
Data Presentation cont’d ...
Pie chart:
• Presents data as segments of the whole pie.
• Each category is represented by a segment of a
circle.
• The segments are presented in terms of
percentages
• The size of each segment reflects the
frequency of that category and can be
represented as an angle.
Pie Chart Example
Hospital unit      Number of patients   % of total
Cardiac Care       1,052                11.93
Emergency          2,245                25.46
Intensive Care     340                  3.86
Maternity          552                  6.26
Surgery            4,630                52.50
[Figure: pie chart "Hospital Patients by Unit": Surgery 53%, Emergency 25%, Cardiac Care 12%, Maternity 6%, Intensive Care 4% (percentages rounded to the nearest percent)]
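Each segment's percentage is its count divided by the total, and its angle is that share of 360 degrees. A short Python sketch reproducing the "% of total" column above and the corresponding segment angles:

```python
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}
total = sum(patients.values())  # 8,819 patients

# % of total, and the segment angle in degrees (share * 360).
percent = {u: round(100 * c / total, 2) for u, c in patients.items()}
angle = {u: round(360 * c / total, 1) for u, c in patients.items()}

for unit in patients:
    print(f"{unit:<15}{percent[unit]:>7.2f}%{angle[unit]:>8.1f} degrees")
```

Surgery, for example, takes just over half the circle (about 189 degrees).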
Graphs to Describe Numerical Variables
Numerical data: histogram
Graphs for Time-Series Data
• A line chart (time-series plot) is used to show
the values of a variable over time
Line Chart Example
[Figure: line chart "Magazine Subscriptions by Year", 1990 to 2006; vertical axis: thousands of subscribers (0 to 350)]
Histogram
• A graph of the data in a frequency distribution is called a
histogram
• The interval endpoints are shown on the horizontal axis
• The vertical axis is either frequency, relative frequency, or
percentage
• Bars of the appropriate heights are used to represent the
number of observations within each class
Histogram Example
Interval                Frequency
10 but less than 20     3
20 but less than 30     6
30 but less than 40     5
40 but less than 50     4
50 but less than 60     2
[Figure: histogram "Daily High Temperature"; horizontal axis: temperature in degrees (0 to 70); vertical axis: frequency; no gaps between bars; a frequency polygon is drawn over the bars]
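Building the frequency table behind a histogram means counting how many observations fall in each class "a but less than b". The source does not give the raw temperatures, so the Python sketch below uses hypothetical readings, chosen so that the counts match the table above:

```python
# Hypothetical daily high temperatures (the raw data are not given).
temps = [12, 15, 18, 21, 24, 25, 27, 28, 29, 31,
         33, 35, 37, 39, 42, 44, 46, 48, 53, 57]

def frequency_table(values, start, width, n_classes):
    """Count values in classes [start, start+width), [start+width, ...)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)   # index of the class v falls in
        if 0 <= k < n_classes:
            counts[k] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

for lo, hi, f in frequency_table(temps, start=10, width=10, n_classes=5):
    print(f"{lo} but less than {hi}: {f:>2} {'*' * f}")
```

Because each class includes its lower limit but not its upper one, a reading of exactly 20 is counted in "20 but less than 30", and adjacent bars share a boundary, which is why a histogram has no gaps between bars.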
5.4. Data Analysis: Basic concepts
Some terminologies:
• Data analysis: the application of logic to understand and
interpret the data that have been collected about a subject. It
involves determining consistent patterns and summarizing the
relevant details revealed in the investigation.
Statistical analysis may range from portraying a simple frequency distribution to
very complex multivariate analysis, such as multiple regression.
Data Analysis: Basic concepts cont’d
• Tabulation: the orderly arrangement of data in a table or other
summary format. Counting the number of responses to a question and putting
them in a frequency distribution is a simple, or marginal, tabulation.
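A simple (marginal) tabulation can be sketched in Python with `collections.Counter`, which counts how often each response occurs. The answers below are hypothetical:

```python
from collections import Counter

# Hypothetical answers to one closed question.
answers = ["Agree", "Neutral", "Agree", "Disagree", "Agree", "Neutral"]

counts = Counter(answers)   # frequency of each response category
n = len(answers)
for category, freq in counts.most_common():
    # category, frequency, and percentage of all responses
    print(f"{category:<10}{freq:>3}{100 * freq / n:>7.1f}%")
```

Here "Agree" occurs 3 times out of 6 (50.0%), "Neutral" twice and "Disagree" once.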
Example: Measures of Central Tendency
(Arithmetic Mean)
The arithmetic mean is the average of all the values under
consideration: their sum divided by the number of values
Branch Revenue
1 50,000,000
2 150,000,000
3 40,000,000
4 60,000,000
Total = 300,000,000
Arithmetic Mean = 300,000,000 / 4 = 75,000,000
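The same calculation written as a formula, where x_i is the revenue of branch i and n = 4 is the number of branches (the symbols are notation introduced here, not taken from the source):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{50{,}000{,}000 + 150{,}000{,}000 + 40{,}000{,}000 + 60{,}000{,}000}{4}
        = \frac{300{,}000{,}000}{4}
        = 75{,}000{,}000
```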
Example: Measures of Central Tendency
(Median)
Reflection: Calculate/identify Median
Example: Measures of Central Tendency
(Median)
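The median is the midpoint of the distribution of values under
consideration: when the values are sorted, it is the middle value (for an even number of values, the mean of the two middle values)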
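A minimal Python sketch of the median rule, applied for illustration to the four branch revenues from the arithmetic-mean example, with the standard library's `statistics.median` as a cross-check:

```python
from statistics import median

revenues = [50_000_000, 150_000_000, 40_000_000, 60_000_000]

def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of(revenues))   # 55000000.0
print(median(revenues))      # 55000000.0 (standard-library check)
```

The median (55,000,000) is well below the mean (75,000,000) because branch 2's unusually large revenue pulls the mean up but does not move the middle of the sorted values.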
Example: Measures of Central Tendency
(Mode)
Reflection: identify Mode
Example: Measures of Central Tendency
(Mode)
The mode is the value that occurs most frequently in the distribution
of values under consideration.
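A minimal Python sketch that finds the mode by counting frequencies. The scores are hypothetical, and the function returns every value tied for the highest count, since a distribution can have more than one mode:

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

# Hypothetical answers to one item on a 1-7 scale.
print(modes([5, 6, 5, 4, 7, 5, 6]))   # [5]
```

If every value occurs only once, as with the four branch revenues in the arithmetic-mean example, the function returns all of them, i.e. there is no single mode.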
Descriptive analysis-Cont’d
B. Measures of dispersion
• An average can represent a series only as well as a single figure can; it certainly cannot
reveal the entire story of any phenomenon under study.
• A measure of dispersion shows the degree to which numerical data tend to spread around an average value (the mean).
• Averages do not tell anything about the scatter of observations within the distribution.
• To measure the degree of scatter, the statistical devices called measures of dispersion
are calculated.
Variety II 54 48 42 36 30
• It can be seen that the mean yield for both varieties is 42 kg. But we cannot say
that the performance of the two varieties is the same. There is greater
uniformity of yields in the first variety, whereas there is more variability in the
yields of the second variety. The first variety may be preferred since it is more
consistent in yield performance.
• From the above example, it is obvious that a measure of central tendency alone
is not sufficient to describe a frequency distribution.
• In addition, we need a measure of the scatter of observations.
• The scatter or variation of observations around their average is called
dispersion.
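Two common measures of dispersion are the range (largest minus smallest value) and the standard deviation. The Python sketch below uses two hypothetical series that share a mean of 42 but differ in spread, mirroring the idea of the two varieties; it uses the population standard deviation (`statistics.pstdev`), one of several possible choices:

```python
from statistics import mean, pstdev

# Two hypothetical series with the same mean (42) but different spread.
series = {
    "A": [40, 41, 42, 43, 44],
    "B": [30, 36, 42, 48, 54],
}

for name, ys in series.items():
    spread = max(ys) - min(ys)   # range
    sd = pstdev(ys)              # population standard deviation
    print(name, spread, round(sd, 2))
# A 4 1.41
# B 24 8.49
```

Both series have the same average, yet both measures show that series B is far more scattered.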
Descriptive analysis-Cont’d
C. Measures of asymmetry (skewness): these measure the shape of a
distribution
• The shape of the distribution is said to be symmetric if the observations are
balanced, or evenly distributed, about the center.
Descriptive analysis: Measures of asymmetry cont’d
• The shape of the distribution is said to be skewed if the
observations are not symmetrically distributed around the
center.
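• A distribution is skewed to the right (positively skewed) when its longer tail extends to the right, in the direction of positive values.
[Figure: frequency histogram of a right-skewed distribution; horizontal axis: values 1 to 9; vertical axis: frequency]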
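One simple way to put a number on skewness is Pearson's second (median) coefficient, 3 × (mean − median) / standard deviation; it is one of several skewness measures and is chosen here only for illustration. The Python sketch applies it to hypothetical data with a long right tail:

```python
from statistics import mean, median, pstdev

# Hypothetical data with a long tail to the right.
data = [1, 2, 2, 3, 3, 3, 4, 6, 12]

m, md, sd = mean(data), median(data), pstdev(data)   # mean 4, median 3
# Pearson's second (median) coefficient of skewness.
skew = 3 * (m - md) / sd
print(round(skew, 2))  # 0.96
```

The large value 12 pulls the mean above the median, so the coefficient is positive, as expected for a distribution skewed to the right.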
Descriptive analysis-cont’d
D. Measures of relationship:
• We need to determine whether there is a relationship
between variables.
Correlation (cont’d)
• Magnitude: how strong the relationship is
• Direction: whether the relationship is positive or negative
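The Pearson correlation coefficient r captures both properties: its absolute value gives the magnitude (0 to 1) and its sign gives the direction. A from-scratch Python sketch on hypothetical paired scores:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores, e.g. answers to two questionnaire items.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(round(abs(r), 2), direction)  # 0.77 positive
```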
Agreement Level Classification
Agreement level (range)    Meaning
• 1.00-1.80 Strongly disagree
• 1.81-2.60 Disagree
• 2.61-3.40 Neutral
• 3.41-4.20 Agree
• 4.21-5.00 Strongly agree
Source: Best (1977: 174)
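Each range in the table is 0.8 wide, i.e. (5 − 1) / 5. A minimal Python sketch that classifies a mean score on a 1-5 scale using these ranges. Because the published ranges leave small gaps (for example, between 1.80 and 1.81), the sketch treats each range as running up to and including its upper bound; that boundary rule is an assumption:

```python
# Upper bound of each range in the table above, with its meaning.
LEVELS = [
    (1.80, "Strongly disagree"),
    (2.60, "Disagree"),
    (3.40, "Neutral"),
    (4.20, "Agree"),
    (5.00, "Strongly agree"),
]

def agreement_level(mean_score):
    """Classify a mean score on a 1-5 scale (assumed boundary rule)."""
    if not 1.0 <= mean_score <= 5.0:
        raise ValueError("mean score must lie between 1 and 5")
    for upper, meaning in LEVELS:
        if mean_score <= upper:
            return meaning

print(agreement_level(3.72))  # Agree
```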
Data Analysis-Cont’d: Inferential analysis
• Inferential analysis:
Provides procedures to draw inferences about a population from a sample.
It is concerned with the various tests of significance for testing hypotheses in order to
determine with what validity data can be said to indicate some conclusion or
conclusions.
It is concerned with the estimation of population values.
It is mainly on the basis of inferential analysis that the task of interpretation is
performed.
Examples:
• Estimating the demand for a new Product X from a sample survey conducted in Region Y
• Predicting the general election result from a representative survey of voters in electoral district Z
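As an illustration of estimating a population value from a sample, the Python sketch below computes an approximate confidence interval for a population proportion, such as the share of customers in Region Y who would buy Product X. The normal-approximation interval is a technique chosen here for illustration, z = 1.96 corresponds to roughly 95% confidence, and the survey counts are hypothetical:

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Approximate confidence interval for a population proportion
    (normal approximation; z = 1.96 gives roughly 95% confidence)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    return p - z * se, p + z * se

# Hypothetical survey: 120 of 400 respondents say they would buy Product X.
low, high = proportion_ci(120, 400)
print(f"{low:.3f} to {high:.3f}")  # 0.255 to 0.345
```

The sample proportion is 0.30, and the interval expresses how far the population proportion might plausibly lie from it, given the sample size.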