
IM 205: BUSINESS RESEARCH METHODS

DATA ANALYSIS AND INTERPRETATION OF FINDINGS

• Data analysis is a process of gathering, modeling, and
transforming data with the goal of highlighting
useful information, suggesting conclusions, and
supporting decision making.

• It is the process of organizing and summarizing the
collected data into a meaningful form for
interpretation purposes.
Data analysis involves:-

• Data editing, coding, data entry and computation
processes.
• Summarizing the open-ended questions for
interpretation
• Determination of descriptive summaries such as
measures of central tendency, distributions and
general trends for quantitative data.
• Tests of hypotheses that are intended to
answer the research questions
• Data obtained from a study may be in either qualitative
or numerical (quantitative) form, the latter being in the form of
numbers.
• If the data are in qualitative form, we can still carry out a
qualitative analysis based on the experiences of the
individual participants.
• If the data are in numerical form, the analysis typically starts by
working out some descriptive statistics to summarize the
pattern of findings. These descriptive statistics include
measures of central tendency within a sample (e.g.
mean) and measures of the spread of scores within a
sample (e.g. range), distributions and general trends.
• Another useful way of summarizing the findings is by
means of graphs and figures.
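The descriptive summaries named above can be sketched in a few lines of Python using the standard `statistics` module; the sample scores here are hypothetical, purely for illustration.

```python
# A minimal sketch of descriptive summaries for one numerical variable,
# using Python's standard "statistics" module. The scores are hypothetical.
import statistics

scores = [12, 15, 15, 18, 20, 22, 25]

mean = statistics.mean(scores)           # measure of central tendency
median = statistics.median(scores)       # middle score
mode = statistics.mode(scores)           # most frequent score
value_range = max(scores) - min(scores)  # measure of spread

print(mean, median, mode, value_range)
```

These are the same quantities (mean, mode, range) a researcher would read off a summary table before moving on to graphs and figures.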
Data cleaning
• Data cleaning/cleansing is the act of identifying and correcting (or
removing) corrupt or inaccurate records from a set of records, a table or a
database.

• It involves identifying incomplete, incorrect, inaccurate or irrelevant elements of the
collected data and then replacing, modifying or deleting this unwanted data.

• The inconsistencies detected or removed may have been originally caused by
different data definitions of similar entities in different stores, may have been
caused by user entry errors, or may have been corrupted in transmission or
storage.

• During data cleaning, erroneous entries are inspected and corrected where
possible. In some cases, it is easy to substitute suspect data with the correct
values. However, when it is unclear what caused the erroneous data or what
should be used to replace it, it is important that no subjective decisions are
made to ensure the quality of data.
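A minimal data-cleaning sketch follows, assuming hypothetical survey records with an "age" field. In line with the caution above, suspect entries are set aside for inspection rather than subjectively replaced.

```python
# A minimal data-cleaning sketch over hypothetical survey records.
# Out-of-range or missing entries are flagged for inspection,
# not silently guessed.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},    # user entry error: impossible value
    {"id": 3, "age": None},  # incomplete record
    {"id": 4, "age": 41},
]

def clean(records, low=0, high=120):
    """Split records into valid ones and suspect ones to inspect."""
    valid, suspect = [], []
    for r in records:
        age = r["age"]
        if age is not None and low <= age <= high:
            valid.append(r)
        else:
            suspect.append(r)  # no subjective substitution is made
    return valid, suspect

valid, suspect = clean(records)
print(len(valid), len(suspect))
```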
RELIABILITY AND VALIDITY OF RESEARCH FINDINGS

• Before the gathered data are engaged in processes that
transform them into meaningful and useful information,
they have to be verified to check their quality. Thus the
reliability and validity of the research data have to be tested.

• Reliability and validity of data are the outcomes of
reliable and valid measurement instruments.

• Poor quality of data impairs the reliability and validity
of research findings and conclusions.
Reliability of data

• Testing the reliability of data is based on whether
an instrument produces the same results when
successively used by different researchers.

"Does the index measure what it sets out to
measure consistently and in a stable manner?"
Forms of reliability

• Internal reliability:-
Considers the extent to which a measurement
process gives consistent and stable results when
applied to similar situations.

• External reliability:-
Considers whether the measurement remains stable
when replicated in other research settings.
Validity of data

• Testing the validity of data is based on whether the
researcher measures what he/she set out to measure.
"Is the index describing what we think it is
describing?"
For example: a darts player intends to hit the bull's-eye. To what
extent does he/she hit it?

• The problem with validity arises when the measurement
process/instruments result in inaccurate data.
Forms of validity
• Content/logical/face:- Extent to which professionals
agree that the scale logically appears to measure the
concepts
• Predictive:- Is there a correlation between what the
index predicts and the behaviour?
• Concurrent:- Is there a correlation between what the
index says and what the behaviour is?
• Convergent:- Is there a correlation between the index
and other networks of indices measuring the same
concept?
• Sensitivity:- Ability to accurately measure variability
in the stimuli or responses
ANALYZING QUALITATIVE DATA

• Qualitative data is the kind of data that is non-numerical.

• Standard procedures for the analysis of qualitative data are not
well established in the literature.

• However, there are software packages such as ATLAS.ti, NVivo,
Ethnograph, etc. for such analysis.

• Thus the analysis of such data mostly depends on the strength
of argument and the art of the researcher. Nevertheless, there are some
techniques recommended for the analysis of qualitative data.
Techniques for analysis of qualitative data
a) Explanation building:
Attempting to build relationships and implications from
observed phenomena
b) Use of comparisons:
This may be achieved by observing patterns in the data,
specifically when analyzing case studies
c) Theory fitting:
The researcher tries to fit the observed behaviour into an existing
theory to be able to draw some conclusions
d) Analyzing different scenarios:
A researcher analyses different scenarios developed by other
researchers, from which he/she comes up with
opinions, suggestions or recommendations
QUANTITATIVE DATA ANALYSIS
• Quantitative data: quantifiable variables measured
or identified on a numerical scale (e.g. 50 kg, 30 ft).
Analyzed using statistical methods. Results are displayed
using tables, charts, histograms and graphs.
• Analyzed using complex statistical modeling with
computer software such as MS Excel, Lotus 1-2-3,
SPSS, SuperCalc, LISREL, StatView, Minitab, etc.
• Analysis uses a variety of methods, including formulating
simple tables or diagrams for establishing statistical
relationships.
• Data should be prepared with quantitative analysis in mind,
and a researcher needs to know how and when to use
different statistical techniques.
Data presentation and exploration of variables

• Presentation of quantitative data for
variables (e.g. workers' income, company sales, age
distribution, levels of production etc.) includes use of
a variety of methods. These are categorized as:-

 Descriptive statistics methods
 Statistical inference methods
DESCRIPTIVE STATISTICS METHODS
Aim at computing mode, median, range, proportions,
mean, variance, trend behaviours and other patterns of
variables. Includes:-
• Preparing frequency distributions and tables for all/each
variable considered
• Preparing graphs, charts or histograms for each variable
• Preparing cross-tabulations/multivariate tabulations for
theoretically related variables from basic definitions
• Determining a relationship and dependence between
dependent and independent variables implied in the
research problem
A: EXPLORING AND PRESENTING DATA FOR
SINGLE VARIABLES
i) Showing specific variables:
• Use a frequency distribution table for the variables
(Refer example on Frequency distribution of the lengths of leaves).

• The frequency distribution can further be presented in a
histogram and a frequency polygon
(Refer example on Histogram and Frequency Polygon of the lengths of leaves).
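A frequency distribution of this kind can be sketched as follows; the leaf lengths, class width and starting value are hypothetical stand-ins for the referenced example.

```python
# A sketch of building a frequency distribution for one variable
# (hypothetical leaf lengths in cm), grouping values into classes.
from collections import Counter

lengths = [4.2, 5.1, 5.8, 6.0, 6.3, 7.1, 7.4, 7.9, 8.2, 9.0]

def class_of(x, width=2, start=4):
    """Map a value to a class interval label such as '4-6'."""
    lower = start + width * int((x - start) // width)
    return f"{lower}-{lower + width}"

freq = Counter(class_of(x) for x in lengths)
print(dict(freq))
```

The resulting class counts are exactly what would be plotted as a histogram or frequency polygon.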

ii) Comparing observations of one variable aiming at showing
high and low values
• In such a situation, use of bar charts or histograms can be
appropriate
(Refer example on Steel production by a company for a period of nine
years)
iii) Organizing data to depict the existing trend over
time
• In such a situation, use of a line graph would be the
appropriate method
(Refer example on Line trend for production levels at XYZ
Company)

iv) Showing proportions of a single variable
• In such a situation, use of pie charts can be
appropriate
(Refer example on Tanzanian Export Income for year 200X)
ANALYSIS OF SINGLE VARIABLE DATA

• In all these single variable cases, the summarized
data aims at providing the measures of central tendency
(mean, median, mode), measures of variation (standard
deviation, range) and proportions.
Once these are established, they can provide answers to the set
questions.
For example:
• With the calculated mean and standard deviation, an
investor can compare a return of interest against the
established mean, and can also use the standard
deviation to determine its variation from that
established mean (average).
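The investment example above can be sketched with hypothetical monthly returns: the mean gives the expected return, and the sample standard deviation measures variation around it.

```python
# A sketch of the investment example: hypothetical monthly % returns.
# The mean is the expected return; the sample standard deviation
# measures its variation around that mean.
import statistics

returns = [5.0, 7.5, 6.0, 4.5, 8.0, 6.5]  # hypothetical % returns

mean_return = statistics.mean(returns)
variation = statistics.stdev(returns)  # sample standard deviation

print(round(mean_return, 2), round(variation, 2))
```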
B: EXPLORING AND PRESENTING DATA FOR
BIVARIATE AND MULTIVARIATE VARIABLES

• Sometimes research needs to compare or relate
variables, e.g. sales of different companies, income
over expenditure, birth over death rate, employment
to labour turnover rates, etc.

• Techniques used:
i) Showing specific values of variables and their
interdependence
• Use of a contingency table can be appropriate
(Refer example on a company with three homeowner policies
and three field offices)
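A contingency (cross-tabulation) table can be sketched as below; the policy types and field offices are hypothetical stand-ins for the referenced example.

```python
# A sketch of a contingency table (cross-tabulation) built from
# hypothetical (policy type, field office) records.
from collections import Counter

records = [
    ("Fire", "Office A"), ("Fire", "Office B"), ("Theft", "Office A"),
    ("Theft", "Office C"), ("Flood", "Office B"), ("Fire", "Office A"),
]

table = Counter(records)  # counts observations in each cell
print(table[("Fire", "Office A")])
```

Each cell count shows how the two variables co-occur, which is the raw material for the dependence tests discussed later.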

ii) Comparing between highest and lowest values of
variables
• In such a situation, use of a multiple bar chart would
be the appropriate method
(Refer example on Yield of Paddy in two areas)

iii) Showing relationships between variables
• Use of a scatter diagram can be appropriate
(Refer example on a company's sales and expenditure)

iv) Comparing data distributed according to classes
or categories
• In such a situation, use of a multiple frequency
polygon would be the appropriate method
(Refer example on Frequency polygon for comparing students'
performances)
Read more on other techniques for bivariate and multivariate data
presentation using tables, graphs and charts:

• For comparisons of proportions:
Use the percentage component bar chart

• For comparisons of trends and conjunctions in data:
Use multiple line graphs

• For comparison of total values of variables:
Use the stacked bar chart
ANALYSIS OF DATA IN BIVARIATE AND OTHER
MULTIVARIATE VARIABLES

• Analysis can be done to determine the
relationship (correlation) between variables

• Analysis can also be done to determine the
interdependence between variables

• In each test, there are different techniques to use

Testing relationship/association between variables

• Can use Spearman's correlation coefficient (Rsp):
If Rsp is close to +1: variables are highly positively correlated
If Rsp is close to -1: variables are highly negatively correlated
If Rsp is zero: no relationship between the variables.
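Spearman's coefficient can be sketched with the common rank-difference formula Rsp = 1 - 6*Σd²/(n(n² - 1)); this minimal version assumes no tied ranks.

```python
# A sketch of Spearman's rank correlation coefficient using
# Rsp = 1 - 6*sum(d^2) / (n*(n^2 - 1)), assuming no tied ranks.
def rank(values):
    """Rank each value (1 = smallest); assumes all values differ."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly agreeing ranks give +1; perfectly reversed ranks give -1.
print(spearman([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # 1.0
print(spearman([1, 2, 3, 4, 5], [50, 40, 30, 20, 10]))  # -1.0
```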

• Can use Analysis of Variance (ANOVA),
Multivariate Analysis of Variance (MANOVA),
discriminant analysis, the paired t-test, etc. to determine
the magnitude and size of causal coefficients for
asymmetric relationships between variables
Testing dependency and independency between
variables
• Here there are two variables and one of them is
statistically dependent on the other. A model is
constructed to represent the relationship. The simplest
model is the Least Squares Regression Model:
Yi = a + bXi,
where Y is the dependent variable and X is the
independent variable
• The model can be extended to the case where there are
many independent variables, e.g.:
Yi = a + bXi + cZi
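Fitting the simple model Y = a + bX can be sketched with the closed-form least-squares estimates b = Sxy/Sxx and a = mean(y) - b*mean(x); the data points below are hypothetical and lie exactly on y = 1 + 2x.

```python
# A sketch of simple least-squares regression Y = a + bX using the
# closed-form estimates b = Sxy/Sxx and a = mean(y) - b*mean(x).
def least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # Sxy
    sxx = sum((x - mx) ** 2 for x in xs)                    # Sxx
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```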
Use of Chi-Square Goodness of Fit Test
• Tests the extent to which two variables are
independent or not
• The expression is: X2 = ∑(Oi – Ei)2 / Ei
• Applied to probability distributions
• Tests the significance of associations
• Tests homogeneity or significance of population
variance
• (Read the conditions for use of the Chi-Square goodness of fit
test)
Example
• The following pattern was observed in a random
sample of 400 consumers in Tanzania concerning
cellular phone card purchases on a regular basis.
Is it fair to say that purchases for the cards of
different values are not evenly distributed?

Card value regularly purchased    $2     $5     $10    $50
Number of consumers               80     170    90     60
Solution

H0: Card purchases are evenly distributed
H1: Card purchases are not evenly distributed

Expected frequencies (400 consumers spread evenly over the 4 card values):

Card value         $2     $5     $10    $50
Expected number    100    100    100    100
From the above data:
X2 = ∑(Oi – Ei)2 / Ei = (80-100)2 / 100 + (170-100)2 /
100 + (90-100)2 / 100 + (60-100)2 / 100
= 400/100 + 4900/100 + 100/100 + 1600/100
= 70
This value of X2 ≥ 7.81 (the critical value at
three degrees of freedom and α = 0.05)

Thus, at three degrees of freedom and α = 0.05, the
null hypothesis is rejected and we conclude that card
purchases are not evenly distributed. It appears that the
lower valued cards are purchased more often.
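The worked computation above can be reproduced directly:

```python
# A sketch reproducing the chi-square goodness-of-fit statistic from
# the worked example: observed card purchases vs. an even expected
# distribution of 100 per card value.
observed = [80, 170, 90, 60]
expected = [100, 100, 100, 100]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # 70.0
```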
STATISTICAL INFERENCE METHODS

• Include parametric and non-parametric statistical
techniques.
• Parametric inference entails making general conclusions about
quantitative population characteristics or parameters
such as the mean (μ), difference of means (μ1 – μ2), ratio
of variances (σ1²/σ2²), correlation coefficient, etc.
• Non-parametric techniques are inference methods for
qualitative research.
Parametric statistical techniques
• Parametric statistical inference methods aim at
analyzing deductively designed research studies by
testing hypotheses
• Hypothesis testing emphasizes making general
conclusions about population characteristics
on the basis of sample information
• Example:
A researcher can have a hypothesis concerning a certain
variable: for example, the mean investment per
investor in the whole community is X shillings. A test
of hypothesis will be done to justify or reject the
hypothesis.
Hypothesis Testing
• An exercise to establish whether certain assertions made
at the beginning of the study are true or not.
• A pair of opposing hypotheses is formulated: the null
hypothesis and the alternative hypothesis.
• A hypothesis can be shown to be either true or false with a
certain degree of probability.
• The null hypothesis is always the one that suggests no
significant difference exists in the variables being
compared or related.
Example on testing of hypothesis
• A researcher investigating television viewing begins with the
hypothesis that "the average number of hours spent per week
viewing TV by upper social class Dar residents is 22 hours".
A study with a random sample of 50 residents came up with
a mean viewing time of 19.5 hours and a standard deviation of
3.75 hours. Test the hypothesis that the average number of hours
spent per week is 22 hours.
• Solution
Setting hypothesis:
Null hypothesis: The mean viewing time does not differ from 22 hrs
Symbolically: Ho: μ = 22 hrs
Alternative hypothesis: The mean viewing time differs from 22 hrs
Symbolically: H1: μ ≠ 22 hrs
From the sample of 50 residents we see that:
Mean viewing time (x̄) = 19.5 hrs
Standard deviation (s) = 3.75 hrs

Computing the standard normal deviate (z) we have:
Z = (x̄ – μ)/(s/√n)
= (19.5 – 22)/(3.75/√50) = -4.71

We compare this value with the critical Z value for the 95%
confidence level (i.e. 5% level of significance or α = 0.05), which is
Zc = ±1.96.

From this we can conclude that:
since the calculated Z value falls in the rejection region, we reject
the null hypothesis at the 5% level of significance.
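The z computation above can be reproduced as follows:

```python
# A sketch reproducing the one-sample z test from the TV viewing
# example: sample of 50, mean 19.5 hrs, s = 3.75 hrs, H0 mean 22 hrs.
import math

n, sample_mean, s, mu0 = 50, 19.5, 3.75, 22

z = (sample_mean - mu0) / (s / math.sqrt(n))
print(round(z, 2))  # -4.71
```

Since |z| = 4.71 exceeds the critical value 1.96, the sketch agrees with the conclusion above.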
