
Analyzing the Data Part 1

Analyzing Quantitative Data


• Quantitative analysis techniques such as tables, graphs and
statistics help us to explore, present, describe and examine
relationships and trends within our data.
• Quantitative data refer to all numerical primary and secondary
data and can range from simple counts, such as the frequency of
occurrences, to more complex data such as test scores, prices or
rental costs.
• To be useful these data need to be analysed and interpreted.
Quantitative analysis techniques assist you in this process.
Preparing, entering and checking data
• number of cases of data, that is, the sample size;
• type or types of data (scale of measurement);
• data layout and format required by the analysis
software;
• impact of data coding on subsequent analyses (for
different types of data);
• process of entering (or inputting) data;
• need to weight cases;
• process of checking the data for errors.
Types of data
• Categorical data refer to data whose values
cannot be measured numerically but can be
either classified into sets (categories)
according to the characteristics that identify or
describe the variable or placed in rank order.
• Numerical data are those whose values are
measured or counted numerically as
quantities.
Categorical data
They can be further subdivided into
descriptive and ranked.
• Descriptive (or nominal) data are data for which it is
impossible to define the category numerically
or to rank it.
• Ranked (or ordinal ) data are a more
precise form of categorical data.
Numerical data
• Interval data allow us to state the difference or ‘interval’
between any two data values for a particular variable.
• Ratio data also allow us to calculate the relative difference,
or ratio, between any two data values for a variable.
• Continuous data are those whose values can
theoretically take any value provided that
researchers can measure them accurately enough.
• Discrete data can be measured precisely, each case taking
one of a finite number of values, such as a whole-number count.
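
As a minimal sketch of how these types can be declared in analysis software (using Python's pandas library and invented example data), nominal and ordinal variables can be distinguished from numerical ones as follows:

import pandas as pd

# Invented example data illustrating the main data types
df = pd.DataFrame({
    "sector": ["retail", "finance", "retail"],   # descriptive (nominal)
    "satisfaction": ["low", "high", "medium"],   # ranked (ordinal)
    "temperature_c": [18.5, 21.0, 19.2],         # interval (continuous)
    "employees": [12, 250, 40],                  # ratio (discrete)
})

# Declare the categorical scales so the software treats them correctly
df["sector"] = df["sector"].astype("category")
df["satisfaction"] = pd.Categorical(
    df["satisfaction"], categories=["low", "medium", "high"], ordered=True
)

print(df.dtypes)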
Data layout
• Some primary data collection methods automatically
enter and save data to a computer file at the time of
collection, normally using predefined codes.
• These data can subsequently be exported in a range
of formats to ensure they are compatible with
different analysis software; Google Sheets and
SurveyMonkey are examples of such tools.
• For other data collection methods, you will have to
prepare and enter your data for computer analysis.
Data Layout
• Virtually all analysis software will accept data
entered in table format, which is called a data matrix.
• The multiple-response method of coding uses the
same number of variables as the maximum
number of different responses from any one case.
• The multiple-dichotomy method of coding uses a
separate variable for each different answer.
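
As an illustrative sketch (invented data, pandas assumed), the two coding methods for a multiple-response question can be laid out in a data matrix like this:

import pandas as pd

# One row per case: answers to "which services do you use?" (invented data)
raw = [
    {"case": 1, "services": ["email", "chat"]},
    {"case": 2, "services": ["chat"]},
    {"case": 3, "services": ["email", "phone", "chat"]},
]

# Multiple-response coding: as many variables as the maximum
# number of different responses from any one case
max_answers = max(len(r["services"]) for r in raw)
multi_response = pd.DataFrame(
    [r["services"] + [None] * (max_answers - len(r["services"])) for r in raw],
    columns=[f"response_{i + 1}" for i in range(max_answers)],
)

# Multiple-dichotomy coding: one 0/1 variable per possible answer
multi_dichotomy = pd.DataFrame(
    [{s: int(s in r["services"]) for s in ["email", "chat", "phone"]} for r in raw]
)

print(multi_response)
print(multi_dichotomy)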
Coding
• Actual numbers are often used as codes for
numerical data, even though this level of
precision may not be required.
• Once your data are recorded in a matrix,
analysis software can be used to group or
combine data to form additional variables
with less detailed categories.
• This process is referred to as re-coding.
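
A minimal example of re-coding (invented ages, pandas assumed): precise values are grouped into less detailed categories to form an additional variable.

import pandas as pd

# Invented ages recorded at full precision in the data matrix
ages = pd.Series([19, 23, 31, 38, 44, 57, 62])

# Re-code into broader categories as an additional variable
age_group = pd.cut(ages, bins=[15, 24, 44, 64], labels=["16-24", "25-44", "45-64"])
print(age_group)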
Coding
• Existing coding schemes can be used for many
variables. E.g. industrial classification, occupation,
social class and socioeconomic classification.
• Coding at data collection usually occurs when there
is a limited range of well-established categories into
which the data can be placed.
• Coding after data collection is necessary when you
are unclear as to the likely responses or there are a
large number of possible responses in the coding
scheme.
Coding for missing data
• Statistical analysis software often reserves a special code for
missing data
• Four main reasons for missing data are identified by De Vaus
(2014):
– The data were not required from the respondent, perhaps because of a
skip generated by a filter question in a survey.
– The respondent refused to answer the question (a non-response).
– The respondent did not know the answer or did not have an opinion.
Sometimes this is treated as implying an answer; on other occasions it is
treated as missing data.
– The respondent may have missed a question by mistake, or the
respondent’s answer may be unclear.
• In addition, leaving part of a question in a survey blank may itself imply
an answer; in such cases the data are not classified as missing.
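
A brief sketch of reserving missing-data codes at data entry (invented codes and file contents, pandas assumed):

import io
import pandas as pd

# Invented CSV in which -99 means "refused" and -98 means "not required"
csv = io.StringIO("respondent,income\n1,34000\n2,-99\n3,52000\n4,-98\n")

# Tell the software which reserved codes stand for missing data
df = pd.read_csv(csv, na_values=[-99, -98])
print(df["income"].isna().sum(), "cases coded as missing")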
Entering and saving data
• If software is used to collect data, or if secondary data already
exist in electronic form, there is no need to enter and save the
data manually.
• However, other data will need to be entered and saved on the
computer.
• Although some data analysis software contains algorithms that
check the data for obvious errors as they are entered, it is essential
for researchers to take considerable care to ensure that the data
are entered correctly and to save the file regularly.
• More sophisticated analysis software allows you to attach
individual labels to each variable and to the codes associated
with each of them.
Checking for errors
• There will be errors no matter how carefully
researchers code and subsequently enter data.
• The main methods to check data for errors are
as follows:
– Look for illegitimate codes.
– Look for illogical relationships.
– Check that rules in filter questions are followed.
• For each possible error, researchers need to
discover whether it occurred at coding or data
entry and then correct it.
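
As an illustrative check for illegitimate codes (invented data and codes, pandas assumed):

import pandas as pd

# Invented coded data: the variable "gender" should only contain codes 1 and 2
df = pd.DataFrame({"case": [1, 2, 3, 4], "gender": [1, 2, 7, 2]})

# Look for illegitimate codes
legitimate = {1, 2}
errors = df[~df["gender"].isin(legitimate)]
print(errors)  # case 3 holds code 7: trace back to coding or data entry and correct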
Weighting cases
To weight the cases:
1. Calculate the percentage of the population responding for
each stratum of stratified random sampling.
2. Establish which stratum had the highest percentage of the
population responding.
3. Calculate the weight for each stratum using the following
formula:
weight = (highest proportion of the population responding for any
stratum) ÷ (proportion of the population responding in the stratum
for which the weight is being calculated)
4. Apply the appropriate weight to each case.
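
A minimal worked example of this calculation (invented response proportions for three strata):

# Invented proportions of the population responding in each stratum
response_rate = {"stratum_A": 0.50, "stratum_B": 0.25, "stratum_C": 0.40}

# Step 2: highest proportion responding for any stratum
highest = max(response_rate.values())

# Step 3: weight = highest proportion / stratum's own proportion
weights = {stratum: highest / rate for stratum, rate in response_rate.items()}
print(weights)  # {'stratum_A': 1.0, 'stratum_B': 2.0, 'stratum_C': 1.25}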
Exploring and presenting data
• An Exploratory Data Analysis (EDA) approach is useful in these initial
stages. It gives researchers the flexibility to introduce previously
unplanned analyses in response to new findings, and it emphasizes the
use of graphs to explore and understand the data.
• Visual display to illustrate one or more relationships among numbers
– Charts
– Bar graphs or bar charts
– Pie charts
• Once the individual variables have been explored, researchers can
begin to compare variables and examine interdependences between them:
– comparing intersections between the data values for two or more variables;
– comparing cumulative totals for data values and variables;
– looking for interdependences between cases for variables.
Exploring and presenting individual variables

• To show specific amounts, use a table.
• To show the highest and lowest values, use a
bar chart (bar graph), a histogram or a pictogram.
• To show a trend, use a line graph.
• To show proportions or percentages, use a pie chart.
• To show the distribution of values, use a frequency
polygon, a histogram or a box plot.
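
A short sketch of two of these displays (invented frequencies, matplotlib assumed):

import matplotlib.pyplot as plt

# Invented frequencies for a single categorical variable
categories = ["North", "South", "East", "West"]
counts = [24, 17, 31, 9]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(categories, counts)          # bar chart: highest and lowest values
ax1.set_title("Bar chart")
ax2.pie(counts, labels=categories)   # pie chart: proportions
ax2.set_title("Pie chart")
plt.tight_layout()
plt.show()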
Comparing variables
• To show interdependence and specific amounts, use a
contingency table or cross-tabulation.
• To compare the highest and lowest values, use a
multiple bar graph or a compound bar graph.
• To compare proportions or percentages, use a
percentage component bar graph.
• To compare trends so that the intersections are clear,
use a multiple line graph.
• To compare cumulative totals, use a stacked bar graph.
Comparing variables
• To compare proportions and cumulative
totals, use comparative proportional pie charts.
• To compare the distribution of values, use
multiple box plots.
• To show the interdependence between cases
for variables, use a scatter plot.
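
As a minimal sketch of a contingency table and a scatter plot (invented data, pandas and matplotlib assumed):

import pandas as pd
import matplotlib.pyplot as plt

# Invented data for two categorical variables
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North", "South"],
    "response": ["yes", "no", "yes", "yes", "yes", "no"],
})
print(pd.crosstab(df["region"], df["response"]))  # cross-tabulation

# Scatter plot to show interdependence between two numerical variables
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]   # invented values
plt.scatter(x, y)
plt.show()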
Describing data using statistics
• Descriptive statistics enable researchers to
describe variables numerically.
• Statistics to describe a variable focus on two
aspects:
– the central tendency;
– the dispersion.
Describing the central tendency
• When describing data for both samples and
populations quantitatively, it is usual to
provide some general impression of values
that could be seen as common, middling or
average. These are termed measures of
central tendency.
Describing the central tendency
• To represent the value that occurs most
frequently
– The mode is the value that occurs most frequently.
– For descriptive data, the mode is the only measure
of central tendency that can be interpreted
sensibly.
– Data are grouped into suitable categories and the
most frequently occurring or modal group is
quoted.
Describing the central tendency
• To represent the middle value
– The median is found by ranking all the values in
ascending order and locating the mid-point of the
distribution.
– For variables that have an even number of data values the
median will occur halfway between the two middle data
values.
• To include all data values
– The mean includes all data values in its calculation.
However, it is usually only possible to calculate a
meaningful mean using numerical data.
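
A minimal illustration using Python's statistics module (invented values):

import statistics

values = [1, 2, 3, 4, 7, 7, 7, 9]   # invented numerical data

print("mode:", statistics.mode(values))      # most frequently occurring value
print("median:", statistics.median(values))  # mid-point: (4 + 7) / 2 = 5.5 here
print("mean:", statistics.mean(values))      # uses all data values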
Describing the dispersion
• Two of the most frequently used ways of
describing the dispersion are the:
– difference within the middle 50 percent of values;
– extent to which values differ from the mean
(standard deviation).
• Although these dispersion measures are
suitable only for numerical data, most statistical
analysis software will also calculate them for
categorical data if numerical codes are used.
Describing the dispersion
• To state the difference between values
• The range is calculated as the difference between the lowest and the
highest values.
– The median divides the range into two.
– The range can be further divided into four equal sections called quartiles.
– The lower quartile is the value below which a quarter of your data values will
fall; the upper quartile is the value above which a quarter of your data values
will fall.
– The remaining half of data values will fall between the lower and upper
quartiles. The difference between the upper and lower quartiles is the
inter-quartile range.
• The range can similarly be subdivided into percentiles (hundredths)
and deciles (tenths).
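
A short sketch of these dispersion measures (invented values, NumPy assumed):

import numpy as np

values = np.array([12, 15, 17, 21, 24, 28, 31, 35, 40, 44])  # invented data

print("range:", values.max() - values.min())
q1, q3 = np.percentile(values, [25, 75])      # lower and upper quartiles
print("inter-quartile range:", q3 - q1)
print("deciles:", np.percentile(values, np.arange(10, 100, 10)))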
Describing the dispersion
• To describe and compare the extent by which values differ
from the mean
– The standard deviation is used to describe the extent of spread of
numerical data.
– The coefficient of variation is calculated by dividing the standard
deviation by the mean and then multiplying the answer by 100. The values
of this statistic can then be compared across variables.
– Index numbers compare each data value against a base value that is
normally given the value of 100, differences being calculated relative
to this value. An index number is calculated as:
(data value ÷ base period data value) × 100.
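
A minimal worked example of both statistics (invented yearly prices, NumPy assumed):

import numpy as np

prices = np.array([100.0, 104.0, 110.5, 121.0])  # invented yearly prices

# Coefficient of variation: (standard deviation / mean) x 100
cv = prices.std(ddof=1) / prices.mean() * 100
print("coefficient of variation:", round(cv, 1))

# Index numbers: each value relative to the base value, which is set to 100
index = prices / prices[0] * 100
print("index numbers:", index)  # [100.  104.  110.5 121. ]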
Examining relationships, differences and
trends using statistics
• In statistical analysis, the relationship between one variable
and another can be examined by testing the likelihood of the
relationship (or one more extreme) occurring by chance alone,
if there really was no difference in the population from which
the sample was drawn.
• This process is known as significance or hypothesis
testing.
• The data that have been collected are compared with
what researchers would theoretically expect to happen.
Testing for normality
• Histograms, box plots and frequency polygons can be used to
assess visually whether the data values for a particular
numerical variable are clustered around the mean in a
symmetrical pattern, and so normally distributed.
• For normally distributed data, the values of the mean, median
and mode are also likely to be the same.
• Statistics can also be used to establish whether the distribution
as a whole for a variable differs significantly from a comparable
normal distribution.
• This can be done in statistical software such as IBM SPSS Statistics
using the Kolmogorov–Smirnov test and the Shapiro–Wilk test.
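
A brief sketch of both tests using SciPy (simulated data; in practice you would use your own variable):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=200)  # simulated numerical data

# Shapiro-Wilk test: a low p-value suggests the data are not normal
stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk: statistic={stat:.3f}, p={p:.3f}")

# Kolmogorov-Smirnov test against a normal distribution with the
# sample's own mean and standard deviation
stat, p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
print(f"Kolmogorov-Smirnov: statistic={stat:.3f}, p={p:.3f}")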
Testing for significant relationships and
differences
• Testing the probability of a pattern or hypothesis
such as a relationship between variables
occurring by chance alone is known as
significance testing.
• With most statistical analysis software,
significance testing consists of a test statistic, the
degrees of freedom (df) and, based on these, the
probability (p-value) of your test result or one
more extreme occurring by chance alone.
Type I and Type II errors
• Inevitably, errors can occur when making inferences from
samples.
• Statisticians refer to these as Type I and Type II errors.
• Type I errors might involve researchers concluding that two
variables are related when they are not, or incorrectly concluding
that a sample statistic exceeds the value that would be expected
by chance alone.
• The term ‘statistical significance’ refers to the probability of
making a Type I error.
• A Type II error involves the opposite occurring. This means that
Type II errors might involve researchers in concluding that two
variables are not related when they are, or that a sample statistic
does not exceed the value that would be expected by chance
alone.
Testing for significant relationships and
differences
• To test whether two variables are independent or
associated, use a chi-square test or phi.
• To test whether two groups are different:
– Ranked data can be tested with the Kolmogorov–Smirnov test.
– Numerical data can be tested with an independent groups
t-test or a paired t-test.
• To test whether three or more groups are different,
use one-way analysis of variance (one-way ANOVA).
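
Illustrative versions of these three tests using SciPy (all data invented):

import numpy as np
from scipy import stats

# Chi-square test of independence on an invented contingency table
table = np.array([[30, 10], [20, 25]])
chi2, p, df, _ = stats.chi2_contingency(table)
print(f"chi-square: chi2={chi2:.2f}, df={df}, p={p:.3f}")

# Independent groups t-test on two invented numerical samples
a = [23, 25, 28, 30, 27]
b = [31, 35, 29, 38, 34]
t, p = stats.ttest_ind(a, b)
print(f"t-test: t={t:.2f}, p={p:.3f}")

# One-way ANOVA across three invented groups
c = [40, 42, 39, 45, 41]
f, p = stats.f_oneway(a, b, c)
print(f"ANOVA: F={f:.2f}, p={p:.3f}")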
Assessing the strength of relationship
• To assess the strength of relationship between pairs of
variables
– A correlation coefficient enables you to quantify the strength of
the linear relationship between two ranked or numerical
variables.
– If both the variables contain numerical data, Pearson’s product
moment correlation coefficient (PMCC) can be used to assess
the strength of relationship.
– If both variables contain ranked data, a rank correlation
coefficient is used instead. The two used most widely in business
and management research are Spearman’s rank correlation coefficient
(Spearman’s ρ, the Greek letter rho) and Kendall’s rank
correlation coefficient (Kendall’s τ, the Greek letter tau).
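
A minimal sketch of computing these coefficients with SciPy (invented paired values):

from scipy import stats

x = [2, 4, 5, 7, 9, 11]   # invented data values
y = [3, 5, 4, 8, 10, 12]

r, p_r = stats.pearsonr(x, y)         # PMCC, for numerical data
rho, p_rho = stats.spearmanr(x, y)    # Spearman's rank correlation
tau, p_tau = stats.kendalltau(x, y)   # Kendall's rank correlation
print(f"r={r:.3f}, rho={rho:.3f}, tau={tau:.3f}")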
Assessing the strength of relationship
• To assess the strength of a cause-and-effect
relationship between dependent and independent
variables
– The coefficient of determination enables you to assess the
strength of relationship between a numerical dependent
variable and one numerical independent variable.
– The coefficient of multiple determination enables you to assess
the strength of relationship between a numerical
dependent variable and two or more independent
variables.
Assessing the strength of relationship

• To predict the value of a variable from one or more other variables.
– Regression analysis can also be used to predict the
values of a dependent variable given the values of
one or more independent variables by calculating
a regression equation.
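
As a minimal sketch (invented data, NumPy assumed), a simple regression equation, its coefficient of determination and a prediction can be computed as follows:

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # invented independent variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])  # invented dependent variable

# Regression equation y = a + b*x fitted by least squares
b, a = np.polyfit(x, y, 1)
print(f"y = {a:.2f} + {b:.2f}x")

# Coefficient of determination (r squared)
y_hat = a + b * x
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
print("r squared:", 1 - ss_res / ss_tot)

# Predict the dependent variable for a new value of x
print("prediction at x=7:", a + b * 7)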
Examining trends
• A line graph can be drawn to obtain a visual
representation of the trend.
• Three of the more common uses of such analyses
are:
– to explore the trend or relative change for a single
variable over time;
– to compare trends or the relative change for variables
measured in different units or of different magnitudes;
– to determine the long-term trend and forecast future
values for a variable.
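
A short sketch of such a trend line graph (invented time series, matplotlib assumed):

import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]   # invented time series
sales = [120, 132, 128, 145, 160]

plt.plot(years, sales, marker="o")        # line graph of the trend
plt.xlabel("Year")
plt.ylabel("Sales")
plt.title("Trend over time")
plt.show()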
Analyzing the Data Part 1
Analyzing Qualitative Data
• Qualitative researchers need to make sense of the
subjective and socially constructed meanings
expressed by those who take part in research about
the phenomenon being studied.
• Since meanings in qualitative research depend on
social interaction, qualitative data are likely to be more
varied, elastic and complex than quantitative data.
• The quality of qualitative research depends on the
interaction between data collection and data analysis
to allow meanings to be explored and clarified.
Deciding on your approach to analysis
• Using a deductive approach
– Theoretical propositions are used as a means to devise a framework to help
researchers to organise and direct the data analysis.
– To devise a theoretical or descriptive framework, researchers need to
identify the main variables, components, themes and issues in the research
project and the predicted or presumed relationships between them.
• Using an inductive approach
– The alternative to a deductive approach is to start to collect data and then
explore them to see which themes or issues to follow up and concentrate
on.
– An inductive approach may be a difficult strategy to follow and may not
lead to success for someone who is an inexperienced researcher.
The interactive nature of the process
• Data collection, data analysis and the development and
verification of propositions are very much an interrelated
and interactive set of processes in qualitative research.
• Analysis is undertaken during the collection of data as
well as after it.
• This analysis helps to shape the direction of data
collection.
• The interactive nature of data collection and analysis
allows researchers to recognize important themes, patterns and
relationships as they collect data.
Preparing the data for analysis
• Transcribing qualitative data
– Interviews are often audio-recorded and
subsequently transcribed.
• Using electronic textual data including scanned
documents
– For some forms of textual data, the data may
already be in electronic format.
• Both need time to organize; however,
transcribing qualitative data takes longer.
Aids to help the analysis
• Ways of recording information and developing
reflective ideas to supplement written-up notes
or transcripts and categorised data include:
– interim or progress summaries;
– transcript summaries;
– document summaries;
– self-memos;
– a research notebook;
– a reflective diary or journal.
Aids to help the analysis
• Interim or progress summaries - An interim summary records the progress of the research
to date, the results of interviews or observations, and the findings from secondary data.
• Transcript summaries - A transcript summary compresses long statements into briefer
ones in which the main sense of what has been said or observed is rephrased in a few
words.
• Document summaries - A document summary is used to summarise and list the
document’s key points for research and to describe the purpose of the document, how it
relates to the researcher’s work and why it is significant.
• Self-memos - Self-memos allow researchers to record ideas about any aspect of their
research as they think of them.
• Research notebook - The purpose of a research notebook is similar to that of self-memos.
• Reflective diary or journal - A reflective diary or journal is devoted to reflections about
the experiences of undertaking research, what researchers have learnt from these
experiences, how they will seek to apply this learning as their research progresses, and
what they will need to do to develop their competence to further their research.
Thematic Analysis
• Thematic Analysis can be referred to as a ‘foundational method
for qualitative analysis’.
• Thematic Analysis can be used to help researchers:
1. comprehend often large and disparate amounts of qualitative data;
2. integrate related data drawn from different transcripts and notes;
3. identify key themes or patterns from a data set for further
exploration;
4. produce a thematic description of these data; and/or
5. develop and test explanations and theories based on apparent
thematic patterns or relationships;
6. draw and verify conclusions.
Procedure of Thematic Analysis
• Becoming familiar with the data - Familiarization with the data involves a process
of immersion that continues throughout the research project.
• Coding the data - Coding is used to categorize data with similar meanings. Coding
involves labeling each unit of data within a data item with a code that symbolizes
or summarizes that extract’s meaning.
• Searching for themes and recognizing relationships - Searching for themes
involves researchers making judgements about their data while immersing
themselves in them.
• Refining themes and testing propositions - The themes that researchers devise
need to form a coherent set, so that they provide a well-structured
analytical framework for pursuing the analysis.
• Evaluation of the analysis - Thematic Analysis offers a systematic
approach to qualitative data analysis that is accessible and flexible.
Template Analysis
• Template Analysis is a type of Thematic
Analysis, with a few key differences.
• In Template Analysis, a researcher only codes a
proportion of the data items before developing
an initial list of codes and themes, known as a
coding template.
• The coding template is the hierarchical list of
codes and themes, which is used as the central
analytical tool in Template Analysis.
Procedure of Template Analysis
• The initial procedure of Template Analysis reflects that of Thematic
Analysis: familiarization with the data is the same.
• The initial transcript or transcripts will be coded.
• The development of an initial coding template will be an exploratory
process involving the arrangement and rearrangement of the codes
researchers have used until they devise themes that appear to
represent key ideas and relationships in the data.
• As data collection proceeds, your template will be subject to
modification.
• The template may continue to be revised until all of the data
collected have been coded and analysed carefully.
• Evaluation - Template Analysis adopts a higher level of structure
earlier on than Thematic Analysis through the development of
an initial coding template.
Explanation Building and Testing
• Analytic Induction - Analytic Induction uses an incremental approach to build and
test an explanation or theory. Analytic Induction seeks to develop and test an
explanation by intensively examining the phenomenon being explored through the
successive selection of purposive cases.
• Deductive Explanation Building - Explanation Building involves an incremental
attempt to build an explanation by testing and refining a predetermined theoretical
proposition.
• Pattern Matching - Pattern Matching involves predicting a pattern of outcomes
based on theoretical propositions to explain what researchers expect to find from
analyzing their data.
Grounded Theory Method
• Grounded Theory Method is part of a wider
methodological approach.
• Grounded Theory is an emergent and systematic
research strategy. It avoids using a priori codes
derived from existing theory and commences
inductively, by developing codes from the data.
• The development of an emergent idea or theory
from these data informs the direction of a
Grounded Theory study.
Narrative Analysis
• Narrative Analysis is a collection of analytical approaches to
analyse different aspects of narrative. These may be
combined in practice, depending on the research question
and purpose, and the nature of the data.
• Thematic Narrative Analysis
– This approach to Narrative Analysis focuses on ‘what’ the narrative
is about rather than ‘how’ it is constructed. Thematic Narrative
Analysis can be used to analyse an individual narrative or multiple,
related narratives.
• Structural Narrative Analysis
– Structural Narrative Analysis analyses the way in which a narrative is
constructed.
Discourse Analysis
• In Discourse Analysis, the emphasis is not on studying the
way in which language is used for its own sake.
• In this more specific sense, ‘discourse’ describes how
language is used to shape this meaning-making process, to
construct social reality.
• A discourse is therefore not just seen as neutrally reflecting
social practice or relations but as constructing these.
• Discourse Analysis explores how discourses construct or
constitute social reality and social relations through
creating meanings and perceptions.
Content Analysis and quantifying qualitative
data
• Content Analysis is an analytical technique that
codes and categorises qualitative data in order to
analyse them quantitatively.
• Content Analysis has a long history that illustrates
its use as an approach spanning qualitative and
quantitative methods.
• ‘Content analysis is a research technique for the
objective, systematic and quantitative description
of the manifest content of communication.’
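
As an illustrative sketch of quantifying coded qualitative data (invented codes, standard-library Counter):

from collections import Counter

# Invented codes assigned to units of qualitative data (e.g. interview extracts)
coded_extracts = [
    "price", "quality", "price", "service", "quality",
    "price", "delivery", "service", "price",
]

# Quantify the qualitative codes by counting category frequencies
frequencies = Counter(coded_extracts)
for code, n in frequencies.most_common():
    print(f"{code}: {n}")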
Data Display and Analysis
• The process of analysis consists of three concurrent
sub-processes:
• data condensation - includes summarising and
simplifying the data collected and/or selectively
focusing on some parts of these data;
• data display - involves organising and assembling the
data into summary diagrammatic or visual displays;
• drawing and verifying conclusions - through the use of
data displays.
Using CAQDAS
• CAQDAS (Computer Assisted Qualitative Data
Analysis Software, sometimes abbreviated to
QDAS) refers to programs containing a range
of tools to facilitate the analysis of qualitative
data.
• When used systematically, it can aid continuity
and increase both transparency and
methodological rigour.
Functions of CAQDAS programs
• Structure of work
• Closeness to data and interactivity
• Explore the data
• Code and retrieve
• Project management and data organisation
• Searching and interrogating
• Writing memos, comments, notes, etc.
• Output
References
• Saunders, M., Lewis, P. and Thornhill, A., Research Methods
for Business Students (7th edition), Chapters 12 and 13.
