Quantitative Analysis Lectures Notes-2

Quantitative Analysis
TOPIC 0:
INTRODUCTION
Data analysis is perhaps the most important component of research. Weak analysis produces
inaccurate results that not only hamper the authenticity of the research but also make the
findings unusable. It's imperative to choose your data analysis methods carefully to ensure
that your findings are reliable and valid.
Section One: Definition and Importance
Data analysis is the process of evaluating data using analytical and statistical tools to discover
useful information and to aid decision making by organisations or policy makers. Data
analysis is part of a larger process of deriving business intelligence. This process includes
one or more of the following steps:
• Posing questions: a question must be asked in the problem domain. Each study seeks to
answer a particular question. For example: does reward determine employees'
productivity?
• Defining objectives: any study must have a set of clearly defined objectives. Many of the
decisions made in the rest of the process depend on how clearly the objectives of the
study have been stated. For example: to determine the relationship between rewards and
employees' productivity.
• Hypothesis formulation: before any analysis is conducted, a postulate or conjecture is
made about what the relationship among the variables may be. Hypotheses are testable
statements about the relationship among the variables of interest. For example: there is no
significant relationship between rewards and employees' productivity.
• Data collection: data take two forms depending on their source, namely
primary and secondary data. Primary data are also called field data or first-hand data
because they are collected and compiled by the researcher using tools like questionnaires
or focus groups, whereas secondary data are obtained from other sources. This step is
very important since the quality and reliability of the findings from data analysis depend
first on the quality and reliability of the data collection procedure. When data are
collected using surveys, a questionnaire to be presented to the subjects is needed. The
questions should be properly designed for the statistical method being used.
• Data wrangling or coding: raw data may be collected in different formats. The
collected data must be cleaned and converted so that analysis tools can import them.
• Data analysis: this is the process by which sense is made of the data gathered in research
through the proper application of statistical methods, so that the question posed earlier
can be answered.
• Drawing conclusions and making predictions: this is the step where, after sufficient
analysis, conclusions can be drawn from the data and appropriate predictions can be made.
These conclusions and predictions may then be summarized in a report delivered to end
users.
The main purpose of data analysis is to find out what the data are telling us, that is, to
draw conclusions about phenomena and to make predictions.
Section Two: Types of Data
Data analysis makes great use of economic data to describe phenomena, that is, to test a
theory or estimate a relationship. In this section we look at the different types of data
and their uses in econometrics.
There are three major types of economic data sets in econometrics: cross-sectional, time-
series, and panel. They are distinguished by the dependence structure across observations.
2.1 Cross-sectional data
Cross-sectional data, or a cross section of a study population, in statistics and econometrics is
a type of data collected by observing many subjects (such as individuals, firms, countries, or
regions) at the same point in time, or without regard to differences in time. Analysis of
cross-sectional data usually consists of comparing the differences among the subjects. A
cross-sectional variable is often subscripted with the letter i.
Cross-sectional observations are usually assumed to be obtained by random sampling, so that
they are mutually independent. The ordering of observations in cross-sectional data does not
matter for econometric analysis. If the data are not obtained from a random sample, we have a
sample selection problem.
Most often cross-section data are data for micro units such as households, individuals, firms,
companies, etc., but there may also be cross-sectional data relating to aggregate units such as
countries, regions, etc. Data on such aggregate units are, of course, not obtained by random
sampling.
Cross-section data show spatial variation, that is, variation across units. Cross-sectional data
can be used in cross-sectional regression, which is regression analysis of cross-sectional data. For
example, if we want to measure current obesity levels in a population, we could draw a
sample of 1,000 people randomly from that population, measure their weight and height, and
calculate what percentage of that sample is categorized as obese. This cross-sectional sample
provides us with a snapshot of that population, at that one point in time. Note that we do not
know based on one cross-sectional sample if obesity is increasing or decreasing; we can only
describe the current proportion.
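The obesity example can be sketched in a few lines of Python. The five records and the BMI cut-off of 30 are assumptions made purely for illustration (the notes do not state how "obese" is defined); the point is only that a cross section yields one proportion for one moment in time.

```python
# Hypothetical cross-section: one (weight_kg, height_m) record per person,
# all measured at the same point in time.
sample = [
    (95.0, 1.70),
    (60.0, 1.65),
    (82.0, 1.80),
    (110.0, 1.75),
    (54.0, 1.60),
]

OBESITY_BMI_THRESHOLD = 30.0  # assumed cut-off, not stated in the notes


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2


def share_obese(records, threshold=OBESITY_BMI_THRESHOLD):
    """Percentage of records whose BMI is at or above the threshold."""
    obese = sum(1 for w, h in records if bmi(w, h) >= threshold)
    return 100.0 * obese / len(records)


print(f"{share_obese(sample):.1f}% of the sample is classified as obese")
```

Because every record carries the same (implicit) date, nothing in this data set can say whether the proportion is rising or falling.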
2.2 Time series data
In time series data, a variable is observed over time. The same small-scale or
aggregate entity is observed at various points in time. A time-series variable is often
subscripted with the letter t, that is, it is indexed by time. This type of data is characterized
by serial dependence, so the random sampling assumption is inappropriate. Most aggregate
economic data is only available at a low frequency (annual, quarterly or perhaps monthly) so
the sample size is typically much smaller than in cross-section studies. The exception is
financial data where data are available at a high frequency (weekly, daily, hourly, or by
transaction) so sample sizes can be quite large.
Most often time-series data are macro data or macro-type data, for example time series for
macro-economic variables from the National Accounts. But micro-data may also occur as
time series, for example a time series for a particular household or for a particular
firm. Time-series data show temporal variation, that is, variation over periods
(years, months, weeks, seconds ...).
2.3 Panel data
Panel data combines elements of cross-section and time-series. These data sets consist of a set
of individuals (typically persons, households, or corporations) surveyed repeatedly over time.
The common modelling assumption is that the individuals are mutually independent of one
another, but a given individual's observations are mutually dependent. This is a modified
random sampling environment. Thus, the ordering in the cross section of a panel data set does
not matter, but the ordering in the time dimension matters a great deal. If we do not take
the time dimension of panel data into account, we say that we are using pooled cross-sectional data.
Panel data, also called time-series cross-sectional (TSCS) data, follow multiple
subjects and show how they change over the course of time. Panel analysis uses panel data to
examine changes in variables over time and differences in variables between subjects. The
variables in a panel data set can vary both across the spatial dimension and over the time
dimension, but some of them may vary along one dimension only. Panel data show both
spatial and temporal variation.
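As a minimal sketch of how the three data types relate, a panel can be stored as values indexed by (i, t); fixing t gives a cross section and fixing i gives a time series. The household names and income figures below are hypothetical.

```python
# Hypothetical panel: income observed for 3 households (i) over 2 years (t).
panel = {
    ("household_A", 2020): 410.0,
    ("household_A", 2021): 430.0,
    ("household_B", 2020): 250.0,
    ("household_B", 2021): 245.0,
    ("household_C", 2020): 900.0,
    ("household_C", 2021): 960.0,
}


def cross_section(data, year):
    """All subjects at one point in time (spatial variation only)."""
    return {i: v for (i, t), v in data.items() if t == year}


def time_series(data, subject):
    """One subject followed over time, in chronological order."""
    return [(t, v) for (i, t), v in sorted(data.items()) if i == subject]


print(cross_section(panel, 2020))
print(time_series(panel, "household_B"))
```

Ignoring the year in the keys altogether would correspond to treating the six observations as a pooled cross section.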
Panel data have, over the years, become a gradually more important and more frequently used
data type for analysing economic relationships. This has several explanations: (i) panel data are
a 'richer' data type than (pure) cross-section data and (pure) time-series data; (ii) the
development of data collection and data processing methods; (iii) the development of
computer technology. With panel data, all the coefficients can be estimated simultaneously,
whereas this is impossible with pure cross-sectional or pure time-series data.
Section Three: Types of analysis
Research is a systematic investigation that aims to generate knowledge about a particular
phenomenon. However, the nature of this knowledge varies and reflects your study objectives.
Some study objectives seek to make standardised and systematic comparisons, others seek to
study a phenomenon or situation in detail. These different intentions require different
approaches and methods, which are typically categorised as either quantitative or qualitative.
You have probably already made decisions about using qualitative or quantitative data for
monitoring and evaluation. Perhaps you have had to choose between using a questionnaire or
conducting a focus group discussion in order to gather data for a particular indicator.
3.1 Quantitative research
Quantitative research typically explores specific and clearly defined questions that examine
the relationship between two events, or occurrences, where the second event is a consequence
of the first event. Such a question might be: 'what impact did the programme have on
children's school performance?' To test the causality or link between the programme and
children's school performance, quantitative researchers will seek to maintain a level of control
of the different variables that may influence the relationship between events and recruit
respondents randomly. Quantitative data is often gathered through surveys and questionnaires
that are carefully developed and structured to provide you with numerical data that can be
explored statistically and yield a result that can be generalised to some larger population.
3.2 Qualitative research
Research following a qualitative approach is exploratory and seeks to explain 'how' and
'why' a particular phenomenon, or programme, operates as it does in a particular context. As
such, qualitative research often investigates i) local knowledge and understanding of a given
issue or programme; ii) people's experiences, meanings and relationships and iii) social
processes and contextual factors (e.g., social norms and cultural practices) that marginalise a
group of people or impact a programme. Qualitative data is non-numerical, covering images,
videos, text and people's written or spoken words. Qualitative data is often gathered through
individual interviews and focus group discussions using semi-structured or unstructured topic
guides.
The table below summarises key differences between qualitative and quantitative research
(analysis).
                    Qualitative research                  Quantitative research
Type of knowledge   Subjective                            Objective
Aim                 Exploratory and observational         Generalizable and testing
Characteristics     Flexible                              Fixed and controlled
                    Contextual portrayal                  Independent and dependent variables
                    Dynamic, continuous view of change    Pre and post-measurement of change
Sampling            Purposeful                            Random
Data collection     Semi-structured or unstructured       Structured
Nature of data      Narrative, quotations, descriptions   Numbers, statistics
Value               Uniqueness, particularity             Replication
Analysis            Thematic                              Statistical
Section Four: Methods of Data collection
There are many different methods of collecting data. Depending on the type of research
(qualitative or quantitative), researchers may use one or more of the following methods:
4.1 Methods of collecting quantitative data
Quantitative data is numerical and can be collected in a number of forms. The most common
forms of quantitative data are shown below.
• Units: number of staff that have been trained; number of children enrolled in school for the
first time
• Prices: amount of money spent on a building, or the additional revenue of farmers following
a seed distribution programme
• Proportions/percentages: proportion of the community that has access to a service
• Rates of change: percentage change in average household income over a reporting period
• Ratios: ratio of midwives or traditional birth attendants to families in a region
• Scoring and ranking: scores given out of ten by project participants to rate the quality of
service they have received.
Statistical analysis is used to summarise and describe quantitative data, and graphs or tables
can be used to visualise or present raw data. This section will review the commonly used
methods/sources of quantitative data and the techniques used for recruiting participants.
Quantitative data can be collected using a number of different methods and from a variety of
sources.
1. Surveys and questionnaires use carefully constructed questions, often ranking or scoring
options or using closed-ended questions. A closed-ended question limits respondents to a
specified number of answers. For example, this is the case in multiple-choice questions. Good
quality design is particularly important for quantitative surveys and questionnaires.
2. Biophysical measurements can include variables such as the height and weight of a child.
3. Project records are a useful source of data, for example the number of training events
held and the number of participants attending.
4. Service provider or facility data includes school attendance or health care provider
vaccination records.
5. Service provider or facility assessments are often carried out during the monitoring and
evaluation of our projects.
4.2 Methods for collecting qualitative data
Individual interview
An individual interview is a conversation between two people that has a structure and a
purpose. It is designed to elicit the interviewee's knowledge or perspective on a topic.
Individual interviews, which can include key informant interviews, are useful for exploring an
individual's beliefs, values, understandings, feelings, experiences and perspectives of an
issue. Individual interviews also allow the researcher to probe into a complex issue, learning
more about the contextual factors that govern individual experiences.
Focus group discussions
A focus group discussion is an organised discussion among 6 to 8 people. Focus group
discussions provide participants with a space to discuss a particular topic, in a context where
people are allowed to agree or disagree with each other. Focus group discussions allow you to
explore how a group thinks about an issue, the range of opinions and ideas, and the
inconsistencies and variations that exist in a particular community in terms of beliefs and their
experiences and practices. You should therefore recruit participants purposively, that is,
choose those for whom the issue is relevant. Be clear about the benefits and limitations
of recruiting participants that represent either one population (e.g. school-going girls) or a mix
(e.g. school-going boys and girls), and whether or not they know each other.
Photovoice
Photovoice is a participatory method that enables people to identify, represent and enhance
their community, life circumstances or engagement with a programme through photography
and accompanying written captions. Photovoice involves giving cameras to a group of
participants, enabling them to capture, discuss and share stories they find significant.
Picture story
The picture story method enables children, in a fun and participatory way, to communicate
their perspectives on particular issues through a series of drawings (story telling) they have
made. The story telling can be done either in writing, depending on the child's level of
literacy, or verbally with a researcher. The picture story method is relatively quick and
inexpensive, particularly if the draw-and-write technique is adopted. The picture story method
provides a non-threatening way to explore children's views on a particular issue (e.g. barriers
to girls' education) and to begin to identify what can be done to address any struggles faced
by children.
Qualitative research often focuses on a limited number of respondents who have been
purposefully selected to participate because you believe they have in-depth knowledge of an
issue you know little about, for instance because:
• They have experienced your topic of study first-hand, e.g. workers of an organisation
• They show variation in how they respond to hardship
• They have particular knowledge or expertise regarding the group under study, e.g.
social workers supporting working street children.
You can select a sample of individuals with a particular 'purpose' in mind in different ways,
including random sampling, purposive sampling, cluster sampling, etc.
Section Five: Data Analysis Methods
Data analysis methods depend on the type of data analysis conducted (quantitative or
qualitative).
5.1 Quantitative Data Analysis Methods
After preparing your data, that is, transforming the raw data collected into a usable and
readable format by validating, editing and coding it, the data is ready for analysis. The
two most commonly used quantitative data analysis methods are descriptive statistics and
inferential statistics.
5.1.1 Descriptive statistics
Typically, descriptive statistics (or descriptive analysis) is the first level of analysis. It helps
the researcher summarise the data and find patterns. A few commonly used descriptive
statistics are:
Mean: the numerical average of a set of values.
Median: the midpoint of a set of numerical values.
Percentage: it is used to express how a value or group of respondents within the data relates
to a larger group of respondents.
Frequency: the number of times a value is found.
Range: the difference between the highest and lowest values in a set of values.
Descriptive statistics provide absolute numbers. However, they do not explain the rationale or
reasoning behind those numbers. Before applying descriptive statistics, it's important to think
about which one is best suited for your research question and what you want to show. For
example, a percentage is a good way to show the gender distribution of respondents.
Descriptive statistics are most helpful when the research is limited to the sample and does not
need to be generalized to a larger population. For example, if you are comparing the
percentage of female and male employees in two different organisations, then descriptive
statistics are enough. Since descriptive analysis is mostly used for analysing a single variable, it
is often called univariate analysis.
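The descriptive statistics listed in this subsection can be computed with Python's standard library. The sketch below uses invented survey data (ten respondents' ages and genders) and expresses the range as highest minus lowest value.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical survey data: ages and gender of ten respondents.
ages = [23, 25, 25, 28, 31, 31, 31, 36, 40, 50]
genders = ["F", "M", "F", "F", "M", "F", "M", "F", "F", "M"]

age_mean = mean(ages)                 # numerical average
age_median = median(ages)             # midpoint of the ordered values
age_range = max(ages) - min(ages)     # highest minus lowest value
age_frequency = Counter(ages)         # how many times each age occurs

# Percentage of respondents in each gender category.
gender_share = {g: 100.0 * n / len(genders) for g, n in Counter(genders).items()}

print("mean:", age_mean, "median:", age_median, "range:", age_range)
print("frequency of age 31:", age_frequency[31])
print("gender distribution (%):", gender_share)
```

Each figure summarises one variable at a time, which is why this level of analysis is described as univariate.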
5.1.2 Inferential Statistics
Often, researchers collect data on a sample of their population, then they generalize the results
to the entire population or target group. Inferential statistics are used to generalize results and
make predictions about a larger population.
These are complex analyses that show the relationship between several different variables,
rather than describing a single variable. They are used when the researcher needs to go
beyond absolute values and understand the relations between variables.
A few types of inferential analysis are:
Correlation: This describes the strength and direction of the relationship between two
variables. If a correlation is found, it means that the variables tend to move together. For
example, taller people tend to have a higher weight. Hence, height and weight are correlated
with each other. However, this doesn't necessarily mean that one variable causes the other
(e.g. gaining weight doesn't cause people to grow taller).
Regression: This models how one variable changes with another, so that one can be predicted
from the other. For example, regression can help us estimate someone's weight based on their
height.
Analysis of variance (ANOVA): This is a statistical procedure used to test the degree to
which two or more groups vary or differ in an experiment. A large variance between the
groups relative to the variance within the groups indicates a significant finding. For example, to
understand the relationship between the number of children in a family and socio-economic
status, a researcher may recruit a sample of families from each economic status and ask them
about their ideal number of children. The ANOVA will be used to check if the difference
between the groups' answers is statistically significant or due to random chance.
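A minimal sketch of a one-way ANOVA for the family-size example follows. The three groups and their answers are invented for illustration, and the function simply forms the ratio of between-group to within-group mean squares (the F statistic).

```python
from statistics import mean

# Hypothetical data: ideal number of children reported by families in three
# socio-economic groups (values invented for illustration).
groups = {
    "low": [4, 5, 6, 5],
    "middle": [3, 4, 3, 2],
    "high": [2, 1, 2, 3],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    k, n = len(samples), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of individual answers around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova_f(list(groups.values()))
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

With these invented numbers F is far above the tabulated 5% critical value for (2, 9) degrees of freedom (about 4.26), so the group differences would be judged statistically significant.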
The choice of inferential statistic depends entirely on the research objective. As in the
case of descriptive analysis, it is best to identify the appropriate inferential statistic for your
research question.
Since inferential statistics are used to determine the relationship between two or more
variables, they are called bivariate analysis (when limited to two variables) or multivariate
analysis (when there are more than two variables).
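For the bivariate case, the height and weight example can be sketched as follows. The five data points are hypothetical, and the helper names `pearson_r` and `fit_line` are introduced only for this illustration; they compute the Pearson correlation coefficient and a least-squares line.

```python
from math import sqrt
from statistics import mean

# Hypothetical heights (cm) and weights (kg) for five people.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


r = pearson_r(heights, weights)
intercept, slope = fit_line(heights, weights)
predicted = intercept + slope * 175  # estimated weight of a 175 cm person
print(f"r = {r:.3f}, weight = {intercept:.1f} + {slope:.2f} * height")
print(f"predicted weight at 175 cm: {predicted:.2f} kg")
```

The correlation only says the two variables move together; the fitted line is what allows a prediction for a height that was not observed.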
The above-stated methods are the most commonly used methods for data analysis. However,
other data analysis methods and metrics, such as standard deviation and variance, are also
available.
5.2 Analysing Qualitative Data
Qualitative data analysis works a little differently from quantitative data, primarily because
qualitative data is made up of words, observations, images, and even symbols. Deriving
absolute meaning from such data is nearly impossible; hence, it is mostly used for exploratory
research. While in quantitative research there is a clear distinction between the data
preparation and data analysis stage, analysis for qualitative research often begins as soon as
the data is available.
Analysis and preparation of data happen in parallel and include the following steps:
1. Getting familiar with the data: since most qualitative data are just words, the researcher
should start by reading it several times to get familiar and start looking for basic
observations or patterns. This includes transcribing the data.
2. Revisiting the research objectives to identify the questions that can be answered
through the collected data.
3. Developing a framework, also known as coding or indexing. Here the researcher
identifies broad ideas, concepts, behaviours or phrases and assigns codes to them. For
example, coding age, gender, socio-economic status or even positive and negative
responses to a question.
4. Identifying connections and patterns: once the data is coded, the researcher can start
identifying themes, looking for the most common responses to questions, identifying data
or patterns that can answer research questions, and finding areas that can be explored
further.
Several methods are available to analyse qualitative data. The most commonly used data
analysis methods are:
• Content analysis: this is one of the most common methods to analyse qualitative data. It
is used to analyse documented information in the form of texts, media or even physical
items. When to use this method depends on the research questions. Content analysis is
usually used to analyse responses from interviewees.
• Narrative analysis: this method is used to analyse content from various sources, such as
interviews of respondents, observations from the field, or surveys. It focuses on using the
stories and experiences shared by people to answer the research questions.
• Discourse analysis: like narrative analysis, discourse analysis is used to analyse
interactions with people. However, it focuses on analysing the social context in which the
communication between the researcher and the respondent occurred. Discourse analysis
also looks at the respondent's day-to-day environment and uses that information during
analysis.
• Grounded theory: this refers to using qualitative data to explain why a certain
phenomenon happened. It does this by studying a variety of similar cases in different
settings and using the data to derive causal explanations. Researchers may alter the
explanations or create new ones as they study more cases until they arrive at an
explanation that fits all cases.
These methods are the ones used most commonly. However, other data analysis methods,
such as conversational analysis, are also available.
TOPIC ONE:
HYPOTHESIS TESTING
Section One: Definition
A hypothesis can be defined as a tentative answer to a research problem. It is an assumption
we make about the population parameter. Research hypotheses are usually stated in terms of a
dependent and an independent variable. In other words, the hypothesis links the two variables
so as to bring out the relationship between them.
Section Two: Types of hypotheses
There are basically two types of hypotheses, namely:
• The null hypothesis
• The alternative or directional hypothesis
The null hypothesis, denoted H0, states that the independent variable has no effect on (or no
relationship with) the dependent variable. This is the hypothesis the researcher wants
to verify or test on the basis of sampled information.
The alternative (directional) hypothesis, denoted H1, is the counter proposition to the null
hypothesis. It states that the exogenous (independent) variable has an effect
on the endogenous (dependent) variable.
After formulating the research hypotheses, the researcher usually hopes to reject the null
hypothesis (H0) and accept the alternative (H1).
Section Three: Types of errors in hypothesis testing
In testing hypotheses, that is, when deciding whether to reject or accept the null
hypothesis (H0), the researcher is liable to commit errors. One of two types of error can be
committed:
Type I error: it is committed when the null hypothesis (H0) is true but is rejected on the basis
of the sampled data. This implies that the alternative hypothesis (H1) is accepted when in fact
it is wrong. The probability of committing a type I error is called the level of significance and
is denoted α. As such, the level of significance simply refers to the probability of rejecting H0
when in fact it is true.
In data analysis practice, significance levels of 10%, 5% and 1% are customary
(conventionally accepted).
Type II error: it is committed when the null hypothesis (H0), which is false in the population,
is accepted erroneously on the basis of sampled information. It means the alternative
hypothesis is rejected whereas it should not be rejected. The probability of committing a type II
error is denoted β.
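In symbols, the two error probabilities described in this section can be written as conditional probabilities. The Greek letters are the conventional notation; the quantity 1 - β, usually called the power of the test, is added here for context and is not named in the notes.

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta  = P(\text{accept } H_0 \mid H_0 \text{ is false}), \qquad
\text{power} = 1 - \beta .
```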
Section Four: Two tails test and one tail test
When we formulate hypotheses about a true population parameter ($\theta$) and a hypothesised
population value ($\theta_0$), one of the following three (3) cases can arise:
a) $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$ (two-tailed test)
b) $H_0: \theta = \theta_0$ versus $H_1: \theta > \theta_0$ (one-tailed test, more precisely a right-tailed test)
c) $H_0: \theta = \theta_0$ versus $H_1: \theta < \theta_0$ (one-tailed test, more precisely a left-tailed test)
In fact, the way the alternative or directional hypothesis is expressed is important in
determining whether we choose a two-tailed or a one-tailed test.
It is equally important to divide the whole set of possible values of the test statistic into two
zones / regions, namely the acceptance region and the rejection or critical region. The test
statistic is assumed here to follow a normal distribution. The critical zone can be chosen either
at the left end of the distribution or at the right end (tail), or half at each tail (end) of the
distribution.
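Assuming a standard normal test statistic, the boundaries of the critical region for a chosen significance level can be read from a table or computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `critical_z` is introduced only for this illustration.

```python
from statistics import NormalDist


def critical_z(alpha, tails="two"):
    """Critical value(s) of the standard normal distribution for level alpha."""
    z = NormalDist()  # mean 0, standard deviation 1
    if tails == "two":
        c = z.inv_cdf(1 - alpha / 2)  # alpha is split equally between both tails
        return (-c, c)
    if tails == "right":
        return z.inv_cdf(1 - alpha)   # whole critical region in the right tail
    if tails == "left":
        return z.inv_cdf(alpha)       # whole critical region in the left tail
    raise ValueError("tails must be 'two', 'right' or 'left'")


lower, upper = critical_z(0.05)
print(f"two-tailed 5%: reject H0 if z < {lower:.3f} or z > {upper:.3f}")
print(f"right-tailed 5%: reject H0 if z > {critical_z(0.05, 'right'):.3f}")
print(f"left-tailed 5%: reject H0 if z < {critical_z(0.05, 'left'):.3f}")
```

Splitting the same 5% across two tails pushes the boundaries further out (about 1.96) than placing it all in one tail (about 1.645).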
Figure 1: Two tailed test acceptance and rejection zone
Figure 2: One tailed test acceptance and rejection zone
[ Insert Graph]
Section Five: Steps in hypothesis testing
The steps or procedure in testing a hypothesis include the following:
Step 1: formulate your null and alternative hypotheses (H0 and H1)
Step 2: choose the level of significance
Step 3: choose the appropriate test statistic and compute it from the sample
Step 4: determine the tabular or critical value of the test statistic at the specified level of
significance
Step 5: compare the computed (or calculated) test statistic with the critical value. If the
computed value lies in the critical zone (for a right-tailed test, it is greater than the critical
value), reject the null hypothesis (H0) and conclude that the estimate (test statistic) is
statistically significant. Otherwise, that is, if the computed value lies outside the critical zone,
in the acceptance region (for a right-tailed test, it is less than the critical value), accept the
null hypothesis (H0) and conclude that the estimate (test statistic) is statistically insignificant.
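The comparison in Step 5 can be sketched as a small function. The name `decide` and its arguments are hypothetical; the function assumes a symmetric distribution (such as the normal or t distribution) and a positive tabulated critical value.

```python
def decide(test_statistic, critical_value, tail="two"):
    """Compare a computed test statistic with the (positive) critical value.

    Returns "reject H0" if the statistic falls in the critical region,
    otherwise "accept H0" (the wording used in these notes).
    """
    if tail == "two":
        in_critical_region = abs(test_statistic) > critical_value
    elif tail == "right":
        in_critical_region = test_statistic > critical_value
    elif tail == "left":
        in_critical_region = test_statistic < -critical_value
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return "reject H0" if in_critical_region else "accept H0"


print(decide(2.30, 1.96))            # two-tailed test at the 5% level
print(decide(-1.20, 1.96))           # statistic inside the acceptance region
print(decide(-2.00, 1.645, "left"))  # left-tailed test at the 5% level
```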
TOPIC TWO
THE CHI SQUARE DISTRIBUTION
The chi square test is among the most commonly used tests of association in research by
students. There are two variants of the chi square test.
Section One: Chi square test of goodness of fit
The Chi square test of goodness of fit compares the observed frequencies of a
distribution with the expected frequencies. When the Chi square goodness of fit test is used, the
data are grouped into K categories and the observed frequency of each category is
determined. For each category, the expected frequency is the frequency that would arise if the
data were distributed in a specific hypothesised manner.
The Chi square test statistic is given by
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
with degrees of freedom (df) = $K - 1$, where
$O_i$ is the observed frequency and $E_i$ the expected frequency of category $i$, and $K$ is
the number of categories or modalities.
The computed (or calculated) chi square value can then be compared with the tabular (or
critical) chi square value to decide whether to reject or accept the null hypothesis (H0).
Application:
A year 3 student wants to investigate the most important reason for the choice of a degree
program in FEMS. The question asked to the students is 'Which of the following reasons best
explains (is the most important to you for) the choice of your study program?'. Students are
obliged to choose only one reason from the following list:
a) Prestige
b) Availability of pedagogic material
c) Ease of passing examination
d) Job opportunities
Responses were collected from 28 students and summarised in the table below:
Reasons              Prestige   Availability of pedagogic material   Ease of passing examination   Job opportunities   TOTAL
Observed frequency   6          3                                    7                             12                  28
Expected frequency
Task: Investigate whether the reason for the choice of a study program plays a role in the
decision to choose a training program at 5% level of significance.
Solution:
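One way to work through this application is sketched below. It assumes the null hypothesis that all four reasons are equally likely, so each expected frequency is 28 / 4 = 7; that reading of H0 and the tabulated critical value (7.815 for df = 3 at the 5% level) are assumptions of this sketch rather than part of the original notes.

```python
observed = {
    "Prestige": 6,
    "Availability of pedagogic material": 3,
    "Ease of passing examination": 7,
    "Job opportunities": 12,
}

n = sum(observed.values())   # 28 respondents
k = len(observed)            # K = 4 categories
expected = n / k             # assumed H0: every reason is equally likely

# Chi square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1
CRITICAL_5PCT_DF3 = 7.815    # tabulated chi square value, alpha = 0.05, df = 3

decision = "reject H0" if chi_square > CRITICAL_5PCT_DF3 else "accept H0"
print(f"chi square = {chi_square:.3f}, df = {df}, decision: {decision}")
```

Under these assumptions the computed value (6.0) is below the critical value, so H0 is not rejected at the 5% level.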
Section Two: The Chi square test of association or independence
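When we have a cross tabulation of two categorical or qualitative variables (a contingency
table), the question we usually ask is: are the two variables related to each other or are they
independent? In other words, does one variable affect the other? It is very important to note
that these two variables must be categorical in nature.
The chi square test of independence is conducted with the following formula:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{T}$$
with degrees of freedom (df) = $(r - 1)(c - 1)$.
Where:
$O_{ij}$ = observed frequency in row i and column j
$E_{ij}$ = expected frequency in row i and column j
$R_i$ = total of row i
$C_j$ = total of column j
$T$ = grand total
r = number of rows and c = number of columns.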
Application:
The table below shows information of political affiliation of personnel of a public company in
Cameroon.
         Manual employee   Executive positions   TOTAL
CPDM     36                38                    74
SDF      92                28                    120
CRM      22                26                    48
TOTAL    150               92                    242
Task: Investigate whether there is a relationship between political affiliations of personnel and
type of position occupied in the company.
Solution:
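A possible way to carry out this computation is sketched below. Expected frequencies are formed as row total times column total divided by the grand total. The task does not state a significance level, so the 5% level and its tabulated critical value (5.991 for df = 2) are assumptions of this sketch.

```python
# Observed frequencies: [manual employees, executive positions] per party.
table = {
    "CPDM": [36, 38],
    "SDF": [92, 28],
    "CRM": [22, 26],
}

rows = list(table.values())
row_totals = [sum(r) for r in rows]          # 74, 120, 48
col_totals = [sum(c) for c in zip(*rows)]    # 150, 92
grand_total = sum(row_totals)                # 242

chi_square = 0.0
for i, row in enumerate(rows):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (observed - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)
CRITICAL_5PCT_DF2 = 5.991  # tabulated chi square value, alpha = 0.05 (assumed), df = 2

decision = "reject H0" if chi_square > CRITICAL_5PCT_DF2 else "accept H0"
print(f"chi square = {chi_square:.2f}, df = {df}, decision: {decision}")
```

With these data the statistic is roughly 21.9, well above the critical value, so under the assumed 5% level the hypothesis of independence between political affiliation and type of position would be rejected.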
Section Three: The Contingency Coefficient
The contingency coefficient is a coefficient of association that tells whether two variables or
data sets are independent of or dependent on each other. It is also known as Pearson's
Coefficient (not to be confused with Pearson's Coefficient of Skewness).
It is based on the chi-square statistic, and is defined by:
$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$$
Where:
• $\chi^2$ is the chi-square statistic,
• N is the total number of cases or observations in our analysis/study (grand total),
• C is the contingency coefficient.
The contingency coefficient helps us decide if variable b is 'contingent' on variable a.
However, it is a rough measure and doesn't quantify the dependence exactly. It can be used as
a rough guide:
• If C is near zero (or equal to zero) you can conclude that your variables are
independent of each other; there is no association between them.
• If C is away from zero there is some relationship; C cannot take negative values.
The larger the table your chi-square statistic is calculated from, the closer to 1 the value of C
for a perfect association will be. That's why some statisticians suggest using the contingency
coefficient only if you're working with a 5 by 5 table or larger.
A contingency coefficient is particularly informative if you are working with a large sample,
and you do not need to find out if an association is complete or not (just whether or not the
association exists).
Alternative measures of association include the phi coefficient (which shares a weak point
with C: its upper limit depends on the size of the table), and Cramer’s V. Cramer’s V is often
preferred because with perfect association, it becomes exactly 1 no matter how large the table.
Cramer’s V coefficient is calculated as follows:
$$V = \sqrt{\frac{\chi^2}{N \times \min(r - 1,\; c - 1)}}$$
where r is the number of rows and c the number of columns of the table.
N.B: The p-value for the significance of the contingency coefficient and of Cramer’s V
coefficient is the same as that of the chi square test.
Application:
Compute the contingency coefficient and Cramer’s V of the previous example.
Solution:
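As a hedged illustration of this application, the sketch below recomputes the chi-square statistic of the political affiliation table and derives both coefficients from it. The variable names are illustrative only, and the formulas used are the standard definitions C = √(χ2/(χ2 + N)) and V = √(χ2/(N × min(r − 1, c − 1))).

```python
import math

# Table from the political affiliation example (rows: CPDM, SDF, CRM).
observed = [[36, 38], [92, 28], [22, 26]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Chi-square statistic: sum over cells of (O - E)^2 / E, with E = Ri * Cj / T.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n_total
        chi2 += (obs - expected) ** 2 / expected

# Contingency coefficient.
contingency_c = math.sqrt(chi2 / (chi2 + n_total))

# Cramer's V uses the smaller of (rows - 1) and (columns - 1).
smaller_dim = min(len(observed) - 1, len(observed[0]) - 1)
cramers_v = math.sqrt(chi2 / (n_total * smaller_dim))

print(f"chi-square = {chi2:.3f}")
print(f"C = {contingency_c:.3f}, Cramer's V = {cramers_v:.3f}")
```

Both values are well away from zero but far from 1, which points to a moderate association at most.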
Assignment:
The results of a random sample of children with pain from muscular injuries treated with
Amoxicillin, Ibuprofen and codeine are shown in the table.
                          Amoxicillin   Ibuprofen   Codeine   Total
Significant improvement   58            81          61
Slight improvement        42            19          39
Total
At the given level of significance, is there enough evidence to conclude that the treatment
and the result are independent?
TOPIC THREE
THE STUDENT T TEST
The student t-test tells you how significant the differences between groups are. In other
words, it informs you if those differences (measured in means) could have happened by
chance. There are principally three types of t-tests.
Section One: One sample t test
It is also known as the t test of one sample / population mean. It compares the mean of your
sample data to a known value (hypothesised value). For example, you may be interested to
know how your sample mean compares to the population mean. The one sample t test is
appropriate when you do not know the population standard deviation or you have a small
sample size. The assumptions of the test are:
• Data is independent
• Data is collected randomly (using random sampling for example)
• The data is approximately normally distributed
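The t-statistic is computed using the following formula:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
with n − 1 degrees of freedom, where:
$\bar{x}$ is the sample mean
$\mu_0$ is the hypothesised mean (population mean for example)
$s$ is the sample standard deviation
$n$ is the number of observations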
Application:
Your company wants to improve sales. Past sales data indicate that the average sale was 100
MU per transaction. After training your sales force, recent sales data taken from a sample of
25 salesmen indicate an average sale of 130 MU, with a standard deviation of 15 MU. Did
the training work? Test the hypothesis at the 5% level of significance.
Solution:
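A minimal sketch of the computation for this application is given below. The function name is hypothetical, and the one-tailed alternative (H1: mean sale greater than 100 MU) is an assumption based on the question "did the training work?".

```python
import math


def one_sample_t(sample_mean, hypothesised_mean, sample_sd, n):
    """Return (t statistic, degrees of freedom) for a one sample t test."""
    standard_error = sample_sd / math.sqrt(n)
    t_stat = (sample_mean - hypothesised_mean) / standard_error
    return t_stat, n - 1


# Figures from the application: mean 130, hypothesised mean 100, sd 15, n = 25.
t_stat, df = one_sample_t(130, 100, 15, 25)

# Tabulated critical t for df = 24 at the 5% level, one tail (assumed H1: mean > 100).
T_CRITICAL_ONE_TAILED = 1.711

print(f"t = {t_stat:.2f}, df = {df}")
if t_stat > T_CRITICAL_ONE_TAILED:
    print("Reject H0: the average sale increased after the training.")
else:
    print("Do not reject H0.")
```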
Assignment:
A lecturer claims that the average performance of students in quantitative analysis is 13. After
administering a test to a sample of 16 students, the mean score is 15 with a standard deviation
of 2.5. Investigate the claim of the lecturer at 5% level.
Section Two: Unpaired samples t test or independent samples t-test
It is also known as the t-test about the difference between two population means. This is the
most common form of the t-test. It helps you to compare the means of two sets of data. For
example, you could run a test to see if the mean test scores of males and females are different.
This second variant of the t-test answers the question: could these differences have occurred by
random chance? Assumptions of the test include:
• Independence of the two samples: you need two independent categorical groups that
represent your independent variable (for example male and female).
• The dependent variable should be approximately normally distributed and measured
on a continuous scale.
• The variances of the dependent variable should be equal in the two groups.
Independent t-test with equal variances
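Consider $\bar{x}_1$ the sample mean from population 1 and $\bar{x}_2$ that from population 2; the difference
between the two sample means is $(\bar{x}_1 - \bar{x}_2)$. The formula for computing the value of the t-test
statistic for such a difference is given by:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\mathrm{Var}(\bar{x}_1 - \bar{x}_2)\right]}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right]}}$$
$\bar{x}_1$, $\bar{x}_2$ are respectively the first and second sample means
$s_1^2$, $s_2^2$ are the variances of the first sample and second sample respectively
$n_1$, $n_2$ are the number of observations in the first and second samples respectively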
The test that assumes equal population variances is referred to as the pooled t-test. Pooling
refers to finding a weighted average of the two independent sample variances, which the
pooled test statistic then uses:
$$Var_p = S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• If $n_1 = n_2$ then $S_p^2 = \frac{s_1^2 + s_2^2}{2}$, which is the average of the two sample
variances.
• If $n_1 \neq n_2$ then the variance based on the larger sample size will receive more weight than
the other variance.
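The advantage of this test statistic is that it exactly follows the Student’s t-distribution with
n1 + n2 − 2 degrees of freedom. The t-statistic formula with pooled variance becomes:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left[\frac{Var_p}{n_1} + \frac{Var_p}{n_2}\right]}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{Var_p\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}} = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$$
Where Varp and Sp are respectively the pooled variance and pooled standard deviation.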
Independent t-test with unequal variance
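The above formula is based on the restrictive assumption of equality of variances, which rarely
holds true. How do you proceed when this assumption is violated? You can still use the
two-sample t-test, but with a different estimate of the standard deviation of the difference: the
two sample variances are not pooled, the standard error is computed as $\sqrt{s_1^2/n_1 + s_2^2/n_2}$,
and the degrees of freedom are approximated. This variant is commonly known as Welch’s t-test.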
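The unequal variance case can be sketched as follows. This is an illustration of the commonly used Welch t-test with the Welch-Satterthwaite approximation for the degrees of freedom; the function name and the summary figures passed to it are hypothetical and do not come from these notes.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, approximate df) without pooling the variances."""
    # Squared standard error contributed by each sample.
    se1_sq = var1 / n1
    se2_sq = var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t_stat, df


# Hypothetical summary statistics, for illustration only.
t_stat, df = welch_t(mean1=12.0, var1=4.0, n1=10, mean2=10.0, var2=16.0, n2=20)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

The approximate degrees of freedom are generally not a whole number; when reading a printed t table they are usually rounded down.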
Application:
A researcher wishes to verify that students from private universities perform better than those
from state universities. He selects a sample from each type of university, administers
the same test to both samples and obtains the following results:
University   n    Mean   Variance
Private      9    15     25
State        15   13.5   26
Test the researcher’s claim at the 1% and 5% levels of significance.
Solution:
Step 1: Formulate the hypotheses of the test
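The numerical part of this application can be sketched as below, assuming equal population variances (pooled t-test) and a one-tailed alternative (H1: the private university mean is higher), which is how the researcher's claim is read here. The function name is hypothetical.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, df, pooled variance) for the pooled t-test."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2, pooled_var


# Summary statistics from the table: private (sample 1) and state (sample 2).
t_stat, df, pooled_var = pooled_t(15, 25, 9, 13.5, 26, 15)

# Tabulated one-tailed critical t values for df = 22.
critical_values = {0.05: 1.717, 0.01: 2.508}

print(f"pooled variance = {pooled_var:.3f}, t = {t_stat:.3f}, df = {df}")
for alpha, t_critical in critical_values.items():
    decision = "reject H0" if t_stat > t_critical else "do not reject H0"
    print(f"alpha = {alpha}: critical t = {t_critical}, {decision}")
```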
Assignment:
[See tutorial questions]
Section Three: The Matched or Paired t-test
This section describes the hypothesis testing procedure for the difference between two
population means when the two samples are dependent. When we have two samples collected
from the same source, the samples are called matched or paired samples and the previous test
procedure is no longer applicable. For example, we may administer a test in quantitative
analysis and another in research methods to the same group of year 3 students. In paired
samples, the difference between the two data values for each element of the two samples is
denoted by d and is called the paired difference. We first compute the mean and standard
deviation of the paired differences for the samples:
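$$\bar{d} = \frac{\sum d}{n}, \qquad s_d = \sqrt{\left[\frac{\sum (d - \bar{d})^2}{n - 1}\right]}$$
Then the t-statistic is obtained from the following formula:
$$t = \frac{\bar{d}}{s_{\bar{d}}}, \qquad s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$
The t-statistic formula can thus be summarised as:
$$t = \frac{\bar{d}}{\sqrt{\frac{\sum (d - \bar{d})^2}{n(n - 1)}}}$$
with n − 1 degrees of freedom, where n is the number of pairs.
N.B: since it is a symmetric test we work with the absolute value (|t|).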
Application:
Two tests, one in quantitative analysis and the other in managerial economics, were
administered to a sample of 10 level 400 management students in FEMS. The scores out of 100
are presented in the table below:
Quantitative Analysis   60  51  42  53  40  74  65  41  55  70
Managerial Economics    75  58  79  81  55  70  80  74  60  83
Verify the hypothesis that performances in the two subjects were not the same.
Solution:
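The paired computation can be sketched as follows. The 5% level of significance and the two-tailed alternative are assumptions (the task does not state a level), and the variable names are illustrative.

```python
import math

# Scores of the same 10 students in the two subjects.
quantitative_analysis = [60, 51, 42, 53, 40, 74, 65, 41, 55, 70]
managerial_economics = [75, 58, 79, 81, 55, 70, 80, 74, 60, 83]

# Paired differences d for each student.
diffs = [qa - me for qa, me in zip(quantitative_analysis, managerial_economics)]
n = len(diffs)

d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
t_stat = d_bar / (s_d / math.sqrt(n))
df = n - 1

# Tabulated two-tailed critical t for df = 9 at the (assumed) 5% level.
T_CRITICAL_TWO_TAILED = 2.262

print(f"mean difference = {d_bar:.1f}, s_d = {s_d:.3f}")
print(f"|t| = {abs(t_stat):.3f}, df = {df}")
if abs(t_stat) > T_CRITICAL_TWO_TAILED:
    print("Reject H0: performances in the two subjects differ.")
else:
    print("Do not reject H0.")
```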
Assignment:
[See tutorial questions]
TOPIC FOUR
ANALYSIS OF VARIANCE (ANOVA)
ANOVA is used when more than two (2) population means are to be compared. There are
basically two types of ANOVA:
• One factor or one way ANOVA
• Two factors or two way ANOVA
Section One: One way ANOVA
1.1 Definition
This is a design where one independent variable with several levels is associated with a
dependent variable. A typical design of the one way ANOVA is the comparison of the means
from three different populations ($\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$). Since it is usually difficult to get the true
population means, we infer the information from sample means computed as estimates of the
population means.
1.2 Technique or Procedure of ANOVA
ANOVA is a technique that breaks down total variation into 2 components. These
components are between group variation and within group variation.
Total variation = BG variation + WG variation
The between group variation is due to the general differences among the means of the
samples. The possible explanation for between group variability is what we call the treatment
or group effect. This means that the differences arise due to the treatments that are
implemented. Such differences may also arise due to individual differences. In this case,
subjects may have entered the treatment conditions with different ability or attitude.
The within group variability is due to variation among subjects within the respective samples.
Example: Is there a significant difference in reading skills between children from various
socio-economic backgrounds (High, Medium, Low)?
To explain this concept, we consider a one factor experiment with 3 treatments or groups as
displayed in the table below:
G1      G2      G3
X11     X21     X31
X12     X22     X32
X13     X23     X33
.       .       .
.       .       .
.       .       .
X1n1    X2n2    X3n3
T1      T2      T3
$\bar{X}_1$   $\bar{X}_2$   $\bar{X}_3$
The table shows that there are n1 subjects in group G1, n2 subjects in group G2 and n3 subjects
in group G3. T1, T2 and T3 are the group totals and the group means are $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$.
N = n1 + n2 + n3 (where N is the total number of observations).
To perform an ANOVA, we proceed through the following steps:
Step 1: We compute the sum of squares for total variation (SStotal), sum of squares for
between group variation (SSBG) and sum of squares for within group variation (SSWG).
SStotal = SSBG + SSWG
$$SS_{total} = \sum X^2 - \frac{G^2}{N}, \qquad SS_{BG} = \sum_{j} \frac{T_j^2}{n_j} - \frac{G^2}{N}$$
Where $\sum X^2$ is the sum of squared observations in the different groups and $G^2$ is the squared
sum of the sub totals ($G = \sum T_j$).
By deduction, we will then have:
$$SS_{WG} = SS_{total} - SS_{BG}$$
Step 2: We compute the degrees of freedom
In ANOVA, we have to compute 3 degrees of freedom, one for each sum of squares (k is the
number of groups):
$$df_{total} = N - 1, \qquad df_{BG} = k - 1, \qquad df_{WG} = N - k$$
Step 3: We determine the mean squares
There are two mean squares:
Mean squares between groups (MSBG): $MS_{BG} = \frac{SS_{BG}}{k - 1}$
Mean squares within groups (MSWG): $MS_{WG} = \frac{SS_{WG}}{N - k}$
Step 4: We compute the F-ratio (Fisher statistic)
The F-ratio formula is:
$$F = \frac{MS_{BG}}{MS_{WG}}$$
Once the F-ratio has been calculated, it can be compared with the critical F for a given level
of significance at the specified degrees of freedom for the numerator (k − 1) and degrees of
freedom for the denominator (N − k).
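The calculation required for the above test is summarised in a table known as the ANOVA
table:
Variations       Sum of squares (SS)   Degree of freedom (df)   Mean Square (MS)        F-ratio
Between groups   SSBG                  k − 1                    MSBG = SSBG / (k − 1)   F = MSBG / MSWG
Within groups    SSWG                  N − k                    MSWG = SSWG / (N − k)
Total            SStotal               N − 1
The computed F-ratio is compared with the critical value at F (k-1; N-k).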
Application:
The yields of certain varieties of corn grown on a particular type of soil, treated with
chemicals C1, C2 and C3, are given in the table below:
C1   C2   C3
3    2    4
4    4    6
5    3    5
4    3    5
1) Find the mean yields for the different treatments
2) Find the grand mean for all the treatments
3) Find the total variation
4) Determine the between and within treatment variations
5) Test the hypothesis that there is no difference in yields at the 5% and 1% levels of
significance.
Solution:
[See you in class. Attendance is mandatory]
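For checking class work, the four steps of the one way ANOVA can be sketched in Python on the corn yield data. This is only an illustrative sketch with hypothetical variable names; it uses the usual computational formulas (sum of squared observations minus the squared grand total over N) and tabulated critical values of F(2, 9).

```python
# Corn yields for the three chemical treatments.
groups = {
    "C1": [3, 4, 5, 4],
    "C2": [2, 4, 3, 3],
    "C3": [4, 6, 5, 5],
}

k = len(groups)                                   # number of groups
n_total = sum(len(v) for v in groups.values())    # N
grand_total = sum(sum(v) for v in groups.values())
correction = grand_total ** 2 / n_total           # squared grand total over N

group_means = {name: sum(v) / len(v) for name, v in groups.items()}
grand_mean = grand_total / n_total

# Step 1: sums of squares.
ss_total = sum(x ** 2 for v in groups.values() for x in v) - correction
ss_bg = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction
ss_wg = ss_total - ss_bg

# Steps 2 and 3: degrees of freedom and mean squares.
df_bg, df_wg = k - 1, n_total - k
ms_bg = ss_bg / df_bg
ms_wg = ss_wg / df_wg

# Step 4: F-ratio, compared with tabulated critical values of F(2, 9).
f_ratio = ms_bg / ms_wg
critical_f = {0.05: 4.26, 0.01: 8.02}

print("group means:", group_means, "grand mean:", grand_mean)
print(f"SStotal = {ss_total:.2f}, SSBG = {ss_bg:.2f}, SSWG = {ss_wg:.2f}")
print(f"F = {f_ratio:.2f} with df = ({df_bg}, {df_wg})")
for alpha, f_crit in critical_f.items():
    decision = "reject H0" if f_ratio > f_crit else "do not reject H0"
    print(f"alpha = {alpha}: critical F = {f_crit}, {decision}")
```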
Assignment:
Three machines are tested in a company to see if the machines produce the same quantity of a
given product. Productions for a certain number of days of the week are presented in the table
below:
M1 M2 M3
10 9 17
8 8 15
12 14 12
5 13 14
11 16
Task: Test at 1% level of significance if the productions from the three above machines are
the same.
1.3 Limitations of the One Way ANOVA
A one way ANOVA will tell you that at least two groups differ from each other, but it will
not tell you which groups differ. If your test returns a significant F-statistic, you may need to
run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups
had a difference in means.
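As an illustration of the Least Significant Difference test mentioned above, the sketch below applies it to the corn yield data of the earlier application. It assumes the 5% level and uses the tabulated two-tailed critical t for the within group degrees of freedom (df = 9); the variable names are hypothetical.

```python
import math
from itertools import combinations

# Corn yield data from the one way ANOVA application.
groups = {
    "C1": [3, 4, 5, 4],
    "C2": [2, 4, 3, 3],
    "C3": [4, 6, 5, 5],
}

k = len(groups)
n_total = sum(len(v) for v in groups.values())
means = {name: sum(v) / len(v) for name, v in groups.items()}

# Mean square within groups (MSWG) from the deviations around each group mean.
ss_wg = sum((x - means[name]) ** 2 for name, v in groups.items() for x in v)
ms_wg = ss_wg / (n_total - k)

# Tabulated two-tailed critical t for df = N - k = 9 at the (assumed) 5% level.
T_CRITICAL = 2.262

significant = []
for a, b in combinations(groups, 2):
    # Smallest difference in means that counts as significant for this pair.
    lsd = T_CRITICAL * math.sqrt(ms_wg * (1 / len(groups[a]) + 1 / len(groups[b])))
    difference = abs(means[a] - means[b])
    if difference > lsd:
        significant.append((a, b))
    print(f"{a} vs {b}: |difference| = {difference:.2f}, LSD = {lsd:.3f}")

print("Pairs with a significant difference:", significant)
```

A pair of groups is declared different when the absolute difference of its means exceeds the LSD value.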
Section Two: Two Way ANOVA
2.1 Definition
A Two Way ANOVA is an extension of the One Way ANOVA. With a One Way, you have
one independent variable affecting a dependent variable. With a Two Way ANOVA, there
are two independent variables. Use a two way ANOVA when you have one measurement variable
(i.e. a quantitative variable) and two nominal variables. In other words, if your experiment
has a quantitative outcome and you have two categorical explanatory variables, a two way
ANOVA is appropriate.
For example, you might want to find out if there is an interaction between income and gender
for anxiety level at job interviews. The anxiety level is the outcome, or the variable that can
be measured. Gender and Income are the two categorical variables. These categorical
variables are also the independent variables, which are called factors in a Two Way ANOVA.
The factors can be split into levels. In the above example, income level could be split into
three levels: low, middle and high income. Gender could be split into three levels: male,
female, and transgender. Treatment groups are all possible combinations of the factors. In this
example there would be 3 x 3 = 9 treatment groups.
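The way treatment groups arise from crossing the levels of the two factors can be illustrated with a short sketch (the list names are illustrative):

```python
from itertools import product

# Levels of the two factors in the example above.
income_levels = ["low", "middle", "high"]
gender_levels = ["male", "female", "transgender"]

# Each treatment group is one combination of an income level and a gender level.
treatment_groups = list(product(income_levels, gender_levels))

for group in treatment_groups:
    print(group)
print("Number of treatment groups:", len(treatment_groups))
```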
2.2 Main Effect and Interaction Effect
A Two Way ANOVA estimates main effects and an interaction effect.
A main effect is similar to a One Way ANOVA: each factor’s effect is considered
separately. With the interaction effect, all factors are considered at the same time. Interaction
effects between factors are easier to test if there is more than one observation in each cell. For
the above example, multiple stress (anxiety) scores could be entered into cells. If you do enter
multiple observations into cells, the number in each cell must be equal.
Two null hypotheses are tested if you are placing one observation in each cell. For this
example, those hypotheses would be:
H01: All the income groups have equal mean stress.
H02: All the gender groups have equal mean stress.
For multiple observations in cells, you would also be testing a third hypothesis:
H03: The factors are independent or the interaction effect does not exist.
An F-statistic is computed for each hypothesis you are testing.
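How the three F-statistics arise can be sketched for a balanced design with several observations per cell. The data below are hypothetical (two gender levels only, to keep the example short) and the function name is illustrative; the sums of squares follow the usual decomposition SStotal = SSA + SSB + SSAB + SSerror.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Balanced two way ANOVA; data[a_level][b_level] is a list of scores."""
    a_levels = list(data)
    b_levels = list(data[a_levels[0]])
    a, b = len(a_levels), len(b_levels)
    n = len(data[a_levels[0]][b_levels[0]])  # observations per cell

    all_values = [x for i in a_levels for j in b_levels for x in data[i][j]]
    grand = mean(all_values)
    a_means = {i: mean([x for j in b_levels for x in data[i][j]]) for i in a_levels}
    b_means = {j: mean([x for i in a_levels for x in data[i][j]]) for j in b_levels}
    cell_means = {(i, j): mean(data[i][j]) for i in a_levels for j in b_levels}

    ss = {
        "A": b * n * sum((a_means[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((b_means[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum(
            (cell_means[i, j] - a_means[i] - b_means[j] + grand) ** 2
            for i in a_levels for j in b_levels
        ),
        "error": sum(
            (x - cell_means[i, j]) ** 2
            for i in a_levels for j in b_levels for x in data[i][j]
        ),
        "total": sum((x - grand) ** 2 for x in all_values),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f = {key: (ss[key] / df[key]) / ms_error for key in ("A", "B", "AB")}
    return ss, df, f


# Hypothetical stress scores: factor A = income, factor B = gender.
scores = {
    "low": {"male": [8, 9], "female": [7, 9]},
    "middle": {"male": [6, 7], "female": [6, 5]},
    "high": {"male": [4, 5], "female": [3, 3]},
}

ss, df, f = two_way_anova(scores)
print("Sums of squares:", ss)
print("Degrees of freedom:", df)
print("F-ratios (income, gender, interaction):", f)
```

Each F-ratio is then compared with the critical F for its own numerator degrees of freedom and the error degrees of freedom.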
2.3 Assumptions for Two Way ANOVA
• The population must be close to a normal distribution.
• Samples must be independent.
• Population variances must be equal (i.e. homoscedastic).
• Groups must have equal sample sizes.
Section Three: MANOVA
3.1 Definition of MANOVA
MANOVA is just an ANOVA with several dependent variables. It’s similar to many other
tests and experiments in that its purpose is to find out if the response variable (i.e. your
dependent variable) is changed by manipulating the independent variable. The test helps to
answer many research questions, including:
• Do changes to the independent variables have statistically significant effects on
dependent variables?
• What are the interactions among dependent variables?
• What are the interactions among independent variables?
3.2 MANOVA Example
Suppose you wanted to find out if a difference in textbooks affected students’ scores in maths
and science. Improvements in maths and science mean that there are two dependent
variables, so a MANOVA is appropriate.
An ANOVA will give you a single (univariate) F-value while a MANOVA will give you a
multivariate F value. MANOVA tests the multiple dependent variables by creating new,
artificial, dependent variables that maximize group differences. These new dependent
variables are linear combinations of the measured dependent variables.
3.3 Interpreting the MANOVA results
If the multivariate F value indicates the test is statistically significant, this means that the
groups differ on at least one dependent variable (or on a combination of them), but not which
one. In the above example, you would not know if maths scores have improved, science scores
have improved, or both. Once you have a significant result, you
would then have to look at each individual component (the univariate F tests) to see which
dependent variable(s) contributed to the statistically significant result.
3.4 Advantages and Disadvantages of MANOVA vs. ANOVA
Advantages
1. MANOVA enables you to test multiple dependent variables.
2. MANOVA can protect against the Type I errors that accumulate when a separate
ANOVA is run for each dependent variable.
Disadvantages
1. MANOVA is many times more complicated than ANOVA, making it a challenge to
see which independent variables are affecting dependent variables.
2. One degree of freedom is lost with the addition of each new variable.
3. The dependent variables should be uncorrelated as much as possible. If they are
correlated, the loss in degrees of freedom means that there isn’t much advantage in
including more than one dependent variable in the test.
Section Four: Factorial ANOVA
4.1 Definition
A factorial ANOVA is an Analysis of Variance test with more than one independent variable,
or “factor”. It can also refer to more than one level of an independent variable. For example,
an experiment with a treatment group and a control group has one factor (the treatment) but
two levels (the treatment and the control). The terms “two-way” and “three-way” refer to the
number of factors in your test. Four-way ANOVA and above are
rarely used because the results of the test are complex and difficult to interpret.
• A two-way ANOVA has two factors (independent variables) and one dependent
variable. For example, time spent studying and prior knowledge are factors that affect
how well you do on a test.
• A three-way ANOVA has three factors (independent variables) and one dependent
variable. For example, time spent studying, prior knowledge, and hours of sleep are
factors that affect how well you do on a test.
Factorial ANOVA is an efficient way of conducting a test. Instead of performing a series of
experiments where you test one independent variable against one dependent variable, you can
test all independent variables at the same time.
4.2 Variability
In a one-way ANOVA, variability is due to the differences between groups and the
differences within groups. In factorial ANOVA, each level and factor are paired up with each
other (“crossed”). This helps you to see what interactions are going on between the levels and
factors. If there is an interaction then the differences in one factor depend on the differences in
another.
Let’s say you were running a two-way ANOVA to test male/female performance on a final
exam. The subjects had had either 4, 6, or 8 hours of sleep.
• IV1: SEX (Male/Female)
• IV2: SLEEP (4/6/8)
• DV: Final Exam Score
A two-way factorial ANOVA would help you answer the following questions:
1. Is sex a main effect? In other words, do men and women differ significantly on their
exam performance?
2. Is sleep a main effect? In other words, do people who have had 4, 6, or 8 hours of sleep
differ significantly in their performance?
3. Is there a significant interaction between factors? In other words, how do hours of
sleep and sex interact with regards to exam performance?
4. Can any differences in sex and exam performance be found in the different levels of
sleep?
4.3 Assumptions of Factorial ANOVA
• Normality: the dependent variable is normally distributed.
• Independence: Observations and groups are independent from each other.
• Equality of Variance: the population variances are equal across factors/levels.
Section Five: Comparison between ANOVA and Student t - test
A Student’s t-test will tell you if there is a significant difference between the means of two
groups. ANOVA extends this comparison to more than two groups: it tests for differences in
means by comparing the variance between groups with the variance within groups.
You could technically perform a series of t-tests on your data. However, as the groups grow in
number, you may end up with a lot of pair comparisons that you need to run, and each extra
test raises the overall risk of a Type I error. ANOVA will
give you a single number (the F-statistic) and one p-value to help you support or reject the null
hypothesis.
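How quickly the number of pair comparisons grows can be sketched as follows. The second function gives the chance of at least one Type I error across all the t-tests under the simplifying assumption that the tests are independent, which is an approximation added here for illustration; the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(k):
    """Number of t-tests needed to compare every pair among k groups."""
    return comb(k, 2)


def familywise_error(m, alpha=0.05):
    """Chance of at least one Type I error in m tests (assumed independent)."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 5, 6):
    m = pairwise_comparisons(k)
    print(f"{k} groups: {m} t-tests, overall Type I error risk = {familywise_error(m):.3f}")
```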
TOPIC FIVE
CORRELATION AND REGRESSION ANALYSIS
Section One: Correlation Analysis
Section Two: Regression Analysis
Section Three: Comparison between Correlation Analysis and Regression Analysis