
Conflict Analysis Using Bayesian Neural Networks and Generalized Linear Models

Author(s): N. Iswaran and D. F. Percy


Source: The Journal of the Operational Research Society, Vol. 61, No. 2 (Feb., 2010), pp.
332-341
Published by: Palgrave Macmillan Journals on behalf of the Operational Research Society
Stable URL: https://www.jstor.org/stable/40540256
Accessed: 18-05-2019 11:50 UTC


This content downloaded from 193.19.172.190 on Sat, 18 May 2019 11:50:43 UTC
All use subject to https://about.jstor.org/terms
Journal of the Operational Research Society (2010) 61, 332-341 © 2010 Operational Research Society Ltd. All rights reserved. 0160-5682/10

www.palgrave-journals.com/jors/

Conflict analysis using Bayesian neural networks and generalized linear models
N Iswaran and DF Percy*
University of Salford, Greater Manchester, UK

The study of conflict analysis has recently become more important due to current world events. Despite
numerous quantitative analyses on the study of international conflict, the statistical results are often inconsistent
with each other. The causes of conflict, however, are often stable and replicable when the prior probability
of conflict is large. As there has been much conjecture about neural networks being able to cope with the
complexity of such interconnected and interdependent data, we formulate a statistical version of a neural
network model and compare the results to those of conventional statistical models. We then show how to
apply Bayesian methods to the preferred model, with the aim of finding the posterior probabilities of conflict
outbreak and hence being able to plan for conflict prevention.
Journal of the Operational Research Society (2010) 61, 332-341. doi:10.1057/jors.2008.183
Published online 4 March 2009

Keywords: Bayesian inference; conflict analysis; generalized linear models; neural networks

* Correspondence: DF Percy, Centre for Operational Research and Applied Statistics, Salford Business School, University of Salford, Greater Manchester M5 4WT, UK.
E-mail: d.f.percy@salford.ac.uk

Introduction

War and conflict can take on many different forms. Civil war in particular can last for decades, with frequent occurrences of war crimes. To name a few, there was the conflict between Catholics and Protestants in Northern Ireland that lasted for 38 years, the civil war in Sri Lanka between the government and the rebel group, the Tamil Tigers, which has been raging for 24 years, and the violent conflict in Uganda, between the Lord's Resistance Army, the Ugandan government and northern Ugandans. The last of these started in 1986 and with a ceasefire there having recently ended, it is difficult to predict for how much longer the troubles could last.

The 'groups' described here are communal, sometimes known as 'ethnic' or 'rebel' groups. These are groups defining themselves as having some combination of common descent, shared historical experience and perhaps valued cultural traits (Gurr and Moore, 1997). They make claims from their collective interest against the state or other groups. It is also worth stating that ethno-political conflict is not usually one sided: another side usually counters the claim or action of any one side. The books by Cheldelin et al (2003) and Wheeler (2000) provide further descriptions of these and related issues.

There has been much research carried out in the subject of conflict analysis using statistical methods; see the review by Donohue (2007) for example. In particular, some researchers have used logistic regression models to analyse conflict. Further to this, other researchers have used artificial neural network models and have applied Bayesian methodology to enhance the results. Gass (1994) invoked multi-criteria decision analysis to compute crisis and conflict potentials under simultaneous consideration of all other countries in a region. However, most work has concentrated on international conflict between two countries (dyads). Smith (1996) used a bivariate ordered discrete choice model with Bayesian methods in order to model the level of force used by each country in response to one another and other explanatory variables. Gurr and Moore (1997) did model ethno-political conflict by developing a system of four equations in order to estimate which factors were significant in leading to an outbreak of conflict. The subjects of these equations were factors that they considered worked interdependently to determine levels of rebellious conflict - namely rebellion, mobilization, grievances and repression. The equations were estimated using a three-stage least squares estimator, which was successful in showing the statistical significance of certain independent variables. However, it would be more desirable to have one equation with rebellion as the dependent variable in order to use these significant variables in attempting to forecast a rebellion, so that preventative or counter-insurgent measures could be put in place.

Most commonly used in modelling conflict are logistic regression models, as stated by Beck et al (1998, 2000) and Parwez (2006). Parwez uses a logistic regression model to analyse the conflict in Nepal and explain the factors that correlate with it, so that another outbreak might be predicted in advance. The output his model produced included easy-to-read statistics (z and P values) for each of his explanatory variables, so it was easy to determine the significant,


potentially causal factors. As he was using data from a current conflict, it would have been much easier to find a relationship between conflict outbreaks and specified causal factors, as any relationship would just mimic the state of events in Nepal at that time. Furthermore, modelling for just one conflict in one country would likely eliminate much of the collinearity and interdependence that make modelling conflict generically so complex. It would be useful to be able to construct a model that we could apply to many places at risk of war.

There is, though, a problem with logistic regression. Beck et al (1998) state that these statistical models can predict very accurately when the outcome is no rebellion (binary outcome of 0) but are virtually unable to predict an outbreak of rebellion (binary outcome of 1). In most data sets the vast majority of outcomes correspond with no rebellion, so perhaps this is the reason for the poor positive predictive performance - logistic regression models can be sensitive to these types of design. One way to address this issue might be to replace the logit link with a different link function. Specifically, the complementary log-log link is beneficial for applications when the outcomes are predominantly zeroes (McCullagh and Nelder, 1989). However, there appears to be no published work using this link function for conflict analysis. Instead, neural network models have been widely accepted as a better model for conflict prediction, due to their flexibility. Neural networks are able to cope with the non-linear, interdependent, interactive and context dependent (some factors may only be significant in certain regions) data often associated with conflict analysis, that statistical models cannot cope with (Beck et al, 2000). Moreover, it is important to be wary about assuming causality from correlations, though the use of a validation data set alleviates this problem somewhat. Using a neural network, Beck et al (1998) were able to correctly predict an outbreak of conflict 17% of the time for dyads of countries.

We now propose to build on the work already done by extending the output variable to an ordinal scale of increasing levels of rebellion and applying the methods that we develop to data from the Minorities at Risk project (Gurr, 2000). Our choice of output rebellion corresponds to that of rebel (or minority) groups; this type of rebellion is commonly referred to as ethno-political rebellion. In order to reduce the risk of predictive error, we use those communal groups which are considered by the Minorities at Risk project as being 'at risk' of rebelling, as the probability of conflict is large for these groups and so the data set is not diluted by groups at peace within their country. We also compare appropriate statistical models and neural networks, in order thoroughly to assess the difference in their predictive performance when an ordinal scale is used. We then show how to apply these models within a Bayesian framework to obtain posterior predictive probability distributions for conflict outcomes. These serve to improve the predictive accuracy of our best-fitting models, in the absence of sufficient data for estimating unknown parameters efficiently.

Data and methodology

The data used in the analysis are from the Minorities at Risk project; see references for the website. This is an academic research project based at the University of Maryland, which Gurr established in 1986. It 'monitors and analyses the status and conflicts of politically active communal groups in all countries with a current population of at least 500,000' from 1945 to 2003. The project contains data on 284 politically active ethnic groups. Its definition of a 'politically active ethnic group' is as follows: 'a group being discriminated against (advantaged or disadvantaged) compared to other groups in the same area/country [with] the means and motivation to take action (ie rebel)'.

In order to create a model that can be used to forecast levels of rebellion for any such group, we use all available groups in the data set, across all continents and countries. Although serial trends might provide useful information for assessing threat, they take the form of second-order effects. We focus on the first-order effects, which use simultaneous explanatory variables to forecast the potential for rebellion in the near future. This coincidentally avoids any bias that could otherwise arise due to the varying quality of available input data with respect to time. Our chosen year for input data is 2000, as this is the most recent year for which suitable output data are available.

For our modelling, we calculate the output rebellion index, also taken from the Minorities at Risk data set, as the maximum value between the years 2000 and 2002. The reason for this is that a rebellion as a result of specific factors will most likely not happen immediately, as it will take time to group together, mobilize, plan and gather the resources needed. We choose 2 years as an appropriate timeframe in which to measure the response from the explanatory variables and therefore assume that the maximum rebellion value in the 2 years following the input data forms the response rebellion.

Table 1 Rebellion index categories

Value   Rebellion level
0       None reported
1       Political banditry, sporadic terrorism
2       Campaigns of terrorism
3       Local rebellions
4       Small-scale guerrilla activity
5       Intermediate guerrilla activity
6       Large-scale guerrilla activity
7       Protracted civil war

As stated above, the Minorities at Risk data set contains only groups that are at risk of conflict. Therefore, their prior probabilities of conflict are large which makes the explanatory variables more significant and stable. For most groups a change in an input variable would not increase their chances of rebelling if they were not already at risk of doing so (Beck et al, 1998). Table 1 describes the output rebellion index for


our analyses. The scale runs from 0 to 7, with increasing levels of rebellion.

After careful consideration of all the variables available in the Minorities at Risk data set, we decided to use the following explanatory variables: region, strength of group identity, group (geographic) concentration, demographic stress, economic discrimination, political discrimination and three measures of state repression. For the region variable, the world is coded into six geo-political regions: Western Democracies and Japan; Eastern Europe and the former Soviet Union; Asia and the Pacific; North Africa and the Middle East; Sub-Saharan Africa; Latin America and the Caribbean. The three measures of repression are: few group members arrested (rep1); limited use of force against protesters (rep2); military campaigns against armed rebels (rep3). Strength of group identity and demographic stress are measured on continuous scales, while all the other variables are measured on ordinal scales, except for region which is measured on a nominal scale.

We chose this set of explanatory variables because of their proven significance in previous research. Gurr and Moore (1997) used all the variables except region in their statistical model and showed their significance. Gurr (2000) uses democratization and nearby regional conflict as a positive force on rebellion and our analyses assume that these can be summarized into the single variable 'region'. We analyse the data using five different models, three of which are conventional statistical models and two of which are neural networks. The statistical models are: a general linear model, an ordinal logistic regression (proportional odds) model and a two-stage, hierarchical logistic regression model. For the last of these, the first stage involves a binary response variable to represent conflict and the second involves an ordinal response variable for levels of conflict among those that forecast conflict at the binary stage. As we restrict attention to data representing groups at risk of rebellion, there is no need to consider link functions other than the logit link for the two generalized linear models; the general linear model uses an identity link by definition. Of the neural networks, one has a continuous response and the other has a categorical response, though the latter neural network algorithm is unable to allow for the extra knowledge that its measure is actually on an ordinal scale.

For all five models, we split the data into a 'training set' and a 'test set' to match the way that the data set is divided by the neural network software we use (Alyuda NeuroIntelligence). In total, there are 281 observations in the data set. For all neural networks, this software randomly splits the data set into training, validation and test sets with the ratios 68:16:16, respectively. Therefore, in order to compare all the models and predictions robustly and consistently, we fit the statistical models to 84% of the data selected at random, equivalent to 236 observations. This corresponds to the training and validation sets that the neural networks use. In order to reduce the number of parameters, we regard the ordinal repression variables as continuous because they take values on a ratio scale, whereby intervals and ratios are both measurable and comparable. In order to assess the five methods considered, we use the fitted models to make predictions for the 16% test data, equivalent to 45 observations, which we do not use for training and validation. However, due to missing data on the explanatory variables, only 43 of these observations are actually available to us for prediction.

If necessary, we map these predictions on to the scale of the rebellion index categories defined in Table 1. Finally, we compare these allocations $\hat{y}$ to the actual observations $y$. We are interested in whether our predictions match the observed rebellion categories and might progressively wish to penalize imperfect predictions according to the symmetric absolute distance metric $|\hat{y} - y|$, though we could develop an asymmetric metric if necessary. Consequently, a standardized, general measure of predictive accuracy is a power-law loss function of the form

$$\psi_\phi(\hat{y}, y) = \left( \frac{|\hat{y} - y|}{\max|\hat{y} - y|} \right)^{\phi} \in [0, 1] \tag{1}$$

for specified constant $\phi > 0$. Here $y$ is the actual observed rebellion level of Table 1, $\hat{y}$ is the predicted rebellion level according to the active model, and the denominator is the maximum observable value of the distance metric, which satisfies $\max|\hat{y} - y| = 7$ for our application.

The function in Equation (1) represents a linear loss if $\phi = 1$ and a quadratic loss if $\phi = 2$. Progressively as $\phi \to \infty$, the loss function attaches relatively more weight to larger discrepancies, culminating in the limiting form in Equation (2).

$$\lim_{\phi \to \infty} \psi_\phi(\hat{y}, y) = \begin{cases} 0, & |\hat{y} - y| < \sup|\hat{y} - y| \\ 1, & |\hat{y} - y| = \sup|\hat{y} - y| \end{cases} \tag{2}$$

For the other limiting extreme, we have

$$\lim_{\phi \to 0} \psi_\phi(\hat{y}, y) = \begin{cases} 0, & \hat{y} = y \\ 1, & \hat{y} \ne y \end{cases} \tag{3}$$

which penalizes all discrepancies equally severely. We consider this simple binary loss function to be the most appropriate measure for our conflict analysis, though we acknowledge that other forms provide valuable information too.

Given $n$ predicted values summarized in a vector $\hat{\mathbf{y}}$, corresponding to $n$ actual observed values summarized in a vector $\mathbf{y}$, we also define the sample mean loss function

$$\bar{\psi}_\phi(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{n} \sum_{j=1}^{n} \psi_\phi(\hat{y}_j, y_j) \in [0, 1] \tag{4}$$

as an overall unbiased estimator of predictive accuracy. For a test set of data comprising $n$ cases, we can then evaluate an estimate for the proportion of predictions that are correct as

$$\hat{\theta} = 1 - \bar{\psi}_0(\hat{\mathbf{y}}, \mathbf{y}) \tag{5}$$


based on the binary loss function in Equation (3). Similarly, we can evaluate an estimate for the mean absolute difference between all observed and predicted values as

$$\hat{\delta} = \bar{\psi}_1(\hat{\mathbf{y}}, \mathbf{y}) \max|\hat{y} - y| \tag{6}$$

using the linear loss function with $\phi = 1$ in Equation (1).

General linear model (GLM)

Equation (7) defines the general linear model for a continuous response variable $Y$ with realized value $y$.

$$y = x'\beta + \varepsilon \tag{7}$$

The linear predictor for $Y$ is $x'\beta$, where $x$ is an observable vector of explanatory (predictor) variables specific to each response and $\beta$ is an unobservable vector of parameters (regression coefficients). There is also a residual term $\varepsilon$ for each response, the residuals which we usually assume to be independent normal random variables with zero mean and constant variance. This model can only be an approximation for our analysis, as our response variable is ordinal with the eight categories identified in Table 1. However, it offers a simple approach that provides a useful benchmark against which to assess the performance of other models. We generate a prediction for this model using the fitted mean response $x'\hat{\beta}$, where $\hat{\beta}$ is the vector of maximum likelihood estimates for the unknown regression coefficients. Such predictions can be any real numbers and so we categorize the continuous output into the relevant bins, as specified in Table 2.

Table 2 Discrete categorization of continuous predictions

Fitted mean response            Predicted rebellion level
$x'\hat{\beta} < 0.5$           $\hat{y} = 0$
$0.5 \le x'\hat{\beta} < 1.5$   $\hat{y} = 1$
$1.5 \le x'\hat{\beta} < 2.5$   $\hat{y} = 2$
$2.5 \le x'\hat{\beta} < 3.5$   $\hat{y} = 3$
$3.5 \le x'\hat{\beta} < 4.5$   $\hat{y} = 4$
$4.5 \le x'\hat{\beta} < 5.5$   $\hat{y} = 5$
$5.5 \le x'\hat{\beta} < 6.5$   $\hat{y} = 6$
$x'\hat{\beta} \ge 6.5$         $\hat{y} = 7$

The general linear model correctly predicts 18 of the 43 observations, corresponding to $\hat{\theta} \approx 0.42$ in Equation (5) or a 42% success rate. Table 3 illustrates the spread for the predictions in each rebellion level; bold font indicates perfect predictions (and in Tables 4, 6, 7, 8). It is clear that the vast majority of observed rebellion categories are level 0 and the model correctly predicts only 16 out of these 28. The other correct predictions relate to only two of the six level 1 rebellions - the model does not predict any of the other rebellion values correctly. This lack of spread is because the test set has many more zero response observations than any other output. To enable further comparisons between models, the sample mean linear loss function, defined by Equations (1) and (4) with $\phi = 1$, for t[...] 0.153. Based o[...] an estimated mean absolute difference between all observed and predicted values of $\hat{\delta} \approx 1.07$ from Equation (6). We interpret this as the model's ability to predict about one level out on average.

Table 3 Predictive accurac[y ...]

Observed   Predicted                        Total
            0   1   2   3   4   5   6   7
0          16   9   2   1   0   0   0   0     28
1           1   2   0   1   1   1   0   0      6
2           0   0   0   0   0   0   1   0      1
3           0   0   0   0   0   0   0   1      1
4           0   0   0   1   0   0   0   1      2
5           0   0   0   1   1   0   0   0      2
6           0   0   0   0   0   2   0   0      2
7           0   0   0   0   1   0   0   0      1

Ordinal logistic regression (OLR)

For a single observation, we define the ordinal response vector

$$\mathbf{z} = (z_0, z_1, \ldots, z_7)' \tag{8}$$

whose realized values satisfy

$$z_i = \begin{cases} 1, & y = i \\ 0, & y \ne i \end{cases} \tag{9}$$

with probabilities

$$\pi_i = P(Z_i = 1) = P(Y = i) \tag{10}$$

for $i = 0, 1, \ldots, 7$, subject to

$$\sum_{i=0}^{7} z_i = 1 \tag{11}$$

and

$$\sum_{i=0}^{7} \pi_i = 1 \tag{12}$$

Similarly, we define an ordinal logistic regression model for $n$ observations by the joint probability mass function of a multinomial distribution

$$p(\mathbf{z}) = n! \prod_{i=0}^{7} \frac{\pi_i^{z_i}}{z_i!}, \quad z_i = 0, 1, \ldots, n \tag{13}$$

subject to

$$\sum_{i=0}^{7} z_i = n \tag{14}$$


and the constraint in Equation (12). For both models, the logit link functions (McCullagh and Nelder, 1989) are

$$\log \frac{\omega_i}{1 - \omega_i} = x'\beta_i \tag{15}$$

for $i = 0, 1, \ldots, 7$, where the $\omega_i$ represent the cumulative probabilities in Equations (16).

$$\omega_0 = \pi_0, \quad \omega_1 = \pi_0 + \pi_1, \quad \ldots, \quad \omega_7 = \pi_0 + \pi_1 + \cdots + \pi_7 = 1 \tag{16}$$

These link the category probabilities to a linear predictor $x'\beta_i$, where $x$ and $\beta_i$ are vectors similar to those used in the general linear model.

We estimate the individual probabilities of category membership using the method of maximum likelihood, by inverting the link function of fitted linear predictors $x'\hat{\beta}_i$ in Equation (15) to evaluate the parameter estimates $\hat{\omega}_i$ and hence the $\hat{\pi}_i$ from Equations (16) for $i = 0, 1, \ldots, 7$. For consistency with other studies and to avoid unnecessary over-complication, we restrict this ordinal logistic regression model to a special case known as the proportional odds model. This constrains all of the $\beta_i$ vectors to be equal for $i = 0, 1, \ldots, 7$ except for the first component in each, corresponding to different intercepts but equal slopes.

This ordinal logistic regression model predicts more accurately than the general linear model, with 24 correct predictions out of the 43 observations, corresponding to $\hat{\theta} \approx 0.56$ or 56% accuracy. It does have a more even spread with correct predictions in the level 6 and level 7 categories, as well as correctly predicting the majority in the level 0 category and two of the level 1 category. Table 4 shows all the results by level. Although this model predicts more observations correctly and for a broader spread of categories than does the general linear model, the estimated mean absolute difference between all observed and predicted values is $\hat{\delta} \approx 1.51$, which is 41% greater than for the general linear model. We interpret this as the ordinal logistic regression model's ability to predict about one and a half levels out on average. In passing, we note from [...] some very poo[r ...]

Table 4 Predictive accuracy of ordinal logistic regression model

Observed   Predicted                        Total
            0   1   2   3   4   5   6   7
0          20   3   0   0   3   0   0   2     28
1           2   2   0   1   0   0   0   1      6
2           0   0   0   0   0   0   0   1      1
3           1   0   0   0   0   0   0   0      1
4           1   0   0   0   0   1   0   0      2
5           1   0   0   0   0   0   0   1      2
6           1   0   0   0   0   0   1   0      2
7           0   0   0   0   0   0   0   1      1

Hierarchical [...]

This two-stage [...] of response o[...] arises for the [...] data set and m[...] the actual lev[...] rebel. An appr[...] logistic regress[ion ...] where rebellio[n ...] model analysi[s ...] rebellion whe[...] we prefer to u[...] second stage, r[...] assumptions a[...] predictions th[...] is actually equi[...] regression) m[...] of constraints [...] $\beta_0$ is entirely d[...]

Stage one

The first stag[e ...] or no conflict [...] zeros and one[...] phase. We defi[ne ...] probability of [...] has a Bernoull[i ...]

$$p(z) = \pi^{z} (1 - \pi)^{1 - z}, \quad z = 0, 1 \tag{17}$$

and logit link function

$$\log \frac{\pi}{1 - \pi} = x'\beta \;\Rightarrow\; \pi = \frac{1}{1 + \exp(-x'\beta)} \tag{18}$$

where $x'\beta$ is a linear predictor involving vectors [...] that are similar to those used in the general linear [...] order to predict no conflict and select suitable dat[a ...] two analysis, we assign the categories according to [...] model using the rule in Table 5. This stage one, bina[ry ...] regression model correctly predicts 31 out of 43 obs[ervations ...] corresponding to 72% correct, spread evenly betwe[en ...] categories. If $z = 0$, we predict a rebellion level o[f ...] If $z = 1$, we predict a rebellion level of $y \in \{1, 2,$ [...] according to stage two.

Table 5 Predictive allocations for stage one of th[e ...]

Estimate              Category
$\hat{\pi} < 0.5$     $z = 0$
$\hat{\pi} > 0.5$     $z = 1$
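As an illustration of how Equations (15), (16) and (18) turn linear predictors into probabilities, the following Python sketch inverts the logit link under the proportional odds constraint (separate intercepts, common slopes) and applies the 0.5 threshold of Table 5. It is a minimal sketch under stated assumptions: the function names are ours, and the intercepts, slopes and covariate values are hypothetical numbers chosen only for demonstration, not estimates from the fitted models.

```python
import math


def logistic(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(intercepts, slopes, x):
    """Category probabilities pi_0..pi_K under a proportional odds model.

    intercepts: increasing intercepts, one per cumulative logit
                (K values for K + 1 ordered categories).
    slopes:     slope coefficients shared by every cumulative logit.
    x:          explanatory variables for one group.
    """
    eta = sum(b * v for b, v in zip(slopes, x))
    # Cumulative probabilities omega_i; the last one equals 1 by definition.
    cumulative = [logistic(a + eta) for a in intercepts] + [1.0]
    # Successive differences recover the individual category probabilities.
    probs = [cumulative[0]]
    probs += [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]
    return probs


def stage_one_allocation(pi_hat):
    """Table 5 rule: allocate z = 1 when the fitted probability exceeds 0.5.

    Table 5 does not state what happens at exactly 0.5; assigning z = 0
    in that case is an assumption of this sketch.
    """
    return 1 if pi_hat > 0.5 else 0


# Hypothetical coefficients and covariates, for illustration only.
HYPOTHETICAL_INTERCEPTS = [-0.5, 0.2, 0.6, 1.0, 1.5, 2.1, 3.0]
HYPOTHETICAL_SLOPES = [0.4, -0.3]
HYPOTHETICAL_X = [1.0, 2.0]

if __name__ == "__main__":
    probs = category_probabilities(
        HYPOTHETICAL_INTERCEPTS, HYPOTHETICAL_SLOPES, HYPOTHETICAL_X
    )
    for level, p in enumerate(probs):
        print(f"P(Y = {level}) = {p:.3f}")
    print("stage one allocation:", stage_one_allocation(logistic(0.3)))
```

The intercepts must be increasing for the cumulative probabilities to be ordered, which is what keeps every category probability non-negative.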

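The accuracy measures of Equations (4) to (6) can be checked directly against the cross-tabulations in Tables 3 and 4. The Python sketch below is one way to do this, assuming NumPy is available; the function and variable names are ours, and the two arrays are transcribed from the tables (rows are observed levels, columns are predicted levels).

```python
import numpy as np

# Cross-tabulations transcribed from Tables 3 and 4
# (rows: observed level 0..7, columns: predicted level 0..7).
TABLE_3_GLM = np.array([
    [16, 9, 2, 1, 0, 0, 0, 0],
    [1, 2, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
])
TABLE_4_OLR = np.array([
    [20, 3, 0, 0, 3, 0, 0, 2],
    [2, 2, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
])

MAX_DISTANCE = 7  # largest possible |predicted - observed| on the 0..7 scale


def mean_loss(table, phi):
    """Sample mean power-law loss of Equation (4) from a cross-tabulation.

    phi = 0 is treated as the binary loss of Equation (3).
    """
    levels = np.arange(table.shape[0])
    distance = np.abs(levels[None, :] - levels[:, None])
    if phi == 0:
        loss = (distance > 0).astype(float)
    else:
        loss = (distance / MAX_DISTANCE) ** phi
    return float((table * loss).sum() / table.sum())


def theta_hat(table):
    """Estimated proportion of correct predictions, Equation (5)."""
    return 1.0 - mean_loss(table, 0)


def delta_hat(table):
    """Estimated mean absolute difference, Equation (6)."""
    return mean_loss(table, 1) * MAX_DISTANCE


if __name__ == "__main__":
    for name, table in [("GLM", TABLE_3_GLM), ("OLR", TABLE_4_OLR)]:
        print(name, round(theta_hat(table), 2), round(delta_hat(table), 2))
```

Working from the cross-tabulation rather than from the raw prediction vectors is a convenience of this sketch; both give the same sample means because each cell simply counts how many cases share one (observed, predicted) pair.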

Stage two

The second stage of this hierarchical model predicts the level of rebellion $y = 1, 2, \ldots, 7$ for each of the observations with a fitted response of $\hat{z} = 1$ from the first stage. Except for having one fewer category, it takes the same form as the ordinal logistic regression model described above. As before, we impose constraints on the regression coefficients that correspond with a proportional odds model. In order to proceed, we must re-define the probabilities of Equation (10) in the form

$$\pi_i = P(Z_i = 1 \mid Z = 1) = P(Y = i \mid Y \ne 0) \tag{19}$$

for $i = 1, 2, \ldots, 7$. We can then evaluate estimates for the marginal probabilities of rebellion level membership using the law of total probability and the multiplication law

$$P(Y = i) = P(Y = i \cap Y = 0) + P(Y = i \cap Y \ne 0) = 0 + P(Y = i \mid Y \ne 0)\, P(Y \ne 0) = \pi_i \pi \approx \hat{\pi}_i \hat{\pi} \tag{20}$$

by estimating the parameter vectors in Equation (18) and the equivalent of Equation (15). Hence, for each of the groups in stage two, we multiply the corresponding estimated probability from the binary model in stage one by the corresponding estimated level probability from the ordinal model in stage two, to estimate an overall probability for each of the seven positive rebellion levels. This second stage model does not predict well in isolation, because few data are available to fit the model and even fewer to assess the model's performance. Indeed, it only predicts correctly three out of the eight groups passed on from the first stage. These three all correspond to rebellion level $y = 1$, which suggests poor predictive power for other categories.

Overall results for hierarchical model

However, the important measures of predictive accuracy relate to the aggregate performance of this hierarchical model based on stages one and two, rather than on the individual contributions of the two stages. Overall, the two-stage model predicts rebellions fairly well, with the best accuracy so far of $\hat{\theta} \approx 0.60$ or 60%, corresponding to 26 out of the 43 cases. On further inspection, we note that it predicts no rebellion much better than it predicts a positive rebellion, as illustrated in Table 6. The sample mean absolute difference between all observed and predicted values is $\hat{\delta} \approx 1.75$, which is worse than for the general linear model and ordinal logistic regression. Again, it is clear that there are some very poor predictions.

Table 6 Predictive accuracy of hierarchical logistic regression model

Observed   Predicted                        Total
            0   1   2   3   4   5   6   7
0          23   5   0   0   0   0   0   0     28
1           2   3   0   0   0   0   0   1      6
2           0   1   0   0   0   0   0   0      1
3           1   0   0   0   0   0   0   0      1
4           0   1   0   0   0   0   0   1      2
5           2   0   0   0   0   0   0   0      2
6           1   0   0   0   0   0   0   1      2
7           1   0   0   0   0   0   0   0      1

Continuous neural network (CoNN)

Lee (2004) describes this general approach, dividing continuous output variables into bins, in order to categorize them in a similar manner to our structure in Table 2. The procedure defines and models explanatory variables as quantitative or qualitative using the same assumptions as for the statistical models conside[red ...] tion for the in[...] nodes to the o[...] bounded outpu[t ...] we have here. This activation function takes the form

$$g(z) = \frac{1}{1 + \exp(-z)} \tag{21}$$

The whole process of fitting a model to data and then testing its ability to predict is much quicker and simpler with neural network software, as it splits the data into training and test sets and makes predictions automatically. However, as this network yields continuous results, we categorize these outputs manually. We opt for a network with continuous outputs in order to preserve the ordinal scale of measurement, whereas a network with discrete outputs only provides a nominal scale of measurement and so ignores the natural ordering of categories.

We present the results from the predictive test set in Table 7. There is one more observation of rebellion level 0 than for the statistical models and one fewer of rebellion level 5, due to the way the neural network software quasi-randomly splits the data. However, the numbers of observations in the test set categories are sufficiently similar to those used by the statistical models, so that the two broad approaches are directly comparable.

Table 7 Predict[...]

Observed   Predicted                        Total
            0   1   2   3   4   5   6   7
0          28   1   0   0   0   0   0   0     29
1           6   0   0   0   0   0   0   0      6
2           1   0   0   0   0   0   0   0      1
3           1   0   0   0   0   0   0   0      1
4           1   0   0   1   0   0   0   0      2
5           0   0   0   0   0   0   1   0      1
6           1   0   0   0   1   0   0   0      2
7           0   0   0   0   0   1   0   0      1

The predictions are very one sided: the continuous neural network correctly predicts all but one of the level 0 rebellion outputs but does not correctly predict any observation in


the other seven rebellion categories. We therefore conclude that this type of model and analysis might be accurate for predicting instances of no rebellion, but not at all useful for predicting the actual levels of true rebellions. Overall, the continuous neural network has $\theta \approx 0.65$, corresponding to a 65% success rate at predicting, with little observed spread over the different categories. Although it is not able to predict any of the positive rebellions correctly, it does have an average absolute distance error of $\delta \approx 0.65$, which is much less than those of the statistical models. Hence, although the predictions are not necessarily correct, they are not too far off the true classification. However, it is important to note that simply by predicting all cases as rebellion level 0, we would observe even better results here: $\theta \approx 0.67$ and $\delta \approx 0.49$. This is because, borrowing medical terminology, the specificity of this diagnostic test is very high at the expense of very low sensitivity (power to detect likely rebellions).

Categorical neural network (CaNN)

We selected this approach as a suitable analysis to take account of the fact that the rebellion output is categorical. However, we cannot specify this discrete output as ordinal in the neural network software available, so we have to treat it as though it were a nominal response in the terminology of statistical modelling. Naturally, this ignores some information that might reduce the power of our analysis. This model has the same logistic activation function for the input to the hidden nodes and from the hidden nodes to the output nodes, as the one we used for the continuous model and defined in Equation (21). However, it differs because we now measure rebellion level on a nominal scale rather than on a ratio scale.

As for the continuous neural network, the software automatically splits the data set into training, validation and test sets, and makes predictions automatically after training. Since we must specify the output as unordered categorical, we need perform no manual adjustments to map continuous outputs onto particular rebellion levels. The method of modelling the data and testing their predictive ability is thus straightforward and we can evaluate results very quickly. For our application, Table 8 demonstrates that this model predicts rebellions better than do any of the other models, as it successfully classifies 73% of the test groups with $\theta \approx 0.73$. Moreover, this approach predicts rebellions more accurately over a greater range of categories, in terms of overall allocations. This is reflected in terms of the average absolute distance error for this analysis, which at $\delta \approx 0.51$ is the least of all the models we consider.

Comparison of results based on cost implications

When choosing the best model for classification, it is important to consider all of the available measures. As well as general errors of prediction, such as the simple forms defined by Equations (1) and (4), one should consider the specific costs associated with inaccurate predictions by the models. In order to analyse our results and put them in context, we attempt to derive a hypothetical scale to represent better the costs associated with under-prediction and over-prediction of rebellion categories.

We now propose a scale for under-prediction in terms of lives lost. If the scale of a rebellion has been under-estimated, there will be insufficient measures put in place to prevent the uprising. Such measures might take the form of peacekeeping interventions to prevent loss of life. Consequently, there will most likely be more deaths than our models forecast. Although our scale is hypothetical, we use data pertaining to intra-state rebellions from the Correlates of War research project website to ensure that the figures used are realistic. This information enables us to associate specific numbers of deaths $d_i$ to particular types of rebellion $i$. The number of deaths takes into account loss of life from all sides involved in the rebellion. Table 9 presents the scale we propose for under-prediction.

We express our corresponding scale for over-prediction in terms of billions of US dollars cost. In practice, if we predict a particular level of rebellion $i$, then the national or international community spends a certain amount of money $c_i$ on measures to counteract it. If we predict a rebellion level too highly, then the community will waste money tackling a lesser problem than anticipated. This makes the reasonable assumption that higher levels of rebellion require more money to implement counteractive procedures. In order to make our scale realistic, we base the figures loosely on US defence spending on foreign military aid and international peacekeeping, as extracted from the Global Issues World Military Spending

Table 8 Predictive accuracy of categorical neural network

Observed                     Predicted                     Total
             0     1     2     3     4     5     6     7
0           26     3     0     0     0     0     0     0      29
1            2     1     0     0     0     0     1     0       4
2            1     0     0     0     0     0     0     0       1
3            1     0     1     0     0     0     0     0       2
4            1     0     0     0     0     0     0     0       1
5            1     0     0     0     0     1     0     0       2
6            0     0     0     0     0     0     2     0       2
7            0     0     0     0     0     0     0     0       0

Table 9 Number [...]

Rebellion level i    Number of deaths d_i
0                          0
1                       5000
2                     15 000
3                     83 000
4                     95 000
5                    125 000
6                    150 000
7                    210 000
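As a worked check on these tables, the following minimal sketch recomputes three summaries for the categorical neural network from the counts in Table 8: the proportion of correct classifications, the mean number of deaths per group due to under-prediction and the mean cost per group due to over-prediction. It rests on assumptions that are ours rather than stated procedure: rows of Table 8 are observed levels and columns are predicted levels; an under-predicted group contributes the difference in deaths between its observed and predicted levels (Table 9); an over-predicted group contributes the difference in cost between its predicted and observed levels (the scale of Table 10); and both totals are averaged over all test groups. The function and variable names are illustrative only.

```python
# Table 8 counts: rows are observed rebellion levels 0-7, columns are predicted levels 0-7.
TABLE8 = [
    [26, 3, 0, 0, 0, 0, 0, 0],
    [2, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
DEATHS = [0, 5000, 15000, 83000, 95000, 125000, 150000, 210000]  # d_i, Table 9
COSTS = [0.0, 0.1, 0.2, 0.5, 0.7, 0.8, 1.0, 2.0]  # c_i in US $ billions, Table 10


def total_groups(cm):
    """Number of test groups in the confusion matrix."""
    return sum(sum(row) for row in cm)


def proportion_correct(cm):
    """Share of groups on the diagonal (observed level equals predicted level)."""
    return sum(cm[i][i] for i in range(len(cm))) / total_groups(cm)


def mean_under_prediction_deaths(cm, deaths):
    """Assumed loss: d_observed - d_predicted for each group predicted too low."""
    loss = sum(
        count * (deaths[obs] - deaths[pred])
        for obs, row in enumerate(cm)
        for pred, count in enumerate(row)
        if pred < obs
    )
    return loss / total_groups(cm)


def mean_over_prediction_cost(cm, costs):
    """Assumed loss: c_predicted - c_observed for each group predicted too high."""
    loss = sum(
        count * (costs[pred] - costs[obs])
        for obs, row in enumerate(cm)
        for pred, count in enumerate(row)
        if pred > obs
    )
    return loss / total_groups(cm)


print(f"proportion correct: {proportion_correct(TABLE8):.2f}")
print(f"mean deaths per group: {mean_under_prediction_deaths(TABLE8, DEATHS):.0f}")
print(f"mean cost per group: {mean_over_prediction_cost(TABLE8, COSTS):.3f}")
```

Under these assumptions the sketch gives a proportion correct of 0.73, about 9659 deaths per group and about 0.029 billion US dollars per group, which agree with the figures quoted for the categorical neural network in the text and in Table 11.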


research project website. The reason for [...] this nation accounts for the vast majority [...] worldwide, actually taking a 43% share [...] (Hellman, 2007). Table 10 presents the [...] over-prediction.

Table 10 Amount of money spent to counteract different rebellion categories

Rebellion level i    US $ billions c_i
0                    0.0
1                    0.1
2                    0.2
3                    0.5
4                    0.7
5                    0.8
6                    1.0
7                    2.0

For all five models, we calculate the [...] of deaths per group due to under-prediction [...] mean cost per group due to over-prediction [...] the consequence of our models' predictions [...] to help choose among these models. For [...] predicts a level 3 rebellion but the actual [...] 4, then this corresponds with an under-prediction [...] about $d_4 - d_3 = 12\,000$ deaths. Similarly [...] a level 4 rebellion but the actual rebellion [...] this corresponds with an over-prediction [...] of about $c_4 - c_3 = 0.2$ billion US dollars [...] two average losses associated with an [...] the sample mean number of deaths [...] and the sample mean cost due to over-prediction [...].

Table 11 Av[...]

Model    Number of deaths    Cost (US $ billions)
GLM            5907               0.165
OLR          10 767               0.274
HLR          18 674               0.109
CoNN         12 209               0.007
CaNN           9659               0.029

Table 11 illustrates that there is substantial [...] the results of our five analyses. The m[...] has the smallest average number of deaths [...] has a tendency to over-predict the true [...]. The two-stage model (hierarchical logistic [...] largest average number of deaths (18 674 [...] its large distance error and poor predictive [...] of the model. The other three models [...] around 10 000 for this measure. Regarding [...] of over-prediction in US $ billions, the [...] the lowest figures (0.007 [...] continuous [...] because they have a tendency to under-predict [...] rebellion categories. The statistical models [...] of around 0.2 billion US dollars.

It is inappropriate to compare numbers of deaths with fiscal costs, because this would necessitate inhumanely assigning finite values to people's lives, so we consider these losses separately. Taking all of the above measures into account (percentage correct, spread, average distance error, over-predictive costs and under-predictive deaths), the categorical neural network seems to be a good choice. It has a large proportion of successful predictions, a small linear distance measure, a good spread of output values, relatively small numbers of deaths due to under-prediction and relatively small costs due to over-prediction. Therefore, we conclude that the categorical neural network is a good model for predicting levels of conflict.

Bayesian inference

This approach to statistical inference will likely prove highly beneficial in conflict analysis because data are relatively scarce, whereas subjective knowledge is plentiful. Moreover, as evident from the last section, the framework is one of decision analysis and Berger (1985) indicates that the optimal approach for this purpose in the presence of stochastic uncertainty makes use of these concepts. We now consider the feasibility of implementing a Bayesian analysis of the neural networks in order to take into account all available information so that our predictions and decisions are as accurate as they can be, though we reserve application of these ideas for future research.

First, consider a continuous neural network. According to Lee (2004), we specify the single hidden layer feed-forward network used here as

$$ y_i = \beta_0 + \sum_{j=1}^{k} \frac{\beta_j}{1 + \exp\{-(\gamma_{j0} + \sum_{h=1}^{r} \gamma_{jh} x_{ih})\}} + \varepsilon_i \tag{22} $$

where $y_i$ is our output of interest, $k$ is the number of hidden nodes and $r$ is the number of input nodes. The unknown parameters are $\beta_0$, $\beta_j$, $\gamma_{j0}$ and $\gamma_{jh}$ for $j = 1, 2, \ldots, k$ and $h = 1, 2, \ldots, r$. The model also contains residual error terms denoted by $\varepsilon_i$, which we assume to satisfy

$$ \varepsilon_i \sim N(0, \sigma^2), \quad i = 0, 1, \ldots \tag{23} $$

Lee (2004) also extends this [...] in which case we regard $y_i$ as

$$ y_i = (y_{i1}, y_{i2}, \ldots, y_{iq})' \tag{24} $$

This effectively adds an extra dimension so that the outputs $y_{ig}$ have the broader specification

$$ y_{ig} = \beta_{0g} + \sum_{j=1}^{k} \frac{\beta_{jg}}{1 + \exp\{-(\gamma_{j0} + \sum_{h=1}^{r} \gamma_{jh} x_{ih})\}} + \varepsilon_{ig} \tag{25} $$

for $g = 1, 2, \ldots, q$, where

$$ \varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iq})' \sim N(0, \Sigma) \tag{26} $$
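To make the structure of Equation (25) concrete, the following minimal sketch evaluates its deterministic part (everything except the error term) for a single case. It assumes the logistic hidden-node activation written in that equation; the function names and the numerical parameter values are hypothetical and chosen only so that the arithmetic is easy to follow. With a single output ($q = 1$) the same calculation reduces to the mean of Equation (22).

```python
import math


def logistic(z):
    """Logistic activation 1 / (1 + exp(-z)) used for the hidden nodes."""
    return 1.0 / (1.0 + math.exp(-z))


def network_mean(x, beta0, beta, gamma0, gamma):
    """Deterministic part of Equation (25) for one case.

    x      : r input values x_ih
    beta0  : q output intercepts beta_0g
    beta   : k rows of q output weights beta_jg
    gamma0 : k hidden-node intercepts gamma_j0
    gamma  : k rows of r input weights gamma_jh
    """
    # Hidden-node values: logistic of gamma_j0 + sum_h gamma_jh * x_ih.
    hidden = [
        logistic(g0 + sum(w * xi for w, xi in zip(weights, x)))
        for g0, weights in zip(gamma0, gamma)
    ]
    # Output g: beta_0g + sum_j beta_jg * hidden_j, sharing the same hidden nodes.
    return [
        b0 + sum(beta[j][g] * hidden[j] for j in range(len(hidden)))
        for g, b0 in enumerate(beta0)
    ]


# Hypothetical example with r = 2 inputs, k = 2 hidden nodes and q = 2 outputs.
# With all gamma values zero, each hidden node equals logistic(0) = 0.5.
outputs = network_mean(
    x=[0.3, -1.2],
    beta0=[1.0, -1.0],
    beta=[[2.0, 0.0], [0.0, 4.0]],
    gamma0=[0.0, 0.0],
    gamma=[[0.0, 0.0], [0.0, 0.0]],
)
print(outputs)
```

Every output in this sketch is built from the same hidden-node values, so the $\gamma$ parameters are shared across the $q$ outputs while each output has its own $\beta$ coefficients.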


We generally assume the special case where $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_q^2)$, though this does not correspond with fitting $q$ independent univariate models, because of the common regression coefficients in Equation (25).

Prior distribution

For objectivity, convenience and illustration, we adopt the invariant prior (Jeffreys, 1998) as a suitable non-informative prior. Unlike other non-informative priors, such as the uniform distribution, this prior is invariant to transformations of the parameters. For our categorical neural network, there are two groups of unknowns: the $\beta_{jg}$ parameters and the $\gamma_{jh}$ parameters for $j = 0, 1, \ldots, k$, $g = 1, 2, \ldots, q$ and $h = 0, 1, \ldots, r$, which we summarize in corresponding parameter vectors $\beta$ and $\gamma$.

From Equations (25) and (26), we have a multivariate normal model for the probability distribution $f(y \mid X, \beta, \gamma)$ of a random vector $y$ with an observed matrix $X$ of predictor variables, in terms of unknown parameter vectors $\beta$ and $\gamma$. For this scenario, Jeffreys' invariant prior has the form

$$ g(\beta, \gamma) \propto |I(\beta, \gamma)|^{1/2} \tag{27} $$

where

$$ I(\beta, \gamma) = -E_{y \mid X, \beta, \gamma}\left\{ \frac{\partial^2}{\partial(\beta, \gamma)^2} \log f(y \mid X, \beta, \gamma) \right\} \tag{28} $$

is Fisher's expected information matrix. However, for stability reasons, both Jeffreys (1998) and Lee (2004) recommend calculating prior distributions separately for each parameter and assuming prior independence. Thus, our approach is to determine prior distributions of the form in Equation (27) for each of the parameters in our network and multiply them together to form a joint distribution for the network model, which corresponds to the independence Jeffreys prior.

Posterior distribution

Having established a joint prior distribution, the posterior probability density function for the parameter vectors $\beta$ and $\gamma$ takes the form

$$ g(\beta, \gamma \mid D) \propto L(\beta, \gamma; D)\, g(\beta, \gamma) \tag{29} $$

in terms of the likelihood function

$$ L(\beta, \gamma; D) \propto \prod f(y \mid X, \beta, \gamma) \tag{30} $$

where the product is over all cases in the random sample of training data $D$. Although this posterior is not in explicit form, as it is only determined to a constant of proportionality, it is adequate for analytic purposes.

Lee (2004) explains how we can use this continuous output approach for categorical outputs by exponentiating and normalizing the outputs using the standardizing transformation

$$ p_{ig} = \frac{\exp(w_{ig})}{\sum_{l=0}^{n} \exp(w_{il})} \tag{31} $$

where $w_{ig}$ are the continuous output predictions of the neural network and $p_{ig}$ are the corresponding probabilities that corresponding observations belong in rebellion level $g$, for $g = 0, 1, \ldots, n$. Lee (2004) further explains that this approach retains the original normal model for classification, by assuming latent variables for each rebellion level response. We can use the continuous outputs of the neural network from Equation (31) to assign a response prediction, by selecting the level with the largest output value. Hence, we now have a posterior distribution for the parameters of our neural network model, which we can use to generate probabilities of error. In turn, this allows us to model the uncertainty so we can generate predictions that are more accurate.

Conclusions

The results of our earlier empirical analyses are summarized concisely in Table 12, which presents the percentages of true negatives and false positives, for scenarios with rebellion level 0, and the percentages of true positives and false negatives, for scenarios with rebellion level in the range one to seven. We also display this information in the receiver operating characteristic curve of Figure 1, where specificity and sensitivity refer to the estimated probabilities of a true negative and a true positive, respectively. An ideal classifier for this binary setting would attain the point (0,1) on this graph and the Euclidean distances from this ideal for our models are: GLM (0.43); OLR (0.49); HLR (0.50); CoNN (0.72); CaNN (0.51). This initially suggests that the generalized linear model is accurate, whereas the continuous neural network is not. However, this simplistic analysis involves only a small sample and binary classifications.

Extending the existing binary methodology, our objective was to model a data set comprising an ordinal response assessment of rebellion and some related explanatory variables, using a variety of appropriate statistical models and neural networks. The purpose is to assess which approach performs most accurately for predicting potential rebellion uprisings. Our investigations demonstrate that the categorical output neural network performs well according to several performance measures. This result agrees with conjectures of other researchers in the field of statistical conflict analysis, that neural networks offer a suitable approach for analysing this

Table 12 Percentages of true and false classifications for rebellion levels 0 and 1-7

Model      Rebellion level 0              Rebellion level 1-7
           True         False             True         False
           negative     positive          positive     negative
GLM        57           43                93            7
OLR        71           29                60           40
HLR        82           18                53           47
CoNN       97            3                29           71
CaNN       90           10                50           50
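The Euclidean distances from the ideal point (0,1) quoted in the Conclusions can be approximately recovered from Table 12. The sketch below assumes that each model is plotted at (1 - specificity, sensitivity), with specificity taken as the true negative percentage for rebellion level 0 and sensitivity as the true positive percentage for levels 1-7; the names are illustrative. Because Table 12 reports rounded percentages, the recomputed distances need not match the quoted values exactly.

```python
import math

# Table 12: (true negative %, true positive %) for each model.
TABLE12 = {
    "GLM": (57, 93),
    "OLR": (71, 60),
    "HLR": (82, 53),
    "CoNN": (97, 29),
    "CaNN": (90, 50),
}


def distance_from_ideal(specificity_pct, sensitivity_pct):
    """Distance from (1 - specificity, sensitivity) to the ideal ROC point (0, 1)."""
    false_positive_rate = 1.0 - specificity_pct / 100.0
    false_negative_rate = 1.0 - sensitivity_pct / 100.0
    return math.hypot(false_positive_rate, false_negative_rate)


distances = {model: distance_from_ideal(*rates) for model, rates in TABLE12.items()}

# List the models from closest to furthest from the ideal classifier.
for model, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{model}: {dist:.3f}")
```

With these rounded inputs every recomputed distance lies within 0.01 of the value quoted in the text, and the ordering of the models is the same, with GLM closest to the ideal point and CoNN furthest from it.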


sort of complex and inte[...]

Others consider the 'black box' nature of neural networks to be a drawback, as we cannot entirely specify the model used, as discussed in Verbruggen and Babuska (1999). However, the purpose of these models, to be used in practice to predict levels of rebellion, including application by those out in the field, would require ease of use, to generate reliable predictions quickly. Therefore, the neural network does seem appropriate here.

[Figure 1: horizontal axis 1 - specificity, from 0.0 to 1.0; vertical axis from 0.0 to 1.0; recoverable point labels: GLM, CaNN, CoNN]

Figure 1 Receiver operating characteristic curve to assess binary classification accuracy.

Our study offers a direct comparison among competing approaches, using the same data set and basic assumptions about the data, whereas most other published work concentrates on either statistical models or neural networks; Cheng and Titterington (1994) discuss the similarities between these approaches. This study has also extended previous models by introducing a more detailed ordinal scale, rather than a simple binary response of rebellion or no rebellion. This is more relevant to the real-world situation, as it takes several forms of rebellion into account and so predicts critical events more accurately. This would help to put in place the most appropriate, and therefore the most effective, countermeasures for a potential rebellion.

We also offer some initial ideas about how to apply Bayesian methodology to our preferred model, the categorical neural network, in order to calculate posterior probabilities of conflict and so improve further upon the accuracy of predictions. These will form the basis of further research to implement this approach, as the neural network software available to us does not provide sufficient information to compute the required matrices for analysis. Often, analysts in the field must respond to events and requests not only within a certain (small) timeframe, but also with whatever data are available to them. Therefore, a Bayesian framework for the model would greatly aid conflict analysis in such a situation.

The threat of conflict posed by rebel groups seems to be continuing into the foreseeable future, with no permanent end in sight for many of the ongoing conflicts (examples of which were described in the Introduction). Quantitative methods are needed to produce easily assessed evidence, with relative speed. It is therefore of ongoing importance to investigate and produce methods to address and counter this threat.

References

Beck N, King G and Zeng L (1998). The problem with quantitative studies of international conflict. The Pennsylvania State University CiteSeer Archives.
Beck N, King G and Zeng L (2000). Improving quantitative studies of international conflict: A conjecture. Am Pol Sci Rev 94: 21-36.
Berger JO (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag: New York.
Cheldelin SI, Druckman D and Fast L (2003). Conflict: from Analysis to Intervention. Continuum International Publishing Group: London.
Cheng B and Titterington DM (1994). Neural networks: a review from a statistical perspective. Statist Sci 9: 2-54.
Correlates of War research project, http://www.correlatesofwar.org/, accessed 17 November 2008.
Donohue WA (2007). Review essay - methods, milestones, and models: State of the art in conflict analysis research. Negotiat J 23: 487-497.
Gass N (1994). Conflict analysis in the politico-military environment of a new world order. J Opl Res Soc 45: 133-142.
Global Issues World Military Spending research project. http://www.globalissues.org/Geopolitics/ArmsTrade/Spending, accessed 17 November 2008.
Gurr TR (2000). Peoples Versus States: Minorities at Risk in the New Century. United States Institute for Peace: Washington.
Gurr TR and Moore WH (1997). Ethnopolitical rebellion: a cross-sectional analysis of the 1980s with risk assessments for the 1990s. Amer J Pol Sci 41: 1079-1103.
Hellman C (2007). U.S. Military Spending vs. the World. Center for Arms Control and Non-Proliferation, Washington.
Jeffreys H (1998). Theory of Probability. Oxford University Press: Oxford.
Lee HK (2004). Bayesian nonparametrics via neural networks. American Statistical Association and Society for Industrial and Applied Mathematics: Philadelphia.
McCullagh P and Nelder JA (1989). Generalized Linear Models. Chapman and Hall: London.
Minorities at Risk research project (2008). http://www.cidcm.umd.edu/mar, accessed 17 November 2008.
Parwez S (2006). An empirical analysis of the conflict in Nepal. N.R.M. Working paper series 7, Asian Development Bank.
Smith A (1996). A Bayesian method for the analysis of dyadic crisis data. Presented at Annual Meeting for the Peace Science Society, Houston.
Verbruggen HB and Babuska R (1999). Fuzzy Logic Control: Advances in Applications. World Scientific: Singapore.
Wheeler NJ (2000). Saving Strangers: Humanitarian Intervention in International Society. Oxford University Press: Oxford.

Received July 2008; accepted November 2008 after one revision
