
© 1999 Kluwer Academic Publishers. Manufactured in The Netherlands.

PETRI MYLLYMÄKI

Complex Systems Computation Group (CoSCo), Department of Computer Science, P.O. Box 26, FIN-00014,

University of Helsinki, Finland

Petri.Myllymaki@cs.Helsinki.FI, http://www.cs.Helsinki.FI/∼myllymak/

Abstract. We present a method for mapping a given Bayesian network to a Boltzmann machine architecture, in the sense that the updating process of the resulting Boltzmann machine model provably converges to a state which can be mapped back to a maximum a posteriori (MAP) probability state in the probability distribution represented by the Bayesian network. The Boltzmann machine model can be implemented efficiently on massively parallel hardware, since the resulting structure can be divided into two separate clusters where all the nodes in one cluster can be updated simultaneously. This means that the proposed mapping can be used for providing Bayesian network models with a massively parallel probabilistic reasoning module, capable of finding the MAP states in a computationally efficient manner. From the neural network point of view, the mapping from a Bayesian network to a Boltzmann machine can be seen as a method for automatically determining the structure and the connection weights of a Boltzmann machine by incorporating high-level, probabilistic information directly into the neural network architecture, without recourse to a time-consuming and unreliable learning process.

1. Introduction

Neural networks are massively parallel computational models consisting of a large number of very simple processing units (for a survey of neural models, see e.g. the classic collections [1–4]). These models can perform certain computational tasks extremely fast when run on customized parallel hardware, and hence they have been suggested as a computationally efficient tool for solving NP-hard optimization problems approximatively [5, 6]. Especially suitable for this type of task are stochastic neural network architectures, as these models are based on a stochastic updating process which converges to a state maximizing an objective function determined by the variable states of the binary nodes of the network, and by the (constant) parameters ("weights") attached to the connecting arcs. A well-known example of a stochastic neural network architecture is the Boltzmann machine (BM) model [7, 8].

Boltzmann machines can be used as a computationally efficient tool for finding maximum points of the objective function determined by a given network structure, provided that suitable massively parallel hardware is available. In order to apply these models for solving combinatorial optimization problems, we also need an efficient, theoretically justifiable method for constructing a neural network structure where the maximum point of the objective function corresponds to the solution of the optimization problem to be solved. However, Boltzmann machines, and neural network models in general, are typically constructed by inefficient, partly quite ad hoc methods. In particular, the neural network architecture is usually selected in a more or less arbitrary manner by fixing the number of nodes and the connections between the nodes manually. The weights are then determined by a learning algorithm which first assigns (randomly chosen) initial weights to the connections, and then uses some gradient-based greedy algorithm for changing the weights until the behavior of the network seems to be consistent with a sample of training data [8–10].

32 Myllymäki

There are, however, several serious pitfalls with this approach, which relate to the "black box" nature of the functioning of the resulting models: normally there is no way of finding out what kind of knowledge a neural network contains after the learning, nor is it possible to explain the behavior of the model. In particular, as the learning algorithms start with a randomly chosen initial state, "tabula rasa", they are unable to use any prior knowledge of the problem environment, although in many cases this kind of information would be readily available. This results in a very slow and unreliable learning process. Besides, as most of the learning algorithms are "steepest-descent" type greedy algorithms, they are very likely to get stuck in local maximum points.

It follows that although Boltzmann machines provide us with an efficient computational mechanism for finding maximum points of the objective function, finding a Boltzmann machine structure with a suitable objective function can be very difficult in practice. In this paper we present one possible solution to this problem, focusing on a combinatorial optimization problem of great practical importance: the problem of finding the maximum a posteriori (MAP) probability value assignment of a probability distribution on a set of discrete variables. More precisely, we show how a Bayesian (belief) network model [11–13], a graphical high-level representation of a probability distribution over a set of discrete variables, can be mapped to a Boltzmann machine architecture so that the Boltzmann machine objective function (the consensus function) has the same maximum points as the probability distribution represented by the Bayesian network model.

Intuitively speaking, a Bayesian network model of a probability distribution is constructed by explicitly determining all the direct (causal) dependencies between the random variables of the problem domain: each node in a Bayesian network represents one of the random variables of the problem domain, and the arcs between the nodes represent the direct dependencies between the corresponding variables. In addition, each node has to be provided with a table of conditional probabilities, where the variable in question is conditioned by its predecessors in the network.

The importance of Bayesian network representations lies in the way such a structure can be used as a compact representation for many naturally occurring distributions, where the dependencies between variables arise from a relatively sparse network of connections, resulting in relatively small conditional probability tables. In these cases, a Bayesian network representation of the problem domain probability distribution can be constructed efficiently and reliably, assuming that appropriate high-level expert domain knowledge is available. There exist also several interesting approaches for constructing Bayesian networks from sample data, and moreover, theoretically solid techniques for combining domain expert knowledge with the machine learning approach (see, e.g., [14]).

The Bayesian network theory offers a framework for constructing algorithms for different probabilistic reasoning tasks (for a survey of probabilistic reasoning algorithms, see [15]). In this paper, we focus on the problem of finding the MAP state in the domain modeled by a given Bayesian network. Unfortunately, for a general network structure, the MAP problem can be shown to be NP-hard [16, 17], which means that very probably it is not possible for any algorithm to solve this task (in the worst case) in polynomial time with respect to the size of the network. Consequently, recently there has been a growing interest in developing stochastic algorithms for solving the MAP problem, where instead of aiming at an accurate, deterministic solution, the goal is to find a good approximation for a given problem with high probability.

A prominent example of such stochastic methods is the simulated annealing algorithm [18–21]. This global optimization method can be used as the computational procedure for performing probabilistic reasoning on a Bayesian network, but the method can be excruciatingly slow in practice. The efficiency of simulated annealing depends crucially on a set of parameters which control the behavior of the algorithm. Elaborate techniques for optimizing these parameters can be found in [22–29]. An alternative approach for speeding up the simulated annealing method is to develop parallel variants of the algorithm. Techniques for implementing parallel forms of simulated annealing on conventional hardware can be found in [21, 30, 31]. On the other hand, as suggested in [21], since the Boltzmann machine updating process produces a stochastic process similar to simulated annealing, this neural network model offers an interesting possibility for an implementation on a massively parallel architecture.

In this paper we argue that we can achieve "the best of both worlds" by building a hybrid Bayesian-neural system, where the model construction module uses Bayesian network techniques, while the probabilistic reasoning module is implemented as a massively

Massively Parallel Probabilistic Reasoning 33

parallel Boltzmann machine. For constructing such a hybrid Bayesian-neural system, we need a way to map a given Bayesian network to a Boltzmann machine architecture, so that the consensus function of the resulting Boltzmann machine has the same maximum points as the probability measure corresponding to the original Bayesian network. Although in some restricted domains this kind of a transformation is fairly straightforward to construct [7, 32–35], the methods presented do not apply to general Bayesian network structures (see the discussion in [36]). In [37, 38] we presented a mapping from a binary-variable Bayesian network to a harmony network [39], which can be regarded as a special case of the Boltzmann machine architecture. Following the work presented in [36, 40, 41], in this paper we extend this framework to cases where the Bayesian network variables can have an arbitrary number of values, and the neural network module uses the standard Boltzmann machine structure.

Compared to other neural-symbolic hybrid systems (see e.g. [42–45]), the Bayesian-neural hybrid system suggested here has two clear advantages. First of all, the mathematical model behind the system is the theoretically sound framework of Bayesian reasoning, compared to the more or less heuristic models of most other hybrid systems (for our earlier, heuristic attempts towards a hybrid system, see [46–48]). Secondly, although some hybrid models provide theoretical justifications for the computations (see e.g. Shastri's system for evidential reasoning [49]), they may require fairly complicated and heterogeneous computing elements and control regimes, whereas the neural network model behind our Bayesian-neural system is structurally very simple and uniform, and conforms to an already existing family of neural architectures, the Boltzmann machines. In addition, the mapping presented here creates a two-way bridge between the symbolic and neural representations, which can be used to create a "real" modular hybrid system where two (or more) separate (neural or symbolic) inference modules work together.

The mapping described in this paper produces a Boltzmann machine updating process which corresponds in a sense to a simulated annealing process where all the Bayesian network variables are updated simultaneously. Consequently, with suitable massively parallel hardware, processing is quite efficient and becomes independent of the size or the structure of the Bayesian network. On the other hand, the BM updating process works in a state space much larger than the state space of the original Bayesian network, and in terms of accuracy in sampling the probability distribution, the BM process is only an approximation of a simulated annealing process on the Bayesian network. It is therefore an interesting question whether the speedup gained from parallelization compensates for the loss of accuracy in the stochastic process. In the empirical part of the paper, we study this question by using artificially generated test cases.

The paper is organized as follows. First, in Section 2 we give the formal definition of Bayesian network models. The probabilistic reasoning task studied in this paper, the MAP problem, is discussed in Section 3. In this section we also describe how simulated annealing can be used for solving this problem. The basic concepts of Boltzmann machine models are reviewed briefly in Section 4, and in Section 5 we describe the suggested mapping from a Bayesian network to a Boltzmann machine. In Section 6 we show results of simulation runs which demonstrate that the massively parallel SA scheme provided by the Boltzmann machine model clearly outperforms the corresponding traditional sequential SA scheme, provided that suitable massively parallel hardware is available.

2. Bayesian Networks

Let U denote a set of N discrete random variables, U = {U_1, ..., U_N}. We call this set our variable base. In the sequel, we use capital letters for denoting the actual variables, and small letters u_1, ..., u_N for denoting their values. The values of all the variables in the variable base form a configuration vector or a state vector \vec{u} = (u_1, ..., u_N), and all the M possible configuration vectors (\vec{u}_1, ..., \vec{u}_M) form our configuration space Ω. Hence our variable base U can also be regarded as a random variable \vec{U}, the values of which are the configuration vectors. Generally, if X ⊆ U is a set of variables, X = {X_1, ..., X_n}, by \vec{X} = \vec{x} we mean that \vec{x} is a vector (x_1, ..., x_n), and X_i = x_i for all i = 1, ..., n.

Let F denote the set of all the possible subsets of Ω, the set of events, and let P be a probability measure defined on Ω. The triple (Ω, F, P) now defines a joint probability distribution on our variable base U. Having fixed the configuration space Ω (and the set of events F), any probability distribution can be fully determined by giving its probability measure P, and hence we will in the sequel refer to a probability distribution by simply saying "the probability distribution is P".
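As a small illustration (a sketch of ours, not from the paper): for discrete variables, the configuration space Ω is simply the Cartesian product of the variables' value sets, so its size M grows exponentially in N.

```python
from itertools import product

# Variable base U with N = 3 binary variables; each configuration
# vector u = (u1, u2, u3) is one point of the configuration space.
value_sets = [(0, 1), (0, 1), (0, 1)]

# The configuration space Omega is the Cartesian product of the value sets.
omega = list(product(*value_sets))

print(len(omega))  # M = 2**3 = 8 configuration vectors
```

This exponential growth of M is exactly why the compact factored representation of the next paragraphs matters.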


Bayesian networks are models for storing and retrieving the configuration probabilities P{\vec{u}} in a compact and efficient manner. The theoretical foundations of such models were set up in [11, 12, 13, 50], where it was shown how probability distributions can be represented compactly by exploiting conditional independencies between the variables of U:

Definition 1. Let X, Y and Z be sets of variables. Then X is conditionally independent of Y, given Z, if

    P{ \vec{X} = \vec{x} \mid \vec{Y} = \vec{y}, \vec{Z} = \vec{z} } = P{ \vec{X} = \vec{x} \mid \vec{Z} = \vec{z} }

holds for all vectors \vec{x}, \vec{y}, \vec{z} such that P{ \vec{Y} = \vec{y}, \vec{Z} = \vec{z} } > 0.

Consequently, the variables in Z intercept any dependencies between the variables in X and the variables in Y: knowing the values of Z renders information about the values of Y irrelevant to determining the distribution of X. Using the concept of conditional independence we can now give the definition of Bayesian network models:

Definition 2. A Bayesian (belief) network (BN) representation for a probability distribution P on a set of discrete variables U = {U_1, ..., U_N} is a pair {G, P_G}, where G is a directed acyclic graph whose nodes correspond to the variables U = {U_1, ..., U_N}, and whose topology satisfies the following: each variable X ∈ U is conditionally independent of all its non-descendants in G, given its set of parents F_X, and no proper subset of F_X satisfies this condition. The second component P_G is a set consisting of the corresponding conditional probabilities P{X | F_X}.

As the parents of a node X can often be interpreted as direct causes of X, Bayesian networks are also sometimes referred to as causal networks, or, as the purpose is Bayesian reasoning, they are also called inference networks. In the field of decision theory, a model similar to Bayesian networks is known as influence diagrams [51]. Rigorous introductions to Bayesian network modeling can be found in [11, 13, 14, 52, 53].

The importance of Bayesian network structures lies in the way such networks facilitate computing the joint configuration probabilities P{\vec{u}} as a product of simple conditional probabilities:

    P{\vec{u}} = \prod_{i=1}^{N} P{ U_i = u_i \mid \bigwedge_{U_j \in F_{U_i}} U_j = u_j },    (1)

where F_{U_i} denotes the predecessors of variable U_i, and the conditional probabilities P{ U_i = u_i \mid \bigwedge_{U_j \in F_{U_i}} U_j = u_j } can be found in the set P_G. Consequently, having defined a set of conditional independencies in a graphical form as a Bayesian network structure G, we can use the conditional probabilities P_G to fully determine the underlying joint probability distribution. The number of parameters needed, m = |P_G|, depends on the density of the Bayesian network structure: m = \sum_i (|U_i| \prod_{U_j \in F_{U_i}} |U_j|). In many natural situations this number is much smaller than the size of the full configuration space. An example of a simple Bayesian network is given in Fig. 1.

Let C_i denote a set consisting of a variable U_i and all its predecessors in a Bayesian network G, C_i = {U_i} ∪ F_{U_i}. We call these N sets the cliques of G. For each clique C_i, we define a potential function V_i which maps a given state vector to a real number,

    V_i(\vec{u}) = \ln P{ U_i = u_i \mid \bigwedge_{U_j \in F_{U_i}} U_j = u_j }.    (2)

The value of a potential function V_i depends only on the values of the variables in set C_i. As has been noted on several occasions [33, 37, 38, 54–56], these cliques can be used to construct an undirected graphical Markov random field model of the probability distribution P, which means that the joint probability distribution (1) for a Bayesian network representation can also be expressed as a Gibbs distribution

    P{\vec{u}} = \frac{1}{Z} e^{-\sum_i (-V_i(\vec{u}))} = \frac{1}{Z} e^{\sum_i V_i(\vec{u})},    (3)

where the clique potential functions V_i are given in (2), and Z = 1.

3. The MAP Problem and Simulated Annealing

Let X ⊆ U be a set of variables, X = {X_1, ..., X_n}. By an event {\vec{X} = \vec{x}} we mean a subset of F which includes all the configurations \vec{u} ∈ Ω that are consistent with the assignment ⟨X_1 = x_1, ..., X_n = x_n⟩. Now assume


Figure 1. A simple Bayesian network structure with 7 binary variables U_1, ..., U_7, and all the conditional probabilities required for determining the corresponding probability distribution exactly. By "u_i" we mean here the value assignment ⟨U_i = 1⟩, and by "ū_i" the value assignment ⟨U_i = 0⟩.
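The factorization (1) and its Gibbs form (3) are straightforward to evaluate directly. The following sketch is ours, not the paper's: it uses an invented three-variable chain U1 → U2 → U3 with made-up conditional probability values, purely for illustration.

```python
import math

# A hypothetical 3-variable chain network U1 -> U2 -> U3 over binary
# variables (the probability values below are invented for illustration).
# cpt[i] maps the tuple of predecessor values to P{Ui = 1 | predecessors}.
parents = {1: (), 2: (1,), 3: (2,)}
cpt = {
    1: {(): 0.6},
    2: {(0,): 0.3, (1,): 0.8},
    3: {(0,): 0.5, (1,): 0.1},
}

def joint(u):
    """Eq. (1): P{u} as a product of the conditional probabilities."""
    p = 1.0
    for i in (1, 2, 3):
        pa = tuple(u[j - 1] for j in parents[i])
        p1 = cpt[i][pa]
        p *= p1 if u[i - 1] == 1 else 1.0 - p1
    return p

def joint_gibbs(u):
    """Eq. (3): the same probability as a Gibbs distribution with Z = 1,
    using the clique potentials V_i = log of the corresponding factor."""
    v = 0.0
    for i in (1, 2, 3):
        pa = tuple(u[j - 1] for j in parents[i])
        p1 = cpt[i][pa]
        v += math.log(p1 if u[i - 1] == 1 else 1.0 - p1)
    return math.exp(v)

# P{U1=1} * P{U2=0 | U1=1} * P{U3=1 | U2=0} = 0.6 * 0.2 * 0.5
p = joint((1, 0, 1))
```

Summing `joint` over all eight configurations gives 1, and `joint` and `joint_gibbs` agree everywhere, which is the content of the equivalence between (1) and (3).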

we are given a partial value assignment ⟨\vec{E} = \vec{e}⟩ on a set E ⊂ U of variables as an input, and let P_max denote the maximal marginal probability in the set {\vec{E} = \vec{e}} consisting of all the configurations consistent with the given value assignment, P_max = max_{\vec{u} ∈ {\vec{E}=\vec{e}}} P{\vec{u}}. A state \vec{u}_i ∈ {\vec{E} = \vec{e}} with the property P{\vec{u}_i} = P_max is now called a maximum a posteriori probability (MAP) state. In this study, we are interested in the task of finding a MAP state of the configuration space Ω, given some evidence ⟨\vec{E} = \vec{e}⟩, and we call this problem the MAP problem.

Let us consider a Bayesian network with a joint probability distribution of the form (3), and let us define the potential of a state \vec{u} as the sum of clique potentials,

    V(\vec{u}) = \sum_i V_i(\vec{u}).    (4)

We can now clearly find a MAP state by maximizing the potential function V.

In simulated annealing (SA) [18–21] this task is solved by producing a stochastic process {\vec{u}(t) | t = 0, 1, ...}, which converges to a MAP solution with high probability. The stochastic process used in SA is a Markov chain, which means that, at time t, the next state \vec{u}(t+1) depends only on the current state \vec{u}(t). Assuming that the current state \vec{u}(t) is \vec{u}_i, the new state \vec{u}(t+1) is produced by first generating a candidate \vec{u}_j for the next state; the candidate is then accepted with an acceptance probability A_{ij}, otherwise it is rejected, and we set \vec{u}(t+1) = \vec{u}(t). For generating a new candidate state \vec{u}_j, most versions of simulated annealing use a simple scheme called Gibbs sampling where only one randomly chosen variable is allowed to change its value to a new randomly chosen value.

In the Metropolis-Hastings version [18, 57] of simulated annealing, the acceptance probabilities are defined by

    A_{ij}(T) = \begin{cases} 1, & \text{if } \exp\left( \frac{V(\vec{u}_j) - V(\vec{u}_i)}{T} \right) \geq 1, \\ \exp\left( \frac{V(\vec{u}_j) - V(\vec{u}_i)}{T} \right), & \text{otherwise}, \end{cases}

where T(t), the computational temperature, is a monotonically decreasing function converging to zero as t approaches infinity. Barker [58] has introduced an alternative method with acceptance probabilities of the form

    A_{ij}(t) = \frac{1}{1 + \exp\left( \frac{V(\vec{u}_i) - V(\vec{u}_j)}{T(t)} \right)}.    (5)

In both cases it can be shown that the resulting stochastic process will converge to a MAP solution with probability p, where p approaches one as the number of iterations approaches infinity [21].

high probability. The stochastic process used in SA is

a Markov chain, which means that, at time t, the next In both cases it can be shown that the resulting stochas-

state uE(t + 1) depends only on the current state uE(t). tic process will converge to a MAP solution with prob-

Assuming that the current state uE(t) is uEi , the new state ability p, where p approaches one as the number of

uE(t + 1) is produced by first generating a candidate uE j iterations approaches infinity [21]. Moreover, it can

for the next state, and the candidate is then accepted by be shown that if the temperature T (t) decreases to-

an acceptance probability Ai j , otherwise it is rejected, wards zero slowly enough, the convergence is almost


certain even with a finite time process [20]. Unfortunately, for a theoretically guaranteed convergence, a computationally infeasible exponential number of iterations is needed. Although in practice good results are sometimes obtained with a relatively small number of iterations, the method can be excruciatingly slow. In the following, we show how to create a massively parallel implementation of simulated annealing by using the Boltzmann machine architecture.

4. Boltzmann Machines

A Boltzmann machine (BM) [8] is a neural network consisting of a set of binary nodes {S_1, ..., S_n}, where the state s_i of a node S_i is either 1 ("on") or 0 ("off"). Each arc from a unit S_i to unit S_j is provided with a real-valued parameter w_{ji}, the weight of the arc. The arcs are symmetric, so w_{ij} = w_{ji}. Each unit S_i is also provided with a real-valued parameter θ_i, the bias of the unit.

Let \vec{s} = (s_1, ..., s_n) ∈ {0, 1}^n denote a global state vector of the nodes in the network. We define the consensus of a state \vec{s} as

    C(\vec{s}) = \sum_{j=1}^{n} \sum_{i=j}^{n} w_{ij} s_i s_j,    (6)

where w_{ii} denotes the bias θ_i of node S_i, and the weight w_{ij} is defined to be 0 if S_i and S_j are not connected (disregarding the sign reversal, the consensus function is equal to the energy of a BM model used in most definitions).

The nodes of a BM network are updated stochastically according to the following probabilistic rule:

    P(s_i = 1) = \frac{1}{1 + \exp\left( \frac{-I_i}{T(t)} \right)},    (7)

where I_i = \sum_j w_{ij} s_j is the net input to node S_i, and T(t) is a real-valued parameter called temperature, which decreases with increasing time t towards zero. Consequently, each node is able to update its state locally, using the information arriving from the connecting neighbors as the input for a sigmoid function, thus offering a possibility for a massively parallel implementation of this algorithm.
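A minimal sketch of the rule (7) on a toy three-node machine (ours, with invented weights; the diagonal entries w_{ii} store the biases, as in Eq. (6)):

```python
import math
import random

random.seed(0)

# Symmetric toy weight matrix; the diagonal entries act as the biases.
w = [[0.5, 1.0, -1.0],
     [1.0, 0.2, 0.0],
     [-1.0, 0.0, 0.8]]
n = 3

def consensus(s):
    # Eq. (6): C(s) = sum_{j=1..n} sum_{i=j..n} w_ij * s_i * s_j
    return sum(w[i][j] * s[i] * s[j] for j in range(n) for i in range(j, n))

def update_node(s, i, temp):
    # Eq. (7): node i turns on with probability 1 / (1 + exp(-I_i / T)),
    # where the net input I_i collects the active neighbors plus the bias.
    net = sum(w[i][j] * s[j] for j in range(n) if j != i) + w[i][i]
    s[i] = 1 if random.random() < 1.0 / (1.0 + math.exp(-net / temp)) else 0

# Annealed updating: one randomly chosen node fires per step.
s = [0, 1, 0]
for t in range(1, 300):
    update_node(s, random.randrange(n), temp=2.0 / t)
```

Turning a node on changes the consensus by exactly its net input, which is what makes this local rule a Gibbs-sampling step for the consensus function, as the next paragraph shows.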

Let us consider a state \vec{s}, and let \vec{r} be a new state which is produced by changing the state of node S_i. It is now relatively easy to see [36] that the difference in consensus between the two states is exactly the net input in (7):

    C(\vec{r}) - C(\vec{s}) = \sum_j w_{ij} s_j = I_i.

It follows that the BM updating process can be regarded as a Gibbs sampling process, with acceptance probabilities identical to those given in (5), with respect to the Gibbs distribution determined by the consensus function. Consequently, in principle the BM model can be used as a massively parallel tool for finding the maximum of the consensus function. However, the acceptance probability (7) sets here implicitly some additional requirements to the generation probabilities, since the difference in consensus is calculated by keeping all the nodes except one constant. For this reason, if two or more adjacent nodes of the BM network are to be updated at the same time, the corresponding transition probability matrix of the resulting process is no longer stochastic, and hence convergence of the algorithm cannot be guaranteed. On the other hand, if we allow only one node to be updated at a time, convergence can be guaranteed, but then the parallel nature of the algorithm is lost.

To solve this dilemma, it has been suggested [59] that the nodes of a BM network should be divided into clusters, where no two nodes inside a cluster are connected to each other. Using this kind of clustered BM models we can maintain some parallelism and update all the nodes in one cluster at the same time, while a convergence theorem similar to the convergence theorem of simulated annealing can be proved [21, p. 139]. Obviously, the degree of parallelism depends on the number of clusters in the network. Unfortunately, the problem of finding a minimal set of clusters in a given network is NP-complete [21, p. 141]. In the next section we deal with a special class of BM architectures which has by definition only two clusters, being in this sense an optimal BM architecture.

5. Solving the MAP Problem by Boltzmann Machines

Let us consider a Bayesian network {G, P_G} for a probability distribution P of the form (3), and let m denote the number of conditional probabilities in P_G, P_G = {p_1, ..., p_m}. Furthermore, let λ_1, ..., λ_m


denote the natural logarithms of the conditional probabilities p_i, λ_i = ln p_i. We can now equally well forget about the cliques altogether, and express the potential function (4) as

    V(\vec{u}) = \sum_{i=1}^{N} V_i(\vec{u}) = \sum_{j=1}^{m} χ_j(\vec{u}) λ_j,    (8)

where the function χ_j(\vec{u}) has value one if \vec{u} is consistent with the value assignment corresponding to probability p_j, and otherwise it is zero. It is clear that if we find the maximum of the potential function V in (8), we have found the maximum of the probability distribution P.

For our purposes, we need the function to be maximized to be nonnegative, so we scale the strictly negative potential function (8) by adding a constant λ* to each of the parameters λ_j. If we set λ* ≥ −min_j ln p_j, the new potential function becomes nonnegative. In our empirical tests described in Section 6, we set λ* to be equal to −min_j ln p_j.

As the number of non-zero parameters χ_j is always exactly N, the number of cliques, the resulting new potential function is equal to the original function plus a constant Nλ*, and hence it has the same maximum points as the original function. In the following, we consider maximizing a potential function of the form (8), with the parameters λ_j scaled to nonnegative numbers as described above. If we manage to map this function to a consensus function of a Boltzmann machine, we have accomplished our goal: a massively parallel solution to the MAP problem.

Let us now consider a two-layer Boltzmann machine, where the first layer consists of n = \sum_i |U_i| feature nodes A_1, ..., A_n, one for each value of each of the variables of the problem domain. The second layer has altogether m pattern nodes B_1, ..., B_m, one node for each of the parameters λ_j in the sum in (8). Initially, let us assume that each pattern node is connected to all the feature nodes in the first layer, but no two nodes in the same layer can be connected to each other. The consensus function of such a network can be expressed as

    C(\vec{s}) = C(a_1, ..., a_n, b_1, ..., b_m) = \sum_{j=1}^{m} b_j \sum_{i=1}^{n} w_{ji} a_i = \sum_{j=1}^{m} b_j I_j,    (9)

where I_j = \sum_i w_{ji} a_i is the net input to pattern node B_j.

Let us now consider a pattern node B_j corresponding to a parameter λ_j, and let P{ U_i = u_i \mid \bigwedge_{U_k \in F_{U_i}} U_k = u_k } be the conditional probability used for computing λ_j,

    λ_j = \ln P{ U_i = u_i \mid \bigwedge_{U_k \in F_{U_i}} U_k = u_k } + λ*.    (10)

We now set the weights of the arcs between pattern node B_j and the feature nodes A_1, ..., A_n in the following way:

1. The weights of all the arcs connecting node B_j to feature nodes representing values of variables not in {U_i} ∪ F_{U_i} are set to zero.
2. The weight from pattern node B_j to the feature node corresponding to the value u_i is set to λ_j, and the weights of the arcs to feature nodes corresponding to other values of the variable U_i are set to −λ_j.
3. The weights from node B_j to the feature nodes corresponding to the values u_k appearing on the right-hand side in (10) are set to λ_j, and the arcs to feature nodes representing other values of the predecessors of the variable U_i are set to −λ_j.
4. The bias θ_j of pattern node B_j is set to −(n_j^+ − 1)λ_j, where n_j^+ is the number of positive arcs leaving from node B_j. However, if n_j^+ = 0, we set θ_j = 0.

Following this construction for each of the pattern nodes B_j, we get a structure which we call a two-layer Boltzmann machine (BM²). An example of a BM² structure in the case of a simple Bayesian network is shown in Fig. 2. As arcs with a zero-valued weight are irrelevant for the computations, they are excluded from the network.

Let us now consider a Bayesian network G_BN, and the corresponding BM² network G_BM. In the sequel, we use the term feature vector to denote the vector \vec{a} = (a_1, ..., a_n), consisting of the states of the feature nodes of G_BM. Correspondingly, the vector \vec{b} = (b_1, ..., b_m), consisting of the states of the pattern nodes, is called a pattern vector.

Definition 3. The feature vector of a two-layer Boltzmann machine G_BM is consistent (with respect to the corresponding Bayesian network G_BN) if there is exactly one feature node active for each of the variables in G_BN, representing one possible value of that variable.


Figure 2. A simple Bayesian network (on the left), and the corresponding BM² structure with the parameters λ_j. Solid lines represent arcs with positive weight; arcs with negative weight are drawn with dashed lines.

From Definition 3 it follows directly that a consistent feature vector can be mapped to an instantiation of G_BN. As before, let B_j be a pattern node corresponding to a parameter λ_j, and let P{ U_i = u_i \mid \bigwedge_{U_k \in F_{U_i}} U_k = u_k } be the conditional probability used for computing λ_j. We now say that the pattern node B_j is consistent with a feature vector \vec{a} if \vec{a} is consistent with the assignment ⟨U_i = u_i, \bigwedge_{U_k \in F_{U_i}} U_k = u_k⟩. Moreover, a pattern activation vector \vec{b} is said to be consistent if all the consistent pattern nodes are on, and all the inconsistent pattern nodes are off. Finally, a state \vec{s} = (\vec{a}, \vec{b}) of G_BM is called consistent if both \vec{a} and \vec{b} are consistent.

It is now easy to prove the following simple lemma:

Lemma 1. A BM² network converges to a consistent state.

Proof: Let \vec{s} = (\vec{a}, \vec{b}) be the final state of a BM² updating process. We know that \vec{s} is a state which maximizes the consensus C(\vec{s}) = \sum_{j=1}^{m} b_j I_j. It is now easy to see that if \vec{a} is a consistent vector, \vec{b} must also be consistent, since if there were inconsistent pattern nodes active on the second layer in this case, switching them off would increase the consensus of the network,


and correspondingly, switching any inactive consistent pattern node on would increase the consensus. On the other hand, a cannot be inconsistent, since in this case all the pattern nodes connected to inconsistent feature nodes would be inactive, which means that removing the inconsistencies would increase the number of active pattern nodes, thus increasing the consensus. It now follows that s = (a, b) must be a consistent state. □

Using this lemma, we can show that the BM2 structure can be used for solving the MAP problem:

Proposition 1. The updating process of the BM2 network converges to a state which maximizes the potential function V of the corresponding Bayesian network GBN, provided that all the parameters λ_j are nonnegative.

Proof: According to Lemma 1, the network converges to a maximum consensus state where all the active pattern nodes are consistent with the consistent feature vector. This means that each final feature vector a corresponds to an instantiation u. It is easy to see that the net input I_j to a consistent pattern node B_j is λ_j, and hence it follows that

$$\max_{s} C(s) = \max_{a} \max_{b} \sum_{j=1}^{m} b_j I_j = \max_{a} \sum_{j=1}^{m} \max_{b_j \in \{0,1\}} b_j I_j = \max_{u} \sum_{j=1}^{m} \chi_j(u)\,\lambda_j = \max_{u} V(u). \qquad \Box$$

Please note that if the parameters λ_j were not scaled to nonnegative numbers as suggested earlier, the potential function would be strictly negative, and the BM2 construction would be useless, since the network would then always have a trivial maximum point at zero, corresponding to a state where all the nodes are off.

6. Empirical Results

6.1. The Setup

To empirically test the effectiveness of the BM2 construction, we generated several MAP instantiation problems, with N, the number of variables, ranging from 2 to 24. For each N, we generated 10 random Bayesian networks, and attached to each network a random MAP instantiation problem by clamping half of the variables to randomly chosen values. This same test was performed with three different algorithms. As a benchmark, we used a brute-force search algorithm (denoted by BF), which is guaranteed to find the MAP solution in |Ω_E| · m time steps, where |Ω_E| is the number of possible solutions, and m is the number of conditional probabilities attached to the Bayesian network in question. Secondly, we used a sequential simulated annealing algorithm on the Bayesian network (denoted by SSA), and the corresponding massively parallel BM2 updating process (denoted here by BMSA). As we did not have access to real neural hardware, the BMSA model was tested by running simulations on a conventional Unix workstation. The time requirement for one iteration of the SSA was assumed to be O(m), whereas the time requirement for one iteration of the BMSA algorithm was assumed to be O(1) (all the nodes on one layer were updated at the same time, so updating the whole network took two time steps).

When considering the performance of the SSA or the BMSA algorithm, it is clear that the most critical issue is finding a suitable cooling scheme. Unfortunately, the theoretically correct logarithmic cooling scheme described in [20] cannot be used in practice, since the number of iterations required grows too high even with relatively low starting temperatures: for instance, starting with an initial temperature of 2, annealing down to 0.1 would require more than 485 million iteration steps. The problem with heuristic annealing schemes is that if the annealing is done too cautiously, an unnecessarily large amount of computing time may be spent. On the other hand, if the annealing is done too quickly, the theoretical convergence results do not apply, and the results are unreliable. Recent theoretical modifications applied in fast annealing (FA) [24] and adaptive simulated annealing (ASA) [22, 23] offer interesting alternatives to the exponential Metropolis form of simulated annealing used in this study, but it is currently not clear whether the framework presented here can be extended to these forms of simulated annealing as well. This poses an interesting research problem that deserves further study.

In the experiments performed, our primary objective was not to study the efficiency of the sequential and massively parallel algorithms per se, but to see whether the speedup gained from parallelization would be enough to compensate for the loss of accuracy in sampling—in other words, whether the BM2 construction would be computationally practical to use.
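The 485-million figure for the logarithmic cooling scheme can be checked directly. In the sketch below (plain Python; the closed form T_k = c / ln(1 + k) is our assumption about how the schedule of [20] is parameterized), we solve for the number of steps each schedule needs to cool from 2 down to 0.1, using a constant-factor geometric schedule with factor 0.66 for comparison:

```python
import math

# Logarithmic schedule (assumed form T_k = c / ln(1 + k)), with c = 2.
# Solve c / ln(1 + k) = T_end for k to get the step count.
c, T_end = 2.0, 0.1
k_log = math.exp(c / T_end) - 1
print(f"logarithmic schedule: {k_log:.3e} steps")   # > 485 million

# Geometric schedule for comparison: T <- F * T with F = 0.66.
F, T, k_geo = 0.66, 2.0, 0
while T > T_end:
    T *= F
    k_geo += 1
print(f"geometric schedule:   {k_geo} steps")       # a handful of steps
```

The contrast (hundreds of millions of steps versus a handful) is why a heuristic geometric schedule was used in practice, even though the theoretical convergence guarantee is lost.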


It should be noted that as SSA and BMSA use the same parameters for determining the cooling schedule of the annealing algorithm, it seems reasonable to assume that use of the more sophisticated variants of simulated annealing [22–29] would demonstrate roughly equal speedup for both SSA and BMSA.

In our empirical tests, the temperature was lowered according to a simple cooling schedule, where the temperature was multiplied after each iteration by a constant annealing factor F, F < 1.0. A more detailed study of the behavior of SSA and BMSA with different annealing factors can be found in [36]. Based on this study, in the tests reported here the annealing factor was set to 0.66. Although simple, this type of cooling schedule is very common, and has proven successful in many applications [21]. It has also been empirically observed that more sophisticated annealing methods do not necessarily produce any better results than this simple method [60].

The main goal of the experiments was to study the behavior of the SSA and BMSA algorithms with increasingly complex MAP problems. The complexity of these problems can be increased in two ways: by changing the properties of the probability distribution on the configuration space in question, or by changing the size of the configuration space. We experimented with three methods for changing the configuration space probability distribution: by restricting the conditional probabilities of the Bayesian networks to small regions near zero or one, by changing the density of the Bayesian network structure, and by changing the size of the evidence set E, i.e., by changing the number of clamped variables. The size of the configuration space was increased in two ways: by allowing more variables in the Bayesian networks, and by allowing the variables to have more values.

6.2. The Results

It has been noted [61] that solving certain types of probabilistic reasoning tasks can become very difficult if the Bayesian network contains a lot of extreme probabilities (probabilities with values near zero or one). However, as already noted in [38], in our MAP problem framework this does not seem to be true. We experimented by restricting the randomly generated conditional probabilities of the Bayesian network model to the regions [0.0, δ] and [1.0 − δ, 1.0], and varied the value of δ between 0.5 and 0.0001, but observed no significant effect on the results with either SSA or BMSA. It would be an interesting research problem to study (analytically or empirically) how the Bayesian network probability distribution P changes with the parameter δ, but this question was not addressed here.

Increasing the density of Bayesian networks not only changes the shape of the probability distribution on the configuration space, but it also imposes a computational problem as it increases m, the number of the conditional probabilities that need to be determined for the model. Since the BMSA algorithm (or actually its simulated massively parallel implementation) is independent of m, increasing the density does not affect the BMSA solution time very much, whereas the solution time of SSA increases significantly (see Fig. 3). Nevertheless, it should be noted that we have here extended our experiments to very dense, and even fully connected, networks. Naturally, this does not make any sense in practice, since the whole concept of Bayesian networks relies on the networks being relatively sparse. For this reason, in the sequel we use only relatively sparse networks in our experiments (which does not, however, mean that the networks were singly-connected or otherwise structurally simple).

Figure 3. The behavior of the BMSA and SSA algorithms as a function of the density of the Bayesian network.

The test set corresponding to Fig. 3 consisted of 100 MAP problems on 10 Bayesian networks with only 8 binary nodes. When considering the results, it should be kept in mind that although the BF algorithm seems to work relatively well with these small networks, it does not scale up with increasing size of the networks (as we shall see in Fig. 6). For the same reason, the exhaustive BF algorithm performs well with a small number of unclamped variables (in which case the search space is small), but from Fig. 4 we see that as the number of unclamped variables increases, the time required for


running BF grows rapidly. Both SSA and BMSA appear to be quite insensitive to the number of instantiated variables. In these tests, we used 100 MAP problems on 10 Bayesian networks with 16 binary variables.

Figure 4. The behavior of the algorithms as a function of the number of the unclamped variables.

In Fig. 5, we plot the behavior of the algorithms as a function of the increasing configuration space, when the maximum number of variable values is increased. The test set consisted of 100 MAP problems on 10 10-node Bayesian networks with half of the variables clamped in advance. With networks of this size, the SSA algorithm seems to perform only comparably to the BF algorithm. However, when the size of the networks is increased, the general tendency is clear: the exhaustive BF algorithm starts to suffer from combinatorial explosion, and fails to provide a computationally feasible solution to the MAP problem (see Fig. 6). The SSA and BMSA algorithms, on the other hand, seem to scale up very well. In Fig. 6, each data point corresponds to a test set consisting of 100 MAP problems […] As before, half of the variables were clamped in advance.

Figure 5. The behavior of the algorithms as a function of the number of the values of the variables.

Figure 6. The behavior of the algorithms as a function of the number of the variables.

[…] reasoning tasks, formulated as finding the MAP state of a given Bayesian network, by a massively parallel neural network computer. Our experimental simulations strongly suggest that the massively parallel BMSA algorithm outperforms the sequential SSA algorithm, provided that suitable hardware is available. As can be expected, the proportional speedup gained from parallelization seems to increase with increasing problem complexity.

It should be noted that our SSA realization of the Gibbs sampling/annealing method uses a heuristic cooling schedule which does not fulfill the theoretical requirements of the convergence theorem of SA. What is more, even as an approximation of the theoretically correct algorithm, the SSA method is not a very sophisticated alternative, and there exist several elaborate techniques for making the algorithm much more efficient than in the experiments here. Nevertheless, it can also be argued that the same techniques would probably yield a similar speedup for the BMSA algorithm as well. Consequently, we believe that these results show that if suitable neural hardware is available, the BMSA algorithm offers a promising basis for building an extremely efficient tool for solving MAP problems.

From the Bayesian network point of view, the mapping presented here provides an efficient implementational platform for performing probabilistic reasoning


with the stochastic simulated annealing algorithm. From the neural network point of view, the mapping offers an interesting opportunity for constructing neural models from expert knowledge, instead of learning them from raw data. The resulting neural network could also be used as a (cleverly chosen) initial starting point for some of the existing learning algorithms [8–10, 62] for Boltzmann machines, in which case the learning problem should become much easier than with a randomly chosen initial state. Moreover, the resulting "fine-tuned" neural network could also be mapped back to a Bayesian network representation after the learning, which means that the mapping can also be seen as a tool for extracting high-level knowledge from neural networks. From the Bayesian network point of view, this kind of a "fine-tuning" learning process could also be useful in detecting mutually inconsistent probabilities, or other inconsistencies in the underlying Bayesian network representation.

It should also be noted that the results presented here can in principle be used as a computationally efficient, massively parallel tool for solving optimization problems in general, and not only for solving MAP problems as formulated here. Naturally, the efficiency of such an approach would largely depend on the degree to which the function to be maximized can be decomposed as a linear sum of functions, each depending only on a small subset of variables (corresponding to the cliques in the Bayesian network formalism). Further studies on this subject are left as a goal for future research.

Acknowledgments

This research has been supported by the Academy of Finland, the Jenny and Antti Wihuri Foundation, the Leo and Regina Wainstein Foundation, and the Technology Development Center (TEKES).

References

1. J. Anderson and E. Rosenfeld (Eds.), Neurocomputing: Foundations of Research, MIT Press: Cambridge, MA, 1988.
2. J. Anderson and E. Rosenfeld (Eds.), Neurocomputing 2: Directions for Research, MIT Press: Cambridge, MA, 1991.
3. D. Rumelhart and J. McClelland (Eds.), Parallel Distributed Processing, vol. 1, MIT Press: Cambridge, MA, 1986.
4. J. McClelland and D. Rumelhart (Eds.), Parallel Distributed Processing, vol. 2, MIT Press: Cambridge, MA, 1986.
5. J. Hopfield and D. Tank, "Neural computation of decisions in optimization problems," Biological Cybernetics, vol. 52, pp. 141–152, 1985.
6. E. Baum, "Towards practical 'neural' computation for combinatorial optimization problems," in Proceedings of the AIP Conference 151: Neural Networks for Computing, edited by J. Denker, Snowbird, UT, pp. 53–58, 1986.
7. G. Hinton and T. Sejnowski, "Optimal perceptual inference," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, 1983, pp. 448–453.
8. G. Hinton and T. Sejnowski, Learning and Relearning in Boltzmann Machines, in [3], pp. 282–317, 1986.
9. Y. Freund and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," in Neural Information Processing Systems 4, edited by J. Moody, S. Hanson, and R. Lippmann, Morgan Kaufmann Publishers: San Mateo, CA, pp. 912–919, 1992.
10. C. Galland, "Learning in deterministic Boltzmann machine networks," Ph.D. Thesis, Department of Physics, University of Toronto, 1992.
11. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers: San Mateo, CA, 1988.
12. R. Shachter, "Probabilistic inference and influence diagrams," Operations Research, vol. 36, no. 4, pp. 589–604, 1988.
13. R. Neapolitan, Probabilistic Reasoning in Expert Systems, John Wiley & Sons: New York, NY, 1990.
14. D. Heckerman, D. Geiger, and D. Chickering, "Learning Bayesian networks: The combination of knowledge and statistical data," Machine Learning, vol. 20, no. 3, pp. 197–243, 1995.
15. M. Henrion, "An introduction to algorithms for inference in belief nets," in Uncertainty in Artificial Intelligence 5, edited by M. Henrion, R. Shachter, L. Kanal, and J. Lemmer, Elsevier Science Publishers B.V.: North-Holland, Amsterdam, pp. 129–138, 1990.
16. G. Cooper, "The computational complexity of probabilistic inference using Bayesian belief networks," Artificial Intelligence, vol. 42, no. 2/3, pp. 393–405, 1990.
17. S. Shimony, "Finding MAPs for belief networks is NP-hard," Artificial Intelligence, vol. 68, pp. 399–410, 1994.
18. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, "Equation of state calculations by fast computing machines," Journal of Chemical Physics, vol. 21, pp. 1087–1092, 1953.
19. S. Kirkpatrick, D. Gelatt, and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.
20. S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721–741, 1984.
21. E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing, John Wiley & Sons: Chichester, 1989.
22. L. Ingber, "Very fast simulated re-annealing," Mathematical Computer Modelling, vol. 12, no. 8, pp. 967–973, 1989.
23. L. Ingber, "Adaptive simulated annealing (ASA): Lessons learned," Control and Cybernetics, vol. 25, no. 1, pp. 33–54, 1996.
24. H. Szu and R. Hartley, "Nonconvex optimization by fast simulated annealing," Proceedings of the IEEE, vol. 75, no. 11, pp. 1538–1540, 1987.


25. G. Bilbro, R. Mann, and T. Miller, "Optimization by mean field annealing," in Advances in Neural Information Processing Systems I, edited by D. Touretzky, Morgan Kaufmann Publishers: San Mateo, CA, pp. 91–98, 1989.
26. C. Peterson and J. Anderson, "A mean field theory learning algorithm for neural networks," Complex Systems, vol. 1, pp. 995–1019, 1987.
27. J. Alspector, T. Zeppenfeld, and S. Luna, "A volatility measure for annealing in feedback neural networks," Neural Computation, vol. 4, pp. 191–195, 1992.
28. N.R.S. Ansari and G. Wang, "An efficient annealing algorithm for global optimization in Boltzmann machines," Journal of Applied Intelligence, vol. 3, no. 3, pp. 177–192, 1993.
29. S. Rajasekaran and J.H. Reif, "Nested annealing: A provable improvement to simulated annealing," Theoretical Computer Science, vol. 99, pp. 157–176, 1992.
30. D. Greening, "Parallel simulated annealing techniques," Physica D, vol. 42, pp. 293–306, 1990.
31. L. Ingber, "Simulated annealing: Practice versus theory," Mathematical Computer Modelling, vol. 18, no. 11, pp. 29–57, 1993.
32. H. Geffner and J. Pearl, "On the probabilistic semantics of connectionist networks," Technical Report R-84, UCLA Computer Science Department, Los Angeles, CA, 1987.
33. K. Laskey, "Adapting connectionist learning to Bayesian networks," International Journal of Approximate Reasoning, vol. 4, pp. 261–282, 1990.
34. T. Hrycej, "Common features of neural-network models of high and low level human information processing," in Proceedings of the International Conference on Artificial Neural Networks (ICANN-91), edited by T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, Espoo, Finland, 1991, pp. 861–866.
35. R. Neal, "Connectionist learning of belief networks," Artificial Intelligence, vol. 56, pp. 71–113, 1992.
36. P. Myllymäki, "Mapping Bayesian networks to stochastic neural networks: A foundation for hybrid Bayesian-neural systems," Ph.D. Thesis, Report A-1995-1, Department of Computer Science, University of Helsinki, 1995.
37. P. Myllymäki and P. Orponen, "Programming the harmonium," in Proceedings of the International Joint Conference on Neural Networks, vol. 1, Singapore, 1991, pp. 671–677.
38. P. Myllymäki, "Bayesian reasoning by stochastic neural networks," Ph.Lic. Thesis, Tech. Rep. C-1993-67, Department of Computer Science, University of Helsinki, 1993.
39. P. Smolensky, Information Processing in Dynamical Systems: Foundations of Harmony Theory, in [3], pp. 194–281, 1986.
40. P. Myllymäki, "Using Bayesian networks for incorporating probabilistic a priori knowledge into Boltzmann machines," in Proceedings of SOUTHCON'94, Orlando, pp. 97–102, 1994.
41. P. Myllymäki, "Mapping Bayesian networks to Boltzmann machines," in Proceedings of Applied Decision Technologies 1995, edited by A. Gammerman, NeuroCOLT Technical Report NC-TR-95-034, London, 1995, pp. 269–280.
42. J. Barnden and J. Pollack (Eds.), Advances in Connectionist and Neural Computation Theory, Vol. I: High Level Connectionist Models, Ablex Publishing Company: Norwood, NJ, 1991.
43. G. Hinton, "Special issue on connectionist symbol processing," Artificial Intelligence, vol. 46, no. 1–2, 1990.
44. S. Goonatilake and S. Khebbal (Eds.), Intelligent Hybrid Systems, John Wiley & Sons: Chichester, 1995.
45. R. Sun, Integrating Rules and Connectionism for Robust Commonsense Reasoning, John Wiley & Sons: Chichester, 1994.
46. P. Floréen, P. Myllymäki, P. Orponen, and H. Tirri, "Neural representation of concepts for robust inference," in Proceedings of the International Symposium Computational Intelligence II, edited by F. Gardin and G. Mauri, Milano, Italy, pp. 89–98, 1989.
47. P. Myllymäki, H. Tirri, P. Floréen, and P. Orponen, "Compiling high-level specifications into neural networks," in Proceedings of the International Joint Conference on Neural Networks, vol. 2, Washington, D.C., 1990, pp. 475–478.
48. P. Floréen, P. Myllymäki, P. Orponen, and H. Tirri, "Compiling object declarations into connectionist networks," AI Communications, vol. 3, no. 4, pp. 172–183, 1990.
49. L. Shastri, Semantic Networks: An Evidential Formalization and Its Connectionist Realization, Pitman: London, 1988.
50. J. Pearl, "Fusion, propagation and structuring in belief networks," Artificial Intelligence, vol. 29, pp. 241–288, 1986.
51. R. Howard and J. Matheson, "Influence diagrams," in Readings in Decision Analysis, edited by R.A. Howard and J.E. Matheson, Strategic Decisions Group: Menlo Park, CA, pp. 763–771, 1984.
52. F. Jensen, An Introduction to Bayesian Networks, UCL Press: London, 1996.
53. E. Castillo, J. Gutiérrez, and A. Hadi, Expert Systems and Probabilistic Network Models, Monographs in Computer Science, Springer-Verlag: New York, NY, 1997.
54. T. Hrycej, "Gibbs sampling in Bayesian networks," Artificial Intelligence, vol. 46, pp. 351–363, 1990.
55. S. Lauritzen and D. Spiegelhalter, "Local computations with probabilities on graphical structures and their application to expert systems," J. Royal Stat. Soc., Ser. B, vol. 50, no. 2, pp. 157–224, 1988. Reprinted as pp. 415–448 in [63].
56. D. Spiegelhalter, "Probabilistic reasoning in predictive expert systems," in Uncertainty in Artificial Intelligence 1, edited by L. Kanal and J. Lemmer, Elsevier Science Publishers B.V.: North-Holland, Amsterdam, pp. 47–67, 1986.
57. W. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97–109, 1970.
58. A. Barker, "Monte Carlo calculations of the radial distribution functions for a proton-electron plasma," Aust. J. Phys., vol. 18, pp. 119–133, 1965.
59. A. de Gloria, P. Faraboschi, and M. Olivieri, "Clustered Boltzmann Machines: Massively parallel architectures for constrained optimization problems," Parallel Computing, pp. 163–175, 1993.
60. D. Johnson, C. Aragon, L. McGeoch, and C. Schevon, "Optimization by simulated annealing: An experimental evaluation; Part I, graph partitioning," Operations Research, vol. 37, no. 6, pp. 865–892, 1989.
61. H. Chin and G. Cooper, "Bayesian belief network inference using simulation," in Uncertainty in Artificial Intelligence 3, edited by L. Kanal and J. Lemmer, Elsevier Science Publishers B.V.: North-Holland, Amsterdam, pp. 129–147, 1989.
62. G. Hinton, "Connectionist learning procedures," Artificial Intelligence, vol. 40, no. 1–3, 1989.
63. G. Shafer and J. Pearl (Eds.), Readings in Uncertain Reasoning, Morgan Kaufmann Publishers: San Mateo, CA, 1990.


[…] the Complex Systems Computation (CoSCo) research group at the Department of Computer Science of the University of Helsinki. For the past ten years, the CoSCo group has been studying complex systems, focusing on issues related to uncertain reasoning, machine learning and data mining. Research areas include Bayesian networks and other probabilistic models, information theory, neural networks, case-based reasoning, and stochastic optimization methods. Dr. Myllymäki is currently working at the University of Helsinki as a research scientist for the Academy of Finland.
