
Java Object Oriented Neural Engine

The Distributed Training Environment


All you need to know about the DTE

19. September 2004

by Paolo Marrone
http://www.joone.org

1. Overview..........................................................................................................................................3
2. Requirements....................................................................................................................................4
2.1. Portability and scalability.........................................................................................................4
2.2. Task recovery............................................................................................................................5
2.3. Framework parametrization and extensibility..........................................................................6
2.4. The user interface......................................................................................................................7
3. How the DTE works.........................................................................................................................8
3.1. Client.........................................................................................................................................8
3.1.1. Pre-processing...................................................................................................................8
3.1.2. Processing.........................................................................................................................8
3.1.3. Stop-condition...................................................................................................................8
3.1.4. Post-processing.................................................................................................................8
3.2. Master.......................................................................................................................................8
3.3. Worker......................................................................................................................................9
3.4. Jobs and Tasks..........................................................................................................................9
3.4.1. Job.....................................................................................................................................9
3.4.2. Task...................................................................................................................................9
4. The Implementation........................................................................................................................10
4.1. The architecture......................................................................................................................10
4.2. The Inversion of Control (IoC) mechanism............................................................................13
5. How to use the DTE.......................................................................................................................15
5.1. Installation..............................................................................................................................15
5.1.1. The Jini framework on the Master server........................................................................16
5.1.2. The Computefarm framework on the Workers machines...............................................16
5.1.3. The JooneDTE framework on the Client machine..........................................................16
5.2. How to launch the Distributed Training Environment...........................................................17
5.3. How to write the XML parameter file....................................................................................17
5.3.1. A very simple example....................................................................................................17
5.3.2. A complete example........................................................................................................20
5.3.3. Using multiple Factories.................................................................................................22
6. Enhancements and things to do......................................................................................................23
6.1. New Plugins............................................................................................................................23
6.2. A visual parametrization tool..................................................................................................23
6.3. A Task recovery mechanism...................................................................................................23
6.4. A Load Balancing mechanism................................................................................................23
6.5. A visual Job controller............................................................................................................23
6.6. A more general Distributed Environment...............................................................................24
6.7. Suggestions...?........................................................................................................................24

1. Overview

One of the biggest problems a user encounters when trying to solve a complex task with neural networks is the difficulty of finding, a priori, a good architecture and suitable parameters for the problem at hand.
Often the training falls into a local minimum, and this forces the work to be restarted with different initialization values, causing many long and frustrating trials, especially when all the work is done sequentially on a single machine.
The above situation is very common, and it arises whenever the problem to solve is noisy (like almost all problems involving real-world data) and very complex (i.e. containing a lot of input variables and/or many training patterns).
What we need is a system to train many neural networks in parallel using several machines connected to a LAN, in order to try different solutions and find a good one within an acceptable time. This technique is not new, and it's called 'global optimization', because it globally explores the entire space of solutions in order to find the best one.
Many techniques belong to the field of global optimization, for instance Genetic Algorithms, and all of them can be implemented on a parallel distributed environment like this one.
In order to describe formally what we have implemented, we need to give some definitions about all the components involved in the DTE.
The basic framework for a distributed training environment involves the following components:

• Workers: A collection or group of servers that will assist in the training of a group of neural
networks (called Job) such that the total time taken to train them will be considerably less than if
trained on one single server.
• Job Controller: A central controller that would oversee the overall job's execution, by applying the desired pre/post-execution algorithms, in order to implement any custom need (neural network creation/duplication policy, genetic evolution, fittest neural network selection, etc.)
• Master: A collector/distributor or master server that would oversee the distribution of jobs and
collection of completed tasks.

Our implementation of the DTE that we'll present in this document is based on a series of
requirements, as illustrated in the next chapter.

2. Requirements

The first thing to do is to specify the requirements of both the potential users and the developers.
Not all the requirements have been implemented, and some of them will be incorporated in future releases of the DTE. The main requirements are shown below.

2.1. Portability and scalability
The load-sharing framework should be platform independent, i.e. the underlying message-passing framework, such as Jini, SOAP, JMS, JXTA or whatever else, should be hidden from the Joone DTE by some kind of Joone message driver interface. Drivers for each framework can be written and plugged in depending on what the user has available or wants. In this fashion, drivers for exotic systems such as distributed processing systems could be written (Globus Grid, etc.). Thus a plug-in architecture for the message framework should be created. The Joone DTE therefore would not be dependent on a specific framework.
Note: the current implementation guarantees platform independence, but not 'framework' independence, because it is based on the Jini/JavaSpaces framework. However, we consider it a fundamental step to reach complete independence from any specific framework, hence it will remain in the to-do list with a high level of priority.
The DTE should be completely scalable: it should work from 1 server all the way up to N servers.
A master server object, running on the master server, should provide a capability for job marshaling, i.e. for task distribution and collection. The master server object should not produce the tasks but simply manage them and ensure they complete.
The load-balancing facility must NOT rely on the master server's knowledge of the workers' utilization, because the master should not be concerned about how many workers exist, nor about their physical availability, nor their physical coordinates like the IP address. To avoid having to reconfigure the master server whenever a single worker is added to or removed from the cluster, the job distribution should be based on a pop-like mechanism.

The mechanism should be the following:


1. The master server acts as a common place known by both the controller and all the existing workers.
2. The controller prepares all the tasks composing the job to be elaborated; after that, it calls the master object, passing it the task list. The master stores all the tasks to be elaborated (i.e. all the single networks to elaborate), making them available to the workers. Each task is marked by a task-id number, and belongs to a Job, itself identified by a job-id.
3. Each worker simply calls the master object requesting a task to elaborate. The master object responds by returning the next task in the list that must be elaborated.

4. When the worker finishes elaborating a task, it calls the master object, passing back the elaborated neural network along with some useful information such as the task-id, the final RMSE, the number of training epochs, etc. After that, the worker makes a new request for a task to work on, restarting from step 3 above (a sketch of this request loop is given after this list).
5. The master object updates the task list by storing the returned neural network, in order to permit the controller to retrieve it.
6. The controller interrogates the master to collect all the elaborated tasks and, when the last task of a job has been received, the controller can apply some predefined post-elaboration to the job, for instance some kind of 'genetic evolution' of its components, and then submit the tasks again to the DTE for another elaboration (restarting from step 2 above), until some stopping condition is met.
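To make the above exchange more concrete, the following minimal Java sketch shows the request/return loop that a worker runs against the master (steps 3 and 4). It is only an illustration of the protocol: the Master interface, the Task and TrainedResult types and all the method names are hypothetical and do not belong to the real DTE or Computefarm API.

// Hypothetical sketch of the worker-side loop described in steps 3-4 above.
// None of these types/names belong to the real DTE/Computefarm API.
interface Master {
    Task takeNextTask() throws InterruptedException;  // blocks until a task is available
    void returnResult(TrainedResult result);          // stores the elaborated network
}

class Task {
    long jobId;                   // the Job this task belongs to
    long taskId;                  // identifies the task within the Job
    byte[] serializedNetwork;     // the neural network to be trained
}

class TrainedResult {
    long jobId;
    long taskId;
    byte[] trainedNetwork;        // the elaborated neural network
    double finalRmse;             // useful information returned with it
    int epochs;
}

class WorkerLoop implements Runnable {

    private final Master master;

    WorkerLoop(Master master) {
        this.master = master;
    }

    public void run() {
        try {
            while (true) {
                Task task = master.takeNextTask();   // step 3: ask the master for work
                TrainedResult result = train(task);  // elaborate the network locally
                master.returnResult(result);         // step 4: return the trained network,
            }                                        // then loop and ask for the next task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // the worker has been shut down
        }
    }

    private TrainedResult train(Task task) {
        // Placeholder: here the worker would deserialize and train the Joone network.
        TrainedResult result = new TrainedResult();
        result.jobId = task.jobId;
        result.taskId = task.taskId;
        result.trainedNetwork = task.serializedNetwork;
        return result;
    }
}

Because each worker asks for work only when it is idle, this loop by itself provides the self-regulating load balancing described below.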
The above pop-based mechanism has two big advantages:
• The master server does not need to be configured with the number of existing workers. Moreover, the distributed training process does not need to be stopped to add/remove workers, as they can be dynamically turned on/off during the process.
• The load-balancing policy does not need to be implemented in some central component of the DTE (indeed it isn't implemented at all), as each worker asks for a new task when - and only when - it is ready to elaborate another one. Hence the task-request rate is self-regulated by the power and the actual utilization of each single machine on which a worker is running. For instance, we could launch, on each machine, a number of workers equal to the number of processors available on that machine.

2.2. Task recovery
As the task-distribution mechanism is based on the workers' requests, the master object can't interrogate any worker to know its state. In order to maintain the current state of each task, the corresponding worker should call the master object (at some predetermined interval of time) sending information about its state (task-id, current training cycle, current RMSE, etc.), so that the master object can have a complete view of the overall distributed training mechanism.
If some worker doesn't inform the master object about its state within the predetermined interval of time, then the master object can revert the state of the corresponding task in the list, so that it can be assigned to another requesting worker.
This notification mechanism can be based on the possibility of associating with each neural network a script that can be executed remotely on some event notification (see the documentation about Joone's scripting engine). The executed script will contain the code necessary to notify the master server about the current training process.

Note: this mechanism has not been implemented in the current version, hence the DTE is not able to recover/reassign a task interrupted by a failure of the assigned worker.
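Although this recovery mechanism is not implemented yet, the following sketch shows one possible shape of the master-side bookkeeping described above: workers send periodic heartbeats, and tasks whose worker stays silent for too long are reverted to the 'available' state. All names here are hypothetical illustrations, not part of the current DTE.

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical master-side watchdog for task recovery (not part of the current DTE).
class TaskWatchdog {

    // Hypothetical view of the task list held by the master.
    interface TaskStore {
        void markAvailableAgain(String taskKey);   // revert a task to the 'available' state
    }

    private final long timeoutMillis;
    // Last heartbeat received for each running task, keyed by "jobId:taskId".
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<String, Long>();

    TaskWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by the worker's notification script at a predetermined interval of time.
    void heartbeat(String taskKey) {
        lastHeartbeat.put(taskKey, Long.valueOf(System.currentTimeMillis()));
    }

    // Called periodically on the master side: tasks whose worker stayed silent for
    // longer than the timeout are made available again for another requesting worker.
    void revertStaleTasks(TaskStore store) {
        long now = System.currentTimeMillis();
        for (Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator(); it.hasNext();) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue().longValue() > timeoutMillis) {
                store.markAvailableAgain(entry.getKey());
                it.remove();                       // forget the dead worker's heartbeat
            }
        }
    }
}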

2.3. Framework parametrization and extensibility

In order to implement a predefined global optimization algorithm, a DTE Control object should be
built, containing all the needed pre/post elaboration algorithms.
A macro-definition of the Controller requirements is the following:
1. The Controller should produce all the tasks needed to complete a job, applying some kind of pre-elaboration, starting from the reading of external parameters describing that job.
2. It would utilize the master server to complete tasks it has produced. A batch of tasks should be
given to the master object for distribution/execution.
3. The Controller should be able to collect all the results from the workers, and also apply, at the end of the job, some kind of post-elaboration (described within external parameters) in order to generate the new population to elaborate, restarting from the previous step 2, unless a stop-condition has been reached.
The following schema illustrates all the modules involved in the above mechanism and their
interactions:

[Figure: processing flow of the Controller - pre-processing, then processing (the tasks are sent to the Master and elaborated by the Workers), then the stop-condition check; if the stop condition is not reached, the post-processing step runs and the cycle restarts from the processing phase.]

The Controller object will be based on a modular architecture, where each component can be instantiated using a plugin mechanism, so that different global optimization algorithms can easily be produced/implemented.
The following are some examples of the flexibility of the DTE and its capacity to be adapted to any algorithm:
• The simplest DTE Control object could implement a 'train-once' approach, where the needed neural networks are simply created and sent out for training. After their completion, the trained neural networks will be stored in some repository (the file system, for instance) for a successive external elaboration.
• Another Controller could be an extension of the previous one: in the post-elaboration it could select the returned networks, in order to keep in the next population only the best ones in terms of generalization capacity. The process would continue using the selected list of the fittest neural networks as the new population (it acts like a distillation process), until a desired RMSE is achieved.
• A more complex DTE Control object could implement a Genetic Algorithm approach,
where, after the training, the networks are ranked, selected, crossed, mutated and sent out
again for another training/validation phase, until some predefined stop condition is
reached.

The classes implementing the above algorithms (the plugins) can be attached dynamically to the Controller using parameters written in XML, so the user is free to build whatever controller is capable of executing the needed global optimization technique.
Joone DTE, starting from its first release, will be published along with several plugins and some sample XML parameter files, in order to give you out of the box some useful and well-known algorithms.
The scripting facility to control the process' behavior can be defined by the final user at two
different levels:
1. At Controller level, providing the DTE with a centralized mechanism, based on external
parameters, to control the overall global optimization process (e.g. both the pre/post
elaborations can be described and executed at this level)
2. At Worker level, by using the already existing Joone scripting engine, providing the user with a powerful control mechanism running remotely on the worker machine to control on-site the training of each single neural network (for instance, the training/validation policies of each neural network can be implemented using this technique in order to be executed remotely on the worker's side).

2.4. The user interface
A GUI editor/controller should be provided to allow control of the DTE Control Object from a
graphical perspective. The user could then interact more easily with the DTE environment/process.
The GUI would allow the use of various implemented DTE control objects.
Note: in this first release there isn't any DTE GUI editor/controller, but in the future we'll implement a graphical tool integrated with the current Joone GUI Editor.

3. How the DTE works

As said, the DTE is composed of several components that live in three separate virtual layers of the overall architecture:
1. The Client
2. The Master
3. The Worker

Now we'll see all of them in detail, describing all the components that live on each layer.

3.1. Client
The client is the layer from which all the Jobs are started, and to which all the elaborated Tasks
return after the remote training.
It's composed mainly of the Controller which, as described in the previous chapter, is composed of several plugins that implement the different aspects of the Job's processing:

3.1.1. Pre-processing
In this phase the Controller must prepare all the tasks that will be elaborated by the DTE. It
invokes a factory class in order to obtain all the needed neural networks.
Several implementations of the TaskFactory interface exist, as we'll see in the next chapters.

3.1.2. Processing
This phase is hard-coded in the DTE, hence there isn't any plugin to set externally. In this phase the Controller simply sends all the generated Tasks to the Master for the remote training, after which it collects all the returning networks, stores them on the file system and writes a log file in XML format, useful for the successive phases.

3.1.3. Stop-condition
In this phase, the Controller calls the corresponding plugin to check whether the process must either continue or stop. By implementing the StopCondition interface, the user can define any stop-condition policy, based on the results of the previous processing step.

3.1.4. Post-processing
This is the phase where all the returned tasks are elaborated to eventually generate new
Tasks to submit again for the remote training. The PostProcessor interface must be
implemented by the user to define whatever post-elaboration algorithm.
All the above functionalities have been implemented by the DTE framework.

3.2. Master

The Master is represented, in this first release, by the Jini™/JavaSpaces™ distributed framework.

The installed services are composed of four modules:
1. The http daemon, a lightweight web server used to transport remotely the byte code of the
requested classes
2. A Lookup Service, used to discover all the running Jini services
3. The JavaSpaces, the common place where all the tasks are written/taken
4. The Transaction Manager, another Jini service used to guarantee the consistency of the
operations made on the JavaSpaces
This machine contains only Sun's framework, hence there isn't any custom DTE code to install here.

3.3. Worker
To implement this piece of the DTE we have used Computefarm (http://computefarm.jini.org), a
robust and powerful implementation of a compute server framework based on JavaSpaces.
Some of the features that the framework includes (from the Computefarm's home page):
• Both controllers and workers use Jini discovery to discover a local JavaSpace to use.
• Workers support class evolution (via class unloading), which makes the framework
attractive for development of parallel algorithms.
• Workers are very easy to set up and run.
• Workers are highly fault tolerant (they survive service restarts).
• The API to create parallel jobs is small and easy to work with.
The Worker simply gets a Task from the JavaSpaces (the Master) and executes it on a remote machine. As each Task is an extension of the Jini TaskEntry object, all the Worker needs to do is call the TaskEntry.execute() method in order to elaborate the task, hence the Worker doesn't know what the Task must/can do. This useful characteristic makes the framework really extensible, because we can change the behavior of the running tasks simply by creating new objects that inherit from TaskEntry.
As the Task bytecode is transported by the underlying Jini framework, the Worker can use any new task without changes to its own source code (it simply requests the Task's bytecode and invokes its execute method, without being worried about what must be done).
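As a rough illustration of this extensibility, a new kind of task only needs to extend the TaskEntry and provide its own execute() body; the worker will download its bytecode and run it without any change to its own code. The sketch below is hypothetical: the abstract base class stands in for the real one provided by the framework, and the fields and the training stub are purely illustrative.

// Stand-in for the base class provided by the framework (as described above,
// every task extends a TaskEntry exposing an execute() method).
abstract class TaskEntry {
    public abstract Object execute();
}

// Hypothetical new task type: the Worker can run it without being modified,
// because it only knows about TaskEntry.execute().
class TrainingTask extends TaskEntry {

    public Integer jobId;              // the Job this task belongs to
    public Integer taskId;             // identifies the task within the Job
    public byte[] serializedNetwork;   // the Joone network to train, as exported bytes

    public TrainingTask() {
        // a public no-argument constructor is required for JavaSpaces entries
    }

    public Object execute() {
        // Here the task would deserialize the network, train it with the Joone engine
        // and return the trained network plus some statistics (final RMSE, epochs, ...).
        return serializedNetwork;      // placeholder result
    }
}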
Computefarm saved us a lot of time and the writing of a lot of code. Thanks to Tom White for this great implementation.

3.4. Jobs and Tasks

3.4.1. Job
• A Job describes the overall work to be executed by the distributed training environment
• It is composed of several Tasks
• A Job is identified by a JobID

3.4.2. Task
• A Task describes a piece of work to execute remotely on a Worker to complete a Job
• It belongs to a Job and it's identified by a pair JobID+TaskID

4. The Implementation

All the basic components forming the DTE should be represented only by interfaces, so that the object model is based on abstract contracts to which each concrete implementation must adhere.
This choice represents an advantage for the following reasons:
• An object design based on interfaces simplifies the adoption of the IoC - Inversion of Control (also called Dependency Injection) technique, such as that provided, for instance, by the Spring framework (http://www.springframework.org), which we have chosen to use.
• The DTE is open to any custom implementation, with the certainty that the overall system's behavior will always be coherent, regardless of the details of the implementation.

4.1. The architecture
The design of the object model is composed of a layered architecture, as depicted by the following class diagram of the main DTE components:

As you can see, it is divided into three logical layers:

The first one represents the 'Controller' of the DTE from where any job is launched and controlled.
It contains only two classes:
● The JooneJobRunner is the main class launched from the command line. It contains, in the
'main' method, all the code needed to execute the 4 phases of the distributed processing, as
described in the previous chapter. It instantiates and uses a JooneJob class.
● The JooneJob is the class that implements the processing phase, by sending all the generated
tasks to the Master for the remote training and then collecting all the results. It also contains a
pointer to the plugins that implement the other three phases of the distributed elaboration, as
we'll see in the following paragraph.
The second layer of the UML diagram contains all the interfaces needed to describe the plugins, and the third layer contains all the corresponding implementations:

● Interface TaskFactory: this interface describes the plugin that implements the pre-processing phase, i.e. the classes that must be able to generate the initial population containing all the networks (tasks) to train remotely. It exposes a method named getNextTask(), which returns the next generated neural network, or null if there are no more tasks to generate. Available implementations:
   ● SimpleFactory: this factory simply contains a list of neural networks to elaborate, and exposes two methods – getTasks and setTasks – to permit initializing the neural network list. As the neural networks to elaborate are fed from an external source, this simple factory doesn't contain any algorithm able to generate neural networks.
   ● TaskListFactory: this class is able to generate all the networks described in a TaskList, which contains a TaskDescriptor for each network to generate. It is useful as a base to implement several more powerful factories, as for instance the MultiplierFactory.
   ● MultiplierFactory: this factory extends the TaskListFactory class and can generate N clones of each described neural network according to some parameters: Copies indicates how many copies we want to generate for each network; Noise sets the initial random noise amplitude applied to each clone, in order to generate slightly different copies (if it is set to zero, all the clones are exact copies of the original neural network, which is indeed not very useful in a distributed environment); Randomize is a boolean parameter that, when set to true, reinitializes each generated neural network instead of applying only a noise to its internal weights/biases (useful when we don't want to preserve the initial weights of the cloned neural network).
   ● ChainedFactory: this factory is simply a container of different factories, as it contains an ArrayList of TaskFactory objects – named Chain – that can be filled by invoking the corresponding getter/setter methods. It's very useful when the neural networks composing a Job must be generated according to several different rules. All you need to do, in this case, is to create all the needed factories and then put them into the Chain array of this factory.

● Interface StopCondition: it describes the interface of the plugins implementing the stop-condition phase, in order to check whether some final condition has been reached. The JooneJobRunner simply calls its done() method, and the elaboration continues only if that method returns 'false' (a sketch of a custom implementation is shown after this list). Available implementations:
   ● AlwaysStop: the done() method of this class always returns 'true', hence the DTE will unconditionally stop its elaboration after the processing phase.
   ● MaxCyclesCondition: the done() method of this class returns true only if the current cycle (# of DTE iterations) is equal to the value of the MaxCycles parameter.
   ● RmseCondition: this class checks whether the current population contains an element having an RMSE equal to or less than a predefined value, indicated by the MaxRMSE parameter. By setting the validation parameter to 'true', the check is made on the validation RMSE instead of the training one, in order to stop the elaboration when the generalization capacity of at least one network in the current population has reached the desired value. The Mean parameter, if set to true, stops the elaboration when the average RMSE calculated on the entire population has reached the indicated value. This class inherits from the MaxCyclesCondition class in order to ensure that the elaboration stops in any case after maxCycles iterations, even if the desired RMSE has not been reached.

● Interface Selector: it describes the interface of the plugins that implement the post-processing phase. The JooneJobRunner calls its execute() method, passing it the current population of neural networks. The method returns a TaskListFactory containing the new generation, ready to be elaborated in the next iteration of the DTE. Available implementations:
   ● FittestSelector: this class selects the neural networks having the best results in terms of either training or validation RMSE (discriminated by the 'validation' parameter). The number of neural networks selected is equal to the value of the numTasks parameter (if numTasks is equal to or greater than the current population's size, the entire population is returned). As you can see in the class diagram, the FittestSelector contains a pointer to an instance of the TaskListFactory class; this is because the next population could be generated by applying some factory algorithm to the selected networks. To give you an example, we could select the fittest 2 networks from a population of 10, and then, by using the MultiplierFactory, we could create 5 copies of each selected network (also applying a little noise to them), in order to obtain again a population of 10 networks to submit to the next iteration (this is what I call a 'distillation' process).
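As an example of how these contracts can be extended, here is a minimal sketch of a custom stop-condition plugin that stops the elaboration after a wall-clock time budget. Only the boolean done() method is taken from the description above; the package name and the 'maxMinutes' property are assumptions, and the class is not part of the released plugins. It would be declared in the 'control' property of the Job bean, exactly like the MaxCyclesCondition in the Spring XML example of the next paragraph.

// Hypothetical custom plugin: stop the DTE after a given number of minutes.
// The StopCondition interface is assumed to expose only the boolean done() method;
// the package name and the 'maxMinutes' property are illustrative assumptions.
public class MaxTimeCondition implements org.joone.dte.control.StopCondition {

    private long maxMinutes = 60;                 // injected by Spring via setMaxMinutes()
    private final long startTime = System.currentTimeMillis();

    public void setMaxMinutes(long maxMinutes) {
        this.maxMinutes = maxMinutes;
    }

    public long getMaxMinutes() {
        return maxMinutes;
    }

    // Called by the JooneJobRunner after each processing phase;
    // returning 'true' stops the distributed elaboration.
    public boolean done() {
        long elapsedMinutes = (System.currentTimeMillis() - startTime) / (60L * 1000L);
        return elapsedMinutes >= maxMinutes;
    }
}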

Here is the sequence diagram that describes the interactions between all the above classes:

As you can see, the JooneJobRunner controls the overall process by calling in sequence all the interested classes, which it obtains by asking the JooneJob class for them; the JooneJob class also acts as a container for the needed plugins (look at the calls to the getControl and getSelector methods – steps 3 and 5 of the above sequence diagram).
The different phases are executed by the following steps:
1 – 2: pre-processing/processing
3 – 4: stop-condition
5 – 6: post-processing
After step 6 we obtain, from the instantiated Selector, a TaskFactory instance (do you remember? It contains the selected networks, i.e. the next population), which is passed to the JooneJob class (step 7), so that it can retrieve the next networks to elaborate, simply by calling the TaskFactory.getNextTask method when the iteration restarts from step 1.
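To summarize the four phases in code, here is a simplified, hypothetical view of the loop executed by the runner; getControl and getSelector appear in the sequence diagram above, while process() and setTaskFactory() are assumed names used only to illustrate the flow (imports of the org.joone.dte classes are omitted).

// Simplified, hypothetical sketch of the JooneJobRunner's control loop
// (the method names on JooneJob are partly assumed).
public class RunnerLoopSketch {

    public static void run(JooneJob job) throws Exception {
        while (true) {
            // 1 - 2: pre-processing/processing - the job pulls the networks from its
            // TaskFactory, sends them to the Master and collects the trained results.
            TaskList results = job.process();

            // 3 - 4: stop-condition - ask the configured plugin whether we are done.
            if (job.getControl().done()) {
                break;
            }

            // 5 - 6: post-processing - the Selector builds the next population
            // and returns it wrapped in a TaskFactory.
            TaskFactory nextGeneration = job.getSelector().execute(results);

            // 7: the new factory is handed back to the job, so that the next
            // iteration restarts from phase 1 with the new population.
            job.setTaskFactory(nextGeneration);
        }
    }
}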



4.2. The Inversion of Control (IoC) mechanism

One of the most important requirements of the DTE is the possibility of implementing different global optimization algorithms without changing any line of Java code (of course we assume that the needed plugins exist and have been declared in the application's classpath).
A great and new paradigm that we have found very useful for our purpose is a technique called 'Inversion of Control'. It is so called because the entire object model is created by instantiating and connecting all the requested classes starting from an external definition, so the main application can access it without being worried about the instantiation of the needed classes.
In other words, the main procedure of our application doesn't contain any DTE class initialization code: we simply call some components of an external framework that does this for us, by reading the initial configuration from an XML property file.
A very good implementation of an IoC framework is represented by the Spring Framework (http://www.springframework.org), which we have chosen due to its robustness and its very small footprint.
A Spring XML property file looks like the following:

   ...
   <bean id="Job" class="org.joone.dte.JooneJob">
      <property name="name">
         <value>SimpleXOR</value>
      </property>
      <property name="taskFactory">
         <ref bean="SimpleFactory"/>
      </property>
      <property name="saveFolder">
         <value>/tmp/joone/</value>
      </property>
      <!-- A bean used as value of a property -->
      <property name="control">
         <bean class="org.joone.dte.control.MaxCyclesCondition">
            <property name="maxCycles"><value>3</value></property>
         </bean>
      </property>
   </bean>

The <bean...> tag contains the declaration of a Java class to instantiate, whereas the <property...> tag permits declaring the value of any parameter of the instantiated class. As you can see, the representation is recursive, because we can declare a new <bean> as the value of a <property...>.
When the Spring framework is asked for a bean declared in the XML file (using as identifier the string contained in the 'id' attribute of the <bean...> tag), it instantiates the requested class and sets all the declared properties by invoking the corresponding setters.
By changing the above Spring XML property file, the user can instantiate different implementations of the DTE without changing any line of Java code, because, once the desired classes have been declared in the XML parameter file, they will be instantiated by the IoC framework.
Just 'Plug Once, Play Anywhere' – in the distributed sense of the word.
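For reference, loading such a definition from Java takes only a few lines with Spring's bean factory. The snippet below is a generic Spring 1.x style example, not necessarily the exact code used inside the DTE launcher; the getName() accessor is assumed from the 'name' property declared above.

import org.joone.dte.JooneJob;
import org.springframework.beans.factory.xml.XmlBeanFactory;
import org.springframework.core.io.FileSystemResource;

public class LoadJobExample {

    public static void main(String[] args) {
        // Read the bean definitions from the XML parameter file...
        XmlBeanFactory factory =
                new XmlBeanFactory(new FileSystemResource("parameters.xml"));

        // ...then ask Spring for the bean declared with id="Job": the JooneJob instance
        // is created and all its declared properties are injected via the setters.
        JooneJob job = (JooneJob) factory.getBean("Job");
        System.out.println("Loaded job: " + job.getName());
    }
}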

As the complete description of the features of the Spring framework is out of the scope of this paper, we recommend you to download and read the Spring documentation at http://www.springframework.org/documentation.html.
In the following chapters we'll assume that the reader has a basic (even if not complete) knowledge of the syntax of Spring's XML parameter file.

5. How to use the DTE

5.1. Installation

The DTE is composed of three main packages:
1. The Jini framework (the communication/transport framework by Sun)
2. The Joone DTE framework (the Controller: i.e. the classes described in the class diagram above)
3. The Computefarm framework (the classes representing the Workers)
The following schema describes how the above three frameworks must be distributed on the
different machines on our LAN:

[Figure: deployment schema – the Jini framework is installed on the Master machine, the JooneDTE framework on the Client machine, and the Computefarm framework on the Worker machines.]

The text near each machine describes the framework installed on that box. Of course all the machines must have the Java Virtual Machine v. 1.4 or above installed.
You should use at least three machines, but you could collapse the Master and the Client onto the same box (in this case a dual-processor machine would represent the best choice).
The installed OS can be one of the following: Linux, Mac OS X, Sun Solaris, Windows 2000/XP, but you can use whatever OS for which a port of the JVM exists.
Now we'll see how to install the described packages on the above three types of machines.

5.1.1. The Jini framework on the Master server

Download and install the Jini Technology Starter Kit Version 2.0 from the following URL:
http://wwws.sun.com/software/communitysource/jini/download.html.
The starter kit contains contributed implementations from Sun Microsystems of Jini technology
infrastructure services, supporting helper classes, and services, including JavaSpaces technology.
Then get Brian Murphy's 2.0 configuration and policy files from http://user-btmurphy.jini.org/codemesh/jini-start-examples.zip and install them following the instructions at http://user-btmurphy.jini.org/.
After the installation you should have a directory tree as the following:

The 'example' subdirectory is the place where Brian Murphy's 2.0 configuration and policy files have been installed.

5.1.2. The Computefarm framework on the Workers machines
Download and install the Computefarm framework from http://computefarm.jini.org/ (simply unzip
the file computefarm-x.y.zip into a directory).
This must be done for each Worker machine.

5.1.3. The JooneDTE framework on the Client machine
Download jooneDTE-x.y.z.zip from the download page of the Joone web site and unzip it into a directory.

A subdirectory named jooneDTE will be created, containing all the files needed to launch the Controller components of the DTE.

5.2. How to launch the Distributed Training Environment
The following are the instructions to launch all the needed components on the three types of machines in order to start up the DTE:
1. On the Master machine open a console, cd to the Jini/example/scripts directory and launch both the http daemon and the Jini services:
wrun httpd
wrun jrmp-transient
(in a Linux environment use urun)
2. On each Worker, open a console, cd to the computefarm directory and launch:
run

3. On the Client machine, open a console, cd to the jooneDTE directory and launch:
runJob parameters.xml
where <parameters.xml> is the file containing the Spring's parameters of your own Job (see the
paragraph below)

Done! Now your distributed training environment should be started, and your job executed.
Each console on the interested machines should show all the messages indicating the regular
execution of the requested job. However the central point where you can control the overall training
environment is represented by the console on the Client machine where you have launched the runJob command.

5.3. How to write the XML parameter file
As said, all the parameters of the DTE are written in an XML file using the syntax of the Spring Framework project. We assume that the reader has a basic knowledge of such syntax (download and read the Spring documentation at http://www.springframework.org/documentation.html).
In this paragraph we'll concentrate on the available implemented plugins and on how to use them in order to implement several global optimization techniques.

5.3.1. A very simple example
The first and simplest Job we can implement is the one based on the 'train once' strategy, where the initial population is distributed for remote training and then stored on the file system when collected.
In this strategy we will not implement any post-processing step, because it doesn't make sense if we don't want to iterate the remote training more than once.
We will use a MultiplierFactory in order to get a serialized neural network and send several clones of it to the remote workers.

First of all, we declare a TaskList instance containing the list of the neural networks to clone:
   <bean id="tasks" class="org.joone.dte.TaskList">
      <property name="tasks">
        <list>
          <bean class="org.joone.dte.TaskDescriptor">
           <property name="netFile"><value>/tmp/xor1.snet</value></property>
           <property name="netName"><value>XOR1</value></property>
          </bean>
          <bean class="org.joone.dte.TaskDescriptor">
           <property name="netFile"><value>/tmp/xor2.snet</value></property>
           <property name="netName"><value>XOR2</value></property>
          </bean>
        </list> 
      </property>
   </bean> 

As you can see, the instantiated class is org.joone.dte.TaskList, and it accepts as parameter a list of
TaskDescriptor instances. Each TaskDescriptor describes a serialized neural network by indicating
its logical name and its file name (i.e. the name and path used when the network has been exported,
for instance, by the GUI Editor).
In this example we want to start from two neural networks, xor1.snet and xor2.snet.
Now we need to create the initial population, and to do it, we'd like to train a total of 10 neural
networks, 5 of them cloned from XOR1, and the remaining 5 cloned from XOR2.
   <!-- Creates an initial population of 10 neural networks -->
   <bean id="Factory" class="org.joone.dte.factory.MultiplierFactory">
      <property name="taskList"><ref bean="tasks"/></property>
      <property name="copies"><value>5</value></property>
      <property name="noise"><value>0.5</value></property>
   </bean>

The property 'taskList' points to a reference to the bean named 'tasks', declared in the previous step.
The MultiplierFactory will create 5 clones (property 'copies') of each network contained in the task
list, by applying, for each one, a random noise (property 'noise') of amplitude 0.5 to their internal
weights/biases.
Now we need to instantiate the Job that will control the overall distributed training:
   <!-- Job -->
   <bean id="Job" class="org.joone.dte.JooneJob">
      <property name="name">
         <value>SampleXOR</value>
      </property>
      <property name="taskFactory">
         <ref bean="Factory"/>
      </property>
      <property name="saveFolder">
         <value>/tmp/joone/</value>
      </property>
   </bean>

The JooneJob class is provided with the list of neural networks through the 'taskFactory' parameter, which points to the reference of the previously declared MultiplierFactory instance.

The 'saveFolder' property indicates where all the collected neural networks will be saved along with
the XML file containing the results of the remote training.
Note: the id of the JooneJob bean must always be equal to the string 'Job', because the main application of the DTE framework searches for the JooneJob instance by using this name.

So our first Job has been implemented and, if launched as described in the previous chapter, at the
end, in the /tmp/joone directory, we will find all the trained networks along with the XML file
describing the results:
<bean id="SampleXOR" class="org.joone.dte.TaskList">
   <property name="tasks">
<list>
<bean class="org.joone.dte.TaskDescriptor">
<property name="netName"><value>null</value></property>
<property name="netFile">
  <value>/tmp/joone/SampleXOR­1.snet</value>
</property>
<property name="trainingRmse">
  <value>0.005823729896670674</value>
</property>
<property name="validationRmse"><value>0.0</value></property>
<property name="date">
  <value>Thu Aug 19 22:13:02 CEST 2004</value>
</property>
</bean>
<bean class="org.joone.dte.TaskDescriptor">
<property name="netName"><value>null</value></property>
<property name="netFile">
  <value>/tmp/joone/SampleXOR­2.snet</value>
</property>
<property name="trainingRmse">
  <value>0.0058612522840911065</value>
</property>
<property name="validationRmse"><value>0.0</value></property>
<property name="date">
  <value>Thu Aug 19 22:13:06 CEST 2004</value>
</property>
</bean>
...continues up to the 10th network
</bean>

The results are returned in a known format: probably most of you will have recognized the same syntax used at the beginning of this paragraph to create the initial population of networks.
In fact the networks are listed using TaskDescriptor beans, and all the descriptors are contained in a TaskList bean.
Because the listed networks have been trained, each TaskDescriptor also contains information about the training, such as the training/validation RMSE and the date/time of the training.
This XML file can be used for several purposes:
1. As input to a custom application, in order to elaborate/use the trained networks (for instance, to load and use the fittest network based on the final training RMSE; a sketch of such a client is shown after this list).
2. As input to a successive step of the DTE, as we'll see below.
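As a sketch of the first use, a small client could load the result file with Spring, pick the descriptor with the lowest training RMSE and deserialize the corresponding network. The result file name, the TaskList/TaskDescriptor accessors (getTasks, getTrainingRmse, getNetFile) and the bean id are assumptions based on the XML shown above.

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.Iterator;

import org.joone.dte.TaskDescriptor;
import org.joone.dte.TaskList;
import org.springframework.beans.factory.xml.XmlBeanFactory;
import org.springframework.core.io.FileSystemResource;

// Hypothetical post-processing client: picks the trained network with the lowest
// training RMSE from the results written by the DTE in the save folder.
public class PickFittestNetwork {

    public static void main(String[] args) throws Exception {
        // The results file is a normal Spring bean definition file (path assumed).
        XmlBeanFactory factory =
                new XmlBeanFactory(new FileSystemResource("/tmp/joone/SampleXOR.xml"));

        // The file declares a TaskList bean whose id is the job name.
        TaskList results = (TaskList) factory.getBean("SampleXOR");

        TaskDescriptor best = null;
        for (Iterator it = results.getTasks().iterator(); it.hasNext();) {
            TaskDescriptor task = (TaskDescriptor) it.next();
            if (best == null || task.getTrainingRmse() < best.getTrainingRmse()) {
                best = task;
            }
        }

        if (best != null) {
            // Each exported network is a standard Java serialized object (.snet file).
            ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(best.getNetFile()));
            Object fittestNetwork = in.readObject();
            in.close();
            System.out.println("Fittest network: " + best.getNetFile() + " -> " + fittestNetwork);
        }
    }
}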

Thanks to the chosen XML format, we are able to use the results of the distributed training in several ways, making the framework extremely flexible.

5.3.2. A complete example
Now we'll see how to reiterate the illustrated remote training with a new population of networks
obtained by selecting and cloning the fittest ones, in order to simulate a 'distillation' process.
The following schema illustrates the overall process:

[Figure: the 'distillation' loop – the initial networks are multiplied (x5) by a MultiplierFactory to build the initial population of untrained networks; the JooneJob sends them to the Master for training; the FittestSelector picks the fittest trained networks, and a MultiplierFactory clones them (x5) again to build the next population.]

As you can see, the initial 2 networks are multiplied by 5 to obtain the initial population of 10 networks. They are sent to the Master to be remotely trained and, when they return, the fittest two networks are selected and cloned, in order to obtain the next population, itself composed of 10 networks.
The above cycle continues until at least one network in the population reaches a predefined RMSE value.
To do that, we'll also declare the stop-condition and the post-processing plugins, as described in the following XML:

   <!-- Job -->
   <bean id="Job" class="org.joone.dte.JooneJob">
      <property name="name">
         <value>IteratedXOR</value>
      </property>
      <property name="taskFactory">
         <ref bean="Factory"/>
      </property>
      <property name="saveFolder">
         <value>/tmp/joone/</value>
      </property>
      <property name="control">
         <!-- Stops if a network reaches the desired RMSE,
or anyway after 5 iterations -->
         <bean class="org.joone.dte.control.MaxRmseCondition">
            <property name="maxRmse"><value>0.001</value></property>
            <property name="maxCycles"><value>5</value></property>
         </bean>
      </property>
      <property name="selector">
         <!-- Selects the best 2 neural networks... -->
         <bean class="org.joone.dte.selection.FittestSelector">
            <property name="numTasks"><value>2</value></property>
            <property name="taskListFactory">
               <!-- ...and generates 5 clones of each one,
creating again a population of 10 networks -->
               <bean class="org.joone.dte.factory.MultiplierFactory">
                 <property name="copies"><value>5</value></property>
                 <property name="noise"><value>0.2</value></property>
               </bean>
            </property>
         </bean>
      </property>
   </bean>
</beans>

We have declared a slightly different Job, having also set the properties 'control' – the stop-condition plugin – and 'selector' – the post-processing plugin.
The instantiated 'control', the org.joone.dte.control.MaxRmseCondition, stops the iterations either when some network's training RMSE is less than the value declared in the 'maxRmse' property, or when 'maxCycles' iterations have been elaborated (this last condition serves to avoid an infinite loop in case no network reaches the desired RMSE value).
If the above stop-conditions aren't verified, then a new iteration must be done.
In this case the plugin declared in the 'selector' property, the org.joone.dte.selection.FittestSelector class, selects the two fittest networks (see the 'numTasks' property) and then, by using a MultiplierFactory (the same class used to generate the initial population), clones each one 5 times (see the 'copies' property) after having applied a noise of amplitude 0.2. In this manner a new population composed of 10 networks is built and used for the remote training in the next iteration.
All this is possible because, as seen in the previous example, the output of the training phase is an XML file containing a TaskList instance, and all the used plugins – the MaxRmseCondition, the FittestSelector and the MultiplierFactory – are able to elaborate as input a list of networks represented by a TaskList object.

5.3.3. Using multiple Factories

Sometimes we need to generate an initial population composed of tasks created by different creation rules. To do this we need to use more than one TaskFactory, so the ChainedFactory has been created in order to address this need.
Look at the following XML:
   <bean id="Factory" class="org.joone.dte.factory.ChainedFactory">
     <property name="chain">
        <list>
            <bean id="Factory1" class="org.joone.dte.factory.MultiplierFactory">
                <property name="taskList"><ref bean="XOR1"/></property>
                <property name="copies"><value>3</value></property>
                <property name="noise"><value>0.2</value></property>
            </bean>
            <bean id="Factory2" class="org.joone.dte.factory.MultiplierFactory">
                <property name="taskList"><ref bean="XOR2"/></property>
                <property name="copies"><value>2</value></property>
                <property name="noise"><value>0.5</value></property>
                <property name="randomize"><value>true</value></property>
            </bean>
        </list>
     </property>
   </bean> 

The ChainedFactory class contains a property named 'chain' that permits declaring a list of TaskFactory objects. When invoked, the ChainedFactory will call all the declared Factories and will generate a population composed of the sum of all the created Tasks.
In the illustrated case we'll obtain a population composed of 5 networks: 3 copies of the network XOR1 and 2 copies of the network XOR2, applying a different initial noise in the two cases.
When many other TaskFactory plugins become available, this class will be fundamental for creating mixed populations of networks generated in several ways.

6. Enhancements and things to do

6.1. New Plugins

This initial release represents the first step toward a complete and powerful distributed training environment. It isn't complete, we know, but its modular architecture based on plugins will permit anyone interested to enhance it by adding new useful components, in order to implement a lot of global optimization algorithms.
Anyone who builds a new plugin and believes that it could be useful for someone else is encouraged to send us the source code, and we'll be very happy to insert it in the next official release (...and your name will enter forever into Joone's contributors list).
Thanks in advance.

6.2. A visual parametrization tool
Maybe it's not very simple for a newbie to write the XML containing all the parameters needed to implement a Job, so it would be very nice to have a visual tool that simplifies the writing of any parametrization file, by hiding the complexity of the underlying Spring XML syntax.

6.3. A Task recovery mechanism
As said in the introduction, the current DTE doesn't permit recovering the elaboration of a task when the corresponding Worker shuts down due to some hw/sw failure.
Therefore it would be very important to implement a Task recovery mechanism that, based on some notification mechanism, would monitor the execution of each remote Worker, and could recover the state of a dead Worker by making the interrupted Task available to some other Worker in the cluster.

6.4. A Load Balancing mechanism
Although training a neural network takes much more time than retrieving it from the Master, if a really large set of Workers is available, the Master (and/or Controller) might become overloaded, resulting in idle Workers, which would be really wasteful.
I don't think this situation will occur soon, but in the future it might occur for big projects.

6.5. A visual Job controller
Another missing feature is a graphical tool to control the execution of each Job that has been launched in the DTE. This tool could show a list of running workers and also indicate the current state of each task and, for each one in the elaborating state, it could show where (i.e. on which machine) it is running.
This tool could be implemented as a web application, making it very simple to access from any machine in the LAN by using a simple HTML browser.

6.6. A more general Distributed Environment

The DTE could be very useful for any optimization problem that has some kind of probabilistic feature or a lot of parameters to tune. We could make it more general, so that it could not only train (Joone) neural networks, but also run any other application, provided it is encapsulated in some kind of DTE classes...
This would probably require some quite drastic changes in order to make the DTE more general, but it might attract a much larger community.

6.7. Suggestions...?
Write to us to suggest new features or to comment on the current implementation. Any proposal is welcome.
