
Computers & Operations Research 32 (2005) 1255–1265

www.elsevier.com/locate/dsw

A genetic algorithm for the project assignment problem


Paul R. Harper, Valter de Senna, Israel T. Vieira, Arjan K. Shahani
School of Mathematics, University of Southampton, Southampton SO17 1BJ, UK

Abstract
In this paper we present a genetic algorithm as an aid for project assignment. The assignment problem
illustrated concerns the allocation of projects to students. Students have to choose from a list of possible
projects, indicating their preferred choices in advance. Inevitably, some of the more popular projects become
over-subscribed and assignment becomes a complex problem. The developed algorithm has compared well
to an optimal integer programming approach. One clear advantage of the genetic algorithm is that, by its very
nature, we are able to produce a number of feasible project assignments, thus facilitating discussion on the
merits of various allocations and supporting multi-objective decision making.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Genetic algorithms; Project assignment; Multi-objective decision making

1. Introduction
The problem of assigning projects to students arises in a number of contexts, such as assigning summer dissertation projects to postgraduate students or sandwich course placements to undergraduates. Students are asked to indicate their preferences in a student–project matrix, which typically takes the form of a scoring system where a one indicates a first choice, two a second and so on up to a predefined maximum number of preferences allowed (see Fig. 1). When the number of students and projects becomes large and conflicts on project choices begin to arise, finding a project allocation to suit all students becomes harder. For example, if two students have chosen the same project as their first choice, only one of them can be allocated this project, so the other student will have to be allocated another project, say his/her second choice. This second choice might however be another student's first choice and so we again must either deny a student their first choice or further demote a student to a lower preference, and the same problem may reoccur.

Corresponding author.
E-mail address: p.r.harper@maths.soton.ac.uk (P.R. Harper).

0305-0548/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2003.11.003

[Figure: matrix of preference scores with students 1, ..., S as rows and projects 1, ..., P as columns.]

Fig. 1. A student–project matrix with S students and P projects.

An optimal solution to the project allocation problem as described above may be found in the classical integer programming framework approach to the generalised assignment problem [1], although in some situations there may be no feasible solution to the problem as stated, for example if enforcing that every student must be allocated a project from their list of preferences. Whenever an allocation is obtained, the user is usually given only one solution, namely the optimal one. An advantage of the genetic algorithm approach as presented in this paper is that a project allocation is produced for any student–project matrix, and the user is given a number of different solutions which may facilitate discussion on the merits of the various allocations. This is an extremely useful property when considering matching problems of this type, since both the students' and the project sponsors' preferences may be taken into account.
2. The project assignment problem
Let $S = \{1, 2, \dots, s\}$ be a set of students, and let $P = \{1, 2, \dots, p\}$ be a set of projects ($p \ge s$). For $i \in S$, $j \in P$, we define $c_{ij}$ as the preference given by student $i$ to being assigned project $j$. This student–project preference matrix ($s \times p$) contains the students' preferences indicated by a value of one for a first choice, two for a second choice and so on up to a possible maximum value $p$. If $c_{ij}$ has not been assigned an integer value (i.e. student $i$ has not included project $j$ in their list of preferences), then $c_{ij}$ is assigned a penalty value $B$ (suitably large). Additionally, we may assign priority weights $w_i$ to each student so as to give some students a better chance of being allocated their higher preference projects. The mathematical programming formulation of this problem is given by
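\[
\text{minimise} \quad \sum_{i=1}^{s} \sum_{j=1}^{p} w_i c_{ij} x_{ij} \tag{1}
\]

subject to

\[
\sum_{j \in P} x_{ij} = 1 \quad \forall i \in S, \tag{2}
\]

\[
\sum_{i \in S} x_{ij} \le 1 \quad \forall j \in P, \tag{3}
\]

\[
x_{ij} \in \{0, 1\} \quad \forall i \in S,\ j \in P, \tag{4}
\]

where

\[
x_{ij} =
\begin{cases}
1 & \text{if project } j \text{ is assigned to student } i, \\
0 & \text{otherwise}
\end{cases}
\]

and

\[
c_{ij} = B \quad \text{if } c_{ij} \notin \{1, 2, \dots, p\}, \quad \forall i \in S,\ j \in P.
\]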

Eq. (2) ensures that each student is assigned exactly one project while (3) ensures that each project can be assigned to at most one student. As lower values of student preferences are more desirable (a value of one being the first and most desirable choice of project), and as priority weights, $w_i$, are in ascending order (a higher ranking indicating greater priority), then our objective function (1) is to be minimised.
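To make formulation (1)–(4) concrete, the following is a minimal Python sketch that enumerates every feasible assignment of a tiny instance and keeps the cheapest one. It is illustrative only: the function name `optimal_assignment`, the example data and the penalty value `B = 100` are assumptions, and brute-force enumeration is not the integer programming method referred to in the text; it is practical only for a handful of students.

```python
from itertools import permutations

B = 100  # assumed penalty for a project outside a student's preference list


def optimal_assignment(c, w):
    """Return (cost, assignment) minimising objective (1) by enumeration.

    c[i][j] is the preference of student i for project j (None if unlisted);
    w[i] is the priority weight of student i.
    """
    s, p = len(c), len(c[0])
    best = None
    # Each permutation gives every student exactly one project and uses each
    # project at most once, so constraints (2)-(4) hold by construction.
    for projects in permutations(range(p), s):
        cost = sum(
            w[i] * (c[i][j] if c[i][j] is not None else B)
            for i, j in enumerate(projects)
        )
        if best is None or cost < best[0]:
            best = (cost, projects)
    return best


# Three students, four projects; students 0 and 1 share a first choice.
c = [
    [1, 2, None, None],
    [1, None, 2, None],
    [None, 1, 2, 3],
]
w = [1, 1, 1]
cost, assignment = optimal_assignment(c, w)
print(cost, assignment)
```

In this small example the two students competing for project 0 cannot both be satisfied, so one of them is moved to a second choice, which is exactly the conflict described in the introduction.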
The project assignment problem forms a special case of the generalised assignment problem and is similar in nature to both matching problems [2,3] and multi-objective assignment problems [4]. Generalised assignment problems are NP-hard [5]. Algorithms in the literature utilise exact tree search algorithms, for example see Martello and Toth [6], or various heuristics [1,7–9]. Other authors have demonstrated the advantages of a genetic algorithm approach to the generalised assignment problem [10,11] but as yet no papers have considered this technique specifically for project-assignment types of problem. The distinct advantage of the genetic algorithm approach for matching students to projects is that a number of allocations may be produced for any student–project matrix. This permits multi-objective decision making without the need to explicitly define these objectives in the model formulation. For example, objectives that might influence the selection of the final allocation (from a choice of the best solutions produced by the genetic algorithm) may informally include the quality of students (assigning strong students to challenging projects) and preferences on which projects should run if there are more projects than students (it may be desirable to run a project with a high-profile or loyal company). A GA approach to the project-assignment problem as adopted here is in contrast to other heuristic approaches, such as simulated annealing and tabu search, that operate on a single solution.
We adopt an approach similar to that of Chu and Beasley [10] so as to generate a family of potential solutions and then improve feasibility and optimality simultaneously. We employ a GA structure similar to that utilised by Wilson [11] but have adapted the fitness function given the nature of project assignment problems. Empirical data obtained from a survey of students at the University of Southampton confirmed our suspicion that a strictly linear scoring fitness function is unreasonable. For example, the disappointment in being awarded a second choice project over a first is unlikely to be linearly related to that of being assigned a fifth choice over a fourth. Our developed GA permits different fitness functions to be defined and the algorithm has been coded in Visual Basic with a user-friendly front-end for ease of use. The program has been compared with integer programming solutions for different sized data sets.
3. Genetic algorithm structure
Genetic algorithms (GAs) are search mechanisms that attempt to mimic the biological processes of
natural selection and natural evolution [12,13]. Working with a population of string structures, GAs combine survival of the fittest within a structured, yet stochastic information exchange framework. In every iteration, new artificial creatures (strings) are created using parts of the fittest members from the last iteration and sometimes also incorporating new parts as a result of mutations.
The strings of genetic algorithms are analogous to chromosomes in biological systems. In natural terminology, we say that each chromosome (or string) is composed of genes, the basic building blocks. Genes may take on different values called alleles.
We use an ordered structure ($s$-dimensional vector) of integers to represent the solution space. Each chromosome (or vector) contains $s$ genes each containing an integer number. This number identifies the project $j$ ($j = 1, \dots, p$) that has been assigned to the student denoted by the vector element $i$ ($i = 1, \dots, s$) (see Fig. 2). In other words, an allele is the project number assigned to the student. This representation ensures that constraint (2) is satisfied and that computational effort is kept to a minimum.

[Figure: a row of student indices 1, ..., s above a row holding the project number assigned to each student.]

Fig. 2. Representation of a chromosome.

The GA described here incorporates the main operators (a) reproduction; (b) crossover; and
(c) mutation. These operators are applied to chromosomes to create new individuals and although
deceptively simple, they are a powerful search mechanism.
The basic steps of the developed GA are therefore:
Step 1: Generate a family of initial chromosomes (solutions),
Step 2: Calculate the fitness of each chromosome,
Step 3: Select two parent chromosomes from the family using binary tournament selection,
Step 4: Produce a child chromosome as an offspring from the parents using a crossover operation,
Step 5: Allow the child to mutate,
Step 6: Calculate the fitness of the child,
Step 7: Replace the least fit member of the family with the child provided that the child is fitter and is not already a member of the family,
Step 8: Stop if the number of predefined cycles has been reached, otherwise go to Step 3.
3.1. The initial family of chromosomes
First, we examine the dataset and remove assignments that can be preset. Such a circumstance arises if a student has indicated a first choice for a project that no other student has included in their preference list. For the remaining N students, we generate an initial set of N chromosomes. Each gene in the chromosomes is defined for each student by randomly assigning a project from their chosen preferences. Let $r_{ig}$ be the project assigned to student $i$ ($i \in S$) in chromosome $g$ ($g = 1, \dots, N$).
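The presetting step and the random initial family can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `preset_assignments` and `initial_family`, the list-of-lists layout (each student's projects in rank order, first choice first) and the example data are hypothetical and not taken from the authors' Visual Basic program.

```python
import random


def preset_assignments(preferences):
    """Return {student: project} for first choices that nobody else listed."""
    preset = {}
    for i, prefs in enumerate(preferences):
        first = prefs[0]
        listed_by_others = {
            project
            for k, other in enumerate(preferences) if k != i
            for project in other
        }
        if first not in listed_by_others:
            preset[i] = first
    return preset


def initial_family(preferences, students, n_chromosomes, rng):
    """Build chromosomes whose gene for each student is a random listed project."""
    return [
        [rng.choice(preferences[i]) for i in students]
        for _ in range(n_chromosomes)
    ]


# preferences[i] lists the projects chosen by student i, best first.
preferences = [[0, 1], [0, 2], [3, 1, 2]]
preset = preset_assignments(preferences)
remaining = [i for i in range(len(preferences)) if i not in preset]
family = initial_family(preferences, remaining, n_chromosomes=4, rng=random.Random(0))
print(preset, family)
```

Note that chromosomes generated this way may give the same project to two students; in this sketch such clashes are left in place, on the understanding that the fitness function penalises them.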
3.2. The fitness function
The fitness of each chromosome must measure the satisfaction of students according to which project they have been assigned. We must therefore refer back to the students' preferences $c_{ij}$ and the weights $w_i$. Students place preferences in order on a linear scale from 1 to a maximum possible value $p$. We adopt the following calculation to find the fitness $f_g$ of chromosome $g$:

\[
f_g = \sum_{i \in S} h(c_{i r_{ig}})\, w_i. \tag{5}
\]

We look for those solutions with lower fitness values. As a strictly linear scoring fitness function may seem unreasonable in a project assignment, the function $h(\cdot)$ defined on the preferences $c_{ij}$ will typically be a non-linear relationship on the students' preferences. We feel that the disappointment in being awarded a second choice over a first is unlikely to be linearly related to that of being assigned a third, fourth or fifth choice.
A suitable choice of $h(\cdot)$ must be made. For instance, a squared function would assign penalty scores of $\{1, 4, 9, 16, \dots, p^2\}$ to preferences $\{1, 2, 3, 4, \dots, p\}$, respectively. By using an appropriate choice of $h(\cdot)$, the GA would be able to better discriminate between feasible solutions.
When calculating the fitness of each chromosome, a project outside a student's preference list has a suitably large value $B$ assigned to the $c_{i r_{ig}}$ coefficient (as in a mathematical programming formulation). Additionally, if a project has been assigned to more than one student, then a large value $M$ is assigned to the $c_{i r_{ig}}$ coefficient. Since we look for solutions with lower fitness scores, the algorithm will tend towards solutions in which constraint (3) is satisfied and attempts to assign all students to one of their preferred projects ($\forall i \in S,\ j \in P:\ c_{ij} \in \{1, 2, \dots, p\}$).
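A minimal Python sketch of the fitness calculation in Eq. (5) is given below, using a squared $h(\cdot)$ and the penalty scores 100 and 300 reported in the computational results. Several details are assumptions made for illustration: the name `fitness`, the dictionary layout of the preferences, applying the penalties directly rather than through $h(\cdot)$, and letting the clash penalty M take precedence when a project is both unlisted and shared.

```python
from collections import Counter

B = 100  # score for a project outside the student's list (Section 5.1 value)
M = 300  # score for a project assigned to more than one student (Section 5.1 value)


def h(rank):
    """Squared penalty on the preference rank: 1, 4, 9, 16, ..."""
    return rank ** 2


def fitness(chromosome, c, w):
    """Weighted sum of penalties over all students; lower is better.

    chromosome[i] is the project given to student i,
    c[i] maps each project listed by student i to its rank,
    w[i] is the priority weight of student i.
    """
    counts = Counter(chromosome)
    total = 0
    for i, project in enumerate(chromosome):
        rank = c[i].get(project)  # None when the project is not on the list
        if counts[project] > 1:   # assumption: the clash penalty overrides B
            score = M
        elif rank is None:
            score = B
        else:
            score = h(rank)
        total += w[i] * score
    return total


c = [{0: 1, 1: 2}, {0: 1, 2: 2}, {1: 1, 2: 2, 3: 3}]
w = [1, 1, 1]
print(fitness([0, 2, 1], c, w), fitness([0, 0, 1], c, w))
```

Because M is much larger than any squared rank, a chromosome with a shared project scores far worse than any feasible one, which is what steers the search towards satisfying constraint (3).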
3.3. Parent selection using a binary tournament
The binary tournament selection method [11,12] was chosen to select parents from the family of chromosomes. Two chromosomes are chosen at random from the population. We keep only the fitter of the two (the one with the lower fitness value), which is thus chosen as a parent. The process is repeated to find another parent. Our two parents are now combined to produce an offspring.
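Binary tournament selection can be sketched in a few lines of Python. The function names and the choice to draw two distinct candidates per tournament are assumptions for illustration; the text does not say whether the same chromosome may be drawn twice, nor whether the two parents must differ.

```python
import random


def binary_tournament(family, fitnesses, rng):
    """Draw two distinct members at random; return the index of the fitter one."""
    a, b = rng.sample(range(len(family)), 2)
    return a if fitnesses[a] <= fitnesses[b] else b  # lower fitness wins


def select_parents(family, fitnesses, rng):
    """Run two independent tournaments to obtain two parent indices."""
    return (
        binary_tournament(family, fitnesses, rng),
        binary_tournament(family, fitnesses, rng),
    )


family = [[0, 2, 1], [1, 2, 3], [0, 0, 1]]
fitnesses = [6, 17, 601]
u, v = select_parents(family, fitnesses, random.Random(0))
print(u, v)
```

With distinct candidates, the least fit member of the family can never win a tournament, so selection pressure is mild but always favours better solutions.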
3.4. Producing an offspring with a crossover operation
An offspring is constructed from the two parents using a crossover operation. GA crossover operators are implemented by a process of randomly generating one or more crossover points in the chromosome and then exchanging subsets of genes from the parents to form a child chromosome. We employ a generalised fitness-based crossover operator, the fusion operator, as suggested in [11]. This algorithm produces a single child from the two parents by attempting to build on the fitness of the parents.
To prevent infeasible solutions (that is, solutions with two or more students assigned to the same project) from entering the population, we modify the fusion operator slightly, as described below.
Let $f_u$ and $f_v$ denote the fitness of two parents $U$ and $V$, respectively. Let $C$ be the child chromosome to be produced. Finally, let the subscript $i$ denote the $i$th gene in the chromosome ($i = 1, \dots, s$).

If both $U_i$ and $V_i$ are different from $C_k$, $k = 1, \dots, i-1$, then

1. If $U_i = V_i$ then $C_i = U_i = V_i$
2. If $U_i \ne V_i$ then
(a) $C_i = U_i$ with probability $f_v/(f_u + f_v)$
(b) $C_i = V_i$ with probability $f_u/(f_u + f_v)$

Else, if both $U_i$ and $V_i$ are already genes in $C_k$, $k = 1, \dots, i-1$, choose at random a project not in $C_k$, $k = 1, \dots, i-1$, and assign it to gene $C_i$. Otherwise, make $C_i$ equal to whichever of the genes $U_i$ or $V_i$ is not already in $C_k$, $k = 1, \dots, i-1$. This is done for each gene to be generated in the new chromosome, $C$.
This process passes on genes from the fitter parent with a higher probability, while keeping the solutions (chromosomes) feasible.
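The modified fusion operator described above can be sketched in Python as follows. The function name, argument layout and example values are assumptions for illustration, and the sketch relies on there being at least as many projects as students so that an unused project always exists.

```python
import random


def fusion_crossover(u, v, f_u, f_v, n_projects, rng):
    """Build one child gene by gene without repeating a project.

    u, v are parent chromosomes with fitnesses f_u, f_v (lower is better).
    """
    child = []
    for u_i, v_i in zip(u, v):
        used = set(child)
        u_free, v_free = u_i not in used, v_i not in used
        if u_free and v_free:
            if u_i == v_i:
                gene = u_i
            else:
                # The fitter parent has the smaller fitness value, so U is
                # chosen with probability f_v / (f_u + f_v).
                gene = u_i if rng.random() < f_v / (f_u + f_v) else v_i
        elif u_free:
            gene = u_i
        elif v_free:
            gene = v_i
        else:
            # Both parental genes are taken: pick any project not yet used.
            gene = rng.choice([j for j in range(n_projects) if j not in used])
        child.append(gene)
    return child


child = fusion_crossover([0, 1, 2], [2, 1, 0], f_u=6, f_v=8,
                         n_projects=4, rng=random.Random(1))
print(child)
```

Whatever the random draws, the child never contains a repeated project, so constraint (3) holds for every offspring even when a parent violates it.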
3.5. Mutation
Child mutation provides the algorithm with a randomising element to work alongside the crossover operators. Without mutation, the GA would more easily converge to a local minimum due to lack of genetic diversity. The mutation process allows the possibility of introducing new genes into the population. For each of the $s$ genes within the child chromosome $C$, the probability of changing the value of the $i$th gene, by choosing a different allele, is given by the mutation rate. This rate is set by the user, the default value being 1 divided by (the size of the population times the square root of the chromosome length) [14]. Again, to keep the chromosome viable, the new allele is chosen among the projects not already in $C$.
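A minimal Python sketch of the default mutation rate and the feasibility-preserving mutation step is shown below. The function names are hypothetical, and leaving a gene unchanged when no unused project exists is an assumption that the text does not address.

```python
import math
import random


def default_mutation_rate(population_size, chromosome_length):
    """1 / (population size * sqrt(chromosome length)), the stated default."""
    return 1.0 / (population_size * math.sqrt(chromosome_length))


def mutate(child, n_projects, rate, rng):
    """Change each gene with probability `rate` to a project not already used."""
    mutated = list(child)
    for i in range(len(mutated)):
        if rng.random() < rate:
            free = [j for j in range(n_projects) if j not in mutated]
            if free:  # assumption: skip the gene when every project is taken
                mutated[i] = rng.choice(free)
    return mutated


# With 50 chromosomes of 25 genes the default rate is 1 / (50 * 5).
rate = default_mutation_rate(50, 25)
mutant = mutate([0, 1, 2], n_projects=6, rate=rate, rng=random.Random(0))
print(rate, mutant)
```

For the population of 50 and 25 students used in the dissertation study, this default works out at 0.004, slightly below the smallest rate (0.005) tried in the experiments.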
3.6. Replacement
The child chromosome now replaces an existing member of the population, using a steady-state model [12,13]. The least fit existing member of the population is found and replaced with the child only if the child is fitter and also different from any other population member. This type of replacement has the advantage that the family of chromosomes can only become fitter whilst preserving genetic diversity. Moreover, this generates a number of different solutions to choose from. If the child is added to the family, then it immediately becomes eligible to be a parent and to be replaced.
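The steady-state replacement rule can be sketched in Python as below. The function name and the in-place list updates are assumptions for illustration, and "fitter" is read here as having a strictly lower fitness value than the least fit member.

```python
def replace_worst(family, fitnesses, child, child_fitness):
    """Replace the least fit member by the child if it is fitter and new.

    Returns True when the child entered the family, False otherwise.
    """
    worst = max(range(len(family)), key=fitnesses.__getitem__)
    if child_fitness < fitnesses[worst] and child not in family:
        family[worst] = child
        fitnesses[worst] = child_fitness
        return True
    return False


family = [[0, 2, 1], [1, 2, 3], [0, 0, 1]]
fitnesses = [6, 17, 601]
accepted = replace_worst(family, fitnesses, [1, 0, 2], 9)
print(accepted, family, fitnesses)
```

Rejecting duplicates is what keeps the family a set of distinct allocations, so the final population can be offered to decision makers as genuine alternatives.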
3.7. Number of cycles
The process described (from parent selection to replacement) is repeated for a number of user-defined cycles. The values of the fittest and least fit members of the population are periodically displayed to check for convergence, thus ascertaining the need to increase the number of cycles.


4. The developed GA program


The project allocation GA was built in Visual Basic (version 6) and incorporates a user-friendly front-end. For a defined number of students and projects, a student–project table is displayed, and preferences may be entered into the table. The user is allowed to change the following GA parameters on screen:
• number of students (s),
• number of projects (p),
• number of chromosomes in the family (N),
• mutation rate,
• run size (cycles).
During and after the running of the GA, the screen displays information including:
• current run size,
• worst fitness,
• best fitness,
• running time (s),
• the best project allocation.
Student–project tables may be saved to a file, thus avoiding the necessity to re-create the table at a later date. An added feature of the developed program is that the population of chromosomes obtained from a given run of the algorithm may also be saved and later loaded for a further number of runs.
One of the advantages of the GA approach is that a number of solutions are produced (each member of the family represents a solution). The best n solutions (where n is user-defined) may be exported from the package as a text file and readily placed into a spreadsheet or database.
5. Computational results
5.1. Assigning dissertation projects to postgraduate students
The project allocation problem arises annually when attempting to assign summer dissertation projects to M.Sc. Operational Research students. Each year at the University of Southampton there are typically between 20 and 30 students on the M.Sc. and 30 or more projects available. Students are allowed to choose a maximum of five project preferences, ranked 1–5 and, as expected, the popular projects become heavily over-subscribed. In the past, an integer program has been used with mixed success. In some instances, a feasible solution, where all students obtain one of their five preferences, could not be found and the integer program had to be rerun. Also, when an allocation is obtained, the academic staff are given only one solution, namely the optimal one. There have been many instances when it was desirable to have access to a number of near-optimal alternatives for discussion purposes.

[Figure: penalty weight (relative disappointment), on a scale from 0 to 25, plotted against assigned project.]

Fig. 3. Nature of preferences amongst surveyed students.

The developed GA is compared to the integer programming solution given by the mathematical programming package XPRESS-MP [15]. The student–project matrix was taken from the 1998 allocation process, which was regarded as one of the most complex in recent years. This involved 25 students and 34 projects. A population consisting of 50 chromosomes, within the range 30–100 usually recommended in the GA literature [12,16,17], was chosen for this particular allocation study.
To find a suitable choice of $h(\cdot)$, the GA was run with various penalty weights. A squared function was chosen to assign scores of $\{1, 4, 9, 16, 25\}$ to preferences $\{1, 2, 3, 4, 5\}$, respectively. A score of 100 was assigned to a non-choice (a student assigned to a project outside his/her five preferences) and 300 for a project assigned to more than one student. It was found through experimentation that the squared function allowed the GA to better discriminate between feasible solutions, as mentioned previously, and enabled rapid convergence towards the optimal solution. Furthermore, the squared function agrees with empirical data obtained from our M.Sc. students as shown in Fig. 3.¹
To evaluate the effect of the mutation rate and population size, the GA was run with a range of different values. Fig. 4 shows the results obtained using rates 0.005, 0.02, 0.05 and 0.1 (i.e. the probability of changing the value of a gene, by choosing a different allele). This graph displays the relationship between run size (number of cycles), the mutation rate, and the best fitness within the population of chromosomes over 60,000 cycles.
During the initial period the fitness converges rapidly. The higher mutation rates enabled the GA to converge at a faster rate. No further improvement was obtained with rates above 0.1 or with a run size in excess of 60,000. The best fitness was obtained after 49,000 iterations. Fig. 5 displays how the best and worst fitness within the population of chromosomes converge towards the optimal solution (using a mutation rate of 0.1).
After 60,000 cycles, taking 55 s on a Pentium MMX 600 MHz computer, the GA was able to produce a best and worst fitness that were 93% and 85%, respectively, of the optimal value. This illustrates one of the advantages of this GA approach, namely that it can rapidly provide a number of good and feasible solutions to be chosen from. The mathematical programming package XPRESS-MP took 35 s to obtain the optimal solution.
¹ From a sample of 35 students who were asked to provide disappointment scores of being awarded a second, third, fourth or fifth choice project relative to receiving their first choice (score of 1).


[Figure: best fitness (vertical axis, 0 to 800) against run size in thousands of iterations (horizontal axis, 0 to 60), with one curve for each mutation rate: 0.005, 0.02, 0.05 and 0.1.]

Fig. 4. GA convergence with different mutation rates.

[Figure: worst fitness and best fitness of the population (vertical axis, 0 to 600) against run size in thousands of iterations (horizontal axis), shown together with the optimal value.]

Fig. 5. Convergence of GA towards optimal fitness.

5.2. Computational results with larger sized student–project matrices


The GA has also been used on publicly available assignment problem test data sets from the OR-Library maintained by Beasley [18] and taken from the literature [19]. These large data sets provided the student–project preference matrices ($s \times s$) for different values of $s$ (100, 200, 300, 400). Each data set was slightly modified for the purpose of our tests to provide $c_{ij}$ in the range 1–10, with preferences above 10 excluded. The modified datasets together with the GA program are downloadable from http://www.maths.soton.ac.uk/itv/ga.


Table 1
Computational results for large student–project matrices

Number of projects   Fittest solution (trial)              Average   Optimal    Computation time (s)
and students         1      2      3      4      5         fitness   solution   GA      Optimal
100                  308    317    323    305    338       318       305        95      2
200                  513    503    477    515    489       499       475        199     23
300                  644    706    699    650    676       675       626        317     60
400                  1102   1038   1043   1000   1095      1056      991        450     124

The GA was run using a squared preference penalty function, with a score of 100 assigned to a non-choice and 300 for a project assigned to more than one student, a population size of 50 and a fixed mutation rate of 0.02. Table 1 shows how the GA performs on each data set over five different trials, giving the fittest solution in each trial, the average of the fittest solutions and the average computation time (in s) of the five trials. For comparison, the optimal solution and computation time, obtained using ILOG CPLEX [20,21], are also shown. All values are provided on a linear scale (sum of the preferences).
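As a quick check on Table 1, the short Python sketch below recomputes the average of the five fittest solutions for each data set and expresses the GA results as a percentage gap above the optimal value. The dictionary layout and the name `summarise` are assumptions for illustration, and the five values per data set are read from the table as trials 1 to 5.

```python
# Fittest solution found in each of the five trials, keyed by problem size.
trials = {
    100: [308, 317, 323, 305, 338],
    200: [513, 503, 477, 515, 489],
    300: [644, 706, 699, 650, 676],
    400: [1102, 1038, 1043, 1000, 1095],
}
optimal = {100: 305, 200: 475, 300: 626, 400: 991}


def summarise(trials, optimal):
    """Return average, best and percentage gaps to the optimum per data set."""
    rows = {}
    for size, values in trials.items():
        average = sum(values) / len(values)
        best = min(values)
        rows[size] = {
            "average": average,
            "best": best,
            "average_gap_pct": 100 * (average - optimal[size]) / optimal[size],
            "best_gap_pct": 100 * (best - optimal[size]) / optimal[size],
        }
    return rows


summary = summarise(trials, optimal)
for size, row in summary.items():
    print(size, round(row["average"]), round(row["average_gap_pct"], 1),
          round(row["best_gap_pct"], 1))
```

The recomputed averages agree with the "Average fitness" column after rounding, and on the smallest data set the best of the five trials matches the optimal value.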

6. Conclusions
In this paper we have presented a genetic algorithm as an aid for project assignment. The developed GA applies reproduction, crossover and mutation operators to chromosomes to create new individuals. These three operators together act as a very powerful search mechanism.
We feel that a strictly linear scoring fitness function seems unreasonable in the project assignment problem, and have incorporated the possibility of using various penalty weights defined on the student preferences. It was seen during experimentation that a squared function for $h(\cdot)$ works well and agrees with empirical data obtained from our M.Sc. students.
The GA has compared well to an optimal integer programming approach, both for small and large complex assignments, and the developed program (built in Visual Basic) is user friendly and able to quickly produce a population of very fit solutions. A distinct advantage of the GA approach to matching problems of this type is that it provides the user with a number of different assignments to facilitate discussion and aid the multi-objective decision-making process.

Acknowledgements
The authors are grateful for the assistance of Marta Cabo Nodar with using the ILOG CPLEX
solver and for the useful comments and suggestions made by the referees.


References
[1] Catrysse D, Van Wassenhove LN. A survey of algorithms for the generalized assignment problem. European Journal of Operational Research 1992;60:260–72.
[2] Irving RW, Manlove DF. The stable roommates problem with ties. Journal of Algorithms 2002;43:85–105.
[3] Irving RW. Matching medical students to pairs of hospitals: a new variation on a well-known theme. Proceedings of ESA98: the Sixth Annual European Symposium on Algorithms, Venice, Italy, vol. 1461; 1998. p. 381–92.
[4] Zitzler E, Deb K, Thiele L. Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation 2000;8:173–95.
[5] Narciso MG, Lorena LAN. Lagrangean/surrogate relaxation for generalized assignment problems. European Journal of Operational Research 1999;114:165–77.
[6] Martello S, Toth P. Knapsack problems: algorithms and computer implementations. New York: Wiley; 1990.
[7] Lorena LAN, Narciso MG. Relaxation heuristics for a generalized assignment problem. European Journal of Operational Research 1996;91:600–10.
[8] Ross GT, Soland RM. A branch and bound algorithm for the generalized assignment problem. Mathematical Programming 1975;8:91–103.
[9] Muhlenbein M. In: Aarts E, Lenstra JK, editors. Local search in combinatorial optimization. Chichester: Wiley; 1992. p. 137–71.
[10] Chu PC, Beasley JE. A genetic algorithm for the generalised assignment problem. Computers and Operations Research 1997;24:17–23.
[11] Wilson JM. A genetic algorithm for the generalised assignment problem. Journal of the Operational Research Society 1997;48:804–9.
[12] Mitchell M. An introduction to genetic algorithms. Cambridge: MIT Press; 1996.
[13] Coley DA. An introduction to genetic algorithms for scientists and engineers. Singapore: World Scientific; 1999.
[14] Hessner J, Männer R. In: Schwefel P, Männer R, editors. Proceedings of the First Workshop on Parallel Problem Solving from Nature. Lecture Notes in Computer Science, vol. 496. Berlin: Springer; 1991. p. 23–31.
[15] Dash Associates. XPRESS-MP User guide and reference manual (Release 8). Northants: Dash Associates; 1997.
[16] De Jong KA. An analysis of the behavior of a class of genetic adaptive systems. Doctoral thesis, Department of Computer and Communication Sciences, University of Michigan, Ann Arbor; 1975.
[17] Goldberg DE. Sizing populations for serial and parallel genetic algorithms. Proceedings of the Third International Conference on Genetic Algorithms. Los Altos, CA: Morgan Kaufman Publishing; 1989.
[18] Beasley JE. OR-Library: distributing test problems by electronic mail. Journal of the Operational Research Society 1990;41:1069–72. http://mscmga.ms.ic.ac.uk/info.html.
[19] Beasley JE. Linear programming on Cray supercomputers. Journal of the Operational Research Society 1990;41:133–9.
[20] ILOG Inc. ILOG CPLEX version 7.11, Arlington, US: ILOG Inc.
[21] Bixby RE. Solving real-world linear programs: a decade and more of progress. Operations Research 2002;50:3–15.
