
Alexandria University

Faculty of Engineering
Computer Science and Automatic Control Department
B.Sc. Graduation Project (2004/2005)

Web Enabled Services for Railway Networks


Querying transportation networks for constrained traveling schedules

Supervised by:

Prof. Dr Nagwa M. El Makky


Dr Amr El Masry

Presented by:

Ahmed Farouk El Amir


Ahmed Mohamed Aglan
Peter Guirguis Soccar
Raymond Wadie Gaballah
Acknowledgement
We are grateful to:

Prof. Dr Nagwa El Makky


who:
Directed and encouraged us along the whole journey ☺
Told us when to work and when to study.
Never overloaded us with work, and never let us waste time
Offered us a lot of her time and effort in revising and correcting every tiny
issue in the project.

Nothing can be said but "She is the most kind and caring person we have ever
seen".

Dr Amr El Masry
who:
Was the main reason behind our love of and passion for algorithms
Helped us analyze and understand the problem well
Never underestimated or offended any of our thoughts

Eng. George Anwar


who:
Was always there, with his lovely smile, to answer all of our questions
Was always telling us that we can make it.

Eng. Edward Elia (Vice President of Egyptian Railways)


who:
Was always welcoming, and helpful in facilitating our connection with the
Central Department of Information and Computer Systems of the Egyptian
Railways in Cairo.

Eng. Hassan Kamel Tosson (President of the Central Department of Information and
Computer Systems, Egyptian Railways)
who:
Along with his staff, answered every question about the existing system and
even took us on a walkthrough of the whole system.

Prof. Dr Adel Lotfy Mohammaden (Dean of the Faculty of Engineering)


who:
Trusted us and gave us a recommendation letter to communicate with
Egyptian Railways, as representatives of the faculty.

Summary
This project aims at designing and implementing Web enabled services for railway
networks.

The main feature our system offers is a query facility, which gives all the possible
journeys from a departure station to an arrival one. These journeys include not only the
direct train trips, but also the indirect journeys that consist of more than one direct
train trip, which may suit the user better than the direct trips. The journeys
proposed by the system are optimized for time under multiple constraints such as the
departure date and time, the classes and the maximum number of train exchanges.
Implementing such a feature involves a multi-constrained graph search. It may seem a
simple graph search problem, but in fact the complexity of the problem is due to the
multi-edged railway network that is constrained by traveling schedules.

The system provides a reservation facility that increases the utilization of each seat in a
train, by choosing the seat to reserve using some heuristics rather than randomly. The system
also provides online database administration tools that facilitate the data entry process for
the employees.

Developing and deploying this project requires considering the existing railway system,
making minimal assumptions and achieving high performance and reliability. We have
proposed a four-tier system architecture: a database server tier, an application server tier
that isolates the algorithm complexity and allows algorithm reuse, a web server tier that
stores the Web pages and enables multi-client access to the system and a thin client tier
that is composed only of a Web browser.

Contents
Summary
Acknowledgement

1. Introduction
1.1. Motivations
1.2. Past and Present systems
1.3. Objectives of the project
1.4. Organization of the report

2. Background
2.1. Different System Architectures
2.2. Java Server Pages Technology
2.3. Overview of some searching strategies
2.4. Railway network terminologies

3. Statement of the problem
3.1. Railway network
3.2. Different constraints
3.3. System Architecture
3.4. Implementation Environment

4. Suggested Routing Algorithms and their drawbacks
4.1. Flooding
4.2. Journey-wise approach

5. Proposed Routing Algorithm
5.1. The Routing Algorithm (GAlgorithm)
5.2. Comparison between search strategies
5.3. Time and space complexity analysis
5.4. Implementation of the comparator
5.5. Producer-Consumer paradigm
5.6. Auxiliary Classes

6. Reservation
6.1. Reservation Problem
6.2. Available Techniques
6.3. UML Class Diagram
6.4. Pseudo Code

7. Database Analysis and Design
7.1. Functional analysis
7.2. Data Modeling
7.3. The Entity Type Specifications (ETS)
7.4. Logical Database Design
7.5. Data integrity
7.6. Physical Design
7.7. Concurrency Control
7.8. Data Recovery
7.9. Administration Tools

8. Conclusion and future work
8.1. Conclusion
8.2. Future work

References

Appendix A: Routing and Reservation Case Study
A.1. Routing Case Study
A.2. Reservation Case Study

Appendix B: User Guide
B.1. Passenger guide
B.2. Administration guide

Chapter 1
Introduction
1.1 Motivations
Choosing a train journey from Alexandria to Cairo is not difficult, because there are many
direct trains. But when trying to go from Sohag to Fayed, for example, it gets really annoying,
because there are many trains passing by both cities but with no direct connection. The user
may choose to take a direct train to an intermediate main city, like Cairo, and then ask for the
next train for Fayed. Unfortunately the user may not find a free seat on that train and may have
to wait for a long time to catch another one. This project aims at designing and implementing a
Web enabled system that solves such kinds of problems.

Reserving a seat in a train is an easy task if all the passengers reserve their seats for the whole
trip, but it is not always the case. Some passengers reserve seats for a certain segment in the
middle of the trip, leaving these seats free in the other segments. Choosing the seat to reserve
using some heuristics, not randomly, will increase the utilization of each seat in a train. The
proposed system offers a reservation facility to solve this problem.

Finally, the data entry process may be tiring and exhausting for railway
employees, unless it is well organized. Providing employees with database administration tools
that are user friendly and well documented will facilitate the process.

1.2 Past and Present systems

Past railway systems in Egypt were non-computerized. The only way to get information about
trains and their schedules was to ask an employee; for an employee it is difficult to provide all
the possible detailed train journeys that can connect two stations, especially when there are no
direct connections between them.

Although present railway systems are computerized, like the Virtual Machine Environment
(VME) system that is used internally in the National Organization for Egyptian Railways, they
are not Web enabled. Some Web sites that provide railway information do exist, but they
only provide static information. For example, the www.touregypt.com Web site
contains only static tables of train schedules. As a result, users may miss many better
solutions for journeys between cities with no direct railway connection between them. An
interactive, Web enabled query system has not yet been implemented for Egypt’s railways.

1.3 Objectives of the project

The project aims at designing and implementing an interactive Web enabled system that
supports querying and reserving train journeys between two cities. Online payment is out of
our project’s scope. The proposed system finds all feasible railway connections with a
reasonable cost between two cities, ranked by arrival time. Although the project targets
Egyptian Railways, it can be used to serve other railway networks as well as similar
transportation networks.

The proposed system is developed to meet the requirements of passengers who need a fast and
efficient facility for querying and reserving a railway journey between two cities which have
no direct train trips between them. Egyptian Railways can use this system to improve its
service, and to decrease the load on information-desk clerks at stations.

1.4 Organization of the report

The report is organized as follows:

Chapter 1. Introduction: introduces the project motivations, past and present
systems, the objectives of the project and the report organization.
Chapter 2. Background: presents the different system architectures, Java Server Pages (JSPs)
technology, some graph search techniques and some terminologies.
Chapter 3. Statement of the problem: defines the railway network, shows the problems of the
routing algorithm, and presents our system architecture and implementation environment.
Chapter 4. Suggested Routing Algorithms and their drawbacks: discusses the different
approaches investigated before settling on the final solution.
Chapter 5. Proposed Routing Algorithm: discusses the implementation details of the routing
algorithm.
Chapter 6. Reservation: discusses the available strategies for solving this problem, presents the
selected strategy and shows the design and the implementation of this strategy.
Chapter 7. Database Analysis and Design: discusses the details of designing and implementing
the underlying database system in our project.
Chapter 8. Conclusion and future work: shows our conclusion and the suggested future work.

Chapter 2

Background
This chapter introduces some topics that are related to the project, like the different
system architectures, Java Server Pages (JSPs) technology, some graph search
techniques and some terminologies.

2.1 Different System Architectures


Architecture is the subject of design and implementation as it reflects the spatial
arrangement of application data and the spatial and temporal distribution of
computation (processing).

The minimal configuration of a Web application is the so-called two-tier
architecture, shown in figure 2.1, which closely resembles the traditional client-server
model. The only difference from the client-server model is that in the two-tier solution
clients are thin (browsers only), i.e., they are lightweight applications responsible
only for presentation. Web pages, application logic and data are on the server side.
Embedding the Web pages in the application, and binding the application
to the data, are poor design decisions. The last thing a data manager needs to see is
the application logic, and the last thing the application developer cares about is the
Web page. Clearly, some form of separation is needed.

A more advanced configuration, shown in figure 2.2, separates the application logic
from the data, introducing the so-called three-tier architecture. A further
improvement is still possible.

The even more advanced configuration is the so-called four-tier architecture
(figure 2.3), which separates the Web pages from the application and places them on
a separate Web server.

The four-tier configuration achieves independence between the tiers, which leads to:

1. Simplicity and transparency of implementation
2. Ease of update

Besides these advantages, a major benefit gained by this separation of
layers is reusability: any Web server that knows how to use the
application server can use it to run the algorithm on the database it specifies,
provided that the database conforms to the model (format) that the application
expects. For databases that do not, it may be necessary to develop an interfacing
application that reads the database and converts it to the form needed by the
application that runs the algorithm.

Figure 2.1: Two tiers architecture

Figure 2.2: Three tiers architecture

Figure 2.3: Four tiers architecture

2.2 Java Server Pages Technology


Java Server Pages (JSPs) are similar to HTML files, but provide the ability to display
dynamic content within Web pages. JSP technology was developed by Sun Microsystems
to separate the development of dynamic Web page content from static HTML page
design. This separation means that the page design can change without the need
to alter the underlying dynamic content of the page. This is useful in the development
life-cycle because the Web page designers do not have to know how to create the dynamic
content, but simply have to know where to place the dynamic content within the page.

How Java Server Pages work:

Java Server Pages are made operable by having their contents (HTML tags, JSP tags and
scripts) translated into a Servlet by the application server. This process is responsible for
translating both the dynamic and static elements declared within the JSP file into Java
Servlet code that delivers the translated contents through the Web server output stream to
the browser.
Because JSPs are server-side technology, the processing of both the static and dynamic
elements of the page occurs in the server. The architecture of a JSP/Servlet-enabled Web
site is often referred to as thin-client because most of the business logic is executed on the
server.
The following process outlines the tasks performed on a JSP file on the first invocation of
the file or when the underlying JSP file is changed by the developer:
• The Web browser makes a request to the JSP page.
• The JSP engine parses the contents of the JSP file.
• The JSP engine creates temporary Servlet source code based on the contents of the
JSP. The generated Servlet is responsible for rendering the static elements of the JSP
specified at design time in addition to creating the dynamic elements of the page.
• The Servlet source code is compiled by the Java compiler into a Servlet class file.
• The Servlet is instantiated. The init and service methods of the Servlet are called,
and the Servlet logic is executed.
• The static HTML and graphics, combined with the dynamic
elements specified in the original JSP page definition, are sent to the Web browser
through the output stream of the Servlet's response object.

Subsequent invocations of the JSP file will simply invoke the service method of the
Servlet created by the above process to serve the content to the Web browser. The Servlet
produced as a result of the above process remains in service until the application server is
stopped, the Servlet is manually unloaded, or a change is made to the underlying file,
causing recompilation.
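
The translate-once, serve-many behaviour described above can be imitated in plain Java. The following is only a toy model under stated assumptions: it does not use the real Servlet API, the names PageEngine, CompiledPage, translate and request are hypothetical, and "compilation" is simulated by building a function object from the page source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a JSP engine (hypothetical names; not the real Servlet API).
class PageEngine {
    // Stands in for the Servlet class generated from a JSP file.
    static final class CompiledPage {
        final long sourceVersion;               // version of the source it was built from
        final Function<String, String> service; // stands in for the service method
        CompiledPage(long sourceVersion, Function<String, String> service) {
            this.sourceVersion = sourceVersion;
            this.service = service;
        }
    }

    private final Map<String, CompiledPage> cache = new HashMap<>();
    int translations = 0; // counts how often a page was (re)translated

    // Simulated translation: static text is kept, "${user}" is the dynamic element.
    private CompiledPage translate(String source, long version) {
        translations++;
        return new CompiledPage(version, user -> source.replace("${user}", user));
    }

    // The first request, or a changed source, triggers translation;
    // later requests only invoke the cached service function.
    String request(String name, String source, long version, String user) {
        CompiledPage page = cache.get(name);
        if (page == null || page.sourceVersion != version) {
            page = translate(source, version);
            cache.put(name, page);
        }
        return page.service.apply(user);
    }
}
```

Requesting the same page twice with an unchanged version leaves the translations counter at 1; changing the version forces a second translation, mirroring the recompilation that follows a change to the underlying file.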

2.3 Overview of some searching strategies


This section introduces some well-known graph search strategies and the differences between
them: Breadth First Search (BFS), Depth First Search (DFS), Uniform Cost Search,
Greedy Best-First Search and A* (pronounced A-star) Search.
It is important to note that search strategies that use no heuristics, like BFS, DFS and
Uniform Cost, are called "Uninformed", while A* and Greedy, which use heuristics, are called
"Informed".
We need to introduce two notations: OpenList, which is a list of nodes that we have not yet
expanded (meaning we have not searched below them), and ClosedList, which is a list of nodes
that we have visited and searched beyond.

2.3.1 Uninformed Search Strategies

Breadth First Search "BFS"


BFS extracts the front element from the OpenList and inserts its successors in the end of
the OpenList. It is similar to level-wise traversal.

Depth First Search "DFS"
DFS extracts the front element from the OpenList and inserts its successors in the front of
the OpenList. It is similar to pre-order traversal.
Uniform Cost Search
The Uniform Cost search is more selective about how it chooses nodes for exploration
when searching for solutions.
It extracts the least-cost element from the OpenList and inserts its successors, with their
associated costs, in the OpenList.
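
As an illustration of how the OpenList discipline alone distinguishes BFS from DFS, the following minimal sketch expands a small graph with both strategies. The class name UninformedSearch and the method expandOrder are hypothetical, and the graph is assumed to be given as an adjacency map.

```java
import java.util.*;

class UninformedSearch {
    // Returns the order in which nodes are expanded.
    // frontInsert = false: successors go to the end of the OpenList (BFS).
    // frontInsert = true:  successors go to the front of the OpenList (DFS).
    static List<String> expandOrder(Map<String, List<String>> graph,
                                    String start, boolean frontInsert) {
        Deque<String> openList = new ArrayDeque<>();
        Set<String> closedList = new HashSet<>();
        List<String> order = new ArrayList<>();
        openList.addFirst(start);
        while (!openList.isEmpty()) {
            String node = openList.pollFirst();  // always extract the front element
            if (!closedList.add(node)) continue; // skip nodes that were already expanded
            order.add(node);
            List<String> successors = graph.getOrDefault(node, Collections.emptyList());
            if (frontInsert) {
                // insert in reverse so the first successor ends up at the very front
                for (int i = successors.size() - 1; i >= 0; i--) {
                    openList.addFirst(successors.get(i));
                }
            } else {
                for (String s : successors) openList.addLast(s);
            }
        }
        return order;
    }
}
```

For the tree A → {B, C}, B → {D, E}, C → {F}, the BFS setting expands A, B, C, D, E, F (level-wise), while the DFS setting expands A, B, D, E, C, F (pre-order).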

2.3.2 Informed Search Strategies

Informed Search Strategies use a heuristic function, h(n), that estimates the cost of the
cheapest path from node n to the goal.

Greedy Best-First Search

Greedy extracts the node with the least expected cost h(n) from the OpenList, and inserts
its successors, with their associated expected costs, in the OpenList.

A* (A Star)
It combines two costs:
• f(n) = g(n) + h(n)
– g(n) = cost to get to n from the start
– h(n) = estimated cost to get from n to the goal
Like the other strategies, it extracts the node with the least combined cost f(n) from the
OpenList, and inserts its successors, with their associated combined costs, in the
OpenList.
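
The cost-ordered strategies can be sketched with a priority queue as the OpenList. The code below is a minimal illustration, not the algorithm used later in this report: the names BestFirstSearch, Edge, Entry and search are hypothetical, costs are assumed to be non-negative integers, and the heuristic is assumed to be consistent so that a ClosedList can be used safely. Passing a heuristic that always returns 0 reduces it to Uniform Cost search.

```java
import java.util.*;
import java.util.function.ToIntFunction;

class BestFirstSearch {
    static final class Edge {
        final String to; final int cost;
        Edge(String to, int cost) { this.to = to; this.cost = cost; }
    }
    // An OpenList entry: a node with its path cost g and combined cost f = g + h.
    static final class Entry {
        final String node; final int g; final int f;
        Entry(String node, int g, int f) { this.node = node; this.g = g; this.f = f; }
    }

    // A* search: returns the cost of the cheapest path, or -1 if the goal is unreachable.
    // Assumes h is consistent; h = (n -> 0) gives Uniform Cost search.
    static int search(Map<String, List<Edge>> graph, String start, String goal,
                      ToIntFunction<String> h) {
        PriorityQueue<Entry> openList =
            new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.f));
        Set<String> closedList = new HashSet<>();
        openList.add(new Entry(start, 0, h.applyAsInt(start)));
        while (!openList.isEmpty()) {
            Entry cur = openList.poll();             // least combined cost f(n)
            if (cur.node.equals(goal)) return cur.g;
            if (!closedList.add(cur.node)) continue; // already expanded
            for (Edge e : graph.getOrDefault(cur.node, Collections.emptyList())) {
                int g = cur.g + e.cost;
                openList.add(new Entry(e.to, g, g + h.applyAsInt(e.to)));
            }
        }
        return -1;
    }
}
```

Ordering the queue by h(n) alone instead of f(n) would give the Greedy strategy; only the comparator changes.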

2.4 Railway network terminologies

• Hub (stop): a certain train passing by a station at a certain time.
• Segment: a direct train connection between two successive stations in a certain
train.
• Route: an identifier of a physical railway connection (i.e. railway lines).
• Trip Frequency: a bit stream that identifies the days of the week on which the
trip runs.
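
A trip frequency of this kind can be held in a single integer with one bit per weekday. The sketch below is an assumption about the layout, not the encoding used in our database: the class name TripFrequency is hypothetical and bit 0 is arbitrarily taken to be Monday.

```java
import java.time.DayOfWeek;

class TripFrequency {
    // One bit per weekday; bit 0 = Monday ... bit 6 = Sunday (an assumed layout).
    private final int bits;

    TripFrequency(DayOfWeek... days) {
        int b = 0;
        for (DayOfWeek d : days) b |= 1 << (d.getValue() - 1); // getValue(): 1 = Monday
        bits = b;
    }

    // True if the trip runs on the given weekday.
    boolean runsOn(DayOfWeek day) {
        return (bits & (1 << (day.getValue() - 1))) != 0;
    }

    // Renders the stream Monday-first, e.g. "1000100" for Monday and Friday.
    String asBitStream() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 7; i++) sb.append((bits >> i) & 1);
        return sb.toString();
    }
}
```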

Chapter 3

Statement of the problem


The main problem in the project is finding an algorithm that gives the set of feasible
journeys between two cities under multiple constraints (a multi-constrained graph search
problem). We have to get the optimal solution for the problem given an objective
function and a set of constraints. In this chapter we analyze the railway network
and decide which metrics are used to optimize the journeys and which metrics
act as constraints on the proposed journeys. Finally, we present the system
architecture and the implementation environment.

3.1 Railway network


The railway network includes a set of railway lines, trains, stations and time schedules.
This network is not represented by a simple graph. A simple graph is a set of nodes and
edges without self loops or multiple edges between two given nodes. In the railway
network there are self loops, which represent waiting in a station for another train to
continue a journey. Also there are multiple edges which represent multiple direct trains
between two given stations at different times.

Although the railway lines represent a static simple graph, the set of stations, train trips
and their schedules do not. At a certain time or date some trains are available, others are
not. Thus the corresponding graph is a dynamic one (time variant).

To represent the railway network with a single-edged graph, each station is represented
by many hubs. A hub is a train at a certain station at a certain time. This approach
eliminates the multi-edged property and the self loops, but there is still the problem of the
time constraint, which adds the difficulty that not all of these edges are valid at all times.
Also, these edges cannot simply be assigned costs, because some of the constraints
do not depend only on an edge between two stations, but also on the previous
choices of edges, as explained in the next section.

3.2 Different constraints


We have to get the optimal solution for the problem given an objective function and a set
of constraints. In optimizing the railway journey we have many metrics for each journey.
Some of these metrics should be chosen for optimization and the others should be treated
as constraints.

Some of the metrics that characterize a journey are: the duration of the journey, cost,
departure and arrival times, number of stops, number of train exchanges in the journey,
types of the trains used and many more. Using these metrics to construct a weighted
edge between each pair of hubs is difficult, because some of them do not
depend only on an edge between two stations, but also on the previous choices
of trains. The number of train exchanges in the journey is one example: it depends on the
previous choices of trains, so it cannot be assigned to a particular edge in the graph, as it
is not determined by the pair of hubs alone.

Some of these metrics, like the cost, do not follow the transitivity rule (that is, they are
not additive along a trip). Imagine a trip that passes three hubs a, b and c in that order.
The cost between the hubs (a,c) is sometimes less than the sum of the costs between (a,b)
and (b,c). For this reason it is difficult to optimize the journey with respect to the cost.

For these reasons, the arrival time metric is chosen to be the objective of our
optimization problem, and the other metrics are constraints on the journeys. The system
cannot ask the passenger to set limits for every one of these constraints. So, in order to
decrease the number of solutions that the passenger chooses from, the passenger is asked
a few questions: the preferred departure date and time, the preferred
classes and the maximum number of train exchanges. By answering these questions, some
of the solutions are discarded and the remaining solutions are displayed sorted by their
arrival time. As the number of solutions can be exponential, the solutions are displayed in
batches. The user can change the batch size as required.
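
The filtering, ordering and batching described above can be sketched as follows. This is a simplified illustration only: the names JourneyFilter, Journey, select and batch are hypothetical, each journey is assumed to have a single class and times expressed as minutes since midnight, and the complete list of journeys is assumed to be available in memory, which an exponential solution set would not allow in practice.

```java
import java.util.*;
import java.util.stream.Collectors;

class JourneyFilter {
    static final class Journey {
        final int departureMinute; // minutes since midnight on the travel date
        final int arrivalMinute;
        final int trainClass;      // simplification: one class per journey
        final int exchanges;       // number of train exchanges
        Journey(int departureMinute, int arrivalMinute, int trainClass, int exchanges) {
            this.departureMinute = departureMinute;
            this.arrivalMinute = arrivalMinute;
            this.trainClass = trainClass;
            this.exchanges = exchanges;
        }
    }

    // Discards journeys that violate the passenger's answers and sorts the rest by arrival time.
    static List<Journey> select(List<Journey> all, int earliestDeparture,
                                Set<Integer> classes, int maxExchanges) {
        return all.stream()
            .filter(j -> j.departureMinute >= earliestDeparture)
            .filter(j -> classes.contains(j.trainClass))
            .filter(j -> j.exchanges <= maxExchanges)
            .sorted(Comparator.comparingInt((Journey j) -> j.arrivalMinute))
            .collect(Collectors.toList());
    }

    // Returns the batch with the given zero-based index; empty once the list is exhausted.
    static List<Journey> batch(List<Journey> sorted, int batchIndex, int batchSize) {
        int from = Math.min(batchIndex * batchSize, sorted.size());
        int to = Math.min(from + batchSize, sorted.size());
        return sorted.subList(from, to);
    }
}
```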

3.3 System Architecture


Our proposed system has a four-tier architecture: a database server tier, an application server
tier that isolates the algorithm complexity and provides algorithm reusability, a Web
server tier that stores the Web pages and enables multi-client access to the system, and a
thin client tier that is composed only of a Web browser, as shown in figure 3.1. The
four-tier architecture achieves isolation, which leads to simplicity, transparency of
implementation, ease of update and reusability.

Figure 3.1: four-tier architecture

Each tier consists of some modules as shown in figure 3.2.

Figure 3.2: system architecture (tiers: Client, Web Server, Application Server, Database
Server; modules shown: query pages, reservation pages and administration page for the user
and the employee, routing algorithm (multi-constrained search algorithm), reservation
algorithm, data manipulation bean, database)

Client tier:
It consists only of a browser (thin client), which is used by the user and the
employee.

Web Server tier:


It contains the Servlets and JSPs which handle the requests from the users and
employees and return the responses to these requests. These requests enable the
user to perform the routing query and reserve a seat in a train, and the employee
to manipulate the data or reserve tickets for the passengers.

Application Server tier:


It contains the modules of the system that perform the routing algorithm
(multi-constrained graph search algorithm) in order to find the possible journeys for the
user, the modules that apply the chosen reservation technique to choose the seat to
reserve, and the modules that connect to the database to retrieve the required data
and to manipulate the data.

Database Server tier:


It contains all the necessary data for the railway system, like the data that
represents the physical network, the trip schedules and the reservation data.

3.4 Implementation Environment
In this section, the implementation environment components are presented. These
include the database management system (DBMS), the application server and the Web
application technology. The selected tools are:

• DBMS: Oracle 8i
• Web application server: Apache Tomcat Version 4.1.12
• Web application technology: Java Server Pages (JSP)
• Java IDE: Oracle JDeveloper

The use of a DBMS:

We had to choose whether to use a DBMS, or to rely on the traditional file processing
approach. We chose the former approach (using DBMS), and the following comparison
will explain the reasons behind our choice.

The problems that appear in the file processing approach are:

• Update problems: when an item of information has to be changed, it may need to
be changed in several different places in the database.

• Inconsistency problems: over time, it is possible that the database may contain
two different values for the same data item in two different places, because some
update operation did not catch all of the places that need to be changed.

• Data isolation problems: it is not easy in such a system to pull together a report
containing all the information stored on one particular entity, since it is scattered
over many files.

• Concurrency problems: if the data is contained on a multi-user system as in our
case, it may be that two different users might access the same data item
simultaneously. If both are trying to update it, inconsistencies could result.

• Security: In a file processing system, security must be done on a file by file basis:
any user having access to a file has access to all the fields in it.

• Integrity Constraints: often, the values of certain items in a database are logically
constrained to only certain possibilities. It is desirable for software that modifies
such an item to ensure that the new value obeys the appropriate constraints. This
is difficult since each program that accesses the data must know and apply the
constraints.

A database management system approach breaks the tight coupling between application
programs and data, by putting a software layer in between:

Users
Application Programs
DBMS
Actual data files

Application programs that need data do not get it directly from the files where it is stored,
but rather from the DBMS, which in turn gets it from the file. Application programs are
not allowed to access the data directly.

In addition, the database contains META-DATA (data about the data) in the form of a
data dictionary. For each data item, the dictionary records the standard name that
application programs use to access it, the file in which the data item is stored, and its
security and integrity constraints.

The DBMS is responsible for the concurrency control; it can ensure atomicity of
transactions. The DBMS can allow multi-user access to the data by managing accesses in
such a way to prevent inconsistency.

The use of JSP:

Java Server Pages simplify the delivery of dynamic web content. They enable Web
application programmers to create dynamic content by reusing predefined components
and by interacting with components using server-side scripting. JSP technology can run
on many Web servers and application servers, including the Sun ONE Application
Server, Microsoft’s Internet Information Services (IIS), Apache HTTP Server and IBM’s
WebSphere application server.

Chapter 4

Suggested Routing Algorithms and their drawbacks
This chapter introduces the different approaches that were tried to solve
the problem, namely the Flooding and Journey-wise approaches, along with their
drawbacks. These approaches were investigated before settling on the final
approach that is introduced in chapter 5, so uninterested readers may skip this
chapter and go directly to chapter 5.

4.1 Flooding
The flooding approach follows no criterion in selecting the next
journey to extend, but it is helpful in finding all possible solutions and comparing
them with our proposed algorithm. It has exponential asymptotic order in both time
and space. We also implemented this approach in our prototype.

Main methods

Function name:
runPrototype

Input:
String from
String to
Date depDate
Time depTime
int classType
int maxExchanges

Output:
Array of DetailedJourney

Pseudo code:
• Connect to the database.
• Get the serial number of the source and destination stations.
• Run the flooding algorithm by calling the function runAlgorithm.
• Convert the output of the function runAlgorithm into an array of
DetailedJourney.
• Return the array of DetailedJourney.

Function name:
runAlgorithm

Input:
int sourceStation
int destinationStation
Time departureTime

Output:
Array of Journeys

Pseudo code:
• Connect to the database.
• Get all the hubs of the source station whose departure time is after the
  departureTime required by the user.
• For each hub
    • Construct a journey with a FROM_WAIT state that begins with this
      hub.
    • Enqueue this Journey to a queue.
• While the queue is not empty
    • Dequeue a Journey.
    • If the Journey’s state is FROM_MOVE
        o S ← the station of the last hub in the Journey.
        o Get all the hubs of S whose departure time is after the
          currentTime of the Journey.
        o For each hub
            - Construct a new journey with a FROM_WAIT state that
              contains all the hubs of the Journey and this hub.
            - Enqueue this new journey to the queue.
    • Get the next hub after the last hub in the Journey.
    • If there exists a next hub
        o Add this hub to the Journey.
        o Make the state of the Journey FROM_MOVE.
        o If this hub is one of the hubs of the destination station
            Add this Journey to the output array of Journey.
        o Else
            Enqueue the Journey in the queue.
• Return the output array of Journey.
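
The pseudo code above can be sketched in Java over an in-memory timetable. This is not the prototype's code: the database access is replaced by a map from station name to hubs, the names Flooding, Hub, Journey, State and run are hypothetical, each hub is assumed to have a single time (minutes since midnight), and, as an extra assumption, an exchange onto the train the passenger is already on is skipped to avoid duplicate journeys.

```java
import java.util.*;

class Flooding {
    static final class Hub {
        final String train; final String station; final int time; // minutes since midnight
        Hub next; // next stop of the same train, or null at the terminus
        Hub(String train, String station, int time) {
            this.train = train; this.station = station; this.time = time;
        }
    }
    enum State { FROM_WAIT, FROM_MOVE }
    static final class Journey {
        final List<Hub> hubs; final State state;
        Journey(List<Hub> hubs, State state) { this.hubs = hubs; this.state = state; }
        Hub last() { return hubs.get(hubs.size() - 1); }
        Journey extend(Hub h, State s) {
            List<Hub> copy = new ArrayList<>(hubs);
            copy.add(h);
            return new Journey(copy, s);
        }
    }

    static List<Journey> run(Map<String, List<Hub>> hubsByStation,
                             String source, String destination, int departureTime) {
        List<Journey> output = new ArrayList<>();
        Deque<Journey> queue = new ArrayDeque<>();
        // Seed the queue with every hub of the source at or after the requested time.
        for (Hub h : hubsByStation.getOrDefault(source, Collections.emptyList())) {
            if (h.time >= departureTime) {
                queue.add(new Journey(Collections.singletonList(h), State.FROM_WAIT));
            }
        }
        while (!queue.isEmpty()) {
            Journey j = queue.poll();
            Hub last = j.last();
            if (j.state == State.FROM_MOVE) {
                // Exchange: board any other train leaving this station strictly later.
                for (Hub h : hubsByStation.getOrDefault(last.station, Collections.emptyList())) {
                    if (h.time > last.time && !h.train.equals(last.train)) {
                        queue.add(j.extend(h, State.FROM_WAIT));
                    }
                }
            }
            // Stay on the current train and ride to its next stop, if there is one.
            if (last.next != null) {
                Journey moved = j.extend(last.next, State.FROM_MOVE);
                if (last.next.station.equals(destination)) output.add(moved);
                else queue.add(moved);
            }
        }
        return output;
    }
}
```

Because every extension moves strictly forward in time or along a train, the loop terminates on a finite timetable, but the queue can still grow exponentially with the number of possible exchanges.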

4.2 Journey-wise approach
The nature of our problem imposes multiple constraints on the route that the passenger
could take from source to destination. We noticed that the main characteristic of a
journey is its exchanges (i.e. leaving a train and taking another one at some station). If
we could specify the exchanges in a journey, we would have solved the problem.
This approach takes this point into consideration.
The main feature of this approach is that it views the network in terms of train
journeys, so we call it the journey-wise approach.
The term “trainJourney” here represents a record carrying the stops, prices and time
schedule (e.g. departure and arrival times, which days of the week it runs on, and which
days of the year it does not run) for a specific train journey.

Each station should have a bit stream, where each bit in the stream represents a
specific trainJourney.
For example: Banha station has the bit stream 0000101101.
This means that trainJourneys 5, 7, 8 and 10 pass by Banha station while the rest do not.
Now, if we AND the bit streams of two stations, we find which trainJourneys
pass by both stations. We call this procedure (i.e. the ANDing procedure) the
connection.
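
The ANDing procedure maps directly onto java.util.BitSet. In the sketch below the class name StationConnection and the second station's bit stream are made up for illustration; only the Banha stream comes from the example above, and character i of a stream (counted from the left, starting at 1) is taken to stand for trainJourney i.

```java
import java.util.BitSet;

class StationConnection {
    // Parses a stream such as "0000101101"; character i (1-based, from the left)
    // stands for trainJourney i.
    static BitSet parse(String stream) {
        BitSet bits = new BitSet(stream.length() + 1);
        for (int i = 0; i < stream.length(); i++) {
            if (stream.charAt(i) == '1') bits.set(i + 1);
        }
        return bits;
    }

    // The "connection": the trainJourneys that pass by both stations.
    // The inputs are left unchanged because the AND is applied to a copy.
    static BitSet connect(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        return common;
    }
}
```

With Banha as 0000101101 and a hypothetical second station 0100100001 (trainJourneys 2, 5 and 10), the connection contains trainJourneys 5 and 10. The AND only shows that a trainJourney passes by both stations; the order in which it visits them still has to be checked against its schedule.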
Thus, if we begin with Banha station as a start point at a certain time, we can find
connections with all stations, and construct a table of the trainJourneys
we may take from the start station to each station together with their corresponding
times. We call this a step (i.e. finding the connections from the source station to all
stations under certain schedule constraints).
If we develop a dynamic algorithm that traverses the stations in steps,
the role of the algorithm is to determine the order in which the steps are made.
The main advantage here is that, before each step, we can always compare the
connections that lead to it and discard useless and redundant ones. It is a means of
clustering.

One proposed search strategy (here a search strategy means the selection order of steps)
is Dijkstra's algorithm. The problem with Dijkstra's algorithm is that it provides only one
solution, whereas our output should contain more than one alternative. Dijkstra's algorithm
also needs costs on the edges to pick the station with the minimum cost, from which the
next step is made.
The problem now is choosing the station to make a step from, as the order of
stations is the only guarantee of finding the minimal solution.
Other hints
1- The number of solutions at any station, at any time, should be less than some polynomial
function of the number of stations, in order to guarantee a certain complexity of the
algorithm.
2- Discard the sink nodes that will never be a part of our solution.

A proposed method:

Thinking about the problem logically, the next step should be made from the station
that a train reaches first.
So we construct an openList, which initially contains the source station at the given
departure time.
The algorithm repeats the following process: extract the station with the minimal
time from the openList, make a step from it, and add all resulting connections to
the openList. It keeps looping until the extracted station is the destination itself;
this is the first solution. If we continue extracting and making steps, we find
the subsequent solutions.
Let us assume the passenger clones himself and takes all the trains leaving from the
source (i.e. makes a step from the source), so that he reaches every possible station at a
certain time. The first clone to reach any station clones himself again and makes a
step, and so on. This is simply Dijkstra's algorithm with time as the metric, but
without discarding any solution.

Put simply, it is a sweep line over time. We always extract the station with the minimal
time, so at any given time we are sure that we have already extracted all connections
reachable so far. If the destination can be reached at two feasible times T1 and T2 with
T1 less than T2, then T1 and all stations leading to it are extracted before T2, which
means we find T1 first.

That is why this algorithm is sure to reach the destination in the minimal time, so it
can be terminated when the next step is from the destination itself.
By now we have found the minimal solution.
To find more solutions (possibly cheaper or with fewer connections) we could follow one
of these approaches:
1- Continue the algorithm until the sweep line reaches a multiple of the duration of
the minimal solution (e.g. if the minimal duration is 1.5 hours, we could find all
possible solutions that take up to twice this duration, i.e. 3 hours).
2- Build a similar algorithm with a sweep line over cost (money) and find the
solution with the minimum financial cost. We now have two solutions:
A(minimal_time, A_cost) and B(B_time, minimal_cost).
If we run the time sweep line until B_time and the cost sweep line until A_cost, we
get all solutions that are not worse than both A and B; the rest of the solutions
must be worse than both of them.

Example 4.1: Consider the following network

A B C D E

Figure 4.1: a simple railway network

A1 B1 C1 D1
5:00 6:00 7:00 8:00

A2 C2 E2
4:00 6:00 8:00

A3 E3
2:00 9:00

B4 D4 E4
6:30 7:00 7:30

Figure 4.2: a railway network's train schedule

Train# 1 2 3 4

A 1 1 1 0
B 1 0 0 1
C 1 1 0 0
D 1 0 0 1
E 0 1 1 1
Table 4.1: bit stream representation of each station in the example

Trace:
For going from A to E, we begin by
Step 1:
ANDing the string of A with each one:

B { (1,”6:00”) }
C { (2,”6:00”) , (1,”7:00”) }
D { (1,”8:00”) }
E { (2,”8:00”) , (3,”9:00”) }

Step from B(1,6)

A{}
B{}
C{(2, ”6:00” ) , (1, ”7:00” ) }
D{(1,4, ”7:00” ,wait=0:30) , (1,”8:00”)}
E {(1,4, ”7:30”, wait= 0:30 ) , (2,”8:00”) , (3,”9:00”) }

Step from C (2, ”6:00” )

A{}

B{}
C{(1, ”7:00” ) }
D{(1,4, ”7:00” ,wait=0:30) , (1,”8:00”)}
E {(1,4, ”7:30”, wait= 0:30 ) , (2,”8:00”) , (3,”9:00”) }

Step from C(1, “7:00”) cancel it as last step was from C too (delete (1, ”7:00” ))
Step from D(1,4, ”7:00” ,wait=0:30)
A{}
B{}
C{}
D{ (1,”8:00”)}
E {(1,4, ”7:30”, wait= 0:30 ) , (2,”8:00”) , (3,”9:00”) }
Destination is minimal time = “7:30” end algorithm
Solution =
Source Destination intermediate trains wait
A E A,B,D,E 1,4 0:30
A E A,C,E 2 0:0
A E A,E 3 0:0
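
The minimal-time part of the trace above can be reproduced with a small Java sketch of the time sweep line. It is a simplified illustration under stated assumptions, not the project's implementation: it scans the train schedules directly instead of ANDing bit streams, it keeps only the earliest known arrival per station, and it stops at the first (minimal-time) solution. The class TimeSweep, its nested Stop class and its methods are hypothetical names; times are minutes after midnight.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch only: names are assumptions, not the project's code.
class TimeSweep {
    // One stop of a train: a station name and a time in minutes after midnight.
    static final class Stop {
        final String station;
        final int time;
        Stop(String station, int time) { this.station = station; this.time = time; }
    }

    // Returns the earliest arrival time at destination, or -1 if it is unreachable.
    static int earliestArrival(List<List<Stop>> trains, String source,
                               int startTime, String destination) {
        Map<String, Integer> best = new HashMap<>(); // earliest known time per station
        Comparator<Stop> byTime = (a, b) -> Integer.compare(a.time, b.time);
        PriorityQueue<Stop> openList = new PriorityQueue<>(byTime);
        best.put(source, startTime);
        openList.add(new Stop(source, startTime));
        while (!openList.isEmpty()) {
            Stop cur = openList.poll();                     // station with the minimal time
            if (cur.time > best.get(cur.station)) continue; // superseded entry
            if (cur.station.equals(destination)) return cur.time;
            // A "step": board every train leaving cur.station at or after cur.time.
            for (List<Stop> train : trains) {
                for (int i = 0; i < train.size(); i++) {
                    Stop board = train.get(i);
                    if (!board.station.equals(cur.station) || board.time < cur.time) continue;
                    for (int j = i + 1; j < train.size(); j++) {
                        Stop next = train.get(j);
                        Integer known = best.get(next.station);
                        if (known == null || next.time < known) {
                            best.put(next.station, next.time);
                            openList.add(next);
                        }
                    }
                }
            }
        }
        return -1;
    }

    // The schedule of Figure 4.2 (trains 1 to 4).
    static List<List<Stop>> example() {
        return Arrays.asList(
            Arrays.asList(new Stop("A", 300), new Stop("B", 360), new Stop("C", 420), new Stop("D", 480)),
            Arrays.asList(new Stop("A", 240), new Stop("C", 360), new Stop("E", 480)),
            Arrays.asList(new Stop("A", 120), new Stop("E", 540)),
            Arrays.asList(new Stop("B", 390), new Stop("D", 420), new Stop("E", 450)));
    }
}
```

Calling TimeSweep.earliestArrival(TimeSweep.example(), "A", 0, "E") gives 450, i.e. 7:30, which matches the minimal time found in the trace (train 1 to B, then train 4 to E).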

Complexity analysis:

N : number of stations
M: number of trains
In time domain:
Since we don’t repeat steps → we have at most N steps (steps < M* N)
Each step involves N “anding” operations → N^2
Each “anding” operation involves M bits → O(M*N^2)
So the total order is O(M^2 * N^3)

In space domain:
We guarantee that for every station “S” we store only one entry for each
distinct arrival time; if a tie happens (two ways reach “S” at the same time) we
choose only one of them based on some other criterion (number of train
exchanges, waiting time, cost, etc.), so for every station there will be at
most M entries, and the total space is O(M^2 * N^2)

The previous assumption is not acceptable. Consider this example:

B1
1:00

A1 B2 C
0:00 1:30 2:00

A2
0:30

We cannot discard one of the two solutions; we should provide both of them to the user.

That is why this approach cannot be implemented: either it discards solutions in order
to keep the space complexity polynomial in the number of hubs, or it keeps all
solutions at the cost of exponential space.

This problem is solved in our implemented approach using a two-phase mechanism: the
first phase builds a graph in which the solutions are implicitly kept, and the second
traverses it to obtain all solutions.

Chapter 5

Proposed Routing Algorithm


This chapter focuses on our proposed routing algorithm, which is the core of the project.
Each section contains the proposed solution along with its implementation (UML or pseudo
code). The first section illustrates how the Routing Algorithm (GAlgorithm) module works.
The second section compares the search strategies. The third discusses the time and
space complexities of the searching techniques used in the proposed routing algorithm, and
the fourth discusses the implementation of the comparator. The fifth section illustrates the
Producer-Consumer paradigm and its use and implementation, while the last section
discusses some auxiliary classes necessary for the implementation.

5.1 The Routing Algorithm (GAlgorithm)


The potential problem is that the number of solutions is exponential in the number of hubs,
so keeping track of all paths at a certain hub leads to an exponential space requirement. The
trick we use is to construct a graph that keeps track of the expanded hubs and the edges
between them, so that it implicitly contains all solutions. Although the construction and the
space requirement of the graph are polynomial, traversing this graph produces all possible
solutions (whose number may be exponential in the number of hubs). We overcame the
exponential traversal order through the Producer-Consumer paradigm, which is discussed in
section 5.5.
The algorithm is composed mainly of two phases: the first phase is the construction of the
graph, implemented in the method runAlgorithm; the second phase is the traversal of
the graph and the storage of the obtained paths (solutions) in a buffer.
The runAlgorithm method operates as follows. It uses an open list of unexpanded hubs and
extracts the most desirable hub from it. The desirability of a hub is determined
by the search strategy used, which is selected by the algType (stands for "algorithm
type") parameter passed to runAlgorithm (section 5.2 discusses the different search
strategies in detail and compares them). Whenever a hub is extracted
(i.e. expanded), it generates two child hubs: the first represents the next hub (stop)
of the same train, while the other represents the next train departing from the station
of the expanded hub. These child hubs are supposed to be inserted in the open list.
Unfortunately, simply inserting the child hubs in the open list may lead to replication
(inserting the same hub twice through different parents), and consequently all
descendant hubs of the replicated hubs will be replicated too. The key is to check
whether the new hub that is about to be added already exists in the open list (or in
the closed list in the case of the A* search strategy, as discussed in section 5.2).
If it does, we just add another parent pointer from this hub to the hub
that generated it (its parent), without inserting it again in the open list. Since no
hub is replicated in the open list, its size is limited by the total number of hubs.
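
The replication check can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the project's code: hubs are identified by a string id, a single hash map indexes every hub generated so far (the text keeps separate hash tables for the open and closed lists), and the open list is a plain FIFO queue. The class OpenListDedup, its nested Hub class and the method offer are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only: names and the single index map are assumptions.
class OpenListDedup {
    static final class Hub {
        final String id;
        final List<Hub> parents = new ArrayList<>(); // backward pointers
        Hub(String id) { this.id = id; }
    }

    private final Queue<Hub> openList = new ArrayDeque<>();
    private final Map<String, Hub> index = new HashMap<>(); // hub id -> generated hub

    // Inserts the hub with the given id unless it was generated before;
    // in that case only a parent pointer is added. Returns the hub in use.
    Hub offer(String id, Hub parent) {
        Hub existing = index.get(id);
        if (existing != null) {
            existing.parents.add(parent); // second parent, no second copy
            return existing;
        }
        Hub fresh = new Hub(id);
        if (parent != null) fresh.parents.add(parent);
        index.put(id, fresh);
        openList.add(fresh);
        return fresh;
    }

    int openSize() { return openList.size(); }
}
```

If hub D is generated first from B and later from C, the second call to offer leaves the open list unchanged and D simply ends up with two parents.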

The following diagram illustrates this idea:

Figure 5.1: The difference between replicating nodes in the open list and only adding parent pointers

Every generated hub has one or two (at most) pointers to its parent(s); the graph is
constructed in this way. Using this graph we can trace all the possible routes from
a hub back to the source.
Why do we need only two parent pointers?
Because any hub can be generated either as the next stop of the train of its
parent or as the next train in the station of its parent (i.e. it has two
parents at most).
Why do we use backward rather than forward pointers?
Because the traversal should be done backwards (i.e. from the goal hub to the source hub),
in order not to waste time on misleading paths (those that reach no goal), which may happen
if forward pointers are used instead. In other words, backward pointers always lead
eventually to the source hub, unlike forward pointers, which may reach no goal.

Figure 5.2: Difference between forward and backward pointers

UML and pseudo code of routing algorithm

Figure 5.3: Routing Algorithm (GAlgorithm) UML

class GAlgorithm {
    constructor(Buff b) {
        S ← new Stack()     // used by the traversal method
        this.b ← b
    }

    boolean checkEnd() {
        if (OL is empty) {
            return true
        }
        return false
    }
The algorithm terminates when the open list is empty

    int validateAndAddIfValid()
    {
        ArrayList temp ← new ArrayList()     // the solution to be added to the buffer
        temp.add(S.elementAt(0))
        for (i ← 1 to S.size()-2)
        {
            if (station of hub i differs from that of hub i+1 or hub i-1)     // which means it is not an exchange
                temp.add(S.elementAt(i))
        }
        temp.add(S.elementAt(S.size()-1))
        countedExchanges ← 0
        i ← 0
        while (countedExchanges less than or equal permitted exchanges and i less than temp.size()-1)
        {
            if (station of hub i is the same as that of hub i+1)
                increment countedExchanges
            increment i
        }
        if (countedExchanges is greater than permitted exchanges) {
            discard this solution and return 0
        }
        insert this solution in the buffer
        return 1
    }

The validateAndAddIfValid method checks that the solution does not contain more train
exchanges than permitted, and compresses each waiting period in the same station.
Finally, it adds the solution to the buffer if it is valid.

    boolean checkLoop(Hub h, Hub newHub)
    {
        if (newHub.station()==h.station()) return true;
        for (int i=0;i<S.size();i++)
        {
            if (((Hub)S.get(i)).station()==newHub.station()) return false;
        }
        return true;
    }

Method checkLoop checks that the new hub to be added does not introduce a loop (a path
that traverses the same station twice), as such a solution always has a counterpart that
takes the same time and is more economical: simply waiting in the station that would be
traversed twice.

    int traverse()
    {
        noOfPaths ← 0
        while (the stack is not empty) {
            h ← S.pop()
            if (h.station() is the source station)
            {
                S.push(h)
                noOfPaths += validateAndAddIfValid()
                S.pop()
            }
            else
            if (h.goLeft is true and checkLoop(h, h.parent1)) {
                S.push(h)
                S.push(h.parent1)
                h.goLeft ← false
            }
            else
            if (h.goRight is true and h.parent2 is not null and checkLoop(h, h.parent2)) {
                S.push(h)
                S.push(h.parent2)
                h.goRight ← false
            }
            else {
                h.goLeft ← true
                h.goRight ← true
            }
        }
        return noOfPaths
    }

The stack initially contains the goal. The method traverse is used to obtain all paths from
this goal to the source through the constructed graph. It is an iterative implementation of
depth first search; it is implemented iteratively because we need to keep track of the path
itself (i.e. the hubs in the stack).
goLeft and goRight are flags that indicate whether traversing the parent1 and parent2 links,
respectively, is allowed.
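
The idea of enumerating every path by following the backward pointers can also be shown with a compact recursive Java sketch. Note that this is a swapped-in variant for illustration: the method above is iterative and uses the goLeft and goRight flags, whereas the sketch below uses plain recursion and omits the loop check and the exchange validation. The class ParentGraph, its nested Node class and the method allPaths are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only: a recursive variant, not the iterative traverse() above.
class ParentGraph {
    static final class Node {
        final String name;
        Node parent1, parent2; // at most two parents, as in the text
        Node(String name) { this.name = name; }
        void addParent(Node p) {
            if (parent1 == null) parent1 = p; else parent2 = p;
        }
    }

    // Returns every path from source to goal, each written as "A-B-D".
    static List<String> allPaths(Node goal, Node source) {
        List<String> out = new ArrayList<>();
        walk(goal, source, new ArrayDeque<String>(), out);
        return out;
    }

    private static void walk(Node n, Node source, Deque<String> path, List<String> out) {
        path.addFirst(n.name); // prepend, so the path reads from source to goal
        if (n == source) {
            out.add(String.join("-", path));
        } else {
            if (n.parent1 != null) walk(n.parent1, source, path, out);
            if (n.parent2 != null) walk(n.parent2, source, path, out);
        }
        path.removeFirst();    // backtrack
    }
}
```

For a goal D whose parents are B and C, both children of the source A, the sketch returns the two paths A-B-D and A-C-D.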

    respondToNewRequest(from, to, ODate, OTime, classType, exchanges, algType, FCC, TKTType) {
        destination ← to
        connector ← new instance of DBconnector(from, to, ODate, OTime,
                                                classType, exchanges, algType, FCC, TKTType)
        switch (algType) {
            case 0:
                runAlgorithm(new QueueForBFS(), algType)     // the open list should be a queue
                break
            case 1:
                runAlgorithm(new QueueForDFS(), algType)     // the open list should be a stack
                break
            case 2:     // in best first greedy the open list is a heap
            case 3:     // in Uniform Cost the open list is also a heap
            case 4:
                runAlgorithm(new Heap(), algType)     // in A* the open list is a heap
        }
    }

The method respondToNewRequest decides the type of the open list: a simple queue, a stack,
or a priority queue (heap). The blind search techniques use a queue (BFS) or a stack (DFS),
while the Greedy, Uniform Cost and A* techniques use a priority queue.

    runAlgorithm(OpenList OL, int algType) {
        startHub ← connector.getFirstHub()
        OL.enqueue(startHub)
        put startHub in the open list's hash table

        while (!checkEnd()) {
            h ← OL.extractFront()
            remove h from the open list's hash table
            if A*, put h in the closed list's hash table
            if (h.station() is the destination) {
                // a solution is found
                S.push(h)
                countAddedSolutions += traverse()     // traverses the stack, which contains the real goal h
                continue
            }
            h1 ← connector.getNextHubInTheSameTrain(h)

            if (h1 is not null) {
                h1InOL ← search for h1 in the open list's hash table (and the closed list's hash table if A*)
                if (h1InOL is not null) {
                    OL.addParent(h1InOL, h)
                }
                else
                {
                    OL.enqueue(h1, h)     // enqueue and add parent pointer in cell
                    put h1 in the open list's hash table
                }
            }
            h2 ← connector.getNextTrain(h)

            if (h2 is not null) {
                h2InOL ← search for h2 in the open list's hash table (and the closed list's hash table if A*)
                if (h2InOL is not null) {
                    OL.addParent(h2InOL, h)
                }
                else {
                    if (!(h.move is 1 and h.exchanges == exchanges))
                    {
                        OL.enqueue(h2, h)     // enqueue and add parent pointer in cell
                        put h2 in the open list's hash table
                    }
                }
            }
        }
    }

5.2 Comparison between search strategies

The general algorithm is implemented in a way that enables changing the search strategy
(the order in which the nodes are expanded). The search strategy affects the overall spatial
and temporal complexity. The different strategies we discuss and implement are as follows:

Blind techniques: breadth first search (BFS) and depth first search (DFS)
Best first techniques: Greedy and A* (pronounced "A star") (informed)
Uniform Cost

Note: the terms cost, time and time cost are used interchangeably because our cost metric is
time (the optimal path is the one that reaches the destination earliest).
The blind techniques are so called because they do not prefer some nodes over others for
expansion in any informed manner. BFS expands the nodes level by level, i.e. it expands
the shallowest unexpanded node. DFS expands the nodes in preorder, i.e. it expands the
deepest unexpanded node. Neither technique is appealing because neither is optimal (i.e.
they do not always find the shortest path to the goal).
The Greedy technique is a special case of the best first techniques; it expands the
node that is expected to be closest to the goal, where the time to reach the goal is
estimated by an estimation function H(n). In graph structures with loops the
Greedy technique may get stuck, i.e. it is not complete, but in our model there are no loops
(each expanded node generates two nodes with a later time attribute, so no
node can reinsert its parent), so the Greedy technique is complete (i.e. it finds a solution
if one exists). Like BFS and DFS, the Greedy technique is not optimal; it can reach goals
through a path worse than the optimal one.
A* is another special case of the best first techniques. Its evaluation function F(n), which
calculates the desirability of a node, adds the time cost so far G(n) (i.e. from the source
to the current node) to the estimate H(n) of the cost to the destination, in order to avoid
paths that are already expensive. It is complete, but its optimality depends on the
estimation function, which needs to be admissible (i.e. it never overestimates the true
time cost) for A* to be optimal.
The Uniform Cost technique expands the least-cost unexpanded node; it uses no estimates,
only the time cost so far. It behaves like the time sweep line discussed before in the
journey wise approach section. It is complete and optimal.
As we can see, all the techniques are complete; this is because the GAlgorithm does not
allow child nodes to regenerate parent nodes, as the children always come after their
parents in time.
Concerning optimality, the admissible A* and the Uniform Cost are the only optimal
techniques, so our comparison concentrates on them.
As we said before, a hub is a train in a certain station at a certain time, and every node
represents a hub. When a hub is expanded (removed from the open list), the two children it
generates are the next hub (stop) of the same train and the next train from the same station.
Thus every hub can have at most two parents.
As explained with figure 5.1, the proposed solution avoids adding the same node to
the open list more than once by adding only a link to its new parent (the one that would
cause its insertion again). In this scheme, the only condition for not replicating a hub is
that, when it is about to be inserted for the second time, it has not been expanded yet
(i.e. it is still in the open list).

Although it might seem that A* expands fewer nodes to reach the goal through the
optimal path, it suffers from another problem that may counter its appeal: there is no
guarantee that a node that has been extracted from the open list (expanded)
will not be inserted in it (regenerated) again.
Consider the following case:

B1
G=1:00
H=2:00
F=3:00

A1 B2 C
G=0:00 G=1:30 G=20:00
H=10:00 H=2:00 H=---
F=10:00 F=3:30 F=---

A2
G=0:30
H=10:00
F=10:30

Figure 5.4: admissibility's effect on A* complexity

From A1, the next station in the same train (trip) is B1, while the next hub in the same
station is A2 (i.e. the next train that departs from station A after 0:00 o'clock)

When the GAlgorithm runs, the first hub to be inserted in the open list is A1
(OL={A1} CL={}), which when extracted generates A2 and B1 with time costs (F values) 10:30
and 3:00 respectively (OL={A2,B1} CL={A1}). The next hub to be extracted is B1 (OL=
{A2,B2} CL={A1,B1}), then B2 (OL={A2,C} CL={A1,B1,B2}), and then A2
(OL={B2,C} CL={A1,B1,B2}). Here is the problem: do we add B2 to the OL again? If we
do so, the time complexity can grow to an exponential order in the number of hubs. An
important thing to note, however, is that all the nodes on all the paths that lead to the
optimal goal are eventually expanded before the goal is reached; this is due to the
admissibility of the A* estimation function. So all paths to a goal are established before
the goal is reached and the graph is traversed back to find those solutions. This means that
it is never too late to add a parent pointer from B2 to A2, as it is never needed in the
traversal before it is established. So the solution is to search for a hub identical to the
one to be inserted, not only in the open list but also in the closed list, and whenever it is
found, just add a link from it to its parent. This search is not expensive if a suitable
data structure is used; the most efficient structure for such a situation is a hash table.

It is important to mention that searching in the closed list is needed in all search strategies
except the Uniform Cost, as the previous situation cannot happen using the Uniform Cost
search strategy. That is because it operates like a time sweep line (i.e. the parents are
always inserted before their children).

5.3 Time and space complexity analysis

BFS and DFS techniques

The problem is that there is no guarantee that all paths to a goal are established before
the goal is reached and the graph is traversed back to find those solutions. We traverse the
graph and produce solutions whenever a goal is reached, as we need to be able to produce
some solutions before the graph is completely established (see Producer-Consumer
paradigm). For the previous example, on extracting A2 and inserting B2 in the open list, a
goal might already have been reached and the paths to it already obtained (i.e. we will not
traverse the graph back from this goal again), so adding a link from B2 to A2 is not enough,
and the solutions that pass through this link would not be discovered. To be sure that all
possible solutions are discovered, B2 must be added to the open list again, not just linked
to A2. The fact that a node (hub) may be inserted in the open list more than once causes the
time and space complexity to be of exponential order in the number of hubs, as the graph may
be transformed into a tree with repeated nodes. See figure …
Note: in general DFS needs only linear space for its open list, but in our case the nodes
that are removed from the open list are not removed from memory; they remain resident in
the constructed graph. That is why the space complexity is still exponential in the number
of hubs, like that of BFS.

Greedy technique

Besides not being guaranteed to find the optimal path, it also suffers from the same
problems as BFS and DFS, so its time and space complexity are the same. In actual running
time, however, it may behave better than BFS and DFS, as it is more informed (not blind).

A* and Uniform Cost techniques

Since all paths to a goal are established before the goal is reached and the graph is
traversed back to find those solutions (A* uses an admissible heuristic and Uniform Cost is a
sweep line over time), there is no need to reinsert nodes in the open list; we can just add a
link in the constructed graph, which means that the graph size can never exceed the number
of hubs in the database. So the space complexity is linear in the number of hubs.
Concerning time complexity, on inserting each hub we need to search for it in the open list
(for Uniform Cost) or in both the open list and the closed list (for A*). Using a good
hashing technique for the search, the total time complexity is the number of hubs multiplied
by the sum of two costs: the cost of searching a hash table whose size equals the number of
hubs, and the cost of insertion in the open list. The upside of Uniform Cost over A* is that
A* needs more data from the database to evaluate the estimated time cost from the current
hub to the destination, and A* needs to search in the closed list too. The upside of A* over
Uniform Cost is that for large graphs the A* performance (run time) is better, as it does
not waste time on misleading routes, thanks to its heuristic (the estimate of the time cost
to the destination).

5.4 Implementation of the comparator
To compare the different search strategies, the algorithm takes two parameters: the
open list and the algType that specifies the evaluation function. The open list can be a
priority queue (heap), a queue with insertions at its head (QueueForDFS, i.e. a stack), or a
queue with insertions at its tail (QueueForBFS, i.e. a FIFO queue). The heap is used in the
Greedy, Uniform Cost and A* strategies, the stack in DFS, and the FIFO queue in BFS.
Extracting the front (which is the minimum) of the heap, as well as insertion in it, is
logarithmic in the number of elements (hubs) in the heap. Extracting the front of, or
inserting in, the stack or the queue takes constant time.
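
The three kinds of open list can be illustrated with the standard Java collections. This is only a sketch of the extraction orders, assuming integer values in place of hubs; it uses java.util.ArrayDeque and java.util.PriorityQueue rather than the project's own QueueForBFS, QueueForDFS and Heap classes, and the class name OpenListOrders is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: standard collections stand in for the project's classes.
class OpenListOrders {
    // FIFO queue: insert at the tail, extract from the head (BFS order).
    static List<Integer> fifo(int[] values) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int v : values) queue.addLast(v);
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) order.add(queue.pollFirst());
        return order;
    }

    // Stack: insert at the head, extract from the head (DFS order).
    static List<Integer> lifo(int[] values) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int v : values) stack.addFirst(v);
        List<Integer> order = new ArrayList<>();
        while (!stack.isEmpty()) order.add(stack.pollFirst());
        return order;
    }

    // Priority queue (binary heap): always extract the minimum value.
    static List<Integer> heap(int[] values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) heap.add(v);
        List<Integer> order = new ArrayList<>();
        while (!heap.isEmpty()) order.add(heap.poll());
        return order;
    }
}
```

For the insertion sequence 30, 10, 20 the extraction orders are 30, 10, 20 for the FIFO queue, 20, 10, 30 for the stack and 10, 20, 30 for the heap.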

Conclusion:

Search strategy   Open list               Evaluation function   Space complexity   Time complexity
BFS               Queue                   0                     2^n                2^n
DFS               Stack                   0                     2^n                2^n
Greedy            Priority queue (heap)   H(x)                  2^n                2^n * lg 2^n = n * 2^n
Uniform Cost      Priority queue (heap)   G(x)                  n                  n * lg n
A*                Priority queue (heap)   F(x) = G(x) + H(x)    n                  n * lg n

n: number of hubs
x: the current hub
G(x): true time to reach station of x from the source
H(x): estimate of time to reach the goal from station of x
Note: lg(n) is the order of insertion in, or extraction of the front (minimum) from, a heap
(priority queue).
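
The evaluation functions in the table can be written as a single selector on algType. The sketch below is an illustration under stated assumptions: the codes 0 to 4 follow the switch in respondToNewRequest, times are expressed in minutes, and the class Evaluation, its constants and the method value are hypothetical names, not the project's code.

```java
// Illustrative sketch only: names are assumptions; codes follow respondToNewRequest.
class Evaluation {
    static final int BFS = 0;
    static final int DFS = 1;
    static final int GREEDY = 2;
    static final int UNIFORM_COST = 3;
    static final int A_STAR = 4;

    // g: true time (minutes) from the source to the hub.
    // h: estimated time (minutes) from the hub to the goal.
    static int value(int algType, int g, int h) {
        switch (algType) {
            case GREEDY:
                return h;        // H(x)
            case UNIFORM_COST:
                return g;        // G(x)
            case A_STAR:
                return g + h;    // F(x) = G(x) + H(x)
            default:
                return 0;        // blind techniques use no evaluation
        }
    }
}
```

For hub B1 of Figure 5.4 (G = 1:00, H = 2:00), value(Evaluation.A_STAR, 60, 120) gives 180 minutes, i.e. F = 3:00.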

UML and pseudo code of comparator

The following UML shows that QueueForBFS, QueueForDFS and Heap are all OpenList
implementations, which is achieved through inheritance as shown. QueueForBFS and
QueueForDFS share a common implementation of method extractFront, which is why it is
implemented in GQueue, the class they both extend.

Figure 5.5: OpenList hierarchy UML

• OpenList Class

The abstract class OpenList acts as a base class for the Heap and GQueue
classes. It has an ArrayList, harr (stands for hubs’ array), to hold the hubs. It
requires its derived classes to implement the methods enqueue(Hub) and
extractFront(). It has a default implementation of method enqueue(h, parent of h),
which uses the abstract method enqueue(Hub) and the addParent(Hub) method of class Hub.

abstract class OpenList {
    harr
    abstract Hub extractFront()
    abstract enqueue(Hub h)

    addParent(index, parent) {
        harr.get(index).addParent(parent)
    }

    enqueue(Hub h, Hub parent) {
        hub ← h
        hub.addParent(parent)
        enqueue(hub)
    }
}

• Heap Class

The class Heap extends the abstract class OpenList. It represents a priority queue in
which the element at the front is always the minimum-value element. The time
complexity of enqueueing a new element, or of adjusting the heap after extracting the
element at its front, is O(log n), where n is the number of elements in the
heap. Besides the adjustheap, enqueue and extractFront methods implemented in the
heap, the method searchfor is inherited from class OpenList; it indicates whether an
element already exists in the heap before its addition, in order not to allow
replicated elements.

class Heap extends OpenList
{
    adjustheap(root) {
        if (there is less than two elements)
            return
        currentroot ← harr.get(root)
        child ← root * 2 + 1     // index of left child
        while (child is less than or equal harr.size()-1) {
            if ( (child less than harr.size()-1) and
                 ( (harr.get(child)).value is greater than
                   (harr.get(child + 1)).value))     // right child is smaller than left
                child++     // make it right child

            if (currentroot.value is greater than ( (Hub) harr.get(child)).value)
            {
                harr.set( (child - 1) / 2, harr.get(child))
                child ← child * 2 + 1
            }
            else
                break
        }
        harr.set((child-1) / 2, currentroot)
    }

    enqueue(h) {
        harr.add(h)
        child ← harr.size()-1
        while (child is not less than 1) {
            if ( (harr.get(child)).value is less than
                 (harr.get( (child - 1) / 2)).value) {
                temp ← harr.get(child)
                harr.set(child, harr.get( (child - 1) / 2))
                harr.set( (child - 1) / 2, temp)
                child ← (child - 1) / 2
            }
            else
                break
        }
    }

    Hub extractFront() {
        if (harr is not empty) {
            temp ← harr.get(0)
            harr.set(0, harr.get(harr.size()-1))     // copy the last element at the root
            harr.remove(harr.size()-1)     // remove the last element
            adjustheap(0)
            return temp
        }
        else
            return null
    }
}

• GQueue, QueueForBFS and QueueForDFS Classes

The abstract class GQueue extends OpenList to implement method extractFront, which
is identical for both of its derived classes, QueueForBFS and QueueForDFS. The
difference between those two classes is that the first implements method
enqueue(Hub) to insert elements at the tail of the queue (harr), while the second
inserts elements at the head of the queue.

abstract class GQueue extends OpenList {
    public Hub extractFront() {
        h ← harr.get(0)
        harr.remove(0)
        return h
    }
    abstract public void enqueue(Hub h)
}
_______________________________________________________________________
class QueueForBFS extends GQueue {
    public void enqueue(Hub h) {
        // enqueues at end
        harr.add(harr.size(), h)
    }
}
_______________________________________________________________________
class QueueForDFS extends GQueue {
    public void enqueue(Hub h) {
        // enqueues in front
        harr.add(0, h)
    }
}

5.5 Producer-Consumer paradigm
In spite of all our attempts to reduce the complexity of the algorithm, the fact
that the number of solutions can be exponential cannot be avoided. Consequently, if the
implementation were designed to produce all the possible solutions in bulk before
responding to the user, the user might have to wait for a time of exponential
order (the number of solutions). So our implementation had to be dynamic, in the sense that
it can respond to the user with a subset of the solutions of a convenient size while still
producing the rest of the solutions.

One of the well-known approaches to such problems is the Producer-Consumer
paradigm, in which the producer and the consumer share a buffer. The producer can
place items in the buffer whenever there are new items and the buffer permits insertion,
and the consumer can consume the next available item from the buffer whenever one exists.
The buffer has two pointers: one locates the next item to be consumed (we call this
pointer "readFrom"), and the other locates the next empty cell, where the next
new item can be placed. In our implementation we keep all produced solutions in the buffer
(i.e. no solutions are overwritten by newer ones) to enable the consumer to re-consume
them upon the user's request. Consequently, the second pointer's value is always equal to the
buffer's size, so there is no need for a separate pointer. The buffer permits insertion
whenever the difference between the insertion location (the end of the buffer) and the
"readFrom" pointer is less than or equal to a specific size (we call it the "window"). The
buffer permits consumption whenever that difference is greater than zero.

The main merit of the Producer-Consumer paradigm is that it can be
implemented in a straightforward way through multi-threading. The producer and
consumer can be two separate threads sharing a common buffer. When the user issues a
new query, two threads are instantiated and started. The algorithm runs as the producer
thread, and it keeps producing solutions as long as the buffer permits. The other thread is
the consumer thread, which responds to the user with the first "window" of solutions and then
waits for the user to ask for the next "window". After the first "window" is returned to the
user, the next "window" will always be ready in the buffer, so the user only has to wait for
the transmission time, not the computation time. In this scenario the user can see all
solutions if he has time, and the response time will always be polynomial, not exponential.
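
The buffer rules described above (keep every solution, let the producer run at most one "window" ahead of "readFrom", and release the consumer once the producer has finished) can be sketched in Java with intrinsic locks. This is a minimal illustration under stated assumptions, not the project's Buff class: the class name WindowBuffer, the generic element type, the private helper await and the use of notifyAll are choices made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and details are assumptions, not the project's Buff class.
class WindowBuffer<T> {
    private final List<T> items = new ArrayList<>(); // all solutions are kept
    private final int window;
    private int readFrom = 0;                        // next item to be consumed
    private boolean producerFinished = false;

    WindowBuffer(int window) { this.window = window; }

    // Producer side: blocks while the producer is more than one window ahead.
    synchronized void insert(T item) {
        while (items.size() - readFrom > window) await();
        items.add(item);
        notifyAll(); // wake a waiting consumer
    }

    // Consumer side: returns the next item, or null when no more will come.
    synchronized T consumeNext() {
        while (readFrom >= items.size() && !producerFinished) await();
        if (readFrom >= items.size()) return null;
        T item = items.get(readFrom++);
        notifyAll(); // wake a waiting producer
        return item;
    }

    // Called by the producer when there are no more solutions to insert.
    synchronized void finish() {
        producerFinished = true;
        notifyAll();
    }

    // Waits on this buffer's monitor; only called from synchronized methods.
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

In a typical use, one thread calls insert for every solution it finds and then finish, while another thread calls consumeNext up to "window" times per user request and stops early when it receives null.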

Solution1
Solution2
Solution3
Solution4 ← readFrom
Solution5
Solution6

Figure 5.6: The buffer's view after consuming the first "window" using window size of three

Method finish of class Buff is called by the producer to indicate that there will be no
more solutions to insert, and to notify the consumer thread so that it does not wait for a
complete batch to be ready in the buffer; instead it consumes whatever is there and returns.
As mentioned before in section … the GAlgorithm loops while there is a possibility of
finding more solutions, and on finding a new solution it inserts it in the buffer, or the
producer waits if the buffer does not permit insertion at that moment.
The Consumer's methods getFirstBatch and getNextBatch are called by the JSP pages.
getFirstBatch is invoked when the user issues a new query and presses the "Submit" button.
The results window is displayed in the output page, which contains a "Next" button that
invokes the getNextBatch method when pressed, and so on.

UML and Pseudo Code of Producer-Consumer implementation

Figure 5.7: Producer Class UML


Producer extends Thread{
constructor (request parameters, buffer)
{
Initialize instance variables (fields) with the request parameters
}

run()
{
Instantiate an instance of the GAlgorithm Class
Call method respondToNewRequest (fields)
Call the buffer.finish() method that assigns false to the producerNotFinished
flag.
}
}

Figure 5.8: Consumer Class UML

Consumer extends Thread {
    constructor( ) {
        start this
    }

    Window getFirstBatch(batchSize, request parameters) {
        buf ← a Buff with batchSize as a parameter
        prod ← a new Producer thread with the request parameters and the created buffer as parameters
        suggest a call to the garbage collector to free the space of a possible old producer
        start prod
        return getNextBatch(batchSize)
    }

    Window getNextBatch(batchSize) {
        for (i ← 0 to batchSize-1) {
            Object o ← buf.consumeNext()
            if (o is not null) add o to Window
        }
        return Window
    }
}

Note: Window can be an ArrayList of size equal to the batch size

Figure 5.9: Buff Class UML

Buff {
    constructor(batchSize) {
        producerNotFinished ← true
        this.batchSize ← batchSize
        readFrom ← 0
        buf ← new ArrayList()
    }

    synchronized Object consumeNext() {
        while (readFrom >= buf.size() and producerNotFinished) {
            wait()
        }
        if (readFrom >= buf.size()) return null
        Object o ← get the element pointed to by readFrom in buf
        increment readFrom
        notify the producer thread
        return o
    }

    synchronized insert(Object o) {
        while (buf.size() - readFrom > batchSize) {
            wait()
        }
        add the object o to buf
        notify the consumer thread
    }

    synchronized finish() {
        // synchronized, because a thread may only notify while it holds the buffer's monitor
        producerNotFinished ← false
        notify the consumer thread that there will be no more solutions
    }
}
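
The buffer pseudo code above can be sketched in Java as follows. This is only a minimal illustration, not the project's actual source: the class names WindowBuffer and WindowBufferDemo, the generic type parameter, the use of notifyAll() instead of notify(), and the helper runDemo are assumptions made for the example. The producer here simply inserts the integers 1..total in place of real solutions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Buff class of the pseudo code.
class WindowBuffer<T> {
    private final List<T> buf = new ArrayList<>();
    private final int batchSize;
    private int readFrom = 0;
    private boolean producerNotFinished = true;

    WindowBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Producer side: blocks while more than batchSize unread solutions are waiting.
    synchronized void insert(T solution) throws InterruptedException {
        while (buf.size() - readFrom > batchSize) {
            wait();
        }
        buf.add(solution);
        notifyAll();
    }

    // Consumer side: returns null once the producer has finished and nothing is left.
    synchronized T consumeNext() throws InterruptedException {
        while (readFrom >= buf.size() && producerNotFinished) {
            wait();
        }
        if (readFrom >= buf.size()) {
            return null;
        }
        T next = buf.get(readFrom++);
        notifyAll();
        return next;
    }

    // Producer side: signals that no more solutions will be inserted.
    synchronized void finish() {
        producerNotFinished = false;
        notifyAll();
    }
}

class WindowBufferDemo {
    // Mirrors getNextBatch: collects up to batchSize solutions into one window.
    static List<Integer> getNextBatch(WindowBuffer<Integer> buf, int batchSize)
            throws InterruptedException {
        List<Integer> window = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            Integer o = buf.consumeNext();
            if (o != null) {
                window.add(o);
            }
        }
        return window;
    }

    // Runs a producer thread that inserts 1..total and returns all consumed windows.
    static List<List<Integer>> runDemo(int total, int batchSize) {
        WindowBuffer<Integer> buf = new WindowBuffer<>(batchSize);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= total; i++) {
                    buf.insert(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                buf.finish();
            }
        });
        producer.start();
        List<List<Integer>> windows = new ArrayList<>();
        try {
            List<Integer> window;
            do {
                window = getNextBatch(buf, batchSize);
                if (!window.isEmpty()) {
                    windows.add(window);
                }
            } while (window.size() == batchSize);
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return windows;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(7, 3)); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

With seven solutions and a window size of three, the last window holds a single solution because finish() wakes the consumer instead of letting it wait for a full batch.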

5.6 Auxiliary Classes

5.6.1 Hub Class

The Hub class represents a node in the heap or the graph; it stands for a certain trip
(i.e. a train at a certain time) at a certain station. It has the following instance variables
(fields):
number: an identifier of the hub, composed of the trip number concatenated with the
station number.
value: the time cost (i.e. the desirability of the hub).
arrivalTime: the time at which the train reaches the station.
hubDate: the date on which the train reaches the station.
parent1: a pointer to the first hub that generated this hub.
parent2: a pointer to the second hub that generated this hub.
move: a flag that indicates whether this hub was last reached by an exchange of trains or
by a move.
exchanges: a counter that holds the minimum number of exchanges needed to reach this
hub.
goLeft: a flag that indicates whether the parent1 link should be used (again) in traversing
the graph of hubs.
goRight: a flag that indicates whether the parent2 link should be used (again) in traversing
the graph of hubs.

The method addParent of class Hub does the following:

• It stores a pointer to the hub that generated this hub in parent1 or in parent2. If the
parent link is stored in parent1, goLeft is set to true; if it is stored in parent2, goRight is
set to true.
• If the station of this hub differs from that of its only parent, the hub inherits the
number of train exchanges of its parent hub.
• If the station is the same as that of the parent hub and the parent hub was not reached by
a move, the hub inherits the number of train exchanges of its parent hub; otherwise the
hub inherits the number of train exchanges of its parent hub incremented by one.
Note: the flag "move" is appropriately set in the constructor of the hub.
• If parent1 is not null (i.e. the hub now has two parents), the parent with the smaller
number of exchanges is used. If both parents have the same number of exchanges, the hub
inherits this number as it is, because it must have been generated by a move from one of
them. In both cases the goRight flag is set to true.

UML and Pseudo Code of Hub Class

Figure 5.10: Hub Class UML

class Hub {
number
intSize
value
arrivalTime
hubDate
routeCode
parent1
parent2
move
exchanges
goLeft
goRight

    constructor(trip, station, hubDate, action, arrivalTime, routeCode)
    {
        goLeft ← false
        goRight ← false
        intSize ← 65536
        this.routeCode ← routeCode
        move ← action
        number ← station + trip * intSize
        this.hubDate ← hubDate
        this.arrivalTime ← arrivalTime
    }

    long trip()
    {
        return number / intSize
    }

    long station()
    {
        return number % intSize
    }

    setValue(value)
    {
        this.value ← value
    }

    addParent(Hub parent)
    {
        if (parent1 is null)
        {
            parent1 ← parent
            goLeft ← true
            if (parent.station() is not this.station())
            {
                exchanges ← parent.exchanges
            }
            else
            {
                if (parent hub was not reached from a move)
                {
                    exchanges ← parent.exchanges
                }
                else
                {
                    exchanges ← parent.exchanges + 1
                }
            }
        }
        else
        {
            if (parent1.exchanges is greater than parent.exchanges)
            {
                if (this.station() is the same as parent.station())
                {
                    if (parent was not reached from a move)
                    {
                        exchanges ← parent.exchanges
                    }
                    else (i.e. parent.move equals 1)
                    {
                        exchanges ← parent.exchanges + 1
                    }
                    move ← 0
                }
                else
                {
                    exchanges ← parent.exchanges
                    move ← 1
                }
            }
            else if (parent1.exchanges is less than parent.exchanges)
            {
                if (this.station() is the same as parent1.station())
                {
                    if (parent1 was not reached from a move)
                    {
                        exchanges ← parent1.exchanges
                    }
                    else
                    {
                        exchanges ← parent1.exchanges + 1
                    }
                    move ← 0
                }
                else
                {
                    exchanges ← parent1.exchanges
                    move ← 1
                }
            }
            else (i.e. parent1.exchanges equals parent.exchanges)
            {
                exchanges ← parent.exchanges
                move ← 1
            }
            parent2 ← parent
            goRight ← true
        }
    }
}
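
The way the hub identifier packs the trip and the station into one number can be illustrated with a small Java sketch. The class name HubNumber and the range check are assumptions added for the example; only the formula number = station + trip * 65536 and the two decoding operations come from the pseudo code above. The check reflects that the decoding is only unambiguous when the station code is non-negative and smaller than 65536, whereas the Station table in chapter 7 declares STN_code with five digits.

```java
// Hypothetical helper that isolates the identifier arithmetic of the Hub class.
class HubNumber {
    static final long INT_SIZE = 65536L;

    // Packs a trip number and a station number into one hub identifier.
    static long encode(long trip, long station) {
        if (trip < 0 || station < 0 || station >= INT_SIZE) {
            // Outside this range station() and trip() would not recover the inputs.
            throw new IllegalArgumentException("trip or station out of range");
        }
        return station + trip * INT_SIZE;
    }

    // Recovers the trip number from a hub identifier.
    static long trip(long number) {
        return number / INT_SIZE;
    }

    // Recovers the station number from a hub identifier.
    static long station(long number) {
        return number % INT_SIZE;
    }

    public static void main(String[] args) {
        long number = encode(912, 45);
        System.out.println(number + " -> trip " + trip(number) + ", station " + station(number));
    }
}
```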

5.6.2 DBconnector Class

Since we use a four-tier architecture, we need an interface between the application tier
and the underlying database tier. This interface is the DBconnector class, which
encapsulates all the interactions between the database and application tiers. Thanks to the
modularity of the design, the database can be changed entirely and only the
DBconnector class will need to be modified. The details of the underlying database design
and implementation are discussed in chapter 7.

UML and Pseudo Code of DBconnector Class

Figure 5.11: DBconnector Class UML

DBconnector {
    constructor(request parameters, algType, connectionString) {
        conn ← create a new connection(connectionString)
        maxVelocity ← getMaxTrainVelocity()
        setParameters(request parameters, algType)
    }

The constructor creates a connection to the database, gets the maximum train speed
and keeps it in the instance field maxVelocity, then calls setParameters with the
request parameters and the algorithm type as parameters.

    ArrayList getFares(trip, distance)
    {
        rs ← executeQuery
            "select TRN_TYPE_CODE_FK from TRIP where trip_code_fk = trip"
        trainType ← get train type from rs
        rs ← executeQuery
            "select CLASS_CODE, TKT_FARE from FARE where DSTNC = distance
            and TRN_TYPE_CODE_FK = trainType and FCC_CODE = FCC and
            TKT_TYPE_CODE = TKTType"
        fares ← new ArrayList()
        for each record in rs {
            class ← get class code from rs
            fare ← get ticket fare from rs
            if (class is accepted by user)
                add this class fare to fares
        }
        return fares
    }

The getFares method is used to get the fares of all classes accepted by the user for a
specific trip and distance.

    long getStationCode(stationStr) {
        rs ← executeQuery
            "select STN_CODE from station where name = stationStr"
        return (station code from rs)
    }

The getStationCode method takes a string representing a station's name and returns
the corresponding station code (number).

    String getStationName(stationCode) {
        rs ← executeQuery
            "select name from station where STN_CODE = stationCode"
        return (name from rs)
    }

The getStationName method takes a long representing a station's code and returns the
corresponding station name (string).

    Time getTimeCost(h)
    {
        switch (algType) {
            case 0:
            case 1: {    // blind technique: BFS or DFS
                t ← (0, 0, 0)
                break
            }
            case 2: {    // Uniform Cost: g(n)
                t ← (h.arrivalTime + totalSeconds(h.hubDate))
                break
            }
            case 3: {    // h(n) for greedy (best first)
                t ← getEstimateToDestination(h)
                break
            }
            case 4: {    // A*
                t ← (h.arrivalTime + totalSeconds(h.hubDate) + getEstimateToDestination(h))
                break
            }
        }
        return t
    }

The getTimeCost method evaluates the desirability of the hub according to the algType
field. In the case of blind search techniques like BFS and DFS, all hubs have equal
desirability, so the returned value is zero. In the case of the Uniform Cost search
strategy the returned value is the time cost to reach this hub from the source hub (i.e.
the arrival time). In the case of greedy search (Best First) the returned value is the
estimated time cost of reaching the goal from this hub. In the case of the A* search
strategy the returned value is the sum of the time cost to reach this hub and the
estimate of reaching the goal from it.
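
The four evaluation rules described above can be sketched in Java as follows. The class name HubCost, the enum Strategy and the use of plain seconds are assumptions made for this illustration; the project itself selects the rule through the integer algType and works with Time values.

```java
// Hypothetical sketch of how the search strategy selects the value of a hub.
class HubCost {
    enum Strategy { BFS, DFS, UNIFORM_COST, GREEDY, A_STAR }

    // elapsedSeconds plays the role of g(n): time needed to reach the hub from the source.
    // estimateSeconds plays the role of h(n): estimated time from the hub to the destination.
    static long timeCost(Strategy strategy, long elapsedSeconds, long estimateSeconds) {
        switch (strategy) {
            case UNIFORM_COST:
                return elapsedSeconds;
            case GREEDY:
                return estimateSeconds;
            case A_STAR:
                return elapsedSeconds + estimateSeconds;
            default:
                // Blind search (BFS, DFS): every hub is equally desirable.
                return 0L;
        }
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + timeCost(s, 7200, 1800));
        }
    }
}
```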

    Connection connectToDB(string connectionString) {
        DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
        conn ← DriverManager.getConnection(connectionString)
        return conn
    }

The connectToDB method takes a connection string, e.g.
("jdbc:oracle:thin:@localhost:1521:ORCL", " para3", " para3"), to connect to a
particular database server. The connection is kept open for the duration of the
session.

    Hub getNextTrainFromSameStation(trip, station, hubDate, hubTime) {
        rs ← executeQuery "select trip_code_fk, day_offset, DPTR_TM, VIA_ROUTE_FK
            from trip_segment where stn_from_fk = station and
            DPTR_TM > hubTime and trip_code_fk <> trip order by DPTR_TM asc"
        for each record in rs {
            newTrip ← get trip code from rs
            offset ← get arrival day offset from rs
            arvlTime ← get arrival time from rs
            route ← get route code from rs
            if (class is accepted by user)
            {
                if (checkDate(newTrip, offset, hubDate) == 1)
                {
                    Hub temp ← new Hub(newTrip, station, hubDate, 0, arvlTime, route)
                    temp.setValue(getTimeCost(temp))
                    return temp
                }
            }
        }
        return null    // no suitable train was found
    }

The getNextTrainFromSameStation method, as implied by its name, is used to get the
next train departing from a particular station after a particular time.

    Hub getNextHubInTheSameTrain(Hub h) {
        rs ← executeQuery
            "select stn_to_fk, DPTR_TM, ARVL_TM, VIA_ROUTE_FK from
            trip_segment where trip_code_fk = h.trip() and stn_from_fk = h.station()"
        station ← get station from rs
        departure ← get departure time from rs
        arrival ← get arrival time from rs
        route ← get route code from rs
        if (departure time is after arrival time) {
            // the train arrives on the following day
            temp ← new Hub(h.trip(), station, new GregorianCalendar(h.hubDate.year,
                h.hubDate.month, h.hubDate.day_of_month + 1), 1, arrival.getTime(), route)
            temp.setValue(getTimeCost(temp))
            return temp
        }
        else {
            temp ← new Hub(h.trip(), station, h.hubDate, 1, arrival.getTime(), route)
            temp.setValue(getTimeCost(temp))
            return temp
        }
    }

The getNextHubInTheSameTrain method, as implied by its name, is used to get the
next hub (stop) in the same trip (train).

    Time getArrivalTime(Hub h) {
        rs ← executeQuery "select ARVL_TM from trip_segment where trip_code_fk
            = h.trip() and stn_from_fk = h.station()"
        arrival ← get arrival time from rs
        return (arrival)
    }

    Time getDepartureTimeOfSegment(Hub h)
    {
        rs ← executeQuery "select DPTR_TM from trip_segment where trip_code_fk
            = h.trip() and stn_from_fk = h.station()"
        departure ← get departure time from rs
        return (departure)
    }

    int checkClasses(long newTrip)
    {
        rs ← executeQuery "select distinct (CLASS_CODE) from TRIP_VEHICLES
            where trip_code_fk = newTrip"
        for each record in rs {
            class ← get class code from rs
            if (class is accepted by user) {
                return 1
            }
        }
        return 0
    }

The method checkClasses returns 1 if there exists at least one class accepted by the
user in the given trip.

    int getTrueDistance(from, to, route)
    {
        rs ← executeQuery
            "select DSTNC from SEGMENT where STN_FROM = from and STN_TO
            = to and VIA_ROUTE = route"
        return (distance from rs)
    }

    int checkDate(newTrip, offset, hubDate)
    {
        dt ← new GregorianCalendar(hubDate.year, hubDate.month,
            hubDate.day_of_month - offset)
        rs ← executeQuery "select trip_code_fk from trip_instance where trip_code_fk =
            newTrip and trip_date = dt"
        int found ← get trip code from rs
        if (found == newTrip) return 1
        else return 0
    }

The method checkDate is used to check whether there is an instance of the given trip on
the given date.
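
The date arithmetic used in checkDate, and in the methods that look for a train on the following day, relies on GregorianCalendar accepting day values outside the current month. The short Java sketch below illustrates this behaviour. The class name TripDates and its helper methods are assumptions made for the example and are not part of the project.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical helper that isolates the date arithmetic of the pseudo code.
class TripDates {
    // Date on which the trip started, given the date at the current hub and the
    // day offset of the segment relative to the first segment of the trip.
    static GregorianCalendar tripStartDate(GregorianCalendar hubDate, int dayOffset) {
        return new GregorianCalendar(
                hubDate.get(Calendar.YEAR),
                hubDate.get(Calendar.MONTH),
                hubDate.get(Calendar.DAY_OF_MONTH) - dayOffset);
    }

    // The day after the given date; the lenient calendar rolls over months and years.
    static GregorianCalendar nextDay(GregorianCalendar date) {
        return new GregorianCalendar(
                date.get(Calendar.YEAR),
                date.get(Calendar.MONTH),
                date.get(Calendar.DAY_OF_MONTH) + 1);
    }

    // Formats a calendar as yyyy-MM-dd (Calendar months are zero-based).
    static String format(GregorianCalendar c) {
        return String.format("%04d-%02d-%02d",
                c.get(Calendar.YEAR), c.get(Calendar.MONTH) + 1, c.get(Calendar.DAY_OF_MONTH));
    }

    public static void main(String[] args) {
        GregorianCalendar hubDate = new GregorianCalendar(2005, Calendar.JUNE, 1);
        System.out.println(format(tripStartDate(hubDate, 1))); // prints 2005-05-31
        System.out.println(format(nextDay(new GregorianCalendar(2005, Calendar.DECEMBER, 31)))); // prints 2006-01-01
    }
}
```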

    Hub getNextTrain(Hub h) {
        temp ← getNextTrainFromSameStation(h.trip(), h.station(), h.hubDate, h.arrivalTime)
        if (temp equals null)
            temp ← getNextTrainFromSameStation(h.trip(), h.station(),
                new GregorianCalendar(h.hubDate.year, h.hubDate.month,
                h.hubDate.day_of_month + 1), new Time(0))
        return temp
    }

The getNextTrain method is used to get the next train from the station of the last hub.
First it tries to find a departing train on the given date; if none exists, it tries to find
the first departing train on the next day. Waiting for more than two days is not
permitted.

    Hub getFirstHub() {
        temp ← getNextTrainFromSameStation(-1, from, ODate, OTime)
        if (temp equals null)
            temp ← getNextTrainFromSameStation(-1, from,
                new GregorianCalendar(ODate.year, ODate.month, ODate.day_of_month + 1),
                new Time(0))
        return temp
    }

The getFirstHub method tries to get the first train that departs from the source station
and satisfies the request parameters. It first tries to find a train on the day submitted by
the user after the given time; if it fails, it gets the first departing train on the next day.
Waiting for more than two days is not permitted.

    Coordinates getCoordinates(long station) {
        rs ← executeQuery
            "select X_DSTNC, Y_DSTNC from station where STN_code = station"
        x ← get X coordinate from rs
        y ← get Y coordinate from rs
        return (new Coordinates(x, y))
    }

    long getMaxTrainVelocity() {
        rs ← executeQuery
            "select max(velocity) from M_TRN_TYPE"
        return (velocity from rs)
    }

    Time getEstimateToDestination(h) {
        Coordinates source ← getCoordinates(h.station())
        distance ← source.distanceTo(destinationCoord)
        return Time(distance / maxVelocity)
    }

The getEstimateToDestination method estimates the time cost to reach the destination
station from the given hub's station. The time is evaluated by dividing the straight-line
distance by the maximum train velocity (i.e. the velocity of the fastest train), which
ensures that the estimate is admissible (i.e. less than or equal to the true time
needed). The admissibility constraint is required to ensure the optimality of
the A* search strategy.

    setParameters(request parameters, algType) {
        Stores the request parameters in instance fields so that they can be reused.
    }
}
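
The estimate computed by getEstimateToDestination can be illustrated with the following Java sketch. It is a simplified example under stated assumptions: the class name StraightLineEstimate is hypothetical, coordinates are taken to be in kilometers, the velocity in kilometers per hour and the result in seconds, none of which is fixed by the pseudo code above. Rounding down keeps the estimate from exceeding the straight-line travel time of the fastest train.

```java
// Hypothetical sketch of the straight-line time estimate used as the heuristic.
class StraightLineEstimate {
    // Euclidean distance between two stations given their x and y coordinates (km).
    static double distanceKm(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    // Time in whole seconds needed to cover the distance at the maximum train velocity.
    static long estimateSeconds(double distanceKm, double maxVelocityKmPerHour) {
        if (maxVelocityKmPerHour <= 0) {
            throw new IllegalArgumentException("velocity must be positive");
        }
        return (long) Math.floor(distanceKm / maxVelocityKmPerHour * 3600.0);
    }

    public static void main(String[] args) {
        double d = distanceKm(0, 0, 30, 40);
        System.out.println(d + " km, about " + estimateSeconds(d, 100) + " s at 100 km/h");
    }
}
```

Because no train can travel a shorter path than the straight line or faster than the maximum velocity, the value never overestimates the real travelling time, which is the property the A* strategy needs.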

5.6.3 DetailedJourney

The class DetailedJourney is used to store and manipulate the solution that will be
presented to the user. The instance variables (fields) of the class are:
stations: an ArrayList that holds the names of the stations where the trip stops (hubs).
times: an ArrayList that holds the arrival and departure times of each hub, except the first
and the last, which have only a departure time and an arrival time respectively; the hubs
of a train exchange also have only one time attribute.
dates: an ArrayList that holds the arrival and departure dates of each hub, except the first
and the last, which have only a departure date and an arrival date respectively.
numberOfTrains: an integer that holds the total number of trains involved in the trip.
trains: an ArrayList of the trains involved in the trip.
costs: an ArrayList that holds the costs of each class acceptable to the user, for each train
involved in the trip.
waitingPeriods: an ArrayList that holds the waiting periods at every train exchange in the
trip.

The constructor of class DetailedJourney takes an ArrayList of hubs as a parameter and
deduces the information necessary to set the instance fields.

UML and Pseudo Code of DetailedJourney Class

Figure 5.12: DetailedJourney Class UML

class DetailedJourney {
    stations
    times
    dates
    numberOfTrains
    trains
    costs
    waitingPeriods

    constructor(journey, connector) {
        // Note: hubs are stored in journey in reverse order (i.e. journey[0] is the goal hub)
        tripDistance ← 0
        currentHub ← journey.get(journey.size()-1)    // first hub in the journey
        add current hub's station to stations
        add current hub's arrival time to times
        add current hub's date to dates

        for (i = journey.size()-2 down to 0)
        {
            currentHub ← journey.get(i)
            add current hub's station to stations
            add current hub's arrival time to times
            add current hub's date to dates

            if (currentHub.station() is the same as journey.get(i+1).station())    // a train exchange
            {
                add journey.get(i+1).trip() to trains (i.e. add the previous train)
                add getFares(journey.get(i+1).trip(), tripDistance) to costs
                add (currentHub.arrivalTime - journey.get(i+1).arrivalTime) to waitingPeriods
                tripDistance ← 0
            }
            else    // still in the same train
            {
                if (i is not 0)    // the hub at journey[0] has no following hub
                {
                    if (journey.get(i).station() is not the same as journey.get(i-1).station())
                    {
                        add the departure time of the hub at journey[i] to times
                        if (the last added time in times is after the preceding one)
                        {
                            // it is in the same day
                            add the last date in dates to dates again
                        }
                        else
                        {
                            // it must be in the next day
                            add the last date in dates, incremented by one, to dates again
                        }
                    }
                }
                add the true distance between the station of the current hub (i) and that of
                the previous hub (i+1), via their route, to the counter tripDistance
            }
        }
        add the last train to trains
        add getFares(currentHub.trip(), tripDistance) to costs
    }
}

5.6.4 Coordinates Class

This class contains the coordinates of a station's location to be used in estimating the
straight line distance between two stations, which is used in the estimation of the time cost.

UML and Pseudo Code of Coordinates Class

Figure 5.13: Coordinates Class UML

Chapter 6

Reservation

In this chapter we are going to define the reservation problem, discuss the available
strategies for solving this problem, present the selected strategy and show the design and
the implementation of this strategy.

6.1 Reservation Problem

After the passenger chooses the most suitable trip, a seat is to be reserved on that
trip. The passenger chooses the required class, the ticket type and the type of discount.
The passenger may also choose a particular seat, or may leave the choice of the seat to
the system. If the passenger has not chosen a particular seat, the system chooses
one for him. The system can increase the utilization of each seat in the
train by choosing the seat with some heuristic rather than randomly.

In a long trip that passes through many stations, not all passengers travel from the
start station of the trip to the last one. Some passengers reserve seats for a certain
segment in the middle of the trip, leaving these seats free in the other gaps. The goal is to
make the best use of these gaps and to reduce them as much as possible.

This problem is similar to that of memory placement strategies in a variable-partition
multiprogramming system, which determine where in main memory to place incoming
programs and data. However, there is a difference between seat reservation and memory
placement. In memory placement, the only restriction is the size of the data or the
program to be placed, while in seat reservation the passenger needs to go from a
certain station to another, not just over any number of contiguous stations.

6.2 Available Techniques
Here are some heuristics for choosing the seat to be reserved:

• First-fit strategy: the system reserves the first available seat that is free at the
stations the passenger wants to pass through. This strategy incurs less overhead.

• Best-fit strategy: the system reserves the seat which fits most tightly and leaves the
smallest gap. It seems the most intuitive strategy. It incurs the overhead of searching
all the seats for the best-fit seat. However, this overhead is theoretically of the same
order as that of the First-fit strategy: both are linear in the number of seats,
i.e. O(n), where n is the number of seats.

• Worst-fit strategy: the system reserves the seat which fits worst. The intuitive appeal
is simple: after the reservation, the seat remains free for a relatively large number of
stations, which enables the system to reserve it for another passenger who wishes to
continue with this trip after the first passenger leaves the seat. This strategy has the
same overhead as the Best-fit strategy. In practice, however, it does not increase the
utilization of seats as much as the Best-fit strategy, because the number of stations
that a trip passes through is not very large, so leaving large gaps that can be reserved
for another passenger is not a strong reason for choosing the Worst-fit strategy.

The selected strategy

The overhead of every reservation strategy is linear in the number of seats, so overhead
does not favor one strategy over another. The intuition behind the Worst-fit strategy does
not work well for seat reservation, as described above. Also, because the system lets the
passenger choose a certain seat if he or she insists, the First-fit strategy will not give
utilization as good as that of the Best-fit strategy. Our choice was therefore the Best-fit
strategy.
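
The difference between the three strategies can be seen in a small Java sketch. It assumes, purely for illustration, that the gap each seat would leave has already been computed and that a negative value marks a seat that cannot take the request; the class name FitStrategies and this encoding are not part of the project. Each method makes a single pass over the seats, which matches the linear overhead discussed above.

```java
// Hypothetical comparison of the three placement strategies over precomputed gaps.
class FitStrategies {
    // gaps[i] is the gap seat i would leave around the request,
    // or a negative value when seat i cannot take the request at all.

    // First-fit: index of the first seat that can take the request, or -1.
    static int firstFit(double[] gaps) {
        for (int i = 0; i < gaps.length; i++) {
            if (gaps[i] >= 0) {
                return i;
            }
        }
        return -1;
    }

    // Best-fit: index of the usable seat that leaves the smallest gap, or -1.
    static int bestFit(double[] gaps) {
        int best = -1;
        for (int i = 0; i < gaps.length; i++) {
            if (gaps[i] >= 0 && (best < 0 || gaps[i] < gaps[best])) {
                best = i;
            }
        }
        return best;
    }

    // Worst-fit: index of the usable seat that leaves the largest gap, or -1.
    static int worstFit(double[] gaps) {
        int worst = -1;
        for (int i = 0; i < gaps.length; i++) {
            if (gaps[i] >= 0 && (worst < 0 || gaps[i] > gaps[worst])) {
                worst = i;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] gaps = {2.0, 1.0, 0.0, -1.0};
        System.out.println("first-fit: seat " + firstFit(gaps));
        System.out.println("best-fit:  seat " + bestFit(gaps));
        System.out.println("worst-fit: seat " + worstFit(gaps));
    }
}
```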

6.3 UML Class Diagram
Here we are going to show the design of the reservation facility, using the Best-Fit
strategy. The design is shown by the UML Class Diagram shown below.

[Figure: UML class diagram of the reservation facility]

Reservation
    trip_code: int
    date: Date
    stn_from: int
    stn_to: int
    class_code: int
    fcc: int
    tktType: int
    dstnc: double
    trnType: int
    fare: double
    refundFare: double
    bestSeat: Seat
    stmt: Statement
    conn: Connection

    Reservation()
    getBestSeat(): Seat
    insertTicket()
    getAllStations(): ArrayList
    getAllTrainClass(): ArrayList
    getAllTktTypes(): ArrayList

Seat
    trip_code: int
    seat_num: int
    vehicle_num: int
    segments: ArrayList
    gab: double
    factor: const double
    stmt: Statement

    + Seat()
    - initializeSegments()
    + addSegment(stn_from: int, stn_to: int): int
    + checkGab(stn_from: int, stn_to: int): double
    - index(station: int)

Seat_Segment
    station: int
    reserved: boolean

    Seat_Segment(station: int)

Ticket
    trip_code: int
    date: Date
    stn_from: int
    stn_to: int
    class_code: int
    seat_num
    vehicle num

    Ticket()

Associations
    Reservation (1) to Seat (1): best seat
    Reservation (1) to Seat (*): all partially reserved seats
    Seat (1) to Seat_Segment (*): segments
    Reservation to Ticket: reserved

6.4 Pseudo Code
The implementation of the reservation facility, using the Best-Fit strategy, is shown by the
pseudo code of the main methods:

Class: Reservation
Method: getBestSeat()
• Given the trip code and the departure and arrival stations, calculate the distance by
adding the distances of all segments between the departure and
arrival stations.
• Given the trip code, bring the train type from the database.
• Given the FCC (type of discount), the ticket type, the required class, the calculated
distance and the train type, bring the fare and the refund fare from the
database.
• For each reserved ticket in this trip and required class
For each partially reserved seat, calculate the gap caused by the
ticket required to be reserved.
Return the seat with the minimum gap, if one exists.
Else return a totally free seat, if one exists.
Else return null.

Class: Seat
Method: checkGab(stn_from:int,stn_to:int):double
• The array ‘segments’ has all the stations that this seat passes through,
and a boolean for each station that represents whether the seat is
reserved at this station or not.
• Call addSegment(stn_from,stn_to), which sets the booleans of the
stations between ‘stn_from’ and ‘stn_to’ to true. It returns -1 if one of the
booleans is already true, which means that it is invalid to reserve
this seat.
• After adding this segment to the seat, calculate the gaps formed before
and after the segment.
• If there are two gaps (before and after the required segment), multiply
the sum of the two gaps by a factor, in order to prefer gaps
at one side.
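
The gap calculation described for checkGab can be sketched in Java as follows. This is a simplified illustration and not the project's code: the class name SeatGap is hypothetical, the seat is modelled as one boolean per leg between consecutive stations, the method does not modify the seat (unlike addSegment), and the value 1.5 for the factor is an assumption, since the text only says that two-sided gaps are multiplied by a factor.

```java
// Hypothetical sketch of the gap computation behind the Best-Fit choice.
class SeatGap {
    // Assumed penalty for leaving free legs on both sides of the new reservation.
    static final double FACTOR = 1.5;

    // reserved[i] is true when the seat is already taken on leg i of the trip.
    // The request covers legs from (inclusive) to to (exclusive).
    // Returns -1 if the seat is not free on every requested leg, else the weighted gap.
    static double checkGap(boolean[] reserved, int from, int to) {
        for (int i = from; i < to; i++) {
            if (reserved[i]) {
                return -1;
            }
        }
        int before = 0;
        for (int i = from - 1; i >= 0 && !reserved[i]; i--) {
            before++; // free legs immediately before the request
        }
        int after = 0;
        for (int i = to; i < reserved.length && !reserved[i]; i++) {
            after++; // free legs immediately after the request
        }
        double gap = before + after;
        if (before > 0 && after > 0) {
            gap *= FACTOR; // two separate gaps are worse than one gap of the same total size
        }
        return gap;
    }

    public static void main(String[] args) {
        boolean[] freeSeat = {false, false, false, false};
        boolean[] partlyTaken = {true, false, false, false};
        System.out.println(checkGap(freeSeat, 1, 3));
        System.out.println(checkGap(partlyTaken, 1, 3));
    }
}
```

A seat that is already reserved right up to the start of the request therefore scores better than a completely free seat, which is what lets the Best-Fit strategy fill existing gaps first.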

Chapter 7

Database Analysis and Design

This chapter shows the phases of developing the underlying database system for our
project. It begins with the functional analysis of the three main functions in the system:
the query facility, the reservation facility and the administration facility. The chapter then
presents the data modeling, which analyzes the data that needs to be stored and maintained:
the data of the physical railway network, which consists of the stations and the railway
connections between them; the data of the trips and their schedules; and the data required
for reservation (e.g. the fares of each kind of ticket and the already reserved tickets).

After the data modeling comes the logical design, which transforms the conceptual data
model into a relational data model and specifies the data integrity
constraints. Finally comes the phase of physical design and tuning.

The database recovery methods provided by the database management system used
(Oracle 8i) are shown in the database recovery section.

The last section of this chapter discusses the database administration tool that our system
provides.

7.1 Functional analysis

The three main functions in the system are the query facility, the reservation facility and the
administration facility. The query facility is responsible for running the routing algorithm
to find the feasible journeys that satisfy the user's constraints, ranked by their arrival time.
The reservation facility is responsible for reserving a seat for the user in a way that increases
the utilization of each seat in a train. Finally, the administration facility is the tool that we
provide for the employees to facilitate the process of data entry.

We are going to use the Data Flow Diagram (DFD) in analyzing these functions. The
purpose of the DFD is to show where the data comes from, where the data goes to when
it leaves the system, where the data is stored, what processes transform it, and the
interactions between data stores and processes.

DFD of the query facility

[Figure: data flow diagram of the query facility]
External entity: User
Processes: 1 Routing Algorithm, 2 Construct Detailed Journeys
Data stores: 1 Trip Segments, 2 Trip Vehicles, 3 Segment, 4 Trip Instance, 5 Train types,
6 Trip, 7 Station, 8 Fare
Data flow labels: departure and arrival stations; user constraints; departure and arrival
time of each segment in a trip; classes for a trip; segment distance; trip code and date;
velocity; trip code; train type; X_Coordinate; Y_Coordinate; name; ticket fares for each
class; sets of hubs; feasible journeys

In order to find the set of feasible journeys, the routing algorithm reads the departure
and arrival stations together with all the constraints of the user, and then connects to the
database to read all the data related to the trip schedules and the user constraints.
Finally, the ‘Construct Detailed Journeys’ process prepares a detailed description of
the algorithm's solution and sends it to the user.

DFD of the reservation facility

[Figure: data flow diagram of the reservation facility]
External entity: User
Process: 1 Reservation
Data stores: 1 Trip Segments, 2 Trip Vehicles, 3 Segment, 4 Trip Instance, 5 Fare, 6 Trip,
7 Reserved tickets
Data flow labels: selected trip; ticket type and class; all segments in a trip; vehicles of a
certain class; distance of the segments; trip code and date; ticket fare and refund fare;
trip code; already reserved tickets; new reserved ticket; reserved seat and vehicle number,
fare and refund fare of the ticket

The reservation facility needs to know the trip selected by the user and the
already reserved tickets in order to decide which seat is suitable to reserve. During this
process all the vehicles of this trip with the requested class are needed to choose from.
The segments that this trip passes through are also needed, to check whether the
seats are available during these segments. Finally, the fare and the refund fare
need to be shown to the user.

DFD of adding a new Trip:

[Figure: data flow diagram of adding a new trip]
External entity: Employee
Processes: 1 Verify/connect, 2 Insert new trip, 3 Insert trip vehicles, 4 Insert trip segments
Data stores: 1 Account, 2 Trip, 3 Train types, 4 Station, 5 Train classes, 6 Trip_Vehicles,
7 Segments, 8 Trip segments
Data flow labels: user name, password; user names, passwords; Trip_code, TRN_type,
weekdays, STN_departure, operating and expiring date; Max (trip_code) + 1; new trip;
all train types; all stations; Trip_code, list of all vehicles with their classes and number
of seats; all classes; list of all trip vehicles; Trip_code, list of all segments with their
arrival and departure time; all segments; list of all trip segments

The main page in the administration tool is the page for inserting a new trip. It
facilitates the steps of adding a new trip to the database, which involves dealing with
many tables. In order to add a new trip, the employee must first be authorized to do
so. Authentication is verified by a username and password, which represent his role
in the database. The employee gives the TRN_type, weekdays, STN_departure,
operating date and expiring date of the trip, and the system suggests a trip code equal to
Max(trip_code) + 1, which the employee can accept or change. This trip is inserted
in the Trip table. Then the employee supplies the list of vehicles and the class
of each one. The system supplies the employee with the names of all classes so that he
can select one of them for each vehicle. Finally, the list of segments that represent the
rails that the trip passes through is inserted; during this operation the system supplies
the employee with all the segments in the network to choose the list from.

7.2 Data Modeling:
Data modeling is a very important task in building a database system, since it has an
impact on efficient database design. We applied the Entity Relationship approach in
defining the conceptual abstract view of the database system. The database system of any
railway network is divided into three main subjects: the physical network, the trip
schedules and the reservation information. We are going to use the Entity Relationship
Diagram (ERD) to represent these subjects.

ERD of the Physical Network:

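[Figure: ERD of the physical network]
Entities and attributes:
Station: Stn_code, name, X_Coordinate, Y_Coordinate
M_Route: Route_code, name
Segment: Dstnc
Relationships: Segment From Station, Segment To Station, Segment Via M_Route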

The physical network consists of all the stations, each with its horizontal and vertical
coordinates. There are also the segments (rails) from one station to another, each with the
length of the rail between them. The via-route of a segment is used to
differentiate between rails that connect the same stations.

ERD of the Trip Schedules:

[Figure: ERD of the trip schedules]
Entities and attributes:
M_TRN_Type
Station (departure station of a trip)
Trip: Trip_code, weekdays, Trip_operating_date, Trip_expiring_date
Segment
Trip_Segment: Dptr_TM, Arvl_TM, Day_offset
Trip_Vehicles: Vehicle_code, Class_code, Seats

A schedule of the railway system is represented by the trips that run on the segments. A
trip is characterized by the train type, the departure station, the days of the week on which
the trip runs, the trip starting date (operating date) and the expiring date. For each trip there
is a list of segments together with their arrival and departure times. Also, for each trip there
is a list of train vehicles that serve this trip, each with its class and number of seats.

ERD of the Reservation:

[Figure: ERD of the reservation]
Entities and attributes:
M_TRN_type
Fare: Dstnc, Class_code, FCC_code, TKT_type_code, TKT_fare, TKT_Rfnd_fare
Trip
Trip_Instance: Trip_date, No_of_available_seats
Station (departure and arrival stations of a ticket)
Ticket: Seat_num, Vehicle_num

The fare of a ticket is determined by the requested class, train type, ticket type, discount
type (FCC) and the distance of the segment. For each fare there is a refund fare that is
returned to the customer if he cancels the reservation.

The trip instance is instantiated from the trip a week before the date of the trip, to enable
passengers to reserve tickets on that trip a week before its date. Tickets cannot be
reserved unless the instance has been instantiated.

7.3 The Entity Type Specifications (ETS)

Project: EgyTrains Subject: Physical Network Page 1/1

Object: Station Date: 1/6/2005

Attribute       Type        Size    Validations
STN_code        NUMBER      5       PK, Not Null
STN_name        Varchar2    25      Not Null
X_coordinate    NUMBER      5       Not Null
Y_coordinate    NUMBER      5       Not Null

The table Station contains the station code and the station name, together with the
horizontal and vertical coordinates of the station measured from some origin. These
distances are used to derive the physical straight-line distance between any two stations.
The x and y distances are in kilometers and can be positive or negative. The straight-line
distance can be used with the maximum train velocity to estimate the real distance between
two stations, which helps us estimate the real cost or time.

Project: EgyTrains Subject: Physical Network Page 1/1

Object: M_Route Date: 1/6/2005

Attribute       Type        Size    Validations
Route_code      NUMBER      5       PK, Not Null
name            Varchar2    25      Not Null

The master table of routes contains all the possible routes between stations. Any two
stations may have more than one possible route connecting them; two stations together
with one of their connecting routes are called a segment.

Project: EgyTrains Subject: Physical Network Page 1/1

Object: Segment Date: 1/6/2005

Attribute       Type        Size    Validations
STN_from        NUMBER      5       PK, Not Null, primary key in STN as STN_code
STN_to          NUMBER      5       PK, Not Null, primary key in STN as STN_code
VIA_Route       NUMBER      5       PK, Not Null, primary key in M_Route as route_code
DSTNC           NUMBER      5       Not Null, greater than 0

The Segment table contains all the segments with their real distances, not the straight-line
distances. Each train trip consists of many segments.

Project: EgyTrains Subject: Master Information Page 1/1

Object: M_TRN_type Date: 1/6/2005

Attribute Type Size Validations


TRN_type_code NUMBER 3 PK, Not Null
name Varchar2 20 Not Null
velocity NUMBER 5 Not Null, greater than 0

The master information about train types contains the train type code with its name and
velocity. The velocity helps in estimating the real distances when there is no direct
railway connection between two stations as described before. Examples of train types are:
Spanish, turbine or French.

Project: EgyTrains Subject: schedule Page 1/1

Object: Trip Date: 1/6/2005

Attribute Type Size Validations


Trip_code NUMBER 5 PK, Not Null
TRN_type_code_fk NUMBER 2 Not Null, primary key in M_TRN_type as TRN_type_code
STN_departure_fk NUMBER 5 Not Null, primary key in Station as STN_code
weekdays NUMBER 7 Not Null, each digit is 0 or 1
Trip_operating_date date Not Null
Trip_expiring_date date Not Null

Trip holds the master information about all valid trips. It contains the train type, the
departure station, the days of the week on which the trip runs, the trip starting date
(operating date) and the expiring date. Any expired trip is deleted from the Trip table
and archived.

Project: EgyTrains Subject: schedule Page 1/1

Object: Trip_Segment Date: 1/6/2005

Attribute Type Size Validations


Trip_code_fk NUMBER 5 PK, Not Null, primary key in Trip as Trip_code
STN_from_fk NUMBER 5 PK, Not Null, primary key in Segment as STN_from
STN_to_fk NUMBER 5 PK, Not Null, primary key in Segment as STN_to
VIA_Route_fk NUMBER 5 PK, Not Null, primary key in Segment as VIA_Route
DPTR_TM Time Not Null
ARVL_TM Time Not Null
Day_Offset Number 1 Not Null, system maintained

Trip_Segment contains all the segments of a certain trip, the departure station, the
departure time, the arrival station, the arrival time, and day offset between this segment
and the first segment in this trip.

Project: EgyTrains Subject: schedule Page 1/1

Object: Trip_vehicles Date: 1/6/2005

Attribute Type Size Validations


Trip_code_fk NUMBER 5 PK, Not Null , primary key in Trip as Trip_code
Vehicle_code NUMBER 4 PK, Not Null
Class_code NUMBER 1 Not Null, in the range [1,3]
seats Number 5 Not Null, greater than 0

The Trip_vehicles table contains a record for each vehicle in a certain trip, with its class
code and the number of seats in this vehicle.

Project: EgyTrains Subject: Reservation Page 1/1

Object: Trip_Instance Date: 1/6/2005

Attribute Type Size Validations


Trip_code_fk NUMBER 5 PK, Not Null , primary key in Trip as Trip_code
Trip_date date PK, Not Null
No_of_available_seats Number 5 Not Null, greater than or equal 0, system maintained

Each day, an instance of each trip that runs in the next week should be created and given
a date. A ticket cannot be reserved on a trip until this instance has been created. Old
instances should be deleted or archived.
Each instance has a system-maintained field containing the number of seats that are still
totally free, i.e. not reserved for any part of the trip. When this number reaches zero,
tickets can still be reserved: a passenger travelling only part of the trip can sit in a
seat that is not totally free, provided that it is free over the required part of the trip.
The field is nevertheless a good estimate of how easy it will be to find a seat on the
trip, and it can be used statistically to show the demand on the trip concerned.

Project: EgyTrains Subject: Reservation Page 1/1

Object: Fare Date: 1/6/2005

Attribute Type Size Validations


DSTNC NUMBER 5 PK, Not Null
TRN_type_code_fk NUMBER 2 PK, primary key in M_TRN_type as TRN_type_code
Class_code NUMBER 1 PK, Not Null, in the range [1,3]
FCC_code NUMBER 1 PK, Not Null, in the range [1,4]
TKT_type_code NUMBER 1 PK, Not Null, in the range [1,2]
TKT_fare NUMBER (6,2) Not NULL
TKT_RFND_fare NUMBER (6,2) Not NULL

Fare is the table of all fares. Each fare is determined by the distance, train type, reserved
class, FCC (whether there is a discount or not, and the type of this discount) and ticket
type (single ticket or part of a return ticket). With each fare there is also the refund
fare, which is returned to the customer if the reservation is cancelled.

Project: EgyTrains Subject: schedule Page 1/1

Object: Ticket Date: 1/6/2005

Attribute Type Size Validations


Trip_code_fk NUMBER 5 PK, Not Null, primary key in Trip_Instance as Trip_code_fk
Trip_date_fk date PK, Not Null, primary key in Trip_Instance as Trip_date
STN_from_fk NUMBER 5 PK, Not Null, primary key in Station as STN_code
STN_to_fk NUMBER 5 PK, Not Null, primary key in Station as STN_code
Seat_num NUMBER 5 PK, Not Null
Vehicle_num NUMBER 4 PK, Not Null
DSTNC NUMBER 5 Not Null, primary key in Fare as DSTNC
TRN_type_code_fk NUMBER 2 Not Null, primary key in Fare as TRN_type_code_fk
Class_code NUMBER 1 Not Null, primary key in Fare as Class_code
FCC_code NUMBER 1 Not Null, primary key in Fare as FCC_code
TKT_type_code NUMBER 1 Not Null, primary key in Fare as TKT_type_code

Ticket is the table of already reserved tickets. The information with each ticket is the trip
instance (determined by the trip code and date), departure and arrival station, reserved
class, ticket type, discount, distance (to be able to determine the fare from Fare table),
vehicle number and seat number.

Project: EgyTrains Subject: Master Information Page 1/1

Object: M_FCC Date: 1/6/2005

Attribute Type Size Validations


FCC_code NUMBER 1 PK, Not Null
name Varchar2 20 Not Null

The FCC is the kind of discount applied to the ticket (full price, half price or military).

Project: EgyTrains Subject: Master Information Page 1/1

Object: M_TKT_type Date: 1/6/2005

Attribute Type Size Validations


TKT_type_code NUMBER 1 PK, Not Null
name Varchar2 20 Not Null

Ticket type indicates whether it is a single ticket or part of a return ticket.

Project: EgyTrains Subject: Master Information Page 1/1

Object: M_TRN_class Date: 1/6/2005

Attribute Type Size Validations


Class_code NUMBER 1 PK, Not Null
name Varchar2 20 Not Null

Train classes are first, second or third class. This table is used when printing these names.
If in the future, other classes are added, the validation rules of the class_code in any other
table should be changed.

7.4 Logical Database Design
Logical database design transforms the conceptual data model into a relational data
model. The resulting relations should first be normalized to the third normal form.
Normalization is the decomposition of complex data structures according to a set of
dependency rules, designed to give simpler and more stable data structures.

There are several normal forms. A relation is in the first normal form if it has no
multi-valued attributes. A relation is in the second normal form if it is in the first
normal form and every nonprime attribute is functionally dependent on the entire primary
key and not on part of the key. A relation is in the third normal form if it is in the second
normal form and each nonprime attribute is independent of any other nonprime attribute.

The functional dependencies of each relation in the database are given below; they show
whether the relation is in the third normal form or not.

STATION: STN_CODE → NAME, X_DSTNC, Y_DSTNC

M_ROUTE: ROUTE_CODE → NAME

SEGMENT: (STN_FROM, STN_TO, VIA_ROUTE) → DSTNC

M_FCC: FCC_CODE → NAME

M_TKT_TYPE: TKT_TYPE_CODE → NAME

M_TRN_CLASS: CLASS_CODE → NAME

M_TRN_TYPE: TRN_TYPE_CODE → NAME, VELOCITY

TRIP: TRIP_CODE → TRN_TYPE_CODE_FK, STN_DEPARTURE_FK, WEEKDAYS,
      TRIP_OPERATING_DATE, TRIP_EXPIRING_DATE

Note that Weekdays is not a multi-valued attribute; it is an integer of seven digits, one
for each day of the week. If the trip runs on a given day, the corresponding digit is one;
otherwise it is zero.
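
The encoding can be illustrated with a short Python sketch. The report does not state which digit stands for which day, so the order used here (leftmost digit is Saturday) is only an assumption, and the function name is hypothetical.

```python
# Assumed digit order, leftmost digit first; the report does not specify it.
DAYS = ("Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")

def running_days(weekdays):
    """Decode the seven-digit WEEKDAYS integer into the days on which the trip runs."""
    digits = str(weekdays).zfill(7)  # restore leading zeros dropped by the integer type
    if len(digits) != 7 or set(digits) - {"0", "1"}:
        raise ValueError("weekdays must be seven digits, each 0 or 1")
    return [day for day, flag in zip(DAYS, digits) if flag == "1"]

print(running_days(1010100))  # ['Sat', 'Mon', 'Wed']
print(running_days(11))       # ['Thu', 'Fri']
```

Since the value is stored as a number, leading zeros are lost, which is why the sketch pads the value back to seven digits before decoding.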

TRIP_SEGMENT: (TRIP_CODE_FK, STN_FROM_FK, STN_TO_FK, VIA_ROUTE_FK) → ARVL_TM, DPTR_TM,
              DAY_OFFSET

TRIP_VEHICLES: (TRIP_CODE_FK, VEHICLE_CODE) → CLASS_CODE, SEATS

TRIP_INSTANCE: (TRIP_CODE_FK, TRIP_DATE) → NO_AVAILABLE_SEATS

FARE: (DSTNC, TRN_TYPE_CODE_FK, CLASS_CODE, FCC_CODE, TKT_TYPE_CODE) → TKT_FARE,
      TKT_RFND_FARE

TICKET: (TRIP_CODE_FK, TRIP_DATE_FK, STN_FROM_FK, STN_TO_FK, SEAT_NUM, VEHICLE_NUM)
            → FCC_CODE, TKT_TYPE_CODE
        TRIP_CODE_FK → TRN_TYPE_CODE_FK
        (TRIP_CODE_FK, VEHICLE_NUM) → CLASS_CODE
        (TRIP_CODE_FK, STN_FROM_FK, STN_TO_FK) → DSTNC

The Ticket relation is not in the second normal form, as some nonprime attributes are
functionally dependent on part of the key and not on the entire primary key. To
normalize the Ticket relation, the attributes DSTNC, TRN_TYPE_CODE_FK and
CLASS_CODE would have to be removed from it. This normalization would slow down
the retrieval of the fare and refund fare of any ticket.

After normalization, the steps to get the fare of a ticket would be: getting the train type
from the table Trip using Ticket.TRIP_CODE_FK, getting the class from the table
Trip_vehicles using Ticket.TRIP_CODE_FK and Ticket.VEHICLE_NUM, getting
all the segments that connect Ticket.STN_FROM_FK and Ticket.STN_TO_FK,
getting the distances of all these segments from the table Segment, adding all these
distances and then getting the fare from the table Fare.

For this reason, we chose to keep the Ticket relation denormalized. Leaving the Ticket
relation as it is enhances the performance of getting the fare of a ticket, as the fare is
fetched from the table Fare using Ticket.DSTNC, Ticket.TRN_TYPE_CODE_FK,
Ticket.CLASS_CODE, Ticket.FCC_CODE and Ticket.TKT_TYPE_CODE.
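
The benefit of the denormalized design can be sketched as a single join between Ticket and Fare. The fragment below uses Python with an in-memory SQLite database purely for illustration; the real system runs on Oracle, and the sample rows and the query text are assumptions, not the project's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fare (
    dstnc INTEGER, trn_type_code_fk INTEGER, class_code INTEGER,
    fcc_code INTEGER, tkt_type_code INTEGER,
    tkt_fare REAL, tkt_rfnd_fare REAL,
    PRIMARY KEY (dstnc, trn_type_code_fk, class_code, fcc_code, tkt_type_code)
);
CREATE TABLE ticket (
    trip_code_fk INTEGER, trip_date_fk TEXT, stn_from_fk INTEGER, stn_to_fk INTEGER,
    seat_num INTEGER, vehicle_num INTEGER,
    dstnc INTEGER, trn_type_code_fk INTEGER, class_code INTEGER,
    fcc_code INTEGER, tkt_type_code INTEGER
);
-- Made-up sample rows, for illustration only.
INSERT INTO fare VALUES (208, 1, 1, 1, 1, 50.0, 40.0);
INSERT INTO ticket VALUES (901, '2005-06-01', 10, 20, 14, 3, 208, 1, 1, 1, 1);
""")

# The five fare-determining attributes are stored on the ticket itself,
# so no lookup in Trip, Trip_vehicles, Trip_segment or Segment is needed.
FARE_QUERY = """
SELECT f.tkt_fare, f.tkt_rfnd_fare
FROM ticket t
JOIN fare f
  ON  f.dstnc = t.dstnc
  AND f.trn_type_code_fk = t.trn_type_code_fk
  AND f.class_code = t.class_code
  AND f.fcc_code = t.fcc_code
  AND f.tkt_type_code = t.tkt_type_code
WHERE t.trip_code_fk = ? AND t.trip_date_fk = ?
  AND t.vehicle_num = ? AND t.seat_num = ? AND t.stn_from_fk = ?
"""
row = conn.execute(FARE_QUERY, (901, "2005-06-01", 3, 14, 10)).fetchone()
print(row)  # (50.0, 40.0)
```

With a normalized Ticket relation, the same lookup would need the additional steps listed above before the Fare table could be queried.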

Codd’s representation for the relations
STATION (STN_CODE, NAME, X_DSTNC, Y_DSTNC)

M_ROUTE (ROUTE_CODE, NAME)

SEGMENT (STN_FROM, STN_TO, VIA_ROUTE, DSTNC)

M_TRN_TYPE (TRN_TYPE_CODE, NAME, VELOCITY)

TRIP (TRIP_CODE, TRN_TYPE_CODE_FK, STN_DEPARTURE_FK, WEEKDAYS,
      TRIP_OPERATING_DATE, TRIP_EXPIRING_DATE)

TRIP_SEGMENT (TRIP_CODE_FK, STN_FROM_FK, STN_TO_FK,
      VIA_ROUTE_FK, ARVL_TM, DPTR_TM, DAY_OFFSET)

TRIP_VEHICLES (TRIP_CODE_FK, VEHICLE_CODE, CLASS_CODE, SEATS)

TRIP_INSTANCE (TRIP_CODE_FK, TRIP_DATE, NO_AVAILABLE_SEATS)

FARE (DSTNC, TRN_TYPE_CODE_FK, CLASS_CODE, FCC_CODE,
      TKT_TYPE_CODE, TKT_FARE, TKT_RFND_FARE)

TICKET (TRIP_CODE_FK, TRIP_DATE_FK, STN_FROM_FK, STN_TO_FK,
      SEAT_NUM, VEHICLE_NUM, DSTNC, TRN_TYPE_CODE_FK,
      CLASS_CODE, FCC_CODE, TKT_TYPE_CODE)

TRIP_ARCHIVE (TRIP_CODE, TRN_TYPE_CODE_FK, STN_DEPARTURE_FK,
      STN_ARRIVAL_FK, WEEKDAYS, TRIP_OPERATING_DATE,
      TRIP_EXPIRING_DATE)

M_FCC (FCC_CODE, NAME)

M_TKT_TYPE (TKT_TYPE_CODE, NAME)

M_TRN_CLASS (CLASS_CODE, NAME)

7.5 Data integrity
There are different types of relational integrity constraints: domain constraints, the entity
integrity constraint, referential integrity constraints and semantic integrity constraints.

Domain constraints:

Domain constraints specify that the value of each attribute must be an atomic value from its
domain. Domain constraints are preserved by the DBMS. Here are all the relations with the
domain of each field.

STATION (STN_CODE: integer, NAME: string, X_DSTNC: real, Y_DSTNC: real)

M_ROUTE (ROUTE_CODE: integer, NAME: string)

SEGMENT (STN_FROM: integer, STN_TO: integer, VIA_ROUTE: integer, DSTNC: real)

M_TRN_TYPE (TRN_TYPE_CODE: integer, NAME: string, VELOCITY: real)

TRIP (TRIP_CODE: integer, TRN_TYPE_CODE_FK: integer, STN_DEPARTURE_FK:
      integer, WEEKDAYS: integer, TRIP_OPERATING_DATE: date,
      TRIP_EXPIRING_DATE: date)

TRIP_SEGMENT (TRIP_CODE_FK: integer, STN_FROM_FK: integer, STN_TO_FK:
      integer, VIA_ROUTE_FK: integer, ARVL_TM: time, DPTR_TM: time,
      DAY_OFFSET: integer)

TRIP_VEHICLES (TRIP_CODE_FK: integer, VEHICLE_CODE: integer, CLASS_CODE:
      integer, SEATS: integer)

TRIP_INSTANCE (TRIP_CODE_FK: integer, TRIP_DATE: date, NO_AVAILABLE_SEATS:
      integer)

FARE (DSTNC: real, TRN_TYPE_CODE_FK: integer, CLASS_CODE: integer, FCC_CODE:
      integer, TKT_TYPE_CODE: integer, TKT_FARE: currency,
      TKT_RFND_FARE: currency)

TICKET (TRIP_CODE_FK: integer, TRIP_DATE_FK: date, STN_FROM_FK: integer,
      STN_TO_FK: integer, SEAT_NUM: integer, VEHICLE_NUM: integer,
      DSTNC: real, TRN_TYPE_CODE_FK: integer, CLASS_CODE: integer,
      FCC_CODE: integer, TKT_TYPE_CODE: integer)

M_FCC (FCC_CODE: integer, NAME: string)

M_TKT_TYPE (TKT_TYPE_CODE: integer, NAME: string)

M_TRN_CLASS (CLASS_CODE: integer, NAME: string)

Entity integrity constraint:

The entity integrity constraint states that no primary key value can be null. This is because the
primary key value is used to identify individual tuples in a relation. The DBMS is responsible for
preserving this integrity constraint.

Referential integrity constraints:

A referential integrity constraint is specified between two relations and is used to maintain
consistency among the tuples of the two relations. The referenced attribute should be the primary
key of the referenced relation. Referenced tuples cannot be removed until all referencing tuples
are removed. A referencing tuple cannot be inserted before the referenced tuple. Again, the
DBMS is responsible for maintaining referential integrity constraints.

Semantic integrity constraints:

Domain constraints and referential constraints are not sufficient; database triggers should be used
to define and enforce semantic integrity constraints. Here are some of the integrity rules for our
railway system, each followed by the pseudocode of the triggers that enforce it.

Integrity rule:
In each record of the Segment table (STN_FROM, STN_TO, VIA_ROUTE, DSTNC),
STN_FROM must not be equal to STN_TO.

Trigger:
Before insert on Segment
    If new.stn_from = new.stn_to then return error

Integrity rule:
The expiring date in the Trip table should be later than the date of inserting its record;
otherwise the trip should be inserted in the archive.

Trigger:
Before insert on Trip
    If new.expiring_date < new.operating_date
        return error("expiring date should be after the operating date")
    If new.expiring_date < current system date
        return error("expiring date has already passed, you can insert this trip in
        the archive or check the date")

Integrity rule:
The date in the Trip_Instance table should be later than the date of inserting its record.

Trigger:
Before insert on Trip_Instance
    If new.date < current system date
        return error("the date has already passed")

Integrity rule:
The list of segments of a trip, stored in the table Trip_segment, should be a continuous
list that does not form a loop, i.e. the trip cannot depart from any station twice, arrive
at any station twice, or return to a previously visited station. Also, a segment cannot
depart from a station and arrive at the same station.

Trigger:
Before insert on Trip_segment
    If it is not the first segment inserted for this trip
        If there is not exactly one segment in this trip in which stn_to = new.stn_from
            return error("the list of trip_segments is not continuous")

Trigger:
Before insert on Trip_segment
    For each segment on the same trip
        If segment.stn_from_fk = new.stn_from_fk
           or segment.stn_to_fk = new.stn_to_fk
           or segment.stn_from_fk = new.stn_to_fk
           or new.stn_from_fk = new.stn_to_fk
            return error("There exists a loop, check this segment with previous segments")
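
The two Trip_segment triggers above can be sketched together in Python. This is only an illustration of the checks, assuming the stored segments of the trip are available as (stn_from, stn_to) pairs; the function name is hypothetical and the real checks are meant to run as database triggers.

```python
def check_new_segment(existing, new):
    """Validate a (stn_from, stn_to) pair against the segments already stored for a trip.

    Returns None if the segment may be inserted, otherwise an error message.
    """
    new_from, new_to = new
    if new_from == new_to:
        return "a segment cannot depart from and arrive at the same station"
    if existing:
        # Continuity: exactly one stored segment must end where the new one starts.
        if sum(1 for _, stn_to in existing if stn_to == new_from) != 1:
            return "the list of trip_segments is not continuous"
    for stn_from, stn_to in existing:
        # Loop checks: repeated departure, repeated arrival, or return to a departure station.
        if stn_from == new_from or stn_to == new_to or stn_from == new_to:
            return "there exists a loop, check this segment with previous segments"
    return None

trip = [(1, 2), (2, 3)]
print(check_new_segment(trip, (3, 4)))  # None
print(check_new_segment(trip, (3, 1)))  # there exists a loop, check this segment with previous segments
print(check_new_segment(trip, (5, 6)))  # the list of trip_segments is not continuous
```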

Integrity rule:
The list of segments of a trip, stored in the table Trip_segment, should be continuous
in time, i.e. if a segment does not depart on the same day as the previous segment, its
day offset must be greater than that of the previous segment.

Trigger:
Before insert on Trip_segment
    If it is the first segment in this trip then new.day_offset = 0
    Else
        Check the departure time of the previous segment
        If new.DPTR_TM is on the same day as previous_segment.DPTR_TM
            new.day_offset = previous_segment.day_offset
        Else
            new.day_offset = previous_segment.day_offset + 1
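
A minimal Python sketch of this computation follows. It assumes that only clock times are stored and that a departure time earlier than the previous one means the train has passed midnight; this interpretation of "same day" and the function name are assumptions, and it only holds when consecutive departures are less than 24 hours apart.

```python
from datetime import time

def day_offsets(departure_times):
    """Assign a day offset to each segment, given departure clock times in trip order."""
    offsets = []
    for i, dptr_tm in enumerate(departure_times):
        if i == 0:
            offsets.append(0)                # first segment of the trip
        elif dptr_tm >= departure_times[i - 1]:
            offsets.append(offsets[-1])      # still on the same day
        else:
            offsets.append(offsets[-1] + 1)  # clock wrapped past midnight
    return offsets

print(day_offsets([time(22, 0), time(23, 30), time(1, 15), time(3, 0)]))  # [0, 0, 1, 1]
```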

Integrity rule:
Each part (set of trip_segments) of the trip should have a set of fares assigned to it in the
Fare table, one fare record for each combination of classes, ticket types, and discount
types.

Trigger:
Before insert on Trip_segment
    For each class of this Trip in the table Trip_vehicles
        For each type of discount in the table M_FCC
            For each ticket type in the table M_Tkt_type
                Check that there is a fare in the table Fare for this class,
                discount, ticket type and the distance between
                new.STN_from_fk and new.STN_to_fk

Trigger:
Before insert on Trip_vehicles
    For each type of discount in the table M_FCC
        For each ticket type in the table M_Tkt_type
            For each distance of each combination of segments of that trip
                Check that there is a fare in the table Fare for this discount,
                ticket type, distance and new.Class_code

Integrity rule:
The number of available seats in the Trip_Instance table is system maintained. It should
be decreased when a totally free seat is reserved.

Trigger:
Before insert on Trip_Instance
    Select sum(seats) from Trip_vehicles
        where Trip_vehicles.trip_code_fk = new.trip_code_fk
    new.NO_AVAILABLE_SEATS ← this sum

Trigger:
After insert on Ticket
    Select distinct vehicle_num, seat_num from the table Ticket
        for the trip instance of the new ticket
    get the count of these records
    update the number of available seats in the Trip_Instance
        to the total number of seats of the trip minus this count
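
The bookkeeping can be sketched in Python as follows, taking the number of available seats to be the total number of seats of the trip minus the number of distinct seats that carry at least one ticket. This reading of "totally free" is our interpretation of the rule, and the data structures and names are hypothetical.

```python
def available_seats(vehicle_seats, tickets):
    """Number of totally free seats in one trip instance.

    vehicle_seats: {vehicle_num: seats} for the trip, as in Trip_vehicles.
    tickets: iterable of (vehicle_num, seat_num) pairs reserved on this instance.
    """
    total = sum(vehicle_seats.values())
    reserved = set(tickets)  # a seat sold for two parts of the trip counts once
    return total - len(reserved)

vehicles = {1: 60, 2: 80}
tickets = [(1, 5), (1, 5), (2, 10)]  # seat 5 of vehicle 1 is sold for two parts of the trip
print(available_seats(vehicles, tickets))  # 138
```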

Integrity rule:
The rail distance between the departure and arrival stations in the Ticket table
should be equal to the sum of the distances of the segments that connect the departure
station with the arrival station.

Trigger:
Before insert on Ticket
    dist = get_dist(new.TRIP_CODE_FK, new.STN_FROM_FK, new.STN_TO_FK)
    if dist = 0 then there are no segments connecting these stations
    if the employee entered a distance, it must equal dist
    if the employee did not enter a distance, insert the record with the calculated dist

Function:
get_dist(trip_code: number, stn_from: number, stn_to: number)
    dist = 0
    for each segment in the table Trip_segment with Trip_code = trip_code
    that lies between stn_from and stn_to, do
        get the distance of this segment from the table Segment
        add this distance to dist
    return dist
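
The function can be sketched in Python as below. The sketch assumes that the segments of the trip are given in travel order and that the Segment distances are available as a dictionary; both structures and the sample values are hypothetical stand-ins for the Trip_segment and Segment tables.

```python
def get_dist(trip_segments, segment_dstnc, stn_from, stn_to):
    """Sum the real distances of a trip's segments from stn_from to stn_to.

    trip_segments: ordered (stn_from, stn_to, via_route) triples of one trip.
    segment_dstnc: {(stn_from, stn_to, via_route): distance}, as in the Segment table.
    Returns 0 when the trip does not connect the two stations in that order.
    """
    dist = 0
    started = False
    for seg in trip_segments:
        if seg[0] == stn_from:
            started = True
        if started:
            dist += segment_dstnc[seg]
            if seg[1] == stn_to:
                return dist
    return 0

trip = [(1, 2, 7), (2, 3, 7), (3, 4, 8)]
distances = {(1, 2, 7): 40, (2, 3, 7): 55, (3, 4, 8): 30}
print(get_dist(trip, distances, 1, 3))  # 95
print(get_dist(trip, distances, 3, 1))  # 0
```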

Integrity rule:
No one can reserve a ticket from a station to the same station.

Trigger:
Before insert on Ticket
If new.stn_from = new.stn_to then return error

Integrity rule:
No one can reserve a seat for a part of the trip if this seat is already reserved for
any overlapping part of the trip.

Trigger:
Before insert on Ticket
    For each Ticket on the required seat
        If there is an intersection between the part of the trip covered by this ticket
        and the part covered by the new ticket, return error
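
The overlap test can be sketched in Python by mapping each station to its position along the trip and comparing the two parts as intervals. Treating the intervals as half-open, so that a seat vacated at a station can be taken by a passenger boarding at that same station, is an assumption made for this sketch; the names are hypothetical.

```python
def parts_overlap(stop_index, ticket_a, ticket_b):
    """True if two (stn_from, stn_to) tickets on the same seat share part of the trip."""
    a_from, a_to = stop_index[ticket_a[0]], stop_index[ticket_a[1]]
    b_from, b_to = stop_index[ticket_b[0]], stop_index[ticket_b[1]]
    # Half-open intervals: leaving the seat at a station frees it for boarding there.
    return a_from < b_to and b_from < a_to

def can_reserve(stop_index, tickets_on_seat, new_ticket):
    """True if new_ticket does not overlap any ticket already sold on the seat."""
    return not any(parts_overlap(stop_index, t, new_ticket) for t in tickets_on_seat)

stops = ["A", "B", "C", "F", "E"]  # stations of the trip in travel order
stop_index = {stn: i for i, stn in enumerate(stops)}
sold = [("A", "C")]
print(can_reserve(stop_index, sold, ("C", "E")))  # True
print(can_reserve(stop_index, sold, ("B", "F")))  # False
```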

7.6 Physical Design
The goal of the physical design is to guarantee a good performance beside the appropriate
structuring of data. It is not possible to make meaningful physical design decisions until
we know the queries, transactions and applications that are expected to run on the
database.

The routing and the reservation algorithm represent the applications that run on the
database. First we are going to analyze the set of queries invoked by the routing
algorithm and then the queries that are invoked by the reservation algorithm.

Here are the queries invoked by the routing algorithm. All these queries are in the class
DBconnector, which deals with the database in order to search for the possible trips that
the passenger will choose from. Each query is listed with its frequency of invocation per
user query.

Query                          Frequency of invocation

getNextTrainFromSameStation    O(closed list size)
getNextStationInSameTrain      O(closed list size)
checkClasses                   O(closed list size * average number of classes per trip)
getFares                       O(number of trips in output * average number of classes per trip)
getStationName                 O(number of stations in output)
getArrivalTime                 O(open list size)
getDepartureTime               O(number of intermediate stations in output), i.e. where no exchange occurs
getTrueDistance                O(number of segments in output)
checkDate                      O(closed list size); frequently updated
getEstimateToDestination       O(open list size); used in A* only

Here are the queries that deal with the database in order to reserve a ticket.
All these queries are in the class Reservation. Each query is listed with its frequency of
invocation per reservation of a single ticket.

query frequency of invocation


getAllSegmentsOfTrip O(1)
getDistance O(number of segments in this trip)
getAllTicketsOnSeat O(1)
getVehiclesOfClass O(1)

Each query implies certain decisions, such as the attributes on which indexes should be
defined and the type of each index. Whether a decision is adopted depends on the
frequency of the query and the frequency of updating the attribute.

Final decisions:

The following table shows the final decisions that are implemented; these decisions were
chosen according to the analyzed queries and their frequencies.

Table            Attribute                   Type of decision

Trip_segment     Trip_code                   Clustering
Trip_Vehicles    Trip_code_fk, Class_code    Clustering
Ticket           vehicle_num, seat_num       Clustering
Station          STN_name                    Secondary index
Trip_segment     STN_from_fk                 Secondary index

7.7 Concurrency Control

Concurrency control is needed when the system is a multi-user system, as many
transactions may access the same database at the same time. For this reason,
concurrency control techniques are used to ensure the noninterference or isolation
property of concurrently executed transactions.

Oracle DBMS manages the concurrency control for our system. Oracle locking is
performed automatically and requires no user action. Implicit locking occurs for SQL
statements as necessary, depending on the action requested. Oracle's lock manager
maintains several different types of row locks, depending on what type of operation
established the lock. In general, there are two types of locks: exclusive locks and share
locks. Only one exclusive lock can be obtained on a resource (such as a row or a table);
however, many share locks can be obtained on a single resource.

In our routing algorithm all the transactions are read only transactions, as the routing
algorithm just reads the trip schedules to decide the feasible journeys.

Concurrency problems usually come from write-after-read transactions, which is the
case in the reservation feature of our system: the system reads the currently reserved
tickets, decides on a suitable seat to reserve and then inserts the new ticket in the
Ticket table. The problem will become more important when online payment, which
we left as future work, is developed.

7.8 Data Recovery

Data loss can occur for various reasons. Here are some of the most common types of
failures that can lead to data loss.

A statement failure is a logical failure in the handling of a statement in an Oracle
program. For example, a user issues a statement that is not a valid SQL construction.
When statement failure occurs, Oracle automatically undoes any effects of the statement
and returns control to the user.

A process failure is a failure in a user process accessing Oracle, i.e., an abnormal
disconnection or process termination. The failed user process cannot continue work,
although Oracle and other user processes can. If the user process fails while modifying
the database, Oracle background processes undo the effects of uncommitted transactions.

An instance failure is a problem that prevents an Oracle instance from continuing to
function. Instance failure can result from a hardware problem such as a power outage, or
a software problem such as an operating system crash. When an instance fails, Oracle
does not write the data in the buffers of the SGA to the datafiles.

A user or application error is a user mistake that results in the loss of data. For example,
a user can accidentally delete data from a payroll table. Such user errors can require the
database or object to be recovered to a point in time before the error occurred.

A media failure is a physical problem that arises when Oracle tries to write or read a file
that is required to operate the database. A common example is a disk head crash that
causes the loss of all data on a disk drive. Disk failure can affect a variety of files,
including datafiles, redo log files, and control files. Because the database instance cannot
continue to function properly, it cannot write the data in the database buffers of the SGA
to the datafiles.

Oracle provides users a choice of several basic methods for recovery handling. The
methods include:

• Recovery Manager (RMAN) - A component that establishes a connection with a
server process and automates the movement of data for backup and recovery
operations.
• Oracle Enterprise Manager - A GUI interface that invokes Recovery Manager.
• Oracle Data Pump - A utility that makes logical backups by writing data from an
Oracle database to operating system files in a proprietary format. This data can
later be imported into a database.
• User Managed - The database is backed up manually by executing commands
specific to the user's operating system.

7.9 Administration Tool
Data entry can be a tiring and exhausting process unless it is well organized. To
complete our system, we provided a database administration tool that is user friendly
and well documented to facilitate the entry process.

In the user guide appendix, we provide the information that the employees need to use
the administration tool for inserting data, screen shots of the administration pages,
and a suggested scheme for the data entry process.

We developed this tool using JSP technology. We chose JSP because JSP pages are
portable, as their contents are translated into Java servlets by the application server, and
because it is a server-side technology. The architecture of a JSP/servlet-enabled Web site
is often referred to as thin-client because most of the logic is executed on the server.

The administration tool consists of many pages. An administrator should first log in
with a user name and password on the home page. Each page is concerned with a certain
table, such as the station, route, segment, train type, ticket fare or trip instance table.
Inserting a new trip schedule requires the insertion of data in three tables: Trip,
Trip_segment and Trip_Vehicles. The Data Flow Diagram (DFD) of inserting a new trip
schedule, which covers inserting the general trip information, the list of segments
composing the trip and the list of vehicles and their classes for the trip, is shown in the
first section of this chapter.

Chapter 8

Conclusion and Future Work


8.1 Conclusion
We designed and implemented Web enabled services for railway networks. The main
services are the query and reservation facilities. The query facility helps users
choose the best journey from a set of feasible, time-ordered journeys that satisfy the user
criteria. Each journey may consist of more than one trip, as some cities have no direct
trips connecting them. The design phase of the query facility took a lot of effort:
analyzing the routing problem in a railway network with its multiple constraints,
searching for algorithms that solve similar problems, comparing these algorithms and
choosing the final algorithm. The reservation facility tries to increase the utilization
of each seat in a train by choosing the seat to reserve using heuristics rather than
randomly.

We also designed and implemented a database system for railway networks. In order to
facilitate the deployment of the system for servicing the Egyptian railway system, we
contacted the National Organization for Egyptian Railways to learn about the existing
railway system, the types of data they need and the policy for calculating ticket fares.
We also obtained the data of the physical network and the trip schedules as a hard
copy, but we could not obtain it as a soft copy. We therefore implemented database
administration tools that facilitate the data entry process.

During design and implementation, we gained experience in many fields, such as
modeling multi-constrained graphs, routing algorithms, solving optimization problems,
database modeling and Web development.

8.2 Future Work


Some extra work remains to be done in the future in order to deploy the system for
servicing the Egyptian Railways:

Online payment

The reservation phase is composed of two stages: choosing a free seat that increases the
utilization, which we implemented, and the online payment before the reservation of that
seat is committed, which we left as future work.

After the fare of the reserved ticket is paid, the system responds with a serial number.
We then suggest one of these scenarios:

• Providing each station with a device that reads this serial number from the
passenger and prints the ticket.
• Providing each ticket inspector with a PDA (personal digital assistant) that
validates the serial number.
• The passenger gives the serial number to a desk clerk to collect the ticket associated
with that serial number.

Egyptian railways data entry

The Egyptian Railways serves more than 1.4 million passengers and runs about
1260 trains daily, and it is growing continuously by adding new lines. That is why it is
difficult for us to insert and maintain the whole real data.

Although we implemented database administration tools that facilitate the data entry
process, we did not actually insert the real data. In the user guide we provide the
information that the employees need to use the administration tool for inserting the
data, and we suggest a scheme for the data entry process.

References
1. Luger and Stubblefield, "Artificial Intelligence, Structures and strategies for
complex problem solving". Addison Wesley Longman, 1998.

2. The German Railway Web site: www.bahn.de

3. Deitel, "Java, How to program". Prentice Hall, fifth edition, 2003.

4. Cormen, Leiserson and Rivest, "Introduction to Algorithms". McGraw-Hill
Company, 2000.

5. Kevin Loney, George Koch, "Oracle 8i: Complete Reference".
Osborne/McGraw-Hill, 2000.

6. Elmasri, Navathe, "Fundamentals of Database Systems". Addison Wesley,
fourth edition, 2003.

7. Deitel, Deitel and Choffnes, "Operating Systems". Prentice Hall, third
edition, 2004.

Appendix A
Routing and Reservation Case Study
A.1 Routing Case Study

Consider the following network

[Figure: railway network connecting the stations A, B, C, D, F and E]

Figure A.1: A simple railway network

A1     B1     C1     F1     E1
5:00   6:00   7:00   8:00   12:00
       6:10   7:10   8:10

A2     C2
5:30   6:30

A3     D3     E3
2:00   8:00   9:00
       8:10

F4     E4
9:00   10:00

D5     E5
9:00   10:00

Figure A.2: A simple railway network's train schedule

The request is: find all routes from station A to station E after 1:00 o'clock.
We have five search techniques. We will illustrate how each of them behaves under this
request.

A*

Serial  Extracted node  Enqueued children  Notes                                 OpenList
                        A1                 First node                            A1
1       A1              B1, A2                                                   A2, B1
2       A2              C2, A3                                                   B1, C2, A3
3       B1              C1                 No next hub from same station exists  C2, A3, C1
4       C2                                 C2 generated C1 but it was not added  A3, C1
                                           as it already exists in OpenList,
                                           and train 2 ended
5       A3              D3                 No next hub from same station exists  C1, D3
6       C1              F1                 No next train also                    F1, D3
7       F1              E1, F4                                                   D3, F4, E1
8       D3              E3, D5                                                   F4, D5, E3, E1
9       F4              E4                 No next train                         D5, E4, E3, E1
10      D5              E5                 No next train                         E4, E5, E3, E1
11      E4                                 Destination E4 is reached             E5, E3, E1
12      E5                                 Destination E5 is reached             E3, E1
13      E3                                 Destination E3 is reached             E1
14      E1                                 Destination E1 is reached
Table A.1 : A* trace

Uniform Cost
Serial  Extracted node  Enqueued children  Notes                                 OpenList
                        A1                 First node                            A1
1       A1              B1, A2                                                   A2, B1
2       A2              C2, A3                                                   B1, C2, A3
3       B1              C1                 No next train                         A3, C2, C1
4       A3              D3                 No next train                         C2, C1, D3
5       C2                                 C2 generated C1 but it was not added  C1, D3
                                           as it already exists in OpenList,
                                           and train 2 ended
6       C1              F1                 No next train                         D3, F1
7       D3              E3, D5                                                   F1, E3, D5
8       F1              E1, F4                                                   D5, F4, E1, E3
9       D5              E5                 No next train                         F4, E5, E3, E1
10      F4              E4                 No next train                         E4, E5, E3, E1
11      E4                                 Destination E4 is reached             E5, E3, E1
12      E5                                 Destination E5 is reached             E3, E1
13      E3                                 Destination E3 is reached             E1
14      E1                                 Destination E1 is reached
Table A.2 : Uniform Cost trace

Greedy
Serial  Extracted node  Enqueued children  Notes                                 OpenList
                        A1                 First node                            A1
1       A1              B1, A2                                                   B1, A2
2       B1              C1                 No next train                         C1, A2
3       C1              F1                 No next train                         F1, A2
4       F1              E1, F4                                                   E1, A2, F4
5       E1                                 Destination E1 is reached             A2, F4
6       F4              E4                 No next train                         E4, A2
7       E4                                 Destination E4 is reached             A2
8       A2              C2, A3                                                   C2, A3
9       C2              C1                 C1 is enqueued again in OpenList      C1, A3
10      C1              F1                 No next train                         F1, A3
11      F1              E1, F4                                                   E1, F4, A3
12      E1                                 Destination E1 is reached again       F4, A3
13      F4              E4                 No next train                         E4, A3
14      E4                                 Destination E4 is reached again       A3
15      A3              D3                 No next train                         D3
16      D3              E3, D5                                                   E3, D5
17      E3                                 Destination E3 is reached             D5
18      D5              E5                 No next train                         E5
19      E5                                 Destination E5 is reached
Table A.3 : Greedy trace

BFS

Serial | Node extracted from the OpenList | Children enqueued in the OpenList | Notes | OpenList
- | - | A1 | First node | A1
1 | A1 | B1, A2 | - | B1, A2
2 | B1 | C1 | No next train | A2, C1
3 | A2 | C2, A3 | - | C1, C2, A3
4 | C1 | F1 | No next train | C2, A3, F1
5 | C2 | C1 | C1 is enqueued again in the OpenList | A3, F1, C1
6 | A3 | D3 | No next train | F1, C1, D3
7 | F1 | E1, F4 | - | C1, D3, E1, F4
8 | C1 | F1 | No next train | D3, E1, F4, F1
9 | D3 | E3, D5 | - | E1, F4, F1, E3, D5
10 | E1 | - | Destination E1 is reached | F4, F1, E3, D5
11 | F4 | E4 | No next train | F1, E3, D5, E4
12 | F1 | E1, F4 | - | E3, D5, E4, E1, F4
13 | E3 | - | Destination E3 is reached | D5, E4, E1, F4
14 | D5 | E5 | No next train | E4, E1, F4, E5
15 | E4 | - | Destination E4 is reached | E1, F4, E5
16 | E1 | - | Destination E1 is reached again | F4, E5
17 | F4 | E4 | No next train | E5, E4
18 | E5 | - | Destination E5 is reached | E4
19 | E4 | - | Destination E4 is reached again | -

Table A.4 : BFS trace

DFS

Serial | Node extracted from the OpenList | Children enqueued in the OpenList | Notes | OpenList
- | - | A1 | First node | A1
1 | A1 | A2, B1 | - | A2, B1
2 | A2 | A3, C2 | - | A3, C2, B1
3 | A3 | D3 | - | D3, C2, B1
4 | D3 | E3, D5 | - | D5, E3, C2, B1
5 | D5 | E5 | - | E5, E3, C2, B1
6 | E5 | - | Destination E5 is reached | E3, C2, B1
7 | E3 | - | Destination E3 is reached | C2, B1
8 | C2 | C1 | - | C1, B1
9 | C1 | F1 | - | F1, B1
10 | F1 | E1, F4 | - | F4, E1, B1
11 | F4 | E4 | - | E4, E1, B1
12 | E4 | - | Destination E4 is reached | E1, B1
13 | E1 | - | Destination E1 is reached | B1
14 | B1 | C1 | - | C1
15 | C1 | F1 | - | F1
16 | F1 | E1, F4 | - | F4, E1
17 | F4 | E4 | - | E4, E1
18 | E4 | - | Destination E4 is reached again | E1
19 | E1 | - | Destination E1 is reached again | -

Table A.5 : DFS trace

Observations

• Because the OpenList stays small in our example, the power of A* cannot be noticed,
and it behaves much like Uniform Cost. This is why A* is not always preferred over
Uniform Cost.
• The total number of hubs in the graph is 14. The Greedy, BFS and DFS techniques
generate 19 hubs because they regenerate hubs after they have been expanded and
moved to the closed list.
• A* and Uniform Cost reach each goal only once, whereas Greedy, BFS and DFS reach
some goals repeatedly through different routes.
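
The hub counts in the second observation can be checked with a small sketch. It is not the project's implementation: the successor lists in CHILDREN are read off the traces above, the function name count_extractions is ours, and the duplicate check (a seen set) is only an assumed stand-in for the OpenList and closed-list test. A FIFO OpenList is used, as in BFS.

```python
from collections import deque

# Successors of each hub, read off Tables A.1 to A.5 (hub = station letter + train number).
CHILDREN = {
    "A1": ["B1", "A2"], "A2": ["C2", "A3"], "A3": ["D3"],
    "B1": ["C1"], "C2": ["C1"], "C1": ["F1"],
    "F1": ["E1", "F4"], "F4": ["E4"],
    "D3": ["E3", "D5"], "D5": ["E5"],
    "E1": [], "E3": [], "E4": [], "E5": [],
}


def count_extractions(start, skip_seen):
    """Count how many hubs are extracted from a FIFO OpenList.

    skip_seen=True drops children that were already enqueued once
    (an assumed duplicate check); skip_seen=False enqueues every child.
    """
    open_list = deque([start])
    seen = {start}
    extracted = 0
    while open_list:
        hub = open_list.popleft()
        extracted += 1
        for child in CHILDREN[hub]:
            if skip_seen and child in seen:
                continue
            seen.add(child)
            open_list.append(child)
    return extracted


without_check = count_extractions("A1", skip_seen=False)
with_check = count_extractions("A1", skip_seen=True)
print(without_check, with_check)  # 19 14
```

Without the duplicate check, hubs C1, F1, E1, F4 and E4 are each extracted twice, once for the route through train 1 and once for the route through train 2, which accounts for the five extra extractions.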

A.2 Reservation Case Study

Consider the following example:

We have a journey from Alex station to Aswan station that passes through Tanta, Cairo
and Kena stations, in that order: Alex → Tanta → Cairo → Kena → Aswan

Suppose some passengers want to reserve tickets on this journey:
• The first passenger travels from Alex station to Cairo station
• The second passenger travels from Alex station to Kena station
• The third passenger travels from Tanta station to Kena station
• The fourth passenger travels from Kena station to Aswan station
• The fifth passenger travels from Kena station to Aswan station

Then, according to the Best Fit algorithm, which is applied in reservation, the seat
numbers returned to these passengers by the reservation page, if they request their
tickets in the order shown, are as follows:

For the first passenger, the seat number is 1, the first seat, because all seats are
available before his request.

For the second passenger, the seat number is 2, because seat 1 is reserved from Alex to
Cairo, which overlaps this passenger's journey from Alex to Kena.

For the third passenger, the seat number is 3, because the reservation periods of seats 1
and 2 both overlap this passenger's journey.

For the fourth passenger, the seat number is 2. Here the effect of the algorithm appears:
seats 1, 2 and 3 are all available for the segment from Kena to Aswan, but the available
segment in seat 2 is shorter than the available segments in seats 1 and 3.

For the fifth passenger, the seat number is 3, because seat 2 is now reserved from Kena
to Aswan and seat 3 has a shorter available segment on the right than seat 1.
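
The allocation above can be reproduced with a minimal Best Fit sketch. Several points are our assumptions rather than the project's code: a seat is modelled as a list of booleans, one per leg between consecutive stations; the fit of a seat is measured as the length of the contiguous free run of legs that contains the requested legs; ties go to the lowest seat number; and the vehicle is given four seats. The names free_gap and reserve are hypothetical. Under this particular measure seats 2 and 3 tie for the fourth passenger, and the tie-break selects seat 2.

```python
STATIONS = ["Alex", "Tanta", "Cairo", "Kena", "Aswan"]
LEGS = len(STATIONS) - 1  # one leg between each pair of consecutive stations


def free_gap(occupied, first, last):
    """Length of the free run of legs containing legs first..last-1,
    or None if any requested leg is already reserved."""
    if any(occupied[first:last]):
        return None
    lo = first
    while lo > 0 and not occupied[lo - 1]:
        lo -= 1
    hi = last
    while hi < LEGS and not occupied[hi]:
        hi += 1
    return hi - lo


def reserve(seats, origin, destination):
    """Reserve the seat with the smallest fitting free gap; return its number or None."""
    first, last = STATIONS.index(origin), STATIONS.index(destination)
    best = None  # (gap length, seat number)
    for number, occupied in enumerate(seats, start=1):
        gap = free_gap(occupied, first, last)
        # Strict "<" keeps the lowest seat number when gaps are equal (assumed tie-break).
        if gap is not None and (best is None or gap < best[0]):
            best = (gap, number)
    if best is None:
        return None
    for leg in range(first, last):
        seats[best[1] - 1][leg] = True
    return best[1]


seats = [[False] * LEGS for _ in range(4)]  # four seats: an assumed vehicle size
requests = [("Alex", "Cairo"), ("Alex", "Kena"), ("Tanta", "Kena"),
            ("Kena", "Aswan"), ("Kena", "Aswan")]
assigned = [reserve(seats, origin, destination) for origin, destination in requests]
print(assigned)  # [1, 2, 3, 2, 3]
```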

Appendix B

User Guide
In our web site, we tried to provide a user-friendly interface that minimizes the possibility
of errors by using drop-down lists and by suggesting initial values for the other
inputs.

The user guide is divided into two sections: one for the passenger and the other for the
administrator of the database.

B.1 Passenger guide

• Query page:

Figure B.1: query page

The query page is divided into three parts:


o Source and destination
Here, you can determine the source and destination stations for your
journey.

o Date and time
Here, you can enter the preferred departure date and time. This is a lower
limit: the departure date and time of the output journeys may be after
the entered one but not before it.
o Additional information
This optional part holds your criteria. You can leave it with its default
values or change some or all of them.
These criteria are the preferred classes, the maximum number of
exchanges (the upper limit on the number of train changes at
intermediate stations), the type of discount and the ticket type.
Because the number of solutions may be large and take time to
display, you can set the number of solutions that appear at a time;
you can then see the rest of the solutions as explained in the
output page.

• Output page

Figure B.2: output page

The output page displays the journeys in a table; the meaning of each column is as
follows:
o Journey
This field gives detailed information about the journey displayed in this row.
It is composed of:
• Station
• Departure and arrival times for this station (note that the source
station has only a departure time and the destination
station has only an arrival time).

• Date
• Waiting time, which applies when changing trains. It is the
difference between the departure time of the second train and the
arrival time of the first train, as in the first journey when
changing trains at Fayoum station.
• Trip number, which you need when reserving a ticket for this trip.

o Time
It is the departure and arrival time for the whole journey.
o Duration
The duration of the journey is the difference between arrival and departure
times for whole journey.
o No of trains
It is the total number of trains used in this journey.
o Cost
Cost is displayed for each trip and for each class available in this trip,
according to the preferred classes entered in the query page; each class is
followed by its seat price.
Beside each class in each trip there is a link that enables you to reserve a
ticket in this trip with this class, as explained in the reservation page.
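
The Duration, No of trains and waiting time values described above are simple time differences. The following sketch shows the arithmetic only; the function name journey_times, the representation of a journey as a list of (departure, arrival) pairs and the sample times are our assumptions and are not taken from Figure B.2.

```python
from datetime import datetime, timedelta


def journey_times(legs):
    """legs: one (departure, arrival) datetime pair per train, in travel order.

    Returns the whole-journey duration and the waiting time at each train change.
    """
    duration = legs[-1][1] - legs[0][0]  # final arrival minus first departure
    waits = [nxt[0] - cur[1] for cur, nxt in zip(legs, legs[1:])]
    return duration, waits


# Hypothetical two-train journey with one change.
legs = [
    (datetime(2005, 7, 1, 8, 0), datetime(2005, 7, 1, 9, 30)),
    (datetime(2005, 7, 1, 10, 0), datetime(2005, 7, 1, 12, 15)),
]
duration, waits = journey_times(legs)
number_of_trains = len(legs)
print(duration, waits[0], number_of_trains)  # 4:15:00 0:30:00 2
```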

The number of journeys displayed in the output page is the number you entered in
the query page. You can see the previous or next journeys by
pressing the previous or next buttons respectively, whenever they are enabled.
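
A minimal sketch of this paging behaviour follows. The function name page_of and the zero-based page index are hypothetical, and the rule for enabling the two buttons is our reading of the description rather than the project's code.

```python
def page_of(journeys, page_index, page_size):
    """Return the journeys shown on one page and whether the
    previous and next buttons would be enabled."""
    start = page_index * page_size
    shown = journeys[start:start + page_size]
    has_previous = page_index > 0
    has_next = start + page_size < len(journeys)
    return shown, has_previous, has_next


journeys = ["J1", "J2", "J3", "J4", "J5", "J6", "J7"]  # placeholder results
first_page = page_of(journeys, 0, 3)
last_page = page_of(journeys, 2, 3)
print(first_page)  # (['J1', 'J2', 'J3'], False, True)
print(last_page)   # (['J7'], True, False)
```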

• Reservation page:

Figure B.3: reservation page

For example, if you want to take the first journey displayed in the output page in Figure B.2,
you may need to reserve two tickets, one for trip 3 and the other for trip 4. When you press
the Reserve link beside class 1 (6.0 L.E.) to reserve a seat in trip 3, the link leads
you to the reservation page shown in Figure B.3, which is initialized with all the data
of the selected trip.

If these values suit you, press the submit button to reserve the ticket. If a seat is
available in this trip, a message appears giving the number of the reserved
seat, the vehicle number, the ticket fare and the refund fare, as shown in Figure B.4.

Figure B.4: reservation message

If no seat is available, another message will inform you that there is no available seat.

B.2 Administration guide


To access the administration pages, you should first log in with your user name
and password on the home page.

After login, the administration page appears as shown in Figure B.5.

Figure B.5: administration page

You can insert a new station, trip, route, segment, train type, ticket fare or trip instance.
We will discuss next a preferred order of inserting data using the insertion pages.

Insertion pages

The data entry process should begin with filling the master tables that are not updated
frequently and have no insertion pages, such as M_FCC, M_TRN_CLASSES and
M_TKT_TYPE. After that, we suggest using the insertion pages in the following order:

1- Insert all stations

Figure B.6: station page

The new station page consists of:

• Station serial number.
Each serial number suggested in any page is the maximum serial number
found in its table plus one.

• Station name.
• X coordinate relative to a reference point (e.g. Cairo station).
• Y coordinate relative to a reference point.
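
The serial number suggestion described above amounts to one line. In this sketch the function name suggest_serial is hypothetical, and suggesting 1 for an empty table is our assumption; the guide does not say what is suggested in that case.

```python
def suggest_serial(existing_serials):
    """Suggest the maximum existing serial number plus one (1 if the table is empty)."""
    return max(existing_serials, default=0) + 1


print(suggest_serial([1, 2, 5]))  # 6
print(suggest_serial([]))         # 1
```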

2- Insert all routes

Figure B.7: route page

The new route page consists of

• Route serial number.


• Route name.

3- Insert segments

Figure B.8: segment page

The new segment page consists of three drop lists for:


• From station (Departure station).
• To station (Arrival station).
• Via (Route).
And a text box for:
• Distance between these two stations.

4- Insert all train types

Figure B.9: train type page

The new train type page consists of


• Train type serial number.
• Train type name.
• Train type velocity.

5- Insert fares

Figure B.10: ticket fare page

The new ticket fare page consists of drop lists for:


• Train type.
• Train class.
• Ticket type.
• Type of discount (FCC).

And text boxes for:
• Distance.
• Ticket fare (price).
• Ticket refund fare.

6- Insert trips

The new trip page is divided into three parts; each part inserts data into a
different table.

Figure B.11: trip page, trip information part

The first part takes the following inputs:


• Trip serial number.
• Train type of this trip.
• Start station (Departure station).
• Days of the week in which the trip runs.
• Starting date of operating this trip.
• Expiring date of this trip.

These data will be stored in the Trip table.

Figure B.12: trip page, vehicles part

The vehicles part is concerned with adding vehicles to the trip. Each vehicle has a
class and a number of seats. To add a vehicle, press the add vehicle button. The
vehicles are stored in the Trip_vehicles table.

Figure B.13: trip page, segments part

The segments part is concerned with the intermediate segments.


For each segment you insert

• Station from (Departure station).


• Departure time from the departure station.
• Station to (Arrival station).
• Arrival time to the arrival station.
• Route name.

The departure station of the first segment should be the start station of the trip, and the
arrival station of each segment should be the departure station of the following
segment. You can use the buttons add segment, add segment before selected,
and remove segment to modify your intermediate segments.
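
The two consistency rules for segments can be expressed as a short check. This is only a sketch: the function name chain_errors and the representation of a segment as a (from station, to station) pair are our assumptions, and the guide does not state whether the trip page enforces these rules itself.

```python
def chain_errors(start_station, segments):
    """segments: (from_station, to_station) pairs in trip order.

    Returns a list of messages for every rule that is violated.
    """
    errors = []
    if segments and segments[0][0] != start_station:
        errors.append("first segment must depart from the start station")
    for i, (cur, nxt) in enumerate(zip(segments, segments[1:]), start=1):
        if cur[1] != nxt[0]:
            errors.append(
                f"segment {i} arrives at {cur[1]} but segment {i + 1} departs from {nxt[0]}"
            )
    return errors


good = chain_errors("Alex", [("Alex", "Tanta"), ("Tanta", "Cairo")])
broken = chain_errors("Alex", [("Alex", "Tanta"), ("Cairo", "Kena")])
print(good)    # []
print(broken)  # ['segment 1 arrives at Tanta but segment 2 departs from Cairo']
```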

7- Insert trip instances

Figure B.14: trip instance page

The new trip instance page consists of


• Trip code
• Trip date

The new trip instance page is used daily to insert the trip instances of the trips
that are going to run in the next week, so that passengers can reserve tickets a
week before the date of the trip.
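
The daily task can be sketched as follows, using the trip fields listed for the trip page (days of the week, starting date and expiring date). Everything else is an assumption on our part: the function name instances_to_insert, the dictionary keys, the seven-day look-ahead as a parameter, and weekdays numbered as in Python's date.weekday(), where Monday is 0.

```python
from datetime import date, timedelta


def instances_to_insert(trips, today, days_ahead=7):
    """Return (trip code, trip date) pairs for trips that run days_ahead days from today."""
    target = today + timedelta(days=days_ahead)
    return [
        (trip["code"], target)
        for trip in trips
        if trip["start"] <= target <= trip["expiry"]
        and target.weekday() in trip["weekdays"]
    ]


# Hypothetical trips; 8 July 2005 is a Friday (weekday 4).
trips = [
    {"code": 3, "weekdays": {4}, "start": date(2005, 1, 1), "expiry": date(2005, 12, 31)},
    {"code": 4, "weekdays": {0, 1}, "start": date(2005, 1, 1), "expiry": date(2005, 12, 31)},
    {"code": 5, "weekdays": {4}, "start": date(2005, 1, 1), "expiry": date(2005, 7, 5)},
]
due = instances_to_insert(trips, today=date(2005, 7, 1))
print(due)  # [(3, datetime.date(2005, 7, 8))]
```

Trip 4 is skipped because it does not run on that weekday, and trip 5 because it expires before the target date.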
