Sustainable Travel During The Olympic and Paralympic Games: A Methodology To Model Public Transport Travel For Paris 2024

DEGREE PROJECT IN TECHNOLOGY
SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2021
KTH Master Thesis Report
Axel Dumont
May 11th, 2021
KTH ROYAL INSTITUTE OF TECHNOLOGY

ARCHITECTURE AND THE BUILT ENVIRONMENT
Author
Axel Dumont
<adumont@kth.se> & <axel.dumont@eleves.ec-nantes.fr>
Master Transport and Geoinformation Technology
KTH Royal Institute of Technology, Sweden & Ecole Centrale de Nantes, France
Place for Project

Île-de-France Mobilités
MEP Department
Paris, France
Examiner
Professor Anders Karlström
Head of the Division System Analysis and Economy
KTH Royal Institute of Technology
Supervisor
Christer Persson
Division System Analysis and Economy
KTH Royal Institute of Technology
Abstract
This Master Thesis examines the challenges of travel modeling during the Olympic and Paralympic
Games, and more specifically for the Paris 2024 Olympics. The problem was set by
Île-de-France Mobilités (IDFM), the transport organisation authority of Paris and its region,
which therefore has to deal with public transport travel during the Olympics. A very simplified
model was already in use, but it is no longer sufficient.
The exceptional nature of this event, considered a mega-event [Müller, 2015], requires
a precise understanding of the subject as well as a different and adaptive modeling process.
This work therefore presents a detailed methodology for public transport travel modeling
in Paris and its surroundings during the Olympics. The model will be progressively refined
until the end of this mega-event, in order to present results to, or alert, the multiple
stakeholders in Olympic Games transportation (event organizers, transport operators).
The two main parts of the model are distinguished and described: the Olympic Games
related trips and the background demand, which require two very different approaches.
The Olympic Games (OG) demand relies on several assumptions that are constantly
evolving, so the model must allow its parameters to be changed easily. The background demand
prediction, on the other hand, is a significant challenge because it differs from what is
usually done. Both parts are adapted from the principle of the four-step transportation model
and reuse parts of the IDFM model, ANTONIN 3, which is specifically calibrated for the
Île-de-France region.
The design also aims to make as much use as possible of the available transport data and of
the tools already in operation, such as the existing model. Suggestions for further improvements
are also given, so that the results can be refined up to the day of the event as the input
assumptions improve over time, for instance with ticketing data.
Keywords
Transport modeling, Modeling process, Four-step model, Mega-event, Olympic Games, Public
transport
Abstract (Swedish version, translated into English)
This degree project examines the challenges of travel modelling during the Olympic Games
and the Paralympic Games, more specifically for the Paris 2024 Olympics. The problem was set
by Île-de-France Mobilités (IDFM), the transport organisation of Paris and its region, which
therefore has to deal with public travel during the Olympics. A very simplified model was
already in use but is no longer sufficient.
The exceptional character of the event, regarded as a mega-event, requires a precise
understanding of the subject and a different and adaptive modelling process. This work
therefore presents a detailed method for modelling public transport trips in Paris and
its surroundings during the Olympics. The model will be refined more and more until the end
of this mega-event, in order to present results to, or notify, the various stakeholders
around the Olympic Games (organisers, transport operators).
The two important parts of the model are distinguished and described: the trips related to
the Olympic Games and the background demand, which require two very different approaches.
The Olympic demand needs several assumptions that are often in constant evolution: the
versatility of the parameters is a very important point to take into account. The prediction
of the background demand, on the other hand, is a significant challenge because it differs
from what is usually done. Both parts are adapted from the principle of the four-step
transport model and reuse parts of the IDFM model, called ANTONIN 3, specifically calibrated
for the Île-de-France region.
It is also necessary to design with the aim of adapting, as far as possible, the available
transport data and the tools already in operation, such as the model already in use.
Suggestions for further improvements are also mentioned, to refine the results up to D-day,
which will be possible thanks to improvements of the input assumptions over time, for example
ticketing data.
Keywords
Transport modelling, Modelling process, Four-step model, Mega-event, Olympic Games,
Public transport
Acknowledgements
I would like to thank Île-de-France Mobilités for hosting me during this Master Thesis, as
well as all those who helped or worked with me during this period, in particular Laurence
Debrincat, head of the Prospective & Etudes (Prospects & Studies Data) (PE) department,
and my colleagues from the Modélisation et Evaluation de Projets (Modelling and Projects
Evaluation) (MEP) team.
I would like to thank Hervé Genest, my supervisor at the company, for his guidance, advice,
confidence and constant good mood during these six months.
I would also like to thank Christer Persson, my supervisor at KTH, for his regular feedback
and his help on the report, and Anders Karlström, my examiner, for accepting this thesis and
all that it implies, in particular scheduling and examining it.
Finally, thank you to all those with whom I could talk about the Olympics: my family, my
roommates and friends, Baptiste and Antoine, for those months spent, longer than expected,
in the apartment.
List of Acronyms
ANTONIN Analyse des Transports et de l’Organisation des Nouvelles Infrastructures
(Transport and Organization of New Infrastructures Analysis)
EGT Enquête Globale Transport (Global Transport Survey)
IDFM Île-de-France Mobilités
IOC International Olympic Committee
IPR Institut Paris-Région
JOB Jours Ouvrables de Base (Basic Working Days)
LOS Level of Service
MEP Modélisation et Evaluation de Projets (Modelling and Projects Evaluation)
MNL Multinomial Logit
OCOG Organising Committees for the Olympic Games
OD Origin-Destination
OG Olympic Games
OMNIL Observatoire de la Mobilité en Ile de France (Observatory of mobility in
Île-de-France)
P+E Population et Emploi (Population and Employment)
PE Prospective & Etudes (Prospects & Studies Data)
RATP Régie Autonome des Transports Parisiens
RER Réseau Express Régional (Regional Express Network)
SIDV Système d’Information des Données de Validation (Validation Data Information
System)
SNCF Société Nationale des Chemins de Fer français
TAZ Traffic Analysis Zone
TDM Travel Demand Management
TJRF Trafic Journalier du Réseau Ferré (Daily Railway Traffic)
TfL Transport for London
List of Figures
1.2.1 CO2 emissions (in grams) by passenger by kilometer for each mode for urban and suburban passenger transport in France, 2018
1.3.1 The Olympic flag
1.3.2 Number of tickets sold for the past five Summer Olympic Games
1.4.1 Paris 2024 Logo
1.4.2 Paris 2024 Olympic Venues
1.5.1 Location of the Île-de-France region (in red) in France
1.6.1 Île-de-France Mobilités Logo
2.1.1 Four-step transport model
2.1.2 Example of a tree representation
3.1.1 ANTONIN 3 workflow, adapted from [Tuinenga et al., 2015]
3.2.1 ANTONIN 3 zoning
3.2.2 Train line representation
3.3.1 ANTONIN 3 trip chain simplification
3.4.1 MD structure
3.4.2 MD1 structure
4.2.1 Existing OG model workflow
4.3.1 Detailed model workflow
4.5.1 OG-demand part workflow. Zoom of Figure 4.3.1
4.6.1 Background demand part workflow. Zoom of Figure 4.3.1
5.2.1 Database schema
5.2.2 Example of a random arrival departure profile
5.2.3 Extract from the counting data file of the RER A, 2018
5.2.4 Extract after treatment of the counting data file of the RER A, occupancy of each section per hour
5.2.5 Extract from the metro.lin file and details for the Metro 1, Vincennes (Vin) - La Défense (Def)
5.2.6 Screenshot of the visualization of carwax.net
5.4.1 Example of the inside of a CUBE box
List of Tables
3.2.1 LOS parameters for each mode
3.3.1 Tour and trip patterns
Contents
1 Introduction and Context
1.1 Degree Project
1.2 Sustainability and Transport
1.3 Olympic Games
1.4 Paris 2024
1.5 Public transportation in Île-de-France
1.6 The company: Île-de-France Mobilités (IDFM)
1.7 The role of IDFM for the Olympics
1.8 Modelling and Projects Evaluation (MEP) Department
1.9 Collaborations within the project
2 Literature Review
2.1 Predicting Demand
2.1.1 Multi-step models
2.1.2 Gravity models
2.1.3 Discrete choice modelling
2.1.4 Level of Service
2.2 Previous Olympic Games
2.2.1 Two parts to be distinguished
2.2.2 Modeling process
2.2.3 Tools used
2.2.4 Points to keep in mind
3 ANTONIN, the IDFM model
3.1 General description
3.2 Initial data
3.2.1 EGT 2010
3.2.2 Zoning
3.2.3 Zonal data
3.2.4 Transport supply
3.2.5 Level of Service by mode
3.2.6 Public transportation Level of Service (LOS)
3.2.7 Private car LOS
3.3 Demand generation
3.3.1 Trip chain
3.4 Combined mode and destination selection models
3.5 Hourly distribution of trips
3.6 Pivot procedure principle
3.7 Assignment
4 Model
4.1 OG-demand and background demand
4.2 The existing OG model
4.2.1 Purpose
4.2.2 Description
4.2.3 Strengths
4.2.4 Weaknesses
4.3 General presentation
4.4 Hypothesis
4.5 OG demand
4.5.1 Inputs
4.5.2 Step 1: Trip destinations
4.5.3 Step 2: Trip origins and distribution
4.5.4 Step 3: Assignment
4.5.5 Outputs
4.6 Background demand
4.6.1 Step 1: Network loaded in current period
4.6.2 Step 2: Evolution by 2024
4.6.3 Step 3: Evolution by summer 2024
4.6.4 Outputs
4.6.5 Travel Demand Management
4.7 General outputs
5 Data and Tools
5.1 Data required
5.1.1 OG-demand
5.1.2 Background demand
5.2 Available data and collection
5.2.1 OG-demand
5.2.3 Common data
5.3 Consideration of future available data
5.4 Set up and tools used
6 Analysis
6.1 Strengths
6.2 Limits and weaknesses
7 Discussions
7.1 Further work
7.1.1 OG-demand
7.1.3 General
7.2 On the importance of data
7.3 From designing a model to project management
7.4 Possible reuses
7.5 General conclusion
7.6 Future Work
References
Chapter 1
Introduction and Context
1.1 Degree Project

This Degree Project was conducted as part of my Master’s degree in Transport and
Geoinformation, in the System Analysis and Economics division at KTH Royal Institute of
Technology in Stockholm, Sweden.
The project took place from September 2020 to March 2021 in collaboration with
Île-de-France Mobilités (IDFM), the transport organisation authority, in Paris, France.
The Paris 2024 Olympic and Paralympic Games concept aims to be as responsible and sustainable
as possible, and this includes the transport sector. One of the objectives is that all
spectators travel to and from the sites by sustainable transport, and particularly
public transport. But spectators will not be the only ones traveling on the network. Other OG
stakeholders are also expected, to a lesser extent, and above all local residents will still
want to be able to use public transport. As the transport organisation authority in Paris and
its region, IDFM is responsible for ensuring that this goal is achievable and that the entire
network is ready for the summer of 2024. Hence the need for a model to predict demand in
advance and adapt the public transport supply to meet it.
The initial objective of the Degree Project was to build a model to forecast public transport
travel during the Paris 2024 Olympic and Paralympic Games. In practice, this means defining a
methodology and then producing an IT tool with which different scenarios can be tested. In the
end, only the methodology is developed in this report: creating the tool and implementing
the model in practice require more time and represent more than a Degree Project.
Consequently, no results are presented in this Degree Project, since the IT tool is not yet
available, and its design is outside the scope of this project. Defining the methodology
requires an understanding of the challenges of OG modeling and of the characteristics and
difficulties it entails. It is then necessary to collect and analyse the data and (IT) tools
available to the Organising Committees for the Olympic Games (OCOG) or to IDFM.
This report first presents a contextualization phase focused on the Olympics, the company
and its role for Paris 2024, and motivates the need for a new model (Chapter 1). A
Literature Review chapter then provides the theoretical basis needed to understand the
following chapters (Chapter 2). The model used by IDFM, ANTONIN 3, is described in Chapter
3, since the detailed methodology of the proposed Olympic Games model presented
afterwards is based on it (Chapter 4). This model chapter is followed by a focus on the data
used and to be used in this model (Chapter 5). The report concludes with an analysis of the
strengths and weaknesses of the methodology (Chapter 6), followed by further improvements
and a discussion (Chapter 7).
1.2 Sustainability and Transport

One of the first definitions of the concept of sustainable development was given by the World
Commission on Environment and Development: “development that meets the needs of
the present without compromising the ability of future generations to meet their own needs”
[WCED, 1987]. This implies a more comprehensive, long-term and integrated approach.
The field of transport has a significant role to play in sustainable development. In 2019,
transportation was responsible for 24% of direct CO2 emissions from fuel combustion
worldwide, with an annual increase of 1.9% since 2000 [International Energy Agency,
2020]. There is therefore room for improvement, especially for road vehicles, which
account for three quarters of these emissions.
Figure 1.2.1 presents CO2 emissions per passenger per kilometer for urban and suburban
passenger transport in France [Agence de l’environnement et de la maîtrise de l’énergie (ADEME),
2018]. There is a factor of at least 20 between the emissions of private cars and those of
mostly electric public transport, such as tramways or subways. Hence the need to increase the
use of public transport and zero-emission vehicles.
Figure 1.2.1: CO2 emissions (in grams) by passenger by kilometer for each mode for urban and
suburban passenger transport in France, 2018
From top to bottom: 1. Tramway, 2. Subway, 3. Commuter train, 4. Motorized two-wheeler,
5. Bus, 6. Suburban passenger car, 7. Urban passenger car
NB: The values presented vary greatly from country to country, depending on the method of
electricity production (mostly nuclear in the case of France).
Historically, the Games represent a big boost for the host city: new infrastructure, sports
facilities, new housing and districts, etc. But the projected costs of several billion dollars
are often exceeded, frequently by a large margin [Flyvbjerg et al., 2016]. And, unfortunately,
once the event is over, these new facilities are not always reused and are sometimes abandoned,
which runs counter to the principle of sustainable development.
For the Paris 2024 Olympic Games, the organizers aim for sobriety¹: few new infrastructures
will be built, with reduced investment costs and carbon footprint. Thus, the current
budget of the Games (3.9 billion dollars) is the second lowest for Summer Games in at least
30 years (behind Athens 2004).
This ambition for sustainable Games also involves the transportation sector. Of course, the
organisers will not be able to do much about international travel to Paris: millions of trips
will be made by plane, which has a huge carbon impact. But once visitors are in France, the
main objective is to convince spectators, as well as the other actors of the Games, to give up
personal vehicles as much as possible and use public transport, which, as shown in
Figure 1.2.1, emits far less greenhouse gas (GHG). To this end, an agreement granting
free public transport to holders of tickets for the Games has been signed with IDFM.
1.3 Olympic Games
Figure 1.3.1: The Olympic flag
The Olympic Games are certainly one of the most famous international events in the world. The
first Games of the modern era were held in 1896 in Athens, Greece, following the historical
tradition of the Ancient Olympic Games organized between the cities of ancient Greece.
Nowadays, the Olympics gather several thousand athletes from more than 200 nations, millions
of spectators (cf. Figure 1.3.2) and billions of viewers all around the world, over a period
of two weeks. The Olympic Games clearly have their place among mega-events [Müller, 2015].
¹ https://www.paris2024.org/en/compact-and-accessible-games/, accessed on May 10, 2021
Figure 1.3.2: Number of tickets sold for the past five Summer Olympic Games
Source: Statista, International Olympic Committee (IOC)
1.4 Paris 2024
Figure 1.4.1: Paris 2024 Logo
Since 2017, the world has known that the 2024 Summer Olympic Games will take place in Paris.
But the Games are not limited to Paris: several venues are located elsewhere in the
Île-de-France region, close to Paris. From a public transport point of view, this means that
many different lines, not only those in the city center, will be used in a very different way
than usual. This represents an important challenge. Île-de-France residents must be able to
travel in good conditions during this period, while around five hundred thousand spectators
use the same network each day (about one million trips), without disrupting the smooth running
of the network. This is why these trips must be planned well in advance: the Olympic Games are
like no other event, and therefore require specific work.
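As a rough check of the orders of magnitude above, the following sketch converts daily spectators into daily trips. It assumes, as the text implies, one round trip (two trips) per spectator per day. The function name is hypothetical, and the comparison with the 8.3 million daily trips observed on the whole network in 2010 (Section 1.5) is only indicative, not a forecast.

```python
def daily_og_trips(spectators_per_day: int, trips_per_spectator: int = 2) -> int:
    """Daily OG-related trips, assuming each spectator makes one round trip."""
    return spectators_per_day * trips_per_spectator


spectators = 500_000            # order of magnitude quoted in the text
network_trips_2010 = 8_300_000  # daily trips on the whole network in 2010

og_trips = daily_og_trips(spectators)
share = og_trips / network_trips_2010  # OG trips relative to usual daily traffic

print(f"{og_trips:,} OG trips per day, about {share:.0%} of 2010 daily network trips")
```

Under these assumptions, the spectator trips represent a little over a tenth of the usual daily traffic, concentrated on the lines serving the venues.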
The Olympics are scheduled to take place from July 26 to August 11, and the Paralympics
from August 28 to September 8. The OCOG, also called Paris2024, which is responsible for the
organization, planning and financing of the Games, was created in 2018.
Figure 1.4.2: Paris 2024 Olympic Venues
1.5 Public transportation in Île-de-France
Figure 1.5.1: Location of the Île-de-France region (in red) in France

Source: Wikimedia, License Creative Commons
The Île-de-France region lies in north-central France and surrounds the capital, Paris. It is
home to around 12 million inhabitants.
The public transport network in Île-de-France is structured as follows:
• Metro network. These 16 subway lines (1 to 14, plus 3bis and 7bis) cover the city of
Paris and its inner suburbs. They are operated by the Régie Autonome des Transports
Parisiens (RATP)². The first lines are more than 100 years old.
• Transilien. Transilien is the name of the commuter train network of SNCF Voyageurs
serving the Île-de-France region. This network carries 3.5 million trips per day and
consists of:
– Réseau Express Régional (Regional Express Network) (RER). The RER is both a rapid
transit service and a commuter train service serving Paris and its suburbs. Unlike the
other commuter train lines, these lines cross Paris from one side to the other. Five
lines, A to E, currently operate. The RER A is the busiest line in Europe, with
1.4 million trips per working day. These lines are quite complex because of their
different branches. The RER A and B are jointly operated by the RATP and the Société
Nationale des Chemins de Fer français (SNCF)³, the three others by the SNCF only.
– Non-RER lines. Eight lines (H, J, K, L, N, P, R, U) do not cross Paris, but each of
them reaches a major Parisian train station (Saint-Lazare, Gare du Nord, Gare de l’Est,
Montparnasse, Gare de Lyon). These lines are operated by the SNCF only. Two tramway
lines (T4 and T11) are also included in this network.
• Tramway lines. About ten tramway lines are in service or under construction in the
region.
• Bus network. The Île-de-France bus network is actually composed of several networks
run by different operators, and comprises a set of 1500 lines, with around 5 million
trips per day.
Thus, in 2010, 8.3 million trips took place every day on the entire network.
Of course, the public transport network is constantly evolving, growing and getting stronger
year after year. One of the structuring projects for the next 15 years is the Grand Paris
Express (GPE)⁴, composed of four new automatic lines around Paris (15 to 18) and the
extension of two existing lines (11 and 14), for 200 additional kilometers.
² Independent public company running the Parisian transport network
³ French National Railway Company
⁴ https://www.societedugrandparis.fr/
1.6 The company: Île-de-France Mobilités (IDFM)
Figure 1.6.1: Île-de-France Mobilités Logo
IDFM is a public authority based in Paris that designs, controls and finances the public
transport network throughout the Île-de-France region. IDFM coordinates all operators, such as
the RATP (mainly the subway network and some bus lines), the SNCF (mainly RER and commuter
trains) and other private bus companies. Its board of directors is headed by the
Île-de-France Regional Council, a decentralized local authority that governs this regional
territory. The head of IDFM is therefore the president of the region, currently Valérie
Pécresse. The organisation was created in 1959 and now has around 470 employees, spread across
several directorates.
My Master Thesis was carried out in the PE directorate, more specifically within the MEP
department, which consists of six mobility research officers and one manager.
1.7 The role of IDFM for the Olympics

The main transport objective for the Olympics is that all spectators (around 10 million)
travel to the venues by public transport. The role of Île-de-France Mobilités is to
make this goal achievable, in collaboration with the operators and the OCOG. Several of
IDFM’s competences are involved in this project, for example to:
• Design a transportation plan
• Coordinate construction and maintenance works on the public transport network
• Devise a specific ticketing system for the public transport network during the
Games.
1.8 Modelling and Projects Evaluation (MEP) Department

For the Olympics, the role of the MEP department is to produce the most accurate forecasts
possible of passenger flows over the periods covering the Olympics and the Paralympics.
The purposes of this modeling are varied:
• Notify the OCOG when a sensitive point is detected, for example insufficient capacity
to serve a site, so that the choice of venues, their capacities and their schedules can be
modified when possible. The closer we get to the Olympics, the more difficult it becomes to
influence these choices, as assumptions are validated along the way.
• Transmit the results to the operators and other IDFM directorates so that they can adapt
their transportation plan.
• Perform Travel Demand Management (TDM)⁵. For the OG, the main objective of TDM
is to free up space in public transportation for OG-related trips.
MEP has been in charge of this task since Paris’s bid to host the Olympic Games (2015), with
simplified models produced ad hoc as needs arose.
1.9 Collaborations within the project

In general, this Degree Project was conducted independently, although in collaboration
with different stakeholders on several points.
I mainly collaborated with members of the MEP department, whose missions are explained
above, chiefly in order to understand the functioning of the ANTONIN model (cf. Chapter 3)
and its limitations for the development of the new OG model.
I worked with some other employees of the PE directorate, such as members of the Observatoire
de la Mobilité en Ile de France (Observatory of mobility in Île-de-France) (OMNIL) (cf.
Section 3.2.1 for more details), to obtain a list of data on the use and ridership of the
public transport network, in order to get a sense of the available data that could be used
for this project.
Within the PE directorate, I also collaborated with Nicolas Boichon to design a file listing
all the assumptions required on the Olympic side for the proper functioning of the new model,
for which we wanted the OCOG to provide values or at least an order of magnitude. Meetings
organized by either IDFM or the OCOG allowed me to better understand the organization and the
progress of the Olympic Games.
The main work on the design of the methodology was a collaborative effort with
Hervé Genest, my supervisor at the company. I worked independently to suggest methods, which
were then discussed and questioned by Hervé Genest thanks to his knowledge of the ANTONIN
model and the usable data. This allowed me to come up with a general method that can actually
be followed to develop the model.
On the KTH side, Anders Karlström, my examiner, accepted this degree project. Together
with Christer Persson, my supervisor at KTH, and Hervé Genest, we defined several milestones
for the progress of the thesis until its defense in April. During that time, Christer Persson
gave me a lot of advice and feedback on the writing of my report. Final remarks were given
by Anders Karlström after the seminar.
Finally, elements not presented in this report, such as the writing of Python scripts and the
development of some first parts of an application in the CUBE software, are my personal and
independent work.
5
Application of strategies and policies to reduce travel demand, or to redistribute this demand in space or
in time. Wikipedia, Transportation Demand Management page, viewed on January 14, 2021
Chapter 2
Literature Review
Before designing a specific travel modeling tool for the Paris 2024 Olympic Games, it is useful
to get an overview of the types of models that exist in the field of transportation. The second
part of the literature review focuses on the work carried out for previous Summer
Olympiads, which fall into the category of mega-events [Müller, 2015]. All these elements will
serve as sources of inspiration for the design phase of the new model.
2.1 Predicting Demand

One category of the general theme of transport modelling is predicting demand. Very often,
the idea is to forecast the demand evolution after some transport policy (for example new
infrastructure, tolls, financial aids, etc.), whether short or long term. In this case, there is no
other solution than using a model to represent the future situation and evaluate its impacts.
Here, we are exactly in this situation: the objective is to forecast the situation in 2024, a few
years ahead, taking into account in particular the changes in the public transit network, which
is constantly evolving.
For this task, many very different methods exist. They differ in particular in their purposes
and characteristics, which may vary widely: geographical scale, time scale, accuracy, complexity,
etc. The classification of models in [van Wee et al., 2013] gives an overview of the different
types of models that exist, distinguishing in particular:
• Descriptive and explanatory models: descriptive models represent only the correlation
between variables whereas explanatory models determine causes and consequences.
• Spatial and non-spatial models: depending on the importance or not of the physical space
or the location of activities
• Static and dynamic models: depending on the importance of the time factor or not
• Models based on revealed preference (RP) or based on stated preference (SP): in the case
of RP, the actual choices of an individual are observed in a real situation, whereas in the
case of SP, the individual has to choose between hypothetical situations
• Models for travel versus models for activities: trips derive from activities. Activity-based
models provide a better description of behaviour and consider sequences of trips,
which are not taken into account by models for travel

Two other important kinds of models are also described and can be used: the
aggregated and the disaggregated models. The following section presents them in detail, in
the context of multi-step models, which are commonly used to predict public transport use as
well as car use.
2.1.1 Multi-step models

These models focus on two aspects: the spatial dimension and the purpose of the trips [van
Wee et al., 2013]. First, space and the network (here, the public transport one) are explicitly
taken into account. Second, as explained above, the models do not consider travel as an end
in itself: transport is a derived demand [Cole, 2005].
There are mainly two different kinds of models: the aggregated ones and the disaggregated ones
[Ortúzar and Willumsen, 2011].
Aggregated models
These models aim to represent the behaviour of more than one individual. The study area is
divided into multiple zones. Each of these zones is linked to the transport network studied (in
this case, public transport stations and lines). The best-known structure of aggregated models
is the four-step process:
Figure 2.1.1: Four-step transport model

Source: [Ortúzar and Willumsen, 2011]
Four-step models have been implemented since the 1950s in the US [McNally, 2007]. Each model
is based on four stages, which are in fact four sub-models:
1. Trip Generation. For a zone $i$, estimation of the number of trips that depart from this
zone, $O_i$, and the number of trips that arrive in this zone, $D_i$.
2. Distribution. Determination of the number of trips between each pair of zones, $T_{ij}$
being the number of trips departing from zone $i$ and arriving in zone $j$. These values form
the Origin-Destination (OD) matrices.
3. Modal split. Assignment of a transport mode to every OD pair. $T_{i,j,mode}$ is thus the
number of trips from zone $i$ to zone $j$ made with the mode $mode$.
4. Assignment. Assignment of the trips to the network, since several routes may be
available between two zones.
Disaggregated models
Disaggregated models are based on the individual choices of every individual in each zone, in
contrast to aggregated models, for which groups of individuals with similar characteristics
are considered. They are mainly represented by Logit models (probabilistic choice models)
and especially by the Multinomial Logit Model (MNL) [Ben-Akiva, 1973], described below (cf.
2.1.3). These models mainly started to be developed and used in the 1970s.
There are also four steps, that are comparable to those of the aggregated models:
1. The choice of making a trip or not
2. The choice of a destination among different possibilities
3. The choice of a transport mode among those available
4. The choice of a route
Then, individual choices are aggregated by zone, by summing every individual behaviour (i.e.
trips) over each zone.
For each choice, two or more alternatives are available. These choices can be explained by
several explanatory variables, such as cost, travel time, safety, comfort, etc., which enter
the utilities. These models therefore make it easy to highlight the available transport modes,
which is why they are often used for traffic planning, especially to forecast volume variations
after changes in the network (price, new line, travel time, etc.). The models can
be represented by trees [Daly, 1987]: each node corresponds to a choice (cf. Figure 2.1.2).
It is also important to keep in mind that these steps do not necessarily occur in this order when
an individual decides to travel. For example, a person who owns a personal
vehicle has more possible destinations (because she/he can reach them more quickly)
than a person without a vehicle. Thus, some models perform modal split before trip distribution,
while others perform distribution and mode choice simultaneously [Ortúzar and Willumsen,
2011].
Figure 2.1.2: Example of a tree representation

2.1.2 Gravity models

Gravity models are frequently used in economics and originate from an analogy with
Newton’s law of gravitation; the first uses date back to the 19th century, for migration patterns
[Ravenstein, 1889]. In the field of transport modelling [Erlander and Stewart, 1990],
they aim to create links between origins and destinations when predicting travel demand,
often in the aggregated type of multi-step models. They are therefore often used in the Trip
Distribution step of four-step models, to determine the OD matrix, composed of the $T_{ij}$
coefficients. Trip distribution from zone $i$ to zone $j$ can be computed according to the
following equation:
$$T_{ij} = A_i B_j O_i D_j f(c_{ij}) \tag{2.1}$$
with:
$O_i$ the origin size, $D_j$ the destination size
$A_i$ and $B_j$ the balancing factors, such that $O_i = \sum_j T_{ij}$ and $D_j = \sum_i T_{ij}$
$c_{ij}$ the generalized cost for travelling from $i$ to $j$
$f$ the distance-decay function, a generalised function of the travel costs which represents
the probability of travel from $i$ to $j$. The purpose of this function is to represent the aversion
to travel as distance or cost increases. Classical forms of this function are:
• Exponential functions ($f : x \mapsto e^{-\beta x}$, $\beta > 0$)
• Power functions ($g : x \mapsto x^{-\alpha}$, $\alpha > 0$)
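As an illustration of equation (2.1), the balancing factors can be obtained by updating them alternately until the row and column totals of the matrix match the trip ends, a procedure often called Furness balancing or iterative proportional fitting. The following is only a minimal sketch under stated assumptions: an exponential distance-decay function, equal totals of origins and destinations, and a function name, β value and toy data that are purely illustrative and not taken from any of the cited models.

```python
import math


def gravity_model(origins, destinations, cost, beta, n_iter=200):
    """Doubly constrained gravity model: T_ij = A_i B_j O_i D_j f(c_ij).

    Assumes sum(origins) == sum(destinations) and f(c) = exp(-beta * c).
    """
    n, m = len(origins), len(destinations)
    f = [[math.exp(-beta * cost[i][j]) for j in range(m)] for i in range(n)]
    a = [1.0] * n  # balancing factors A_i
    b = [1.0] * m  # balancing factors B_j
    for _ in range(n_iter):
        # A_i = 1 / sum_j B_j D_j f(c_ij) enforces sum_j T_ij = O_i
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m))
             for i in range(n)]
        # B_j = 1 / sum_i A_i O_i f(c_ij) enforces sum_i T_ij = D_j
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n))
             for j in range(m)]
    return [[a[i] * b[j] * origins[i] * destinations[j] * f[i][j]
             for j in range(m)] for i in range(n)]


# Toy example with two zones (hypothetical trip ends, costs in minutes).
trips = gravity_model([100.0, 200.0], [150.0, 150.0],
                      [[5.0, 20.0], [20.0, 5.0]], beta=0.1)
for row in trips:
    print([round(t, 1) for t in row])
```

A fixed number of iterations is used here for simplicity; a convergence test on the balancing factors would be a natural refinement.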
2.1.3 Discrete choice modelling

A discrete choice model aims to predict choices between two or more discrete alternatives,
according to different parameters grouped into one function. In transport, discrete choice
modelling is often used to model mode choice.
Multinomial Logit Model
One of the most common tools for discrete choice modelling is the Multinomial Logit (MNL)
model, which derives from utility theory [Fishburn, 1982]. MNL models appeared in the
early 1970s; the general theory and its adaptation to the field of transport were described
by [McFadden, 1974].
Mathematically, MNL can be represented as follows:

Let $A = \{1, ..., N\}$ be a set of alternatives. Each alternative has attributes, represented by
a vector $x_n$ for $n = 1, ..., N$. The utility of alternative $n$ is defined as:
$$U_n(x_n) = V_n(x_n) + \epsilon_n(x_n) \tag{2.2}$$
with:
$V_n$ the deterministic utility
$\epsilon_n$ the random utility component
The most common specification of the deterministic utility $V_n$ is a linear combination of
attributes and parameters representing the individual’s preference for each attribute:
$$V_n(x_n) = \theta_0 + \theta_1 x_{n,1} + \theta_2 x_{n,2} + ... + \theta_m x_{n,m} \tag{2.3}$$
with $\theta_j$ the parameter representing the individual’s preference for the attribute $x_{n,j}$
Then, the probability of choosing alternative $i$ rather than alternative $j$ is:
$$P(U_i > U_j) = P(V_i - V_j > \epsilon_j - \epsilon_i = \epsilon) \tag{2.4}$$
If we assume that the $\epsilon_n$ are independent and identically Gumbel-distributed¹
[McFadden, 1974], the probability of choosing alternative $i$ is given by:
$$P(U_i \geq \max_{j \in A} U_j) = \frac{e^{V_i}}{\sum_{j \in A} e^{V_j}} \tag{2.5}$$
In transport models, Vn is also called generalized cost and depends on several attributes such
as the cost, the travel time, the waiting time, the mode of transport, etc.
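To make the MNL probability concrete, the sketch below evaluates a linear deterministic utility and the logit formula for three modes. The coefficients, the attribute values and the function names are hypothetical and chosen only for illustration; they are not calibrated values from any model discussed in this report.

```python
import math


def linear_utility(attributes, theta, theta0=0.0):
    """Deterministic utility V_n as a linear combination of attributes."""
    return theta0 + sum(t * x for t, x in zip(theta, attributes))


def mnl_probabilities(utilities):
    """Logit probabilities e^{V_i} / sum_j e^{V_j} for a dict of utilities."""
    v_max = max(utilities.values())  # subtracted for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical coefficients: utility per minute and per euro.
theta = [-0.05, -0.2]
# Hypothetical attributes per mode: [travel time in minutes, cost in euros].
modes = {
    "car": [25.0, 4.0],
    "public transport": [35.0, 1.9],
    "bicycle": [50.0, 0.0],
}
utilities = {mode: linear_utility(x, theta) for mode, x in modes.items()}
probabilities = mnl_probabilities(utilities)
for mode, p in probabilities.items():
    print(mode, round(p, 3))
```

Subtracting the largest utility before exponentiating leaves the probabilities unchanged, since only utility differences matter in a logit model.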
Another indicator takes into account the generalized costs of all routes for each
OD pair and is called the composite cost. It can be used to determine the public
transport LOS (cf. 2.1.4). The composite cost is computed as a logsum of
the generalized costs.
In classic transport cases, the MNL may not be sufficient because not all the alternatives are
independent (for instance, several public transport modes versus private car), hence the need
for more complex models, presented below.
Nested Logit Model
The Nested Logit (NL) Model is more general than the MNL presented above. It
consists of several nested choices that follow one another, in a way a sequence of MNLs, hence
its name. NL models have also been used since the 1970s, especially
since the complete analysis of their properties by [Williams, 1977].
1
See more: https://www.sciencedirect.com/topics/engineering/gumbel-distribution
Again, the Nested Logit Model is consistent with utility maximization [McFadden, 1977, Daly and
Zachary, 1978]. The following presentation is based on [Train, 2009]:
This time, the previously defined set of alternatives $A$ is partitioned into $M$ subsets, denoted
$B_1, ..., B_M$ and called nests. Similarly, the utility that person $n$ obtains from alternative
$j$ in nest $B_m$ is noted:
$$U_{nj} = V_{nj} + \epsilon_{nj} \tag{2.6}$$
with:
$V_{nj}$ the observed utility
$\epsilon_{nj}$ a random variable not observed by the researcher
The Nested Logit Model is obtained by assuming that the vector of unobserved utility,
$\epsilon_n$, has a generalized extreme value (GEV) distribution (a category which includes the
Gumbel distribution), with parameter $\lambda_m$, which is a measure of the degree of
independence in unobserved utility among the alternatives in nest $m$.
This gives rise to the following probability for alternative $i \in B_m$:
$$P_{ni} = \frac{e^{V_{ni}/\lambda_m} \left( \sum_{j \in B_m} e^{V_{nj}/\lambda_m} \right)^{\lambda_m - 1}}{\sum_{l=1}^{M} \left( \sum_{j \in B_l} e^{V_{nj}/\lambda_l} \right)^{\lambda_l}} \tag{2.7}$$
In the case where $\lambda_m = 1$ for all $m$ (representing independence among all the
alternatives in all nests), this gives:
$$P_{ni} = \frac{e^{V_{ni}} \left( \sum_{j \in B_m} e^{V_{nj}} \right)^{0}}{\sum_{l=1}^{M} \left( \sum_{j \in B_l} e^{V_{nj}} \right)} = \frac{e^{V_{ni}}}{\sum_{k \in A} e^{V_{nk}}}$$
We recognize expression (2.5): it is the standard MNL case. The nested logit model is a
generalization of logit.
Expression (2.7) is difficult to understand by itself. Another approach consists in a
decomposition into two logits [Train, 2009]. Indeed, the observed component of utility can be
decomposed into two parts:
1. $W$, a constant for all alternatives within a nest
2. $Y$, that varies over alternatives within a nest
For $j \in B_m$, utility is written as:
$$U_{nj} = W_{nm} + Y_{nj} + \epsilon_{nj} \tag{2.8}$$
with:
$W_{nm}$ that depends only on variables that describe nest $m$.
$Y_{nj}$ that depends on variables that describe alternative $j$.
The probability of choosing alternative $i \in B_m$ is then expressed as the product of two
probabilities:
$$P_{ni} = P_{ni|B_m} P_{nB_m} \tag{2.9}$$
with:
• $P_{nB_m}$ the marginal probability of choosing an alternative within nest $B_m$
• $P_{ni|B_m}$ the conditional probability of choosing alternative $i$ given that an alternative
in $B_m$ is chosen
The main interest is that both probabilities take the form of logits, and can be expressed as:
$$P_{nB_m} = \frac{e^{W_{nm} + \lambda_m I_{nm}}}{\sum_{l=1}^{M} e^{W_{nl} + \lambda_l I_{nl}}}, \quad \text{where } I_{nm} = \ln \sum_{j \in B_m} e^{Y_{nj}/\lambda_m} \tag{2.10}$$
$$P_{ni|B_m} = \frac{e^{Y_{ni}/\lambda_m}}{\sum_{j \in B_m} e^{Y_{nj}/\lambda_m}} \tag{2.11}$$
$I_{nm}$ is often called the inclusive value or inclusive utility of nest $B_m$, or the log-sum term.
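The decomposition into a marginal and a conditional logit lends itself to a short numerical check. The sketch below computes nested logit probabilities from the inclusive values; the nest structure, the utility values and the function name are hypothetical and serve only to illustrate the formulas.

```python
import math


def nested_logit_probabilities(nests, w, y, lam):
    """Choice probabilities from the two-logit decomposition (2.9)-(2.11).

    nests: dict nest -> list of alternatives in that nest
    w:     dict nest -> W_nm (utility part common to the nest)
    y:     dict alternative -> Y_nj (utility part specific to the alternative)
    lam:   dict nest -> lambda_m
    """
    # Inclusive value I_nm of each nest (log-sum term of equation 2.10).
    inclusive = {
        m: math.log(sum(math.exp(y[j] / lam[m]) for j in alts))
        for m, alts in nests.items()
    }
    nest_weight = {m: math.exp(w[m] + lam[m] * inclusive[m]) for m in nests}
    total_weight = sum(nest_weight.values())

    probabilities = {}
    for m, alts in nests.items():
        p_nest = nest_weight[m] / total_weight            # equation (2.10)
        within = sum(math.exp(y[j] / lam[m]) for j in alts)
        for i in alts:
            p_cond = math.exp(y[i] / lam[m]) / within     # equation (2.11)
            probabilities[i] = p_nest * p_cond            # equation (2.9)
    return probabilities


# Hypothetical example: two public transport modes share a nest, car is alone.
nests = {"public transport": ["bus", "metro"], "car": ["car"]}
w = {"public transport": 0.0, "car": 0.0}
y = {"bus": -1.0, "metro": -0.5, "car": -0.8}
correlated = nested_logit_probabilities(
    nests, w, y, {"public transport": 0.5, "car": 1.0})
independent = nested_logit_probabilities(
    nests, w, y, {"public transport": 1.0, "car": 1.0})
print(correlated)
print(independent)
```

With every λ equal to 1, the result coincides with the MNL probabilities computed on the utilities W + Y, which is the special case discussed above.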
2.1.4 Level of Service

The LOS is an indicator used to quantify the level of performance of a transportation facility
from the user’s point of view. This indicator has been used since the seventies [Alter, 1976],
first of all on the road network rather than other transport infrastructures.
For a public transport network, it may involve several parameters such as cost, accessibility,
travel time, reliability, waiting time, frequency of service and passenger density. This indicator
is therefore more informative than a single piece of information such as the cost or the travel
time, which does not completely reflect user experience.
This indicator makes it possible to compare different public transport routes with each other, but
also with the other modes of transport, in order to determine which one has the highest utility
for the user. Thus, when several routes exist for the same OD pair, the LOS takes into account
all the generalised costs of the relevant routes.
2.2 Previous Olympic Games

As the Summer Olympic Games take place every 4 years (every 2 years if the Winter OG are
counted), the kind of forecasts we want to obtain has already been produced for other cities.
Reports or articles concerning the Olympic Games of the 21st century can therefore provide
useful information on the processes developed, the difficulties encountered and the essential
points of Olympic modelling.
Regarding the city of Paris, it is for example particularly interesting to study an Olympiad like
London 2012 rather than Rio 2016, because London and Paris are quite similar cities. Some more
or less detailed articles on the Olympic transportation issues [Currie and Shalaby, 2012,
Kassens-Noor, 2013] and on travel modeling are available for London 2012 [Donsunmu, 2012],
Vancouver 2010 [Joshi et al., 2009], Beijing 2008 [Yan et al., 2010] and Athens 2004
[Frantzeskakis and Frantzeskakis, 2006, Karlaftis et al., 2004]. The main lessons learned from
these documents are presented below.
2.2.1 Two parts to be distinguished

Every paper related to the previous Olympic Games mentions the necessity to distinguish the
trips directly linked to the Olympic Games from those that are not. Unrelated trips constitute
the background demand (also called residual flows, base demand, or base-load), i.e. the
usual trips that would have taken place even without the Olympic Games, that is to say trips
associated with the usual activities of Île-de-France² residents. In most cases, two different
models are used to forecast these two demands.
Another significant point is the importance of understanding the role of each category of
stakeholders involved in the Games. Of course, there are the spectators and the athletes,
but also volunteers, different categories of media, the Games family, etc. Each
category has a different role, and therefore different travel habits (more or less use of public
transport) during the Games, which need to be accurately characterised. The most
significant category appears to be the spectators, because they represent the largest volume,
but the other types of stakeholders should not be neglected.
In addition, according to [Donsunmu, 2012], some major input parameters change frequently
over time before the Olympics, such as venue locations, event schedule, and therefore the
capacity in terms of spectators too, all of this mostly for political reasons.
2.2.2 Modeling process

Olympic demand
The very specific nature of an event such as the Olympic Games leads to the creation of
a specific model for the trips directly generated by this mega-event. From a more modeling-
oriented point of view, [Donsunmu, 2012] explains the process used for the OG public transport
demand forecasts in London. A first step consists in generating origin-destination matrices for
the spectators and the workforce, press and media, etc., whereas a second step tries to use as
best as possible existing public transport assignment models to determine route and services
choices.
The first step is based on the ticket choice model. Before the sale of the tickets, a
gravity-based model was used to predict the origins of the trips, based on a survey. In the case
of Beijing, the assumptions are also based on the seating capacity of all
the venues, which is quite similar to using ticket numbers. Assumptions are made to determine
arrival and departure profiles for each session, from which the spectator demand can easily be
derived. In each case, a very large majority of spectators are considered to use public transport.
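The reasoning above can be sketched numerically: the spectator demand of a session is the venue capacity multiplied by an assumed fill rate and public transport share, then spread over the time slots preceding the session according to an assumed arrival profile. All names and numbers below are hypothetical placeholders, not values reported for London or Beijing.

```python
def session_pt_arrivals(capacity, fill_rate, pt_share, arrival_profile):
    """Spectators arriving by public transport in each time slot of a session.

    arrival_profile maps a time slot label to the share of spectators
    arriving in that slot; the shares must sum to 1.
    """
    if abs(sum(arrival_profile.values()) - 1.0) > 1e-9:
        raise ValueError("arrival profile shares must sum to 1")
    pt_spectators = capacity * fill_rate * pt_share
    return {slot: pt_spectators * share
            for slot, share in arrival_profile.items()}


# Hypothetical arrival profile, in minutes relative to the session start.
profile = {
    "-120 to -90 min": 0.10,
    "-90 to -60 min": 0.30,
    "-60 to -30 min": 0.40,
    "-30 to 0 min": 0.20,
}
arrivals = session_pt_arrivals(
    capacity=80_000, fill_rate=0.95, pt_share=0.9, arrival_profile=profile)
for slot, volume in arrivals.items():
    print(slot, round(volume))
```

A departure profile after the session can be handled with the same function, using time slots relative to the session end.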
Then, the spectators are divided into different categories according to their origin; the shares
are updated later with new information based on ticket sales. In London, special attention was
paid to spectators who come from far away and may spend one or more days (overnight
trips) in the city, and have other activities (and trips) than the Olympic events. In Beijing,
the place of origin (home, hotel, workplace, other venues) is also determined for 3 parts of
the day: the morning, the afternoon and the evening. Once the origins and the destinations
are known, a gravity model is used. In the case of Beijing, the model is singly constrained,
2
As a reminder: the ”Île-de-France” is the name of the French region that includes Paris and represents
around 12 million inhabitants (just under 70 million in France)
with several impact factors (population, employment, accommodation, average income level) by
Traffic Analysis Zone (TAZ). After that, for London 2012, a multinomial logit model (cf. 2.1.3)
is used for the mode choice.
Concerning the transport network used, we notice that in some cases (Vancouver), only the
most important public transit lines for the Games are taken into account; the local network
is not considered. On the rail network within London, a modified version of Transport for
London (TfL)’s strategic rail model was used to forecast spectator and workforce demand
to access venues, road events or live sites (entertainment and cultural events), for each
day, divided into one-hour steps, with a distinction between London residents and non-residents.
Spreadsheet models were then used to refine this into 15-minute time segments.
Background demand
For the background demand part, since the Games happen during a specific period, which is
often the summer holiday season, some adjustments need to be made compared to the usual
demand on an average day. For example, in China, schools were closed, a higher-than-usual
number of tourists was expected, and volunteers shifted from commuting to work to taking part
in the Games.
In most cases, the model already in use for the area is reused and adapted to take into account
the specificities of the Olympic Games. The significant question of TDM is also raised.
Traffic management measures must be included in this part, because they mostly affect the
background demand.
2.2.3 Tools used

Concerning the trips directly related to the Olympic Games, new specific tools
are developed for this special event. On the other hand, existing local models are reused to
forecast the background demand, which is a more “classical” part. These models are already
suited to the area to be covered. Conventional modelling software is used, such as VISUM
in Vancouver.
Concerning the visualisation of results, the same range of software is mentioned, but also Excel
with VBA (London). It is also important to design output files that are easy to understand and
to interpret. Excel is also often used to prepare some input data. It seems that conventional
modelling software is not designed to handle this kind of event and prediction (over a set of
consecutive days, very often all different from each other).
2.2.4 Points to keep in mind

• First, the differences between the cities for each Summer Olympics (Sydney, Athens,
Beijing, London, Rio) make it difficult to obtain useful data and comparisons ([Donsunmu,
2012]), although it is possible to draw inspiration from the tools that have been put in
place. In our case, the most similar city to Paris is London, but it is difficult to find
precise information.
• Another significant point to be taken into account is the non-ticketed events like road
events3 and live sites, for which it is more difficult to predict the volume and origin of
3
For example, events such as marathon, 50 kilometres race walk, road cycling, etc.
participants, since everyone is welcome, a priori. In the case of Paris, we do not have
any information about the tickets yet. Other hypotheses about the origin of spectators
and workforce must be used.
• Travel Demand Management is also mentioned and must be taken into account, in particular
in the background demand part.
• Careful thought must also be given to the outputs, in order to automatically
obtain results that are easy to read and interpret, for us but also for other stakeholders such
as the OCOG.
Finally, as reported especially in the London article, key inputs change a lot, as do the
assumptions based on the best available information. Moreover, there are a large number of
hypotheses, which are difficult to determine precisely. It is thus important to design
a flexible process with efficient and reliable tools, so that updates can be performed easily. For
[Donsunmu, 2012] this is the key point of Olympic modeling. In any case, the assumptions and
values used for the modelling, although rarely available, are interesting for our
future work, especially those of London 2012.
Chapter 3
ANTONIN, the IDFM model
This chapter provides a detailed presentation of the model used by IDFM, called ANTONIN
3. This presentation seems mandatory, since the future OG model and tool will almost inevitably
be based on it, as will be seen in the Model chapter (cf. 4).
3.1 General description

The model called Analyse des Transports et de l’Organisation des Nouvelles Infrastructures
(Transport and Organization of New Infrastructures Analysis) (ANTONIN) is the transport
model developed and used by IDFM to produce traffic forecasts and socio-economic analyses
of various projects affecting travel habits in the Île-de-France region, especially proposals
for new public transport infrastructures or extensions of existing ones (suburban railway lines,
buses on dedicated lanes, public transport schemes, etc.). The model is thus used at several
stages of a project’s life cycle, to make traffic forecasts and then socio-economic assessments
[Debrincat and Meret-Conti, 2016].
ANTONIN was designed and put into service at the end of the 1990s [Gunn et al., 1998].
The model was updated a first time [Tuinenga et al., 2006]. The third version of the
model, called ANTONIN 3, has now been in use for some years [Tuinenga et al., 2015].
It is an activity-based disaggregated multi-modal model (cf. 2.1.1). Seven transport
modes are taken into account: car driver, car passenger, motorized two-wheeler, bicycle, walking,
public transportation (walking access) and public transportation (access by car).
Figure 3.1.1 represents the workflow and the different steps of the ANTONIN model, which
are described in the following subsections. The four sub-models of the four-step model are
all present, although the model is more complex. Indeed, travel tours are generated rather
than trips, and mode choice and destination choice are performed together rather than one after
the other. The steps colored in orange are Discrete Choice Models.
Figure 3.1.1: ANTONIN 3 workflow, adapted from [Tuinenga et al., 2015]
3.2 Initial data

3.2.1 EGT 2010
ANTONIN 3 is mainly based on the results of the Enquête Globale Transport (Global Transport
Survey) (EGT) from 2010. This is a regional household travel survey, which is carried out
approximately every 10 years and in which several tens of thousands of people are interviewed.
This survey makes it possible:
• to know the main travel flows according to the different modes and purposes
• to analyze the mobility behaviors of Île-de-France residents
• to monitor and interpret the evolution of travel practices
Since 2010, this survey has been managed by IDFM and is released by the OMNIL [DRIEA and STIF,
2014, DRIEA and STIF, 2017]. A new EGT survey started at the beginning of 2020 and
was supposed to be continuous, but the Covid-19 pandemic unfortunately disrupted it.
Expansion factors are used to generate a database of individuals from the individuals surveyed
in the EGT 2010, representative of the population in Île-de-France.
3.2.2 Zoning
The demand is described using a zoning of the Île-de-France region. The basic zoning system
of ANTONIN 3 has 1805 zones (which can be subdivided if necessary). Each zone is represented
by a point, called centroid, where all the inhabitants, jobs and study places are virtually
located.
Figure 3.2.1: ANTONIN 3 zoning
3.2.3 Zonal data

For each of the previous zones, socio-economic and demographic data are available: these are
Population et Emploi (Population and Employment) (P+E) data. These data vary from one
modeling exercise to another and can be refined.
3.2.4 Transport supply

The transport supply is precisely described in .lin files. All the public transport lines are
stored, with information such as the mode, frequency, stops, travel times and name. Transfer
times are generated automatically or can be manually defined. A .net file contains a
geographical representation of the PT network (with nodes and links). This file also contains
the road network and its characteristics, such as the speed, which depends on the nature of the
road, its location and the period of the day. Access penalties for PT nodes are also indicated.
Figure 3.2.2: Train line representation
3.2.5 Level of Service by mode

For each mode of transport and origin-destination couple, LOS are calculated.
Table 3.2.1: LOS parameters for each mode
MODE                     PARAMETERS
Walk                     Distance via the shortest route
Bicycle                  Distance via the shortest route
Car                      Travel time, logarithm of travel cost, constraint on parking at destination
Motorized two-wheeler    Travel time
Public transportation    Composite travel time during peak/off-peak hours, travel cost
3.2.6 Public transportation LOS

Travel cost. The cost of the ticket is calculated depending on several parameters:
1. Age
2. Income
3. Trip purpose
4. Subscription
5. Origin and Destination
Composite travel time. The composite travel time quantifies the travel time perceived by
the user and does not correspond to the real travel time. A generalised travel time is calculated
for each route between one origin and one destination, depending on several parameters:
• Type of user: person under 55 years old coming on foot, person over 55 years old coming
on foot, person coming by car.
• Public transit categories: metro, recent automatic subway, RER, tramway, etc.
Each of these categories has different penalty coefficients: in-vehicle time multiplying
factor, constant mode access penalty, constant transfer penalty
• Walking time
• Waiting time
Based on all generalized travel times for a specific origin-destination pair, a composite travel
time is calculated. Of course, only relevant routes (determined by different criteria) are kept.
The following formula is used in ANTONIN:
$$CTT = -\frac{1}{\lambda} \log \left( \sum_{r \in R} e^{-\lambda \, GTT_r} \right) \tag{3.1}$$
with:
$R$ the set of routes
$GTT_r$ the generalised travel time of route $r$
$\lambda$ the scale parameter, which reflects the travelers’ sensitivity to travel time differences
This formula has the advantage of positively taking into account the addition of a new route,
whether it is longer or not. A “simple” average would increase the composite travel time
whenever a longer route is added. With the logsum, on the contrary, the more alternative routes
there are, the more the composite travel time decreases: this reflects the robustness of the
network for this OD pair.
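The behaviour of formula (3.1) can be checked with a few lines of code. The sketch below is only an illustration: the value of λ and the generalised travel times are hypothetical, not the parameters used in ANTONIN.

```python
import math


def composite_travel_time(gtts, lam):
    """Logsum of the generalised travel times: -(1/lam) * log(sum_r exp(-lam * GTT_r))."""
    if not gtts:
        raise ValueError("at least one route is required")
    best = min(gtts)
    # Factor out the best route for numerical stability; the result is unchanged.
    return best - math.log(sum(math.exp(-lam * (g - best)) for g in gtts)) / lam


lam = 0.1  # hypothetical scale parameter, per minute
one_route = composite_travel_time([30.0], lam)
two_routes = composite_travel_time([30.0, 45.0], lam)
simple_average = (30.0 + 45.0) / 2
print(one_route, two_routes, simple_average)
```

With a single route the composite travel time equals its generalised travel time; adding a second, longer route lowers the composite value slightly, whereas a simple average would raise it.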
3.2.7 Private car LOS

Travel times are calculated thanks to the road network and its attributes presented above. The
parking constraint at destination is contained in another file, which completes
the network file. Finally, the travel cost corresponds to the trip distance multiplied by the
average cost per kilometre of car use (marginal costs and depreciation costs).
3.3 Demand generation

The mobility demand is modeled with a disaggregated approach, i.e. a suite of multinomial
logit models, thus with discrete choices, based on and calibrated with the individual results
from the EGT 2010, such as:
1. Possession of a driver’s license or not
2. Possession of one or more cars or not
3. Possession of a motorized two-wheeler or not
4. Possession of public transit pass or not
5. Generation of trip chain, by activity:
(a) Choice to carry out this activity
(b) Number of times one goes to this activity
6. Joint choice of destination and mode

Models 1 to 4 determine which modes of transport are available to the individual,
whereas model 5 determines the trip purpose and the number of tours. For each of these
models, the choice depends on the individual’s characteristics, local amenities, and level
of service by mode.
3.3.1 Trip chain

In this model, trips are not considered to be single journeys or simple round trips. Trip chains
are considered instead: a primary destination is considered, as well as a secondary destination.
Secondary trips are also considered, to model “triangular” displacements. Tertiary destinations
are not taken into account because they represent very few trips in the EGT 2010; thus
triangular displacements are not taken into account (Figure 3.3.1). There are therefore three
scenarios:
• A primary chain only
• A primary chain and a secondary chain based on the place of work or study
• A chain and an outward or return trip (secondary trips)
This results in 10 tour purposes and 2 secondary trip purposes (Table 3.3.1).
Table 3.3.1: Tour and trip patterns
Primary chains (home-related)
    Home-work for executives
    Home-work for other workers
    Home-business
    Home-school
    Home-university
    Home-regular shopping
    Home-occasional purchases and out-of-home catering
    Home-personal affairs (leisure, administrative procedures, accompaniment)
Secondary chains (work or study related)
    Work/study-professional affairs
    Work/study-other purposes
Secondary trip not related to work or study
    Business - outbound / inbound (2 categories): only after a loop with work, business or
    study as destination
    Other reasons - outbound / inbound (2 categories)
Trip generation follows the Stop-Repeat principle: for each purpose, a model gives the
probability of making a trip chain or not. If the person travels, a second model gives the
probability of making a second trip chain (Repeat) or not (Stop) for each purpose, and so on.
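The Stop-Repeat principle can be illustrated with a simplified calculation of the distribution of the number of trip chains for one purpose. The sketch assumes a single probability of making a first chain and a constant probability of repeating; in the actual model these probabilities come from estimated choice models and depend on the individual, so the function name and the values below are purely hypothetical.

```python
def tour_count_distribution(p_first, p_repeat, max_tours=10):
    """Return [P(N=0), P(N=1), ..., P(N>=max_tours)] under Stop-Repeat.

    p_first:  probability of making at least one trip chain
    p_repeat: probability of making one more chain after each chain made
    """
    distribution = [1.0 - p_first]          # the person does not travel
    p_reach = p_first                       # probability of making >= k chains
    for _ in range(1, max_tours):
        distribution.append(p_reach * (1.0 - p_repeat))  # Stop after k chains
        p_reach *= p_repeat                               # Repeat once more
    distribution.append(p_reach)            # remaining mass, N >= max_tours
    return distribution


# Hypothetical probabilities for one purpose and one individual.
dist = tour_count_distribution(p_first=0.6, p_repeat=0.2, max_tours=50)
expected_tours = sum(k * p for k, p in enumerate(dist))
print(round(expected_tours, 4))
```

Under these simplifying assumptions the expected number of chains tends to p_first / (1 - p_repeat), which gives a quick plausibility check on generated volumes.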
Figure 3.3.1: ANTONIN 3 trip chain simplification

3.4 Combined mode and destination selection models

For each trip purpose detailed above, a combined mode choice and destination choice model has
been estimated. It distributes the trip chains generated by each zone over
the different destinations (1805 zones) and modes (7) by comparing the utilities of the 1805 × 7
possible destination × mode pairs. These models are Nested Logit Models (cf. 2.1.3).
Depending on the purpose, one of two structures, whichever provides better results, has been
chosen: MD (Fig. 3.4.1) or 1MD (Fig. 3.4.2).
Figure 3.4.1: MD structure

Mode choice (1), destination choice (2)
Figure 3.4.2: MD1 structure

Type of mode choice (1), mode choice (2), destination choice (3)
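As an illustration of the MD structure (mode choice above destination choice), the following sketch computes joint mode × destination probabilities with a two-level nested logit. The utilities, the single nest parameter `theta` and the function name are assumptions made for the example; the estimated ANTONIN specification is not reproduced here.

```python
import math


def nested_logit_md(utilities, theta):
    """Joint probabilities P[mode][dest] for a two-level nested logit.

    utilities: {mode: {dest: V}}; theta in (0, 1] is a hypothetical nest
    parameter shared by all modes (theta = 1 gives a multinomial logit).
    """
    logsums, conditional = {}, {}
    for mode, dest_utils in utilities.items():
        scaled = {d: math.exp(v / theta) for d, v in dest_utils.items()}
        total = sum(scaled.values())
        logsums[mode] = math.log(total)                       # inclusive value
        conditional[mode] = {d: e / total for d, e in scaled.items()}  # P(dest | mode)
    denom = sum(math.exp(theta * ls) for ls in logsums.values())
    joint = {}
    for mode in utilities:
        p_mode = math.exp(theta * logsums[mode]) / denom      # P(mode)
        joint[mode] = {d: p_mode * p for d, p in conditional[mode].items()}
    return joint
```

With 1805 zones and 7 modes, `utilities` would hold 1805 × 7 values per origin zone and purpose. The 1MD structure would add one more level above the mode level.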
3.5 Hourly distribution of trips

Since the model must allow analyses on a daily basis but also during the morning peak period
(the dimensioning period), trip matrices are distributed over three periods of
the day: morning peak period, off-peak hours and evening peak period, using coefficients
from the EGT.
3.6 Pivot procedure principle

In order to reproduce the non-modelled demand, future matrices are adjusted using a
comparison between the model and an observed matrix for the base year (2010): this is the
pivot procedure [Daly et al., 2005]. For this purpose, a reference matrix of 128 zones has been
calculated from OD surveys on all the railway lines and from validation data (cf. 5.2.2). This
matrix is then compared to the synthetic matrix of the same year, in order to calculate the
difference between the model and reality. These differences are then applied to the future
synthetic trip matrices to correct them.
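The pivot principle can be sketched as follows. The exact correction rule used in ANTONIN is not detailed here, so the sketch assumes one common variant: a multiplicative correction where the base-year synthetic cell is non-zero, and an additive one otherwise. The function name and the dictionary layout are hypothetical.

```python
def pivot(observed_base, synthetic_base, synthetic_future):
    """Correct a future synthetic OD matrix with base-year observations.

    All matrices are {(origin, destination): trips}. Assumed rule:
    ratio correction when the base synthetic cell is positive,
    difference correction (floored at zero) otherwise.
    """
    predicted = {}
    for od, s_future in synthetic_future.items():
        b = observed_base.get(od, 0.0)
        s_base = synthetic_base.get(od, 0.0)
        if s_base > 0:
            predicted[od] = b * s_future / s_base
        else:
            predicted[od] = max(0.0, b + (s_future - s_base))
    return predicted
```

For a cell observed at 120 trips, modelled at 100 trips in 2010 and at 150 trips in the future, this rule yields 180 trips.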
3.7 Assignment
Estimated trips for an origin-destination pair are distributed over the different routes according
to the generalised travel times calculated in the LOS section and the resulting route probabilities. Each of
the three user classes is assigned separately for each route in every OD. The sum of these
assignments gives the total traffic on each section of each public transport line.
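A minimal sketch of this assignment logic is given below. It assumes that route probabilities follow a logit function of the generalised travel time with a hypothetical scale parameter `beta`; the actual route choice rule of ANTONIN is not specified in this section.

```python
import math
from collections import defaultdict


def assign(trips_by_class, routes, beta=0.1):
    """Load trips on network sections.

    trips_by_class: {user_class: {od: trips}}
    routes: {od: [(generalised_time_min, [section, ...]), ...]}
    beta: hypothetical logit scale (per minute of generalised time).
    """
    loads = defaultdict(float)
    for od_trips in trips_by_class.values():          # each class assigned separately
        for od, trips in od_trips.items():
            weights = [math.exp(-beta * t) for t, _ in routes[od]]
            total = sum(weights)
            for w, (_, sections) in zip(weights, routes[od]):
                for section in sections:
                    loads[section] += trips * w / total   # sum over classes and ODs
    return dict(loads)
```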
Chapter 4
Model
This chapter presents the methodology that will be followed afterwards to model public
transport trips during the Olympics and the Paralympics. The first section presents two significant
concepts for OG modeling in general. The next section describes the existing OG
model used by IDFM and motivates the design of a new one, whose methodology is explained
in the two following sections. Both sections are based on the two concepts mentioned below,
and the process is described step by step, from input to output.
4.1 OG-demand and background demand

As explained in the Literature Review (cf. 2.2), all Olympic Games transport models consist
of two significant parts: the Olympic Games demand and the background demand. The
following suggested methodology does not deviate from this implicit rule.
The OG-demand gathers all the trips directly linked to the Olympic Games, that is to say
trips made by the different OG stakeholders, such as the spectators, volunteers, the Olympic
Family, etc. In contrast, the background demand represents a typical demand during
the same period and might be defined as the demand for the summer 2024 in Paris without
the Olympic Games. These definitions are clear, but they are not perfectly accurate and may overlap.
Indeed, during the summer, Paris already welcomes many tourists.
In the year of the Olympics, some OG stakeholders will replace a part of the tourists’ movements.
These movements (and therefore trips) might be counted in both demands: they are of course
OG demand, but also background demand since these movements would have taken place in
any case.
4.2 The existing OG model

4.2.1 Purpose
Since Paris’ application for the Olympic and Paralympic Games in 2015, IDFM has been
producing results on public transport flows. These results are based on a simplified model, but
were and are still useful to notify the OCOG when some venues cannot be properly served by
public transit, for example when the number of spectators allowed is too high, or the venue is
located far from existing lines. Understanding how this model works is important for the rest
of the process. Indeed, even if this model is very simple, its principle and the data used are
quite close to the new model suggested later. Like the other models related to the OG (see
2.2), there are two parts, the OG-demand and the background demand.
4.2.2 Description
OG-demand
For the OG-demand, inputs such as the schedule, the venue’s capacity, arrival and exit profiles, and
categories of stakeholder for each sport are required. The main simplification of this existing OG
model resides in the fact that all the spectators are considered to come from and return to the center
of Paris. In other words, there is only one origin for each venue: on the outward journey,
stakeholders are supposed to come from the center of Paris, and on the return journey, they are
all supposed to go back to the center of Paris. In addition, only the most significant public transport lines
around each venue are considered. If several lines serve the same venue, distribution coefficients
are used. These coefficients are still imprecise because, in the best case, they are only based
on isochrone maps of the Île-de-France population calculated with ANTONIN, i.e. on the last line
that travellers from each of the 1805 zones of ANTONIN will tend to choose. Of course,
the distribution of the population is not that of the OG stakeholders, resulting in unreliable
coefficients.
The main objective is indeed to check if the public transport network can actually serve each
venue without capacity problems, that is to say with a sufficient hourly capacity.
Figure 4.2.1: Existing OG model workflow

On the left part: OG-demand, on the right part: background demand
Background demand
Another simplification of the existing OG model concerns the background demand. Indeed,
only one section is considered for each line chosen between the location of the venue and the center of Paris.
This is the dimensioning section, i.e. the one for which the load (number of
passengers) is maximum on the line during peak hours. On this specific section, the most
recent available counting data (explained later, cf. 5.2.2) are used hour by hour.
Then, an evolution coefficient between now and 2024 on the same section, determined with
ANTONIN, is applied. Since the OG happen during the summer, a reduction coefficient, called the
abatement coefficient, between a normal period of the year and the summer has been determined
by comparing validation data (cf. 5.2.2) for these two periods. There is
one coefficient for the morning peak period, one for the evening peak period and one for
the other hours of the day. These three coefficients are the same for all the lines of the public
transport network, and thus for all the sections considered, for the sake of simplicity.
Finally, the background demand is added to the OG-demand and compared to the capacity of
the line.
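The calculation performed by the existing model on one dimensioning section can be summarised by the sketch below. The period boundaries in `period_of`, the function names and all numerical values are hypothetical; only the structure (count × evolution coefficient × abatement coefficient, plus OG-demand, compared to capacity) follows the description above.

```python
def period_of(hour):
    """Hypothetical period boundaries, used only for this illustration."""
    if 7 <= hour < 10:
        return "morning_peak"
    if 16 <= hour < 20:
        return "evening_peak"
    return "off_peak"


def section_load_2024(counts, evolution, abatement, og_demand):
    """Hourly load on one section: background demand plus OG-demand.

    counts: {hour: counted load}; evolution: coefficient from now to 2024;
    abatement: {period: summer coefficient}; og_demand: {hour: OG trips}.
    """
    return {
        hour: count * evolution * abatement[period_of(hour)] + og_demand.get(hour, 0.0)
        for hour, count in counts.items()
    }


def saturated_hours(load, capacity):
    """Hours for which the total load exceeds the hourly capacity of the line."""
    return sorted(hour for hour, value in load.items() if value > capacity[hour])
```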
4.2.3 Strengths
The major strength of this model is its simplicity. It is very easy and quick to obtain new results
and hour-by-hour graphs, when any assumption is modified, for example a venue, a capacity,
or the schedule. Similarly it is very easy to understand how the model works, given the small
number of steps and their low complexity.
4.2.4 Weaknesses
• The model has been implemented on Microsoft Excel. Given the different conditions to
be taken into account, in particular because of arrival and departure profiles, and the
number of periods (twenty hours per day for every day of the OG), many calculations
are carried out. As simple as they are, this slows down the operation of the software
considerably, which makes its use more difficult and time-consuming.
• As the model is built in Excel, its evolution is greatly limited. Indeed, it
is complicated to add one more assumption or to redefine the format of a hypothesis.
• Then, the extreme simplicity of the model has its limits. Selecting only one section per
line prevents the detection of problems elsewhere on the line, or on the network. For example,
large stations in the city center (like République or Châtelet) are not directly impacted by
the venues, but will likely be a crossing point for many trips generated by the Olympics.
This shows the importance of modeling the entire network, or at least the most important
lines (railway network).
The weaknesses listed above demonstrate the need for a new model, presented in the following
sections.
4.3 General presentation

Until now, the modelling carried out by the existing OG model has considered each venue
independently, with a single origin for all stakeholders and each destination. The idea, with
a new improved model, is not to define something completely different, but to obtain more
detailed results, this time for the entire public transport network, as well as to design a more
agile tool (easy to update, improve and hand over to others) to manage the increasing
number of model runs and assumptions easily.
As was done for previous Olympiads and the existing OG model (cf. 2.2), two different
parts structure this model: the OG-demand and the background demand, which require
very distinct approaches.
Below (Figure 4.3.1) is presented a general outline of the improved model, the steps of which
will be explained in the following sections.
Figure 4.3.1: Detailed model workflow
4.4 Hypothesis
Some general and significant hypotheses are presented here. These assumptions have
been made from the beginning and structure both the modeling and the construction of the
underlying tool. Thus, they should be changed infrequently, as a change may require the review of
many processes.
• The time period. It seems logical to consider two long periods: one over the Olympics,
and one over the Paralympics. It is important to take into account the few days before,
during which the first Olympics stakeholders will arrive in Île-de-France, and a few days
after, for the same reason. But the definition of this period might change according to
future information.
• Part of day to consider. In “classic” projects in Île-de-France, the most interesting
period is often the morning peak period, used to size the infrastructure, because it is the period
during which there is the most traffic over a given period of time. In the OG case, all the
periods of the day must be studied. Indeed, even off-peak hours are of interest because
the transport supply is lower, while there can also be a high proportion
of travel related to the Olympics that could saturate the network (late-evening event(s)
for instance).
• The time step. Surely the most crucial hypothesis. Indeed, the accuracy of the forecasts
is based on this parameter. But given that forecasts are made for 20 hours out of
the 24 hours of the day, over at least the entire period covering the Olympics and then
the Paralympics, too short a time step significantly increases the calculation time. For
the moment, therefore, one hour has been chosen as the basis, but it is possible that it
will have to be reduced to half an hour in the future. That is already a lot of periods
(20 × Days) and therefore a lot of data to consider.
• The transport network. Looking at the size of the transportation network, one may
question the relevance of conducting this study on all lines. It therefore seems more relevant
to focus on the rail network (subway, commuter trains and trams) to serve the
sites, rather than the bus network. However, the bus network may still be useful to take
into account the precise origin of the stakeholders, because the rail network might not be
sufficient to “attract” potential users in some areas.
4.5 OG demand
4.5.1 Inputs
The suggested inputs are quite similar to the ones used in the existing OG model. More details
are given in the Data and Tools chapter.
4.5.2 Step 1: Trip destinations

NB: In this part, for reasons of simplicity, the destinations are the OG venues whereas the
origins are accommodations, although there are of course trips in both directions.
As for a classic four-step model (cf. 2.1.1), the first two steps consist in generating trips
and distributing them. To achieve these parts, precise assumptions about the origins and
destinations are required. One difficulty lies in the fact that several time-of-the-day periods
are studied, and not a unique period which could be the morning peak period in other models.
Here, some of the trips take longer than the period under consideration. It cannot therefore be
assumed that all the trips over the period take place only during this period. We will develop
some solutions to this problem in the next sections.
Figure 4.5.1: OG-demand part workflow. Zoom of Figure 4.3.1
Thanks to the multiple input data related to the OG (events, venues, stakeholders, cf. 5), we can
calculate the number of expected arrivals and departures per hour for each of the sites. Indeed,
knowing the capacity of the venue, its schedule and a profile of arrivals and departures per
session, it is quite easy to obtain values for each site, each hour and each type of stakeholder.
For this, it is suggested to write Python scripts and to include them in the CUBE model (cf.
Set up and tools used).
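Such a script could look like the following sketch. The session format, the profile representation (shares indexed by an hourly offset relative to the start or end of the session) and the function name are assumptions made for the example, not the actual IDFM input format.

```python
from collections import defaultdict


def hourly_flows(sessions, arrival_profile, departure_profile):
    """Arrivals and departures per (venue, stakeholder, hour).

    sessions: list of dicts with keys 'venue', 'start_hour', 'end_hour' and
              'attendance' ({stakeholder: number of people}).
    arrival_profile: {offset in hours relative to session start: share}
    departure_profile: {offset in hours relative to session end: share}
    """
    arrivals, departures = defaultdict(float), defaultdict(float)
    for s in sessions:
        for stakeholder, people in s["attendance"].items():
            for offset, share in arrival_profile.items():
                arrivals[(s["venue"], stakeholder, s["start_hour"] + offset)] += people * share
            for offset, share in departure_profile.items():
                departures[(s["venue"], stakeholder, s["end_hour"] + offset)] += people * share
    return dict(arrivals), dict(departures)
```

Because the results are accumulated per key, several sessions at the same venue on the same day add up in the hours where their arrival and departure waves overlap.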
4.5.3 Step 2: Trip origins and distribution

Since we need Origin-Destination matrices, one crucial step is the generation of links be-
tween origins and destinations. In fact, it has to be done in two parts, first determining the
origins and then looking at the distribution.
Trip origins
Indeed, the trip origins are not known precisely. Let us first distinguish two types of origins:
the initial origin, and then the final origin. A large proportion of stakeholders will not go
directly to the venue from their home, since they live too far away (many will be foreigners).
Thus, they must first arrive in Île-de-France, where they will stay. From this new starting
point, called the final origin, they will reach the venue.
The initial origin is important because, for those who arrive from a remote location, it
is quite easy to identify the access points to Île-de-France. These are in fact highways, train
stations and airports. From there, we need an assumption about the proportion of trips that
will be made on the Île-de-France public transport network, from these access points to the
accommodations of the stakeholders (the location of the accommodations is discussed below).
These trips, related to the Olympic Games but not directly to the venues, will be considered
in a separate module. The difficulty is that these trips could be assimilated
to the background demand of “classic tourists”.
The final origin is the origin of the trip on the day the stakeholder goes to a venue. Four
different categories might be considered.
• The person is a resident of the Île-de-France region, and comes from his/her home
• The person is staying in a paid accommodation (hotel or seasonal rental) located in
Île-de-France
• The person is staying at the home of an Île-de-France resident
• The person goes back and forth during the day (by car, train or plane) and does not live
in Île-de-France
Depending on the type of stakeholders (spectators, volunteers, Olympic Family, the media),
some of the previous cases are not possible. For example, all members of the Olympic Family
will be accommodated in hotels. Assumptions need to be made to distribute the stakeholders
in these four categories.
As the locations of the paid accommodations are known, as well as the distribution of the Île-
de-France population and the access points for the last category, it is then possible to distribute
the OG stakeholders on all the Île-de-France region.
Trip distribution
The trip distribution must then be considered. As a first step and for the sake of simplicity, it
was decided that the distribution would be homothetic, that is to say that each location in Île-de-France
has the same probability of being attracted by a destination, regardless of the characteristics
of the trip between the two places. Of course, we know that this will not be the case during the
OG. For example, it can be assumed that residents of the Île-de-France will tend to go to the
venue closest to their home. This behavior can be modelled with a gravity model (cf. 2.1.2), with a
time parameter. Thus, depending on the type of stakeholder and the type of accommodation,
different distributions are envisaged: homothetic, gravitational, or perhaps other more complex
ones. That is of course a lot of cases to handle to obtain OD matrices for each type of stakeholder
and each hour.
These matrices use the zoning defined in ANTONIN, that is to say 1805 zones (cf. 3.2.2), with
possibly additional zones that represent the venues, in order to represent them as faithfully as
possible, with for example walking times from the surrounding public transport stations.
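The difference between the homothetic and the gravitational distributions can be sketched as follows, for the arrivals at one venue during one hour. The exponential deterrence function, the parameter `beta` and the function name are assumptions; the form of the gravity model actually retained is not fixed at this stage.

```python
import math


def distribute_origins(arrivals, origin_weights, travel_time=None, beta=0.0):
    """Split the arrivals at one venue among origin zones.

    origin_weights: {zone: number of stakeholders staying in the zone}
    travel_time: {zone: travel time to the venue in minutes}, needed if beta > 0
    beta: hypothetical time parameter; beta = 0 gives the homothetic case.
    """
    attraction = {}
    for zone, weight in origin_weights.items():
        deterrence = math.exp(-beta * travel_time[zone]) if beta > 0 else 1.0
        attraction[zone] = weight * deterrence
    total = sum(attraction.values())
    return {zone: arrivals * a / total for zone, a in attraction.items()}
```

With `beta = 0`, each zone sends trips in proportion to its weight only. With `beta > 0`, zones closer to the venue in travel time receive a larger share.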
4.5.4 Step 3: Assignment

Now, the OD matrices have been defined, and the trips of every OD pair need to be assigned
to the public transport network. As explained above (cf. 3), since the ANTONIN model is
made especially for the Île-de-France region, it seems obvious that we should try to reuse it.
Of course, we could have used or designed a less complex model, but ANTONIN has the advantages of
already being in place, calibrated and in use, with a zoning and a team that knows how to make it
work. Thus, in this part, it is suggested to reuse the assignment part of the ANTONIN model.
Only this part will be reused, and not the entire model. Indeed, a step such as the
mode distribution is not relevant since all the stakeholders considered are supposed to travel
exclusively by public transport and thus not on foot, by car, etc. But it is necessary to
adapt the ANTONIN assignment process for this specific task, for several reasons:
• The whole day is studied, and not only the morning peak period. There is not the
same transport offer at 8 am, 2 pm or 10 pm. Therefore, the public transport supply
during the whole day must be included and coded.
• There are concerns about computation times, which are difficult to estimate. Indeed,
a classic assignment done with ANTONIN takes about 20 minutes. Here, 20 assignments
have to be done per day for about 30 days, for a total of 600 assignments (about
200 hours if each one took as long as a classic assignment).
However, there are at most several hundred thousand trips per day, and surely fewer than
100,000 per hour, against usually several million per day.
As a first step, it is suggested that only the shortest path between two zones be
considered, to simplify. But during the same day, for two fixed zones (origin and
destination), this path might change according to the supply. This assumption needs to
be carefully checked: it can lead many people to use the same route, while
another route that lasts only one extra minute is ignored. Thus, some
lines or stations may be over-used, and on the contrary, the use of others may be
under-estimated. This may be subject to change, but for the moment it is difficult to
evaluate the execution time of this model without ever having tested it.
• How to take into account trips that straddle two periods? For example, somebody
who is supposed to arrive on site between 9 am and 10 am may actually leave the place
of departure at 8:40 and arrive at 9:30. Thus, this person was on the public transport
network before 9 am. It is then not possible to consider that someone who arrives between
H and H+1 also leaves between H and H+1 in all cases.
It is therefore necessary to assume a distribution law over the hour, constant in the
first instance (the same number of persons arrive every minute). Then, one idea is to cut
the trip in two parts, one for each period concerned (8:40 - 8:59 and 9:00 - 9:30), while trying
to determine where the passenger is at 9:00, which is not necessarily obvious.
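The treatment of trips that straddle two periods can be illustrated with the sketch below, which uses the 8:40 to 9:30 example. It only handles the time dimension: locating the passenger on the network at 9:00 would additionally require the route. The function names are hypothetical, and the second function relies on the constant arrival law assumed above.

```python
def split_trip_by_hour(depart_min, arrive_min):
    """Minutes of travel falling in each clock hour.

    Times are given in minutes since midnight.
    """
    parts = {}
    t = depart_min
    while t < arrive_min:
        hour = t // 60
        end = min(arrive_min, (hour + 1) * 60)   # stop at the next full hour
        parts[hour] = end - t
        t = end
    return parts


def share_departing_before_hour(travel_min):
    """Share of travellers arriving in [H, H+1) who departed before H,
    assuming arrivals are spread uniformly over the hour."""
    return min(travel_min, 60) / 60
```

For the example in the text, `split_trip_by_hour(8 * 60 + 40, 9 * 60 + 30)` attributes 20 minutes of travel to the 8 am period and 30 minutes to the 9 am period.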
Another question is about the definition of the public transport network limits. Some
important projects, significant for the general operation of the network, are expected to be ready
for 2024 (but may not be), such as the extension of metro 14 to the South (Orly Airport) and to
the North (Saint-Denis Pleyel), the extension of RER E to the West, etc. Although they
evolve very frequently, bus lines must belong to the scope, as they often provide access to the
rail network from the accommodations. Some changes will happen near the venues. Due to
security perimeters, several bus lines will be cut, as well as tram lines. Areas covered by security
perimeters depend on short-term decisions taken by the French authorities and may evolve. We
must therefore already take into account the fact that the representation of the public transport
network as well as the road network will evolve over time.
4.5.5 Outputs
The objective is to obtain as output a loaded network, hour by hour, day by day, i.e. the load
(number of passengers) for each section (interval between two stations), and the number of
people getting on and off at each stop. This information is stored in .dbf files, but
can also be viewed easily on a map in the software used for the modeling part, CUBE, which
makes it easier to analyse the results. But this part is only focused on the OG-related trips. The
residual flows, whose determination constitutes a second part, are required before any attempt
to make a complete analysis.
4.6 Background demand
Figure 4.6.1: Background demand part workflow. Zoom of Figure 4.3.1
To define the method to model the background demand, it was first necessary to think about
the tools and the best data to use for this part. The objective, exactly like the OG demand part,
is to rebuild the loaded Île-de-France public transport network (i.e. the number of passengers
on each section of each line) but during the Olympic and Paralympic Games, that is to say
approximately for every hour and day between mid-July and mid-September 2024.
In this suggested methodology, there are two significant points of focus, namely the evolution
until 2024, and then the evolution between a normal period and the summer. The normal period corresponds
to the period of the year when travel is considered “normal”, compared to the entire year. This
is based on typical weekdays called Jours Ouvrables de Base (Basic Working Days) (JOB)
(more details in the Data section, 5.2.2). Indeed, the summer is not a period very often studied
because the demand is lower and has no influence on the dimensioning of the network. It is therefore
also a privileged period for maintenance work. The two points previously mentioned generate
the three parts that are detailed in the following sections.
The scope of this part in terms of network studied should be defined. In this part, it was
decided to focus only on the railway network, on the one hand because they are the structuring
lines of the network, and on the other hand because we have more information about them.
Indeed, the bus network changes very often and it is difficult and unreliable to monitor each
line over time.
4.6.1 Step 1: Network loaded in current period

Network loaded in current period means getting exactly the results we want for the Olympic
period, but first for the current period, that is to say, the current state of the network per hour.
For this, we propose to use the counting data (details in 5.2.2).
In this specific situation, the load (number of passengers) is required on each section during
several periods of one hour. The basic data provided by counting data is the load on a specific
train, which is not directly useful in this case, because the interest is not in the
trains themselves, but rather in the stations and sections over a given period. It is therefore suggested to
write Python and SQL scripts which take the count files as input and output data by hour and
section. The difficulty is that the input files do not always have the same format (different
column names or time formats) even if they are very similar.
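A possible shape for such a script is sketched below. The column aliases and time formats are purely hypothetical examples of the format differences mentioned above; the real count files may use other names and conventions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical aliases: each target field may appear under several column names.
COLUMN_ALIASES = {
    "section": ("section", "troncon"),
    "time": ("time", "heure"),
    "load": ("load", "charge"),
}
# Hypothetical time formats found in the input files.
TIME_FORMATS = ("%H:%M:%S", "%H:%M", "%Hh%M")


def normalise(row):
    """Map a raw record onto the common field names."""
    out = {}
    for target, names in COLUMN_ALIASES.items():
        for name in names:
            if name in row:
                out[target] = row[name]
                break
        else:
            raise KeyError(f"no column found for {target!r}")
    return out


def parse_hour(text):
    """Return the hour of a time string written in any of the known formats."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).hour
        except ValueError:
            continue
    raise ValueError(f"unknown time format: {text!r}")


def load_by_hour_and_section(rows):
    """Sum train-level loads into one value per (section, hour)."""
    totals = defaultdict(float)
    for raw in rows:
        row = normalise(raw)
        totals[(row["section"], parse_hour(row["time"]))] += float(row["load"])
    return dict(totals)
```

The rows can come from `csv.DictReader` or from an SQL query; only the normalisation tables need to be extended when a new file format appears.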
Another problem is the fact that, on some lines (commuter train lines like the Transilien),
trains may have different routes depending on their missions. Some trains may or may not
stop at certain stations. When a train passes a station without stopping, it is
considered that the train could stop there, and, computer-wise, this is represented by a stop with
no boarding or alighting. But how do we know whether a train passes through a station?
The data we have only give us information when a train stops somewhere. Therefore, we also
need to build a simple graph data structure of the line, in order to reconstruct the train’s route between
two consecutive stops, and thus check if there are any intermediate stations in between. In
addition, since ANTONIN will be used in the next step, it is also important to link these sections
to the section names in ANTONIN. Very simply, a section is a segment defined by its two ends,
which are two stations, each with a unique ID.
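The reconstruction of a train’s route between consecutive stops can be sketched with a breadth-first search on the line graph. The station identifiers and function names are hypothetical, and the sketch assumes that the route with the fewest sections between two stops is the one actually followed, which holds on a tree-shaped line.

```python
from collections import deque


def build_graph(sections):
    """Undirected line graph from a list of (station_a, station_b) sections."""
    graph = {}
    for a, b in sections:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_between(graph, start, end):
    """Stations from start to end along the path with the fewest sections."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    raise ValueError(f"no path between {start!r} and {end!r}")


def expand_stops(graph, stops):
    """Insert the stations passed without stopping between consecutive stops."""
    full = [stops[0]]
    for a, b in zip(stops, stops[1:]):
        full.extend(path_between(graph, a, b)[1:])
    return full
```

The stations added by `expand_stops` are those where the train passes without stopping; the load on the sections around them is the load observed at the previous stop.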
Another problem is the completeness of the data. Some counts are missing during the weekend,
as well as Trafic Journalier du Réseau Ferré (Daily Railway Traffic) (TJRF) data at the end of
the evening and during the weekend. For the majority of projects, this is not a problem, since
it is well known that the load is lower on weekends. But in the case of the Olympics, we need
to forecast the background demand for every hour, every day. After discussion with RATP,
the current subway operator, it appears that some lines have weighing data, which are used to
measure the load. These data are collected automatically all the time, and thus also
during weekends. Therefore, there are ways to obtain additional data to fill the gaps.
4.6.2 Step 2: Evolution by 2024

The goal of this step is to determine evolution coefficients between the current period and 2024
for each section of the network in the scope of this part. But the current period is different
for each line; indeed, the most recent counting data were produced between 2016 and
2019.
The idea is to use the ANTONIN model, first to model the current situation for every year
between 2016 and 2019, and then to model the situation in 2024. The results give
directly what interests us: the load on each section for each line of the network. Then, evolution
coefficients can be calculated between 2024 and the year of the count for every line in the
scope.
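The calculation of these coefficients can be sketched as follows. The data layout (modelled loads indexed by year and section, count year indexed by line) and the names used are assumptions made for the example.

```python
def evolution_coefficients(modelled_load, count_year_by_line, section_line, target_year=2024):
    """Ratio of the modelled load in the target year to the modelled load
    in the year of the most recent count, for each section.

    modelled_load: {year: {section: load}} from the transport model
    count_year_by_line: {line: year of the most recent count}
    section_line: {section: line}
    """
    coefficients = {}
    for section, line in section_line.items():
        base = modelled_load[count_year_by_line[line]].get(section, 0.0)
        if base > 0:    # sections with no modelled base load are skipped
            coefficients[section] = modelled_load[target_year][section] / base
    return coefficients
```

Multiplying the counted hourly load of a section by its coefficient then gives an estimate of the 2024 load during a normal period.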
The question of which network to choose in 2024 is the same as for the second step of the OG-demand
part. But in this case, the trips need to be generated in the model, by using social and
economic data (the ones called P+E), for which I have already explained the situation.
Other method not selected
Another method suggested but not chosen at this stage is mainly based on the ANTONIN
model. It consists of a classical modeling at the current horizon and the 2024 horizon, with a
calibration step. Indeed, as explained in the Literature Review, ANTONIN is based on data
from 2010. Each use of the model requires a modeling at the current horizon (2021), followed
by a calibration step to correct defects and changes since 2010. But most of the simulations
are focused on a specific zone in the Île-de-France region, due to a specific transport project.
The calibration step is thus carried out only on this area, and is already tedious because it is
necessary to multiply the tests and iterate until a convincing result is obtained. In this case,
the calibration must be done on the whole region, because all the railway lines of the public
transport network are taken into account, and these lines serve almost the entire region. It
is therefore a very long and complex task, although it might be done in a second stage, if
necessary.
On the other hand, this method has a great advantage for the TDM part. Indeed, for each section
studied, we have information on the origin of each traveller. This makes it easier to target the
areas that need to be emphasized and thus to run tests when modeling the effects of TDM
measures.
4.6.3 Step 3: Evolution by summer 2024

As explained earlier, summer is not a typical period for modellers, because this period of the
year does not represent the normal period at all. Indeed, we do not have much data available
for this period. In addition, a lot of maintenance and restructuring work is done during
summers, which makes it impossible to have a correct idea of the use of the network during the
summer. The only consistent data available are the validation data, which are obtained thanks
to validators (cf. 5.2.2).
We know that there are fewer validations during the summer than during the normal
period, when the counts are carried out, so a percentage drop could be calculated. However, this
decline is not uniform and depends on the lines and sectors under consideration. The scale
of this work must therefore be at least at the level of the line, or even half a line for some
specific lines, for instance RER lines. The idea here is to compare validation data between a
normal period and the summer, by comparing daily volumes, but also the hourly distribution
in percentage during the day. These coefficients can then be applied to the results obtained in
Step 2. There are potentially thousands of coefficients to be determined, for each hour,
each day, each line, or even each section.
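The comparison can be sketched as below for one line and one type of day. The decomposition into a daily ratio and hourly shares follows the description above; the function name and the input layout (validations per hour) are assumptions.

```python
def summer_coefficients(normal, summer):
    """Compare validations between a normal period and the summer.

    normal, summer: {hour: number of validations} for one line and day type.
    Returns the daily ratio and, for each hour, the coefficient to apply to a
    normal-period hourly load (daily ratio x change in the hourly share).
    """
    daily_normal = sum(normal.values())
    daily_summer = sum(summer.values())
    daily_ratio = daily_summer / daily_normal
    hourly = {}
    for hour, value in normal.items():
        share_normal = value / daily_normal
        share_summer = summer.get(hour, 0.0) / daily_summer
        hourly[hour] = daily_ratio * share_summer / share_normal if share_normal > 0 else 0.0
    return daily_ratio, hourly
```

The hourly coefficient obtained this way is equivalent to the ratio of summer to normal validations for that hour; keeping the two factors separate shows whether a change comes from the daily volume or from the shape of the day.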
One last bias remains to be mentioned. The data are measured at the station. Some stations are
served by several lines, and we do not know which line a traveller is heading for
when he/she validates, because the validation is made on entering the station. Similarly, a drop in ridership at
a station does not necessarily mean the same drop in load on the sections to which that station
belongs, since this also depends on the other stations upstream.
Of course, this method presents a lot of bias, but it seems difficult to do better with the currently
available input data. Collaborations are planned (and have already started) with operators
such as SNCF and RATP in order to improve, compare and consolidate the results, as they
are the ones who produce this data. They also own a transport model, whose results it might be
interesting to compare with ours (Step 2).
4.6.4 Outputs
The same output format is required for both the OG-demand part and the background demand part.
Then, these two files are added together and constitute the final result which must be analyzed.
4.6.5 Travel Demand Management

The objective of this module is to implement TDM measures and to evaluate their effects on
the network usage.
In our case, we only have data on each section (between two stations) of the network, without
information on the origin or destination of the travelers. Apart from applying reduction coefficients,
it seems complicated to test measures that are more “precise” or that concern a certain
category of the population (teleworkable employment, for example). A priori, ANTONIN modeling
in Step 2 (4.6.2, transition between now and 2024) could provide slightly more accurate
information on this. Above all, the method presented in the following section can lead to better
results.
However, it must be recognized that this is not the main concern: the important thing is to
model the current state of the network.
4.7 General outputs

As explained earlier, the results and the outputs have several uses. These results are useful at
different scales.
• At the scale of the network, to detect stations or sections not directly impacted by the
Olympic Games events but which will be congested by the surplus of travellers due
to the Olympics.
• At the scale of a site or a group of sites, to know the distribution of the stakeholders
between the different lines and stations available around the sites.
• At the scale of a station, to compare the use of the station during the OG with its
usual use and with the capacity of the station, because some infrastructures (stairs,
platforms, corridors) might not accommodate as many passengers as expected even if
there was sufficient supply.
It is also envisaged to generate maps semi-automatically (most likely with Python and QGIS1),
since producing them manually for the hundreds of hours to be studied would not be feasible, in order
to improve the analyses. Since it is very likely that many scenarios will be tested, it is important
to produce summary files, which summarise all assumptions and results. Alert files are also
envisaged, since it is difficult to monitor the entire network by hand.
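An alert file of this kind could be produced along the following lines. This is a sketch under our own assumptions: a known hourly capacity per section, a load-to-capacity threshold, and a hypothetical function name `find_alerts`; none of these come from the IDFM tool.

```python
def find_alerts(loads, capacities, threshold=1.0):
    """List (section, hour, load/capacity) where the ratio reaches the threshold.

    loads: dict (section_id, hour) -> passengers
    capacities: dict section_id -> hourly capacity in passengers
    Sections without a known positive capacity are skipped.
    The result is sorted from the most to the least loaded entry.
    """
    alerts = []
    for (section, hour), load in loads.items():
        capacity = capacities.get(section)
        if capacity is None or capacity <= 0:
            continue  # capacity unknown: the ratio cannot be evaluated
        ratio = load / capacity
        if ratio >= threshold:
            alerts.append((section, hour, round(ratio, 2)))
    return sorted(alerts, key=lambda alert: alert[2], reverse=True)


# Hypothetical values, for illustration only.
loads = {("A-B", 8): 12000, ("A-B", 9): 7000, ("B-C", 8): 5000}
capacities = {"A-B": 10000, "B-C": 8000}
alerts = find_alerts(loads, capacities, threshold=0.9)
```

The resulting list can be written to a .csv file per scenario, so that only the flagged sections and hours need to be inspected on a map.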
1 Free and open-source GIS software.
Chapter 5
Data and Tools
5.1 Data required

5.1.1 OG-demand
As can be seen from the literature review, and even more so from the previous section, a large
amount of data must be considered.
First, it is very important to understand in detail how the Olympic Games work,
although this seems easy to understand in practice. Basically, for each venue, a certain number
of spectators is expected. To ensure that the event takes place in good conditions, volunteers,
staff and service providers work at the different venues. Then, to ensure the coverage of
the events, many media representatives are expected at the competition sites. Finally, athletes and the
Olympic Family are also on site.
All these categories of stakeholders who come to the venue have different travel habits, in terms
of:
• Modal distribution. The vast majority of spectators will use public transport exclusively,
whereas, for instance, athletes will use a dedicated service between the Olympic
Village and the venue. For this model, the key quantity is the share of trips made by
public transport.
• Venue Arrival and Departure Time. Most of the trips will be generated
because an event is about to take place or has just finished. For these trips, it seems
logical, for example, that a large part of the workforce has to arrive before the first spectators.
Other trips have a fixed timetable: this usually concerns travel to non-competition sites
such as the Olympic Village, the Media Village, etc. It is important to determine when
people will arrive and leave, in order to know when the trips will happen and thus when they will
be on the network. And, of course, not all spectators will arrive at the same minute.
In addition, depending on the location of the venue and the schedule of the session, these
profiles might be different. For instance, if the session ends late in the evening, spectators
will be more likely to leave the venue as soon as possible.
In addition, the volume of each category at each site is required: it is one of the most
important parameters for determining whether or not the public transport network will be congested.
Moving a bit away from the venues, other types of data are necessary. First, it is necessary to
know where in the world the stakeholders originally come from and the mode of transport they will
use to come to France. Then, once these people have reached the Ile-de-France region, they go
to their accommodation (hotels, rentals such as Airbnb, staying with friends, etc.),
from where they will depart for the Olympic venues. There is also the case of day trips by
spectators, which do not require accommodation in the area. These trips are mostly made by
train, resulting in arrivals at the main train stations in Paris, or by car. As the surroundings
of the sites will be closed to cars over a large area, it is envisaged that a park and ride system
will be set up close to major highways and public transport stations.
5.1.2 Background demand

As explained in the model section, this part requires several datasets about the current network,
for different periods: the summer and more typical periods. The difficulty is that this data is
needed for the entire railway network.
5.2 Available data and collection

5.2.1 OG-demand
For the OG-demand part, most of the data is supposed to come from, or at least to be validated by, the
OCOG, such as the schedule, the number of spectators allowed for each event, and the arrival and
departure profiles. Other assumptions are also required: the origin of the stakeholders for
their entire stay in Paris, as well as their origin within the Ile-de-France region on the day they
go to an Olympic event. In addition, data is necessary not only for the competition venues:
non-competition sites and celebration sites are other places which generate a lot
of trips too. The same data are required for the Paralympic Games, even though these are
currently even scarcer than for the Olympic Games.
All this information is useful for the Trip Distribution part. A small database has been
created to store these data (Figure 5.2.1). The risk is that new information or assumptions
may force a change of this database format, and thus of the scripts that use this database to
generate the trips. This database only consists of a few .csv files, in order to be easily readable
and modifiable.
Here are some examples of the input data format used:
• Sport. For each sport, parameters such as the venue and the capacity of the venue are
specified.
• Schedule. Presented as a large table in which a 1 indicates that a session takes place
on a given day and hour, and a 0 that none does.
• Arrival Departure Profile. The table contains all the conditions for each profile
proposed, namely the type of stakeholder, the start time, the end time, the category of
the venue. This list is likely to become more complex, until perhaps obtaining a profile
for each site (Figure 5.2.2).
Figure 5.2.1: Database schema
Figure 5.2.2: Example of a random arrival departure profile

For example, 20% of the people arrive between 3 and 2 hours before the official start of
the session.
• PT Proportion. The table contains the public transport modal share for each site,
depending on the type of stakeholder and the accessibility of the venue (easy or not).
This database is then read with Python, and scripts generate the trips to and
from the venues.
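To illustrate how these inputs combine, the following sketch spreads the public transport arrivals of one session over the hours preceding its start. It is not the actual IDFM script: the function name, the profile encoding (an offset of 3 meaning "between 3 and 2 hours before the start") and all numerical values are assumptions made for this example.

```python
def arrivals_by_hour(session_start_hour, attendance, pt_share, profile):
    """Distribute the public transport arrivals of one session over time.

    session_start_hour: hour of the official start of the session (e.g. 14)
    attendance: number of people of one stakeholder category at the session
    pt_share: share of these people arriving by public transport (0 to 1)
    profile: dict offset -> share, where offset k means arriving between
             k and k - 1 hours before the start; shares must sum to 1
    Returns a dict hour -> number of public transport arrivals.
    """
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile shares must sum to 1")
    pt_travellers = attendance * pt_share
    return {
        session_start_hour - offset: pt_travellers * share
        for offset, share in profile.items()
    }


# Hypothetical profile: 20% arrive between 3 and 2 hours before the start.
profile = {3: 0.2, 2: 0.5, 1: 0.3}
arrivals = arrivals_by_hour(14, attendance=10000, pt_share=0.9, profile=profile)
```

In this encoding, a new profile for a given stakeholder type or venue category is just another dictionary, so refining the list of profiles does not require changing the function.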
Several difficulties make data collection more complicated than in other projects. Indeed,
the concept of the Paris 2024 Olympic Games is constantly evolving. For example, the final
list of venues was validated in December 2020, which greatly influences the modeling part.
On other topics, such as the number of volunteers at a given venue or even the hour-by-hour
schedule for each sport, the OCOG still has no precise idea. A schedule per session will only
be validated in 2022. But it is important for IDFM to have working hypotheses in order to
raise warnings before decisions are ratified. On the IDFM side, we must therefore make proposals so that the
organizers can tell us whether they seem coherent to them. Sometimes, this information is quite vague
or not detailed enough. These data may vary, change format, or be refined later, and such
changes must be anticipated as much as possible when
designing the model. It is therefore not easy to start from a common base of relatively stable
hypotheses, but the first modelling exercises must nevertheless be started, since the process is
iterative. It should of course be pointed out that the covid-19 epidemic does not facilitate this
crucial step.
Other data may be collected directly from IDFM or its partners, such as the location of Ile-de-
France accommodation facilities (hotels, rental flats, or simply "classic" housing), which
will be used for the Trip Distribution model.

5.2.2 Background demand
To supply the background demand part, transport data collected by IDFM or the operators
are very useful, in particular validation and counting data.
Counting data
Counting data is produced by the operators on each line of the railway network, approximately
every 4 years. There is a difference between the data for the metro and the other lines.
For the commuter train lines and tram lines, specific days are chosen. On these specific
days, for each train, the number of people boarding and getting off at each station is counted
during the whole day. These days are mainly Tuesdays and Thursdays in October, November,
March or April, because the network is at its busiest during these days: they are called JOB.
Other counts are also performed on Saturdays and Sundays. These data are available as .csv
or .xlsx files, in which each row describes one stop made by a train: its direction, its mission,
the number of people boarding and alighting, the load, etc.
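The link between boarding and alighting counts and the load on each section can be sketched as follows. The record layout (station, boarding, alighting) and the station names are assumptions made for this example and do not reproduce the operators' file format; such a recomputation could also serve as a consistency check on the load column.

```python
def section_loads(stops):
    """Compute the load on each section of one train run.

    stops: list of (station, boarding, alighting) in the order served.
    Returns a list of ((from_station, to_station), passengers_on_board),
    one entry per section between consecutive stations.
    """
    loads = []
    on_board = 0
    for (station, boarding, alighting), next_stop in zip(stops, stops[1:]):
        # Passengers on board when leaving this station.
        on_board += boarding - alighting
        loads.append(((station, next_stop[0]), on_board))
    return loads


# Hypothetical run serving four stations.
run = [("S1", 100, 0), ("S2", 50, 20), ("S3", 10, 60), ("S4", 0, 80)]
loads = section_loads(run)
```

Summing these per-train loads over all trains running on a section during a given hour gives an hourly occupancy per section, which is the kind of table shown in Figure 5.2.4.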
For the subway lines, origin-destination surveys are used. Questions are asked to travellers
while they are waiting on platforms. The results are scaled to the total number of passengers
(who cannot all be questioned) and interpolated in half-hour steps, to obtain the status of the network
at almost every time of the day. The database is called TJRF and is available as a Microsoft
Access file which contains a relational database. These data are not complete: they are not
available for weekends or for late evenings (after 10 pm).
Figure 5.2.3: Extract from the counting data file of the RER A, 2018
Figure 5.2.4: Extract after treatment of the counting data file of the RER A, occupancy of
each section per hour
Validation Data
Validation data1 [STIF, 2010] comes from the terminals installed on the public transport
network at entrances and sometimes at exits, whereas counting data come from manual
counts carried out over one or more days on a specific line. This data is therefore easily accessible
but requires some additional processing, as we will see later. An online private platform,
Cognos, presents aggregated data on a quarter-hourly basis, for each station of the network,
with the possibility of filtering on validation days (excluding vacations, public holidays, etc.).
This platform is more than sufficient, but it is important to know that there is another one,
called Système d’Information des Données de Validation (Validation Data Information System)
(SIDV), which provides quasi-raw data. This one requires special authorizations, because it
makes it possible to follow all the trips of an identifier (although anonymized). Data from the SIDV
are aggregated to be more easily accessible on Cognos. A priori, the degree of precision of the
SIDV does not seem necessary in our case.
The idea is to compare validation data between a normal period of 2019
and the summer of 2019. In order to reduce statistical bias, one suggestion is to use the data
of three consecutive summers: 2017, 2018 and 2019. However, it is difficult to compare
years with each other because of maintenance work. In addition, in Île-de-France, the platforms of
some stations are accessible without any obligation to validate, which also distorts the data:
as the years go by, these stations are "closed" and it becomes mandatory to validate
to access the platform. A statistical analysis was suggested to determine the period or
the days of 2019 most representative of the average. Similarly, another statistical analysis can
be conducted to compare summer days with each other and look for similarities,
in order to reduce the number of calculations.
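The comparison between a normal period and the summer can be sketched as a simple ratio per station. This is only an illustration: the function name, the use of a plain mean of daily validations and the values are our own assumptions, not the statistical analysis that was suggested.

```python
from statistics import mean


def summer_ratio(normal_days, summer_days):
    """Ratio of average summer validations to average normal validations.

    normal_days, summer_days: dict station -> list of daily validation counts.
    Only stations present in both inputs with a positive normal average
    are returned.
    """
    ratios = {}
    for station, normal_counts in normal_days.items():
        summer_counts = summer_days.get(station)
        if not summer_counts or not normal_counts:
            continue  # no data for one of the two periods
        normal_average = mean(normal_counts)
        if normal_average > 0:
            ratios[station] = mean(summer_counts) / normal_average
    return ratios


# Hypothetical daily validations for one station.
normal = {"X": [1000, 1100, 900]}
summer = {"X": [600, 700, 650], "Y": [50, 60]}
ratios = summer_ratio(normal, summer)
```

Pooling the days of several summers in `summer_days` is one way to use the three consecutive summers, although stations that were "closed" between those years would still need to be handled separately.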
5.2.3 Common data

Finally, other data are required as an input to the ANTONIN model, as seen in the Literature
review:
1 General presentation available on the IDFM website, in French.
• Societal data called P+E, which provide detailed information for each ANTONIN
zone. These data are produced by Institut Paris-Région (IPR)2 each year for the whole
region, generally with forecasts up to 2025 and 2035. This dataset provides, for each zone,
information on the population (age and gender), jobs and study places.
• Transport offer. Lines available now and in the summer of 2024, frequencies, travel
times, etc. These data are commonly used and updated by the MEP department, and
simple tools generate the .lin file that contains the whole transport offer and is read by
CUBE. There is one .lin file for each mode of transport, which contains all the missions of
each line, with their frequency, their stations and the travel time between consecutive stations.
Each station is coded by its unique ID, which can also be found in the network file.
• A network representation. This is a .net file, internal to the MEP department and
called carwaxs. This file contains all the geographic information of the model, namely:
– the centroids of the 1805 and more ANTONIN zones
– the location of every metro, train, tramway and bus station, with its accessibility in
terms of car parking and access penalty
– the road network nodes
– road segments
– public transport segments between stations
– segments connecting the centroids to the road and public transport networks.
This file can be easily visualized and modified in CUBE.
Figure 5.2.5: Extract from the metro.lin file and details for the Metro 1, Vincennes (Vin) - La
Défense (Def)
2 Paris-Region Institute, which conducts studies, surveys and research for the purpose of development and urban
planning in the Ile-de-France region.
Figure 5.2.6: Screenshot of the visualization of carwax.net

Metro stations in red, and commuter train stations in blue
A question arises as to which data to use for 2024, namely which P+E dataset. As
a first approximation, given the few months that separate the Olympics from
the year 2025, the dataset produced by the IPR for January 1, 2025 might be used. However, the
most recent available dataset dates from 2019: it was forecast before the Covid-19
pandemic and is therefore likely to be overestimated. At present, no forecast taking
these recent events into account is available and, above all, no one is able to describe the impact that they
will have on the transportation field in the long term.
5.3 Consideration of future available data

As seen in the Literature Review, the available data is evolving and will continue to evolve,
whether for political, economic, or even societal reasons. In addition, the Paris 2024 project is
progressing as the organizers go along, and the concepts for each venue are becoming
clearer.
One of the key points of this model is the ability to quickly take into account new assumptions
or updates as soon as they are published, in order to be able to produce new analyses and
notify the OCOG so that they can review their proposal in case of a major problem. Close,
long-term collaboration is necessary to convince them of the usefulness of the IDFM model, so that
they agree to share their most recent information with IDFM, even if it only consists of first
working hypotheses.
5.4 Set up and tools used

Since ANTONIN is based on the software CUBE, it was almost mandatory to use this
software for the Olympic Games modeling, in order to reuse some parts of ANTONIN in this new
model. In short, CUBE makes it possible to link boxes together. A box represents one step of the
model. A box can itself be decomposed into other boxes or be a primary box, which contains
a "minor" step. For each box, input and output files are indicated. In addition, CUBE allows
efficient management of different scenarios.
The OG model is very particular compared to other models. A lot of loops must be used, and
Python scripts must be included, which is often not easy. The creation
of the different boxes is a very important stage, because it is necessary to anticipate the fact that the
model may become more complex, so that intermediate or additional boxes will be added. In
some cases, it is indeed very complicated to make changes afterwards.
Figure 5.4.1: Example of the inside of a CUBE box
In addition to CUBE, the main software tools used are Eclipse3, to write and test the Python scripts,
and Excel, for input and output data.
To summarize, like ANTONIN, the OG model will be based on the software CUBE, but will also
include different Python scripts, because the singularity of the model requires processing that is
impossible to carry out directly with the native tools of CUBE, for instance the trip destination
and distribution steps, or the counting data processing.
3 Integrated Development Environment (IDE).
Chapter 6
Analysis
The IT development phase of the model is still in progress and is unfortunately not
finished at the end of my Master Thesis. Therefore, no results can be released at the moment,
but this does not preclude an analysis of the suggested methodology. As explained earlier, this
methodology is still evolving, but its main structure remains the same. We
can therefore examine the strengths and weaknesses of this new model.
6.1 Strengths
Compared to the existing OG model (cf. 4.2), several strengths and improvements can be
noted. The following description goes from the general to the more detailed.
From a modelling point of view, this model is in line with current practice
in transportation modeling. Indeed, the new OG-model is strongly based on the
ANTONIN model, developed for over 20 years on strong theoretical principles and specially
dedicated to the particularities of the Île-de-France region, which supports the solidity of the results.
Thus, the ANTONIN model gains importance. Although ANTONIN was already used to some
extent, and in a not very precise way, for the background demand in the existing OG model,
it will now be used for both parts to its fullest potential.
In general, the four-step model structure is better respected. In particular, the Trip
Generation step of the OG part is much stronger, because there is a genuine interest in the origin
of participants. As a reminder, in the existing OG model, participants were considered
to come from central Paris only. There were no details about their distribution within Paris itself,
which can be problematic since there are also events in the center of Paris.
In terms of mega-event modeling, the model is not innovative, since it is based on a principle
well known from previous Olympics, namely the distinction between the OG-related
trips and the background demand, which structures the model. But this also ensures a certain
consistency, since this method has already been proven.
From an IT perspective, the model is very flexible, which was one of its initial objectives.
CUBE, and above all Python, allow much greater flexibility than Excel. With Excel,
when an assumption changes or a step has to be made more complex, all the formulas
have to be reviewed. With Python, it is sufficient to add a function or a few lines of code.
As explained earlier, a CUBE workflow is represented by a sequence of "boxes", numbered in
the order of execution, each with a specific role: this can be considered as a set of nested boxes.
To refine a step, it is therefore sufficient to add one or more boxes inside a parent box. In some cases,
however, this is not possible because of insufficient theoretical reflection during the design phase. If
the steps that will surely require improvements can be identified in advance, boxes or
functions that seem superfluous at the beginning can be planned for and enriched later on. It is
then easier to add precision or an extra step in CUBE, which will be a frequent need until the
Olympics.
In addition, CUBE makes it easy to manage different scenarios, which was more difficult with
Excel, where adding scenarios made the file heavy. Visually, CUBE
provides a much better understanding of the entire process than the multitude of Excel
tabs. Furthermore, the entire railway network is now studied, and not only the lines directly
concerned by the Olympic venues. CUBE also has the advantage of offering a fast visualization
tool (based on a geographical representation of the network), which will be useful for the
outputs.
6.2 Limits and weaknesses

On the other hand, this model still has some limits which are important to keep in mind:
• For the OG demand:
– The Trip Distribution step is very dependent on the origin of the spectators, which
is still very unclear. The choice of the method for this step is discussed later (cf.
7.1.1). The Assignment step is therefore also affected, as well as the lines used. It will
be difficult to improve these hypotheses until the first information on ticket sales
(2023), which will provide significant information about the origin of spectators.
• For the background demand:
– The whole process is based on a single count for each line in the scope.
Of course, the days on which counts are made are not randomly selected, but it
is difficult to know how representative these days are. In addition, even a short
operational incident can partially skew the numbers. Moreover, a count is made only every
4 years for each line, which limits the sample and the comparisons, since the network
evolves in the meantime.
– In addition, in the rest of the process, validation data from previous summers are
used, but these are also very fragile. Unfortunately, in the short term it will be difficult to
consolidate them, since the current data do not represent the usual situation.
• In general:
– The model is based on ANTONIN for its two parts. The ANTONIN model is a very
sophisticated model with several parameters. It is specially calibrated
to operate in the Ile-de-France region, but there is a risk of it being used as a black
box. In our case, ANTONIN is going to be used in a special mode that is almost
never used, i.e. assignment over the whole day and not over the morning peak period as usual.
– Managing the different versions of the model is likely to be a bit complex, especially
with CUBE only. The CUBE representation is complex since there are loops, and it may
become quite difficult to test a new piece of code.
– Currently, it is difficult to know which parameters will have the most influence on
the results. Of course, it seems logical that venue capacities have a great influence,
because they determine the number of trips, but what about the distribution of
stakeholders in Île-de-France or a one-hour shift of a sport session? Nothing is
known about this yet, which is why sensitivity tests should be carried out.
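A very simple form of such a sensitivity test is to change one parameter at a time and measure the relative change of an output indicator. The sketch below is generic and hypothetical: `toy_peak_load` is a stand-in written for this example and has nothing to do with the actual OG model, and all parameter names and values are illustrative.

```python
def one_at_a_time(model, baseline, variations):
    """Relative change of the model output when one parameter is changed.

    model: callable taking a dict of parameters and returning a number
    baseline: dict of baseline parameter values
    variations: dict parameter name -> alternative value to test
    Returns a dict parameter name -> (new output - base output) / base output.
    """
    base_output = model(baseline)
    effects = {}
    for name, value in variations.items():
        params = dict(baseline)  # copy so the baseline is left untouched
        params[name] = value
        effects[name] = (model(params) - base_output) / base_output
    return effects


def toy_peak_load(params):
    """Stand-in indicator: public transport arrivals in the busiest hour."""
    return params["capacity"] * params["pt_share"] * params["peak_share"]


# Hypothetical baseline and variations.
baseline = {"capacity": 80000, "pt_share": 0.9, "peak_share": 0.5}
effects = one_at_a_time(
    toy_peak_load, baseline, {"capacity": 88000, "peak_share": 0.4}
)
```

One-at-a-time tests ignore interactions between parameters, so they only give a first ranking of the inputs that deserve the most attention.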
Chapter 7
Discussions
The development of this OG modeling methodology raises some issues
specific to this project that are worth discussing.
7.1 Further work

As explained earlier, this model is still in the development phase and will certainly remain so
until 2024, because of the unusual nature of the project to be modeled, which is constantly evolving
and whose data is validated as it goes along. Input assumptions will change until the end,
because the information available will become more and more precise once the OCOG
has definitively validated certain points, such as the schedule and the volume of each category of
stakeholder for each site, and once the first ticketing data are published (2023). The pitfall
is that the more information we have, the closer we are to the date of the event, and therefore
the less influence IDFM has.
So it is important to produce a model that works and then build on it over the months and
years. The following sections present some significant improvements that have been identified.
7.1.1 OG-demand
Input data
On the OG-demand side, the most significant improvements are expected at the level of the
input data. These inputs are based on assumptions made and given by the OCOG. These are
still assumptions because the organisers have not yet reached this level of accuracy and cannot
give us precise values or data. In some cases, it is up to IDFM to suggest values that seem
coherent with what is known, and then to submit them to the OCOG for their opinion.
In addition, it is very difficult to forecast the level of detail the OCOG will be able to provide
in the future. This may require a large amount of work, depending on the changes to be made.
Nevertheless, each improvement will lead to a better representation of the situation at the Olympic
Games, which is why it is important to take them into account.
Trip Generation
Significant improvements are also expected in the Trip Generation step of the model, especially
regarding the origin of stakeholders. Indeed, the attraction part, i.e. everything that concerns the venues
and the sport sessions, seems to be under control and depends on the OCOG. On the
other hand, as long as ticketing data are not available (and they will not be until 2023), only limited
information is available about the final origin of the stakeholders (i.e. within Île-de-France, on the day
of the event). So, one step consists in allocating people and another consists in
defining distribution laws to determine who goes where. This distribution is not neutral, as it
will influence the assignment on the public transport network, which is the next step.
As explained in the model methodology (cf. 4.5.3), a significant simplification has been chosen
for the trip distribution step, i.e. a homothetic distribution. Although it is certain
that a homothetic distribution guarantees short calculation times compared to
a gravity model (cf. 2.1.2) or a logit model (cf. 2.1.3), it is still quite difficult to determine to what
extent this choice affects the overall performance and results of the proposed
model.
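For illustration, we assume here that "homothetic" means that the trips attracted by a venue are split between origin zones in proportion to a fixed weight of each zone (for example its accommodation capacity), independently of the distance to the venue; the exact definition is the one given in 4.5.3. The function name, the zone names and the weights below are hypothetical.

```python
def proportional_distribution(venue_trips, zone_weights):
    """Split the trips to one venue between origin zones.

    venue_trips: total number of trips attracted by the venue
    zone_weights: dict zone -> non-negative weight (e.g. number of beds)
    Each zone receives a share equal to its weight divided by the total
    weight, whatever its distance to the venue.
    """
    total_weight = sum(zone_weights.values())
    if total_weight <= 0:
        raise ValueError("the total weight must be positive")
    return {
        zone: venue_trips * weight / total_weight
        for zone, weight in zone_weights.items()
    }


# Hypothetical example: 1000 trips, two origin zones.
trips = proportional_distribution(1000, {"Z1": 3, "Z2": 1})
```

Under this assumption the shares are computed once and reused for every venue, which is where the short calculation time comes from; a gravity or logit model would instead need a travel cost for every zone-venue pair.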
Indeed, unlike in a classic demand model, we are in a situation where stakeholders,
and in particular spectators, may choose their origin (i.e. their accommodation)
depending on the venue for which they have tickets. This must be qualified: most spectators
have on average several tickets for different events, not necessarily taking place at the same
place. So it would be logical for them to try to stay in the middle (i.e. in Paris itself) rather than
near one of the venues. Moreover, this is only true for spectators coming from abroad or from
distant regions. Local spectators will come from their homes, whose location is not related
to the events they want to attend. But it can also be considered that some people will want
tickets for the venue closest to their home, regardless of the sport offered. Little information is
available on this topic.
For all these reasons, it is difficult to choose the most suitable method for this step. Thus, it
seems reasonable to begin with the simplest method, and then to make it more complex
when information becomes available, or if it becomes necessary to refine the results. Again, it seems
difficult to obtain more details before the ticketing information is available, but the choice of
the distribution could depend on the type of stakeholders as well as their first origin (foreigners
or locals).
Transport
We can also expect future changes on the transport side. Some future projects that are
supposed to be finished before the Olympic Games may see their opening date delayed, which
can strongly impact some venues. For example, in the previous models, the future subway lines
16 and 17 were taken into account to serve the venues located in the north of Paris, such as
Le Bourget. In the end, these lines are unlikely to be put into service before the OG. The
same applies to the extension of line 14, particularly to the north, which would be very useful in
serving the Stade de France (with a capacity of just under 80,000 spectators),
the Olympic Aquatic Center and the Olympic Village.
This also concerns the infrastructures dedicated to pedestrians. Indeed, while it may be easy
to reach the public transport station nearest to the site, it is not necessarily easy to walk to the
site entrance, as the infrastructures are not designed to accommodate such large flows. Some
projects are underway, but depending on their completion and their state of progress
in 2024, some accesses that had been identified may be called into question.
These changes must be taken into account in the numerical representation of both the public
transport network and the road network, and will thus strongly influence future modeling runs.

7.1.2 Background demand
The background demand may be the part that will be most significantly improved before the Olympic
Games. Indeed, IDFM's short-term objective is to produce first results for the summer of 2021.
In reality, as explained earlier, it is difficult to get information on usual network use during
the summer, especially at the scale of individual lines, i.e. more precise information
than what is available for the entire network. In addition, it seems that Olympic years induce
changes (more or fewer foreign or local tourists), although these are difficult to determine well in
advance.
Cooperation with operators
Discussions with operators such as the RATP and the SNCF have resulted in various
solutions, more or less time-consuming, to improve the methodology presented. For example,
some methodologies or data for particular lines will be collected to assess the impact
of future works. In addition, some lines also have automatic counters, which therefore operate
in summer.
Another benefit is the possibility of comparing the results provided by ANTONIN
3 with those of the RATP model (called Global) and the SNCF model (called ARES). This would
make it possible to check the consistency of the forecasts for the transition from the current period to 2024, particularly for
the new lines (such as M14) and the lines used extensively during the Olympic Games.
But until the work has been done for the first time, it is difficult to evaluate which of
the suggested solutions are the most interesting and can provide more precise results in an
acceptable time frame. Indeed, some suggestions seem to have great potential, but it would
be a shame to spend a month implementing them for very little added value in the end.
Transport
As for the OG-demand, future changes on the transport network will impact modeling for the
2024 horizon. In addition, the whole process is based on "field" data (validation and counting
data). Normally, these data are updated over time with the data of the following years.
But, because of the current situation, no counting was carried out in 2020, and the validation
data are not usable either. This will still be the case in 2021. Therefore, we must work with
data from 2019 or earlier for now, in the hope that they can be updated in 2022 or 2023.
Better knowledge of the use of the public transport network during the summer seems necessary, and studies
can certainly be conducted in 2022 and 2023, assuming that we return to a post-covid situation
and that public transport ridership goes back to what it was before.
7.1.3 General
The impact of the covid-19 pandemic has so far been given little consideration. The more time passes, the
more data and ideas about covid-induced behavioral changes will become available. However,
two types of changes that might have an impact on the use of public transportation can already
be mentioned:
1. On an individual scale. Behavioral changes may occur, such as an increase in remote
working, which, in the long term, does not necessarily mean less travel, but possibly a
smoothing of peak hours, for example.
2. On a global scale. A decrease in the number of jobs is expected, as well as fewer real
estate projects, less travel and a population that is not growing as fast as previously expected,
etc.
The first point concerns changes that are detectable within the day and that can be observed
thanks to surveys, counting data and validation data.
The second point, on the other hand, concerns more general changes, which have an impact on travel
behavior, in terms of global daily volume for example. All this will be reflected in future
datasets such as population and employment data (the so-called P+E), which constitute an input
of the model.
But for now, it is difficult to take these changes into account. Scientific literature is lacking,
since this is a very recent phenomenon. The main issues about transport modeling in the context of
covid-19 are summarized in a post by Willumsen on LinkedIn1. We can therefore expect to
have to adapt the OG model to take all this into account, possibly in 2022 or 2023.
7.2 On the importance of data

The description of the model and of the available data makes it clear that the limiting
factor is the data. Input data strongly influence the results, perhaps more than the
choice of the individual models that make up the whole process.
On the one hand, data are currently missing for both parts of the model. As explained earlier,
not all the data required to model the OG concept are known and fixed yet. On the demand side,
the completeness of the data is not guaranteed for the summer period. On the other hand, and
perhaps most importantly, the data that will become available in the future are not known
either. This makes our model and its IT tool more complex to design, because the future data
formats are not defined. Cells cannot simply be left empty on the assumption that they will be
filled in later. It may turn out that the new formats are more or less precise than expected,
which means that scripts have to be rebuilt. Hence the importance of data for this project, and
the need to think upstream about what is needed or will be needed. Otherwise, the tool behind
the model will have to be redone dozens of times as new data become available, which is a
waste of time that should be avoided.
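One possible way to limit such rework is to isolate each incoming data format behind a small adapter that converts it to a single internal record. The sketch below only illustrates the idea: the field names, the format labels "v1" and "v2" and the `SessionRecord` type are all hypothetical and do not describe actual OCOG deliveries or the IDFM tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional


@dataclass(frozen=True)
class SessionRecord:
    """Internal record used by downstream scripts (hypothetical fields)."""
    venue_id: str
    session_start: str          # kept as text in this sketch
    spectators: Optional[int]   # None means "not delivered yet", never 0


def from_v1(row: Mapping[str, str]) -> SessionRecord:
    # Hypothetical early format: no attendance figure is delivered at all.
    return SessionRecord(row["venue"], row["start"], None)


def from_v2(row: Mapping[str, str]) -> SessionRecord:
    # Hypothetical later format: renamed columns, attendance may be blank.
    raw = row.get("expected_spectators", "").strip()
    return SessionRecord(
        row["venue_code"], row["start_time"], int(raw) if raw else None
    )


# Supporting a new delivery format means adding one adapter here;
# the scripts that consume SessionRecord do not change.
ADAPTERS: Dict[str, Callable[[Mapping[str, str]], SessionRecord]] = {
    "v1": from_v1,
    "v2": from_v2,
}


def load(row: Mapping[str, str], fmt: str) -> SessionRecord:
    """Convert one raw row in format `fmt` to the internal record."""
    try:
        adapter = ADAPTERS[fmt]
    except KeyError:
        raise ValueError(f"unknown data format: {fmt!r}") from None
    return adapter(row)
```

Representing a missing value explicitly (`None`) rather than as an empty cell or a zero forces every downstream step to decide what to do when the figure has not been delivered yet.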
In addition, this lack of data makes it necessary to test different scenarios, in order to try
different values for some parameters.
Thus, as the data improve, the results presented will become more and more precise.
However, this depends on data collection, which is not necessarily in the hands of IDFM,
especially as far as the OG are concerned.
¹ Has covid killed our transport models? by Luis Willumsen, January 4, 2021, available at
https://www.linkedin.com/pulse/has-covid-killed-our-transport-models-luis-pilo-willumsen
7.3 From designing a model to project management

Modeling trips for such an event is ultimately not only a matter of good IT development. Indeed,
it is very important to work in collaboration with all the stakeholders, both those around the
Olympic Games, such as the OCOG, and those of the Paris public transportation sector, i.e.
the operators and the IDFM entities other than MEP. Producing the best methodology is
necessary but, at the same time, the model needs to be useful. For this purpose, inputs and
outputs are two major points that are often neglected. It is not about producing results for the
sake of producing results.
In our case, the inputs come from the OCOG for everything that concerns the OG, and from the
operators for the transport part (traffic data). As for the outputs, who will use the results of
this model? First, the OCOG, which will use them to keep improving its preparation
of the Olympics, regarding scheduling but also access to the venues once participants have left
the transit system (called Local Transport Demand). Second, the transport operators (notably
SNCF and RATP), because they need these data to design their transportation
plans for the Games. This is why the output format of the OG model must be intelligible
to everyone, and this is an important objective to keep in mind during the design phase of the
model.
Finally, this model cannot be designed in isolation. Since other stakeholders are involved in
the inputs and outputs, it is the responsibility of IDFM to convince them of the usefulness and
relevance of this model, to ensure effective collaboration and acceptance of the results. This
de facto requires project management with all the stakeholders.
7.4 Possible reuses

This model, like its future IT tool, has been designed for the Olympic Games. But it
might be reused for any other mega-event, or even any event, taking place in the Île-de-France
region for which IDFM would like to estimate the impact on the public transportation network.
Such events are numerous around Paris: concerts, sports events, major trade shows, etc.
Indeed, it is enough to know some basic information such as the location of the venue(s), the
schedule and the expected volumes. Then, the same methodology might be applied. The simpler
the event (single venue, single session), the simpler the process will be.
Of course, all the events mentioned are less complex than the OG in terms of duration and of
the multiplicity of sites and stakeholders. For them, this model could therefore prove to be a
real white elephant, far too complicated, but it constitutes a very complete starting point.
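The basic information listed above (venue locations, schedule, expected volumes) could be captured in a small input structure such as the following sketch. All class and field names are hypothetical and chosen for this illustration only; the actual input format of the IDFM tool is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Session:
    venue: str                 # key into Event.venue_locations
    start_hour: int            # 0..23, simplified to whole hours
    end_hour: int              # 1..24, must be after start_hour
    expected_spectators: int


@dataclass(frozen=True)
class Event:
    name: str
    venue_locations: Dict[str, Tuple[float, float]]  # venue -> (lat, lon)
    sessions: List[Session]

    def validate(self) -> None:
        """Raise ValueError if the description is inconsistent."""
        for s in self.sessions:
            if s.venue not in self.venue_locations:
                raise ValueError(f"session refers to unknown venue {s.venue!r}")
            if not 0 <= s.start_hour < s.end_hour <= 24:
                raise ValueError(f"invalid schedule for venue {s.venue!r}")
            if s.expected_spectators < 0:
                raise ValueError("expected_spectators must be non-negative")

    def spectators_by_venue(self) -> Dict[str, int]:
        """Total expected spectators per venue over all sessions."""
        totals: Dict[str, int] = {}
        for s in self.sessions:
            totals[s.venue] = totals.get(s.venue, 0) + s.expected_spectators
        return totals
```

A single concert would then be one `Event` with one venue and one `Session`, whereas the OG would contain many of each, which reflects the difference in complexity mentioned above.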
7.5 General conclusion

This Master Thesis presents a methodology (cf. Chapter 4) to model public transit travel
flows during the Paris 2024 Olympic and Paralympic Games. As for previous Olympic
Games, this model consists of two distinct parts: the background demand (cf. 4.6) and the
OG-demand (cf. 4.5). The OG-demand is based on a four-step principle with up-to-date
data from the OCOG, whereas the background demand relies on traffic data. Both parts use the
IDFM transport model, ANTONIN 3 (cf. 3).
One of the difficulties encountered is that the concept of this mega-event is still in
the planning stage. In the meantime, it makes sense to design and build the model, in order to
be able to warn the OCOG if its proposals are not feasible from a transit service
perspective. The interest also lies in the fact that this model will be useful to create a
Transport Plan for the OG to be presented to the IOC, in collaboration with transport operators.
The complexity of such a mega-event makes it necessary to work in collaboration with several
stakeholders: the OCOG, in order to be constantly aware of updates and to make sure we properly
understand how the Olympic Games work, and the operators, which have a detailed knowledge
of the network and can determine what can and cannot be done to adapt the transport
supply.
Several improvements (cf. 7.1) can still be made before the Olympics, but they depend on the
data delivered by the OCOG on the one hand and collected on the public transport
network on the other hand.
Finally, this methodology might be reused by others for the next Olympic Games,
although it is based on the specific data available for Paris and its region (such as counting
data, validation data and the ANTONIN 3 model). For IDFM, this model can be reused for
other major events happening in the Île-de-France region with a few adjustments, because these
events will always be less complex than the Games.
7.6 Future Work

Now that the methodology has been defined, the design phase of the model is in progress
and consists of a lot of IT development. It should be kept in mind that this methodology will
evolve as the input data change. But the short-term objective is to complete this design
phase and produce first results by this summer. Indeed, the model could be refined until
2024 and the Olympics without ever producing results, because it is always possible to go into
more detail. It is therefore better to have results with biases we are aware of (since the
input data are also biased) than nothing at all.
We then expect that these results will have to be updated according to new information and
assumptions. These initial results will indicate which steps of the model have to be
improved and which are sufficiently reliable, and will allow some first feedback to the OCOG so
that it can decide whether or not to revise its concept.
Bibliography
[Agence de l’environnement et de la maîtrise de l’énergie (ADEME), 2018] Agence de
l’environnement et de la maîtrise de l’énergie (ADEME) (2018). Mobilités et transport :
chiffres-clés (Mobility and transport: key figures).
[Alter, 1976] Alter, C. H. (1976). Evaluation of Public Transit Services: The Level-of-Service
Concept. Transportation Research Record, pages 37–40.
[Anderson, 2010] Anderson, J. E. (2010). The Gravity Model. Working Paper 16576, National
Bureau of Economic Research.
[Ben-Akiva, 1973] Ben-Akiva, M. E. (1973). Structure of passenger travel demand models. PhD
thesis, Massachusetts Institute of Technology. Department of Civil Engineering.
[Cole, 2005] Cole, S. (2005). Applied Transport Economics: Policy, Management and Decision
Making, 3rd edition. Kogan Page.
[Currie and Shalaby, 2012] Currie, G. and Shalaby, A. (2012). Synthesis of Transport Planning
Approaches for the World’s Largest Events. Transport Reviews, 32(1):113–136.
[Daly, 1987] Daly, A. (1987). Estimating “tree” logit models. Transportation Research Part B:
Methodological, 21(4):251–267.
[Daly et al., 2005] Daly, A., Fox, J., and Tuinenga, J. (2005). Pivot-Point Procedures in
Practical Travel Demand Forecasting.
[Daly and Zachary, 1978] Daly, A. and Zachary, S. (1978). Improved multiple choice models.
Determinants of Travel Choice, pages 335–357.
[Debrincat and Meret-Conti, 2016] Debrincat, L. and Meret-Conti, A.-E. (2016). Modelling
the Growing Use of Public Transport in Île-de-France: How ANTONIN 3 Addresses the
Challenge. Transportation Research Procedia, 14:283–292.
[Donsunmu, 2012] Donsunmu, B. (2012). Delivering London 2012: Transport demand
forecasting. Transport, 165:257–266.
[DRIEA and STIF, 2014] DRIEA and STIF (2014). EGT 2010 : Plaquette générale.
http://www.omnil.fr/IMG/pdf/egt2010_enquete_globale_transports_-_2010.pdf.
[DRIEA and STIF, 2017] DRIEA and STIF (2017). EGT 2010 : Résultats détaillés.
http://www.omnil.fr/IMG/pdf/-4.pdf.
[Erlander and Stewart, 1990] Erlander, S. and Stewart, N. F. (1990). The Gravity Model in
Transportation Analysis: Theory and Extensions. CRC Press.
[Fishburn, 1982] Fishburn, P. C. (1982). The Foundations of Expected Utility, chapter 1 -
Introduction, pages 1–8. Springer.
[Flyvbjerg et al., 2016] Flyvbjerg, B., Stewart, A., and Budzier, A. (2016). The Oxford Olympics
Study 2016: Cost and Cost Overrun at the Games. Saïd Business School Working Papers.
[Frantzeskakis and Frantzeskakis, 2006] Frantzeskakis, J. M. and Frantzeskakis, M. J. (2006).
Athens 2004 Olympic Games: Transportation Planning, Simulation and Traffic Management.
ITE Journal, 76:26–32.
[Gunn et al., 1998] Gunn, H., Tuinenga, J., Allouche, J., and Debrincat, L. (1998). ANTONIN:
a forecasting model for travel demand in the Île-de-France. In European Transport Conference
1998. Association for European Transport.
[International Energy Agency, 2020] International Energy Agency (2020). Tracking Transport
2020.
[Jiang, 2008] Jiang, Y. (2008). Analysis on Beijing Subway Flows during the 29th Olympics.
Journal of Transportation Systems Engineering and Information Technology, 8:45–51.
[Joshi et al., 2009] Joshi, C., Darwent, C., and Giese, K. (2009). Transportation Modeling for
the 2010 Winter Olympic Games. 2009 Annual Conference of the Transportation Association
of Canada Vancouver, British Columbia.
[Karlaftis et al., 2004] Karlaftis, M. G., Kepaptsoglou, K., Stathopoulos, A., and Starra, A.
(2004). Planning Public Transport Networks for the 2004 Summer Olympics with Decision
Support Systems. Transportation Research Record: Journal of the Transportation Research
Board, 1887:71–82.
[Kassens-Noor, 2013] Kassens-Noor, E. (2013). Managing Transport during the Olympic
Games. In Managing the Olympics, chapter 8, pages 127–146. Palgrave Macmillan.
[Liu et al., 2008] Liu, X., Guo, J., and Sun, Z. (2008). Traffic Operation with Comments during
Beijing Olympic Games. Journal of Transportation Systems Engineering and Information
Technology, 8:16–24.
[McFadden, 1974] McFadden, D. (1974). Conditional logit analysis of qualitative choice
behavior. In Frontiers in Econometrics, pages 105–142. Academic Press, New York.
[McFadden, 1977] McFadden, D. (1977). Modelling the Choice of Residential Location. Cowles
Foundation Discussion Papers 477, Cowles Foundation for Research in Economics, Yale
University.
[McNally, 2007] McNally, M. G. (2007). The Four-Step Model. In Handbook of Transport
Modelling (2nd Edition), chapter 3, pages 35–53. Elsevier Science Ltd.
[Müller, 2015] Müller, M. (2015). What makes an event a mega-event? Definitions and sizes.
Leisure Studies, 34(6):627–642.
[Ortúzar and Willumsen, 2011] Ortúzar, J. and Willumsen, L. G. (2011). Modelling Transport,
4th Edition. Wiley.
[Ravenstein, 1889] Ravenstein, E. G. (1889). The Laws of Migration. Journal of the Royal
Statistical Society, 52(2):241–305.
[Robbins et al., 2007] Robbins, D., Dickinson, J., and Calver, S. (2007). Planning Transport
for Special Events: A Conceptual Framework and Future Agenda for Research. International
Journal of Tourism Research, 9:303–314.
[STIF, 2010] STIF (2010). Télébilletique : La validation au service de la décision. Syndicat
des Transports d’Île-de-France (former IDFM’s name).
[Train, 2009] Train, K. (2009). Discrete Choice Methods With Simulation. Cambridge
University Press.
[Tuinenga et al., 2015] Tuinenga, J. G., Meret-Conti, A.-E., and Pauget, N. (2015). ANTONIN
3, an up-to-date transportation planning model for the Île-de-France region. In European
Transport Conference 2015. Association for European Transport.
[Tuinenga et al., 2006] Tuinenga, J. G., Pieters, M., and Debrincat, L. (2006). ANTONIN:
Updating and Comparing a Transport Model for the Paris Region. In European Transport
Conference 2006, pages 99–121. Association for European Transport.
[van Wee et al., 2013] van Wee, B., Annema, J. A., and Banister, D. (2013). The Transport
System and Transport Policy. Edward Elgar Publishing.
[WCED, 1987] WCED (1987). Our Common Future. Oxford University Press.
[Williams, 1977] Williams, H. C. W. L. (1977). On the Formation of Travel Demand Models
and Economic Evaluation Measures of User Benefit. Environment and Planning A: Economy
and Space, 9(3):285–344.
[Yan et al., 2010] Yan, L. C., Yang, S. S., and Fu, G. J. (2010). Travel Demand Model for
Beijing 2008 Olympic Games. Journal of Transportation Engineering, 136:537–544.
TRITA ABE-MBT-21438
www.kth.se
