Guanjie Zheng†, Yuanhao Xiong‡, Xinshi Zang§, Jie Feng¶, Hua Wei†,
Huichu Zhang§, Yong Li¶, Kai Xu⊺, Zhenhui Li†
† The Pennsylvania State University, ‡ Zhejiang University, § Shanghai Jiao Tong University,
¶ Tsinghua University, ⊺ Shanghai Tianrang Intelligent Technology Co., Ltd
† {gjz5038, hzw77, jessieli}@ist.psu.edu, ‡ xiongyh@zju.edu.cn, § zang-xs@foxmail.com, ¶ feng-j16@mails.tsinghua.edu.cn,
§ zhc@apex.sjtu.edu.cn, ¶ liyong07@tsinghua.edu.cn, ⊺ kai.xu@tianrang-inc.com
[Figure: signal conflict matrix for the traffic movements (1-8) and competing matrix for the phases (A-H) of an intersection; the annotated example is the left-turn movement on the east entering approach.]
[Figure: phase competition network diagram (input, 1x1 conv, sum, output), shown for Phase A.]
‘0’ (red). As shown in Figure 4, our network first obtains a representation for a phase from features of two traffic movements with $2^2 \times n^2$ possible combinations. Then two phases are paired together to compete, and the model of phase competition is shared among all pairs. In this way, to regress Q values for all eight actions, the model is required to observe only $(2^2 \times n^2) \times (2^2 \times n^2) = 16 \times n^4$ samples, a significant decrease in comparison with $64 \times n^8$ as in DRL [32] and IntelliLight [38]. In Section 5, we further conduct extensive experiments to illustrate that FRAP converges faster and to better solutions, as the enhanced sample efficiency compensates for the explosion of state space.

Adaptation to different environments. As we focus on the universal principles of phase competition and invariance throughout our model design, the FRAP model can be applied to different traffic conditions (i.e., small, medium and large traffic volumes), different traffic signal settings (e.g., 4-phase, 8-phase), and different road structures (e.g., 3-, 4-, and 5-approach intersections as shown in Figure 6, and intersections with a variable number of lanes on each traffic movement). Further, FRAP can learn from one environment and transfer to another one with high accuracy without any additional training. We further illustrate this in the experiments.

Applications in multi-intersection environments. Though our discussion so far has focused only on single intersections, FRAP makes fundamental contributions to city-wide traffic signal control, as a good learning model at a single intersection is the base unit even at the scale of city-wide traffic signal control. In addition, we demonstrate that FRAP works well in the multi-intersection environment.

Figure 5: Network detail of FRAP. The code will be accessible on the authors' website.
  Stage                       Layers (output size)
  Phase demand modeling       Embedding(1) → 4, Embedding(2) → 4; Concat(4, 4) → 8; Embedding(8) → 16; Add(16, 16) → 16
  Phase pair representation   Concat(16, 16) over the 8×7 phase pairs; Conv-20 1×1
  Phase competition mask      Phase relation map: 2×8×7; Embedding(2×8×7) → 4×8×7; Conv-20 1×1
  Phase pair competition      Multiply(20×8×7, 20×8×7) → 20×8×7; Conv-20 1×1; Conv-1 1×1; Sum(8×7) → 8×1
  Output

Figure 6: Real-world 3-, 4-, and 5-approach intersections.

5 EXPERIMENT
5.1 Experiment Settings
Following the tradition of the traffic signal control study [38], we conduct experiments in the simulation platform CityFlow³, an open-source traffic simulator designed for large-scale traffic signal control [42]. The simulator takes traffic data as input, and each vehicle proceeds to its destination according to the simulation settings. The simulator provides observations of the traffic situation to signal control methods and executes their signal control actions. As in the real world, a three-second yellow signal and a two-second all-red signal are set after each green signal.
In a traffic dataset, each vehicle is described as (o, t, d), where o is the origin location, t is the time, and d is the destination location. Both o and d are locations on the road network. Traffic data is taken as input for the simulator.
In a multi-intersection network setting, we use the real road network to define the network in the simulator. For a single intersection, unless otherwise specified, the road network is set to be a four-way intersection with four 300-meter-long road segments.
³ https://github.com/cityflow-project/CityFlow

5.2 Datasets
We use two private real-world datasets from Jinan and Hangzhou in China and one public dataset from Atlanta in the United States.
Jinan. We collect data from our collaborators in Jinan from surveillance cameras near intersections. There are in total 7 intersections with relatively complete camera records for single intersection control. Each record in this dataset contains the camera location, the time when one vehicle arrived at the intersection, and the vehicle information. These records are recovered from the camera recordings by advanced computer techniques. We feed the vehicles to the intersections at their recorded arrival times in our experiments.
Hangzhou. This dataset is captured by surveillance cameras in Hangzhou from 04/01/2018 to 04/30/2018. There are in total 6 intersections with relatively complete camera records. These records are processed in the same way as the Jinan data.
Atlanta. This public dataset⁴ is collected by eight video cameras from an arterial segment on Peachtree Street in Atlanta, GA, on November 8, 2006. This vehicle trajectory dataset provides the precise location of each vehicle within the study area, and five intersections in total are taken into consideration.

5.3 Methods for Comparison
To evaluate the effectiveness and efficiency of our model, we compare it with the following classic and state-of-the-art methods. We tune the parameters of each method separately and report the best performance obtained.
• Fixedtime [23]: Fixed-time control adopts a pre-determined cycle and phase time plan, which is widely used under steady traffic flow. A grid search is conducted to find the best cycle.
• SOTL [11]: Self-Organizing Traffic Light Control is an approach that adaptively regulates traffic lights based on a hand-tuned threshold on the number of waiting vehicles.
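As a rough illustration of the threshold idea behind the SOTL baseline described above, the following minimal sketch requests a phase switch once enough vehicles are waiting on red. It is not the implementation from [11]: the class name `SotlRule`, the default threshold, and the minimum green time are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SotlRule:
    """Toy threshold rule; both parameters are hypothetical, hand-tuned values."""
    waiting_threshold: int = 10  # vehicles queued on red before a switch is requested
    min_green_s: int = 5         # assumed minimum green time, in seconds

    def should_switch(self, waiting_on_red: int, green_elapsed_s: int) -> bool:
        """Request a switch once green has run long enough and the queue
        on the red approaches has reached the threshold."""
        return (green_elapsed_s >= self.min_green_s
                and waiting_on_red >= self.waiting_threshold)


rule = SotlRule()
print(rule.should_switch(waiting_on_red=12, green_elapsed_s=6))  # long queue, green long enough
print(rule.should_switch(waiting_on_red=3, green_elapsed_s=20))  # queue below threshold
print(rule.should_switch(waiting_on_red=12, green_elapsed_s=2))  # green too recent
```

Tuning such a rule amounts to searching over `waiting_threshold`, in the same spirit as the grid search over the cycle length used for Fixedtime.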
Travel time (in seconds) at the Jinan (intersections 1-7) and Hangzhou (intersections 1-6) intersections:
             Jinan                                                          Hangzhou
Model        1        2        3        4        5        6        7        1        2        3        4        5        6
Fixedtime    118.82   250.00   233.83   297.23   101.06   104.00   146.66   271.16   192.32   258.93   207.73   259.88   237.77
Formula      107.92   195.89   245.94   159.11   76.16    100.56   130.72   218.68   203.17   227.85   155.09   218.66   230.49
SOTL         97.80    149.29   172.99   64.67    76.53    92.14    109.35   179.90   134.92   172.33   119.70   188.40   171.77
DRL          98.90    235.78   182.31   73.79    66.40    76.88    119.22   146.50   118.90   218.41   80.13    120.88   147.80
IntelliLight 88.74    195.71   100.39   73.24    61.26    76.96    112.36   97.87    129.02   186.04   81.48    177.30   130.40
A2C          135.81   166.97   226.82   43.28    67.05    148.69   236.17   110.91   98.56    187.41   86.56    116.70   128.88
FRAP         66.40    88.40    84.32    33.83    54.43    61.72    72.31    80.24    79.43    110.33   67.87    92.90    88.28
Improvement  25.17%   40.79%   16.01%   47.69%   11.15%   19.72%   33.87%   18.01%   33.20%   35.98%   15.30%   23.15%   32.30%

Travel time (in seconds) at the Atlanta intersections (1-5):
             Atlanta
Model        1        2        3        4        5
Fixedtime    140.51   334.17   334.01   353.56   271.14
Formula      116.16   148.71   163.93   157.08   254.00
SOTL         101.87   133.79   136.77   138.75   73.93
DRL          152.93   95.83    101.61   79.03    43.04
IntelliLight 76.25    74.10    83.12    65.94    47.51
A2C          147.01   105.16   87.42    70.99    49.23
FRAP         67.45    61.48    65.26    60.75    41.39
Improvement  11.54%   17.03%   21.49%   7.87%    3.83%

Figure 7: Effects of the time interval. [Plot: travel time versus time interval.]

5.5 Evaluation Metrics
Based on existing studies in traffic signal control, we choose a representative metric, travel time, for evaluation. This metric is defined as the average travel time that vehicles spend on approaching lanes (in seconds), which is the most frequently used measure to judge performance in the transportation field.
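To make the travel-time numbers easier to interpret, the sketch below recomputes the "Improvement" entries of the results table from the reported travel times. The text does not define that row; the formula used here (relative reduction against the best baseline, with A2C left out of the baseline set) is an assumption that happens to reproduce the reported percentages, and the function name `improvement` is hypothetical.

```python
from typing import Sequence


def improvement(frap_time: float, baseline_times: Sequence[float]) -> float:
    """Relative travel-time reduction (in percent) against the best baseline.

    Assumed definition: (best_baseline - frap) / best_baseline * 100.
    """
    best = min(baseline_times)
    return (best - frap_time) / best * 100.0


# Travel times (seconds) copied from the results table, in the order
# Fixedtime, Formula, SOTL, DRL, IntelliLight (A2C omitted, see lead-in).
jinan_1 = [118.82, 107.92, 97.80, 98.90, 88.74]
jinan_4 = [297.23, 159.11, 64.67, 73.79, 73.24]
atlanta_5 = [271.14, 254.00, 73.93, 43.04, 47.51]

print(f"Jinan 1:   {improvement(66.40, jinan_1):.2f}%")
print(f"Jinan 4:   {improvement(33.83, jinan_4):.2f}%")
print(f"Atlanta 5: {improvement(41.39, atlanta_5):.2f}%")
```

Including the A2C value for Jinan intersection 4 (43.28 s) in the baseline set would yield a smaller figure than the one reported, which is why A2C is excluded in this sketch.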
To train our models, some hyperparameters are tuned for best performance. We adopt an Adam optimizer with learning rate 1e-3. For each training round, 1000 transitions are sampled from memory and trained with batch size 20. Three actors run in parallel to implement the Ape-X DQN framework. For the time interval between two consecutive actions, we carry out a simple experiment and find that model performance is not sensitive to this parameter. As shown in Figure 7, travel time is not much affected by the time interval. Therefore, we set it to 10 s, as suggested in a recent work [14].

Figure 8: Convergence speed of RL methods. [Plot: curves for the RL methods, including DRL and IntelliLight, versus training round.]
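The training settings listed above can be collected in one place, as in the minimal sketch below. The class and field names are hypothetical and do not come from the authors' code; only the numeric values are taken from the text. The helper assumes each sampled transition is used exactly once per round, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters quoted in the text; all names are illustrative."""
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    transitions_per_round: int = 1000  # sampled from replay memory each round
    batch_size: int = 20
    num_actors: int = 3                # parallel actors (Ape-X DQN style)
    action_interval_s: int = 10        # seconds between consecutive actions

    def minibatches_per_round(self) -> int:
        # Assumes one pass over the sampled transitions per training round.
        return self.transitions_per_round // self.batch_size


config = TrainingConfig()
print(config)
print("minibatches per round:", config.minibatches_per_round())
```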
5.7 Model Characteristics
5.7.1 Invariance to flipping & rotation. Besides achieving shorter travel time and faster convergence, FRAP has another advantage in its invariance to flipping and rotation. In the real world, it is common that people drive to work along a specific movement in the morning and go home in the opposite direction in the afternoon. Figure 9 shows an example of traffic flow flipping from intersection 4 in Jinan. It can be observed that the traffic volume of the west approach is much larger than that of the east approach at around 8 am. At 5 pm, the relation is reversed.

In this experiment, we choose both FRAP and IntelliLight models trained on intersection 2 in Jinan, which has the largest vehicle volume, and evaluate their performance on a relatively light flow from intersection 1. In this case, both models have explored similar states and converged to the best values they can achieve for intersection 2. From Figure 11, we can see that the transferred model of FRAP performs almost the same as the re-trained model, whereas there is a distinct gap between the re-trained and transferred models of IntelliLight. This suggests that the proposed FRAP model design increases sampling efficiency significantly and facilitates adaptation to different traffic flows.
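The flipping case of Section 5.7.1 can be illustrated with a toy sketch. It is not the FRAP network: the four phases, the movement names, the volumes, and the `compete` function (a plain demand difference standing in for the learned pairwise competition) are all hypothetical. The point it shows is only that when a phase's demand is a symmetric combination of its movements and one competition function is shared by every phase pair, flipping the traffic east to west simply relabels the phase scores.

```python
# Hypothetical phases, each serving two movements
# (W/E = entering approach, T/L = through/left turn).
PHASES = {
    "WE_through": ("WT", "ET"),
    "WE_left": ("WL", "EL"),
    "W_all": ("WT", "WL"),
    "E_all": ("ET", "EL"),
}
# How an east-west flip renames movements and phases.
MOVE_FLIP = {"WT": "ET", "ET": "WT", "WL": "EL", "EL": "WL"}
PHASE_FLIP = {"WE_through": "WE_through", "WE_left": "WE_left",
              "W_all": "E_all", "E_all": "W_all"}


def compete(demand_p, demand_q):
    """Toy stand-in for the learned pairwise competition, shared by all pairs."""
    return demand_p - demand_q


def phase_scores(volume):
    # Phase demand: symmetric sum over the two movements of the phase.
    demand = {p: volume[a] + volume[b] for p, (a, b) in PHASES.items()}
    # Phase score: sum of competitions against every other phase.
    return {p: sum(compete(demand[p], demand[q]) for q in PHASES if q != p)
            for p in PHASES}


def flip_east_west(volume):
    return {MOVE_FLIP[m]: v for m, v in volume.items()}


morning = {"WT": 30, "WL": 10, "ET": 5, "EL": 2}  # made-up volumes, heavy from the west
evening = flip_east_west(morning)

s_morning = phase_scores(morning)
s_evening = phase_scores(evening)
flipped_back = {PHASE_FLIP[p]: s for p, s in s_evening.items()}

print("scores match after relabeling:", flipped_back == s_morning)
print("best phase:", max(s_morning, key=s_morning.get), "->",
      max(s_evening, key=s_evening.get))
```

A model that treats each movement position as a separate input has no such guarantee and would need to see the flipped traffic during training.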
[Figure residue: traffic volume of the west (W) and east (E) approaches; re-trained (FRAP-retrain) versus transferred model curves; vehicle ratio under the 8-phase and 4-phase settings.]

flexible than the 4-phase setting, that is, under the 8-phase setting, vehicles can pass faster and more reasonably.