
Comparative Study of PMSM Control Using Reinforcement Learning and PID Control


Adil Najem¹, Ahmed Moutabir², Mohamed Rafik³, Abderrahmane Ouchatti⁴

¹ ² ⁴ G.E.I.T.I.I.L Laboratory, Aïn Chock Faculty of Sciences, Hassan II University, Casablanca, Morocco
³ EEIS Laboratory, ENSET Mohammedia, Hassan II University of Casablanca, Morocco

¹ adilnajem@gmail.com, ² a.moutabir65@gmail.com, ³ rafik@enset-media.ac.ma, ⁴ ouchatti_a@yahoo.fr

Abstract— The use of reinforcement learning for process control does not require knowledge of the process's mathematical model. This paper focuses on the control of a permanent magnet synchronous motor (PMSM) based on the Field Oriented Control (FOC) strategy. The objective is to compare the performance of classical PID control with control based on reinforcement learning (RL). The RL algorithm used is the Twin Delayed Deep Deterministic Policy Gradient (TD3). First, the general principle of vector control of a PMSM is described. Then, the reinforcement learning control is analyzed and compared to the PID control. The performance criteria compared are accuracy, dynamic response, and the ability to control torque and speed. Finally, the simulation models have been developed and tested in MATLAB/Simulink, and simulated results are presented to validate the effectiveness of the proposed strategies.

Keywords— PMSM, FOC, reinforcement learning, TD3, Simulink.
I. INTRODUCTION
Permanent magnet synchronous motors (PMSM) are widely used in electric drive systems. They have the advantages of being simple in structure, efficient, and easy to maintain. Despite these advantages, their control is difficult because of their non-linear behavior due to motor and load dynamics [1-2].

Many strategies have been developed to control and regulate the parameters of the PMSM. In addition to the conventional PI-based control method, artificial intelligence control methods have been applied. In this context, the use of fuzzy logic [3], artificial neural networks [4-6], and neuro-fuzzy approaches [7] can be mentioned. The objective is to improve the static and dynamic performance of the PMSM and its insensitivity to parameter uncertainties and disturbances.

The application of reinforcement learning (RL) to the control of physical systems is an attractive approach: RL agents train themselves using the input and output signals of the system to find a suitable control policy before implementation in a real application. Unlike classical approaches, reinforcement learning does not need a mathematical model of the system to be controlled; it simply sends actions to the controlled environment so as to maximize rewards.

Several modern RL algorithms exist, among them Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). In our case, we use the TD3 algorithm.

The paper is organized as follows: the PMSM model and the FOC principle are presented in Section II, Section III describes reinforcement learning, Section IV presents the PID controller, and Section V presents the methods and results of numerical simulations of the PMSM control based on the FOC strategy using the PID controller and the reinforcement learning agent. Finally, Section VI presents the conclusion.

II. PMSM MODELING AND FOC STRATEGY DESCRIPTION

A. PMSM Model

The diagram in Fig. 1 shows the different phases of PMSM modeling.

Fig. 1. PMSM Model Diagram

The model of the PMSM is defined by the following electrical (1) and mechanical (2) equations:



$$\frac{d}{dt}\begin{bmatrix} i_d \\ i_q \end{bmatrix} = \begin{bmatrix} -\dfrac{R_s}{L_d} & \dfrac{L_q}{L_d}\,\omega_e \\ -\dfrac{L_d}{L_q}\,\omega_e & -\dfrac{R_s}{L_q} \end{bmatrix} \begin{bmatrix} i_d \\ i_q \end{bmatrix} + \begin{bmatrix} \dfrac{1}{L_d} & 0 \\ 0 & \dfrac{1}{L_q} \end{bmatrix} \begin{bmatrix} v_d \\ v_q \end{bmatrix} - \begin{bmatrix} 0 \\ \dfrac{\psi_f\,\omega_e}{L_q} \end{bmatrix} \qquad (1)$$

$$J\,\frac{d\Omega}{dt} = T_e - T_L - f\,\Omega \qquad (2)$$

Where:
• $R_s$ is the stator resistance.
• $v_d$ and $v_q$, $i_d$ and $i_q$, $L_d$ and $L_q$ are respectively the d-q axis stator voltages, currents, and inductances.
• $J$ and $f$ are respectively the rotor inertia and the viscous friction coefficient.
• $\omega_e$ is the electrical angular velocity of the rotor.
• $T_e$, $T_L$, $\psi_f$ and $\Omega$ are respectively the electromagnetic torque, the load torque, the flux induced by the permanent magnets of the rotor in the stator phases, and the mechanical speed.

The electromagnetic torque $T_e$ is expressed by the following relationship:

$$T_e = \frac{3}{2}\,p\,(\psi_d\, i_q - \psi_q\, i_d) \qquad (3)$$

Where $\psi_d$ and $\psi_q$ are the d-q axis flux linkages and $p$ is the number of pole pairs. We have:

$$\psi_d = L_d\, i_d + \psi_f \qquad (4)$$

$$\psi_q = L_q\, i_q \qquad (5)$$

The expression of $T_e$ becomes:

$$T_e = \frac{3}{2}\,p\,\left(\psi_f\, i_q + (L_d - L_q)\, i_d\, i_q\right) \qquad (6)$$
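To make the model concrete, the following minimal Python sketch integrates equations (1), (2) and (6) with a forward-Euler step; it is an illustrative reconstruction using the parameter values of Table I, not the Simulink model used in the paper.

```python
# PMSM parameters from Table I (SI units)
Rs = 0.58        # stator resistance (ohm)
Ld = 1.4e-3      # d-axis inductance (H)
Lq = 2.8e-3      # q-axis inductance (H)
J = 0.02         # combined PMSM-load inertia (kg.m^2)
f = 0.014        # viscous friction coefficient (N.m.s/rad)
psi_f = 0.2      # permanent-magnet flux (Wb)
p = 4            # number of pole pairs

def pmsm_step(i_d, i_q, omega, v_d, v_q, T_L, dt=1e-5):
    """One forward-Euler step of the dq model: equations (1), (2) and (6)."""
    omega_e = p * omega                                         # electrical speed
    di_d = (v_d - Rs * i_d + omega_e * Lq * i_q) / Ld           # d-axis, eq. (1)
    di_q = (v_q - Rs * i_q - omega_e * (Ld * i_d + psi_f)) / Lq # q-axis, eq. (1)
    T_e = 1.5 * p * (psi_f * i_q + (Ld - Lq) * i_d * i_q)       # torque, eq. (6)
    d_omega = (T_e - T_L - f * omega) / J                       # mechanical, eq. (2)
    return i_d + dt * di_d, i_q + dt * di_q, omega + dt * d_omega, T_e
```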

B. Field Oriented Control Principle

Field Oriented Control (FOC), also known as vector control, aims to control and adjust the speed of a PMSM so that it tracks a reference value. FOC provides robust control over the full speed range [8].
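FOC works in the rotor d-q reference frame, so measured phase currents must be rotated into that frame and the computed d-q voltage references rotated back. As a sketch (the paper performs these transformations inside Simulink blocks), the amplitude-invariant Park transform and its inverse can be written as:

```python
import math

TWO_PI_3 = 2.0 * math.pi / 3.0

def abc_to_dq(i_a, i_b, i_c, theta_e):
    """Amplitude-invariant Park transform: phase currents to the d-q frame."""
    i_d = (2.0 / 3.0) * (i_a * math.cos(theta_e)
                         + i_b * math.cos(theta_e - TWO_PI_3)
                         + i_c * math.cos(theta_e + TWO_PI_3))
    i_q = -(2.0 / 3.0) * (i_a * math.sin(theta_e)
                          + i_b * math.sin(theta_e - TWO_PI_3)
                          + i_c * math.sin(theta_e + TWO_PI_3))
    return i_d, i_q

def dq_to_abc(v_d, v_q, theta_e):
    """Inverse Park transform: d-q voltage references to phase voltages."""
    v_a = v_d * math.cos(theta_e) - v_q * math.sin(theta_e)
    v_b = v_d * math.cos(theta_e - TWO_PI_3) - v_q * math.sin(theta_e - TWO_PI_3)
    v_c = v_d * math.cos(theta_e + TWO_PI_3) - v_q * math.sin(theta_e + TWO_PI_3)
    return v_a, v_b, v_c
```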

A schematic diagram of the vector control of a PMSM is shown in Fig. 2.

Fig. 2. FOC schematic diagram

III. REINFORCEMENT LEARNING DESCRIPTION

A. Description

Reinforcement learning is a method for rewarding desired behaviour and/or punishing undesired behaviour. This learning method has been adopted in the field of artificial intelligence to guide unsupervised machine learning using rewards and punishments [9-10].

Fig. 3 shows the general structure of reinforcement learning.
Fig. 3. Block diagram for a RL structure
The agent has two input signals: Observation and Reward. It is the main actor in reinforcement learning; it makes decisions and takes optimal actions under given circumstances. It interacts with the environment through its input and output signals and has a policy that helps it provide the best action at the appropriate time.

The role of the learning algorithm is to update the parameters associated with the policy so as to maximize the cumulative reward.

The reward is the value that guides the agent in its choices and shows it how good or bad the action it sent was. It is therefore defined as a measure of the success of the output signal Action. The reward can be a mathematical expression combining criteria such as energy efficiency or the error between the set point and the actual value. During the learning process, the algorithm updates the policy so that the agent maximizes the rewards.

The observation characterizes the process by a set of predefined signals. Observations are the signals measured by sensors or cameras that are visible to the agent.

The action is represented by the control quantities of the controlled process.
The environment represents the real or virtual (simulated) physical system and is external to the agent.
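This interaction can be summarized by the canonical agent-environment loop sketched below; `env` and `agent` are hypothetical objects with the usual reset/step and act/learn interfaces, not the MATLAB Reinforcement Learning Toolbox API used in the paper.

```python
def train(agent, env, episodes=500, max_steps=1000):
    """Generic RL training loop: observe, act, receive a reward, update the policy."""
    for _ in range(episodes):
        obs = env.reset()                       # initial Observation signal
        for _ in range(max_steps):
            action = agent.act(obs)             # policy maps Observation to Action
            obs_next, reward, done = env.step(action)         # environment reacts
            agent.learn(obs, action, reward, obs_next, done)  # maximize cumulative reward
            obs = obs_next
            if done:
                break
```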
B. Structure

The MATLAB/Simulink model of the reinforcement learning TD3 control scheme is shown in Fig. 4, and the TD3 agent block implementing the algorithm is shown in Fig. 5.

Fig. 4. MATLAB/SIMULINK Reinforcement Learning TD3

Fig. 5. Reinforcement Learning TD3 Agent block

The reward function is defined as follows:

• The speed error is defined by:

$$e_\omega = \omega_{ref} - \omega \qquad (7)$$

• The reward of speed:

$$r_\omega = \begin{cases} 10 & \text{if } 0 \le |e_\omega| \le 0.1 \\ -1 & \text{otherwise} \end{cases} \qquad (8)$$

• The rewards of the d-q axis currents:

$$r_d = -0.2\, i_d^{\,2} \qquad (9)$$

$$r_q = -0.2\, i_q^{\,2} \qquad (10)$$

• The reward of action $u$:

$$r_a = -0.1\, u^{2} \qquad (11)$$

Finally, the total reward is defined by:

$$r = r_\omega + r_d + r_q + r_a \qquad (12)$$
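Transcribed directly, the reward terms (7)-(12) as reconstructed above could be computed as follows; this is a sketch, since in the paper the reward is computed inside the Simulink environment, and `u` denotes the control action:

```python
def reward(omega_ref, omega, i_d, i_q, u):
    """Total reward, equation (12), built from the terms (7)-(11)."""
    e_w = omega_ref - omega                     # speed error, eq. (7)
    r_w = 10.0 if abs(e_w) <= 0.1 else -1.0     # speed reward, eq. (8)
    r_d = -0.2 * i_d ** 2                       # d-axis current penalty, eq. (9)
    r_q = -0.2 * i_q ** 2                       # q-axis current penalty, eq. (10)
    r_a = -0.1 * u ** 2                         # action penalty, eq. (11)
    return r_w + r_d + r_q + r_a                # total reward, eq. (12)
```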

IV. PMSM PID CONTROLLER

We start with the use of a PID controller, shown in Fig. 6. The calculation of its coefficients requires the transfer function and the PMSM values given by the manufacturer.

The Ziegler-Nichols method used in practice does not require the transfer function. After selecting the setpoint, the controller is set to proportional action only, the other actions being zero. The proportional coefficient is gradually increased until sustained oscillation (pumping) is reached, and the controller coefficients are then adjusted from this critical gain and oscillation period. This type of adjustment is difficult for our system.

Fig. 6. MATLAB/SIMULINK PID Controller Speed
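To make the procedure concrete: once the critical (ultimate) gain K_u and the oscillation period T_u have been identified by the experiment described above, the classical Ziegler-Nichols table gives the PID gains. The sketch below is illustrative and not taken from the paper:

```python
def ziegler_nichols_pid(Ku, Tu):
    """Classic Ziegler-Nichols closed-loop tuning rules for a PID controller."""
    Kp = 0.6 * Ku
    Ti = 0.5 * Tu                  # integral time
    Td = 0.125 * Tu                # derivative time
    return Kp, Kp / Ti, Kp * Td    # Kp, Ki, Kd

class PID:
    """Textbook discrete PID controller."""
    def __init__(self, Kp, Ki, Kd, dt):
        self.Kp, self.Ki, self.Kd, self.dt = Kp, Ki, Kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.Kp * err + self.Ki * self.integral + self.Kd * deriv
```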

V. CONTROL DESIGN AND SIMULATIONS

There are three operational control objectives:
(i) Regulating the output speed to a desired value.
(ii) Guaranteeing the static and dynamic performance of the PMSM against parameter uncertainties and disturbances.
(iii) Ensuring the robustness of the system.

The PMSM control system has been simulated using MATLAB software [11] with the set of parameter values given in Table I:
TABLE I. PMSM PARAMETERS

Parameter                                          Value
Stator resistance $R_s$                            0.58 Ω
d-axis inductance $L_d$                            1.4 × 10⁻³ H
q-axis inductance $L_q$                            2.8 × 10⁻³ H
Combined inertia PMSM-load $J$                     0.02 kg·m²
Combined viscous friction PMSM-load $f$            0.014 N·m·s/rad
Flux induced by the permanent magnets $\psi_f$     0.2 Wb
Number of pole pairs $p$                           4

The simulation is carried out in MATLAB/Simulink 2021a to compare the two approaches, RL and Ziegler-Nichols PID, with a load torque of 2 N·m and a reference speed of 750 rpm.

Fig. 7. PMSM speed and its reference variation

Figure 7 shows the speed results of the two approaches, RL and PID. Both speeds follow the same reference with zero steady-state error, but the transients of the two approaches differ: the Ziegler-Nichols approach exhibits an overshoot with a settling time of 71 ms, while the reinforcement learning approach has no overshoot and a settling time of 86 ms. The PID control therefore gains in response time.

After learning, in case of disturbance the RL agent corrects the system in real time and takes into account the non-linearity and the uncertainties of the motor elements, whereas for a non-linear system the PID coefficients are only valid for a given operating point.

Fig. 8. q axis current variation

Fig. 9. d axis current variation

After training, the reinforcement learning control achieves zero steady-state error, and the dynamic overshoots of the $i_d$ and $i_q$ currents are decreased by 54% compared to the classical PID method. The reinforcement learning control also keeps the machine decoupled ($i_d = 0$).
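The figures of merit quoted in this section (overshoot, settling time) can be extracted from a simulated speed trace as in the sketch below, where `t` and `w` are assumed time and speed arrays and the 2% tolerance band is an assumption, not a value stated in the paper:

```python
import numpy as np

def step_metrics(t, w, w_ref, band=0.02):
    """Percent overshoot and settling time of a speed step response."""
    overshoot = max(0.0, (w.max() - w_ref) / w_ref * 100.0)
    outside = np.abs(w - w_ref) > band * w_ref     # samples outside the band
    settling_time = t[outside][-1] if outside.any() else t[0]
    return overshoot, settling_time
```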
Fig. 10. Electromagnetic torque variation

The PID controller has a poor dynamic response; therefore the electromagnetic torque instability increases. The RL reduces the dynamic overshoot by 74%, in effect reducing the torque instability.

Fig. 11. Stator currents variations of PID controller

Fig. 12. Stator currents variations of RL

At start-up, the stator currents in the reinforcement learning approach are lower than the currents in the PID control case.

VI. CONCLUSION

This paper proposes a comparative study between two PMSM control methods based on the vector control strategy: the classical PID strategy and a new method based on artificial intelligence algorithms. TD3 reinforcement learning was used, simulated, and compared with the PID results. The performance of the PID was improved by using a reinforcement learning technique. Comparative results are presented for the case where the reinforcement learning agent is correctly trained and provides correction signals to be added to the control signals. Numerical simulations are used to demonstrate the superiority of the control system using reinforcement learning.

REFERENCES

[1] B. Wu and M. Narimani, "Control of Synchronous Motor Drives," in High-Power Converters and AC Drives, Wiley-IEEE Press, 2017, pp. 353-391.
[2] S. Sakunthala, R. Kiranmayi, and P. N. Mandadi, "A Review on Speed Control of Permanent Magnet Synchronous Motor Drive Using Different Control Techniques," International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), Chennai, India, 2018, pp. 97-102.
[3] Y. Zhou et al., "Fuzzy Control For Treadmill PMSM Speed System," IEEE Xplore, September 14, 2020. doi: 10.1109/ICARM49381.2020.9195339.
[4] M. Nicola et al., "Sensorless Control of PMSM Using DTC Strategy Based on Multiple ANN and Load Torque Observer," IEEE Xplore, August 14, 2020. doi: 10.1109/SIELA49118.2020.9167120.
[5] M. Nicola et al., "Sensorless Control of PMSM Using FOC Strategy Based on Multiple ANN and Load Torque Observer," IEEE Xplore, August 14, 2020. doi: 10.1109/DAS49615.2020.9108914.
[6] M. Hanke et al., "Comparison of Artificial Neural Network and Least Squares Prediction Models for Finite Control Set Model Predictive Control of a PMSM," IEEE Xplore. doi: 10.1049/icp.2021.1122.
[7] C. Elmas et al., "A neuro-fuzzy controller for speed control of a permanent magnet synchronous motor drive," Expert Systems with Applications, vol. 34, no. 1, January 2008, pp. 657-664. https://doi.org/10.1016/j.eswa.2006.10.002
[8] H. Celik and T. Yigit, "Field-Oriented Control of the PMSM with 2-DOF PI Controller Tuned by Using PSO," International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 2018, pp. 1-4.
[9] Z. Song, J. Yang, X. Mei, T. Tao, and M. Xu, "Deep reinforcement learning for permanent magnet synchronous motor speed control systems," Neural Computing and Applications, vol. 33, September 2020, pp. 5409-5418.
[10] T. Schindler, L. Foss, and A. Dietz, "Comparison of Reinforcement Learning Algorithms for Speed Ripple Reduction of Permanent Magnet Synchronous Motor," IKMT 2019 - Innovative Small Drives and Micro-Motor Systems; 12. ETG/GMM-Symposium, Wuerzburg, Germany, 2019, pp. 1-6.
[11] Reinforcement Learning Toolbox™ User's Guide, MATLAB and Simulink, MathWorks, Natick, MA, USA, 2020.
