
Neuron structure

The output signal travels through the axon and ends at the terminal. Dendrites are where the inputs come in; the axon hillock is where the outputs come out.
Signal transmission in a neuron

The dendrite gets triggered: a channel opens and ions flow into or out of the cell. If they flow into the cell, the signal spreads in an electrotonic fashion: it changes the charge/voltage gradient across the membrane.

If the combined effect of the changes in the voltage gradient is just enough at the axon hillock (meets the threshold), it electrically affects the sodium channels there, and they open up.

The electrical effect of the resulting high voltage gradient affects the next sodium channel, so sodium ions flood in and the signal keeps getting transmitted along the axon.

How the dendrite gets triggered (axon to dendrite)


The sodium ions (positive charge: +1) trigger a calcium channel that allows calcium ions (positive charge: +2) into the terminal end. (There is also a calcium stock with pumps that work like the sodium and potassium pumps.)

The terminal end contains vesicles filled with neurotransmitters; proteins link the vesicles to the membrane. The synapse is the place where the terminal and a dendrite meet.

Calcium ions bond to the proteins, and the proteins make the vesicles join the membrane, so that the neurotransmitters are released into the synapse. The neurotransmitters may cause some sodium channels to open up on the membrane of the dendrite.

They can instead trigger potassium channels, which make potassium flow out and lower the voltage, so that it may not exceed the threshold at the axon hillock.
Deep reinforcement learning

The way to make an AI do what we want is to encourage certain actions (those that partially realize the goal) through a reward system.

Supervised learning: we give the inputs and the expected outputs to the neural network so it can modify the weight of each variable and learn to predict the desired outputs correctly.

Unsupervised learning: the desired outputs are not given, just the data (inputs); the neural network learns on its own.

Reinforcement learning: an agent learns to behave in an environment by performing actions and seeing the results.
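As a rough illustration of that last idea, here is a minimal, self-contained sketch of a reward loop. The two-action "environment" and its reward probabilities are invented for the example; nothing here is StarCraft-specific.

```python
import random

# Toy reward loop: the agent is never told the desired output; it only
# observes rewards and reinforces the action that works better.
value = [0.0, 0.0]   # the agent's running estimate of each action's worth
counts = [0, 0]

for step in range(1000):
    # Explore occasionally, otherwise pick the action currently valued highest.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = 0 if value[0] > value[1] else 1
    # The invented environment rewards action 1 more often than action 0.
    reward = 1.0 if random.random() < (0.8 if action == 1 else 0.3) else 0.0
    # Update the estimate for the chosen action from the observed result.
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]

print(value)  # value[1] should end up clearly higher than value[0]
```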
Neural network of the AI

[Diagram: the AI's neural network. Input perceptrons (actual resources, acceleration of resource production, available patches of mineral, time, enemy race, amounts of units, number of enemy units, positioning of enemy units, composition of enemy units, amount of enemy buildings, composition of enemy buildings, scouting) feed into higher-level perceptrons (resource/military building, composition of military, acceleration of attack, defense, positioning).]

The number of layers depends on the order of building, i.e. the causality relationship between the perceptrons.
Perceptron « Actual resources »

$$A = \frac{x_1(\text{actual resources})}{\text{max resources}}$$

Output: a value between 0 and 1. Input: data corresponding to the actual resources.

Perceptron « Acceleration of resource production »

$$B = \frac{x_2(\text{resource prod/min})}{\text{max acceleration of prod/min}}$$

Output: a value between 0 and 1. Input: data corresponding to resource production per minute.

Perceptron « Available patches of mineral »

$$C = \frac{x_3(\text{available patches})}{\text{total available patches at the beginning}}$$

Output: a value between 0 and 1. Input: data corresponding to the actual available patches.
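All three inputs follow the same ratio-to-maximum pattern, so one helper covers them. A minimal sketch in Python; the raw values and maxima below are invented for illustration:

```python
def normalize(x, maximum):
    """Scale a raw measurement into [0, 1], as in the ratios above."""
    return min(max(x / maximum, 0.0), 1.0)

# Hypothetical raw values, chosen only for illustration:
A = normalize(350, 1000)   # actual resources / max resources
B = normalize(120, 400)    # resource prod/min / max acceleration of prod/min
C = normalize(6, 8)        # available patches / total patches at the beginning
print(A, B, C)             # 0.35 0.3 0.75
```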
Perceptron « Resource/military building »

$$R = \sum_{i=A}^{C} i \cdot w_i - b = (A \cdot w_A + B \cdot w_B + C \cdot w_C) - b$$

where b is the bias that determines the threshold.

If R > threshold → build a facility.
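A minimal sketch of this perceptron, reusing the normalized inputs from the previous sketch; the weights and bias are invented placeholders that training would have to tune:

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs minus the bias, as in the formula for R."""
    return sum(i * w for i, w in zip(inputs, weights)) - bias

A, B, C = 0.35, 0.3, 0.75        # normalized inputs from the sketch above
w_A, w_B, w_C = 0.5, 0.2, 0.3    # hypothetical weights
b = 0.4                          # hypothetical bias/threshold

R = perceptron([A, B, C], [w_A, w_B, w_C], b)
if R > 0:  # the weighted sum exceeded the threshold set by b
    print("build a facility")
```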

Evaluation function:
The evaluation function consists of comparing the winrate calculated by the AI to the real winrate.

The goal is to evaluate the influence that each factor has on the winrate and to set optimal thresholds so that the winrate tends to be as high as possible.

$$\text{Winrate}_{\text{calculated}} = A \cdot w_A + B \cdot w_B + C \cdot w_C + \dots + X \cdot w_X$$

with $w_A + w_B + w_C + \dots + w_X = 1$, where $w_A, w_B, w_C, \dots$ are the influence that each factor has on the winrate.

I want to use backpropagation techniques to improve the weights:
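For instance, with invented factor values $A = 0.6$, $B = 0.3$, $C = 0.8$ and weights $w_A = 0.5$, $w_B = 0.2$, $w_C = 0.3$ (which sum to 1), the formula gives:

$$0.6 \cdot 0.5 + 0.3 \cdot 0.2 + 0.8 \cdot 0.3 = 0.30 + 0.06 + 0.24 = 0.60$$

i.e. a calculated winrate of 60 %.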

Use backpropagation algorithms: give the neural network training examples (inputs for which we know the desired outputs), and the network adjusts its own weights according to the outputs.

We measure the difference between the network's predictions and the desired outputs to measure how good our network is; we use the cost function C for that.

To minimize the cost, we calculate the gradient of C: the errors are propagated backwards and used to compute the gradient, which is then used to update and optimize the weights.

The weights are updated by a small amount each time; the rate of update is called the learning rate.
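A minimal sketch of that update rule applied to the linear winrate formula above, with squared error as the cost. The training games (factor values and "real" winrates) are invented here; in practice they would be measured from played matches. The renormalization step is one crude way to keep the constraint that the weights sum to 1:

```python
# Each game gives the factor values (A, B, C) observed during that game
# and the real winrate measured afterwards. All data invented.
games = [
    ([0.6, 0.3, 0.8], 0.7),
    ([0.2, 0.9, 0.4], 0.4),
    ([0.8, 0.5, 0.6], 0.8),
]
weights = [1 / 3, 1 / 3, 1 / 3]   # start with equal weights summing to 1
learning_rate = 0.05

for epoch in range(1000):
    for factors, real in games:
        predicted = sum(f * w for f, w in zip(factors, weights))
        error = predicted - real          # cost C = error ** 2
        # dC/dw_i = 2 * error * factor_i: the error is propagated back
        # to each weight through its factor.
        weights = [w - learning_rate * 2 * error * f
                   for w, f in zip(weights, factors)]
        # Crude projection back onto the constraint w_A + w_B + ... = 1.
        total = sum(weights)
        weights = [w / total for w in weights]

print(weights)
```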

The threshold update:

Just make an algorithm that varies b across the range (0, y) and finds the value for which the winrate is closest to 1.
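A brute-force sketch of that search. `winrate_with_bias` is a hypothetical stand-in for "play games with this b and measure the winrate"; here it is faked with a simple curve so the script runs on its own:

```python
def winrate_with_bias(b):
    # Fake stand-in: pretend the measured winrate peaks at b = 0.4.
    return 1.0 - abs(b - 0.4)

y = 1.0                      # upper end of the search range (0, y)
best_b, best_winrate = None, -1.0
steps = 100
for k in range(1, steps):
    b = y * k / steps        # sweep b through (0, y)
    w = winrate_with_bias(b)
    if w > best_winrate:
        best_b, best_winrate = b, w

print(best_b, best_winrate)
```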

The API: the one I use: https://github.com/Dentosal/python-sc2 ; others that exist:

https://github.com/BurnySc2/python-sc2

https://github.com/deepmind/pysc2

https://www.youtube.com/watch?v=St5lxIxYGkI

The game: https://starcraft2.com/fr-fr/game

How to install the required stuff to code:

https://pythonprogramming.net/starcraft-ii-ai-python-sc2-tutorial/
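Following the pattern from the linked tutorial, a minimal python-sc2 bot skeleton might look like the sketch below. The map name is an assumption (any installed ladder map works), and the `on_step` body is where the perceptron logic above would plug in:

```python
import sc2
from sc2 import run_game, maps, Race, Difficulty
from sc2.player import Bot, Computer

class WinrateBot(sc2.BotAI):
    async def on_step(self, iteration):
        # Game state such as self.minerals and self.units is available here;
        # this is where the perceptrons above would read their inputs.
        if iteration == 0:
            await self.chat_send("glhf")

run_game(
    maps.get("AbyssalReefLE"),   # assumed map; use any installed ladder map
    [Bot(Race.Protoss, WinrateBot()),
     Computer(Race.Terran, Difficulty.Easy)],
    realtime=False,
)
```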

What remains to be done:

I have not shown the reasoning for all the factors/perceptrons.

The perceptrons highlighted in blue require more precision (which building, where to build it, …).

The way the winrate is calculated could be improved (perhaps by integrating b).

I have not had time to think about what exactly the algorithm to update the weights will be.

What I expect from you

Once I have finished working out the reasoning for all the perceptrons and building the evaluation algorithm, I would like to know whether it is possible to create such an AI (following the reasoning and algorithms I will have created), and what proofs/arguments you can give to say whether it is possible or not.
