
Artificial Neural Networks

CSE 590 Lecture 1

Historical Sketch

Pre-1940: von Helmholtz, Mach, Pavlov, etc.
General theories of learning, vision, and conditioning; no specific mathematical models of neuron operation.

1940s: Hebb, McCulloch and Pitts
Mechanism for learning in biological neurons; neural-like networks can compute any arithmetic function.

1950s: Rosenblatt, Widrow and Hoff
First practical networks and learning rules.

1980s: Grossberg, Hopfield, Kohonen, Rumelhart, etc.
Important new developments cause a resurgence in the field.

Artificial Neural Network systems are also called:
neurocomputers
neural networks
parallel distributed processors (PDP)
connectionist systems


What is a Neural Network?

There is no universally accepted definition of an NN, but perhaps most people in the field would agree that an NN is a network of many simple processors (units), each possibly having a small amount of local memory. The units are connected by communication channels (connections) which usually carry numeric (as opposed to symbolic) data, encoded by any of various means. The units operate only on their local data and on the inputs they receive via the connections.

Who is concerned with NNs?

Computer scientists want to find out about the properties of non-symbolic information processing with neural nets and about learning systems in general.
Statisticians use neural nets as flexible, nonlinear regression and classification models.
Engineers of many kinds exploit the capabilities of neural networks in many areas, such as signal processing and automatic control.
Cognitive scientists view neural networks as a possible apparatus to describe models of thinking and consciousness (high-level brain function).
Neuro-physiologists use neural networks to describe and explore medium-level brain function (e.g. memory, sensory systems, motor control).
Physicists use neural networks to model phenomena in statistical mechanics and for many other tasks.
Biologists use neural networks to interpret nucleotide sequences.

Human brain

The brain is a highly complex, non-linear, parallel information-processing system. It performs tasks such as pattern recognition, perception, and motor control many times faster than the fastest digital computers. It is characterized by:
Robustness and fault tolerance
Flexibility: it can adjust to a new environment by learning
The ability to deal with fuzzy, probabilistic, noisy, or inconsistent information
High parallelism
Being small and compact, and requiring little power

Inspiration for Artificial Intelligence

AI has been inspired by two fundamental questions:
How does the human brain work?
How can we exploit the brain metaphor to build intelligent machines?

Phenomenological Properties of the Human Brain

massive parallelism
distributed representation and computation
fault tolerance
graceful degradation
endurance of memories
fast retrieval and quick alternation between concepts
the ability to self-organize
the ability to generalize based on existing knowledge
associative memory recall
low energy consumption and very high capacity

Other Factors that Promoted Interest

Neuroscience research
Exponential increase in desktop computing power
Powerful neural network algorithms and applications
Psychological research on human problem solving and decision making

Brain vs. Computer Processing

Processing speed: milliseconds vs. nanoseconds.
Processing order: massively parallel vs. serial.
Abundance and complexity: between 10^11 and 10^14 neurons operate in parallel in the brain at any given moment, each with between 10^3 and 10^4 abutting connections.
Knowledge storage: adaptable vs. new information destroying old information.
Fault tolerance: knowledge is retained through redundant, distributed encoding of information vs. the corruption of a conventional computer's memory, which is irretrievable and leads to failure.

1. Biological inspiration
2. Artificial neurons and neural networks
3. Learning processes
4. Learning with artificial neural networks

Animals are able to react adaptively to changes in their external and internal environment, and they use their nervous system to produce these behaviours. An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems. The nervous system is built from relatively simple units, the neurons, so copying their behaviour and functionality should be the solution.

The spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse. The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron. The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron. The contribution of the signals depends on the strength of the synaptic connection.

Neuron structure

The human brain consists of approximately 10^11 elements called neurons. They communicate through a network of long fibres called axons. Each of these axons splits up into a series of smaller fibres, which communicate with other neurons via junctions called synapses that connect to small fibres called dendrites attached to the main body of the neuron.

(Figure: biological neuron)

Neural Networks

A neural network (NN) is a machine learning approach inspired by the way in which the brain performs a particular learning task. An NN is specified by:
an architecture: a set of neurons and links connecting the neurons, where each link has a weight;
a neuron model: the information processing unit of the NN;
a learning algorithm: used for training the NN by modifying the weights in order to model the particular learning task correctly on the training examples.

The aim is to obtain an NN that generalizes well, that is, one that behaves correctly on new instances of the learning task.

Neuron structure

The basic computational unit is the neuron:
Dendrites (inputs, 1 to 10^4 per neuron)
Soma (cell body)
Axon (output)
Synapses (excitatory or inhibitory)

Interconnectedness

80,000 neurons per square mm
10^15 connections
Most axons extend less than 1 mm (local connections)
Some cells in the cerebral cortex may have 200,000 connections
The total number of connections in the brain network is astronomical: greater than the number of particles in the known universe

A synapse is like a one-way valve. An electrical signal is generated by the neuron, passes down the axon, and is received by the synapses that join onto other neurons' dendrites. The electrical signal causes the release of transmitter chemicals which flow across a small gap in the synapse (the synaptic cleft). The chemicals can have an excitatory effect on the receiving neuron (making it more likely to fire) or an inhibitory effect (making it less likely to fire). The total inhibitory and excitatory contributions to a particular neuron are summed; if this value exceeds the neuron's threshold, the neuron fires.


Inspiration from Neurobiology

A neuron is a many-inputs / one-output unit.
The output can be excited or not excited.
Incoming signals from other neurons determine whether the neuron shall excite ("fire").
The output is subject to attenuation in the synapses, which are junction parts of the neuron.

(Figure: electron micrograph of a real neuron)

Biological vs. Artificial Neuron

(Figure: a biological neuron, with soma, dendrites, axon, and synapses, shown alongside its artificial counterpart)
Knowledge is represented in neural networks by the strength of the synaptic connections between neurons (hence "connectionism"). Learning in neural networks is accomplished by adjusting the synaptic strengths (weights). There are three primary categories of neural network learning algorithms:

Supervised: exemplar pairs of inputs and (known, labeled) target outputs are used for training.
Reinforcement: a single good/bad training signal is used for training.
Unsupervised: no training signal; self-organization and clustering produce the categories.

BNNs versus ANNs

Both learn from experience: examples / training data.
The strength of the connection between the neurons is stored as a weight value for the specific connection.
Learning the solution to a problem means changing the connection weights.

(Figure: a physical neuron and an artificial neuron, side by side)

Analogy between biological and artificial neural networks

(Figure: a biological neuron, with soma, dendrites, axon, and synapses, mapped onto an artificial network whose input, middle, and output layers carry the input and output signals)

Artificial Neuron

(Figure: input signals x1 ... xn with weights w1 ... wn feeding a neuron that produces the output signal y)

Neurons work by processing information. They receive and provide information in the form of spikes. In the McCulloch-Pitts model, the neuron computes

$z = \sum_{i=1}^{n} w_i x_i, \quad y = H(z)$

where $H$ is the Heaviside step function and $y$ is the output.

The McCulloch-Pitts model:

spikes are interpreted as spike rates;
synaptic strengths are translated into synaptic weights;
excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.

Nonlinear generalization of the McCulloch-Pitts neuron:

$y = f(x, w)$

where $y$ is the neuron's output, $x$ is the vector of inputs, and $w$ is the vector of synaptic weights. Examples:

$y = \dfrac{1}{1 + e^{-w^T x - a}}$  (sigmoidal neuron)

$y = e^{-\frac{\|x - w\|^2}{2a^2}}$  (Gaussian neuron)
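To make the two formulas concrete, here is a minimal Python/NumPy sketch; the inputs, weights, and the parameter a below are made-up illustrative values, not taken from the lecture.

```python
import numpy as np

def sigmoidal_neuron(x, w, a):
    # y = 1 / (1 + exp(-(w . x) - a))
    return 1.0 / (1.0 + np.exp(-np.dot(w, x) - a))

def gaussian_neuron(x, w, a):
    # y = exp(-||x - w||^2 / (2 a^2))
    return np.exp(-np.sum((x - w) ** 2) / (2.0 * a ** 2))

x = np.array([0.5, 1.0])    # illustrative inputs
w = np.array([0.8, -0.3])   # illustrative synaptic weights
print(sigmoidal_neuron(x, w, a=0.1))  # value in (0, 1)
print(gaussian_neuron(x, w, a=1.0))   # peaks when x equals w
```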

An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.

Tasks to be solved by artificial neural networks:
controlling the movements of a robot based on self-perception and other information (e.g., visual information);
deciding the category of potential food items (e.g., edible or non-edible) in an artificial world;
recognizing a visual object (e.g., a familiar face).

Learning = learning by adaptation. The young animal learns that the green fruits are sour, while the yellowish/reddish ones are sweet. The learning happens by adapting the fruit-picking behaviour. At the neural level, the learning happens by changing the synaptic strengths, eliminating some synapses, and building new ones.

An output signal is either discrete (e.g., 0 or 1) or a real-valued number (e.g., between 0 and 1). The net input is calculated as the weighted sum of the input signals, and is transformed into an output signal via a simple function (e.g., a threshold function).

Neural networks abstract from the details of real neurons

Basic Artificial Model

Consists of simple processing elements called neurons, units, or nodes.
Each neuron is connected to other nodes with an associated weight (strength).
Each neuron has a single threshold value.
The weighted sum of all the inputs coming into the neuron is formed, and the threshold is subtracted from this value to give the activation.
The activation signal is passed through an activation function (also called a transfer function) to produce the output of the neuron.

Basic Network Concepts

A neural network generally maps a set of inputs to a set of outputs.
The number of inputs/outputs is variable.
The network itself is composed of an arbitrary number of nodes with an arbitrary topology.

(Figure: inputs 0 ... n feeding a neural network that produces outputs 0 ... m)

The McCulloch-Pitts Model (First Neuron Model, 1943)

The neuron has binary inputs (0 or 1), labelled $x_i$ where $i = 1, 2, \dots, n$.
These inputs have weights of +1 for excitatory synapses and -1 for inhibitory synapses, labelled $w_i$ where $i = 1, 2, \dots, n$.
The neuron has a threshold value $T$ which has to be exceeded by the weighted sum of signals if the neuron is to fire.
The neuron has a binary output signal denoted by $o$.

The output $o$ at time $t+1$ can be defined by the following equation:

$o^{t+1} = 1$ if $\sum_{i=1}^{n} w_i x_i^t \ge T$
$o^{t+1} = 0$ if $\sum_{i=1}^{n} w_i x_i^t < T$

i.e. the output of the neuron at time $t+1$ is 1 if the sum of all the inputs $x$ at time $t$ multiplied by their weights $w$ is greater than or equal to the threshold $T$, and 0 otherwise. Simplistic, but it can perform the basic logic operations NOT, OR, and AND.
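A minimal sketch of this model in Python; the weights and thresholds realizing the three logic operations are hand-picked illustrative choices (excitatory +1, inhibitory -1, as on the previous slide).

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (output 1) iff the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Hand-picked weights/thresholds realizing the basic logic gates:
AND = lambda x1, x2: mcculloch_pitts([x1, x2], [1, 1], threshold=2)
OR  = lambda x1, x2: mcculloch_pitts([x1, x2], [1, 1], threshold=1)
NOT = lambda x1:     mcculloch_pitts([x1],     [-1],   threshold=0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0
```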

Mathematical Representation

(Figure: inputs x1 ... xn with weights w1 ... wn and bias b feeding a summation unit followed by an activation function f, producing the output y)

$net = \sum_{i=1}^{n} w_i x_i + b$

$y = f(net)$

Adding biases

A linear neuron is a more flexible model if we include a bias. A bias unit can be thought of as a unit which always has an output value of 1 and which is connected to the hidden and output layer units via modifiable weights. It sometimes helps convergence of the weights to an acceptable solution. A bias is exactly equivalent to a weight on an extra input line that always has an activity of 1.

$y = b + \sum_{i=1}^{m} w_i x_i$

where $w_i$ is the weight on the $i$-th input connection. Equivalently, treating the bias as the weight $w_0 = b$ on an extra input $x_0 = 1$:

$y = \sum_{i=0}^{m} w_i x_i$

Bias as extra input

(Figure: attribute values x1 ... xm plus a fixed extra input x0 = +1, with weights w0 ... wm, feeding a summing function and an activation function f(x) that produces the output class y)

$y = \sum_{j=0}^{m} w_j x_j, \quad w_0 = b$
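The equivalence of the two formulations is easy to verify numerically; a small sketch with made-up numbers:

```python
import numpy as np

x = np.array([0.5, -1.0])   # example inputs
w = np.array([0.3, 0.7])    # example weights
b = 0.2                     # bias

# Explicit-bias form: y = b + sum_i w_i x_i
y1 = b + np.dot(w, x)

# Bias-as-weight form: prepend x0 = +1 and w0 = b
xe = np.concatenate(([1.0], x))
we = np.concatenate(([b], w))
y2 = np.dot(we, xe)

assert np.isclose(y1, y2)   # the two forms give identical outputs
```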

Elements of the model neuron:

$x_j$ is the input to synapse $j$.
$w_{ij}$ is the weight characterizing the synapse from input $j$ to neuron $i$; it is known as the weight from unit $j$ to unit $i$.
If $w_{ij} > 0$ the synapse is excitatory; if $w_{ij} < 0$ the synapse is inhibitory.
Note that $x_j$ may be an external input or the output of some other neuron.

Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes a nonlinear function, called the neuron activation function. Signal $x$ is the adder output signal, and $y = f(x)$ is the output signal of the nonlinear element. Signal $y$ is also the output signal of the neuron.

Activation Functions

Usually, we don't use the weighted sum directly; we apply some function to the weighted sum before it is used (e.g., as output). This is called the activation function, and it is variously written as f(n), f(net), or f(e). A step function can be a good simulation of a biological neuron spiking:

$f(x) = 1$ if $x \ge T$; $f(x) = 0$ if $x < T$

where $T$ is the threshold.

Another Activation Function: The Sigmoid

The math of some neural nets requires that the activation function be continuously differentiable. A sigmoidal function is often used to approximate the step function:

$f(x) = \dfrac{1}{1 + e^{-\lambda x}}$

where $\lambda$ is the steepness parameter.
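A short sketch comparing the two activation functions; the name lam for the steepness parameter is my own:

```python
import numpy as np

def step(x, T=0.0):
    # Hard threshold: 1 if x >= T, else 0.
    return np.where(x >= T, 1.0, 0.0)

def sigmoid(x, lam=1.0):
    # Continuously differentiable approximation of the step.
    return 1.0 / (1.0 + np.exp(-lam * x))

xs = np.linspace(-3, 3, 7)
print(step(xs))             # abrupt 0/1 jump at the threshold
print(sigmoid(xs, lam=1))   # smooth transition
print(sigmoid(xs, lam=10))  # larger steepness -> closer to the step
```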

(Figure: taxonomy of network characteristics: topology, layers, connection weights, architecture, modular, feedforward, recurrent, heteroassociative, autoassociative)

Basic Neural Network & Its Elements

(Figure: a basic network built from bias neurons, input neurons, hidden neurons, and output neurons)

Example: Mapping from input to output

(Figure: feed-forward processing through input, hidden, and output layers: the input pattern <0.5, 1.0, -0.1, 0.2> is presented to the input layer and transformed into the output pattern <-0.9, 0.2, -0.1, 0.7> at the output layer)
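The slide's weight values are not given, so the sketch below only illustrates the mechanics of such a feed-forward pass, with randomly initialized weights and an assumed tanh activation; its outputs will not reproduce the pattern above.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, layers):
    """Propagate an input pattern layer by layer."""
    a = x
    for W, b in layers:
        a = np.tanh(W @ a + b)   # assumed activation; allows negative outputs
    return a

# 4 inputs -> 3 hidden units -> 4 outputs, randomly initialized.
layers = [(rng.normal(size=(3, 4)), np.zeros(3)),
          (rng.normal(size=(4, 3)), np.zeros(4))]
print(forward(np.array([0.5, 1.0, -0.1, 0.2]), layers))
```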

Feed Forward Neural Networks

The information is propagated from the inputs to the outputs.
They compute non-linear functions of the n input variables by composing $N_c$ algebraic functions.
Time plays no role (there are no cycles between outputs and inputs).

(Figure: inputs x1 ... xn feeding a 1st hidden layer, a 2nd hidden layer, and an output layer)

Recurrent Neural Networks

Can have arbitrary topologies.
Can model systems with internal states (dynamic ones).
Delays are associated with specific weights.
Training is more difficult.
Performance may be problematic: stable outputs may be more difficult to evaluate, and unexpected behaviour can occur (oscillation, chaos, ...).

(Figure: a small recurrent network over inputs x1, x2 with delayed connections)
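As a sketch of what "internal state" means, here is one step of a simple Elman-style recurrence (my own minimal construction, not the slide's specific topology): the same input produces different outputs depending on the stored state.

```python
import numpy as np

rng = np.random.default_rng(1)
W_in = rng.normal(size=(2, 3))    # input -> state weights
W_rec = rng.normal(size=(2, 2))   # state -> state (feedback) weights

def rnn_step(x, h):
    # The new state depends on the current input AND the previous state.
    return np.tanh(W_in @ x + W_rec @ h)

h = np.zeros(2)                   # internal state, initially empty
x = np.array([1.0, 0.0, 0.0])
for t in range(3):
    h = rnn_step(x, h)            # identical input every step...
    print(t, h)                   # ...but the state keeps changing
```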

The main characteristics of NNs

Architecture: the pattern of nodes and the connections between them.
Learning algorithm, or training method: the method for determining the weights of the connections.
Activation function: the function that produces a neuron's output from its net input.

Taxonomy of neural networks

Feedforward
  Unsupervised (Kohonen)
  Supervised (MLP, RBF)
Recurrent
  Unsupervised (ART)
  Supervised (Elman, Jordan, Hopfield)

Types of connectivity

1. Feedforward networks
The neurons are arranged in separate layers.
There are no connections between the neurons in the same layer.
The neurons in one layer receive inputs from the previous layer and deliver their output to the next layer.
The connections are unidirectional (hierarchical): input units feed hidden units, which feed output units.

2. Recurrent networks
Some connections are present from a layer to the previous layers.
More biologically realistic.
Feedforward + feedback = recurrent.

Artificial Neural Network Development Process

(Flowchart:)
1. Collect data; get more, better data if needed.
2. Separate the data into training and test sets.
3. Define a network structure.
4. Select a learning algorithm.
5. Set parameters and initialize values (reset as needed).
6. Transform the data into network inputs.
7. Start training and determine and revise the weights.
8. Stop and test.
9. Implementation: use the network with new cases.
If testing fails, refine the structure or select another learning algorithm and repeat.
Typical Neural Network

A neural network is a set of interconnected neurons (simple processing units). Each neuron receives signals from other neurons and sends an output to other neurons. The signals are amplified by the strength of the connection, and the strength of the connection changes over time according to a feedback mechanism. The net can be trained.

Three-layer back-propagation neural network

(Figure: input signals x1 ... xn enter the input layer, pass through weights w_ij to the hidden layer and weights w_jk to the output layer, producing outputs y1 ... yl; error signals propagate backwards from the output layer)
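A minimal sketch of back-propagation through such a network, here a 2-4-1 net trained on XOR with sigmoid units; the layer sizes, learning rate, and iteration count are my own illustrative choices, and a different random seed may need more iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# XOR: a task a single neuron cannot learn, but a hidden layer can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 1))
b1, b2 = np.zeros(4), np.zeros(1)

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)          # forward: hidden layer
    y = sigmoid(h @ W2 + b2)          # forward: output layer
    # backward: error signals flow from the output layer toward the input
    d2 = (y - t) * y * (1 - y)        # output-layer delta
    d1 = (d2 @ W2.T) * h * (1 - h)    # hidden-layer delta
    W2 -= 0.5 * h.T @ d2; b2 -= 0.5 * d2.sum(axis=0)
    W1 -= 0.5 * X.T @ d1; b1 -= 0.5 * d1.sum(axis=0)

print(np.round(y.ravel(), 2))         # approaches [0, 1, 1, 0]
```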

Types of Problems Solved by NNs

Classification: determine to which of a discrete number of classes a given input case belongs.
Regression: predict the value of a (usually) continuous variable.
Time series: predict the value of variables from earlier values of the same or other variables.

Advantages of ANNs

Generalization: using responses to prior input patterns to determine the response to a novel input
Inherently massively parallel
Able to learn any complex non-linear mapping
Learning instead of programming
Robust
Can deal with incomplete and/or noisy data

Disadvantages of ANNs

Difficult to design: there are no clear design rules for arbitrary applications
The learning process can be very time-consuming
Can overfit the training data, becoming useless for generalization
Difficult to assess internal operation: it is hard to find out whether, and if so what, tasks are performed by different parts of the net
Unpredictable: it is difficult to estimate future network performance

Applications

Aerospace
High-performance aircraft autopilots, flight path simulations, aircraft control systems
Automotive
Automobile automatic guidance systems, warranty activity analyzers
Banking
Check and other document readers, credit application evaluators
Defense
Weapon steering, target tracking, object discrimination, facial recognition
Electronics
Code sequence prediction, integrated circuit chip layout

Applications

Robotics
Trajectory control, forklift robots, manipulator controllers, vision systems
Speech
Speech recognition, speech compression, vowel classification, text-to-speech synthesis systems
Securities
Market analysis, automatic bond rating, stock trading advisory systems
Telecommunications
Image and data compression, automated information services, real-time translation of spoken language, customer payment processing systems
Transportation
Truck brake diagnosis systems, vehicle scheduling, routing systems

KR and Logic

Introduction

An assumption of (traditional) AI work is that knowledge may be represented as symbol structures (essentially, complex data structures) representing bits of knowledge (objects, concepts, facts, rules, strategies, ...).
E.g., "red" represents the colour red; "car1" represents my car; "red(car1)" represents the fact that my car is red.
Intelligent behaviour can be achieved through manipulation of symbol structures.

Knowledge representation languages

Knowledge representation languages have been designed to facilitate this. Rather than using general C++/Java data structures, we use special-purpose formalisms. A KR language should allow you to:
represent adequately the knowledge you need for your problem (representational adequacy);
do so in a clear, precise and natural way;
reason on that knowledge, drawing new conclusions.

Well-defined syntax/semantics

Knowledge representation languages should have a precise syntax and semantics. You must know exactly what an expression means in terms of objects in the real world.

(Figure: facts in the real world are mapped into the KR language, represented in the computer, inference produces new conclusions, and these are mapped back to the real world)

Logic as a Knowledge Representation Language

A logic is a formal language, with precisely defined syntax and semantics, which supports sound inference, independent of the domain of application. Different logics exist, which allow you to represent different kinds of things and which allow more or less efficient inference:
propositional logic, predicate logic, temporal logic, modal logic, description logic, ...
But representing some things in logic may not be very natural, and inferences may not be efficient. More specialised languages may be better.

Propositional logic

In general, a logic is defined by:
syntax: what expressions are allowed in the language;
semantics: what they mean, in terms of a mapping to the real world;
proof theory: how we can draw new conclusions from existing statements in the logic.
Propositional logic is the simplest.

Propositional Logic: Syntax

Symbols (e.g., letters, words) are used to represent facts about the world, e.g.,
P represents the fact "Andrew likes chocolate"
Q represents the fact "Andrew has chocolate"
These are called atomic propositions.
Logical connectives are used to represent and (∧), or (∨), if-then (→), and not (¬). Statements or sentences in the language are constructed from atomic propositions and logical connectives:
P ∧ ¬Q  "Andrew likes chocolate and he doesn't have any."
P → Q  "If Andrew likes chocolate then Andrew has chocolate."
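The semantics of a connective can be checked mechanically with a truth table; a tiny sketch for the implication example, using the standard definition P → Q ≡ ¬P ∨ Q:

```python
from itertools import product

# Truth table for P -> Q ("if Andrew likes chocolate then
# Andrew has chocolate"), defined as (not P) or Q.
print("P     Q     P->Q")
for P, Q in product([False, True], repeat=2):
    print(P, Q, (not P) or Q)
```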

Artificial Neural Networks and AI

Artificial Neural Networks provide:
- a new computing paradigm
- a technique for developing trainable classifiers, memories, dimension-reducing mappings, etc.
- a tool to study brain function

Converging Frameworks

Artificial intelligence (AI): build a packet of intelligence into a machine.
Cognitive psychology: explain human behavior by interacting processes (schemas) in the head, but not localized in the brain.
Brain theory: interactions of components of the brain; computational neuroscience; neurologically constrained models.
Artificial intelligence and cognitive psychology, abstracting from the brain:
- connectionism: networks of trainable quasi-neurons that provide parallel distributed models, little constrained by neurophysiology
- abstract (computer program or control system) information processing models

Vision, AI and ANNs

1950s: the beginning of computer vision.
Aim: give machines the same or better vision capability than ours.
Drive: AI, robotics applications, and factory automation.
Initially viewed as a passive, feedforward, layered and hierarchical process that would simply provide input to higher reasoning processes (from AI). But it was soon realized that this could not handle real images.
1980s: Active vision: make the system more robust by allowing the vision to adapt with the ongoing recognition/interpretation.
