
*

* Used to describe pattern of numbers.

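* Example: Age, height & weight of Joe are 37 yrs, 72 inches &
175 pounds respectively. Four people described this way give four vectors:
$$\text{Joe} = \begin{bmatrix} 37 \\ 72 \\ 175 \end{bmatrix},\quad \text{Mary} = \begin{bmatrix} 10 \\ 30 \\ 61 \end{bmatrix},\quad \text{Carol} = \begin{bmatrix} 25 \\ 65 \\ 121 \end{bmatrix},\quad \text{Johny} = \begin{bmatrix} 66 \\ 67 \\ 155 \end{bmatrix}$$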
* Notation of vector- Bold, lowercase letters-
* Graphical representation of vector-
When a vector has no more than
3 components, it can be represented
graphically by a point or arrow in
3-D space.

* Difference between scalar, vector & matrix-

*
* Multiplication by scalars-
* If a is a scalar & v a vector, where
$\mathbf{v} = [x_1 \;\; x_2]$, then
$a\mathbf{v} = a[x_1 \;\; x_2] = [ax_1 \;\; ax_2]$
* Geometrically, scalar multiplication is lengthening or
shortening of v, pointing it in the same direction (a > 0) or the opposite direction (a < 0).
* Two vectors that are scalar multiples of one another are
collinear: they lie along the same line.
*
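* If $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ & $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$ then $\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}$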
* Vectors must have same no: of components
* Vector addition is associative & commutative
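* Example:
If $\mathbf{x} = \begin{bmatrix} 2 \\ 7 \\ -5 \end{bmatrix}$ & $\mathbf{y} = \begin{bmatrix} 3 \\ 2 \\ 12 \end{bmatrix}$ then $\mathbf{x} + \mathbf{y} = \begin{bmatrix} 2 + 3 \\ 7 + 2 \\ -5 + 12 \end{bmatrix} = \begin{bmatrix} 5 \\ 9 \\ 7 \end{bmatrix}$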
*
* Graphical representation of vector addition-
* Sum of a & b is the diagonal of parallelogram with sides a & b.
* Sum of 2 vectors is a vector that lies in the same plane as the two vectors being added.
* Example: Calculating averages- Find average age, height
& weight of 4 individuals in Fig.

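* This corresponds to adding the 4 vectors & then multiplying the
resulting sum by the scalar ¼.
* Let u denote the average vector. Then,
$$\mathbf{u} = \frac{1}{4}(\mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4)$$
$$\mathbf{u} = \frac{1}{4}\left( \begin{bmatrix} 37 \\ 72 \\ 175 \end{bmatrix} + \begin{bmatrix} 10 \\ 30 \\ 61 \end{bmatrix} + \begin{bmatrix} 25 \\ 65 \\ 121 \end{bmatrix} + \begin{bmatrix} 66 \\ 67 \\ 155 \end{bmatrix} \right) = \begin{bmatrix} 34.5 \\ 58.5 \\ 128 \end{bmatrix}$$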
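The averaging example can be checked numerically. The following is a minimal sketch in plain Python; the helper names `add` and `scale` are illustrative and do not come from the slides.

```python
# Each person is a vector [age, height, weight], as in the example.
people = {
    "Joe": [37, 72, 175],
    "Mary": [10, 30, 61],
    "Carol": [25, 65, 121],
    "Johny": [66, 67, 155],
}


def add(x, y):
    """Component-wise sum; both vectors need the same number of components."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return [a + b for a, b in zip(x, y)]


def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * a for a in v]


total = [0, 0, 0]
for vec in people.values():
    total = add(total, vec)

average = scale(1 / 4, total)
print(average)  # [34.5, 58.5, 128.0]
```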
*

* Linear Combinations of vectors-

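* $\mathbf{u}$ is a linear combination of $\mathbf{v}_1$ & $\mathbf{v}_2$ if $\mathbf{u}$ can be written as
a sum of scalar multiples of $\mathbf{v}_1$ & $\mathbf{v}_2$:
$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$ ----------eqtn 1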
* Linear Combinations of vectors(Contd)-
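* Example: Consider $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ & $\mathbf{u} = \begin{bmatrix} 9 \\ 10 \end{bmatrix}$.
Eqtn 1 is satisfied for $c_1 = 3$ and $c_2 = 2$. Thus, $\mathbf{u}$ is a linear combination
of $\mathbf{v}_1$ & $\mathbf{v}_2$.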
* 𝒗𝟏 & 𝒗𝟐 spans the plane since any vector in plane can be
formed as linear combination of 𝒗𝟏 & 𝒗𝟐
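* In general, $\mathbf{v}$ is a linear combination of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ if scalars $c_1, c_2, \dots, c_n$
can be found such that:
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n$
* Example: $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ & $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ span all 3-D space since any vector $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ can be written as
$$\mathbf{v} = a\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
These 3 vectors are referred to as the standard basis for 3-D space.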
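As a rough illustration of the definition, a linear combination is just a weighted sum of vectors. The sketch below assumes plain Python lists for vectors; `linear_combination` is a hypothetical helper name.

```python
def linear_combination(coeffs, vectors):
    """Return c1*v1 + c2*v2 + ... + cn*vn for equal-length vectors."""
    n = len(vectors[0])
    result = [0] * n
    for c, v in zip(coeffs, vectors):
        for i in range(n):
            result[i] += c * v[i]
    return result


# The 2-D example: u = 3*v1 + 2*v2.
u = linear_combination([3, 2], [[1, 2], [3, 2]])
print(u)  # [9, 10]

# With the standard basis, the coefficients are the components themselves.
standard_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
v = linear_combination([4, -1, 7], standard_basis)
print(v)  # [4, -1, 7]
```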
* Linear Independence-
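* 2 vectors are used to span 2-D space, e.g. $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ & $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$
* 3 vectors are used to span 3-D space, e.g. $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ & $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$
* n vectors are used to span n-D space
* If at least one vector in a set of n vectors can be written as a linear
combination of the others, then the vectors span something less than a
full n-D space & the set of vectors is linearly dependent.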
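* Examples:
* $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ & $\begin{bmatrix} -1 \\ 3 \end{bmatrix}$ are linearly dependent & span only 2-D space since
$$\begin{bmatrix} -1 \\ 3 \end{bmatrix} = 7\begin{bmatrix} 1 \\ 1 \end{bmatrix} - 4\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
* $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ & $\begin{bmatrix} 2 \\ 2 \end{bmatrix}$ are linearly dependent & span only 1-D space since
$$\begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$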
* Linear Independence(Contd)
* If none of the vectors can be written as a linear combination
of others, then set of vectors is linearly independent.
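* Examples:
* $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ & $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ are linearly independent & span 2-D space.

* $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$ & $\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}$ are linearly independent & span all 3-D space.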
* There can be no more than n linearly independent vectors in
n-D space.
* Linear Independence(Contd)
* A set of vectors {v1, v2, …, vk} is linearly independent if the
only set of scalars c1, c2, …, ck that satisfies eqtn 2 is the set
c1 = c2 = … = ck = 0
c1v1 + c2v2 + … + ckvk = 0 --------Equation 2

* Example: Determine whether the following set of vectors is

linearly dependent or linearly independent
S = { v1 = (1, 2, 3), v2 = (0, 1, 2), v3 = (2, 0, 1)}
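Solution: c1v1 + c2v2 + c3v3 = 0
⇒ c1(1, 2, 3) + c2(0, 1, 2) + c3(2, 0, 1) = (0, 0, 0)
⇒ (c1 + 2c3, 2c1 + c2, 3c1 + 2c2 + c3) = (0, 0, 0)
(the first two components give c1 = −2c3 and c2 = 4c3; substituting into the third gives 3c3 = 0)
⇒ c1 = c2 = c3 = 0
Therefore, S is linearly independent.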
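For three vectors in 3-D space, an equivalent check (a substitute for solving the system by hand, and not the method used above) is that the determinant of the matrix built from the vectors is nonzero. The sketch below assumes exactly three 3-component vectors; `det3` is an illustrative helper.

```python
def det3(rows):
    """Determinant of a 3x3 matrix given as three rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def linearly_independent(vectors):
    """Three 3-D vectors are independent exactly when the determinant is nonzero."""
    return det3(vectors) != 0


S = [(1, 2, 3), (0, 1, 2), (2, 0, 1)]
print(det3(S))                  # 3
print(linearly_independent(S))  # True

# A dependent set for contrast: the second vector is twice the first.
D = [(1, 2, 3), (2, 4, 6), (0, 1, 0)]
print(linearly_independent(D))  # False
```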
*
* Vector space is set 𝑽 of vectors 𝒗1 , 𝒗2 … . 𝒗𝑛 with following
properties:
* To every vector pair 𝒖, 𝒗 ∈ 𝑽, 𝒖 + 𝒗 ∈ 𝑽(Sum) such that
vector addition is commutative & associative.
* For any scalar c & any vector 𝒗 in 𝑽, c 𝒗 ∈ 𝑽(product) such
that scalar multiplication is associative & distributive.
*
* Used to define length of a vector or similarity between vectors.
* Inner product of 2 vectors is sum of product of vector components.
* Defined only if vectors have same number of components.
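* Inner product of vectors $\mathbf{v}$ and $\mathbf{w}$ is written $\mathbf{v} \cdot \mathbf{w}$.
If $\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ & $\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$, then $\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2 + v_3 w_3$
* Example: If $\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}$ & $\mathbf{w} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$, then $\mathbf{v} \cdot \mathbf{w} = (3 \cdot 1) + (-1 \cdot 2) + (2 \cdot 1) = 3$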
* Inner Products(Contd)- Inner product of a pair of vectors
measures following characteristics:
1. Length- Square root of inner product of vector with itself:
$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$
* Multiplying a vector by a scalar produces a new vector whose length
is the absolute value of the scalar times the length of the old vector.
$\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$
* Property of triangle inequality- length of sum of two vectors is
less than or equal to sum of lengths of the 2 vectors:
$\|\mathbf{v}_1 + \mathbf{v}_2\| \le \|\mathbf{v}_1\| + \|\mathbf{v}_2\|$
Geometrically, it corresponds to
statement that one side of a triangle is
no longer than sum of lengths of other 2 sides.
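2. Angle- Angle between vectors $\mathbf{v}$ and $\mathbf{u}$ is:
$$\cos\theta = \frac{\mathbf{v} \cdot \mathbf{u}}{\|\mathbf{v}\|\,\|\mathbf{u}\|}$$

* Example: If $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ & $\mathbf{u} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,
$$\cos\theta = \frac{(0 \cdot 1) + (1 \cdot 1)}{[(0 \cdot 0) + (1 \cdot 1)]^{1/2}\,[(1 \cdot 1) + (1 \cdot 1)]^{1/2}} = \frac{1}{\sqrt{2}}$$
$$\theta = 45^\circ$$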
* Equation for angle in terms of components of vectors gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} v_i u_i}{\left(\sum_{i=1}^{n} v_i^2\right)^{1/2}\left(\sum_{i=1}^{n} u_i^2\right)^{1/2}}$$
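The component formula for the angle translates directly into code. This is a small sketch using only the standard `math` module; the function names are illustrative.

```python
import math


def inner(v, u):
    """Inner product: sum of products of corresponding components."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, u))


def length(v):
    """Length of v: square root of the inner product of v with itself."""
    return math.sqrt(inner(v, v))


def angle_degrees(v, u):
    """Angle between two nonzero vectors, from cos(theta) = v.u / (|v||u|)."""
    return math.degrees(math.acos(inner(v, u) / (length(v) * length(u))))


theta = angle_degrees([0, 1], [1, 1])       # approximately 45 degrees
right_angle = angle_degrees([1, 0], [0, 1])  # approximately 90 degrees
```

Because of floating-point rounding, results are best compared with a tolerance rather than for exact equality.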
*Geometrical interpretation- Imagine moving 2 vectors around
in space like hands on a clock. If we hold the lengths of the vectors
constant, then the equation for the angle says that the inner product is
proportional to the cosine of the angle, since $\mathbf{v} \cdot \mathbf{u} = \|\mathbf{v}\|\,\|\mathbf{u}\|\cos\theta$.
* When $\theta = 0^\circ$, $\cos\theta = 1$ is maximum, $\mathbf{v} \cdot \mathbf{u}$ is also maximum.
* When $\theta = 90^\circ$, $\cos\theta = 0$, $\mathbf{v} \cdot \mathbf{u} = 0$, vectors are said to be
orthogonal.
* When $\theta = 180^\circ$, $\cos\theta = -1$ is minimum, $\mathbf{v} \cdot \mathbf{u}$ is also minimum and
vectors point in opposite directions.
*Closer the 2 vectors are, larger the inner product.
*More the vectors point in opposite directions, the more
negative the inner product.
*Orthogonal vectors are vectors which lie at right angles to one
another.
*Set of orthogonal vectors- Every vector in set is orthogonal to
every other vector in the set. i.e., every vector lies at right
angle to every other vector. Eg: standard basis in 3-D space.
*Every orthogonal set of nonzero vectors is linearly independent.
*When we choose a basis for a space, we typically choose an
orthogonal basis.
3. Projections-
* Projection of one vector onto another.
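* $x$ is the projection of $\mathbf{v}$ on $\mathbf{w}$:
$x = \|\mathbf{v}\|\cos\theta$

* $x$ is a scalar which indicates how much $\mathbf{v}$ is pointing in the
direction of $\mathbf{w}$.
* Relation between inner product & projection-
$$x = \|\mathbf{v}\|\cos\theta = \|\mathbf{v}\|\,\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\,\|\mathbf{w}\|} = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{w}\|}$$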

* When 2 vectors are orthogonal, inner product=projection=0.

* If the lengths of 𝒗 and 𝒘 are held constant, then inner
product as well as projection gets larger as 𝒗 moves towards
𝒘.
4. Inner products in 2 dimensions-
* 𝒗 and 𝒘 -vectors on plane.
* 𝒗𝑥 and 𝒗𝑦 -x and y coordinates of 𝒗.
* 𝒘𝑥 and 𝒘𝑦 -x and y coordinates of 𝒘.
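* $l$ - projection of $\mathbf{v}$ on $\mathbf{w}$, $l = \|\mathbf{v}\|\cos\theta$.
* $l_x$ and $l_y$ - x and y coordinates of $l$.
* Triangles OAD & COB (Fig.1) are similar triangles, ratio of
corresponding sides is constant:
$$\frac{OD}{OA} = \frac{CB}{CO} \Rightarrow \frac{l_y}{v_y} = \frac{w_y}{\|\mathbf{w}\|} \Rightarrow l_y = \frac{v_y w_y}{\|\mathbf{w}\|}$$
* Triangles CAB & EOD (Fig.2) are similar triangles, ratio of
corresponding sides is constant:
$$\frac{AB}{AC} = \frac{OD}{OE} \Rightarrow \frac{l_x}{v_x} = \frac{w_x}{\|\mathbf{w}\|} \Rightarrow l_x = \frac{v_x w_x}{\|\mathbf{w}\|}$$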

Fig.1 Fig.2
* Inner products in 2 dimensions(Contd)-
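$$l = \|\mathbf{v}\|\cos\theta = l_x + l_y = \frac{v_x w_x}{\|\mathbf{w}\|} + \frac{v_y w_y}{\|\mathbf{w}\|}$$
$$\|\mathbf{v}\|\cos\theta = \frac{v_x w_x + v_y w_y}{\|\mathbf{w}\|}$$
$$\|\mathbf{v}\|\cos\theta = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{w}\|}$$
$$\cos\theta = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\,\|\mathbf{w}\|}$$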
5. Algebraic properties of inner product-
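* $c$ & $c_i$ - any scalars.
* $\mathbf{v}$ and $\mathbf{w}$ - n-D vectors.
$\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v}$ -------------Eqtn 1
$c(\mathbf{v} \cdot \mathbf{w}) = (c\mathbf{v}) \cdot \mathbf{w} = \mathbf{v} \cdot (c\mathbf{w})$ -------------Eqtn 2
$\mathbf{w} \cdot (\mathbf{v}_1 + \mathbf{v}_2) = \mathbf{w} \cdot \mathbf{v}_1 + \mathbf{w} \cdot \mathbf{v}_2$ ------------Eqtn 3
Combining Eqtn 2 & 3,
$\mathbf{w} \cdot (c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1(\mathbf{w} \cdot \mathbf{v}_1) + c_2(\mathbf{w} \cdot \mathbf{v}_2)$
In general,
$\mathbf{w} \cdot (c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n) = c_1(\mathbf{w} \cdot \mathbf{v}_1) + c_2(\mathbf{w} \cdot \mathbf{v}_2) + \dots + c_n(\mathbf{w} \cdot \mathbf{v}_n)$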
*

Fig.1 Fig.2

* Processing unit receives inputs from n units below.

* Scalar u – activation of o/p unit.
* Vector 𝒗– activations of i/p units.
* Vector 𝒘– Set of n weights between i/p units & o/p unit.
* ith component of 𝒗 is activation of ith input unit. Since there
are n input units, 𝒗 is an n-D vector.
* Associated with each link between input units & output unit,
there is a scalar weight value & set of n weights is n-D vector
𝒘.
* Operation of model:
* Assume activation of each input unit is multiplied by weight on
its link & these products are added up to give activation of
output unit.
* Activation of o/p unit is inner product of its weight vector
with vector of i/p activations.
u = 𝒘. 𝒗
* O/p activation u gives an indication of how close i/p
vector 𝒗 is to the stored weight vector 𝒘.
* Inputs lying:
* close to the weight vector will give a large positive response.
* near 90° to the weight vector will give a response near zero.
* pointing in the opposite direction will give a large negative
response.
* Functioning of processing unit- It splits i/p space into 2
parts:
* Part where response is +ve, o/p is 1.
* Part where response is –ve, o/p is 0.
* This unit is called Linear threshold unit.
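The behaviour of such a unit can be sketched in a few lines. The slides do not say what happens when the response is exactly zero; the sketch below assumes an output of 0 in that case, and the names `activation` and `linear_threshold_unit` are illustrative.

```python
def activation(w, v):
    """Activation of the output unit: inner product of weights and inputs."""
    if len(w) != len(v):
        raise ValueError("weight and input vectors must have the same length")
    return sum(w_i * v_i for w_i, v_i in zip(w, v))


def linear_threshold_unit(w, v):
    """Output 1 for a positive response, 0 otherwise (zero case is an assumption)."""
    return 1 if activation(w, v) > 0 else 0


w = [1, 2]  # stored weight vector

print(activation(w, [2, 4]), linear_threshold_unit(w, [2, 4]))      # 10 1
print(activation(w, [2, -1]), linear_threshold_unit(w, [2, -1]))    # 0 0
print(activation(w, [-1, -2]), linear_threshold_unit(w, [-1, -2]))  # -5 0
```

The three inputs are, in order, collinear with the weight vector, orthogonal to it, and pointing in the opposite direction.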
*
* Matrices-
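* Matrix- array of real numbers.
* If array has m rows & n columns, it is an m*n matrix.
* Example:
$$M = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix},\quad N = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad P = \begin{bmatrix} 10 & -1 \\ -1 & 27 \end{bmatrix}$$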
* Special matrices-
* Square matrix
* Diagonal matrix
* Symmetric matrix
* Identity matrix
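* Multiplication by scalars-
* Example: $3M = 3\begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 9 & 12 & 15 \\ 3 & 0 & 3 \end{bmatrix}$
* Addition of matrices (here N is a 2*3 matrix, the same size as M)-
* Example: $M + N = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix} + \begin{bmatrix} -1 & 0 & 2 \\ 4 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 7 \\ 5 & 1 & 0 \end{bmatrix}$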
* Multiplication of vector by a matrix-
𝑀𝑎𝑡𝑟𝑖𝑥 ∗ 𝑉𝑒𝑐𝑡𝑜𝑟 = 𝑎 𝑛𝑒𝑤 𝑣𝑒𝑐𝑡𝑜𝑟
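* Example: $W = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$
$$\mathbf{u} = W\mathbf{v} = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 13 \\ 3 \end{bmatrix}$$
* W has 2 row vectors; forming the inner product of each of these
row vectors with $\mathbf{v}$, we get a 2-D vector u.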
* In general, if W is an 𝑚 ∗ 𝑛 matrix &
𝒗, an n-D vector,
Then product u = W𝒗 is an m-D vector, whose elements are
inner products of 𝒗 with row vectors of W.
* Another method- Break matrix W into column vectors &
multiply each column vector by the corresponding element of vector $\mathbf{v}$
to produce $\mathbf{u}$, which is a linear combination of the column vectors of W.
Coefficients of the linear combination are the components of $\mathbf{v}$.
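* Example: $W = \begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$
* Let $\mathbf{w}_1$, $\mathbf{w}_2$ & $\mathbf{w}_3$ be column vectors of W & $v_1$, $v_2$ & $v_3$ be
components of $\mathbf{v}$. Then,
$$\mathbf{u} = v_1\mathbf{w}_1 + v_2\mathbf{w}_2 + v_3\mathbf{w}_3 = 1\begin{bmatrix} 3 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 4 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 5 \\ 1 \end{bmatrix} = \begin{bmatrix} 13 \\ 3 \end{bmatrix}$$
* In general, for a matrix with n columns,
$\mathbf{u} = v_1\mathbf{w}_1 + v_2\mathbf{w}_2 + \dots + v_n\mathbf{w}_n$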
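Both views of matrix-vector multiplication give the same result, which the following sketch checks on the example above. Matrices are assumed to be lists of rows; the function names are illustrative.

```python
def matvec_by_rows(W, v):
    """Each component of u is the inner product of one row of W with v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def matvec_by_columns(W, v):
    """u is a linear combination of the columns of W with coefficients from v."""
    m = len(W)
    u = [0] * m
    for j, v_j in enumerate(v):
        column = [W[i][j] for i in range(m)]
        u = [u_i + v_j * c_i for u_i, c_i in zip(u, column)]
    return u


W = [[3, 4, 5],
     [1, 0, 1]]
v = [1, 0, 2]

u_rows = matvec_by_rows(W, v)
u_cols = matvec_by_columns(W, v)
print(u_rows, u_cols)  # [13, 3] [13, 3]
```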

* For each vector 𝒗, operation W𝒗 produces another vector 𝒖.

This operation is mapping or function from one set of vectors
to another set of vectors.
* Consider an n-D vector space 𝑽 (domain) & an m-D vector
space 𝑼(range), then operation of multiplication by a fixed
matrix W is a function from 𝑽 to 𝑼. It is a function whose
domain & range are both vector spaces.
*
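$W(a\mathbf{v}) = a(W\mathbf{v})$ -------------Eqtn 1
$W(\mathbf{u} + \mathbf{v}) = W\mathbf{u} + W\mathbf{v}$ ------------Eqtn 2
Combining Eqtn 1 & 2,
$W(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n) = c_1(W\mathbf{v}_1) + c_2(W\mathbf{v}_2) + \dots + c_n(W\mathbf{v}_n)$
$M\mathbf{v} + N\mathbf{v} = (M + N)\mathbf{v}$
M & N must have the same number of rows & columns.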
*

* m output units are present, each one connected to all of

the n input units.
* Let 𝑢1 , 𝑢2 … 𝑢𝑚 be activation of m output units.
* Each output unit has its own weight vector 𝒘𝒊 , separate
from other output units.
* Activation of an output unit is given by inner product of
its weight vector with input vector.
𝑢𝑖 = 𝒘𝒊 . 𝒗
* Form a matrix W whose row vectors are 𝒘𝒊 . Let 𝒖 be
vector whose components are 𝑢𝑖 .Then
𝒖 = 𝑊𝒗
* For each input vector 𝒗, network produces an output
vector 𝒖 whose components are activations of output
units.
* Another way to draw the network- At each junction, a
weight connects an i/p to o/p unit.
A matrix appears in equation linking
o/p vector to i/p vector.
*
* A function f represents a system with i/p x and an o/p y:
𝑦 = 𝑓(𝑥)
* f is linear if for any i/ps x1 & x2, and any real number c, the
following equations hold:
$f(cx) = c\,f(x)$
$f(x_1 + x_2) = f(x_1) + f(x_2)$
* In a linear system, the response to a sum of i/ps is the sum of the responses to the i/ps taken separately.
* In a nonlinear system, response to sum is much larger or
smaller than would be based on i/ps taken separately.
* For scalar functions of a scalar variable, the only linear
functions are those in which o/p is proportional to input:
𝑦 = 𝑐𝑥
* Many systems are scalar or vector functions of a vector i/p.
For a fixed vector w, the function
𝑢 = 𝐰. 𝐯
is a scalar function of a vector i/p v and is linear because
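$\mathbf{w} \cdot (c\mathbf{v}) = c(\mathbf{w} \cdot \mathbf{v})$
$\mathbf{w} \cdot (\mathbf{v}_1 + \mathbf{v}_2) = \mathbf{w} \cdot \mathbf{v}_1 + \mathbf{w} \cdot \mathbf{v}_2$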
* A system in which o/p is obtained from input by matrix
multiplication is a linear system since
$W(a\mathbf{v}) = a(W\mathbf{v})$
$W(\mathbf{u} + \mathbf{v}) = W\mathbf{u} + W\mathbf{v}$
Thus, the one layer of PDP system is an example of linear
system.
* If we know o/p to all of the vectors in i/p set $\{\mathbf{v}_i\}$, then we can
calculate o/p to any linear combination of the $\mathbf{v}_i$. i.e., if $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n$, then
$W\mathbf{v} = W(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n) = c_1(W\mathbf{v}_1) + c_2(W\mathbf{v}_2) + \dots + c_n(W\mathbf{v}_n)$
* $W\mathbf{v}_1, W\mathbf{v}_2, \dots, W\mathbf{v}_n$ are known vectors which are the o/ps to the vectors
$\mathbf{v}_i$. Multiply these vectors by the $c_i$ and add them to calculate the o/p when v is presented.
* Application-
* Study of a physical system (electronic or physiological) by measuring its
responses to various inputs. If it is a linear system, first measure
responses to a set of vectors that constitute a basis for the input space. Then
responses to any other input vector can be calculated from those measured responses.
*
* Consider two-layer or cascaded system.
* Output of 1st system becomes input to
2nd system & described by two matrix-vector
multiplications.
* I/p vector v multiplied by matrix N to get vector z on the
intermediate set of units: 𝐳 = 𝐍𝐯-------------eqtn 1
and z is multiplied by M to produce vector u on uppermost set of
units: 𝐮 = 𝐌𝐳 -------------eqtn 2
* Substituting eqtn 1 in 2 gives response for the composite system:
𝐮 = 𝐌(𝐍𝐯) -------------eqtn 3
* Equation 3 relates i/p vectors v to the o/p vectors u.
* Matrix multiplication allows us to replace the 2 matrices in
Equation 3 by a single matrix P=MN.
* (i,j)th element of P is the inner product of ith row of M
with jth column of N.
* Thus, a 2-layer system is equivalent to a one-layer system
with weight matrix P.
* The cascade of matrices in any n-layer system can be replaced by
a single matrix which is the product of the n matrices.
* Matrix multiplication is not commutative: in general, MN≠NM.
* Product of 2 matrices is defined only if:
* no: of columns of 1st matrix=no: of rows of 2nd matrix.
* (r*s matrix)*(s*t matrix)=(r*t matrix)
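* Example:
$$\begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 0 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} (3 + 8 - 5) & (6 + 0 + 5) \\ (1 + 0 - 1) & (2 + 0 + 1) \\ (0 + 2 - 2) & (0 + 0 + 2) \end{bmatrix} = \begin{bmatrix} 6 & 11 \\ 0 & 3 \\ 0 & 2 \end{bmatrix}$$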
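The claim that a two-layer system collapses to a single matrix can be checked on the example product. This is a sketch with matrices as lists of rows; the input vector `v` is an arbitrary choice made for illustration.

```python
def matvec(W, v):
    """Matrix-vector product: inner product of each row of W with v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(M, N):
    """Product P = MN; (i, j) entry is row i of M dotted with column j of N."""
    if len(M[0]) != len(N):
        raise ValueError("columns of first matrix must equal rows of second")
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]


M = [[3, 4, 5], [1, 0, 1], [0, 1, 2]]  # 3*3, upper layer
N = [[1, 2], [2, 0], [-1, 1]]          # 3*2, lower layer

P = matmul(M, N)
print(P)  # [[6, 11], [0, 3], [0, 2]]

v = [1, 1]                           # arbitrary 2-D input
two_layer = matvec(M, matvec(N, v))  # u = M(Nv)
one_layer = matvec(P, v)             # u = Pv
print(two_layer, one_layer)  # [17, 3, 2] [17, 3, 2]
```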
* Another way of matrix multiplication-
* Each column vector of P is product of matrix M with
corresponding column vector in N.
* For e.g.: 1st column of P = Matrix M * 1st column of N

* Assume a matrix P exists which can replace cascaded pair

M,N & consider what the element in 1st row & 1st column
of P should be. This element gives strength of connection
between 1st component of i/p vector v & 1st component of
o/p vector u.
* In the cascaded system, there are s paths through which the
connection between 1st component of i/p vector v & 1st
component of o/p vector u occurs. To get the strength of the
equivalent one-layer connection, multiply the weights along each path & add the products over all s paths:
$p_{11} = m_{11}n_{11} + m_{12}n_{21} + \dots + m_{1s}n_{s1}$
* In general, the strength of connection between jth element
of v & ith element of u:
$p_{ij} = m_{i1}n_{1j} + m_{i2}n_{2j} + \dots + m_{is}n_{sj}$
This calculates the inner product between ith row of M & jth
column of N.
* Algebraic properties of matrix multiplication-
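$\mathbf{M}(c\mathbf{N}) = c\mathbf{M}\mathbf{N}$
$\mathbf{M}(\mathbf{N} + \mathbf{P}) = \mathbf{M}\mathbf{N} + \mathbf{M}\mathbf{P}$
$(\mathbf{N} + \mathbf{P})\mathbf{M} = \mathbf{N}\mathbf{M} + \mathbf{P}\mathbf{M}$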
*
* 𝐮 = 𝐖𝐯 describes a function or mapping from one vector
space (domain) to another vector space(range). It
associates a vector u in the range with each vector v in
domain.
* But knowing that it is a linear function constrains the form
the mapping between domain & range can have:
* If 𝐯𝟏 and 𝐯𝟐 are close together in domain, then vectors
𝒖𝟏 = 𝐖𝐯𝟏 and 𝒖𝟐 = 𝐖𝐯𝟐 must be close together in range.
(Continuity property of linear functions)
* If 𝐯𝟑 is a linear combination of 𝐯𝟏 and 𝐯𝟐, and the vectors
𝒖𝟏 = 𝐖𝐯𝟏 and 𝒖𝟐 = 𝐖𝐯𝟐 are known, then 𝒖𝟑 = 𝐖𝐯𝟑 is
completely determined as linear combination of 𝒖𝟏 and 𝒖𝟐 .
* If we have a set of basis vectors for domain, and it is known
which vector in range each basis vector maps to, then
mappings of all other vectors in domain are determined.
* Consider square matrix W multiplied by vectors 𝐯𝟏 and 𝐯𝟐 .
* Vectors will change direction as well as length when
multiplied by a matrix.

[Figure: vectors v1 & v2 and their images Wv1 & Wv2]

* Some vectors change only in length, not direction. For

these vectors, matrix multiplication is no different than
multiplication by a scalar. Such vectors are known as
Eigenvectors.
* Each eigenvector v of a matrix obeys the equation:
𝐖𝐯 = 𝛌𝐯
where 𝛌 is a scalar called an eigenvalue & indicates how
much v is lengthened or shortened after multiplication by
W.
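* Example:
* $\begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}$
* $\begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 4\begin{bmatrix} 0 \\ 1 \end{bmatrix}$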
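The eigenvector equation Wv = λv is easy to verify for given candidates. The sketch below only checks pairs that are supplied to it; it does not compute eigenvalues, and `is_eigenpair` is a hypothetical helper name.

```python
def matvec(W, v):
    """Matrix-vector product: inner product of each row of W with v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def is_eigenpair(W, v, lam):
    """True when Wv equals lam * v (exact comparison; fine for integer data)."""
    return matvec(W, v) == [lam * x for x in v]


W1 = [[4, -1], [2, 1]]
W2 = [[3, 0], [0, 4]]

print(is_eigenpair(W1, [1, 2], 2))  # True
print(is_eigenpair(W1, [1, 1], 3))  # True
print(is_eigenpair(W2, [1, 0], 3))  # True
print(is_eigenpair(W2, [0, 1], 4))  # True

# A vector collinear with an eigenvector is also an eigenvector.
print(is_eigenpair(W1, [2, 4], 2))  # True
# A vector that changes direction under W1 is not.
print(is_eigenpair(W1, [1, 0], 4))  # False
```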
* Each vector that is collinear with an eigenvector is itself
an eigenvector.
* An n*n matrix can have up to, but no more than, n distinct
eigenvalues. Eigenvectors associated with distinct eigenvalues point in
different directions & are linearly independent.
* Consider an n*n matrix W with n distinct eigenvalues &
associated eigenvectors .
* If we have a set of basis vectors for the domain of a
matrix, and if we know vectors in range associated with
each basis vector, then mapping of all other vectors in
domain are found.
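* Eigenvectors of W form such a basis since there are n
linearly independent eigenvectors. Also, the vector in the
range associated with each eigenvector $\mathbf{v}_i$ is a scalar
multiple of it: $W\mathbf{v}_i = \lambda_i\mathbf{v}_i$
* Consider an arbitrary vector $\mathbf{v}$ in the domain of W. It can
be written as a linear combination of eigenvectors, since
they form a basis:
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n$ ------- Eqtn 1
$\mathbf{u} = W\mathbf{v}$ ------- Eqtn 2
Substitute eqtn 1 in 2
$\mathbf{u} = W(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n)$
$\mathbf{u} = c_1(W\mathbf{v}_1) + c_2(W\mathbf{v}_2) + \dots + c_n(W\mathbf{v}_n)$ ------- Eqtn 3
Also $W\mathbf{v}_i = \lambda_i\mathbf{v}_i$, substituting in eqtn 3
$\mathbf{u} = c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \dots + c_n\lambda_n\mathbf{v}_n$ ------- Eqtn 4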
* No matrices in Eqtn 4. Each term 𝐜𝒊 𝛌𝒊 is a scalar. Matrix
multiplication has been reduced to simple linear
combination of vectors.
* If we know eigenvectors & eigenvalues of a matrix, there
is no need to store the matrix. The matrix can be written
as linear combination of eigenvectors multiplied by
associated eigenvalues.
* Eigenvectors turn matrix multiplication into simple
multiplication by scalars.
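Eqtn 4 can be tried out on the 2*2 example matrix with eigenvectors (1, 2) and (1, 1). The sketch below is restricted to the 2-D case and finds the coefficients with Cramer's rule, which is a choice made here for brevity; the input vector is arbitrary.

```python
from fractions import Fraction

W = [[4, -1], [2, 1]]
v1, v2 = [1, 2], [1, 1]  # eigenvectors of W
lam1, lam2 = 2, 3        # associated eigenvalues


def coefficients(v, a, b):
    """Solve v = c1*a + c2*b for (c1, c2) by Cramer's rule (2-D only)."""
    det = a[0] * b[1] - b[0] * a[1]
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    c1 = Fraction(v[0] * b[1] - b[0] * v[1], det)
    c2 = Fraction(a[0] * v[1] - v[0] * a[1], det)
    return c1, c2


v = [3, 5]  # arbitrary input vector
c1, c2 = coefficients(v, v1, v2)

# Eqtn 4: only scalar multiplications of the eigenvectors, no matrix involved.
u_eigen = [c1 * lam1 * v1[i] + c2 * lam2 * v2[i] for i in range(2)]

# Direct matrix multiplication for comparison.
u_direct = [sum(w * x for w, x in zip(row, v)) for row in W]

print(c1, c2)                       # 2 1
print([int(x) for x in u_eigen])    # [7, 11]
print(u_direct)                     # [7, 11]
```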
*
* Transpose-
* Transpose of an n*m matrix W is an m*n matrix denoted 𝐖 𝐓.
* i,j th element of 𝐖 𝐓 is j,i th element of W .
* Example:
$$\begin{bmatrix} 3 & 4 & 5 \\ 1 & 0 & 2 \end{bmatrix}^T = \begin{bmatrix} 3 & 1 \\ 4 & 0 \\ 5 & 2 \end{bmatrix}$$
* Row vectors of 𝐖 𝐓 are column vectors of W, and column
vectors of 𝐖 𝐓 are row vectors of W.
* Algebraic properties of Transpose-
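$(\mathbf{W}^T)^T = \mathbf{W}$
$(c\mathbf{W})^T = c\mathbf{W}^T$
$(\mathbf{M} + \mathbf{N})^T = \mathbf{M}^T + \mathbf{N}^T$
$(\mathbf{M}\mathbf{N})^T = \mathbf{N}^T\mathbf{M}^T$
* A matrix W is symmetric if $\mathbf{W}^T = \mathbf{W}$.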
*
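* Let $\mathbf{v} = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}$ & $\mathbf{u} = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$
* Then, $\mathbf{v}^T\mathbf{u} = \begin{bmatrix} 3 & 1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix} = 6$ is the inner product of
vectors $\mathbf{v}$ and $\mathbf{u}$.
* Consider product $\mathbf{u}\mathbf{v}^T = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}\begin{bmatrix} 3 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 12 & 4 & 8 \\ 3 & 1 & 2 \end{bmatrix}$.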
* 𝒏𝟐 inner products are calculated.
* i,j th element of resulting matrix is equal to product 𝑢𝑖 𝑣𝑗 .
* Products of the form 𝐮𝐯𝐓 are called as outer products.
* If 𝐖 = 𝐮𝐯𝐓 , then ith row of W is given by
𝐰𝐢 = ui 𝐯
u𝐢 is ith component of vector 𝐮.
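The difference between the inner product (a scalar) and the outer product (a matrix) shows up clearly in code. A minimal sketch, with illustrative function names:

```python
def inner(v, u):
    """v^T u: a single number."""
    return sum(a * b for a, b in zip(v, u))


def outer(u, v):
    """u v^T: a matrix whose (i, j) entry is u_i * v_j."""
    return [[u_i * v_j for v_j in v] for u_i in u]


v = [3, 1, 2]
u = [0, 4, 1]

print(inner(v, u))  # 6
W = outer(u, v)
print(W)            # [[0, 0, 0], [12, 4, 8], [3, 1, 2]]

# The ith row of W is u_i times v.
print(W[1] == [u[1] * x for x in v])  # True
```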
*
* Simple linear PDP systems are modelled by the equation
𝐮 = 𝐖𝐯
* Case I: $\mathbf{u}$ has only one component u.
Find a weight vector w such that it gives
output u when input vector is v.
$u = \mathbf{w} \cdot \mathbf{v}$
u and v are given, w is unknown. Assume v has unit length, so that v.v = 1.
* If we choose w=v, then w.v = v.v = 1, which in general ≠ u.
* If we choose w=uv, then
w.v = (uv).v = u(v.v) = u.
* Geometrically, finding w corresponds to
finding a vector whose projection on v is u.
Any vector along dotted line will work.
* Case II: u has more than
one component.
* Each o/p unit has a weight
vector & they form rows of W.
* Each unit calculates inner product b/w its weight vector & i/p
vector v, which are components of o/p vector u.
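* ith weight vector is given by: $\mathbf{w}_i = u_i\mathbf{v}$
* Find weight matrix W when i/p v and o/p u are given.
* Let $\mathbf{W} = \mathbf{u}\mathbf{v}^T$, then $\mathbf{W}\mathbf{v} = (\mathbf{u}\mathbf{v}^T)\mathbf{v} = \mathbf{u}(\mathbf{v}^T\mathbf{v}) = \mathbf{u}$, since $\mathbf{v}^T\mathbf{v} = 1$ for a unit-length v.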
* This is called Hebbian learning rule or local learning rule where, a
matrix W is chosen that associates a particular o/p vector u to an
i/p vector v.
* Given n-D o/p vectors 𝐮𝟏 , 𝐮𝟐 ,….. 𝐮𝐧 to be associated with n-D i/p
vectors 𝐯𝟏 , 𝐯𝟐 ,….. 𝐯𝐧 . For each i, we wish to have
𝐮𝐢 = 𝐖𝐯𝐢
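Case I: Assume vectors $\mathbf{v}_i$ form a mutually orthogonal set & each $\mathbf{v}_i$ is of
unit length:
$$\mathbf{v}_i^T\mathbf{v}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$
Form a set of matrices using learning scheme as: $\mathbf{W}_i = \mathbf{u}_i\mathbf{v}_i^T$
Form a composite weight matrix W which is the sum of the $\mathbf{W}_i$:
$\mathbf{W} = \mathbf{W}_1 + \mathbf{W}_2 + \dots + \mathbf{W}_i + \dots + \mathbf{W}_n$
For any arbitrary i,
$$\begin{aligned}
\mathbf{W}\mathbf{v}_i &= (\mathbf{W}_1 + \mathbf{W}_2 + \dots + \mathbf{W}_i + \dots + \mathbf{W}_n)\mathbf{v}_i \\
&= (\mathbf{u}_1\mathbf{v}_1^T + \mathbf{u}_2\mathbf{v}_2^T + \dots + \mathbf{u}_i\mathbf{v}_i^T + \dots + \mathbf{u}_n\mathbf{v}_n^T)\mathbf{v}_i \\
&= (\mathbf{u}_1\mathbf{v}_1^T)\mathbf{v}_i + (\mathbf{u}_2\mathbf{v}_2^T)\mathbf{v}_i + \dots + (\mathbf{u}_i\mathbf{v}_i^T)\mathbf{v}_i + \dots + (\mathbf{u}_n\mathbf{v}_n^T)\mathbf{v}_i \\
&= \mathbf{u}_1(\mathbf{v}_1^T\mathbf{v}_i) + \mathbf{u}_2(\mathbf{v}_2^T\mathbf{v}_i) + \dots + \mathbf{u}_i(\mathbf{v}_i^T\mathbf{v}_i) + \dots + \mathbf{u}_n(\mathbf{v}_n^T\mathbf{v}_i) \\
&= 0 + 0 + \dots + \mathbf{u}_i \cdot 1 + \dots + 0 \\
&= \mathbf{u}_i
\end{aligned}$$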
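The derivation for orthonormal inputs can be reproduced on a small case. The sketch below uses two hand-picked orthonormal 2-D input vectors and arbitrary target outputs (all of them assumptions made for illustration), with exact fractions so the recalled outputs match exactly.

```python
from fractions import Fraction


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[u_i * v_j for v_j in v] for u_i in u]


def mat_add(A, B):
    """Element-wise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matvec(W, v):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


# Orthonormal input vectors: unit length and mutually orthogonal.
inputs = [[Fraction(3, 5), Fraction(4, 5)],
          [Fraction(-4, 5), Fraction(3, 5)]]
# Output vectors to associate with each input.
outputs = [[1, 2], [3, -1]]

# Hebbian (outer product) rule: W = sum of u_i v_i^T.
W = [[0, 0], [0, 0]]
for u_i, v_i in zip(outputs, inputs):
    W = mat_add(W, outer(u_i, v_i))

recalled = [matvec(W, v_i) for v_i in inputs]
print([[int(x) for x in r] for r in recalled])  # [[1, 2], [3, -1]]
```

If the input vectors were not orthogonal, the cross terms would not vanish and the recalled outputs would be mixtures of the stored ones.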
* Case II: When set of i/p vectors is not orthogonal, Hebb
rule will not correctly associate o/p vectors with i/p
vectors.
* But, a modification of Hebb rule called Delta rule or
Widrow-Hoff rule can make such associations.
* Requirement for delta rule to work is that i/p vectors are
linearly independent.
* Case III: For square matrices, knowledge of eigenvectors
permits an important simplification: matrix multiplication
of a vector can be replaced by scalar multiplication.
* We can fit in Hebbian learning with the idea of
eigenvectors.
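* We want vectors $\mathbf{u}_i = \mathbf{W}\mathbf{v}_i = \lambda_i\mathbf{v}_i$, where $\mathbf{v}_i$ are i/p
vectors (orthogonal & of unit length, as in Case I).
* Using outer product learning rule,
$\mathbf{W} = \mathbf{W}_1 + \mathbf{W}_2 + \dots + \mathbf{W}_i + \dots + \mathbf{W}_n$ where
$\mathbf{W}_i = \mathbf{u}_i\mathbf{v}_i^T = \lambda_i\mathbf{v}_i\mathbf{v}_i^T$
Presenting vector $\mathbf{v}_i$ to matrix $\mathbf{W}$ thus formed,
$$\begin{aligned}
\mathbf{W}\mathbf{v}_i &= (\mathbf{W}_1 + \mathbf{W}_2 + \dots + \mathbf{W}_i + \dots + \mathbf{W}_n)\mathbf{v}_i \\
&= (\lambda_1\mathbf{v}_1\mathbf{v}_1^T + \dots + \lambda_i\mathbf{v}_i\mathbf{v}_i^T + \dots + \lambda_n\mathbf{v}_n\mathbf{v}_n^T)\mathbf{v}_i \\
&= \lambda_1\mathbf{v}_1(\mathbf{v}_1^T\mathbf{v}_i) + \dots + \lambda_i\mathbf{v}_i(\mathbf{v}_i^T\mathbf{v}_i) + \dots + \lambda_n\mathbf{v}_n(\mathbf{v}_n^T\mathbf{v}_i) \\
&= \mathbf{0} + \dots + \lambda_i\mathbf{v}_i \cdot 1 + \dots + \mathbf{0}
\end{aligned}$$
$\mathbf{W}\mathbf{v}_i = \lambda_i\mathbf{v}_i$ ------------------eqtn 1
* Eqtn 1 shows that $\mathbf{v}_i$ is an eigenvector of $\mathbf{W}$ with
eigenvalue $\lambda_i$.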
* When we calculate a weight matrix 𝐖 using Hebbian
learning rule & associate i/p vectors to scalar multiples of
themselves, then those i/p vectors are eigenvectors of 𝐖.
* Once we have eigenvectors & eigenvalues of a matrix, 𝐖
need not even be calculated.
* All input-output combinations can be done using equation:
𝐮 = 𝐜𝟏 𝛌𝟏 𝐯𝟏 + 𝐜𝟐 𝛌𝟐 𝐯𝟐 + ⋯ + 𝐜𝐧 𝛌𝒏 𝐯𝐧
*
* Inverse of a matrix 𝐖 is another matrix 𝐖 −𝟏 that obeys
following equations:
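$\mathbf{W}\mathbf{W}^{-1} = \mathbf{I}$
$\mathbf{W}^{-1}\mathbf{W} = \mathbf{I}$
$\mathbf{I}$ is identity matrix.
* Example:
$$\mathbf{W} = \begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix},\quad \mathbf{W}^{-1} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}$$
$$\mathbf{W}\mathbf{W}^{-1} = \begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
$$\mathbf{W}^{-1}\mathbf{W} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}\begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$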
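* Consider equation $\mathbf{u} = \mathbf{W}\mathbf{v}$ where $\mathbf{u}$ and $\mathbf{W}$ are known & $\mathbf{v}$
is unknown. Multiply both sides of the equation by $\mathbf{W}^{-1}$.
$\mathbf{W}^{-1}\mathbf{u} = \mathbf{W}^{-1}\mathbf{W}\mathbf{v} = \mathbf{I}\mathbf{v} = \mathbf{v}$
* Thus the solution for $\mathbf{v}$ is given by: $\mathbf{v} = \mathbf{W}^{-1}\mathbf{u}$.
* Example: Find vector $\mathbf{v}$ that satisfies the equation:
$$\begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix}\mathbf{v} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$
Here $\mathbf{W} = \begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix}$, $\mathbf{W}^{-1} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$
$$\mathbf{v} = \mathbf{W}^{-1}\mathbf{u} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix}$$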
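For a 2*2 matrix the inverse has a closed form, which is enough to reproduce the example. The sketch below uses the standard adjugate formula (not stated in the slides) and exact fractions; `inverse_2x2` is an illustrative name.

```python
from fractions import Fraction


def inverse_2x2(W):
    """Inverse of [[a, b], [c, d]] as (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in W]
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns are linearly dependent; no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(W, v):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


W = [[1, Fraction(1, 2)], [-1, 1]]
W_inv = inverse_2x2(W)
print(W_inv)
# [[Fraction(2, 3), Fraction(-1, 3)], [Fraction(2, 3), Fraction(2, 3)]]

u = [3, 3]
v = matvec(W_inv, u)        # solve u = Wv for v
print([int(x) for x in v])  # [1, 4]
print(matvec(W, v) == u)    # True
```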
* Equation 𝐯 = 𝐖 −𝟏 𝐮 is a linear mapping . Domain of mapping is
range of 𝐖 and range of mapping is domain of 𝐖.
* 𝐖−𝟏 represent function from one vector space to another.
* For every 𝐮 in domain of 𝐖−𝟏 , there can only be one 𝐯 in the
range such that 𝐯 = 𝐖 −𝟏 𝐮. If 𝐖 maps any two distinct points 𝐯1
and 𝐯2 in its domain to same point 𝐮 in its range, (𝐖 is not one-
to-one) then there can be no 𝐖 −𝟏 to represent inverse
mapping.
* A matrix has an inverse only if its column vectors are
linearly independent.
* For square matrices with linearly dependent column
vectors & non-square matrices, an inverse called
Generalised inverse can be defined which performs part
of inverse mapping.
* Rank of matrix- number of linearly independent column
vectors in a matrix.
* An n*n matrix has full rank if its rank is n. The condition
that a matrix has an inverse is equivalent to condition
that it have full rank.
*
* A basis for a vector space is a set of linearly independent
vectors that span the space.
* To make a change of basis, we need to describe the
vectors & matrices in terms of new basis.
* Numbers that are used to represent a vector are relative
to a particular choice of basis. When we change basis,
these numbers, called coordinates also changes & we
have to relate coordinates in a new basis to coordinates in
old basis.
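* Example:
* Consider vector $\mathbf{v}$ which in the standard basis has coordinates $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$
* Change basis by choosing two new basis vectors:
$$\mathbf{y}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix},\quad \mathbf{y}_2 = \begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$$
* $\mathbf{v}$ can be written as a linear combination of $\mathbf{y}_1$ and $\mathbf{y}_2$ by using coefficients
1 & 2.
$$\mathbf{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 1 \cdot \mathbf{y}_1 + 2 \cdot \mathbf{y}_2 = 1 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} + 2 \cdot \begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$$
* Let $\mathbf{v}^*$ represent $\mathbf{v}$ in the new basis, then
$$\mathbf{v}^* = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$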
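* The coordinates of a vector $\mathbf{v}$ in a new basis $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_n$ are the
coefficients $c_i$ in the equation:
$\mathbf{v} = c_1\mathbf{y}_1 + c_2\mathbf{y}_2 + \dots + c_n\mathbf{y}_n$ ------- eqtn 1
* Form a matrix $\mathbf{Y}$ whose columns are the new basis vectors $\mathbf{y}_i$ &
let $\mathbf{v}^*$ be the vector whose components are $c_i$. From eqtn 1,
$\mathbf{v} = \mathbf{Y}\mathbf{v}^*$
where $\mathbf{v}^*$ is unknown.
* To calculate unknown vector $\mathbf{v}^*$, use inverse matrix $\mathbf{Y}^{-1}$
$\mathbf{v}^* = \mathbf{Y}^{-1}\mathbf{v}$
* Example: Letting $\mathbf{y}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, $\mathbf{y}_2 = \begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$, we have
$$\mathbf{Y} = \begin{bmatrix} 1 & 1/2 \\ -1 & 1 \end{bmatrix} \quad \& \quad \mathbf{Y}^{-1} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}$$
Thus $$\mathbf{v}^* = \mathbf{Y}^{-1}\mathbf{v} = \begin{bmatrix} 2/3 & -1/3 \\ 2/3 & 2/3 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$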
* Consider a square matrix 𝐖 that transforms vectors using
equation 𝐮 = 𝐖𝐯.
* We change basis & write 𝐯 and 𝐮 in the new basis as 𝐯∗ and
𝐮∗ .
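* Find a matrix $\mathbf{W}^*$ such that
$\mathbf{u}^* = \mathbf{W}^*\mathbf{v}^*$
* Convert $\mathbf{v}^*$ back to original basis,
then map from $\mathbf{v}$ to $\mathbf{u}$ using matrix $\mathbf{W}$,
and finally convert $\mathbf{u}$ to $\mathbf{u}^*$.
$\mathbf{v} = \mathbf{Y}\mathbf{v}^*$, $\mathbf{u} = \mathbf{W}\mathbf{v}$, $\mathbf{u}^* = \mathbf{Y}^{-1}\mathbf{u}$
Putting these 3 equations together,
$$\begin{aligned}
\mathbf{u}^* &= \mathbf{Y}^{-1}\mathbf{u} \\
&= \mathbf{Y}^{-1}\mathbf{W}\mathbf{v} \\
&= \mathbf{Y}^{-1}\mathbf{W}\mathbf{Y}\mathbf{v}^*
\end{aligned}$$
Thus $\mathbf{W}^*$ must be equal to $\mathbf{Y}^{-1}\mathbf{W}\mathbf{Y}$.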
* Matrices related by an equation of the form 𝐖 ∗ = 𝐘 −𝟏 𝐖𝐘 are
called similar.
* Consider matrix 𝐖 and change its basis to eigenvectors of 𝐖.
* Find matrix 𝐖∗ in new basis.
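* For each eigenvector $\mathbf{y}_i$,
$\mathbf{W}\mathbf{y}_i = \lambda_i\mathbf{y}_i$
* If $\mathbf{Y}$ is a matrix whose columns are $\mathbf{y}_i$, then
$\mathbf{W}\mathbf{Y} = \mathbf{Y}\boldsymbol{\Lambda}$
where $\boldsymbol{\Lambda}$ is a diagonal matrix whose entries on the main diagonal
are the eigenvalues $\lambda_i$. Now multiply both sides on the left by $\mathbf{Y}^{-1}$
$\mathbf{Y}^{-1}\mathbf{W}\mathbf{Y} = \boldsymbol{\Lambda} = \mathbf{W}^*$
Thus matrix $\mathbf{W}^* = \boldsymbol{\Lambda}$. When we use eigenvectors as the new basis,
the matrix corresponding to $\mathbf{W}$ in the new basis is a diagonal matrix
whose entries are eigenvalues.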
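The diagonalisation can be checked on the 2*2 matrix used in the eigenvector examples, whose eigenvectors (1, 2) and (1, 1) have eigenvalues 2 and 3. This is a sketch limited to 2*2 matrices, using the standard closed-form inverse and exact fractions; the helper names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(Y):
    """Inverse of [[a, b], [c, d]] as (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in Y]
    det = a * d - b * c
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(M, N):
    """Matrix product: (i, j) entry is row i of M dotted with column j of N."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]


W = [[4, -1], [2, 1]]
Y = [[1, 1],   # columns of Y are the eigenvectors (1, 2) and (1, 1)
     [2, 1]]

W_star = matmul(inverse_2x2(Y), matmul(W, Y))
print([[int(x) for x in row] for row in W_star])  # [[2, 0], [0, 3]]
```

The result is the diagonal matrix of eigenvalues, in the same order as the eigenvector columns of Y.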
* Change of basis for PDP models-
* The linear structure of a set of vectors remains the same over
a change of basis. i.e., if a vector can be written as a
linear combination of a set of vectors in one basis, then it
can be written as the same linear combination of those
vectors in all bases.
* Example: Let $\mathbf{w} = a\mathbf{v}_1 + b\mathbf{v}_2$. Let $\mathbf{Y}$ be the matrix of the change of
basis, then
$$\begin{aligned}
\mathbf{w}^* &= \mathbf{Y}^{-1}\mathbf{w} \\
&= \mathbf{Y}^{-1}(a\mathbf{v}_1 + b\mathbf{v}_2) \\
&= a\mathbf{Y}^{-1}\mathbf{v}_1 + b\mathbf{Y}^{-1}\mathbf{v}_2 \\
&= a\mathbf{v}_1^* + b\mathbf{v}_2^*
\end{aligned}$$
Thus change of basis is a linear operation.
* Behaviour of a linear PDP model depends entirely on
linear structure of i/p vectors.
*
* Consider one unit of PDP model which computes closeness
of its weight vector & i/p vector in space.
* Draw a line perpendicular to weight vector
at some point. All vectors on this line
project to same point on weight vector,
their inner product with weight vectors are
equal.
* All vectors to left of line have smaller inner product & all
vectors to right have larger inner product.
* Choose a fixed number as threshold for the unit such that if
inner product > threshold, unit outputs a 1 & if
inner product < threshold, unit outputs a 0.
* Such a unit breaks space into 2 parts.
* A unit with a threshold can be used to classify
patterns as belonging to one group or another.
* The threshold permits unit to make a decision. All i/p
vectors on same side of space lead to same response.
* A function relating activation of unit & its output is shown
in Fig. It produces a 1 or 0 based on
magnitude of activation.
* Also possible to have probabilistic
threshold. The farther activation is above
threshold, more likely unit is to have an
output of 1 & vice versa.
* Another example of an underlying linear model modified with
a nonlinear function is subthreshold summation.
* In biological systems, 2 stimuli presented separately to a
system may provoke no response, although when presented
simultaneously a response is obtained. Once the system is
responding, further stimuli are responded to in a linear
fashion.
* Only if sum of activations
produced by vectors exceeds T will
a response be produced. There is a
linear range in which system responds
linearly.
* Subthreshold summation suppresses
noise. System will not respond to small random i/ps that are
assumed to be noise.
* All physical systems have a limited dynamic range. i.e.,
response of system cannot exceed a certain maximum
response.
* Figure shows a linear range followed by cutoff.
* System will behave linearly until o/p
reaches M, at which point no further
increase can occur.
* Figure below shows a nonlinear function
which also has a maximum output M.
It is called a sigmoid & combines noise suppression with a
limited dynamic range.
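The three output functions described in this part can be sketched as follows. The slides give only their qualitative shape, so the exact forms below are assumptions: a hard step at a threshold, a linear response clipped to the range 0 to M, and the logistic function scaled by M as one common choice of sigmoid.

```python
import math


def step(activation, threshold):
    """Hard threshold: 1 above the threshold, 0 otherwise."""
    return 1 if activation > threshold else 0


def clipped_linear(activation, M):
    """Linear range followed by cutoff at M (lower cutoff at 0 is an assumption)."""
    return min(max(activation, 0.0), M)


def sigmoid(activation, M=1.0):
    """Logistic sigmoid scaled to a maximum of M (assumed form), overflow-safe."""
    if activation >= 0:
        return M / (1.0 + math.exp(-activation))
    e = math.exp(activation)
    return M * e / (1.0 + e)


print(step(0.7, 0.5), step(0.2, 0.5))                   # 1 0
print(clipped_linear(2.0, 3.0), clipped_linear(5.0, 3.0))  # 2.0 3.0
print(sigmoid(0.0, M=2.0))                               # 1.0
```

Small inputs to the sigmoid produce outputs near 0 (noise suppression), while large inputs approach but never exceed M (limited dynamic range).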