
Matrix Algebra

Background
Matrices, Vectors and Scalars
• A matrix is a rectangular or square array of numbers arranged in rows and columns.
• For example, a matrix A with three rows and two columns is said to be 3x2.
• In general, if a matrix A has n rows and p columns, it is said to be nxp. Alternatively, the size of A is nxp.
• A vector is a matrix with a single column or row. For example, a column vector could contain the test grades of the students in a Multivariate Analysis course.
• A single real number is called a scalar.
• Two matrices are equal if they are of the same size and the elements in corresponding positions are equal. Thus, if A=(aij) and B=(bij), then A=B if aij = bij for all i and j.
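• As a quick illustration (not part of the original slides), the following NumPy sketch builds a hypothetical 3x2 matrix, a vector of grades, and a scalar, and checks equality of two matrices; all values are made up.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # a hypothetical 3x2 matrix: 3 rows, 2 columns
print(A.shape)                  # (3, 2) -> the size of A is 3x2

grades = np.array([85, 72, 90, 64])   # a hypothetical vector (single row/column of grades)
c = 3.5                               # a scalar is a single real number

B = A.copy()
print(np.array_equal(A, B))     # True: same size and equal corresponding elements, so A = B
```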
Transpose and symmetric matrix
• The transpose of a matrix A denoted by A’ is
obtained from A by interchanging rows and
columns.
• If the transpose of a matrix is the same as the original matrix, that is A’ = A, the matrix is said to be symmetric (see the sketch below).
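• A minimal NumPy sketch (values are hypothetical) of the transpose and the symmetry check A’ = A:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 5, 6],
              [3, 6, 9]])       # a hypothetical symmetric matrix

print(A.T)                       # the transpose A': rows and columns interchanged
print(np.array_equal(A, A.T))    # True -> A is symmetric
```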
Special matrices
• The diagonal of a pxp square matrix A consists of the elements a11, a22, …, app.
• For example, the matrix D with diagonal elements 3, 6 and 2 and zeros in all off-diagonal positions is a diagonal matrix. This matrix can also be denoted as D = diag(3, 6, 2).
• A diagonal matrix with a 1 in each diagonal position is called an identity matrix, and is denoted by I. For example, the 3x3 identity matrix is I = diag(1, 1, 1).
• An upper triangular matrix is a square matrix with zeros below the diagonal.
• A lower triangular matrix is defined similarly, with zeros above the diagonal.
• A vector of 1’s is denoted by j: j = (1, 1, …, 1)’.
• A square matrix with every element equal to 1 is denoted by J.
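• The special matrices above can be sketched in NumPy as follows (the triangular example uses made-up entries):

```python
import numpy as np

D = np.diag([3, 6, 2])           # diagonal matrix D = diag(3, 6, 2)
I = np.eye(3)                    # 3x3 identity matrix
U = np.triu([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]])         # upper triangular: zeros below the diagonal
j = np.ones(3)                   # vector of 1's, j
J = np.ones((3, 3))              # square matrix of 1's, J
print(D, I, U, j, J, sep="\n")
```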


Addition of Matrices and Vectors:

• If two matrices (or two vectors) are of the same size, their sum is found by adding corresponding elements; that is, if A is nxp and B is nxp, then C = A + B is also nxp and is found as (cij) = (aij + bij).
• For example (see the sketch below):
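• A small NumPy sketch of element-wise addition with hypothetical 2x2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])           # same size as A
C = A + B                        # c_ij = a_ij + b_ij
print(C)                         # [[ 6  8]
                                 #  [10 12]]
```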
• Multiplication of Matrices:
• If A is nxp and B is pxq, the product C = AB is nxq, and its (i, j) element is cij = the sum over k of aik bkj. The product AB is defined only when the number of columns of A equals the number of rows of B (see the sketch below).
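• A hedged NumPy sketch of matrix multiplication with made-up matrices; note that the inner dimensions must agree:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2x3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])           # 3x2: columns of A = rows of B
C = A @ B                        # 2x2 product, c_ij = sum over k of a_ik * b_kj
print(C)                         # [[ 4  5]
                                 #  [10 11]]
```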
• Vector Length: Given that a is a vector of size nx1, a’a is a sum of squares, and aa’ is a square (symmetric) matrix. The square root of the sum of squares of the elements of a is the distance from the origin to the point a, and is also referred to as the length of a: length of a = sqrt(a’a) (see the sketch below).
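• A NumPy sketch (hypothetical vector) of a’a, aa’, and the length sqrt(a’a):

```python
import numpy as np

a = np.array([3.0, 4.0, 12.0])   # a hypothetical 3x1 vector
print(a @ a)                      # a'a: sum of squares = 169.0
print(np.outer(a, a))             # aa': a 3x3 symmetric matrix
print(np.sqrt(a @ a))             # length of a = 13.0
print(np.linalg.norm(a))          # the same length via the built-in norm
```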
• Rank:
• A set of vectors a1, a2, …, an is said to be linearly dependent if constants c1, c2, …, cn (not all zero) can be found such that
c1 a1 + c2 a2 + … + cn an = 0.
• If no constants c1, c2, …, cn can be found satisfying the condition above, the set of vectors is said to be linearly independent.
• Then the rank of a square or rectangular matrix A can be defined as:
• Rank(A) = number of linearly independent rows or columns of A.
• It can be shown that the number of linearly
independent rows of a matrix is always equal
to the number of linearly independent columns.
• If A is nxp, the maximum possible rank of A is the smaller of n and p; when the rank of A equals this maximum, A is said to be of full rank. For example, a matrix with two rows that are linearly independent (neither row is a multiple of the other) has rank 2 and is of full rank (see the sketch below).
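• A NumPy sketch of rank with made-up matrices, one rank-deficient and one of full rank:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],
              [1, 1, 1]])            # second row = 2 * first row
print(np.linalg.matrix_rank(A))       # 2: only two linearly independent rows

B = np.array([[1, 0, 2],
              [0, 1, 3]])            # two linearly independent rows
print(np.linalg.matrix_rank(B))       # 2 = min(2, 3), so B is of full rank
```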
• Inverse:
• If a matrix A is square and of full rank, then A is said to be nonsingular, and A has a unique inverse, denoted A^-1, such that A A^-1 = A^-1 A = I.
• If A is square and of less than full rank, then an inverse does not exist, and A is said to be singular. Note that rectangular matrices do not have inverses even if they are of full rank (see the sketch below).
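• A NumPy sketch of the inverse, using a hypothetical nonsingular matrix and a hypothetical singular one:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])            # square and of full rank (nonsingular)
A_inv = np.linalg.inv(A)
print(A @ A_inv)                       # approximately the identity matrix I

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # rank 1 < 2, so S is singular
# np.linalg.inv(S) would raise numpy.linalg.LinAlgError: no inverse exists
```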
Positive Definite Matrices
• The symmetric matrix A is said to be positive definite if x’Ax > 0 for all possible vectors x (except x = 0). Similarly, A is positive semidefinite if x’Ax >= 0 for all x not equal to 0.
• A positive definite matrix A can be factored into
• A = T’T
• where T is a nonsingular upper triangular matrix.
• One way to obtain T is the Cholesky decomposition, which can be carried out in the following steps:
• Let A = (aij) and T = (tij) be nxn. Then the elements of T are found as:
t11 = sqrt(a11), and t1j = a1j / t11 for j = 2, …, n;
tii = sqrt(aii - sum of tki^2 over k = 1, …, i-1) for i = 2, …, n;
tij = (aij - sum of tki tkj over k = 1, …, i-1) / tii for j > i; and
tij = 0 for j < i.
• Example: the sketch below applies the Cholesky decomposition to a small positive definite matrix and checks that A = T’T.
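• A NumPy sketch with a hypothetical positive definite matrix. NumPy's Cholesky routine returns a lower triangular factor L with A = LL’, so the upper triangular T of the slides is obtained as T = L’:

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 1.0],
              [2.0, 1.0, 6.0]])       # hypothetical positive definite matrix

L = np.linalg.cholesky(A)              # lower triangular, A = L L'
T = L.T                                # upper triangular factor, A = T'T
print(T)
print(np.allclose(A, T.T @ T))         # True: the factorization reproduces A
```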


• Trace:
• The trace of a matrix A is the sum of the diagonal elements of A, that is, tr(A) = a11 + a22 + … + app.
• Example: if the diagonal elements of A are 5, -3 and 9, then
• tr(A) = 5 + (-3) + 9 = 11.
Orthogonality
• Orthogonal Vectors and Matrices:
• Two vectors a and b of the same size are said to be orthogonal if a’b = 0.
• If a’a = 1, the vector a is said to be normalized. The vector a can always be normalized by dividing by its length, sqrt(a’a).
• Thus, c = a / sqrt(a’a) is normalized so that c’c = 1.
• A square matrix C whose columns are normalized and mutually orthogonal is called an orthogonal matrix; in that case C’C = CC’ = I (see the sketch below).
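• A NumPy sketch (hypothetical vectors) of orthogonality, normalization, and an orthogonal matrix built from orthonormal columns:

```python
import numpy as np

a = np.array([1.0, -1.0, 0.0])
b = np.array([1.0, 1.0, 2.0])
print(a @ b)                           # 0.0 -> a and b are orthogonal

c = a / np.sqrt(a @ a)                 # normalize a by dividing by its length
print(c @ c)                           # 1.0 -> c is normalized

w = np.cross(a, b)                     # a third vector orthogonal to both a and b
C = np.column_stack([a / np.linalg.norm(a),
                     b / np.linalg.norm(b),
                     w / np.linalg.norm(w)])
print(np.allclose(C.T @ C, np.eye(3))) # True: C is an orthogonal matrix (C'C = I)
```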

• Eigen values and Eigen vectors:
• For every square matrix A, a scalar λ and a nonzero vector x can be found such that Ax = λx, where λ is called an eigen value of A and x the corresponding eigen vector. To find λ and x we write: (A - λI)x = 0.
• Example (outline of the procedure):
• The characteristic equation is |A - λI| = 0; solving it gives the eigen values of A.
• To find the eigen vector corresponding to a particular eigen value, we substitute that value of λ into (A - λI)x = 0 and solve for x.
• The solution vector can be written with an arbitrary constant c, since any nonzero multiple of an eigen vector is also an eigen vector.
• If c is set so that x’x = 1, the eigen vector is normalized.
• The same steps give the eigen vector corresponding to each of the other eigen values (a numerical example appears in the sketch below).
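• A NumPy sketch of eigen values and eigen vectors for a small hypothetical symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # hypothetical symmetric matrix

vals, vecs = np.linalg.eig(A)          # solves A x = lambda x
print(vals)                            # the eigen values (here 3 and 1, in some order)
x = vecs[:, 0]                         # eigen vector for the first listed eigen value
print(np.allclose(A @ x, vals[0] * x)) # True: A x = lambda x
print(x @ x)                           # 1.0: NumPy returns normalized eigen vectors
```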
Positive Definite and Semidefinite Matrices

• The eigen values and eigen vectors of positive


definite and positive semidefinite matrices have
the following properties:
• The eigen values of a positive definite matrix
are all positive.
• The eigen values of a positive semi definite
matrix are positive or zero with the number of
positive eigen values equal to the rank of the
matrix.
• It is customary to list the eigen values of a
positive definite matrix in descending order:

The following result is known as the Perron-Frobenius theorem: if all elements of the positive definite matrix A are positive, then all elements of the first eigen vector are positive. The first eigen vector is the one associated with the first (largest) eigen value.
Displaying Multivariate Data

Basic concepts:
• A random variable can be defined as a variable
whose value depends on the outcome of a chance
experiment.
• A density function, f(y), indicates the relative frequency of occurrence of the random variable y. Thus, if f(y1) > f(y2), then points in the neighborhood of y1 are more likely to occur than points in the neighborhood of y2.
• The sample mean of a random sample of n observations y1, y2, …, yn is given by
ȳ = (y1 + y2 + … + yn) / n.
• Generally, the sample mean ȳ will not be exactly equal to the population mean. However, the sample mean is considered a good estimator of the population mean μ, because it is unbiased: E(ȳ) = μ.
• The sample variance is defined as:
s² = Σ (yi - ȳ)² / (n - 1), where the sum is over i = 1, …, n.
• The sample variance is generally not equal to the population variance σ², but it is an unbiased estimator of it, that is, E(s²) = σ².
• The square root of the sample or population variance is the corresponding standard deviation (see the sketch below).
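• A NumPy sketch of the sample mean, variance, and standard deviation for a small made-up sample:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])       # hypothetical random sample, n = 5
n = len(y)

ybar = y.sum() / n                              # sample mean
s2 = ((y - ybar) ** 2).sum() / (n - 1)          # sample variance (n - 1 denominator)
print(ybar, s2)                                 # 7.0 10.0
print(np.var(y, ddof=1), np.std(y, ddof=1))     # same variance; square root is the std. dev.
```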
• Covariance: When two variables x and y tend to covary, this means that if one is above its mean, the other is more likely to be above its mean as well, and vice versa. This illustrates positive covariance. With negative covariance, x and y tend to deviate simultaneously to opposite sides of their means. The sample covariance is defined as:
sxy = Σ (xi - x̄)(yi - ȳ) / (n - 1).
• Alternatively, sxy = (Σ xi yi - n x̄ ȳ) / (n - 1).
• It is important to notice that the sample covariance measures only linear relationships. If the actual relationship is non-linear, the sample covariance cannot measure the relationship correctly.
• Variables with zero sample covariance can be said to be orthogonal. To see this, recall from the definition of orthogonality that two vectors a and b are orthogonal when a’b = 0; the same holds for centered variables, so zero sample covariance, Σ (xi - x̄)(yi - ȳ) = 0, is exactly the statement that the vectors of centered observations are orthogonal (see the sketch below).
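• A NumPy sketch with made-up paired data, checking that the two covariance formulas agree and that zero covariance would correspond to orthogonal centered vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
n = len(x)

sxy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)    # definition form
sxy_alt = (x @ y - n * x.mean() * y.mean()) / (n - 1)      # computational form
print(sxy, sxy_alt)                                        # both 2.5
print(np.cov(x, y)[0, 1])                                  # NumPy agrees

print((x - x.mean()) @ (y - y.mean()))    # (n - 1) * sxy: zero only if the centered
                                          # vectors are orthogonal
```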


Correlation:

• One major problem with covariance is that it depends on the scale of measurement of x and y; for example, if the measurement scale is changed from inches to centimeters, the covariance will change.
• To find a measure of linear relationship that is invariant to changes of scale, we can standardize the covariance by dividing by the standard deviations of the two variables. This standardized covariance is called the correlation:
• The sample correlation is:
rxy = sxy / (sx sy) = Σ (xi - x̄)(yi - ȳ) / sqrt( Σ (xi - x̄)² Σ (yi - ȳ)² ).
• The correlation always lies between -1 and 1 (see the sketch below).
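• A NumPy sketch with made-up data showing that rescaling x changes the covariance but not the correlation:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(r)                                  # standardized covariance = correlation
print(np.corrcoef(x, y)[0, 1])            # NumPy's correlation agrees

x_cm = 2.54 * x                           # change the scale, e.g. inches to centimeters
print(np.cov(x_cm, y)[0, 1])              # the covariance changes ...
print(np.corrcoef(x_cm, y)[0, 1])         # ... but the correlation does not
```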


Multivariate Covariance and Correlation Matrices:

• The variance-covariance matrix of the p variables is the pxp symmetric matrix with the variances of the variables on the diagonal and the covariance of each pair of variables in the corresponding off-diagonal position.
• It should be noted that the symbol Σ is always used to denote the (population) variance-covariance matrix; it should not be confused with the summation symbol.
• Similarly, the multivariate correlation matrix is the pxp matrix with 1’s on the diagonal and the correlation of each pair of variables in the corresponding off-diagonal position.
• The correlation matrix can be obtained from the covariance matrix, and vice versa. Let D = diag(s1, s2, …, sp) be the diagonal matrix of sample standard deviations. Then
R = D^-1 S D^-1,
where S is the sample variance-covariance matrix. Similarly, the sample variance-covariance matrix is recovered from the correlation matrix as
S = D R D (see the sketch below).
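• A NumPy sketch, with a hypothetical covariance matrix, of converting between S and R using D = diag(s1, …, sp):

```python
import numpy as np

S = np.array([[4.0, 2.0, 1.0],
              [2.0, 9.0, 3.0],
              [1.0, 3.0, 4.0]])        # hypothetical sample variance-covariance matrix

D = np.diag(np.sqrt(np.diag(S)))        # diagonal matrix of standard deviations
D_inv = np.linalg.inv(D)

R = D_inv @ S @ D_inv                   # correlation matrix from the covariance matrix
S_back = D @ R @ D                      # covariance matrix recovered from R
print(R)
print(np.allclose(S, S_back))           # True
```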
Covariance matrices for subsets of variables:
• Sometimes a researcher wants to study two different kinds of variables simultaneously. We will denote these two sub-vectors by x and y, with p variables in y and q variables in x. Thus, each observation vector in a sample is partitioned as (y’, x’)’, with the p y-variables listed first and the q x-variables second.
• The covariance matrix S then becomes a partitioned matrix with blocks Syy, Syx, Sxy and Sxx, where Syy is pxp, Syx is pxq, Sxy is qxp, and Sxx is qxq (see the sketch below).
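• A NumPy sketch of a partitioned covariance matrix for simulated data, with p = 2 variables in y and q = 3 variables in x (all values randomly generated for illustration):

```python
import numpy as np

p, q = 2, 3
rng = np.random.default_rng(0)
data = rng.normal(size=(50, p + q))    # 50 hypothetical observation vectors (y', x')'

S = np.cov(data, rowvar=False)          # full (p + q) x (p + q) covariance matrix
S_yy = S[:p, :p]                        # p x p
S_yx = S[:p, p:]                        # p x q
S_xy = S[p:, :p]                        # q x p (equal to S_yx')
S_xx = S[p:, p:]                        # q x q
print(S_yy.shape, S_yx.shape, S_xy.shape, S_xx.shape)
```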
