
# Chapter 1: Vectors

Sections
1.1 Vectors
1.2 Vector Addition
1.3 Scalar Vector Multiplication
1.4 Inner Product
1.5 Complexity Of Vector Computations
Exercises

This chapter introduces vectors and some common operations on them, and describes some settings in which vectors are used.

## 1.1 Vectors

A vector is an ordered *finite* list of numbers, usually written as a vertical array surrounded by brackets, e.g.

$$
\begin{bmatrix} -1.1 \\ 0.0 \\ 3.6 \\ 5.2 \end{bmatrix}
$$

An alternative notation is ( -1.1, 0.0, 3.6, 5.2 ).

The elements of the vector are called entries, elements, components, or coefficients of that vector.

The size (or dimension, or length) of the vector is the number of entries it contains.

The vector above has size 4, and its third entry (note: indexing starts at 1) is 3.6.

We use symbols to denote vectors. If an n-vector is denoted by the symbol a, then a_i denotes its i-th element; i is called the *index* and runs from 1 to n.

Two vectors a and b are equal, written a = b, iff they have the same size and the same elements in the same order. If a and b are n-vectors, then a = b iff a_1 = b_1 and a_2 = b_2 and ... and a_n = b_n.

The numbers that are the elements of a vector are called scalars.
We focus on the case where the elements of a vector are real numbers (vectors with other kinds of entries, e.g. complex numbers, also exist).

The set of all real numbers is written as R.

The set of all real n-vectors is denoted by R^n.
Therefore, to say that a is an element of R^n (written a ∈ R^n) is the same as to say that a is an n-vector.

## Block or stack vectors.

It is sometimes useful to create new vectors by blocking or stacking other vectors, e.g.

$$
v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}
$$

where a, b, and c are themselves vectors. If a is an m-vector, b is an n-vector, and c is a p-vector, then v is an (m + n + p)-vector.

Stacked vectors can also include scalars. For example, if a is a 3-vector, then

$$
\begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
$$

Subvectors/Slices

In the stacked vector v above, a, b, and c are called subvectors or slices of v, with sizes m, n, and p, respectively.

Colon notation is used for slices: a_{r:s} is the vector containing the entries a_r through a_s inclusive, and it has size s - r + 1.
The subscript r:s is called an index range.

In the stacked vector example:

a = v_{1:m}
b = v_{(m+1):(m+n)}
c = v_{(m+n+1):(m+n+p)}

Notes:
1. In many programming languages, indices run from 0 to n - 1. In math notation they run from 1 to n.
2. a_3 can mean the third element of a vector a *or* the third vector in a list of vectors.
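Because math notation is 1-based and Python is 0-based, it can help to see stacking and the slice a_{r:s} side by side. The sketch below uses plain Python lists for vectors. The helper names `stack` and `slice_1based` are hypothetical and chosen only for illustration.

```python
def stack(*parts):
    """Stack scalars and vectors (Python lists) into a single vector."""
    out = []
    for part in parts:
        if isinstance(part, list):
            out.extend(part)   # a subvector contributes all of its entries
        else:
            out.append(part)   # a scalar contributes a single entry
    return out


def slice_1based(a, r, s):
    """Return a_{r:s}: entries r through s inclusive (1-based), size s - r + 1."""
    return a[r - 1:s]          # Python slices are 0-based and exclude the end


a, b, c = [1.0, 2.0], [3.0, 4.0, 5.0], [6.0]   # m = 2, n = 3, p = 1
v = stack(a, b, c)                            # an (m + n + p)-vector
m, n, p = len(a), len(b), len(c)

print(len(v))                                 # 6
print(slice_1based(v, m + 1, m + n))          # [3.0, 4.0, 5.0], i.e. b
print(stack(1.0, [7.0, 8.0, 9.0]))            # [1.0, 7.0, 8.0, 9.0], i.e. (1, a)
```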

(a_i)_j refers to the j-th entry of the vector a_i.

Zero vector: a vector whose elements are all 0, written 0_n or just 0 when the size can be determined from context.

A standard unit vector is an n-vector with all its elements equal to zero *except*
one element which is equal to 1.

The i-th unit vector is a standard unit vector with the i-th element equal to 1.

For example, the vectors

$$
e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$

are the unit vectors of size 3.

Here e_1 denotes the first unit *vector*, not the first element of a vector e. This is an example of notational ambiguity.

We can define the i-th unit vector e_i entrywise as

$$
(e_i)_j = \begin{cases} 0 & \text{if } j \neq i \\ 1 & \text{if } j = i \end{cases}
$$

A sparse vector is a vector with most of its elements equal to 0.

The sparsity pattern of a vector is the set of indices of its nonzero entries.

Unit vectors are sparse, since they have only one nonzero entry.
The zero vector is the 'sparsest' possible vector.

The number of nonzero entries of a vector x is denoted by nnz(x).

nnz(e_i) = 1 for every unit vector e_i.
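To make the unit-vector and nnz definitions concrete, here is a minimal sketch using plain Python lists. The function names `unit_vector` and `nnz` are illustrative. The index i is taken as 1-based to match the notes.

```python
def unit_vector(i, n):
    """Return the i-th standard unit vector of size n (i is 1-based)."""
    return [1.0 if j == i else 0.0 for j in range(1, n + 1)]


def nnz(x):
    """Number of nonzero entries of x."""
    return sum(1 for xi in x if xi != 0)


e2 = unit_vector(2, 3)
print(e2)                      # [0.0, 1.0, 0.0]
print(nnz(e2))                 # 1
print(nnz([0.0, 0.0, 0.0]))    # 0, the zero vector
```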

Examples:

An n-vector can be used to represent n quantities or values in an application; the numbers may have the same or different units.

An interesting use case is feature vectors.
The elements of a feature vector denote n different quantities relating to attributes of a single thing or object. The entries of a feature vector are called features or attributes.
e.g. a 6-vector v could hold the age, height, weight, blood pressure, temperature, and gender of a patient admitted to a hospital, with gender encoded as, say, 0 for male and 1 for female. Note that the quantities have different physical units.

## Vector Entry Labels

Often a separate vector maintains the labels for the entries of other vectors, e.g. a vector could hold the labels (AGE, HEIGHT, WEIGHT, ...).

## 1.2 Vector Addition

Two vectors *of the same size* can be added together by adding their corresponding elements, creating a new vector called the sum vector:
$$
\begin{bmatrix} 0 \\ 7 \\ 3 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 9 \\ 3 \end{bmatrix}
$$

Vector subtraction is defined similarly; the result is called the difference of the two vectors.

Properties:

1. commutativity : a + b = b + a
2. associativity : a + (b + c) = (a + b) + c
3. a + 0 = 0 + a = a
4. a - a = 0
Examples:
Treating the components of a vector as displacements along the coordinate axes gives the 'geometric interpretation' of vector addition. The order of the displacements doesn't matter (commutativity): the resulting vector is the same.
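The sketch below implements elementwise addition and subtraction on plain Python lists. It checks the example sum and two of the listed properties. The names `vadd` and `vsub` are hypothetical helpers. They raise an error on a size mismatch, because addition is only defined for vectors of the same size.

```python
def vadd(a, b):
    """Sum of two vectors of the same size."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return [ai + bi for ai, bi in zip(a, b)]


def vsub(a, b):
    """Difference a - b of two vectors of the same size."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return [ai - bi for ai, bi in zip(a, b)]


a, b = [0, 7, 3], [1, 2, 0]
print(vadd(a, b))                  # [1, 9, 3], the example above
print(vadd(a, b) == vadd(b, a))    # True (commutativity)
print(vsub(a, a))                  # [0, 0, 0], i.e. a - a = 0
```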

## 1.3 Scalar Vector Multiplication

Scalar-vector multiplication is done by multiplying each element of the vector by the scalar.

e.g.

$$
(-2) \begin{bmatrix} 1 \\ 9 \\ 6 \end{bmatrix} = \begin{bmatrix} -2 \\ -18 \\ -12 \end{bmatrix}
$$

## Properties of scalar vector multiplication

1. commutativity: for any scalar alpha and vector v, alpha v = v alpha
2. associativity: for any scalars alpha, beta and vector v, alpha (beta v) = (alpha beta) v
3. left distributivity: (alpha + beta) v = alpha v + beta v
4. right distributivity: v (alpha + beta) = v alpha + v beta
5. distributivity over vector addition: beta (v + w) = beta v + beta w

## (KEY) Linear Combinations

If a_1, a_2, ..., a_m are n-vectors and alpha_1, ..., alpha_m are scalars, then the n-vector alpha_1 a_1 + ... + alpha_m a_m is called a linear combination of the vectors a_1 through a_m.
The scalars alpha_1 through alpha_m are called the coefficients of the combination.

(KEY) Any m-vector b can be written as a linear combination of the unit vectors of size m:
b = b_1 e_1 + ... + b_m e_m,
where b_i is the i-th entry of b and e_i is the i-th unit vector.

e.g.

$$
\begin{bmatrix} 2 \\ 9 \\ 6 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 9 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 6 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$
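The definitions above translate directly into a loop over coefficients and vectors. This sketch uses plain lists and hypothetical helper names (`scale`, `lincomb`). It reproduces the (-2) scaling example and the unit-vector expansion of (2, 9, 6).

```python
def scale(alpha, v):
    """Scalar-vector multiplication alpha * v."""
    return [alpha * vi for vi in v]


def lincomb(coeffs, vectors):
    """alpha_1 a_1 + ... + alpha_m a_m for n-vectors a_1, ..., a_m."""
    if len(coeffs) != len(vectors):
        raise ValueError("need one coefficient per vector")
    n = len(vectors[0])
    result = [0] * n
    for alpha, a in zip(coeffs, vectors):
        if len(a) != n:
            raise ValueError("all vectors must have the same size")
        result = [ri + ai for ri, ai in zip(result, scale(alpha, a))]
    return result


e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(scale(-2, [1, 9, 6]))                # [-2, -18, -12]
print(lincomb([2, 9, 6], [e1, e2, e3]))    # [2, 9, 6]
```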

## Special Linear Combinations

1. When alpha_1, ..., alpha_m are all equal to 1, the linear combination is the sum of the vectors a_1 through a_m.
2. When alpha_1, ..., alpha_m are all equal to 1/m, the linear combination (1/m)(a_1 + ... + a_m) is called the average of the vectors.
3. When the coefficients alpha_1 through alpha_m sum to 1, the linear combination is called an affine combination.
4. When the coefficients in an affine combination are all nonnegative, the combination is called a weighted average (or a convex combination, or a mixture). The coefficients in an affine or convex combination are sometimes given as percentages.

Examples
1. When vectors represent displacements, a linear combination represents the sum of scaled displacements.
2. When vectors represent time series of audio signals over the same time period (each called a 'track'), the linear combination is called a mixture or a 'mix'. A producer in a studio or a sound engineer at a rock show chooses alpha_1 through alpha_m to balance the different instruments and voices.
3. When vectors represent cash flows, a linear combination of them represents a replication of a cash flow.
4. When a and b are different vectors, the affine combination c = (1 - theta) a + theta b, where theta is a scalar, represents a point on the line that passes through a and b.
When 0 <= theta <= 1, the affine combination is called a convex combination of a and b, and the point is said to lie on the *segment* between a and b.
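A small sketch makes the geometric picture concrete. It computes the affine combination (1 - theta) a + theta b for a few values of theta, and the average as a special convex combination. It uses plain lists and illustrative helper names (`affine_point`, `average`).

```python
def affine_point(a, b, theta):
    """Point (1 - theta) a + theta b on the line through a and b."""
    return [(1 - theta) * ai + theta * bi for ai, bi in zip(a, b)]


def average(vectors):
    """(1/m)(a_1 + ... + a_m): coefficients 1/m are nonnegative and sum to 1."""
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]


a, b = [0.0, 0.0], [4.0, 2.0]
print(affine_point(a, b, 0.0))   # [0.0, 0.0], equals a
print(affine_point(a, b, 1.0))   # [4.0, 2.0], equals b
print(affine_point(a, b, 0.5))   # [2.0, 1.0], midpoint of the segment
print(affine_point(a, b, 1.5))   # [6.0, 3.0], on the line but beyond b
print(average([a, b]))           # [2.0, 1.0]
```

Values of theta in [0, 1] give points on the segment between a and b. Values outside that range give points on the line that lie beyond a or b.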

## 1.4 Inner Product

Definition: the (standard) inner product of two m-vectors a and b is defined as

$$
a^T b = a_1 b_1 + a_2 b_2 + \cdots + a_m b_m,
$$

i.e. the sum of the products of corresponding entries. (The notation a^T will be explained later.)

e.g.

$$
\begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}^T \begin{bmatrix} 1 \\ 0 \\ -3 \end{bmatrix} = (-1)(1) + (2)(0) + (2)(-3) = -1 + 0 - 6 = -7
$$

Note: when m = 1, the inner product reduces to the ordinary product of two numbers.

Properties:
1. Commutativity: a^T b = b^T a
2. Associativity with scalar multiplication: (alpha a)^T b = alpha (a^T b)
3. Distributivity with vector addition: (a + b)^T c = a^T c + b^T c
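The definition is a single sum over corresponding entries. The sketch below uses plain Python lists and an illustrative `inner` helper. It checks the worked example and the m = 1 case.

```python
def inner(a, b):
    """Standard inner product a^T b = a_1 b_1 + ... + a_m b_m."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same size")
    return sum(ai * bi for ai, bi in zip(a, b))


print(inner([-1, 2, 2], [1, 0, -3]))   # -7, the example above
print(inner([3], [4]))                 # 12: for m = 1 it is ordinary multiplication
```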

General Examples
1. e_i^T a = a_i: the inner product of a vector with the i-th unit vector 'picks out' the i-th entry of a.
2. 1^T a gives the sum of the elements of a (here 1 is the vector of all ones).
3. ((1/n) 1)^T a gives the average of the elements of a.
4. a^T a = a_1^2 + a_2^2 + ... + a_n^2 is the sum of squares of the elements of a.
5. Let b be a vector all of whose entries are 0 or 1. Then b^T a is the sum of those elements a_i of a for which b_i = 1.
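Each of the special inner products above needs only the right second vector. This sketch shows them on a small integer vector, using plain lists and an illustrative `inner` helper. The values are chosen so that every result is exact.

```python
def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


a = [2, -1, 4, 3]
n = len(a)
ones = [1] * n                      # the all-ones vector 1
e3 = [0, 0, 1, 0]                   # third unit vector
sel = [1, 0, 1, 0]                  # 0/1 vector selecting entries 1 and 3

print(inner(e3, a))                 # 4: picks out a_3
print(inner(ones, a))               # 8: sum of entries
print(inner([1 / n] * n, a))        # 2.0: average of entries
print(inner(a, a))                  # 30: sum of squares
print(inner(sel, a))                # 6: a_1 + a_3
```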

## Inner Product Of Block Vectors.

If vectors a and b are block vectors (here the components a_1, ..., a_k and b_1, ..., b_k are themselves vectors), then

$$
\begin{bmatrix} a_1 \\ \vdots \\ a_k \end{bmatrix}^T \begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix} = a_1^T b_1 + \cdots + a_k^T b_k,
$$

provided that a_i and b_i have the same size for each i (i.e. a_i and b_i *conform*).
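A quick numeric check of the block formula is shown below. It stacks conforming blocks, takes the inner product of the stacked vectors, and compares that with the sum of blockwise inner products. The block values are arbitrary, made up for the illustration.

```python
def inner(a, b):
    if len(a) != len(b):
        raise ValueError("blocks must conform (have the same size)")
    return sum(ai * bi for ai, bi in zip(a, b))


a_blocks = [[1, 2], [3], [4, 5, 6]]
b_blocks = [[7, 8], [9], [10, 11, 12]]    # each b_i has the size of a_i

a = [x for blk in a_blocks for x in blk]  # stacked vector (a_1, ..., a_k)
b = [x for blk in b_blocks for x in blk]  # stacked vector (b_1, ..., b_k)

lhs = inner(a, b)                                           # a^T b
rhs = sum(inner(ai, bi) for ai, bi in zip(a_blocks, b_blocks))
print(lhs, rhs)   # 217 217
```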

Applications:
1. If a and b are m-vectors that describe occurrences, i.e. each of their elements is 0 or 1, then their inner product gives the total number of indices for which both components are 1.
2. When vector a represents the features of an object and vector b represents a list of weights, a^T b is a weighted sum of the features, sometimes called a score. Thus a credit score can be obtained from a feature vector (age, income, etc.) and a weight vector.
3. Price-quantity: if one vector gives the quantities of goods (as in a bill of goods) and another gives their prices, then their inner product is the total price of the goods.
4. If one vector gives the probabilities of m outcomes (which sum to 1) and another gives the value of a variable for each outcome, then their inner product is the expected value of the variable.
5. Polynomial evaluation:
Consider the polynomial
p(x) = c_0 + c_1 x + c_2 x^2 + ... + c_n x^n
(note: n + 1 terms. I use zero indexing here, vs. 1 indexing in the text, to make the exponent and coefficient numbers match up).

Then c_0 through c_n are the *coefficients* of the polynomial; let them be the entries of the (n + 1)-vector c.
Let t be some number, and let z = (1, t, t^2, ..., t^n) be another (n + 1)-vector. Then c^T z = p(t), i.e. the inner product evaluates the polynomial at t.
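Polynomial evaluation as an inner product can be sketched in a few lines. The coefficients use the same zero-based order as the notes (c_0 first). `poly_eval` is an illustrative name. It builds the vector of powers explicitly rather than using a faster scheme such as Horner's method.

```python
def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_eval(c, t):
    """p(t) = c_0 + c_1 t + ... + c_n t^n, computed as c^T (1, t, ..., t^n)."""
    powers = [t ** k for k in range(len(c))]   # (1, t, t^2, ..., t^n)
    return inner(c, powers)


c = [1, -2, 3]             # p(x) = 1 - 2x + 3x^2
print(poly_eval(c, 2))     # 9, since 1 - 4 + 12 = 9
```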

6. Document sentiment analysis:
Let a dictionary have m words. Let w be an m-vector with w_i = 1 if the i-th dictionary word is positive, 0 if it is neutral, and -1 if it is negative. Let x be an m-vector whose entry x_i is the number of occurrences of the i-th dictionary word in a document (most of these are zero for any given document). Then the inner product w^T x gives a (crude) estimate of the sentiment expressed in the document.
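The sentiment score is just a polarity vector dotted with a word-count vector. The 5-word dictionary, its polarities, and the document below are all made up for illustration. A real dictionary would be far larger, and its count vector would be sparse.

```python
def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical 5-word dictionary and polarities, for illustration only.
dictionary = ["good", "bad", "table", "great", "awful"]
polarity = [1, -1, 0, 1, -1]            # +1 positive, 0 neutral, -1 negative

document = "good table great great bad"
counts = [document.split().count(word) for word in dictionary]

print(counts)                    # [1, 1, 1, 2, 0]
print(inner(polarity, counts))   # 2: mildly positive
```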

## 1.5 Complexity Of Vector Computations

Real numbers are stored in computers in floating point format using 64 bits (8 bytes of 8 bits each).
Vectors are stored as arrays of floating point numbers, so a vector with n elements takes 64n bits (8n bytes) of storage. With today's gigabyte and terabyte storage we can store vectors with millions or billions of entries.

Sparse vectors are stored in a way that keeps track of the indices and values of the nonzero entries only.

Roundoff Errors
When computers carry out floating point operations (flops), the results are rounded to the nearest floating point number. The very small difference between the exact and the rounded value is called floating point round-off error. For most applications this is irrelevant.
The study of round-off errors and how to mitigate them is part of numerical analysis, and is not considered in this book.

FLOP counts.
A very rough approximation of the time it takes to carry out vector (and matrix and tensor) operations can be obtained by counting the total number of floating point operations the operation requires. The speed with which a computer performs flops is expressed in Gflop/s (billions of flops per second). Because the actual time a computer takes to perform a linear algebra operation depends on many factors other than the flop count, flop counts only need to be crude approximations (for example, factors of 2 are often ignored).

Complexity.
In this book we use the term 'complexity' to denote the number of flops
required to carry out a lin alg op by the best method.

## Complexity Of Vector Operations:

1. Scalar vector multiplication alpha x takes m multiplications, one for each alpha x_i. Order m flops.

2. Vector addition a + b takes m additions, one for each a_i + b_i. Order m flops.

3. The inner product takes m scalar multiplications and m - 1 additions, for a total of 2m - 1 flops. We simplify this to 2m flops; order m.

How long does it take a computer that can do a billion flops per second to
compute the inner product of two vectors having a million entries each?
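Working the question through: an inner product of two million-entry vectors needs about 2 × 10^6 flops. At 10^9 flops per second, that is about 2 × 10^-3 seconds, i.e. roughly 2 milliseconds. The same arithmetic in Python, with an illustrative helper name:

```python
def inner_product_flops(m):
    """Flop count for an m-vector inner product: m multiplications + (m - 1) additions."""
    return 2 * m - 1


m = 10**6                # a million entries
rate = 1e9               # one billion flops per second (1 Gflop/s)
seconds = inner_product_flops(m) / rate
print(inner_product_flops(m))   # 1999999, i.e. about 2 * 10^6 flops
print(seconds)                  # about 0.002 s, roughly 2 milliseconds
```

As the notes say, this ignores everything except the flop count, such as memory access. So it is an order-of-magnitude estimate, not a prediction.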

## Complexity of sparse vector operations

If vectors are sparse, then
1. scalar vector multiplication alpha a takes nnz(a) flops.
2. vector addition a + b takes at most min(nnz(a), nnz(b)) flops. (Adding a number to 0 does not consume a flop, so if the nonzero positions of a and b don't intersect, computing the sum takes 0 flops.)
3. the inner product takes at most min(nnz(a), nnz(b)) multiplications and min(nnz(a), nnz(b)) - 1 additions, for a total of at most 2 min(nnz(a), nnz(b)) - 1 flops.
When the sparsity patterns don't overlap, it takes 0 flops, since a^T b = 0 in this case.
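One possible sparse storage scheme (an assumption for this sketch, not the book's) is a dict mapping each index of a nonzero entry to its value. With that representation, the inner product only touches indices where both vectors are nonzero. This matches the min(nnz(a), nnz(b)) bound. `sparse_inner` is an illustrative name.

```python
def sparse_inner(x, y):
    """Inner product of sparse vectors stored as {index: nonzero value} dicts.

    Only indices present in both dicts contribute, so the multiplications
    and additions number at most about 2 * min(nnz(x), nnz(y)).
    """
    if len(x) > len(y):
        x, y = y, x                     # loop over the sparser vector
    total = 0.0
    for i, xi in x.items():
        yi = y.get(i)
        if yi is not None:              # both entries are nonzero at index i
            total += xi * yi
    return total


x = {1: 2.0, 5: -1.0, 9: 4.0}           # 1-based indices of nonzero entries
y = {5: 3.0, 7: 1.0}
print(sparse_inner(x, y))               # -3.0: only index 5 overlaps
print(sparse_inner({1: 1.0}, {2: 1.0})) # 0.0: sparsity patterns don't overlap
```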

## Cash Flow application

1. Vectors: a vector can represent the cash flow of an entity. Each positive entry is a cash receipt by the entity, and each negative entry is a cash payment by the entity.
e.g. (1000, -10, -10, -1010) can represent a loan of 1000 taken by the entity, with 1% interest (10) paid in each of periods 2 and 3, and a final payment of principal plus interest (1010) in period 4.

3. scalar multiplication -

4. Linear combination:
Suppose c1, c2, c3 are m-vectors representing cash flows (loans, investments, etc.).
The linear combination beta1 c1 + beta2 c2 + beta3 c3 represents a cash flow that is *replicated* from the original cash flows c1, c2, c3.

Let
c1 = (1, -1.1, 0): a \$1 loan at 10% interest, received in period 1 and paid off in period 2; no money comes in or goes out in period 3.
c2 = (0, 1, -1.1): no cash flow in period 1; a \$1 loan received in period 2 and paid off in period 3.

Then d = c1 + 1.1 c2 = (1, 0, -1.21) is an equivalent two-period loan: \$1 received in period 1, nothing paid in period 2, and \$1.21 paid off in period 3. Note: 1.1 = 1 + 0.1, one plus the interest rate. We replicated a two-period loan from two one-period loans: in period 2 we take out a new one-period loan of \$1.10, which exactly covers the \$1.10 owed on the first loan, so the net cash flow in period 2 is zero; in period 3 we repay 1.1 × 1.1 = \$1.21.
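A short numeric check of the replication is given below. Adding 1.1 times the second one-period loan to the first one cancels the period-2 payment. The result matches (1, 0, -1.21) up to floating point round-off.

```python
c1 = [1.0, -1.1, 0.0]   # $1 borrowed in period 1, repaid with 10% interest in period 2
c2 = [0.0, 1.0, -1.1]   # $1 borrowed in period 2, repaid with 10% interest in period 3

d = [a + 1.1 * b for a, b in zip(c1, c2)]   # d = c1 + 1.1 c2
print(d)   # approximately [1.0, 0.0, -1.21]; the last entry shows round-off
```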

5. Inner product:
Let c be an m-vector representing a cash flow, with c_i the cash received (or paid out, when c_i < 0) in period i, so there are m periods.
Let d be the m-vector d = (1, 1/(1 + r), 1/(1 + r)^2, ..., 1/(1 + r)^(m - 1)), where r >= 0 is an interest rate.

Then the inner product d^T c is the discounted total of the cash flow, also called its net present value (NPV) with interest rate r.
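Net present value is the inner product of the discount vector with the cash flow. This sketch applies it to the 1% loan cash flow from item 1. Discounting at the loan's own rate gives an NPV of (essentially) zero. `npv` is an illustrative helper name.

```python
def npv(c, r):
    """Net present value d^T c with d = (1, 1/(1+r), ..., 1/(1+r)^(m-1))."""
    d = [1.0 / (1.0 + r) ** k for k in range(len(c))]   # discount vector
    return sum(di * ci for di, ci in zip(d, c))


loan = [1000.0, -10.0, -10.0, -1010.0]   # the 1% loan cash flow from above
print(npv(loan, 0.01))   # essentially 0 (up to round-off)
print(npv(loan, 0.0))    # -30.0: undiscounted total, i.e. total interest paid
```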

## Exercises

Exercise 1.1
Vector equations. Determine whether each of the equations below is true, false, or contains bad notation (and therefore does not make sense).
(a)

$$
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = (1, 2, 1)
$$

true

(b)

$$
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = [1, 2, 1]
$$

false: incorrect notation, [ ] is used instead of ( ).

(c) (1, (2, 1)) = ((1, 2), 1)

true: both sides are stacked vectors equal to the 3-vector (1, 2, 1).

Exercise 1.2
Which of the following uses correct notation? When the expression makes sense, calculate the length. In the following, a and b are 10-vectors and c is a 20-vector.

a. (a + b) - c_{3:12}
this is correct. c_{3:12} has 12 - 3 + 1 = 10 entries, so the result is a 10-vector.

b. (a, b, c_{3:13})
this is correct. this is a stacked vector with 10 + 10 + 11 = 31 elements.

c. 2a + c
this is incorrect: it tries to add a 10-vector and a 20-vector.

d. (a, 1) + (c_1, b)
this is correct. both terms are 11-vectors, so the result is an 11-vector.

e. ((a, b), a)
this is correct. (a, b) is a 20-vector; stacking a below it gives a 30-vector.

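f. [a, b] + 4c
since we haven't learned matrices yet, this is incorrect (and in any case you can't add a matrix and a vector of nonconforming dimensions).

g.

$$
\begin{bmatrix} a \\ b \end{bmatrix} + 4c
$$

this is correct. the stacked vector (a, b) is a 20-vector, so the result is a 20-vector.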

Exercise 1.3 (notation ambiguity): skipped.

Exercise 1.4
Periodic energy usage. The 168-vector w gives the hourly electricity consumption of a manufacturing plant, starting on Sunday midnight to 1AM, over one week, in MWh (megawatt-hours). The consumption pattern is the same each day, i.e., it is 24-periodic, which means that w_{t+24} = w_t for t = 1, ..., 144. Let d be the 24-vector that gives the energy consumption over one day, starting at midnight.
(a) Use vector notation to express w in terms of d.
(b) Use vector notation to express d in terms of w.

w = (d, d, d, d, d, d, d) (d stacked 7 times, since 168 = 7 × 24)
d = w_{1:24}
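A quick check of both answers, with a made-up daily profile standing in for real consumption data. Stacking d seven times gives a 168-vector that satisfies the 24-periodicity condition, and the first 24 entries recover d.

```python
d = [float(h) for h in range(24)]   # made-up stand-in for d_1, ..., d_24
w = d * 7                           # w = (d, d, d, d, d, d, d): 168 entries

print(len(w))                                        # 168
# 24-periodicity: w_{t+24} = w_t for t = 1, ..., 144 (0-based indices here)
print(all(w[t + 24] == w[t] for t in range(144)))    # True
print(w[0:24] == d)                                  # True, i.e. d = w_{1:24}
```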

Exercise 1.5
Interpreting sparsity. Suppose the n-vector x is sparse, i.e., has only a few
nonzero entries.
Give a short sentence or two explaining what this means in each of the following
contexts.
(a) x represents the daily cash flow of some business over n days
on most days the business neither receives nor pays out any cash

(b) x represents the annual dollar value purchases by a customer of n products or services.
the customer does not buy most of the products/services

(c) x represents a portfolio, say, the dollar value holdings of n stocks.
the portfolio holds only a few of the n stocks

(d) x represents a bill of materials for a project, i.e., the amounts of n materials needed.
only a few of the n materials are needed for the project

(e) x represents a monochrome image, i.e., the brightness values of n pixels.
(assuming 0 represents black) most of the pixels in the image are black

(f) x is the daily rainfall in a location over one year.
it rains on only a few days a year

Exercise 1.6

Vector of differences. Suppose x is an n-vector. The associated vector of differences is the (n − 1)-vector d given by d = (x_2 − x_1, x_3 − x_2, ..., x_n − x_{n−1}). Express d in terms of x using vector operations (e.g., slicing notation, sum, difference, linear combinations, inner product).

The difference vector has a simple interpretation when x represents a time series. For example, if x gives the daily value of some quantity, d gives the day-to-day changes in the quantity.

$$
d = \begin{bmatrix} x_2 - x_1 \\ x_3 - x_2 \\ \vdots \\ x_n - x_{n-1} \end{bmatrix}
  = \begin{bmatrix} x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} - \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \end{bmatrix}
  = x_{2:n} - x_{1:(n-1)}
$$
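The slicing answer d = x_{2:n} - x_{1:(n-1)} maps directly onto Python slices. In 0-based indexing, x_{2:n} is `x[1:]` and x_{1:(n-1)} is `x[:-1]`. `differences` is an illustrative name.

```python
def differences(x):
    """d = x_{2:n} - x_{1:(n-1)}, an (n-1)-vector."""
    return [later - earlier for earlier, later in zip(x[:-1], x[1:])]


x = [3, 5, 4, 10]         # e.g. daily values of some quantity
print(differences(x))     # [2, -1, 6], the day-to-day changes
```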

Exercise 1.7

Transforming between two encodings for Boolean vectors. A Boolean n-vector is one
for which all entries are either 0 or 1.
Such vectors are used to encode whether each of n conditions holds, with ai = 1
meaning that condition i holds.
Another common encoding of the same information uses the two values −1 and +1 for
the entries.

For example, the Boolean vector (0, 1, 1, 0) would be written using this alternative encoding as (−1, +1, +1, −1).

Suppose that x is a Boolean vector with entries that are 0 or 1, and y is a vector
encoding the same information using the values −1 and +1.
Express y in terms of x using vector notation. Express x in terms of y.

y = 2x - 1, where 1 is the n-vector of all ones (each entry maps 0 to -1 and 1 to +1).
x = (1/2)(y + 1)
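The two affine maps y = 2x - 1 and x = (1/2)(y + 1) are inverses of each other. This sketch applies them entrywise to the example vector. The function names are illustrative.

```python
def to_pm1(x):
    """0/1 encoding -> -1/+1 encoding: y = 2x - 1."""
    return [2 * xi - 1 for xi in x]


def to_01(y):
    """-1/+1 encoding -> 0/1 encoding: x = (1/2)(y + 1)."""
    return [(yi + 1) // 2 for yi in y]   # y_i + 1 is 0 or 2, so // is exact


x = [0, 1, 1, 0]
y = to_pm1(x)
print(y)          # [-1, 1, 1, -1]
print(to_01(y))   # [0, 1, 1, 0]
```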

Exercise 1.8

Profit and sales vectors. A company sells n different products or items.

The n-vector p gives the profit, in dollars per unit, for each of the n items.

(The entries of p are typically positive, but a few items might have negative entries. These items are called loss leaders, and are used to increase customer engagement in the hope that the customer will make other, profitable purchases.)

The n-vector s gives the total sales of each of the items, over some period (such
as a month), i.e., s_i is the total number of units of item i sold. (These are also
typically nonnegative, but negative entries can be used to reflect items that were
purchased in a previous time period and returned in this one.)

Express the total profit in terms of p and s using vector notation.

Solution:
There are n items.
p gives profit per item
s gives sales (and returns) per item.

p^T s ;; inner product

Exercise 1.9

Symptoms vector. A 20-vector s records whether each of 20 different symptoms is present in a medical patient, with s_i = 1 meaning the patient has the symptom and s_i = 0 meaning she does not.

(a) The total number of symptoms the patient has.

1^T s

(b) The patient exhibits five out of the first ten symptoms.

The statement is an assertion, so whatever expression we come up with must be true or false:
1^T s_{1:10} = 5

Exercise 1.10

Total score from course record.

The record for each student in a class is given as a 10-vector r, where r_1, ..., r_8 are the grades for the 8 homework assignments, each on a 0-10 scale, r_9 is the midterm exam grade on a 0-120 scale, and r_10 is the final exam score on a 0-160 scale.

The student’s total course score s, on a 0-100 scale, is based 25% on the homework, 35% on the midterm exam, and 40% on the final exam.

Express s in the form s = w^T r. (That is, determine the 10-vector w.)

You can give the coefficients of w to 4 digits after the decimal point.

;; workthrough

Each component as a fraction of its maximum:
homework fraction = (1^T r_{1:8}) / 80 (8 assignments × 10 points)
midterm fraction = r_9 / 120
final exam fraction = r_10 / 160

Weighting these by 25, 35, and 40 points out of 100:

$$
s = \frac{25}{80} \mathbf{1}^T r_{1:8} + \frac{35}{120} r_9 + \frac{40}{160} r_{10}
$$

so

w = (0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.3125, 0.2917, 0.2500)

i.e. w_i = 25/80 = 0.3125 for i = 1, ..., 8, w_9 = 35/120 ≈ 0.2917, and w_10 = 40/160 = 0.25.
Check: a perfect record gives 8 × 10 × 0.3125 + 120 × (35/120) + 160 × 0.25 = 25 + 35 + 40 = 100.
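The weight vector can be checked with exact rational arithmetic, which avoids rounding w_9. This sketch uses Python's standard `fractions.Fraction`. The `score` helper is an illustrative name.

```python
from fractions import Fraction

# w_1..w_8 = 25/80 (homework), w_9 = 35/120 (midterm), w_10 = 40/160 (final)
w = [Fraction(25, 80)] * 8 + [Fraction(35, 120), Fraction(40, 160)]


def score(r):
    """Course score s = w^T r."""
    return sum(wi * ri for wi, ri in zip(w, r))


perfect = [10] * 8 + [120, 160]
print(score(perfect))                       # 100
print([round(float(wi), 4) for wi in w])    # 0.3125 eight times, then 0.2917, 0.25
```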

Exercise 1.11

Word count and word count histogram vectors.

Suppose the n-vector w is the word count vector associated with a document and a
dictionary of n words.
For simplicity we will assume that all words in the document appear in the
dictionary.
(a) What is 1^T w?
The total number of words in the document.

(b) What does w_{282} = 0 mean?

There are no occurrences of the 282nd word (in the dictionary) in the document.

(c) Let h be the n-vector that gives the histogram of the word counts, i.e., h_i is
the fraction of the words in the document that are word i.
Use vector notation to express h in terms of w. (You can assume that the document
contains at least one word.)

h = (1/sigma) w, where sigma = 1^T w is the total number of words in the document (sigma > 0 by assumption).
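The histogram is the word-count vector scaled by one over its sum. The counts below are made up for illustration. The entries of h sum to 1.

```python
def histogram(w):
    """h = (1 / (1^T w)) w; assumes the document has at least one word."""
    total = sum(w)                 # 1^T w, the number of words
    return [wi / total for wi in w]


w = [2, 0, 1, 1]                   # made-up word counts for a 4-word dictionary
h = histogram(w)
print(h)        # [0.5, 0.0, 0.25, 0.25]
print(sum(h))   # 1.0
```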

Exercise 1.12

Total cash value. An international company holds cash in five currencies: USD (US
dollar), RMB (Chinese yuan), EUR (euro), GBP (British pound), and JPY (Japanese
yen), in amounts given by the 5-vector c.

For example, c2 gives the number of RMB held. Negative entries in c represent
liabilities or amounts owed.
Express the total (net) value of the cash in USD, using vector notation.

Be sure to give the size and define the entries of any vectors that you introduce. Your solution can refer to currency exchange rates.

I introduce a 5-vector e, where e_i is the value in USD of one unit of the i-th currency at current exchange rates (so e_1 = 1, since the first currency is USD).
Then the inner product e^T c gives the total (net) value of the holdings in US \$.

Exercise 1.13

Average age in a population. Suppose the 100-vector x represents the distribution of ages in some population of people, with x_i being the number of (i − 1)-year-olds, for i = 1, ..., 100.

(You can assume that x ≠ 0, and that there is no one in the population over age 99.)

(a) The total number of people in the population.

1^T x

(b) The total number of people in the population age 65 and over.
1^T x_{66:100} (x_66 counts the 65-year-olds)

(c) The average age of the population. (You can use ordinary division of numbers in your expression.)
a^T x / (1^T x), where a = (0, 1, ..., 99) is the 100-vector of ages, a_i = i − 1.
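The three answers can be checked on a tiny made-up population. Because x_i counts (i − 1)-year-olds, Python's 0-based index of x equals the age. So x_{66:100} is `x[65:100]`, and the age vector is `range(100)`. `average_age` is an illustrative name.

```python
def average_age(x):
    """a^T x / (1^T x), where x_i = number of people aged i - 1."""
    ages = range(len(x))                  # (0, 1, ..., 99)
    total_people = sum(x)                 # 1^T x, assumed > 0
    return sum(a * xi for a, xi in zip(ages, x)) / total_people


x = [0] * 100
x[20] = 3      # three 20-year-olds
x[65] = 1      # one 65-year-old

print(sum(x))            # 4 people in total
print(sum(x[65:100]))    # 1 person aged 65 or over, i.e. 1^T x_{66:100}
print(average_age(x))    # 31.25, since (3*20 + 65) / 4 = 31.25
```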

Exercise 1.14

Industry or sector exposure.

Consider a set of n assets or stocks that we invest in.

Let f be an n-vector that encodes whether each asset is in some specific industry
or sector, e.g., pharmaceuticals or consumer electronics.
Specifically, we take f_i = 1 if asset i is in the sector, and f_i = 0 if it is
not.

Let the n-vector h denote a portfolio, with h_i the dollar value held in asset i (with negative meaning a short position).
The inner product f^T h is called the (dollar value) exposure of our portfolio to the sector.

It gives the net dollar value of the portfolio that is invested in assets from the sector.

A portfolio h is called neutral (to a sector or industry) if f^T h = 0.

A portfolio h is called long only if each entry is nonnegative, i.e., h_i ≥ 0 for each i. This means the portfolio does not include any short positions.

What does it mean if a long-only portfolio is neutral to a sector, say, pharmaceuticals? Back up your conclusion with an argument.

work through.
n assets
for simplicity consider only one sector, call it pharma
n-vector f is a Boolean vector which denotes whether asset i (imagine a specific share here, say GOOG) is in the sector or not.
a second n-vector h contains the dollar value of each asset in a portfolio
inner product f^T h == exposure of portfolio h to the sector
defn: h is neutral to the sector iff f^T h = 0 (in general this could be because we hold no stocks in that sector, or because the long and short positions in that sector cancel out)
defn: long-only portfolio = a portfolio in which all entries are nonnegative

? == what does it mean if a long-only portfolio is neutral to a sector?

f^T h is the sum of h_i over the assets i in the sector. For a long-only portfolio each of these h_i is nonnegative, and a sum of nonnegative numbers is zero only if every term is zero. So h_i = 0 for every asset i in the sector: a long-only portfolio that is neutral to pharmaceuticals holds no pharmaceutical stocks at all. (In vector terms, h is zero at every index where f_i = 1; f itself is not zero.)

Exercise 1.15

Cheapest supplier.

You must buy n raw materials in quantities given by the n-vector q where q_i is the
amount of raw material i that you must buy.
A set of K potential suppliers offer the raw materials at prices given by the n-vectors p_1, ..., p_K. (Note that p_k is an n-vector; (p_k)_i is the price that supplier k charges per unit of raw material i.)

We will assume that all quantities and prices are positive.

If you must choose just one supplier, how would you do it? Your answer should use
vector notation.
A (highly paid) consultant tells you that you might do better (i.e., get a better
total cost)
by splitting your order into two, by choosing two suppliers and ordering (1/2)q
(i.e., half
the quantities) from each of the two. He argues that having a diversity of
suppliers is
better. Is he right? If so, explain how to find the two suppliers you would use to
fill half the order.

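workthru:
n raw materials
n-vector q with the amounts of raw materials to buy. all quantities positive.
K potential suppliers
p_1, ..., p_K are the suppliers' price vectors, each an n-vector. all prices positive

If you must choose just one supplier, how would you do it? Your answer should use vector notation.

The total cost of buying everything from supplier k is p_k^T q, so choose the supplier k that achieves min_k p_k^T q.

The consultant is wrong. Ordering (1/2)q from each of suppliers j and k costs (1/2) p_j^T q + (1/2) p_k^T q, which is the average of the two single-supplier costs, and an average is never smaller than the smaller of the two values. So splitting the order can at best tie with (and never beat) buying everything from the cheapest supplier.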
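A numeric illustration of both parts follows, with made-up quantities and prices for three hypothetical suppliers. The cheapest single supplier minimizes p_k^T q. Every half-and-half split costs the average of two single-supplier costs, so no split beats the cheapest single supplier.

```python
def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


q = [10, 5, 2]                   # made-up quantities
prices = [[3, 4, 10],            # p_1 (made-up)
          [2, 6, 9],             # p_2
          [4, 3, 12]]            # p_3

costs = [inner(p, q) for p in prices]                 # p_k^T q for each k
best = min(range(len(prices)), key=lambda k: costs[k])
print(costs, "cheapest supplier:", best + 1)          # [70, 68, 79] cheapest supplier: 2

# Cost of ordering (1/2)q from each of two different suppliers j and k
split_costs = [(costs[j] + costs[k]) / 2
               for j in range(len(costs)) for k in range(j + 1, len(costs))]
print(min(split_costs) >= min(costs))                 # True
```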

Exercise 1.16

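Inner product of nonnegative vectors. A vector is called nonnegative if all its entries are nonnegative.
(a) Explain why the inner product of two nonnegative vectors is nonnegative.
the inner product is a sum of products of corresponding entries. A product of two nonnegative numbers is nonnegative, and a sum of nonnegative numbers is nonnegative.
(b) Suppose the inner product of two nonnegative vectors is zero. What can you say about them? Your answer should be in terms of their respective sparsity patterns, i.e., which entries are zero and nonzero.
each term a_i b_i is nonnegative, so a sum of zero forces a_i b_i = 0 for every i: at each index, at least one of a_i, b_i is zero. In other words, the sparsity patterns of the two vectors do not overlap.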

Exercise 1.18

Linear combinations of linear combinations. Suppose that each of the vectors b_1, ..., b_k is a linear combination of the vectors a_1, ..., a_m, and c is a linear combination of b_1, ..., b_k.
Then c is a linear combination of a_1, ..., a_m. Show this for the case with m = k = 2.
(Showing it in general is not much more difficult, but the notation gets more complicated.)

Let b1 = p1 a1 + p2 a2
b2 = p3 a1 + p4 a2, where p1, ..., p6 are scalars

then
c = p5 b1 + p6 b2
== substituting for b1 and b2
p5 (p1 a1 + p2 a2) + p6 (p3 a1 + p4 a2)
== algebra
(p5 p1 + p6 p3) a1 + (p5 p2 + p6 p4) a2
== arithmetic
p7 a1 + p8 a2, where p7 = p5 p1 + p6 p3 and p8 = p5 p2 + p6 p4 are scalars,
so c is a linear combination of a1 and a2.

Exercise 1.19

Auto-regressive model. Suppose that z_1, z_2, ... is a time series, with the number z_t giving the value in period or time t.

For example, z_t could be the gross sales at a particular store on day t.

An auto-regressive (AR) model is used to predict z_{t+1} from the previous M values, z_t, z_{t−1}, ..., z_{t−M+1}:

$$
p_{t+1} = (z_t, z_{t-1}, \ldots, z_{t-M+1})^T \beta, \quad t = M, M+1, \ldots
$$

Here p_{t+1} denotes the AR model’s prediction of z_{t+1}, M is the memory length of the AR model, and the M-vector β is the AR model coefficient vector.

For this problem we will assume that the time period is daily, and M = 10. Thus, the AR model predicts tomorrow’s value, given the values over the last 10 days.

For each of the following cases, give a short interpretation or description of the AR model in English, without referring to mathematical concepts like vectors, inner product, and so on. You can use words like ‘yesterday’ or ‘today’.

(e_i are unit vectors)

(a) β ≈ e_1.
tomorrow's sales will be about the same as today's sales
(b) β ≈ 2e_1 − e_2.
tomorrow's sales will be about twice today's sales minus yesterday's sales, i.e. today's sales plus the change from yesterday to today (the recent trend continues)

(c) β ≈ e_6.
tomorrow's sales will be about the same as the sales five days before today (six days before the day being predicted)

(d) β ≈ 0.5e_1 + 0.5e_2
tomorrow's sales will be about the average of today's and yesterday's sales
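The four interpretations can be sanity-checked numerically. This sketch builds each β from unit vectors and predicts from a made-up 10-day history in which today's sales are 10 and yesterday's are 9. The helper names `ar_predict` and `e` are illustrative.

```python
def ar_predict(history, beta):
    """p_{t+1} = (z_t, z_{t-1}, ..., z_{t-M+1})^T beta; history ends with z_t."""
    M = len(beta)
    recent = history[::-1][:M]          # (z_t, z_{t-1}, ..., z_{t-M+1})
    return sum(zi * bi for zi, bi in zip(recent, beta))


def e(i, M=10):
    """i-th unit vector of size M (1-based)."""
    return [1.0 if j == i else 0.0 for j in range(1, M + 1)]


z = [float(v) for v in range(1, 11)]    # made-up sales: z_t = 10 today, 9 yesterday

print(ar_predict(z, e(1)))                                          # 10.0, same as today
print(ar_predict(z, [2 * a - b for a, b in zip(e(1), e(2))]))       # 11.0, trend continues
print(ar_predict(z, e(6)))                                          # 5.0, five days before today
print(ar_predict(z, [0.5 * a + 0.5 * b for a, b in zip(e(1), e(2))]))  # 9.5, average of today and yesterday
```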

Exercise 1.20

How many bytes does it take to store 100 vectors of length 10^5?

100 × 10^5 × 8 bytes = 8 × 10^7 bytes (80 MB), at 8 bytes per entry

How many flops does it take to form a linear combination of them (with 100 nonzero coefficients)?

about 2 × 10^7 flops: 100 × 10^5 multiplications to scale the vectors, plus 99 × 10^5 additions to add them up

About how long would this take on a computer capable of carrying out 1 Gflop/s?
2 × 10^7 / 10^9 = 0.02 seconds, i.e. about 20 milliseconds
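The same storage and flop arithmetic in Python, assuming 8 bytes per 64-bit entry and 1 Gflop/s as in the exercise:

```python
n_vectors, n = 100, 10**5

bytes_needed = n_vectors * n * 8              # 8 bytes per 64-bit float
flops = n_vectors * n + (n_vectors - 1) * n   # scalings + additions
seconds = flops / 1e9                         # at 1 Gflop/s

print(bytes_needed)   # 80000000, i.e. 80 MB
print(flops)          # 19900000, about 2 * 10^7
print(seconds)        # about 0.02 s
```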