
Ram Sewak Dubey

Department of Economics, Feliciano School of Business, Montclair State University, Montclair, New Jersey 07043

E-mail address: dubeyr@mail.montclair.edu

Contents

Preface
Syllabus
§0.1. Overview
§0.2. Course Schedule
§0.3. Topics covered
§0.4. Textbook
§0.5. Mathematics Proficiency Test

Chapter 1. Introduction to Logic
§1.1. Introduction
§1.2. Statements
§1.3. Logical Connective
§1.4. Quantifiers
§1.5. Rules of Negation of statements with quantifiers
§1.6. Logical Equivalences
§1.7. Some Math symbols and Definitions

Chapter 2. Proof Techniques
§2.1. Methods of Proof
§2.2. Trivial Proofs
§2.3. Vacuous Proofs
§2.4. Proof by Construction
§2.5. Proof by Contraposition
§2.6. Proof by Contradiction
§2.7. Proof by Induction
§2.8. Additional Notes on Proofs
§2.9. Decomposition or proof by cases

Chapter 3. Problem Set 1

Chapter 4. Set Theory, Sequences
§4.1. Set Theory
§4.2. Set Identities
§4.3. Functions
§4.4. Vector Space
§4.5. Sequences
§4.6. Sets in Rⁿ

Chapter 5. Problem Set 2

Chapter 6. Linear Algebra
§6.1. Vectors
§6.2. Matrices
§6.3. Determinant of a matrix
§6.4. An application of matrix algebra
§6.5. Systems of Linear Equations
§6.6. Cramer's Rule
§6.7. Principal Minors
§6.8. Quadratic Form
§6.9. Eigenvalues and Eigenvectors
§6.10. Eigenvalues of a symmetric matrix
§6.11. Eigenvalues, Trace and Determinant of a Matrix

Chapter 7. Problem Set 3

Chapter 8. Single and Multivariable Calculus
§8.1. Functions
§8.2. Surjective and Injective Functions
§8.3. Composition of Functions
§8.4. Continuous Functions
§8.5. Extreme Values
§8.6. An application of the Extreme Value Theorem
§8.7. Differentiability
§8.8. Monotone Functions
§8.9. Functions of Several Variables
§8.10. Composite Functions and the Chain Rule

Chapter 9. Problem Set 4

Chapter 10. Convex Analysis
§10.1. Concave, Convex Functions
§10.2. Quasi-concave Functions

Chapter 11. Problem Set 5

Chapter 12. Inverse and Implicit Function Theorems
§12.1. Inverse Function Theorem
§12.2. The Linear Implicit Function Theorem
§12.3. Implicit Function Theorem for R²

Chapter 13. Homogeneous and Homothetic Functions
§13.1. Homogeneous Functions
§13.2. Homothetic Functions

Chapter 14. Problem Set 6

Chapter 15. Unconstrained Optimization
§15.1. Optimization Problem
§15.2. Maxima/Minima for C² functions of n variables
§15.3. Application: Ordinary Least Squares Analysis

Chapter 16. Problem Set 7

Chapter 17. Optimization Theory: Equality Constraints
§17.1. Constrained Optimization
§17.2. Equality Constraint

Chapter 18. Optimization Theory: Inequality Constraints
§18.1. Inequality Constraint
§18.2. Global maximum and constrained local maximum

Chapter 19. Problem Set 8

Chapter 20. Envelope Theorem
§20.1. Envelope Theorem for Unconstrained Problems
§20.2. Meaning of the Lagrange multiplier
§20.3. Envelope Theorem for Constrained Optimization

Chapter 21. Elementary Concepts in Probability
§21.1. Discrete Probability Model
§21.2. Marginal and Conditional Distributions
§21.3. The Law of Iterated Expectation
§21.4. Continuous Random Variables

Chapter 22. Solution to PS 1
Chapter 23. Solution to PS 2
Chapter 24. Solution to PS 3
Chapter 25. Solution to PS 4
Chapter 26. Solution to PS 5
Chapter 27. Solution to PS 6
Chapter 28. Solution to PS 7
Chapter 29. Solution to PS 8

Preface

These notes have been prepared for the Math Review Class for graduate students joining the Ph.D. program in the field of economics at Cornell University. In preparing these notes I have referred to the material used in previous years' classes.

The objective of the Math Review class is to present elementary concepts from set theory, multivariable calculus, linear algebra, elementary probability, real analysis and optimization theory. I have used examples and problem sets to explain the concepts, definitions and techniques which are useful in the Fall-semester graduate economics classes.

These notes can serve to refresh the memory of incoming students who are already familiar with the material. For others, they can be a ready reckoner of the mathematical techniques they will need to know in the first few weeks of the graduate classes in Economics (Econ 6090, Econ 6130, Econ 6190), before those topics are discussed more rigorously in Econ 6170.

The topics have been arranged so that the entire material can be covered in thirteen classes of three hours each. Problem sets with solutions are provided on each day's material; three additional three-hour sessions are sufficient to go over the questions in the problem sets. I hope they will help the reader better understand the material in the lecture notes.

Earlier versions were used for the Math Review classes during 2009-16. My sincere thanks go to the participants for their comments and for pointing out typos and errors.

Ram Sewak Dubey


Syllabus

Math Review 2017
Field of Economics
Cornell University

Instructor: Ram Sewak Dubey
Office: 474G, Uris Hall
Office Hour: 12:15-1:15 pm
E-mail: rsd28@cornell.edu

0.1. Overview

The Field of Economics offers the August Math Review Course for incoming first-year Ph.D.

students. The aim of this review is to refresh students’ mathematical skills and introduce concepts

that are critical to success in the first year economics core courses, i.e., Econ 6090, Econ 6130,

Econ 6170, and Econ 6190. The emphasis is on rigorous treatment of proof techniques, underlying

concepts and illustrative examples.

There is usually a great deal of variation in the mathematical background of incoming first-year

students. However, almost all students have something to gain from the review course. For those

who do not have an adequate mathematics background (by a US Ph.D. standard), the course offers

an opportunity to catch up on critical concepts and get a head start on the fall classes. For those

who took their core undergraduate courses in analysis and algebra some years ago, the course is a

good refresher. For those who do not have significant experience with technical courses taught in

English, the review offers an opportunity to pick up the math vocabulary that will be in use from

the first day of regular instruction.

The Math Review Course is funded by the Department of Economics. There is no charge for

students matriculating into the Economics Ph.D. Program. Students matriculating into other Ph.D. programs should contact the Director of Graduate Studies in their Field. There will be a charge



for these students, and the DGS in the student’s Field must make arrangements to pay that charge

before the student may attend the Math Review Course.

The Math Review Course is not linked to Econ 6170, Intermediate Mathematical Economics I.

There is no course grade, and no record will be kept of your performance. However, the Economics

Ph.D. program strongly encourages you to attend. Most students who have taken this course in past

years have found it useful, regardless of their prior mathematics training. Perhaps most importantly,

the review period is an excellent time to get acquainted with other incoming students, meet the

faculty and settle into Ithaca.

0.2. Course Schedule

The course duration will be July 31-August 18. There will be a lecture session each working day.

The room for all the sessions is URIS 202.

(A) Session Time: July 31-August 4, August 7-11, and August 14-18, 9 am-noon.

(B) There will be a handout of some basic definitions distributed at each session, and practice

problems will be assigned on each topic. You are strongly encouraged to at least attempt every

problem, as this is the best way to understand the material. The problem sets will be due the

following day in class (for example, the problem set given in class on Monday will be due on

Tuesday) and I intend to grade some of the questions in each problem set. We will go over the

solutions to the problem sets in class.

0.3. Topics covered

A. Elements of Logic: Statements, Truth tables, Implications, Tautologies, Contradictions, Logical Equivalence, Quantifiers, Negation of Quantified Statements

B. Proof Techniques: Trivial Proofs, Vacuous Proofs, Direct Proofs, Proof by Contrapositive, Proof by Cases, Proof by Contradiction, Existence Proofs, Proof by Mathematical Induction

C. Set Theory: Definitions, Set Equality, Set Operations, Venn Diagrams, Set Identities, Cartesian Products, Properties of the Set of Real Numbers

D. Sequences: Convergent Sequences, Subsequences, Cauchy Sequences, Upper and Lower Limits, Algebraic Properties of Limits, Monotone Sequences

E. Functions of One Variable: Limits of Functions, Continuous Functions, Monotone Functions, Properties of Exponential and Logarithmic Functions

F. Linear Algebra: Systems of Linear Equations, Solution by Substitution or Elimination of Variables, Systems with Many or No Solutions

G. Vectors I: Addition, Subtraction, Scalar Multiplication, Length, Distance, Inner Product

H. Matrix Algebra I: Addition, Subtraction, Scalar and Matrix Multiplication, Transpose, Laws of Matrix Algebra

I. Determinants: Definition, Computation, Properties, Use of Determinants, Matrix Inverse, Cramer's Rule

J. Vectors II: Linear Independence, Rⁿ as an example of a Vector Space, Basis and Dimension in Rⁿ

K. Matrix Algebra II: Algebra of Square Matrices, Eigenvalues, Eigenvectors, Properties of Eigenvalues

L. Differential Calculus: Derivative of a Real Function, Mean Value Theorem, Continuity of Derivatives, L'Hospital's Rule, Higher Order Derivatives, Taylor's Theorem

M. Functions of Several Variables: Graphs of Functions of Two Variables, Level Curves, Continuous Functions, Total Derivative, Chain Rule, Partial Derivatives

N. Unconstrained Optimization: First Order Conditions, Global Maxima and Minima, Examples

O. Constrained Optimization with equality constraints: First Order Conditions, Constrained Minimization Problems, Examples

P. Constrained Optimization with inequality constraints: Kuhn-Tucker conditions, Interpreting the Multipliers, Envelope Theorem

0.4. Textbook

There is no textbook for the math review course; however, the following books may be helpful. The textbook ? is used in the Microeconomics course sequence. ? and ? are useful textbooks for Mathematical Economics. It will be useful to refer to ? for understanding the material; copies of this textbook are available in the libraries. ? will be our reference book for analysis. ? contains many useful examples. ? is the set of Lecture Notes used in Econ 6170 and should be available at the bookstore at the start of the course.

0.5. Mathematics Proficiency Test

A Mathematics Proficiency Test will be given on Friday, August 18, 2017, from 12:30-3:30 pm in URIS 202. The test will be based on the course material of Economics 6170. If you pass

this test, you have satisfied the mathematics proficiency requirement of the field of economics,

and need not take the Economics 6170 course. If you fail this test, or if you do not take this test,

you can complete the mathematics proficiency requirement of the field of economics by taking the

Economics 6170 course for credit, and getting a course grade of B- or better.

If you would like any more information, you can contact me at rsd28@cornell.edu. Enjoy

your summer and I look forward to meeting you in August.

Chapter 1

Introduction to Logic

1.1. Introduction

The theory that you’ll learn during the first year is built on a foundation borrowed from engineering

and pure mathematics. You will be required to both understand and reproduce certain key proofs,

particularly in microeconomics. On some problem sets and exams you’ll be asked to produce your

own proofs.

If you haven’t taken any pure math courses, you might be thinking “I don’t even know what a

proof is”. That is completely fine. There are plenty of very accomplished Ph.D. students at Cornell

who had no idea how to write a proof when they arrived. It’s important not to get discouraged

because it takes time to learn how to write good proofs. There is a standard bag of tricks that will

get you through almost any proof in the first year sequence, but it takes exposure and then practice

for you to learn and be comfortable with these tricks. Math majors are at an advantage here, more

than in most areas, but by the end of the year they’ll have forgotten the fancier proof techniques

and you’ll have learned the necessary ones, so the field will be surprisingly level.

A proof is a series of statements that demonstrates the truth of a proposition. In writing a proof

you make use of (i) the rules of logic and (ii) definitions, theorems, and other propositions that

have already been proved, or that you are told you can take as given.

The rules of logic are obviously fixed and unchanging. The components of the second point,

however, will vary depending on the task at hand. The most important question to ask yourself

when attempting to prove a proposition is “What do I already know?” It will often be the case that

if you write down all of the relevant mathematical definitions, the theorems or results that you were

given or that you know you can take as given, and any result that you just proved in a previous

problem, a straightforward rearrangement of everything on the page will give you the proof that

you want.



In this chapter we will discuss the principles of logic that are essential for problem solving in mathematics. The ability to reason using the principles of logic is key to seeking the truth, which is our goal in mathematics. Before we explore and study logic, let us spend some time motivating this topic. Mathematicians reduce problems to the manipulation of symbols using a set of rules. As an illustration, consider the following problem.

Example 1.1. Joe is 7 years older than John. Six years from now Joe will be twice John's age. How

old are Joe and John?

Solution 1.1. To answer the above question, we reduce the problem using symbolic formulation.

We let John’s age be x. Then Joe’s age is x + 7. We are given that six years from now Joe will be

twice John’s age. In symbols, (x + 7) + 6 = 2(x + 6). Solving for x yields x = 1. Therefore, John

is 1 year old and Joe is 8.
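The symbolic reduction above can also be checked mechanically. As a sketch (Python is my choice here, not part of the original notes), a brute-force search over candidate ages finds the unique solution of the equation (x + 7) + 6 = 2(x + 6):

```python
# Example 1.1 by brute force: find John's age x with (x + 7) + 6 == 2 * (x + 6).
solutions = [x for x in range(0, 200) if (x + 7) + 6 == 2 * (x + 6)]
print(solutions)  # [1] -> John is 1 year old and Joe is 1 + 7 = 8
```

Of course the algebraic solution is the point of the example; the search merely confirms it.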

Our objective is to reduce the process of mathematical reasoning, i.e., logic, to the manipulation of symbols using a set of rules. The central concept of deductive logic is the concept of argument form. An argument is a sequence of statements aimed at demonstrating the truth of an assertion (a "claim"). Consider the following two arguments.

Argument 1. If x is a real number such that x < −3 or x > 3, then x² > 9. Therefore, if x² ≤ 9, then x ≥ −3 and x ≤ 3.

Argument 2. If it is raining or I am sick, then I stay at home. Therefore, if I do not stay at home,

then it is not raining and I am not sick.

Although the content of the above two arguments is very different, their logical form is the same. To illustrate the logical form of these arguments, we use letters of the alphabet (such as p, q and r) to represent the component sentences and the expression "not p" to refer to the sentence "It is not the case that p." Then the common logical form of both arguments is as follows: If p or q, then r. Therefore, if not r, then not p and not q.

We start by identifying and giving names to the building blocks which make up an argument. In Arguments 1 and 2, we identify the building blocks as follows:

Argument 1. If x is a real number such that x < −3 (p) or x > 3 (q), then x² > 9 (r). Therefore, if x² ≤ 9 (not r), then x ≥ −3 (not p) and x ≤ 3 (not q).

Argument 2. If it is raining (p) or I am sick (q), then I stay at home (r). Therefore, if I do not stay at home (not r), then it is not raining (not p) and I am not sick (not q).

1.2. Statements

The study of logic is concerned with the truth or falsity of statements.

Definition 1.1 (Statement). A statement is a sentence which can be classified as true or false

without ambiguity. The truth or falsity of the statement is known as the truth value.


For a sentence to be a statement, it is not necessary for us to know whether it is true or false. However, it must be clear that it is one or the other.

Example 1.2. Consider the following examples.

(a) "One plus two equals three." is a statement which is true.

(b) "One plus one equals three." is also a statement, one which is false.

(c) "He is a university student." is neither true nor false. The truth or falsity depends on the reference for the pronoun he: for some values of he the sentence is true; for others it is false. So it is not a statement.

(d) "Every continuous function is differentiable." is a statement whose truth value is false.

(e) "x < 1" is true for some values of x and false for others. It is a statement if we have some particular context in mind; otherwise, it is not a statement.

(f) Goldbach's Conjecture, "Every even number greater than 2 is the sum of two prime numbers," is a statement whose truth value is not yet known.

(g) "There are infinitely many prime numbers of the form 2ⁿ + 1, where n is a natural number." is another statement whose truth value is not yet known.

Every statement has a truth value, namely true (denoted by T) or false (denoted by F). We often use

p, q and r to denote statements, or perhaps p1 , p2 , · · · , pn if there are several statements involved.

Exercise 1.1. Which of the following sentences are statements?

(a) If x is a real number, then x² ≥ 0.

(b) 11 is a prime number.

(c) This sentence is false.

The possible truth values of a statement are often given in a table, called a truth table. The truth

values for two statements p and q are given below. Since there are two possible truth values for

each of p and q, there are four possible combinations of truth values for p and q. It is customary to

consider the four combinations of truth values in the order of TT, TF, FT, FF from top to bottom.

        p   q
        T   T
(1.1)   T   F
        F   T
        F   F
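These rows can be generated programmatically; here is a small Python sketch (not part of the original notes) using the standard library's `itertools.product`, which enumerates assignments in exactly the customary TT, TF, FT, FF order:

```python
from itertools import product

# Enumerate all combinations of truth values for (p, q),
# in the customary order TT, TF, FT, FF.
rows = list(product([True, False], repeat=2))
for p, q in rows:
    print(p, q)
```

With `repeat=n` the same call produces the 2ⁿ rows needed for a truth table in n statements.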

1.3. Logical Connective

A logical connective (also called a logical operator) is a symbol or word used to connect two or

more statements such that the compound statement produced has a truth value dependent on the

respective truth values of the original statements.

We discuss some of the elementary logical operators (connectives) first.


(1) Logical Negation

Logical negation is an operation on one logical value, typically, the value of a proposition,

that produces a value of true if its operand is false and a value of false if its operand is true.

The truth table for ¬A (also written as NOT A or ∼ A) is as follows:

        A   ¬A
(1.2)   T   F
        F   T

For example, consider the statement,

p : The integer 2 is even.

Then the negation of p is the statement

∼ p : It is not the case that the integer 2 is even.

It would be better to write,

∼ p : The integer 2 is not even.

Or better yet to write,

∼ p : The integer 2 is odd.

(2) Logical Conjunction

Logical conjunction is an operation on the values of two propositions, that produces a value

of true if and only if both of its operands are true. The truth table for A ∧ B (also written as

A AND B) is as follows:

        A   B   A ∧ B
        T   T   T
(1.3)   T   F   F
        F   T   F
        F   F   F

In words, if both A and B are true, then the conjunction A ∧ B is true. For all other assignments

of logical values to A and to B the conjunction A ∧ B is false.

For example, consider the statements

p : The integer 2 is even.

q : 4 is less than 3.

The conjunction of p and q, namely,

p ∧ q : The integer 2 is even and 4 is less than 3,

is a false statement since q is false (even though p is true).


(3) Logical Disjunction

Logical disjunction is an operation on the values of two propositions, that produces a value

of false if and only if both of its operands are false. The truth table for A ∨ B (also written as

A OR B) is as follows:

        A   B   A ∨ B
        T   T   T
(1.4)   T   F   T
        F   T   T
        F   F   F

Thus for the statements p and q described earlier, the disjunction of p and q, namely,

p ∨ q : The integer 2 is even or 4 is less than 3,

is a true statement since at least one of p and q is true (in this case, p is true).

(4) Logical Implication

Logical implication is associated with an operation on the values of two propositions, that

produces a value of false only in the case that the first operand is true and the second operand

is false. The truth table associated with A ⇒ B is as follows:

        A   B   A ⇒ B
        T   T   T
(1.5)   T   F   F
        F   T   T
        F   F   T

The last row of the table may appear to be counterintuitive. Note, however, that the use of “if

· · · then ” as a connective is quite different from that of day-to-day language.

Consider the following example.

Example 1.3. Suppose your supervisor makes you the following promise:

“If you meet the month-end deadline, then you will get a bonus.”

Under what circumstances are you justified in saying that your supervisor spoke falsely?

The answer is: You do meet the month-end deadline and you do not get a bonus. Your

supervisor’s promise only says that you will get a bonus if a certain condition (you meet the

month-end deadline) is met; it says nothing about what will happen if the condition is not met.

So if the condition is not met, your supervisor did not lie (your supervisor promised nothing if

you did not meet the month-end deadline); so your supervisor told the truth in this case. Are

you convinced? Good! If not, let us then check the truth and falsity of the implication based

on the various combinations of the truth values of the statements

p: You meet the month-end deadline;

q: You get a bonus.

The given statement can be written as p ⇒ q.


Suppose first that p is true and q is true. That is, you meet the month-end deadline and you

do get a bonus. Did your supervisor tell the truth? Yes, indeed. So if p and q are both true,

then so too is p ⇒ q, which agrees with the first row of the truth table of (1.5).

Second, suppose that p is true and q is false. That is, you meet the month-end deadline

and you did not get a bonus. Then your supervisor did not do as he / she promised. What your

supervisor said was false, which agrees with the second row of the truth table of (1.5).

Third, suppose that p is false and q is true. That is, you did not meet the month-end

deadline and you did get a bonus. Your supervisor (who was most generous) did not lie (your

supervisor promised nothing if you did not meet the month-end deadline); so he/she told the

truth. This agrees with the third row of the truth table of (1.5).

Finally, suppose that p and q are both false. That is, you did not meet the month-end

deadline and you did not get a bonus. Your supervisor did not lie here either. Your supervisor

only promised you a bonus if you met the month-end deadline. So your supervisor told the

truth. This agrees with the fourth row of the truth table of (1.5).

In summary, the implication p ⇒ q is false only when p is true and q is false.

A conditional (or implication) statement that is true by virtue of the fact that its hypothesis is false is said to be vacuously true, or true by default. Thus the statement "If you meet the month-end deadline, then you will get a bonus" is vacuously true if you do not meet the month-end deadline!
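Table (1.5) is equivalent to defining p ⇒ q as (not p) or q, and that observation makes the connective easy to compute. As a sketch (Python, not part of the original notes; the helper name `implies` is mine):

```python
def implies(p: bool, q: bool) -> bool:
    # p => q is false only when p is true and q is false;
    # equivalently, p => q is (not p) or q.
    return (not p) or q

# Reproduce truth table (1.5); the last two rows, with a false
# hypothesis, are the "vacuously true" cases.
for p in (True, False):
    for q in (True, False):
        print(p, q, implies(p, q))
```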

Example 1.4. Consider the expression 4 + 1 = 9 ⇒ 8 − 1 = 3. It may not be apparent why this statement is assigned a truth value of T, but that it is indeed true can be seen as follows: if 4 + 1 = 9, then 4 + 1 − 4 = 9 − 4, so 1 = 5, and therefore 8 − 1 = 8 − 5 = 3. (The implication is vacuously true, since its hypothesis 4 + 1 = 9 is false.)

(5) Logical Equality

Logical equality is an operation on the values of two propositions, that produces a value of

true if and only if both operands are false or both operands are true. The truth table for A ≡ B

is as follows:

        A   B   A ≡ B
        T   T   T
(1.6)   T   F   F
        F   T   F
        F   F   T

So A ≡ B is true if A and B have the same truth value (both true or both false), and false if they

have different truth values.

Definition 1.2. A compound statement (a statement with connectives) is said to be a tautology if it is always true regardless of the truth values of the simple statements from which it is constructed. It is a contradiction if it is always false. Thus a tautology and a contradiction are negations of each other.


Example 1.5. A ∨ (¬A) is a tautology, while A ∧ (¬A) is a contradiction.

        A   ¬A   A ∨ (¬A)   A ∧ (¬A)
(1.7)   T   F    T          F
        F   T    T          F

Example 1.6. [A ∧ (A ⇒ B)] ⇒ B is a tautology.

        A   B   A ⇒ B   A ∧ (A ⇒ B)   [A ∧ (A ⇒ B)] ⇒ B
        T   T   T       T             T
(1.8)   T   F   F       F             T
        F   T   T       F             T
        F   F   T       F             T
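Checking a tautology is a finite task: a formula in n statements is a tautology exactly when it is true under all 2ⁿ assignments. A short Python sketch of such a checker (not part of the original notes; the names `is_tautology` and `implies` are mine):

```python
from itertools import product

def implies(p, q):
    # p => q is (not p) or q
    return (not p) or q

def is_tautology(formula, n):
    # formula takes n Booleans; test it under every one of the 2**n assignments.
    return all(formula(*vals) for vals in product([True, False], repeat=n))

# Example 1.6: [A and (A => B)] => B (modus ponens) is a tautology.
print(is_tautology(lambda a, b: implies(a and implies(a, b), b), 2))  # True
# Example 1.5: A and (not A) is a contradiction, hence not a tautology.
print(is_tautology(lambda a: a and (not a), 1))  # False
```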

Definition 1.3.

(a) The converse of A ⇒ B is B ⇒ A.

(b) The inverse of A ⇒ B is ∼ A ⇒∼ B.

(c) The contrapositive of A ⇒ B is ∼ B ⇒∼ A.

Example 1.7. Write the converse, inverse and contrapositive of the statement in Example 1.3.

Recall that the given statement can be written as p ⇒ q where p and q are the statements:

p: You meet the month-end deadline;

q: You get a bonus.

(a) The converse of this implication is q ⇒ p: If you get a bonus, then you have met the month-end

deadline.

(b) The inverse of this implication is ∼ p ⇒∼ q: If you do not meet the month-end deadline, then

you will not get a bonus.

(c) The contrapositive of this implication is ∼ q ⇒∼ p: If you do not get a bonus, then you will

not have met the month-end deadline.

The following theorem is extremely useful.

Theorem 1.1. (A ⇒ B) ⇔ (∼ B ⇒∼ A).

Proof. Using a truth table,

        A   B   A ⇒ B   ∼B   ∼A   ∼B ⇒ ∼A
        T   T   T       F    F    T
(1.9)   T   F   F       T    F    F
        F   T   T       F    T    T
        F   F   T       T    T    T

The entries in the third and sixth columns are identical.
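The column comparison in the proof can be replayed exhaustively in code. A Python sketch (not from the notes; the helper name `implies` is mine) that checks Theorem 1.1 over all four assignments, and also shows that the converse is not equivalent:

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# Theorem 1.1: A => B agrees with its contrapositive (not B) => (not A)
# under every assignment of truth values.
agree = all(implies(a, b) == implies(not b, not a)
            for a, b in product([True, False], repeat=2))
print(agree)  # True

# The converse B => A is NOT equivalent: at (A, B) = (False, True),
# A => B is true while B => A is false.
print(implies(False, True), implies(True, False))  # True False
```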


Remark 1.1. It is an exercise to see that A ⇒ B is not logically equivalent to its converse, B ⇒ A.

One should avoid the very common mistake of claiming the opposite.

Example 1.8. Consider the following two statements,

(A) Cornell is in Ithaca.

(B) Cornell is in NY state.

and the compound statements:

(a) Implication : A ⇒ B : If Cornell is in Ithaca, then Cornell is in NY state.

(b) Contrapositive : ∼ B ⇒∼ A : If Cornell is NOT in NY state, then Cornell is NOT in Ithaca.

(c) Converse : B ⇒ A : If Cornell is in NY state, then Cornell is in Ithaca.

Note that the converse statement is FALSE. This leads us to another important interpretation

of the implication A ⇒ B. It means that every time A is true, then B must be true. Hence A is a

sufficient condition for B. If we know that A is true then we can always conclude that B is also

true. The contrapositive ∼ B ⇒∼ A showed us that when B is not true then A cannot be true either.

Hence B is a necessary condition for A. If A is true we must necessarily have that B is true, because

if B isn't true then A cannot be true either. Thus we have the following ways of reading A ⇒ B:

                  A implies B,
                  If A then B,
(1.10)  A ⇒ B :   A is sufficient for B,
                  B is necessary for A.

Remark 1.2. Note that for the equivalence relation (the "if and only if") A ⇔ B, the implication goes in both directions. In this case A and B are necessary and sufficient conditions for each other: A ⇔ B means that both the statement A ⇒ B and its converse B ⇒ A are true.

1.4. Quantifiers

In the previous sections, we learnt some definitions and basic properties of compound statements.

We were interested in whether a particular statement was true or false. This logic is called propositional logic or statement logic. However, there are many arguments whose validity cannot be

verified using propositional logic. Consider, for example, the sentence

p : x is an even integer.

This sentence is neither true nor false. The truth or falsity depends on the value of the variable x.

For some values of x the sentence is true; for others it is false. Thus this sentence is not a statement.

However, let us denote this sentence by P(x), i.e.,

P(x) : x is an even integer.

Then, P(5) is false, while P(6) is true. To study the properties of such sentences, we need to extend

the framework of propositional logic to what is called first-order logic.


Definition 1.4. A predicate or propositional function is a sentence that contains a finite number of variables and becomes a statement when specific values are substituted for the variables. The domain of a predicate variable is the set of all values that may be substituted in place of the variables.

In our earlier example, the sentence

P(x) : x is an even integer

is a propositional function with domain D, the set of integers, since for each x ∈ D, P(x) is a statement; i.e., for each x ∈ D, P(x) is true or false, but not both.

Example 1.9. The following are examples of predicates, or propositional functions:

(a) The sentence "P(x) : x + 3 is an even integer" with domain D the set of positive integers.

(b) The sentence "P(x) : x + 3 is an even integer" with domain D the set of integers.

(c) The sentence "P(x, y, z) : x² + y² = z²" with domain D the set of positive integers.

Before proceeding further, we introduce the following notation. A more comprehensive list of notation will be described later.

∈ : "is an element of",
∋ : "such that",
∧ : AND, in the sense that A ∧ B means both A and B,
∨ : OR, in the sense that A ∨ B means either A or B or both,
∀ : Universal quantifier, "for all",
∃ : Existential quantifier, "there exists" (one or more).

(a) The Universal Quantifier:

Let P(x) be a predicate with domain D. Then the sentence

Q(x) : for all x, P(x)

is a statement. To see this, notice that either P(x) is true at each value x ∈ D (the notation x ∈ D indicates that x is in the set D, while x ∉ D means that x is not in D) or P(x) is false for at least one value of x ∈ D. If P(x) is true at each value x ∈ D, then Q(x) is true. However, if P(x) is false for at least one value of x ∈ D, then Q(x) is false. Hence, Q(x) is a statement because it is either true or false (but not both).

Definition 1.5. Each of the phrases "every", "for every", "for each", and "for all" is referred to as the universal quantifier and is expressed by the symbol ∀. Let P(x) be a predicate with domain D. A universal statement is a statement of the form ∀x ∈ D, P(x). It is false if P(x) is false for at least one x ∈ D; otherwise, it is true.

Example 1.10. Let D be a set.


The statement

∀x ∈ D, x > 0

means “For all x that are elements of D, x is positive.”

Example 1.11. Let P(x) be the predicate "P(x) : x² ≥ x."

Determine whether the following universal statements are true or false.

(i) ∀x ∈ R, P(x);

(ii) ∀x ∈ Z, P(x).

(i) Let x = 1/2 ∈ R. Then (1/2)² = 1/4 < 1/2, and so P(1/2) is false. Therefore, "∀x ∈ R, P(x)" is false.

(ii) For every integer x, x² ≥ x is true, and so P(x) is true for all x ∈ Z. Hence, "∀x ∈ Z, P(x)" is true.
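Over a finite sample of the domain, the universal quantifier behaves like Python's built-in `all()`. A sketch (not from the notes) mirroring Example 1.11; note that checking a finite sample of Z illustrates but does not prove the "for all integers" claim:

```python
P = lambda x: x * x >= x  # the predicate of Example 1.11

# (i) fails over the reals: x = 1/2 is a counterexample.
print(P(0.5))  # False, so "for all x in R, P(x)" is false

# (ii) holds over the integers; all() plays the role of the
# universal quantifier on a finite sample of Z.
print(all(P(x) for x in range(-1000, 1001)))  # True
```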

(b) The Existential Quantifier:

Each of the phrases "there exists", "there is", "for some", and "for at least one" is referred to as the existential quantifier and is denoted by the symbol ∃. Let P(x) be a predicate with domain D. An existential statement is a statement of the form ∃x ∈ D such that P(x). It is true if P(x) is true for at least one x ∈ D; otherwise, it is false.

Example 1.12. As before let D be a set.

The statement

∃x ∈ D, x > 0

tells us that “There exists an element x of D such that x is positive.”

Example 1.13. Let P(x) be the predicate “P(x) : x² < x”.

Determine whether the following existential statements are true or false.

(i) ∃x ∈ R; P(x);

(ii) ∃x ∈ Z; P(x);

(i) Let x = 1/2 ∈ R. Then, (1/2)² = 1/4 < 1/2, and so P(1/2) is true. Therefore, “∃x ∈ R; P(x)” is

true.

(ii) For all integers x, x² ≥ x is true, and so there is no x ∈ Z such that P(x) is true. Hence, “∃x ∈

Z; P(x)” is false.

(c) Universal Conditional Statements

Recall that a conditional statement has a contrapositive, a converse, and an inverse. These

definitions can be extended to universal conditional statements. Consider a universal condi-

tional statement of the form ∀x ∈ D; P(x) ⇒ Q(x).

(i) Its contrapositive is the statement,

∀x ∈ D; ∼ Q(x) ⇒∼ P(x).

(ii) Its converse is the statement,

∀x ∈ D; Q(x) ⇒ P(x)


(iii) Its inverse is the statement,

∀x ∈ D; ∼ P(x) ⇒∼ Q(x).

Example 1.14. Write the contrapositive, converse, and inverse of the statement: If a real num-

ber is greater than 3, then its square is greater than 9.

Solution 1.2. Symbolically, the statement can be written as:

∀x ∈ R; if x > 3 then x² > 9.

Here P(x) is the statement x > 3 and Q(x) the statement x² > 9.

(i) The contrapositive is:

∀x ∈ R; if x² ≯ 9 then x ≯ 3,

or, equivalently,

∀x ∈ R; if x² ≤ 9 then x ≤ 3.

(ii) The converse is:

∀x ∈ R; if x² > 9 then x > 3.

Note that the converse is false; take, for example, x = −4. Then (−4)² > 9 is true but −4 > 3 is false, so the statement “if (−4)² > 9 then −4 > 3” is false. Hence the universal statement ∀x ∈ R; if x² > 9 then x > 3 is false.

(iii) The inverse is:

∀x ∈ R; if x ≯ 3 then x² ≯ 9,

or, equivalently,

∀x ∈ R; if x ≤ 3 then x² ≤ 9.

(d) Order of quantifiers:

If the quantifiers are of the same type, the order in which they appear does not matter.

∀x, ∀y : x + y = y + x

∃x, ∃y : x + y = 2 ∧ x + 2y = 3.

But if the quantifiers are of different types we have to be careful. For the set of real numbers,

the statement

(1.11) ∀x ∃y y > x

is TRUE, that is given any real number x, there is always a real number y that is greater than x.

But the statement

(1.12) ∃y ∀x, y > x

is FALSE, since there is no fixed real number y that is greater than every real number.

Example 1.15. The statement [∃y ∈ U ∀x ∈ V, statement A] means that one y will make A

true regardless of what x is. The statement [∀x ∈ V, ∃y ∈ U statement A] means that A can be

made true by choosing y depending on x.
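The dependence of y on x is easy to see on a finite domain. Over the reals, (1.11) and (1.12) make the point with y > x; here we use the statement x + y = 0 on a small set of integers, where the same contrast appears and can be checked by brute force:

```python
D = range(-5, 6)  # a small finite domain standing in for a number system

# "forall x, exists y: x + y = 0" -- here y = -x may depend on x.
forall_exists = all(any(x + y == 0 for y in D) for x in D)

# "exists y, forall x: x + y = 0" -- one fixed y must work for every x.
exists_forall = any(all(x + y == 0 for x in D) for y in D)

print(forall_exists, exists_forall)  # True False
```

Swapping the quantifiers turns a true statement into a false one, exactly as in the discussion above.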


**1.5. Rules of Negation of statements with quantifiers**

Fact 1. The negation of a universal statement of the form ∀x ∈ D; P(x) is logically equivalent to an

existential statement of the form ∃x ∈ D; such that ∼ P(x). Symbolically,

∼ [∀x ∈ D; P(x)] ≡ ∃x ∈ D; such that ∼ P(x)

Consider the universal statement ∀x ∈ D; P(x). It is false if P(x) is false for at least one x ∈ D;

otherwise, it is true. Hence it is false if and only if P(x) is false for at least one x ∈ D, or, if and

only if ∼ P(x) is true for at least one x ∈ D. Thus the negation of this statement is the statement

∃x ∈ D such that ∼ P(x).

Example 1.16. What is the negation of the statement “All mathematicians wear glasses”?

Solution 1.3. Let us write this statement symbolically. Let D be the set of all mathematicians and

let P(x) be the predicate “x wears glasses” with domain D. The given statement can be written as

∀x ∈ D; P(x). The negation is ∃x ∈ D such that ∼ P(x). In words, the negation is “There exists a

mathematician who does not wear glasses” or “Some mathematicians do not wear glasses”.

Fact 2. The negation of an existential statement of the form ∃x ∈ D such that P(x) is logically

equivalent to a universal statement of the form ∀x ∈ D; ∼ P(x). Symbolically,

∼ (∃x ∈ D such that P(x)) ≡ ∀x ∈ D; ∼ P(x).

Consider the existential statement, ∃x ∈ D such that P(x). It is true if P(x) is true for at least

one x ∈ D; otherwise, it is false. Hence it is false if and only if P(x) is false for all x ∈ D, in other

words, if and only if ∼ P(x) is true for all x ∈ D. Thus the negation of this statement is the statement

∀x ∈ D; ∼ P(x).

Example 1.17. What is the negation of the statement “Some politicians are honest”?

Solution 1.4. Let us write this statement symbolically. Let D be the set of all politicians and let

P(x) be the predicate “x is honest” with domain D. The given statement can be written as ∃x ∈ D

such that P(x). The negation is ∀x ∈ D; ∼ P(x). In words, the negation is “All politicians are not

honest” or “No politician is honest”.

Consider next the negation of a universal conditional statement. By Fact 1, we have

that ∼ (∀x ∈ D; (P(x) ⇒ Q(x))) ≡ ∃x ∈ D such that ∼ (P(x) ⇒ Q(x)). But the negation of an “if

p then q” statement is logically equivalent to a “p and not q” statement. Hence, ∼ (P(x) ⇒

Q(x)) ≡ P(x)∧ ∼ Q(x). Therefore we have the following fact:

Fact 3. The negation of a universal conditional statement of the form ∀x ∈ D; (P(x) ⇒ Q(x)) is

logically equivalent to the existential statement of the form ∃x ∈ D such that (P(x)∧ ∼ Q(x)).

Symbolically,

∼ (∀x ∈ D; (P(x) ⇒ Q(x))) ≡ ∃x ∈ D such that (P(x)∧ ∼ Q(x)).

Written less symbolically, this becomes

∼ (∀x ∈ D; if P(x) then Q(x)) ≡ ∃x ∈ D such that P(x) and ∼ Q(x).
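Fact 3 can be sanity-checked by brute force on a finite domain: for any concrete choice of D, P and Q, both sides of the equivalence must come out the same. A sketch with one arbitrary choice (the predicates below are ours, picked so that the implication fails somewhere):

```python
D = range(-10, 11)
P = lambda x: x > 3       # sample predicates (our own choice); any would do
Q = lambda x: x * x > 20  # picked so that the implication fails somewhere

# ~ (forall x in D, P(x) => Q(x)), with "P => Q" written as "not P or Q"
lhs = not all((not P(x)) or Q(x) for x in D)

# exists x in D such that P(x) and ~Q(x)
rhs = any(P(x) and not Q(x) for x in D)

assert lhs == rhs
print(lhs)  # True: x = 4 satisfies P(4) but not Q(4)
```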


**1.5.1. More Examples.** We can use truth tables to prove the following examples of negations.

∼ (A ∧ B) ⇔∼ A∨ ∼ B

∼ (A ∨ B) ⇔∼ A∧ ∼ B

∼ (x > y) ⇔ x ≤ y

∼ (A ⇒ B) ⇔ A∧ ∼ B

∼ (∼ A) ⇔ A.

Try proving them (Good Exercise).
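These equivalences are precisely what a truth table establishes, and the table can be enumerated by machine. A sketch for four of them, with implication encoded truth-functionally:

```python
from itertools import product

implies = lambda a, b: (not a) or b  # truth-functional "A => B"

for A, B in product([True, False], repeat=2):
    assert (not (A and B)) == ((not A) or (not B))  # ~(A and B) <=> ~A or ~B
    assert (not (A or B)) == ((not A) and (not B))  # ~(A or B) <=> ~A and ~B
    assert (not implies(A, B)) == (A and not B)     # ~(A => B) <=> A and ~B
    assert (not (not A)) == A                       # ~(~A) <=> A
```

The loop visits every row of the truth table, so a passing run is a complete proof of these propositional equivalences.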

**1.5.2. Negation of a statement with one quantifier.** The universal statement in Example 1.10

contains a universal quantifier and the statement x > 0. To negate a universal statement we

need to find only one counterexample. In this example, if we can find just one x in D that is

nonpositive, we know that it is not true that all x are positive. Thus the negation of the universal

statement

∀x ∈ D, x > 0

is an existential statement,

∃x ∈ D, x ≤ 0.

To negate an existential statement we must show that every possible instance is false. The existen-

tial statement

∃x ∈ D, x > 0

is false if there are no positive elements of D. Thus the negation of the existential statement is a

universal statement

∀x ∈ D, x ≤ 0.

Insight from these examples can be generalized to rules of negation. Note that “such that” always

follows ∃ (the existential quantifier).

Rule 1.1. For negating a statement of the form [quantifier term, statement], first change the quantifier (∀

becomes ∃, ∃ becomes ∀) and then negate the statement.

1.5.3. Negation with more than one quantifier.

Rule 1.2. To negate a statement with a string of quantifiers, change the type of each quantifier,

preserve their order, and negate the statement that follows the quantifiers.

Example 1.18. Statement:

(1.13) ∀ε > 0 ∃N ∀n, if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε.


Negation: ∃ε > 0 ∼ [∃N ∀n, if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε],

or ∃ε > 0 ∀N, ∼ [∀n, if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε],

(1.14) or ∃ε > 0 ∀N, ∃n ∼ [if n > N, then ∀x ∈ D, |fₙ(x) − f(x)| < ε],

or ∃ε > 0 ∀N, ∃n, n > N and ∼ [∀x ∈ D, |fₙ(x) − f(x)| < ε],

or ∃ε > 0 ∀N, ∃n > N and ∃x ∈ D, |fₙ(x) − f(x)| ≥ ε.

1.6. Logical Equivalences

There are many fundamental logical equivalences that we often encounter. Several of these are

listed in the theorem below. They will be useful for future reference.

Theorem 1.2. Let p, q and r be statements. Then the following logical equivalences hold.

(1) Commutative Laws

(i) p ∧ q ≡ q ∧ p;

(ii) p ∨ q ≡ q ∨ p.

(2) Associative Laws

(i) (p ∧ q) ∧ r ≡ p ∧ (q ∧ r);

(ii) (p ∨ q) ∨ r ≡ p ∨ (q ∨ r).

(3) Distributive Laws

(i) p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r);

(ii) p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r).

(4) De Morgan's Laws

(i) ∼ (p ∨ q) ≡ (∼ p) ∧ (∼ q);

(ii) ∼ (p ∧ q) ≡ (∼ p) ∨ (∼ q).

(5) Idempotent Laws

(i) p ∧ p ≡ p;

(ii) p ∨ p ≡ p.

(6) Negation Laws

(i) p ∨ (∼ p) ≡ T ;

(ii) p ∧ (∼ p) ≡ F;

where T: True; F: False.

(7) Universal Bound Laws

(i) p ∨ T ≡ T ;

(ii) p ∧ F ≡ F.

(8) Identity Laws

(i) p ∨ F ≡ p;

(ii) p ∧ T ≡ p.

(9) Double Negation Law ∼ (∼ (p)) ≡ p.
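Since each of p, q, r takes only two truth values, every law in Theorem 1.2 can be verified by checking all eight assignments. A sketch for the distributive and negation laws:

```python
from itertools import product

rows_checked = 0
for p, q, r in product([True, False], repeat=3):
    # Distributive laws, items (3)(i) and (3)(ii)
    assert (p or (q and r)) == ((p or q) and (p or r))
    assert (p and (q or r)) == ((p and q) or (p and r))
    # Negation laws, items (6)(i) and (6)(ii)
    assert (p or (not p)) and not (p and (not p))
    rows_checked += 1
print(rows_checked)  # 8 rows, one per truth assignment
```

The same loop, with the appropriate assertions swapped in, verifies every other item of the theorem.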


De Morgan's Laws can be expressed in words as follows: “The negation of an and statement is

logically equivalent to the or statement in which each component is negated, while the negation of

an or statement is logically equivalent to the and statement in which each component is negated.”

**1.7. Some Math symbols and Definitions**

This is a very brief list of some of the mathematical shorthand that will be used in this course and

in the first year courses. Some of these symbols will be explained in more detail as we go.

Operator Meaning

∀ For all, for every, for each

∃ There exists, there is

∈ In, a member of

∋ Owns, contains

∨ Or

∧ And

∴ Therefore

∼ or ¬ Not

∅ Empty set

⊂ Subset, is a subset of

⊃ Contains the set

∪ Union (of sets)

∩ Intersection (of sets)

⇒ Implies

⇐⇒ or iff If and only if, each implies the other

s.t., |, or : Such that

Q.E.D. Quod erat demonstrandum (Proof complete)

Next we define some of the commonly used mathematical terms.

(a) Theorem A statement which can be demonstrated to be true by accepted mathematical

operations and arguments.

In general, a theorem is an embodiment of some general principle that makes it part of a

larger theory. The process of showing a theorem to be correct is called a proof.

(b) Proposition A statement which is required to be proved.

(c) Axiom A proposition regarded as self-evidently true without proof. The word “axiom” is a

synonym for “postulate”.

(d) Corollary An immediate consequence of a result already proved. Corollaries usually state

more complicated theorems in a language simpler to use and apply.

(e) Lemma A short theorem used in proving a larger theorem.


(f) Hypothesis A hypothesis is a proposition that is consistent with known data, but has been

neither verified nor shown to be false.

(g) Definition A precise statement of the meaning of a term.

Chapter 2

Proof Techniques

2.1. Methods of Proof

A proof is a method of establishing the truth of an implication. An example would be to

prove a proposition of the form “If H1, · · ·, Hn, then T”. The statements H1, · · ·, Hn are referred to

as the hypotheses of the proof and the proposition T is referred to as the conclusion. A formal proof

consists of a sequence of valid propositions ending with the conclusion T. By a valid proposition,

we mean that each proposition in the sequence must either be one of the hypotheses H1, · · ·, Hn, an

axiom, a definition, a tautology, or a proposition proved earlier, or it must be derived from previous

propositions using either logical implication or substitution.

Before we present proof techniques, we describe some elementary definitions in number theory.

Definition 2.1. An integer n is even if and only if n = 2k for some integer k. An integer n is odd if

and only if n = 2k + 1 for some integer k.

Using the quotient-remainder theorem, we can show that every integer is either even or odd.

Definition 2.2. An integer n is prime if and only if n > 1 and for all positive integers r and s, if

n = r · s then r = 1 or s = 1. An integer n is composite if and only if n = r · s for some positive

integers r and s, with r ̸= 1 and s ̸= 1.

The first three prime numbers are 2, 3, and 5. The first six composite numbers are 4, 6, 8, 9, 10, and 12.

Every integer greater than 1 is either prime or composite since the two definitions are negations of

each other.
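Definition 2.2 translates almost verbatim into a brute-force test, which is useful for generating examples (the function names below are our own, and the trial division is deliberately naive):

```python
def is_prime(n):
    # Definition 2.2 verbatim: n > 1, and every factorization n = r * s
    # (r ranges over the divisors of n) forces r = 1 or s = 1.
    return n > 1 and all(
        r == 1 or n // r == 1 for r in range(1, n + 1) if n % r == 0
    )

def is_composite(n):
    # n = r * s for some positive r, s with r != 1 and s != 1.
    return n > 1 and any(n % r == 0 for r in range(2, n))

assert [n for n in range(2, 14) if is_prime(n)] == [2, 3, 5, 7, 11, 13]
assert [n for n in range(2, 14) if is_composite(n)] == [4, 6, 8, 9, 10, 12]
```

The two list comprehensions confirm that every integer from 2 to 13 falls into exactly one of the two classes, matching the examples in the text.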

Definition 2.3. Two integers m and n are said to be of the same parity if m and n are both even or

are both odd, while m and n are said to be of the opposite parity if one of m and n is even and the

other is odd. Two integers are consecutive if one is one more than the other.



The integers 2 and 8 are of the same parity, while 5 and 10 are of opposite parity.

Definition 2.4. Let n and d be integers with d ̸= 0. Then n is said to be divisible by d if n = d · k for

some integer k. In such a case we say that n is a multiple of d, or d is a factor of n, or d is a divisor

of n, or d divides n.

The notation “d|n” is read as “d divides n”.

We discuss the following techniques of writing proofs. Our emphasis here will be on showing how

each of them is used through several examples.

2.2. Trivial Proofs

Let P(x) and Q(x) be statements with domain D. If Q(x) is true for every x ∈ D, then the universal

statement

∀x ∈ D, P(x) → Q(x)

is true regardless of the truth value of P(x). Such a proof is called a trivial proof.

Claim 2.1. For x ∈ R, if x > −3, then x2 + 1 > 0.

Proof. Consider the two statements P(x) : x > −3 and Q(x) : x² + 1 > 0. Since x² ≥ 0 for every

x ∈ R, it follows that x² + 1 ≥ 0 + 1 > 0 for every x ∈ R. Thus P(x) → Q(x) is true for every x ∈ R

and hence for x > −3.

Claim 2.2. If n is an odd integer, then 6n³ + 4n + 3 is an odd integer.

Proof. Since 6n³ + 4n + 3 = 2(3n³ + 2n + 1) + 1 where 3n³ + 2n + 1 ∈ Z (i.e. 6n³ + 4n + 3 = 2k + 1

where k = 3n³ + 2n + 1 ∈ Z), the integer 6n³ + 4n + 3 is odd for every integer n.

Observe that the fact that 6n³ + 4n + 3 is odd does not depend on n being odd. It would have been

better to replace the statement of the claim by “if n is an integer, then 6n³ + 4n + 3 is odd.”

2.3. Vacuous Proofs

Let P(x) and Q(x) be statements with domain D. If P(x) is false for every x ∈ D, then the

universal statement

∀x ∈ D, P(x) → Q(x)

is true regardless of the truth value of Q(x). Such a proof is called a vacuous proof.

Claim 2.3. For x ∈ R, if x² − 2x + 1 < 0, then x > 1.

Proof. Let P(x) : x² − 2x + 1 < 0 and Q(x) : x > 1. Since x² − 2x + 1 = (x − 1)² ≥ 0 for every

x ∈ R, we have that (x − 1)² < 0 is false for every x ∈ R. Hence, P(x) is false for every x ∈ R. Thus,

P(x) → Q(x) is true for every x ∈ R.


2.4. Proof by Construction

In a proof by construction we work straight from the set of assumptions.

Example 2.1. Consider a function

(2.1) f (n) = n2 + n + 17,

where n ∈ N. If we evaluate this function, it seems that we always get a prime number. For

instance

f (1) = 19

f (2) = 23

f (3) = 29

f (15) = 257.

We can verify that all these numbers are prime. Then we might conjecture that

Conjecture 1. The function f (n) = n2 + n + 17 generates prime numbers for all n ∈ N.

Drawing such a conclusion is an example of inductive reasoning. It is important to note that

we have NOT proved the conjecture made above in the example. In fact, this conjecture is false.

Take n = 17: f(17) = 17² + 17 + 17 = 17 · 19, which is not a prime number.
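A short search makes the point concrete: the inductive evidence holds for the small cases listed above, yet the universal claim fails. (The search also shows the conjecture already fails at n = 16, since f(16) = 289 = 17²; n = 17 is another counterexample.)

```python
def f(n):
    # The function of (2.1): f(n) = n^2 + n + 17.
    return n * n + n + 17

def is_prime(n):
    # Naive trial division up to sqrt(n).
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Inductive evidence: prime for the cases tried in the text.
assert all(is_prime(f(n)) for n in (1, 2, 3, 15))

# But the conjecture is false; search for the smallest counterexample.
smallest = next(n for n in range(1, 100) if not is_prime(f(n)))
print(smallest, f(smallest))  # 16 289  (289 = 17 * 17; n = 17 fails too)
```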

Example 2.2. Let NE be the set of even natural numbers and NO be the set of odd numbers.

We want to show that (i) the sum of two even numbers is even,

∀x, y ∈ NE , x + y ∈ NE

and (ii) the sum of an odd number and an even number is odd

∀x ∈ NE , ∀y ∈ NO , x + y ∈ NO .

Proof. (By construction)

(i) Let

x, y ∈ NE ⇔ ∃m, n ∈ N x = 2m ∧ y = 2n,

x + y = 2m + 2n = 2 (m + n) ∈ NE since m + n ∈ N.

(ii) Let

x ∈ NE ⇔ ∃m ∈ N x = 2m, y ∈ NO ⇔ ∃n ∈ N y = 2n + 1,

x + y = 2m + 2n + 1 = 2 (m + n) + 1, where m + n ∈ N ⇒ x + y ∈ NO .

Example 2.3. Consider the function g(n, m)


g(n, m) = n² + n + m where m, n ∈ N.

g(1, 2) = 1² + 1 + 2 = 2²

g(2, 3) = 2² + 2 + 3 = 3²

g(12, 13) = 12² + 12 + 13 = 13²

On the basis of above, we can form a conjecture,

Conjecture 2.

(2.2) ∀n ∈ N, g(n, n + 1) = (n + 1)².

It turns out that this conjecture is true.

Proof. By construction.

g(n, n + 1) = n² + n + (n + 1)

= n² + 2n + 1

= (n + 1)².

Having proved the general statement, we know that

g(15, 16) = 16².

This is an example of deductive reasoning.

Example 2.4. Show that if x is odd then x² is odd.

Proof. By construction. Let x > 1. Then

x ∈ NO ⇔ ∃n ∈ N x = 2n + 1,

x² = (2n + 1)²

= 4n² + 4n + 1

= 2(2n² + 2n) + 1

⇒ x² ∈ NO.

For x = 1, x² = 1, which is odd.

Example 2.5. If the sum of two integers is even, then so is their difference.

Proof. Assume that the integers m and n are such that m + n is even. Then m + n = 2k for some

integer k. So, m = 2k − n and m − n = 2k − n − n = 2(k − n) = 2l, where l = k − n is an integer.

Thus m − n is even.


2.5. Proof by Contraposition

Note that A ⇒ B is not logically equivalent to its converse statement B ⇒ A. It is possible for an

implication to be false while its converse is true. Hence we cannot prove A ⇒ B by showing B ⇒ A.

Example 2.6. The implication

m² > 0 ⇒ m > 0

is false but its converse

m > 0 ⇒ m² > 0

is true.

To show that A ⇒ B, we can instead show that ∼ B ⇒ ∼ A. We have already shown that an

implication and its contrapositive are logically equivalent.

Example 2.7. Consider a theorem.

“If 7m is an odd number then m is an odd number.”

Its contrapositive is “If m is not an odd number, then 7m is not an odd number.”, or, equivalently,

“If m is an even number, then 7m is an even number.”

We are talking about integers here. Using the contrapositive, we can construct a proof of the theorem

as follows:

Proof.

m ∈ NE ⇔ ∃k ∈ N m = 2k,

7m = 7(2k) = 2(7k), 7k ∈ N ⇒ 7m ∈ NE.

This is much easier than trying to show directly that 7m being odd implies that m is odd.

Example 2.8. Show that if x² is even, then x is even.

(2.3) x² ∈ NE ⇒ x ∈ NE

Its contrapositive is

(2.4) x ∈ NO ⇒ x² ∈ NO

This we have already shown in an example above.


2.6. Proof by Contradiction

To prove that statement C is true, try supposing ∼ C is true and then show that this leads to a

contradiction. To show that A ⇒ B we can use

(2.5) ∼ (A ⇒ B) ⇔ A∧ ∼ B.

So assume that A and ∼ B are both true and derive a contradiction. This shows A∧ ∼ B is false, and hence A ⇒ B is true.

Example 2.9. In the last example,

x² ∈ NE ⇒ x ∈ NE.

We can prove the statement by contradiction as follows.

Proof. Assume x² is even and x is odd. Then

x² ∈ NE ⇔ ∃m ∈ N x² = 2m,

x ∈ NO ⇔ ∃n ∈ N x = 2n + 1

⇒ x² = 4n² + 4n + 1, which is odd.

This contradicts the initial assumption that x² is even.

Example 2.10. There is no greatest integer.

Proof. Assume, to the contrary, that there is a greatest integer, say N. Then, N ≥ n for every integer

n. Let m = N + 1. Now m is an integer since it is the sum of two integers. Also, m > N. Thus, m is

an integer that is greater than the greatest integer, which is a contradiction. Hence our assumption

that there is a greatest integer is false. Thus there is no greatest integer.

For the next example, we first define rational numbers.

Definition 2.5. A real number r is a rational number if r = m/n for some integers m and n with n ̸= 0.

A real number that is not a rational number is called an irrational number.

Example 2.11. There is no smallest positive rational number.

Proof. Assume, to the contrary, that there is a least positive rational number x. Then, x ≤ y for

every positive rational number y. Consider the number x/2. Since x is a positive rational number,

so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is positive, gives x/2 < x.

Hence, x/2 is a positive rational number that is less than x, which is a contradiction. Hence our

assumption that there is a least positive rational number is false. Thus there is no least positive

rational number.

Example 2.12. The sum of a rational number and an irrational number is irrational.


Proof. Assume, to the contrary, that there exists a rational number p and an irrational number q

whose sum is a rational number. Thus, by the definition of rational numbers, p = a/b and p + q = r = c/d

for some integers a, b, c and d with b ̸= 0 and d ̸= 0. Hence,

q = r − p = c/d − a/b = (bc − ad)/(bd).

Now, bc − ad ∈ Z and bd ∈ Z since a, b, c, d ∈ Z. Since b ̸= 0 and d ̸= 0, bd ̸= 0. Hence,

q ∈ Q, which is a contradiction. Hence our assumption that there exists a rational number and an

irrational number whose sum is a rational number is false. Thus, the sum of a rational number and

an irrational number is irrational.

We end this section with a proof of the classical result that √2 is irrational.

Example 2.13. The real number √2 is irrational.

Proof. Assume, to the contrary, that √2 is rational. Then,

√2 = m/n

where m, n ∈ Z and n ̸= 0. By dividing m and n by any common factors, if necessary, we may

further assume that m and n have no common factors, i.e., m/n has been expressed in (or reduced to)

lowest terms. Then, 2 = m²/n², and so m² = 2n². Thus, m² is even. Hence, m is even, and so m = 2k,

where k ∈ Z. Substituting this into our earlier equation m² = 2n², we have (2k)² = 2n², and so

4k² = 2n². Therefore, n² = 2k². Thus, n² is even, and so n is even. Therefore each of m and n has

2 as a factor, which contradicts our assumption that m/n has been reduced to lowest terms and

therefore that m and n have no common factors. We deduce, therefore, that our assumption that √2

is rational is incorrect. Hence, √2 is irrational.

Exercise 2.1. The square root of any prime number is irrational.

Remark 2.1. One should be very careful when writing a proof by contradiction. Here is a very

strong word of caution which can be found in ?, page 3.

“All students are enjoined in the strongest possible terms to eschew proofs by contradiction!

There are two reasons for the prohibition: First such proofs are very often fallacious, the contra-

diction on the final page arising from an erroneous deduction on an earlier page, rather than from

the incompatibility of p with ¬q. Second, even when correct, such a proof gives little insight into

the connection between p and q whereas both the direct proof and the proof by contraposition con-

struct a chain of argument connecting p and q. One reason why mistakes are so much more likely

in proofs by contradiction than in direct proofs is that in a direct proof (assuming the hypotheses is

not always false) all deduction from the hypothesis are true in those cases where hypothesis holds.

One is dealing with true statements, and one’s intuition and knowledge about what is true help to

keep one from making erroneous statements. In proofs by contradiction, however, you are (assum-

ing the theorem is true) in the unreal world where any statement can be derived, and so the falsity

of a statement is no indication of an erroneous deduction.”.


2.7. Proof by Induction

A proof by induction involves three steps.

(a) Base of induction. Check for n = 1, whether the statement is true.

(b) Inductive transition: Assume that the statement is true for some n and show that it is also true

for n + 1.

(c) Inductive conclusion: The statement is true for all n ≥ 1.

Example 2.14. Show that if f(x) = xⁿ, then f′(x) = nxⁿ⁻¹ for n ∈ N.

Proof. By Induction.

(a) Base of induction:

(2.6) f(x) = x, f′(x) = 1 = x⁰ = 1 · x¹⁻¹

(b) Inductive transition:

Assume that for

(2.7) f(x) = xⁿ, f′(x) = nxⁿ⁻¹

then for

f(x) = xⁿ⁺¹ = xⁿ · x,

f′(x) = nxⁿ⁻¹ · x + xⁿ · 1

= nxⁿ + xⁿ

(2.8) = (n + 1)xⁿ

(c) Inductive conclusion:

∀n ∈ N, if f(x) = xⁿ then f′(x) = n · xⁿ⁻¹.

Example 2.15. Prove by induction that 7ⁿ − 4ⁿ is a multiple of 3, for n ∈ N.

Proof. (a) Base of induction:

(2.9) 7¹ − 4¹ = 7 − 4 = 3

The statement is true for n = 1.

(b) Inductive transition:


Assume that 7ⁿ − 4ⁿ = 3m where m ∈ N. Then

7ⁿ⁺¹ − 4ⁿ⁺¹ = 7 · 7ⁿ − 4 · 4ⁿ

= 7 · 7ⁿ − 7 · 4ⁿ + 7 · 4ⁿ − 4 · 4ⁿ

= 7 · (7ⁿ − 4ⁿ) + (7 − 4) · 4ⁿ

= 7 · (3m) + 3 · 4ⁿ

= 3 · (7m + 4ⁿ)

Since m and n are natural numbers, so is 7m + 4ⁿ. So 7ⁿ⁺¹ − 4ⁿ⁺¹ is a multiple of 3.

(c) Inductive conclusion:

7ⁿ − 4ⁿ is a multiple of 3, for all n ∈ N.
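Although the induction already settles the claim for every n, spot-checking the first few cases, and the algebraic identity behind the inductive step, is a quick way to catch slips:

```python
# Spot-check Example 2.15: 7^n - 4^n is a multiple of 3 for n = 1, ..., 49.
checks = [(7 ** n - 4 ** n) % 3 for n in range(1, 50)]
assert all(c == 0 for c in checks)

# The identity behind the inductive step:
# 7^(n+1) - 4^(n+1) = 7 * (7^n - 4^n) + 3 * 4^n.
n = 10
assert 7 ** (n + 1) - 4 ** (n + 1) == 7 * (7 ** n - 4 ** n) + 3 * 4 ** n
```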

Example 2.16. Prove the Binomial Theorem, (a + b)ⁿ = ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ bᵏ, by induction. Here C(n, k) denotes the binomial coefficient “n choose k”.

Proof. (a) Base of induction:

For n = 1, the claim is trivially true.

(b) Inductive transition:

Assume that the Binomial Theorem holds for n. Then

(a + b)ⁿ⁺¹ = (a + b)(a + b)ⁿ = (a + b) ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ bᵏ

= ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ⁺¹ bᵏ + ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ bᵏ⁺¹

= ∑_{k=0}^{n} C(n, k) aⁿ⁻ᵏ⁺¹ bᵏ + ∑_{l=1}^{n+1} C(n, l − 1) aⁿ⁻ˡ⁺¹ bˡ (by the change of variable l = k + 1)

= C(n, 0) aⁿ⁺¹ + ∑_{l=1}^{n} {C(n, l) + C(n, l − 1)} aⁿ⁻ˡ⁺¹ bˡ + C(n, n) bⁿ⁺¹

= C(n + 1, 0) aⁿ⁺¹ + ∑_{l=1}^{n} C(n + 1, l) aⁿ⁻ˡ⁺¹ bˡ + C(n + 1, n + 1) bⁿ⁺¹

= ∑_{k=0}^{n+1} C(n + 1, k) a⁽ⁿ⁺¹⁾⁻ᵏ bᵏ.

In the fifth line we have used the fact (Pascal's rule) that

C(n, l) + C(n, l − 1) = C(n + 1, l).

It is a good exercise to verify this.

(c) Inductive conclusion:

The Binomial Theorem holds for all n ∈ N.
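The theorem, and the Pascal's-rule identity used in the fifth line of the derivation, can be spot-checked with Python's `math.comb`, which computes the binomial coefficient:

```python
from math import comb  # comb(n, k) is the binomial coefficient "n choose k"

# Spot-check the Binomial Theorem for a few (a, b, n) triples.
for a, b, n in [(2, 3, 5), (1, -1, 7), (10, 1, 4)]:
    expansion = sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))
    assert expansion == (a + b) ** n

# Pascal's rule, the identity used in the inductive step.
for n in range(1, 20):
    for l in range(1, n + 1):
        assert comb(n, l) + comb(n, l - 1) == comb(n + 1, l)
```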


Observe that in the inductive hypothesis of our proof above, we assume that P(k) is true for an

arbitrary, but fixed, positive integer k. We certainly do not assume that P(k) is true for all positive

integers k, for this is precisely what we wish to prove! It is important to understand that our aim

is to establish the truth of the implication “If P(k) is true, then P(k + 1) is true,” which together with

the truth of the statement P(1) allows us to conclude that an infinite number of statements (namely,

P(1), P(2), P(3), · · ·) are true.

Example 2.17. For every positive integer n,

1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.

Proof. For every integer n ≥ 1, let P(n) be the statement P(n) : 1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.

(a) Base of induction:

When n = 1, the statement P(1) : 1² = 1(1 + 1)(2 · 1 + 1)/6 is certainly true since 1(1 + 1)(2 · 1 + 1)/6 = 6/6 = 1. This establishes the base case when n = 1.

(b) For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume

that P(k) is true; that is, assume that 1² + · · · + k² = k(k + 1)(2k + 1)/6. For the inductive step, we

need to show that P(k + 1) is true. That is, we show that

1² + 2² + · · · + k² + (k + 1)² = (k + 1)(k + 2)(2k + 3)/6.

Evaluating the left-hand side of this equation, we have

1² + 2² + · · · + k² + (k + 1)² = (1² + 2² + · · · + k²) + (k + 1)²

= k(k + 1)(2k + 1)/6 + (k + 1)² (by the inductive hypothesis)

= k(k + 1)(2k + 1)/6 + 6(k + 1)²/6

= (k + 1)(2k² + k + 6k + 6)/6

= (k + 1)(2k² + 7k + 6)/6 = (k + 1)(2k² + 4k + 3k + 6)/6

= (k + 1)(k + 2)(2k + 3)/6;

thus verifying that P(k + 1) is true.

(c) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,

1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6


is true for every positive integer n.
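Before attempting such an induction, it is often worth testing the closed form numerically; a mismatch would reveal an algebra slip immediately. A minimal check:

```python
def sum_of_squares(n):
    return sum(k * k for k in range(1, n + 1))

# Example 2.17's closed form, checked for the first hundred cases.
for n in range(1, 101):
    assert sum_of_squares(n) == n * (n + 1) * (2 * n + 1) // 6

print(sum_of_squares(10))  # 385
```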

Recall that in a geometric sequence, each term is obtained from the preceding one by multiplying

by a constant factor. If the first term is 1 and the constant factor is r, then the sequence is 1, r,

r², r³, · · ·, rⁿ, · · ·. The sum of the first n terms of this sequence is given by a simple formula which

we shall verify using mathematical induction. This is left as an exercise.

Induction can also be used to solve problems involving divisibility, as the next example

illustrates.

Example 2.18. For all integers n ≥ 1, 2²ⁿ − 1 is divisible by 3.

Proof. We proceed by mathematical induction. When n = 1, the result is true since in this case

2²ⁿ − 1 = 2² − 1 = 3 and 3 is divisible by 3. Hence, the base case when n = 1 is true. For the

inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that the

property holds for n = k, i.e., suppose that 2²ᵏ − 1 is divisible by 3. For the inductive step, we must

show that the property holds for n = k + 1. That is, we must show that 2²⁽ᵏ⁺¹⁾ − 1 is divisible by

3. Since 2²ᵏ − 1 is divisible by 3, there exists, by definition of divisibility, an integer m such that

2²ᵏ − 1 = 3m, and so 2²ᵏ = 3m + 1. Now,

2²⁽ᵏ⁺¹⁾ − 1 = 2²ᵏ · 2² − 1

= 4 · 2²ᵏ − 1

= 4(3m + 1) − 1

= 12m + 3

= 3(4m + 1).

Since m ∈ Z, we know that 4m + 1 ∈ Z. Hence, 2²⁽ᵏ⁺¹⁾ − 1 is an integer multiple of 3; that is,

2²⁽ᵏ⁺¹⁾ − 1 is divisible by 3, as desired. Hence, by the principle of mathematical induction, the

property holds for all integers n ≥ 1.

Induction can also be used to verify certain inequalities, as the next example illustrates.

Example 2.19. For all integers n ≥ 2,

√n < 1/√1 + 1/√2 + · · · + 1/√n.

Proof. We proceed by mathematical induction. To show the inequality holds for n = 2, we must

show that

√2 < 1/√1 + 1/√2.

But this inequality is true if and only if 2 < √2 + 1 (multiply both sides by √2), which is true if and

only if 1 < √2. Since 1 < √2 is true, so too is √2 < 1/√1 + 1/√2. Hence the inequality holds for

n = 2. This establishes the base case. For the inductive hypothesis, let k be an arbitrary (but fixed)

integer such that k ≥ 2 and assume that the inequality holds for n = k, i.e., suppose that

√k < 1/√1 + 1/√2 + · · · + 1/√k.

For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show

that

√(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1).

Since k ≥ 2, √k < √(k + 1), and so (multiplying both sides by √k) k < √k √(k + 1). Hence (adding

1 to both sides), k + 1 < √k √(k + 1) + 1; and so (dividing both sides by √(k + 1)) we have

√(k + 1) < √k + 1/√(k + 1). Hence, by the inductive hypothesis,

√(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1),

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers

n ≥ 2.

**2.8. Additional Notes on Proofs**

To prove a universal statement

(2.10) ∀x ∈ D, p (x)

we let x represent an arbitrary element of the set D and then show that statement p (x) is true. The

only properties we can use about x are those that apply to all elements of D. For example, if the set

D consists of the natural numbers, then we cannot assume x to be odd as not all natural numbers

are odd. To prove an existential statement,

(2.11) ∃x ∈ D, p (x)

all we need to do is to show that there exists at least one member of D for which p(x) is true. We

show these techniques through the following examples.

Example 2.20. For every ε > 0, there exists a δ > 0 such that

(2.12) 1 − δ < x < 1 + δ ⇒ 5 − ε < 2x + 3 < 5 + ε

In this example we are asked to prove that the statement is true for each positive number ε. We

begin with an arbitrary ε and use it to find a δ which is positive and has the property that the

implication holds true. We give a particular value of δ which could possibly depend on ε and show

that the statement is true.


Proof. Let ε > 0 be arbitrary and let δ = ε/2. Note that δ > 0. Then

1 − δ < x < 1 + δ

1 − ε/2 < x < 1 + ε/2

2 − ε < 2x < 2 + ε

5 − ε < 2x + 3 < 5 + ε.
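The choice δ = ε/2 in the proof can also be probed numerically: sample points of (1 − δ, 1 + δ) and confirm that 2x + 3 lands in (5 − ε, 5 + ε). This is evidence, not a proof, since only finitely many x are tested:

```python
def implication_holds(eps, samples=1000):
    delta = eps / 2  # the delta chosen in the proof
    for i in range(1, samples):
        x = (1 - delta) + (2 * delta) * i / samples  # grid inside (1-d, 1+d)
        if not (5 - eps < 2 * x + 3 < 5 + eps):
            return False
    return True

assert all(implication_holds(eps) for eps in (1.0, 0.1, 1e-6))
```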

In some cases, it is possible to prove an existential statement in an indirect way without actually

producing any specific element of the set. One indirect method is to use the contrapositive and another

is to use a proof by contradiction. Consider the following example to show this aspect.

Example 2.21. Let f be a continuous function. If

(2.13) ∫₀¹ f(x) dx ̸= 0,

then there exists a point x ∈ [0, 1] such that

f(x) ̸= 0.

Proof. The contrapositive implication can be written as

(2.14) If ∀x ∈ [0, 1], f(x) = 0, then ∫₀¹ f(x) dx = 0.

This is a lot easier to prove. Instead of having to conclude the existence of an x having a particular

property, we are given that all x have a different property. The proof follows directly from the

definition of the integral, since each of the terms in any Riemann sum will be zero.

Example 2.22. Let x be a real number. If x > 0 then 1/x > 0.

Proof. Note that p ⇒ q is equivalent to (p∧ ∼ q) ⇒ contradiction. We begin by assuming x > 0

and

(2.15) 1/x ≤ 0.

Since x > 0, we can multiply both sides by x:

(2.16) (x)(1/x) ≤ (x) · 0, or 1 ≤ 0.

This is a contradiction.

Consider the proof of the following existential statement.


Claim 2.4. There exist irrational numbers a and b such that a^b is rational.

Proof. Consider the real number (√2)^√2. This number is either rational or irrational. We consider each case in turn.

(1) (√2)^√2 is rational. Let a = √2 and b = √2. Then a and b are irrational, and by assumption, a^b is rational.

(2) (√2)^√2 is irrational. Let a = (√2)^√2 and b = √2. Then a and b are irrational. Moreover, a^b = ((√2)^√2)^√2 = (√2)^(√2·√2) = (√2)² = 2 is rational.

In both cases, we proved the existence of irrational numbers a and b such that a^b is rational, and so we have the desired result.

We remark that, as it stands, this proof does not enable us to pinpoint which of the two choices of the pair (a, b) has the required property. In order to determine the correct choice of (a, b), we would need to decide whether (√2)^√2 is rational or irrational. It is not a constructive proof. The following would be a constructive proof of this claim. Let a = √2 and b = log₂ 9. Then b is an irrational number, for if it were rational, then log₂ 9 = m/n where m and n are integers with no common factor. This implies 2^m = 9^n, which is a contradiction as 2^m is an even number and 9^n is an odd number. This gives a^b = (√2)^(log₂ 9) = 2^((1/2)·log₂ 9) = 2^(log₂ 3) = 3, which is rational.1
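As a numerical sanity check of the constructive choice above (an illustration in Python, not part of the proof), we can confirm that a = √2 and b = log₂ 9 give a^b = 3 up to floating-point error:

```python
import math

a = math.sqrt(2)    # irrational
b = math.log2(9)    # irrational, as argued above
# a**b = (2**(1/2))**(log2 9) = 2**(log2 3) = 3, up to floating-point error
assert abs(a ** b - 3) < 1e-9
```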

2.9. Decomposition or proof by cases

Let P(x) be a statement. If x must possess one of several properties, and if we can verify that P(x) is true regardless of which of these properties x has, then P(x) is true. Such a proof is called a proof by cases.

Some proofs naturally divide themselves into consideration of two or more cases. For example, positive integers are either even or odd, and real numbers are positive, negative, or zero. Different arguments may be required for each case.

More rigorously, suppose we want to prove that p ⇒ q, and that p can be decomposed into two

disjoint propositions p1 , p2 such that p1 ∧ p2 is a contradiction. Then p ≡ (p1 ∨ p2 ) ∧ ¬(p1 ∧ p2 ) ≡

(p1 ∨ p2 ).

With this choice of p1 and p2 , we have,

(p ⇒ q) ⇔ (¬p ∨ q) ⇔ [¬(p1 ∨ p2 ) ∨ q]

⇔ [(¬p1 ∧ ¬p2 ) ∨ q] ⇔ [(¬p1 ∨ q) ∧ (¬p2 ∨ q)]

⇔ [(p1 ⇒ q) ∧ (p2 ⇒ q)].

This means that we only need to show that p1 ⇒ q and p2 ⇒ q. Note that this method also works if we decompose p into more than two propositions, as long as these propositions are

1There is an extensive literature on constructive mathematics. You may like to do a Google search for easy-to-read articles on the subject. A classic reference is ?.


mutually exclusive (i.e., every pair of them is a contradiction). The following example illustrates this technique.

Before going over some examples, we state the following theorem.

Theorem 2.1. (Quotient-Remainder Theorem) For every given integer n and positive integer d,

there exist unique integers q and r such that

n = d ·q+r and 0 ≤ r < d.

Definition 2.6. Let n be a nonnegative integer and let d be a positive integer. By the Quotient-

Remainder Theorem, there exist unique integers q and r such that n = d · q + r; where 0 ≤ r < d.

We define,

n div d = q (read as “n divided by d ”), and

n mod d = r (read as “n modulo d ”).

Thus n div d and n mod d are the integer quotient and integer remainder, respectively, obtained

when n is divided by d.
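Python's built-in `divmod` returns exactly the pair (q, r) of Definition 2.6, which makes the theorem easy to check on examples:

```python
# n = d*q + r with 0 <= r < d (Quotient-Remainder Theorem)
n, d = 47, 6
q, r = divmod(n, d)   # n div d and n mod d
assert n == d * q + r and 0 <= r < d
print(q, r)           # 7 5
```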

Observe that given a nonnegative integer n and a positive integer d, we have n mod d ∈ {0, · · · , d − 1} (since 0 ≤ r ≤ d − 1), and n mod d = 0 if and only if n is divisible by d.

Result 2.1. Every integer is either even or odd.

Proof. By the Quotient-Remainder Theorem with d = 2, there exist unique integers q and r such

that n = 2 · q + r and 0 ≤ r < 2. Hence, r = 0 or r = 1. Therefore, n = 2q or n = 2q + 1 for some

integer q depending on whether r = 0 or r = 1, respectively. In the case that n = 2q, the integer n

is even. In the other case that n = 2q + 1, the integer n is odd. Hence, n is either even or odd.

Let Z denote the set of integers.

Example 2.23. If n ∈ Z, then n² + 5n + 3 is an odd integer.

Proof. We use a proof by cases, depending on whether n is even or odd.

(1) n is even.
Then n = 2k for some integer k. Thus, n² + 5n + 3 = (2k)² + 5(2k) + 3 = 4k² + 10k + 3 = 2(2k² + 5k + 1) + 1 = 2m + 1, where m = 2k² + 5k + 1. Since k ∈ Z, we must have m ∈ Z. Hence, n² + 5n + 3 = 2m + 1 for some integer m, and so the integer n² + 5n + 3 is odd.

(2) n is odd.
Then n = 2k + 1 for some integer k. Thus, n² + 5n + 3 = (2k + 1)² + 5(2k + 1) + 3 = 4k² + 14k + 9 = 2(2k² + 7k + 4) + 1 = 2m + 1, where m = 2k² + 7k + 4. Since k ∈ Z, we must have m ∈ Z. Hence, n² + 5n + 3 = 2m + 1 for some integer m, and so the integer n² + 5n + 3 is odd.
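A finite check of Example 2.23 in Python; it does not replace the proof, but it catches algebra slips quickly:

```python
# n^2 + 5n + 3 should be odd for every integer n
for n in range(-100, 101):
    assert (n * n + 5 * n + 3) % 2 == 1
```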


Example 2.24. Let m, n ∈ Z. If m and n are of the same parity (either both even or both odd), then m + n is even.

Proof. We use a proof by cases, depending on whether m and n are both even or both odd.

(1) m and n are both even.

Then, m = 2k and n = 2l for some integers k and l. Thus, m + n = 2k + 2l = 2(k + l). Since

k + l ∈ Z, the integer m + n is even.

(2) m and n are both odd.

Then, m = 2k + 1 and n = 2l + 1 for some integers k and l. Thus, m + n = (2k + 1) + (2l +

1) = 2(k + l + 1). Since k + l + 1 ∈ Z, the integer m + n is even.

Example 2.25. Let n ∈ Z. If n² is a multiple of 3, then n is a multiple of 3.

Proof. We shall combine two proof techniques and use both a proof by contrapositive and a proof by cases. Suppose that n is not a multiple of 3. We wish to show then that n² is not a multiple of 3. By the Quotient-Remainder Theorem with d = 3, there exist unique integers q and r such that n = 3·q + r and 0 ≤ r < 3. Hence, r ∈ {0, 1, 2}. Therefore, n = 3q or n = 3q + 1 or n = 3q + 2 for some integer q, depending on whether r = 0, 1, or 2, respectively. Since n is not a multiple of 3, either n = 3q + 1 or n = 3q + 2 for some integer q. We consider each case in turn.

(1) n = 3q + 1 for some integer q.
Then n² = (3q + 1)² = 9q² + 6q + 1 = 3(3q² + 2q) + 1, and so n² is not a multiple of 3.

(2) n = 3q + 2 for some integer q.
Then n² = (3q + 2)² = 9q² + 12q + 4 = 3(3q² + 4q + 1) + 1, and so n² is not a multiple of 3.

Example 2.26. Let n ∈ Z. If n is an odd integer, then n² = 8m + 1 for some integer m.

Proof. We shall use both a direct proof and a proof by cases. Assume that n is an odd integer. By the Quotient-Remainder Theorem with d = 4, there exist unique integers q and r such that n = 4·q + r and 0 ≤ r < 4. Hence, r ∈ {0, 1, 2, 3}. Therefore, n = 4q or n = 4q + 1 or n = 4q + 2 or n = 4q + 3 for some integer q, depending on whether r = 0, 1, 2, or 3, respectively. Since n is odd, and since 4q and 4q + 2 are both even, either n = 4q + 1 or n = 4q + 3 for some integer q. We consider each case in turn.

(1) n = 4q + 1 for some integer q.
Then n² = (4q + 1)² = 16q² + 8q + 1 = 8(2q² + q) + 1 = 8m + 1, where m = 2q² + q. Since q ∈ Z, we must have m ∈ Z. Hence, n² = 8m + 1 for some integer m.

(2) n = 4q + 3 for some integer q.
Then n² = (4q + 3)² = 16q² + 24q + 9 = (16q² + 24q + 8) + 1 = 8(2q² + 3q + 1) + 1 = 8m + 1, where m = 2q² + 3q + 1. Since q ∈ Z, we must have m ∈ Z. Hence, n² = 8m + 1 for some integer m.
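Example 2.26 can likewise be checked over a range of odd integers in Python (an illustration, not a proof):

```python
# For odd n, n^2 mod 8 should equal 1
for n in range(-99, 100, 2):   # the odd integers in [-99, 99]
    assert n * n % 8 == 1
```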

We remark that the last conclusion can be restated as follows: for every odd integer n, we have n² mod 8 = 1. Here are some additional illustrative examples.

Example 2.27. If x is a real number, then x ≤ |x|.

Recall the definition of absolute value:

(2.17) |x| = x if x ≥ 0, and |x| = −x if x < 0.

Since this definition is divided into two parts, it makes sense to divide the proof into two parts as well.

Proof. Let x be an arbitrary real number. Then either x ≥ 0 or x < 0. If x ≥ 0, then by definition |x| = x. If x < 0, then −x > 0, so that

x < 0 < −x = |x|.

In either case, x ≤ |x|.

Chapter 3

Problem Set 1

(1) Prove or give a counterexample for the following claims. Capital letters refer to propositions or sets, depending on the context.

(a)

∼ (A ∧ B) ⇔ ∼ A ∨ ∼ B

(b)

∼ (A ∨ B) ⇔∼ A ∧ ∼ B.

(c)

∼ (A ⇒ B) ⇔ A ∧ ∼ B.

(d)

((A ∨ B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C)).

(e) If n and n + 1 are consecutive integers, then both cannot be even.

(f) Give a counterexample to the proposed statement: If n ∈ N then n² > n.

(g) If x is odd then x² is odd.

(2) Write the negation of the following statements

(a) If S is closed and bounded, then S is compact.

(b) If S is compact, then S is closed and bounded.

(c) If a function is continuous then it is differentiable.

(3) Find the contrapositive of

(a) If x² ≠ 3 ∧ y² > 5 then xy is a rational number.

(b) If x ≠ 0 then ∃y such that xy = 1.

(4) Find the mistake in the “proof” of the following results, and provide correct proofs.

(a) If m is an even integer and n is an odd integer, then 2m + 3n is an odd integer.

Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2k + 1 for some

integer k. Therefore, 2m + 3n = 2(2k) + 3(2k + 1) = 10k + 3 = 2(5k + 1) + 1 = 2l + 1;



where l = 5k + 1. Since k ∈ Z, l ∈ Z. Hence, 2m + 3n = 2l + 1 for some integer l, whence 2m + 3n is an odd integer.

(b) For all integers n ≥ 1, n² + 2n + 1 is composite.

Proof. Let n = 4. Then, n² + 2n + 1 = 4² + 2(4) + 1 = 25 and 25 is composite.

(5) Prove the following claims:

(a) An integer that is not divisible by 2, cannot be divisible by 4. (Try proving this twice, once

with contraposition and once with contradiction).

(b) There is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

(6) Prove that for n ∈ N,

(a)

1 + 3 + 5 + · · · + (2n − 1) = n².

(b)

1 + 2 + · · · + n = n(n + 1)/2.

(c)

1³ + 2³ + · · · + n³ = [n(n + 1)/2]².

(d) For q ≠ 1 and n > 1,

∑_{k=0}^{n−1} (a + kr)q^k = (a − [a + (n − 1)r]q^n)/(1 − q) + rq(1 − q^{n−1})/(1 − q)².

(7) (Sum of a Geometric Sequence): For all integers n ≥ 0 and all real numbers r with r ≠ 1,

∑_{i=0}^{n} r^i = (r^{n+1} − 1)/(r − 1).

What can we say when n → ∞ for arbitrary values of r? For what values of r is the sum well defined? What is the sum for such values of r?
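The closed form in (7) is easy to verify numerically; the sketch below compares it with the direct sum for sample values (the function name `geometric_sum` is ours, chosen for illustration):

```python
def geometric_sum(r, n):
    """Closed form (r**(n+1) - 1)/(r - 1) of sum_{i=0}^{n} r^i, valid for r != 1."""
    return (r ** (n + 1) - 1) / (r - 1)

for r in (0.5, 2.0, -3.0):
    for n in (0, 1, 5, 10):
        direct = sum(r ** i for i in range(n + 1))
        assert abs(direct - geometric_sum(r, n)) < 1e-9

# For |r| < 1 the partial sums converge to 1/(1 - r) as n grows
assert abs(geometric_sum(0.5, 200) - 1 / (1 - 0.5)) < 1e-9
```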

(8) (a) For all integers n ≥ 2, n³ − n is divisible by 6.

(b) For all integers n ≥ 3, 2^n > 2n + 1.

(9) All prime numbers greater than 6 are either of the form 6n + 1 or 6n + 5, where n is some

natural number.

(10) If |9 − 5x| ≤ 11, then show that x ≥ −2/5 and x ≤ 4.

Chapter 4

Set Theory, Sequence

4.1. Set Theory

4.1.1. Basic Definitions.

Definition 4.1. A set is a well-specified collection of elements.

We define a set as a “well-specified collection” in order to emphasize that there must be a clear rule or group of rules that determines membership in the set. Essentially all mathematical objects can be gathered into sets: numbers, variables, functions, other sets, etc. Examples of sets can be found everywhere around us. For example, we can speak of the set of all living human beings, the set of all cities in Europe, the set of all propositions, the set of all prime numbers, and so on. Each living human being is an element of the set of all living human beings. Similarly, each prime number is an element of the set of all prime numbers. If A is a set and a is an element of A, then we write a ∈ A. If it so happens that a is not an element of A, then we write a ∉ A. If S is the set whose elements are s, t, and u, then we write S = {s, t, u}. The left brace and right brace visually indicate the “bounds” of the set, while what is written within the bounds indicates the elements of the set. For example, if S = {1, 2, 3, 5}, then 2 ∈ S, but 4 ∉ S. Sets are determined by their elements. The order in which the elements of a given set are listed does not matter. For example, {1, 2, 3} and {3, 1, 2} are the same set. It also does not matter whether some elements of a given set are listed more than once. For instance, {1, 2, 2, 2, 3, 3} is still the set {1, 2, 3}. Many sets are given a shorthand notation in mathematics as they are used so frequently. A set may be defined by a property. For instance, the set of all true propositions, the set of all even integers, the set of all odd integers, and so on. Formally, if P(x) is a property, we write A = {x ∈ S : P(x)} to indicate that the set A consists of all elements x of S having the property P(x). The colon : is commonly read as “such that” and is also written as “|”. So {x ∈ S | P(x)} is an alternative notation for {x ∈ S : P(x)}. For a concrete example, consider A = {x ∈ R : x² = 2}. Here the property P(x) is x² = 2. Thus, A is the set of all real numbers whose square is two.



Figure 4.1. Set B is a strict subset of set A: B ⊂ A.

Definition 4.2. If A is a set, then B is a subset of A if every element of B is also an element of A. We write B ⊆ A or A ⊇ B.

Definition 4.3. If A is a set, then B is a strict subset of A if every element of B is also an element of A, and there exists at least one element of A which is not an element of B. We write B ⊂ A or A ⊃ B.

In shorthand we could write these as: B is a subset of A if

b ∈ B ⇒ b ∈ A,

and B is a strict subset of A if

b ∈ B ⇒ b ∈ A, ∧ ∃a ∈ A s.t. a ∉ B.

Technically we should differentiate between subsets and strict subsets, but economists are usually

sloppy about this. In most courses you will see the operator ⊂ used for both, and you will not be

required to differentiate between the two concepts. Now let X be a universal set, such that we are

interested in subsets of this set.

Definition 4.4. The complement of the set A is the set A^c containing all elements not in A. We write A^c = {x : x ∉ A}.

For the complement of a set to be clearly understood, we need to know what the relevant

universe is. For example, we can define the set J as all real numbers between 2 and 4, inclusive:

J = {x ∈ R | 2 ≤ x ≤ 4}.1

In this context, the set J^c is the set of all real numbers strictly less than 2 or strictly greater than 4:

J^c = {x ∈ R | x < 2 ∨ x > 4}.

The “universe” in this case is the set of real numbers. The complement of J doesn’t include all

mathematical objects not in J, nor does it include all numbers not in J (because complex numbers

are excluded). In most cases the universe is clear from the context.

1This can also be written as J = [2, 4], where the square brackets indicate the closed interval between the first entry and the second.


Figure 4.2. Complement of Set A.

Example 4.1. Some examples of sets are:

D = {2, 4, 10},
B = {x ∈ R s.t. x ≥ 10},
S = the set of all real-valued functions on R.

4.1.2. A Few Common Sets.

R : The real numbers
R+ : Real numbers ≥ 0
R++ : Real numbers > 0
Z : The set of integers (−10, 0, 2, 451, etc.)
Z+ : The set of integers ≥ 0 (also called N)
Z++ : The set of integers > 0 (sometimes also called N)
Q : The rational numbers (numbers that can be expressed as fractions)
C : The complex numbers
∅ : Empty set or null set
Ω : The universal set
R² : The set of pairs of real numbers

The last set R² is shorthand notation for the Cartesian product R × R. This notation is acceptable for any n ∈ Z++ number of sets. You will often encounter proofs and theorems defined on the set Rⁿ, which is the general way of describing the space of n-vectors, each element of which is a real number (this is taking us ahead to linear algebra).

4.1.3. Set Operations.


Definition 4.5. Union: The union of n sets is the set containing all elements from all n sets. We write

A ∪ B = {x : x ∈ A ∨ x ∈ B},
∪_{i=1}^{n} A_i = A₁ ∪ A₂ ∪ · · · ∪ Aₙ = {x : for some i = 1, · · · , n, x ∈ A_i}.

Union of two sets: A ∪ B

Definition 4.6. Intersection: The intersection of n sets is the set containing the elements common to all n sets. We write

A ∩ B = {x : x ∈ A ∧ x ∈ B},
∩_{i=1}^{n} A_i = A₁ ∩ A₂ ∩ · · · ∩ Aₙ = {x : for all i = 1, · · · , n, x ∈ A_i}.

Exercise 4.1. Let A₁, · · · , Aₙ be subsets of X. Then,

(∪_{j=1}^{n} A_j)^C = ∩_{j=1}^{n} A_j^C ;    (∩_{j=1}^{n} A_j)^C = ∪_{j=1}^{n} A_j^C.

Intersection of two sets: A ∩ B

Definition 4.7. Exclusion: The exclusion of the set B from the set A is the set of all elements in A that are, in addition, not elements of B. We write

A \ B = {x ∈ A | x ∉ B}.


Figure 4.3. Proposition 1: Sets A \ B and B \ A have empty intersection.

Proposition 1. (A \ B) ∩ (B \ A) = ∅

Proof.

A \ B = A ∩ B^C ⊆ B^C,
B \ A = B ∩ A^C ⊆ B,

and hence (A \ B) ∩ (B \ A) ⊆ B^C ∩ B = ∅.

Here is a pictorial representation (Figure 4.3) of this proof.
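Python's built-in set type implements these operations directly, so Proposition 1 and the set-difference law can be illustrated on sample sets:

```python
U = set(range(10))     # a small universe
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
Bc = U - B             # complement of B within U
assert A - B == A & Bc                 # Set Difference Law: A \ B = A ∩ B^c
assert (A - B) & (B - A) == set()      # Proposition 1: (A\B) ∩ (B\A) = ∅
```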


Exercise 4.2. Let B, and A₁, · · · , Aₙ be subsets of X. Then,

B − ∪_{j=1}^{n} A_j = ∩_{j=1}^{n} (B − A_j) ;    B − ∩_{j=1}^{n} A_j = ∪_{j=1}^{n} (B − A_j).

Next we consider sets whose elements are themselves sets. For example, let A, B, and C be subsets of X; then the collection A = {A, B, C} is a set whose elements are A, B and C. We call a set whose elements are subsets of X a family of subsets of X, or a collection of subsets of X. Our notational convention is that lower-case letters refer to elements of X, upper-case letters refer to subsets of X, and script letters refer to families of subsets of X.

Any subset of the empty set is empty. Observe that the empty set ∅ is a subset of X. It is possible to form a non-empty set whose only element is the empty set, i.e., {∅}. In this case {∅} is a singleton. Also ∅ ⊂ {∅} and ∅ ∈ {∅}.

There is a special family of subsets of X with a special name.

Definition 4.8. Let A be any subset of X. The power class of A, or the power set of A, is the family of all subsets of A. We denote the power set of A by P(A). Specifically,

P(A) = {B : B ⊆ A}.

The power set of the empty set is P(∅) = {∅}, i.e., the singleton of ∅. The power set of a singleton is P({a}) = {∅, {a}}. Note that the power set of A always contains A and ∅. In general, if A is a finite set with n elements, then P(A) contains 2ⁿ elements.

Exercise 4.3. Prove that if A is a finite set with n elements, then P (A) contains 2n elements.
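One way to see Exercise 4.3 concretely is to enumerate the power set with itertools (a sketch; `power_set` is our own helper name):

```python
from itertools import combinations

def power_set(A):
    """Return all subsets of A as a list of frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

A = {1, 2, 3}
P = power_set(A)
assert len(P) == 2 ** len(A)                     # 2^n subsets
assert frozenset() in P and frozenset(A) in P    # P(A) always contains ∅ and A
```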

4.2. Set Identities

There are a number of set identities that the set operations of union, intersection, and set difference

satisfy. They are very useful in calculations with sets. Below we give a table of such set identities,

where U is a universal set and A, B, and C are subsets of U.

• Commutative Laws: A ∪ B = B ∪ A ; A ∩ B = B ∩ A


• Associative Laws: (A ∪ B) ∪C = A ∪ (B ∪C) ; (A ∩ B) ∩C = A ∩ (B ∩C)

• Distributive Laws: A ∩ (B ∪C) = (A ∩ B) ∪ (A ∩C) ; A ∪ (B ∩C) = (A ∪ B) ∩ (A ∪C)

• Idempotent Laws: A ∪ A = A ; A ∩ A = A

• Absorption Laws: A ∩ (A ∪ B) = A ; A ∪ (A ∩ B) = A

• Identity Laws: A ∪ ∅ = A ; A ∩ U = A

• Universal Bound Laws: A ∪ U = U ; A ∩ ∅ = ∅

• De Morgan’s Laws: (A ∪ B)^c = A^c ∩ B^c ; (A ∩ B)^c = A^c ∪ B^c

• Complement Laws: A ∪ A^c = U ; A ∩ A^c = ∅

• Complements of U and ∅: U^c = ∅ ; ∅^c = U

• Double Complement Law: (A^c)^c = A

• Set Difference Law: A \ B = A ∩ B^c

De Morgan’s Laws: (A ∩ B)^c = A^c ∪ B^c

Exercise 4.4. Prove the following using only set identities:

(a) (A ∪ B) \ C = (A \ C) ∪ (B \ C).

(b) (A ∪ B) \ (C \ A) = A ∪ (B \ C).

(c) A ∩ (((B ∪ C^c) ∪ (D ∩ E^c)) ∩ ((B ∪ B^c) ∩ A^c)) = ∅.


We will discuss additional concepts in set theory after we have gone over some elementary exposition of functions and sequences.

4.3. Functions

First we define a correspondence.

Definition 4.9. A correspondence consists of:

(a) A set D called the domain;

(b) A set R called the range; and

(c) A mapping f(x) which assigns at least one element from R to each element x ∈ D.

Definition 4.10. A function consists of:

(a) A set D called the domain;

(b) A set R called the range; and

(c) A mapping f (x) which assigns exactly one element from R to each element x ∈ D.

Here are some examples of functions:

f(x) = x³, D = R, R = R;
f(x) = 0, D = R, R = R.

The range need not be exhausted, but the domain must be.

The set of all functions is a strict subset of the set of all correspondences. This is the same as saying that all functions are correspondences, but not the other way around. From here onwards it is critical that you specify the domain and the range when defining or using a function. For example, these two functions:

f : R → R such that f(x) = x²
g : R → R+ such that g(x) = x²


are not the same function, even though in practice they produce identical results.2

Definition 4.11. The argument of a function is the element from the domain that is mapped into

the range and the value of a function is the element from the range that is the destination of the

mapping.

Definition 4.12. A real-valued function is a function whose range is the set R or any subset of R.

From the above Definition 4.12, the definitions of integer-valued functions, complex-valued functions, etc., should be clear.

Definition 4.13. Let f : D → R and let A ⊆ D. We let f(A) represent the subset {f(x) : x ∈ A} of R. The set f(A) is called the image of A in R. If B ⊆ R, we let f⁻¹(B) represent the subset {x ∈ D : f(x) ∈ B} of D. The set f⁻¹(B) is called the pre-image of B in D.

Note that the image of a function may be equivalent to the range, or it may be a strict subset of the range. In the above example, the image of the function f is a strict subset of its range, but the image of g is equal to its range.
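For a finite domain, the image and pre-image of Definition 4.13 can be computed directly (`image` and `preimage` are our own helper names):

```python
def image(f, A):
    """f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def preimage(f, D, B):
    """f^{-1}(B) = {x in D : f(x) in B}."""
    return {x for x in D if f(x) in B}

D = {-2, -1, 0, 1, 2}
f = lambda x: x ** 2
assert image(f, D) == {0, 1, 4}
assert preimage(f, D, {4}) == {-2, 2}   # a pre-image may contain several points
```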

4.4. Vector Space

The vector space is defined over a field, which is a set on which two operations + and · (called addition and multiplication, respectively) are defined. The formal definition of a field is as follows:

Definition 4.14. A field F is a set on which two operations, called addition (+) and multiplication (·), are defined so that for each pair of elements x, y in F there are unique elements x + y and x · y in F, such that the following conditions hold for all a, b, c in F.

(i) Commutativity of addition and multiplication:

a + b = b + a, and a · b = b · a.

(ii) Associativity of addition and multiplication:

(a + b) + c = a + (b + c), and (a · b) · c = a · (b · c)

(iii) Existence of identity elements for addition and multiplication: There exist elements 0 and 1 in F such that

0 + a = a, and 1 · a = a

2The only difference between the two is that the range of f is all real numbers, and the range of g is the set of non-negative

real numbers. This is inconsequential, since the mapping in both cases takes all elements from the domain and assigns them to a

non-negative real number. But the two functions are still not the same.


(iv) Existence of inverses for addition and multiplication: For each element a in F and for each non-zero element b in F, there exist elements c and d in F such that

a + c = 0, and b · d = 1

(v) Distributivity of multiplication over addition:

a · (b + c) = a · b + a · c.

Examples of fields include the set of real numbers R with the usual definitions of addition and multiplication, and the set of rational numbers Q with the usual definitions of addition and multiplication.

Definition 4.15. A vector space V over a field F consists of a set on which two operations, called addition (+) and scalar multiplication (·), are defined so that for each pair of elements x, y in V there is a unique element x + y in V, and for each element a in the field F and for each element x in V there is a unique element ax in V, such that the following conditions hold.

(i) Commutativity of addition:

∀x, y ∈ V, x + y = y + x

(ii) Associativity of addition:

∀x, y, z ∈ V, (x + y) + z = x + (y + z)

(iii) Existence of additive identity:

∃ an element O ∈ V such that x + O = x ∀x ∈ V

(iv) Existence of additive inverse:

∀x ∈ V ∃ some element y ∈ V such that x + y = O

(v) Distributivity of scalar multiplication:

∀ α ∈ F, ∀ x, y ∈ V, α · (x + y) = (α · x) + (α · y)

(vi) Scalar distribution:

∀ α, β ∈ F, ∀ x ∈ V, (α + β) · x = α · x + β · x

(vii) Scalar association:

∀ α, β ∈ F, ∀ x ∈ V, (αβ) · x = α · (β · x)


(viii) Identity element for scalar multiplication:

1 · x = x ∀ x ∈ V.

In order to show that any space is a vector space, we simply need to show that the properties in the above definition are satisfied.

Definition 4.16. The Cartesian Product of sets A and B is the set of pairs (a, b) satisfying a ∈

A ∧ b ∈ B. We write

A × B = {(a, b) | a ∈ A ∧ b ∈ B}.

The Cartesian product is the two-set case of the general “cross product” of sets, which is the same concept defined for any number of sets. For example, using sets A, B, C and D we could define

E = A × B × C × D, and a typical element of E would be (a, b, c, d) for some a ∈ A, b ∈ B, c ∈ C

and d ∈ D.

Example 4.2.

R³ = R × R × R = {(x, y, z) | x ∈ R ∧ y ∈ R ∧ z ∈ R},
R²+ = R+ × R+ ;  R²++ = R++ × R++.

The order of the sets in the cross-product does matter, as the following example shows.

Example 4.3. Let A = {1, 2, 3}, B = {2, 4}. Then

A × B = {(1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4)},
B × A = {(2, 1), (2, 2), (2, 3), (4, 1), (4, 2), (4, 3)}.
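`itertools.product` computes Cartesian products, and comparing A × B with B × A confirms that the order of the factors matters:

```python
from itertools import product

A = {1, 2, 3}
B = {2, 4}
AxB = set(product(A, B))
BxA = set(product(B, A))
assert len(AxB) == len(A) * len(B) == 6
assert AxB != BxA          # e.g. (1, 2) is in A x B but not in B x A
```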

(a) The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av.

(b) The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if u · v = 0.

(c) The angle between vectors u and v is arccos(u · v / (∥u∥ · ∥v∥)).

4.4.1. Metric.

Definition 4.17. A distance function is a real-valued function d : V × V → R which satisfies

(i) Non-negativity:

∀ x, y ∈ V, d(x, y) ≥ 0, with equality if and only if x = y


(ii) Symmetry

∀x, y ∈ V, d(x, y) = d(y, x)

(iii) Triangle Inequality:

∀x, y, z ∈ V, d(x, z) ≤ d(x, y) + d(y, z).

Any function satisfying these three properties is a distance function. A distance function is also called a metric. The space V, whose elements x, y are called points, is a metric space if we can associate a distance function to it.

Example 4.4.

(a) Euclidean distance:

d(x, y) = √((x₁ − y₁)² + · · · + (xₙ − yₙ)²), where V = Rⁿ.

(b) Discrete metric:

d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y, where V is any vector space.

(c) In V = R²:

d(x, y) = max{|x₁ − y₁|, |x₂ − y₂|}.

(d) In a space V, if d(·, ·) is a metric, then

d₁(x, y) = d(x, y)/(1 + d(x, y))

is also a metric. This allows us to construct any number of metrics from any given metric.
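The claim in (d) can be spot-checked numerically: below, d₁ = d/(1 + d) is built from the Euclidean metric on R², and the triangle inequality is tested on random points (an illustration, not a proof):

```python
import itertools
import math
import random

def d(x, y):
    """Euclidean metric on R^2."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d1(x, y):
    """Bounded metric d/(1 + d) built from d."""
    return d(x, y) / (1 + d(x, y))

random.seed(0)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(15)]
for x, y, z in itertools.product(pts, repeat=3):
    assert d1(x, z) <= d1(x, y) + d1(y, z) + 1e-12   # triangle inequality
```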

4.4.2. Norm.

Definition 4.18. A norm is a real-valued function ∥·∥ : V → R, defined on a vector space V, which satisfies

(i) Non-negativity:

∀x ∈ V, ∥x∥ ≥ 0, with equality if and only if x = 0,


(ii) Homogeneity:

∀x ∈ V, α ∈ R, ∥α · x∥ = |α| · ∥x∥,

(iii) Triangle Inequality:

∀x, y ∈ V, ∥x + y∥ ≤ ∥x∥ + ∥y∥.

Example 4.5.

(a) Euclidean norm: ∀x ∈ Rⁿ, ∥x∥ = √(x₁² + · · · + xₙ²).

(b) Taxicab norm: ∀x ∈ Rⁿ, ∥x∥ = ∑_{i=1}^{n} |x_i|.

4.4.3. Inner Product.

Definition 4.19. An inner product is a real-valued function ⟨·, ·⟩ : V × V → R, defined on a vector space V, which satisfies

(i) Symmetry:

∀x, y ∈ V, ⟨x, y⟩ = ⟨y, x⟩,

(ii) Positive definiteness:

∀x ∈ V, ⟨x, x⟩ ≥ 0, with equality if and only if x = 0,

(iii) Bilinearity:

∀x, y, z ∈ V, ∀α, β ∈ R, ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩.

Example 4.6. V = Rn . Dot Product

∀x, y ∈ V, x · y = x1 y1 + · · · + xn yn .

Definition 4.20. A metric space (V, d) is a space V and a distance function d.

A normed vector space (V, ∥·∥) is a vector space V together with a norm ∥·∥. An inner product space (V, ⟨·, ·⟩) is a vector space V together with an inner product ⟨·, ·⟩.


4.4.4. Cauchy-Schwarz Inequality. The Cauchy-Schwarz inequality states that for all vectors x and y of an inner product space,

|⟨x, y⟩|² ≤ ⟨x, x⟩ · ⟨y, y⟩,

where ⟨·, ·⟩ is the inner product. Equivalently, by taking the square root of both sides and referring to the norms of the vectors, the inequality is written as

|⟨x, y⟩| ≤ ∥x∥ · ∥y∥.

Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical sense, they are parallel or one of the vectors is equal to zero).

If x₁, · · · , xₙ ∈ R and y₁, · · · , yₙ ∈ R are any real numbers, the inequality may be restated in a more explicit way as follows:

|x₁y₁ + · · · + xₙyₙ|² ≤ (x₁² + · · · + xₙ²) · (y₁² + · · · + yₙ²).
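The coordinate form of the Cauchy-Schwarz inequality is easy to test on random vectors (a numerical illustration, not a proof):

```python
import random

def dot(x, y):
    """Dot product of two real vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

random.seed(1)
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(5)]
    y = [random.gauss(0, 1) for _ in range(5)]
    # |<x, y>|^2 <= <x, x> * <y, y>
    assert dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-9
```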

4.5. Sequences

Definition 4.21. A sequence is a function {xn} : N → Rᵐ that gives us an ordered infinite list of points in Rᵐ.

Another notation for a sequence is ⟨xn⟩, where ⟨xn⟩ ≡ (x₁, x₂, · · · ). As we saw above, sets are unordered collections of elements. Even if there is an intuitive ordering to the elements of a set, with respect to the definition of the set itself there is no “first element” or “last element”. Sequences, however, are collections whose elements are assigned a particular order.

Example 4.7.

S1 = {1/n, n ∈ N} is a sequence in R;
S2 = {(n, 1/n), n ∈ N} is a sequence in R².

The interpretation of S1 is that the nth element of the sequence is given by 1/n. So we could also have written S1 = {1, 1/2, 1/3, 1/4, · · · }. Similarly, S2 = {(1, 1), (2, 1/2), · · · }. Note the implication of this definition is that the elements of the sequence are numbered from 1 onwards, not from 0.


It is usually assumed in first-year courses that the first element of a sequence is numbered “1”, not “0”, but this need not always be the case. Note that the order of appearance of elements matters,

{1, 2, 3, 4, · · · } ≠ {2, 1, 3, 4, · · · },

and elements can be repeated:

S = {1, 1, 1, · · · } is a sequence.

4.5.1. Convergence and Limits.

Definition 4.22. We say that x is a limit point of {xn}, n ∈ N, if ∀ε > 0 there exist infinitely many terms xn with d(x, xn) < ε.

Example 4.8. (a) Let xn = (−1)ⁿ. This sequence has two limit points: a = −1 and a = 1.

(b) Let xn = sin(π·n/2). This sequence has three limit points: a = −1, 0, 1.

(c) The sequence {1, −1, 1/2, −1, 1/3, −1, · · · } has two limit points: 0 and −1.

(d) Let xn = n^((−1)ⁿ). This sequence has a limit point a = 0.

(e) Let xn be a convergent sequence: xn → x as n → ∞. Then xn has a limit point x.

Definition 4.23. The sequence {xn} converges to x (has limit x) if

∀ ε > 0, ∃ N ∈ N such that d(xn, x) < ε ∀ n > N.

In this case we write

x = lim_{n→∞} xn.

Definition 4.23 is a source of a lot of difficulty. However, it is one of the most important definitions in macroeconomic theory and in parts of micro, and it is worth forcing yourself to fully absorb it before the end of the Review. The intuition behind limits is not nearly as difficult as the formal definition. A sequence converges to x if, after choosing any very, very tiny number (ε), you can identify a point in the sequence (N) after which all of the remaining members of the sequence are no farther than ε from some particular value x. This concept is only well-defined for infinite sequences. In most economic theory, the elements of a convergent sequence never actually reach their limiting value. They simply get closer and closer to it as the sequence progresses.

Example 4.9. The sequence xn = 1/n is a convergent sequence. (Use the claim 1/n → 0.)


Proof. Let ε > 0 be given. We have to find N such that ∀n > N, d(xn, 0) = |xn| < ε. Now

|xn| < ε ⇔ 1/n < ε ⇔ n > 1/ε.

So by choosing N to be any natural number greater than 1/ε, we have

∀n > N, d(xn, 0) = |xn| = 1/n < 1/N < ε.
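The proof's recipe for N is constructive, so we can turn it into code: pick any natural number greater than 1/ε and check that every later term is within ε of 0 (`find_N` is our own name for this step):

```python
import math

def find_N(eps):
    """A valid N from the proof: any natural number greater than 1/eps."""
    return math.floor(1 / eps) + 1

for eps in (0.1, 0.01, 0.001):
    N = find_N(eps)
    # every term x_n = 1/n with n > N lies within eps of the limit 0
    assert all(abs(1 / n) < eps for n in range(N + 1, N + 1000))
```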

Definition 4.24. A sequence {xn} is bounded if

∃ B ∈ R such that d(xn, 0) ≤ B, ∀ n ∈ N.

Definition 4.25. A sequence {xn} is unbounded if

∀ B ∈ R ∃ n ∈ N such that d(xn, 0) > B.

Example 4.10. The sequence {1, 0, 1, 0, · · · } is bounded. The sequence {xn } , xn = n, n ∈ N is

unbounded.

Definition 4.26. The tail of a sequence {xn } is the continuation of {xn } after some m ∈ N, that is

{xm+1 , xm+2 , · · · }.

Theorem 4.1. A sequence {xn } is bounded if and only if the tail of {xn } is bounded.

Proof. {xn} is bounded ⇒ the tail of {xn} is bounded: trivial.

Now suppose the tail of {xn} is bounded; we show {xn} is bounded.

Fix some m for which the tail {xm+1, xm+2, · · · } is bounded, i.e.,

∃B such that |xn| < B, ∀n > m.

Let

B′ = max {|x1|, |x2|, · · · , |xm|, B}.

Then B′ is a bound for {xn}:

∀n ∈ N, |xn| ≤ B′.

Definition 4.27. If {xn}∞_{n=1} is a sequence, a subsequence {xn(k)}∞_{k=1} is obtained from {xn} by crossing out some (possibly infinitely many) elements, while preserving the order.


Example 4.11. Sequence: {xn} = {1, −1, 1/2, −1, 1/3, −1, · · · }.

Subsequence: {xn(k)} = {−1, −1, −1, · · · } or {1, 1/2, 1/3, · · · }.

Definition 4.28. A sequence is monotone increasing if

∀n ∈ N, xn+1 ≥ xn

and is monotone decreasing if

∀n ∈ N, xn+1 ≤ xn.

The following claim characterizes the convergence of monotone sequences.

Claim 4.1. Let {xn } be monotonic. Then it is convergent if and only if it is bounded.

Theorem 4.2 (Bolzano–Weierstrass Theorem). Every bounded sequence {xn} has a convergent subsequence.

The following proposition is useful in proving the Bolzano–Weierstrass Theorem.

Proposition 2 (Nested Interval Property). Suppose that I1 = [a1, b1], I2 = [a2, b2], · · · , where I1 ⊇ I2 ⊇ · · · , and lim_{n→∞}(bn − an) = 0. Then there exists exactly one real number common to all intervals In.

Proof. Note that we have a1 ≤ a2 ≤ a3 ≤ · · · ≤ an ≤ · · · ≤ bn ≤ · · · ≤ b2 ≤ b1. Then each bi is an upper bound for the set A = {a1, a2, · · · }. In other words, the sequence {an} is monotone increasing and bounded. Therefore, lim_{n→∞} an = a exists and a = sup{an} ≤ bk for each natural number k. Hence ak ≤ a ≤ bk for every k ∈ N, i.e., a is contained in each Ik. Now let b be contained in In for all n ∈ N. Then an ≤ b ≤ bn for every n ∈ N, so 0 ≤ (b − an) ≤ (bn − an) for each n. Then lim_{n→∞}(b − an) = 0. It follows that b = lim_{n→∞} an = a, and so a is the only real number common to all intervals.

Now we prove the Bolzano–Weierstrass Theorem.

Proof. Let {xn}∞_{n=1} be bounded. There is B ∈ R such that |xn| ≤ B for all n ∈ N. We prove the theorem in the following steps.

Step 1 We inductively construct a sequence of intervals I0 ⊇ I1 ⊇ I2 ⊇ · · · such that:

(i) In is a closed interval [an, bn] where bn − an = 2B/2^n; and

(ii) {i : xi ∈ In} is infinite.


We let I0 = [−B, B]. This closed interval has length 2B and xi ∈ I0 for all i ∈ N. Suppose we have In = [an, bn] satisfying (i) and (ii). Let cn be the midpoint (an + bn)/2. Each of the intervals [an, cn] and [cn, bn] is half the length of In. Thus they both have length (1/2) · (2B/2^n) = 2B/2^(n+1). If xi ∈ In, then xi ∈ [an, cn] or xi ∈ [cn, bn], possibly both. Thus at least one of the sets {i : xi ∈ [an, cn]} or {i : xi ∈ [cn, bn]} is infinite. If the first set is infinite, we let an+1 = an and bn+1 = cn. If the second is infinite, we let an+1 = cn and bn+1 = bn. Let In+1 = [an+1, bn+1]. Then (i) and (ii) are satisfied. By the Nested Interval Property, there exists a ∈ ∩∞_{n=1} In.

Step 2 We next find a subsequence converging to a. Choose i1 ∈ N such that xi1 ∈ I1. Suppose we have in. We know that {i : xi ∈ In+1} is infinite. Thus we can choose in+1 > in such that xin+1 ∈ In+1. This allows us to construct a sequence of natural numbers i1 < i2 < i3 < · · · where xin ∈ In for all n ∈ N.

Step 3 We complete the proof by showing that the subsequence (xin)∞_{n=1} converges to a. Let ε > 0. Choose N such that ε > 2B/2^N. Suppose n > N. Then xin ∈ In and a ∈ In. Thus

|xin − a| ≤ 2B/2^n ≤ 2B/2^N < ε

for all n > N.
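The bisection in Steps 1 and 2 can be sketched computationally. The following illustration (function name `bw_subsequence` is ours; it works on a long finite prefix of a sequence, so "infinitely many terms" becomes "the more numerous half") repeatedly halves [−B, B], keeps a half containing many terms, and picks one term from each interval with strictly increasing indices.

```python
# Computational sketch of the Bolzano-Weierstrass bisection: halve the
# interval, keep the half holding more of the remaining terms, and select
# one term of the sequence from each successive interval (with increasing
# indices, as in Step 2).
def bw_subsequence(x, B, steps=10):
    a, b = -B, B
    idx = [i for i in range(len(x)) if a <= x[i] <= b]
    chosen, last = [], -1
    for _ in range(steps):
        c = (a + b) / 2
        left = [i for i in idx if a <= x[i] <= c]
        right = [i for i in idx if c <= x[i] <= b]
        if len(left) >= len(right):
            idx, b = left, c
        else:
            idx, a = right, c
        nxt = next(i for i in idx if i > last)  # i_{n+1} > i_n with x_{i_{n+1}} in I_{n+1}
        chosen.append(nxt)
        last = nxt
    return [x[i] for i in chosen]

# The bounded, non-convergent sequence 1, -1, 1, -1, ... yields a constant
# (hence convergent) subsequence.
x = [(-1) ** n for n in range(100000)]
sub = bw_subsequence(x, B=1, steps=10)
print(len(set(sub)) == 1)  # True: every chosen term is the same value
```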

Remark 4.1. Every bounded sequence {xn } has at least one limit point x̄.

Definition 4.29. A sequence {xn} is a Cauchy sequence if

∀ε > 0, ∃N such that ∀n, m > N, d(xn, xm) < ε.

After N, each element is close to every other element; in other words, the elements lie within a distance of ε of each other.

Some properties of Cauchy sequences are:

(i) Every convergent sequence {xn} (with limit x, say) is a Cauchy sequence, since, given any real number ε > 0, beyond some fixed point, every term of the sequence is within distance ε/2 of x, so any two terms of the sequence are within distance ε of each other.

(ii) Every Cauchy sequence of real numbers is bounded (since for some N, all terms of the sequence from the N-th position onwards are within distance 1 of each other, and if M is the largest absolute value of the terms up to and including the N-th, then no term of the sequence has absolute value greater than M + 1).

(iii) In any metric space, a Cauchy sequence which has a convergent subsequence with limit x is itself convergent (with the same limit), since, given any real number ε > 0, beyond some fixed point in the original sequence, every term of the subsequence is within distance ε/2 of x, and any two terms of the original sequence are within distance ε/2 of each other, so every term of the original sequence is within distance ε of x.
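Definition 4.29 can be probed empirically on finite prefixes. The sketch below (the name `looks_cauchy` is ours; a finite computation can only suggest, never prove, the Cauchy property) checks whether all tail terms past index N lie within ε of each other.

```python
# Empirical Cauchy check on a finite prefix: do all terms past position N
# lie within eps of one another? For x_n = 1/n the tail clusters; for
# x_n = (-1)^n it does not.
def looks_cauchy(x, eps, N):
    tail = x[N:]
    return all(abs(s - t) < eps for s in tail for t in tail)

harmonic = [1 / n for n in range(1, 500)]
alternating = [(-1) ** n for n in range(1, 500)]

print(looks_cauchy(harmonic, eps=0.01, N=200))     # True
print(looks_cauchy(alternating, eps=0.01, N=200))  # False
```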

Theorem 4.3. Every sequence has at most one limit.

Proof. By contradiction. We use the intuition that all points cannot end up being close to both r1 and r2 at the same time. Let the sequence {xn} converge to two limits r1 ̸= r2. It is enough to exhibit one ε for which the definition fails. Choose

ε = d(r1, r2)/4 = |r1 − r2|/4, so that |r1 − r2| = 4ε.

Since r1 is a limit,

∃N1, ∀n > N1, |xn − r1| < ε

and since r2 is a limit,

∃N2, ∀n > N2, |xn − r2| < ε.

Let N = max {N1, N2}. Then

∀n > N, |xn − r1| + |xn − r2| < 2ε.

By the triangle inequality,

4ε = |r1 − r2| = |(xn − r2) − (xn − r1)| ≤ |xn − r1| + |xn − r2| < 2ε

which is a contradiction.

Remark 4.2. A sequence can have more than one limit point.

4.5.2. Some Results on Sequences.

(a) Every convergent sequence is bounded, BUT a bounded sequence may not be convergent. For example, {1, −1, 1, −1, · · · }.

(b) If xn → x and yn → y, then

xn + yn → x + y,

xn · yn → x · y,

and if yn ̸= 0 ∀n and y ̸= 0,

xn/yn → x/y.

(c) Weak inequalities are preserved in the limit:

if {xn} → x and xn ≥ b ∀n ∈ N, then x ≥ b; likewise, if xn ≤ b ∀n ∈ N, then x ≤ b.


(d) x is a limit point of {xn} if and only if ∃ a subsequence {xn(k)}∞_{k=1} of the sequence {xn} such that xn(k) → x.

(e) A sequence of vectors {x^n} = (x1^n, x2^n, · · · , xN^n) ∈ R^N converges to a limit x = (x1, x2, · · · , xN) if and only if

xi^n → xi, ∀i = 1, 2, · · · , N.

(f) Every convergent sequence is also a Cauchy sequence.

Definition 4.30. A vector space in which every Cauchy sequence has a limit is called a complete vector space.

4.6. Sets in Rn

Now we are ready for additional useful concepts in set theory. We begin with some definitions.

Definition 4.31. A set A on the real line is bounded if ∃B ∈ R such that ∀x ∈ A, ∥x∥ ≤ B.

Theorem 4.4. For every non-empty bounded set A ⊂ R, ∃ a real number sup A such that

(a) sup A is an upper bound for A:

∀x ∈ A, x ≤ sup A;

(b) if y is any upper bound for A, then

y ≥ sup A,

i.e., sup A is the least upper bound for A.

Similarly, inf A is the greatest lower bound.

Example 4.12. For the sets

A = [0, 1], B = (0, 1), C = [0, 1), D = (0, 1],

sup = 1, inf = 0.

This example shows that the sup and inf of a set need not belong to the set. If sup A belongs to the set A, it is called max {A}, and if inf {A} belongs to the set A, it is called min {A}.

Figure 4.4. Open ball Br(x) in R², centered at x with radius r

Definition 4.32. Point x is a limit point of a set A if every neighborhood of x contains a point of A different from x: x is a limit point of A if

∀ε > 0, ∃y ∈ A, y ̸= x ∧ d(x, y) < ε.

Theorem 4.5 (Bolzano–Weierstrass Theorem for sets). Every bounded infinite set has at least one limit point.

Example 4.13. For the set A = (0, 1), x = 0 is a limit point of the set A.

This shows that a limit point of a set need not belong to the set.

Theorem 4.6. Point x is a limit point of a set A ⊆ Rn if ∃ a sequence

{xn} such that ∀n ∈ N, xn ̸= x ∧ xn ∈ A ∧ xn → x.

Definition 4.33. An open ball in Rn centered at x with radius r > 0 is

Br(x) = {y ∈ Rn | d(x, y) < r}.

Note that the open ball does not include its boundary points.

Example 4.14. An open ball in R² centered at x = (0, 0) with radius 1 is

{y ∈ R² | y1² + y2² < 1}.

Definition 4.34. The set A is open if

∀x ∈ A, ∃r > 0 such that Br(x) ⊆ A.

Around any point in an open set, one can draw an open ball which is completely contained in the set.

Example 4.15. The following sets are open:

A = (0, 1) ∪ (5, 10); B = (−∞, 0); R; ∅.

Definition 4.35. The set A is closed if A contains all its limit points (it contains its boundary).

Theorem 4.7. A set A ⊆ Rn is closed if and only if Aᶜ is open.

Example 4.16. The following sets are closed:

A = [2, 5], since Aᶜ = (−∞, 2) ∪ (5, ∞) is open; R; ∅.

There are two sets which are both open and closed: the empty set and the universal set. The empty set ∅ is open since

int ∅ = ∅

and ∅ is closed since

bd ∅ = ∅ ⊆ ∅.

The universal set is the complement of the empty set and so is both open and closed. There can be sets which are neither open nor closed: A = (0, 1]. The following theorem characterizes closed sets using convergent sequences.

Theorem 4.8. A set A ⊆ Rn is closed if and only if every convergent sequence of points xn ∈ A has its limit x ∈ A.

Example 4.17. The budget set

B(p, I) = {y ∈ R^n_+ | p · y ≤ I},

where p ∈ R^n_{++} and I ∈ R_{++}, is closed.

Proof. Take any sequence {xn} with xn ∈ B(p, I) ∀n and xn → x. Since weak inequalities are preserved in the limit,

xn ≥ 0, ∀n ⇒ x ≥ 0,

p · xn ≤ I, ∀n ⇒ p · x ≤ I

⇒ x ∈ B(p, I) ⇒ B(p, I) is closed.
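A small numerical companion to Example 4.17 (the prices, income, and sequence below are our own illustrative choices): membership in B(p, I) is preserved along a convergent sequence, and the limit still satisfies both constraints.

```python
# Check that a sequence in the budget set B(p, I) has its limit in B(p, I).
p, I = (2.0, 3.0), 12.0

def in_budget(x):
    return all(xi >= 0 for xi in x) and sum(pi * xi for pi, xi in zip(p, x)) <= I

# x_n = (3 - 1/n, 2 - 1/n) -> x = (3, 2); p.x = 2*3 + 3*2 = 12 <= I
xs = [(3 - 1/n, 2 - 1/n) for n in range(1, 1000)]
limit = (3.0, 2.0)

print(all(in_budget(x) for x in xs))  # True: every term is in B(p, I)
print(in_budget(limit))               # True: the limit stays in B(p, I)
```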

Theorem 4.9.

Figure 4.5. Budget set B(p, I): Good 1 on the horizontal axis with intercept M/P1, Good 2 on the vertical axis with intercept M/P2, and |slope| = p1/p2

(a) The union of any number of open sets is open.

(b) The intersection of a finite number of open sets is open.

(c) A singleton set is a closed set.

(d) The union of a finite number of closed sets is closed.

(e) The intersection of any number of closed sets is closed.


Figure 4.6. A Non-convex Set

Remark 4.3. The finiteness of the number of sets in (b) and (d) is necessary, as the following examples show.

For (b), An = (−1/n, 1/n), n ∈ N: ∩∞_{n=1} An = {0}, which is closed.

For (d), Bn = [1/n, 2], n ∈ N: ∪∞_{n=1} Bn = (0, 2], which is not closed.

Definition 4.36. A set A ⊆ Rn is compact if and only if A is closed and bounded.

Example 4.18.

A = [1, 2] is compact.

R is closed but not bounded. NOT compact.

B = (1, 2] is bounded but not closed. NOT compact.

Definition 4.37. A set A ⊆ Rn is compact if every sequence of points {xn } ∈ A has a limit point

x ∈ A.

Definition 4.38. A set A ⊆ Rn is convex if ∀x, y ∈ A, ∀λ ∈ (0, 1),

λx + (1 − λ) y ∈ A.
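Definition 4.38 can be checked directly on sampled points. The sketch below (the sets and helper names are our own illustrations) tests convex combinations of points in the closed unit disc, which is convex, and on the unit circle, which is not.

```python
# Direct check of the convexity definition on two familiar sets.
import random

random.seed(0)

def in_disc(x):   return x[0]**2 + x[1]**2 <= 1
def on_circle(x): return abs(x[0]**2 + x[1]**2 - 1) < 1e-9

def combo(x, y, lam):
    # the convex combination lam*x + (1 - lam)*y
    return (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])

# Disc: convex combinations of two member points stay inside.
ok = all(in_disc(combo((0.6, 0), (0, -0.8), random.random())) for _ in range(1000))
print(ok)  # True

# Circle: the midpoint of (1, 0) and (-1, 0) is the origin, not on the circle.
print(on_circle(combo((1, 0), (-1, 0), 0.5)))  # False
```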

It will be useful to draw some sets to differentiate between convex and non-convex sets.


Figure 4.7. Not a convex set

Figure 4.8. A Convex Set

Chapter 5

Problem Set 2

(1) Verify that the following are distance functions (metrics).

(a) The Manhattan distance, for x, y ∈ Rn:

(5.1) d(x, y) = ∑_{i=1}^{n} |xi − yi| ∀x, y ∈ Rn

(b) For x, y ∈ R2 ,

(5.2) d(x, y) = max{| x1 − y1 |, | x2 − y2 |}

(c) Let d(·, ·) be a metric; then

(5.3) d1(x, y) = d(x, y) / (1 + d(x, y)).

(2) Determine whether

(5.4) ∪∞_{n=1} [1/n, 2/n]

is compact.

(3) Which of the following is true? Prove or give a counterexample.

(a) (A ∪ B)c ⊆ Ac ∪ Bc

(b) (A ∪ B)c ⊇ Ac ∪ Bc .

(4) Define the set C of ordered pairs of real numbers (x1, x2) as follows.

(5.5) C = {(x1, x2) ∈ R² : x1² + x2² = 1}


If (a1, a2) and (b1, b2) are elements of C and c ∈ R, define

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is C a vector space? Justify your answer.

(5) Let V denote the set of ordered pairs of real numbers. If (a1, a2) and (b1, b2) are elements of V

and c ∈ R define

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 − b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is V a vector space over R with these operations? Justify your answer.

(6) Let V denote the set of ordered pairs of real numbers. If (a1, a2) and (b1, b2) are elements of V

and c ∈ R define

(a1 , a2 ) + (b1 , b2 ) = (a1 + 2b1 , a2 + 3b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is V a vector space over R with these operations? Justify your answer.

(7) Prove

(J ∩ K)c = J c ∪ K c

(J ∪ K)c = J c ∩ K c

(8) Consider sequences {xn}∞_{n=1} and {yn}∞_{n=1} such that {xn}∞_{n=1} → x and {yn}∞_{n=1} → y. Show that {xn + yn}∞_{n=1} → x + y.

(9) Let xn = n, n ∈ N. Show that {xn}∞_{n=1} is not convergent.

(10) Prove that every Cauchy sequence {xn} is bounded.

(11) Prove that the sequence {xn} = {2 − 1/n : n ∈ N} is not convergent to 1.

(12) Prove or disprove: A monotone sequence is convergent if and only if it is bounded.

(13) Determine whether the following sets are open, closed, neither or both:

(i) S = (0, 1);

(ii) S = [0, 1];

(iii) S = R;

(iv) S = [0, 1).

Chapter 6

Linear Algebra

Linear algebra is the branch of mathematics dealing with (among many other things) matrices and

vectors. It’s intuitively easy to see why linear algebra is important for econometrics and statistics.

Economic data is arranged in matrix format (rows corresponding to observations, columns corre-

sponding to variables), so the body of theory governing matrices should help us analyze data. It

is harder to see the connection between matrix theory and the optimization that we do in micro

theory, but there are some important links. We’ll cover the basics and some of the necessary detail

here, but more detailed coverage will be offered in the core courses.

6.1. Vectors

You may be familiar with vectors from physics courses, in which a vector is a pair giving the mag-

nitude and direction of a moving body. The vectors we use in economics are more general, in that

they can have any finite number of elements (rather than just 2), and the meaning of each element

can vary with the context (rather than always signifying magnitude and direction). Formally speak-

ing a vector can be defined as a member of a vector space, but we don’t need to deal with such a

definition here. For our purposes:

Definition 6.1. A vector is an ordered array of elements with either one row or one column.

The elements are usually numbers. A vector is an n × k matrix for which either n = 1, k = 1 or

both (see the definition of a matrix below). A general vector, for which the number of elements is

not specified but left as n, will sometimes be called an “n-vector”. We also refer to these as “vectors

in Rn ”. A vector can be written in either row or column form:


Row Vector: x ∈ Rn = (x1 x2 . . . xn);  Column Vector: x ∈ Rn = (x1, x2, . . . , xn)′.

Although you will sometimes be able to switch between thinking of a vector as a row or a

column without restriction, there are certain operations that require a vector to be oriented in a

certain way, so it is good to distinguish between row and column vectors whenever possible. Most

people use x to refer to the vector in column form and x′ to refer to it in row form, but this is not

universal. Also, we usually use lowercase letters for vectors and uppercase letters for matrices.

6.1.1. Special Vectors.

Null vector: 0_{n×1} = (0, . . . , 0)′.

Sum vector: u_{n×1} = (1, . . . , 1)′.

Unit vector: In Rn there are n unit vectors. The ith unit vector, called ei, has all elements 0 except for the ith, which is equal to 1. The definition of a unit vector is specific to the vector space in which it sits. For example:

(6.1) e2 ∈ R³ = (0, 1, 0)′

and

(6.2) e2 ∈ R⁴ = (0, 1, 0, 0)′

6.1.2. Vector Relations and Operations.

Definition 6.2.


(a) Equality :

Vectors x ∈ Rn , y ∈ Rm are equal if n = m and xi = yi ∀ i.

(b) Inequalities : ∀x, y ∈ Rn :

x ≥ y if xi ≥ yi ∀ i = 1, · · · , n;

x > y if xi ≥ yi ∀ i = 1, · · · , n and xi > yi for at least one i;

x ≫ y if xi > yi ∀i = 1, · · · , n.

(c) Addition :

∀x, y ∈ Rn , x + y = z ∈ Rn where zi = xi + yi , ∀i.

(d) Scalar Multiplication: ∀x ∈ Rn and α ∈ R, we define the scalar product as

(6.3) αx = (αx1, αx2, . . . , αxn)′

(e) Vector Multiplication: This is essentially an inner product rule applied to Rn. See the rules

for matrix multiplication below, as they also apply for vectors.

6.2. Matrices

Definition 6.3. A matrix is a rectangular array of elements (usually numbers, for our purposes).

A matrix is characterized as n × k when it has n rows and k columns. To represent the n × k matrix A, we can write:

[A]_{n×k} = [aij]_{n×k} =

[ a11 a12 . . . a1k ]
[ a21 a22 . . . a2k ]
[  ..  ..  . . . .. ]
[ an1 an2 . . . ank ]

The matrix A_{n×k} is a null matrix if aij = 0 for i = 1, · · · , n, j = 1, · · · , k.

The matrix A_{n×k} is square if n = k. In this case we refer to it as an n × n matrix.

The square matrix A_{n×n} is symmetric if aij = aji ∀i, j.


The symmetric matrix A_{n×n} is diagonal if aij = 0 whenever i ̸= j.

The diagonal matrix A_{n×n} is an identity matrix if aij = 1 whenever i = j.

The square matrix A_{n×n} is lower triangular if aij = 0 ∀i < j.

The square matrix A_{n×n} is upper triangular if aij = 0 ∀i > j.

It’s worth it to check your understanding of each of the above definitions by writing out a matrix that satisfies each. Then note this next definition carefully:

The k × n matrix B is called the transpose of A_{n×k} if bij = aji ∀i, j.

We write the transpose of A as either A^T or A′. If A is symmetric then A′ = A. This is an obvious statement, but you could try proving it formally. It should only take a few lines.

6.2.1. Matrix Operations.

6.2.1.1. Addition. Matrix addition is only defined for matrices of the same size. If A is n × k and B is n × k then

(6.4) A + B = Cn×k

where

(6.5) ci j = ai j + bi j ∀ i = 1, · · · , n, j = 1, · · · , k.

We say that matrix addition occurs “element wise” because we move through each element of the

matrix A, adding the corresponding element from B.

6.2.1.2. Scalar Multiplication. Scalar multiplication is also an element-wise operation. That is, ∀λ ∈ R,

(6.6) λ · [A]_{n×k} =

[ λa11 λa12 · · · λa1k ]
[ λa21 λa22 · · · λa2k ]
[  ..    ..   · ·  ..  ]
[ λan1 λan2 · · · λank ]

6.2.1.3. Matrix Multiplication. Matrix multiplication is defined for matrices [A]_{m×j} and [B]_{n×k} if j = n (for the product AB) or m = k (for the product BA). That is, the number of columns in one of the matrices must be equal to the number of rows in the other. If matrices A and B satisfy this condition, so that A is m × j and B is j × k, their product [C]_{m×k} ≡ [A]_{m×j} · [B]_{j×k} is given by cij = Ai · Bj, where Ai is the ith row of A and Bj is the jth column of B. For example, suppose

[A]_{2×2} = [ 1 2 ]   and   [B]_{2×3} = [ 6 5 4 ]
            [ 3 4 ]                     [ 3 2 1 ]

Multiplication between A and B is only defined if A is on the left and B is on the right. It must always be the case that the number of columns in the left hand matrix is the same as the number of rows in the right hand matrix. In this case, if we say AB = C, then element

c11 = [ 1 2 ] · [ 6 ] = 1 · 6 + 2 · 3 = 12
                [ 3 ]

Likewise

c12 = 1 · 5 + 2 · 2 = 9
c13 = 1 · 4 + 2 · 1 = 6
c21 = 3 · 6 + 4 · 3 = 30
c22 = 3 · 5 + 4 · 2 = 23
c23 = 3 · 4 + 4 · 1 = 16

which gives

[A]_{2×2} · [B]_{2×3} = [C]_{2×3} = [ 12  9  6 ]
                                    [ 30 23 16 ]

Note that matrix multiplication is not a symmetric operation. In general, AB ̸= BA, and in fact it is often the case that the operation will only be defined in one direction. In our example BA is not defined because the number of columns of B (3) is not equal to the number of rows of A (2). For both AB and BA to be defined, we need

[A]_{n×k} · [B]_{k×n} = [C]_{n×n}

and

[B]_{k×n} · [A]_{n×k} = [D]_{k×k}.
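The 2 × 2 times 2 × 3 example above can be checked with NumPy's `@` operator; attempting the reversed product raises an error because the shapes are incompatible.

```python
# Verify the worked multiplication example and the shape restriction.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[6, 5, 4],
              [3, 2, 1]])

C = A @ B
print(C)
# [[12  9  6]
#  [30 23 16]]

try:
    B @ A          # shapes (2, 3) and (2, 2) are incompatible
except ValueError:
    print("BA is not defined")
```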

6.2.2. Some Fun facts about matrix multiplication.

(i) Even if n = k, in general

AB ̸= BA.

A = [ 1 2 ],  B = [ 0 −1 ],  AB = [ 12 13 ],  BA = [ −3 −4 ].
    [ 3 4 ]       [ 6  7 ]        [ 24 25 ]        [ 27 40 ]

(ii) AB may be the null matrix even when A ̸= 0 and B ̸= 0.

A = [ 2 4 ],  B = [ −2  4 ],  AB = [ 0 0 ].
    [ 1 2 ]       [  1 −2 ]        [ 0 0 ]

(iii) CD = CE ⇏ D = E even when C ̸= 0.

C = [ 2 3 ],  D = [ 1 1 ],  E = [ −2 1 ],  CD = CE = [  5  8 ].
    [ 6 9 ]       [ 1 2 ]       [  3 2 ]             [ 15 24 ]
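All three fun facts can be verified numerically with the matrices given above:

```python
# Verify facts (i)-(iii) with NumPy.
import numpy as np

# (i) AB != BA even for square matrices
A = np.array([[1, 2], [3, 4]]); B = np.array([[0, -1], [6, 7]])
print((A @ B != B @ A).any())   # True

# (ii) AB can be the null matrix with A != 0 and B != 0
A2 = np.array([[2, 4], [1, 2]]); B2 = np.array([[-2, 4], [1, -2]])
print((A2 @ B2 == 0).all())     # True

# (iii) CD = CE does not imply D = E (here C is singular)
C = np.array([[2, 3], [6, 9]])
D = np.array([[1, 1], [1, 2]]); E = np.array([[-2, 1], [3, 2]])
print((C @ D == C @ E).all() and (D != E).any())  # True
```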

6.2.3. Rules for matrix operations.

A+B = B+A

A + (B +C) = (A + B) +C

(AB)C = A(BC)

(A + B)C = AC + BC

A(B +C) = AB + AC

Check that you have a clear understanding of the restrictions needed on the number of rows and

columns of A, B and C in order for the above to work. More matrix rules, involving the transpose:

(6.7) (A′ )′ = A

(6.8) (A + B)′ = A′ + B′

(6.9) (AB)′ = B′ A′

Note the reversal of the order of the matrices in the last operation.

6.2.4. Rank of a matrix.

Definition 6.4. A set of vectors x1 , · · · , xn in Rm is linearly dependent if there exist λ1 , · · · , λn , not

all zero, such that

(6.10) λ1 x1 + · · · + λn xn = 0.

Definition 6.5. A set of vectors x1 , · · · , xn in Rm is linearly independent if it is not linearly depen-

dent.

Definition 6.6. The rank of a matrix A is the maximum number of linearly independent column

vectors of A. It is also equal to the number of linearly independent row vectors of A.

Example 6.1. Let

A = [ 1 2 3 ]
    [ 0 1 0 ]
    [ 2 4 6 ]

The first and the third columns are linearly dependent: each element of column 3 is three times the corresponding entry in column 1. Now take columns 1 and 2:

λ1 (1, 0, 2)′ + λ2 (2, 1, 4)′ = (0, 0, 0)′

⇔ λ1 + 2λ2 = 0, λ2 = 0, 2λ1 + 4λ2 = 0

⇔ λ1 = 0, λ2 = 0

is the only solution. So the first two columns are linearly independent. We found two linearly independent columns, so the rank of matrix A is 2. We could have done the exercise taking rows instead of columns and still got the same answer. (Please verify.)
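Example 6.1 can be confirmed with NumPy, which also illustrates that the row rank equals the column rank:

```python
# Rank of the matrix from Example 6.1: column 3 is 3x column 1, so rank = 2.
import numpy as np

A = np.array([[1, 2, 3],
              [0, 1, 0],
              [2, 4, 6]])

print(np.linalg.matrix_rank(A))    # 2
print(np.linalg.matrix_rank(A.T))  # 2 (row rank equals column rank)
```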

Theorem 6.1. (i) Rank of [A]_{n×k} ≤ min {# rows, # columns} = min {n, k};

(ii) Rank of AB ≤ min {Rank (A), Rank (B)}.

Definition 6.7. A square matrix [A]_{n×n} is called non-singular or of full rank if rank (A) = n. It is called singular if rank (A) < n.

Exercise 6.1. Prove: If A is a 2 × 1 matrix and B is a 1 × 2 matrix, then matrix AB is singular.

Definition 6.8. A square matrix [A]_{n×n} is invertible if there exists [B]_{n×n} such that [A] · [B] = [B] · [A] = [I]_{n×n}. Then B is called the inverse of A.

6.2.5. Rules for the inverse:

(6.11) (A⁻¹)⁻¹ = A

(6.12) (AB)⁻¹ = B⁻¹A⁻¹

(6.13) (A′)⁻¹ = (A⁻¹)′

Definition 6.9. A square matrix [A]_{n×n} is called orthogonal if A⁻¹ = A′, i.e., AA′ = I.

Theorem 6.2. [A]_{n×n} is invertible ⇔ [A]_{n×n} is non-singular.


6.3. Determinant of a matrix

The determinant is defined only for square matrices. The determinant is a function that associates a scalar, det (A), to an n × n square matrix A. The determinant of a 1-by-1 matrix A is the only entry of that matrix: det (A) = A11. The determinant of a 2-by-2 matrix

A = [ a b ]
    [ c d ]

is det (A) = ad − bc.

Definition 6.10. The cofactor Aij of the element aij is defined as (−1)^{i+j} times the determinant of the submatrix obtained from A after deleting row i and column j.

Example 6.2. Let

A = [ 1 2 ]
    [ 3 4 ]

A11 = (−1)^{1+1} · 4 = 4,  A12 = (−1)^{1+2} · 3 = −3,
A21 = (−1)^{2+1} · 2 = −2,  A22 = (−1)^{2+2} · 1 = 1.

Definition 6.11. The determinant of an n × n matrix A is given by

(6.14) det (A) = ∑_{j=1}^{n} a1j A1j = ∑_{i=1}^{n} ai1 Ai1.

Example 6.3. Let

A = [ a b c ]
    [ d e f ]
    [ g h i ]

Then

det (A) = a (−1)^{1+1} det [ e f ] + b (−1)^{1+2} det [ d f ] + c (−1)^{1+3} det [ d e ]
                           [ h i ]                    [ g i ]                    [ g h ]

        = a (ei − f h) − b (di − f g) + c (dh − eg).
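The 3 × 3 cofactor expansion above can be compared against NumPy's determinant on a concrete matrix (the matrix below is our own illustrative choice):

```python
# Cofactor expansion along the first row versus NumPy's determinant.
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
a, b, c = M[0]
d, e, f = M[1]
g, h, i = M[2]

cofactor_expansion = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
print(cofactor_expansion)                         # -3.0
print(np.isclose(np.linalg.det(M), cofactor_expansion))  # True
```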

6.3.1. Properties of Determinants.

(a)

(6.15) det (A) = det (A′)


(b) Interchanging any two rows will alter the sign but not the numerical value of the determinant.

(c) Multiplication of any one row by a scalar k will change the determinant k− fold.

(d) If one row is a multiple of another row, the determinant is zero.

(e) The addition of a multiple of one row to another row will leave the determinant unchanged.

(f) If A and B are n × n matrices, then

det (AB) = det (A) · det (B).

(g) Properties (b) − (e) are valid if we replace row by columns everywhere.

A = [ 1 2 ],  det (A) = −2;   A′ = [ 1 3 ],  det (A′) = −2;
    [ 3 4 ]                        [ 2 4 ]

B = [ 3 4 ],  det (B) = 2.
    [ 1 2 ]

(B is A with its rows interchanged, illustrating property (b).)

Result 6.1. Let A be an n × n upper triangular matrix, i.e., aij = 0 whenever i > j. The determinant of the matrix A is given by:

det A = ∏_{i=1}^{n} aii

Proof. The matrix A is upper triangular and is described as follows:

A = [ a11 a12 · · · a1,n−1       a1n     ]
    [ 0   a22 · · · a2,n−1       a2n     ]
    [ ..  ..  · ·   ..           ..      ]
    [ 0   0   · · · a_{n−1,n−1}  a_{n−1,n} ]
    [ 0   0   · · · 0            ann     ]
We prove the result by induction.

(1) Base case: Let n = 1. If A is a 1 × 1 matrix, then det A = a11 = ∏_{i=1}^{1} aii by the definition of a determinant.

(2) Inductive case: Let n > 1. Assume that for any (n − 1) × (n − 1) matrix A with aij = 0 for all i > j, we have det A = ∏_{i=1}^{n−1} aii. Now consider any n × n matrix A with aij = 0 for all i > j.


Expanding by the last row, we have

det A = an1 An1 + · · · + ann Ann

      = ann (−1)^{n+n} det [ a11 a12 · · · a1,n−1      ]
                           [ 0   a22 · · · a2,n−1      ]
                           [ ..  ..  · ·   ..          ]
                           [ 0   0   · · · a_{n−1,n−1} ]

      = ann ∏_{i=1}^{n−1} aii

      = ∏_{i=1}^{n} aii

where the third equality follows from the inductive hypothesis.

(3) The result holds for all n by inductive conclusion.
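Result 6.1 is easy to spot-check numerically on a randomly generated upper triangular matrix:

```python
# For an upper triangular matrix, det A equals the product of the diagonal.
import numpy as np

rng = np.random.default_rng(0)
A = np.triu(rng.uniform(-2, 2, size=(5, 5)))  # zero out entries below the diagonal

print(np.isclose(np.linalg.det(A), np.prod(np.diag(A))))  # True
```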

Using this result we can show the following.

Result 6.2. The upper triangular square matrix A is non-singular if and only if aii ̸= 0 for each i ∈ {1, · · · , n}.

As an “if and only if” statement, this requires proofs in both directions.

Claim 6.1. If the upper triangular matrix A is non-singular, then aii ̸= 0 for all i = 1, . . . , n.

Proof. Let A be non-singular. Then A has an inverse, A⁻¹. Since 1 = det I = det [A⁻¹A] = (det A⁻¹)(det A), we know that det A ̸= 0. If aii = 0 for any i ∈ {1, . . . , n}, then by Result 6.1 we would have det A = 0, a contradiction. So it must be that aii ̸= 0 for all i = 1, . . . , n.

Claim 6.2. If A is upper triangular and aii ̸= 0 for all i = 1, . . . , n, then A is non-singular.


Proof. Let aii ̸= 0 for all i = 1, . . . , n. Seeking contradiction, suppose A is singular. Then the columns of A are linearly dependent; without loss of generality, we can write the first column as A1 = ∑_{i=2}^{n} αi Ai. Let

B = [ A1 − ∑_{i=2}^{n} αi Ai   A2 · · · An ] = [ 0   A2 · · · An ]

We know, by the properties of determinants, that det B = det A. But, expanding B by the first column, we have det B = 0. This gives det A = 0, a contradiction. So we have that A is non-singular.

6.4. An application of matrix algebra

We provide an application of matrix algebra: the Markov process, or Markov chain. Markov processes are used to measure movements over time. A Markov process involves a Markov transition matrix. Each value in the transition matrix is the probability of moving from one state to another state. The process also specifies a vector containing the initial distribution across each of these states. By repeatedly multiplying the initial distribution vector by the transition matrix, we can estimate changes across states over time.

Consider the problem of movement of employees within a firm at different branches. In the

simple case, we take two locations, namely Ithaca and Cortland to demonstrate the basic elements

of a Markov process.

To determine the number of employees in Ithaca tomorrow, we take the probability that the employees will stay in the Ithaca branch multiplied by the total number of employees currently in Ithaca. We add to this the number of Cortland employees transferring to Ithaca, which is equal to the total number of employees in Cortland multiplied by the probability of Cortland employees transferring to Ithaca.

We follow the same process to determine the number of employees in Cortland tomorrow, made

up of the employees who choose to remain at Cortland and the Ithaca employees who transfer into

Cortland.

There are four probabilities involved, which can be arranged in a Markov transition matrix.


Let At and Bt denote the populations of the Ithaca and Cortland locations at some time t. The

transition probabilities are defined as follows.

pAA ≡ probability that a current A remains an A,

pAB ≡ probability that a current A moves to B,

pBB ≡ probability that a current B remains a B,

pBA ≡ probability that a current B moves to A.

The distribution of employees at time t is denoted by the vector x′t = [At Bt] and the transition probabilities in matrix form as

(6.16) M = [ pAA pAB ]
           [ pBA pBB ]

Then the distribution of employees across the two locations next period (t + 1) is x′t · M = x′t+1, which is

[At Bt] [ pAA pAB ] = [(At pAA + Bt pBA) (At pAB + Bt pBB)] = [At+1 Bt+1].
        [ pBA pBB ]

In a similar manner we can determine the distribution of employees after two periods:

x′t+1 · M = x′t+2

[At+1 Bt+1] [ pAA pAB ] = [At+2 Bt+2]
            [ pBA pBB ]

[At Bt] [ pAA pAB ] [ pAA pAB ] = [At+2 Bt+2]
        [ pBA pBB ] [ pBA pBB ]

[At Bt] [ pAA pAB ]² = [At+2 Bt+2]
        [ pBA pBB ]

In general, for n periods,

(6.17) [At Bt] [ pAA pAB ]ⁿ = [At+n Bt+n]
               [ pBA pBB ]

When n is exogenous, the process is known as a finite Markov chain.

Example 6.4. Consider the initial distribution of employees across the two locations at time t = 0 as

x′0 = [A0 B0] = [200 200]


Let

M = [ pAA pAB ] = [ 0.8 0.2 ]
    [ pBA pBB ]   [ 0.4 0.6 ]

Then the distribution of employees in the next period t = 1 is

[200 200] [ 0.8 0.2 ] = [240 160] = [A1 B1].
          [ 0.4 0.6 ]

The distribution after two periods is

[200 200] [ 0.8 0.2 ]² = [200 200] [ 0.72 0.28 ] = [256 144] = [A2 B2]
          [ 0.4 0.6 ]              [ 0.56 0.44 ]

The distribution after six periods is

[200 200] [ 0.8 0.2 ]⁶ = [200 200] [ 0.668 0.332 ] = [266.4 133.6] = [A6 B6]
          [ 0.4 0.6 ]              [ 0.664 0.336 ]

Observe that when the transition matrix is raised to higher powers, the new transition matrix converges to a matrix whose rows are identical. This is referred to as the steady state. In this example, the steady state would be

[ 2/3 1/3 ]
[ 2/3 1/3 ]

Try computing this value.
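Example 6.4 can be replicated in a few lines, and raising M to a high power exhibits the steady state with identical rows [2/3, 1/3]:

```python
# Iterate the transition matrix of Example 6.4 and exhibit the steady state.
import numpy as np

M = np.array([[0.8, 0.2],
              [0.4, 0.6]])
x0 = np.array([200.0, 200.0])

print(x0 @ M)                                          # [240. 160.]
print(x0 @ np.linalg.matrix_power(M, 2))               # [256. 144.]
print(np.round(x0 @ np.linalg.matrix_power(M, 6), 1))  # [266.4 133.6]

steady = np.linalg.matrix_power(M, 50)   # rows converge to [2/3, 1/3]
print(np.round(steady, 4))
```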

6.4.1. Absorbing Markov Chains. We can extend the previous model by adding a third choice: employees can exit the firm, with

pAE ≡ probability that a current A chooses to exit, E,

pBE ≡ probability that a current B chooses to exit, E.

Let us assume that

pEA = 0, pEB = 0, pEE = 1

where pEA, pEB, and pEE are the probabilities that an employee who is currently in state E will go to A, B or E respectively. The values assigned to pEA, pEB, and pEE mean that nobody who leaves the firm ever returns. It is also implied by these restrictions that the firm never replaces employees that leave. Starting at time t = 0, the Markov chain becomes

[A0 B0 E0] [ pAA pAB pAE ]ⁿ = [An Bn En]
           [ pBA pBB pBE ]
           [ pEA pEB pEE ]


or

[A0 B0 E0] [ pAA pAB pAE ]ⁿ = [An Bn En]
           [ pBA pBB pBE ]
           [ 0   0   1   ]

This type of Markov process is referred to as an absorbing Markov chain. The values of the transition probabilities assigned in the third row are such that once an employee goes to state E, he or she remains in that state forever. As n goes to infinity, An and Bn will approach zero and En will approach the total number of employees at time zero (i.e., A0 + B0 + E0).
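A short simulation makes the absorption visible. The specific transition probabilities below are our own illustrative numbers (the text fixes only the third row): after many periods, essentially all mass sits in the exit state E, and the total A0 + B0 + E0 is conserved.

```python
# Simulate an absorbing Markov chain: everyone eventually lands in state E.
import numpy as np

P = np.array([[0.7, 0.2, 0.1],   # A -> A, B, E
              [0.3, 0.5, 0.2],   # B -> A, B, E
              [0.0, 0.0, 1.0]])  # E is absorbing
x0 = np.array([100.0, 100.0, 0.0])

xn = x0 @ np.linalg.matrix_power(P, 200)
print(np.round(xn))  # [0. 0. 200.]: A_n, B_n -> 0 and E_n -> A_0 + B_0 + E_0
```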

6.5. System of Linear Equations

A system of linear equations is written as

(6.18) Ax = b

where matrix A is of dimension n × k, x is a k × 1 column vector and b is an n × 1 column vector. This is a system of n equations in k unknowns.

Example 6.5. The system of two linear equations,

5x + 3y = 1

6x + y = 2

can be written as
\[
\begin{bmatrix} 5 & 3 \\ 6 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
\]

When b = 0, the system is called a homogeneous system. When b ̸= 0, it is called a non-homogeneous system.

Definition 6.12. Column vector x∗ is called a solution to the system if Ax∗ = b.

There are three important questions in this context.

(a) Does a solution exist?

(b) If there exists a solution, is it unique?

(c) If a solution exists, how do we compute it?


Claim 6.3. A homogeneous system Ax = 0 always has a solution (the trivial solution x = 0), but there may be other solutions (the solution need not be unique).

Claim 6.4. For a non-homogeneous system Ax = b, a solution may not exist.

Example 6.6. The following system of two linear equations

2x + 4y = 5

x + 2y = 2

does not have a solution. Multiplying the second equation by 2 gives 2x + 4y = 4, so both equations have the same left-hand side; subtracting yields 5 = 4, a contradiction.

Example 6.7. The following system of two linear equations

2x + 4y = 2

x + 2y = 1

has infinitely many solutions.

Given A (of dimension n × k) and b (n × 1), the n × (k + 1) matrix
\[
A_b = \begin{bmatrix} A_1 & A_2 & \cdots & A_k & b \end{bmatrix}
\]
is called the augmented matrix. Note that A_i is the ith column of A.

Example 6.8. Let
\[
A = \begin{bmatrix} 5 & 3 \\ 6 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \;\Rightarrow\; A_b = \begin{bmatrix} 5 & 3 & 1 \\ 6 & 1 & 2 \end{bmatrix}.
\]

Theorem 6.3. The system of equations Ax = b, with A of dimension n × k, x of dimension k × 1 and b of dimension n × 1, has a solution if and only if

(6.19) rank (A) = rank (A_b).

The solution is unique if and only if

(6.20) rank (A) = rank (A_b) = k = number of columns of A = number of unknowns.

Consider the case of n equations in n unknowns. In this case, A is n × n. If Ax = b has a solution and if det (A) ̸= 0, then the solution is characterized by

(6.21) x∗ = A⁻¹ b.

Example 6.9. The system of linear equations

2x + y = 0

2x + 2y = 0

gives us
\[
A = \begin{bmatrix} 2 & 1 \\ 2 & 2 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad A_b = \begin{bmatrix} 2 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix}.
\]
It is easy to verify that

rank (A) = 2 = rank (A_b).

Hence a solution exists and is unique.

Example 6.10. The system of linear equations

2x + y = 0

4x + 2y = 0

leads to
\[
A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad A_b = \begin{bmatrix} 2 & 1 & 0 \\ 4 & 2 & 0 \end{bmatrix}.
\]
It is again easy to verify that

rank (A) = 1 = rank (A_b).

However,

rank (A) = rank (A_b) < k = 2.

Hence a solution exists but is not unique¹.
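The rank conditions of Theorem 6.3 can be checked mechanically. Below is a minimal Gaussian-elimination sketch (the `rank` helper is my own, not from the text), applied to Examples 6.9 and 6.10:

```python
def rank(M, tol=1e-9):
    """Rank of a matrix (list of rows) via Gauss-Jordan elimination."""
    A = [row[:] for row in M]            # work on a copy
    rows, cols, r = len(A), len(A[0]), 0
    for c in range(cols):
        # find a pivot at or below row r in column c
        piv = next((i for i in range(r, rows) if abs(A[i][c]) > tol), None)
        if piv is None:
            continue                     # no pivot in this column
        A[r], A[piv] = A[piv], A[r]
        for i in range(rows):
            if i != r and abs(A[i][c]) > tol:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# Example 6.9: rank(A) = rank(Ab) = 2 = k, so the solution is unique.
A9, Ab9 = [[2, 1], [2, 2]], [[2, 1, 0], [2, 2, 0]]
# Example 6.10: rank(A) = rank(Ab) = 1 < k = 2, so solutions are not unique.
A10, Ab10 = [[2, 1], [4, 2]], [[2, 1, 0], [4, 2, 0]]
```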

Now, we revert to the problem of computing the inverse of a non-singular matrix. We first note the following result.

Theorem 6.4. A matrix A (n × n) is invertible ⇔ det (A) ̸= 0. Also, if A is invertible, then det (A⁻¹) = 1/det (A).

Proof. Suppose A is invertible. Then

A · A⁻¹ = I,

so 1 = det I = det(AA⁻¹) = det(A) det(A⁻¹), using the properties of determinants noted above. Consequently det(A) ̸= 0, and det(A⁻¹) = [det(A)]⁻¹.

Suppose, next, that A is not invertible. Then A is singular, and so one of its columns (say, A_1) can be expressed as a linear combination of its other columns A_2, · · · , A_n. That is,
\[
A_1 = \sum_{i=2}^{n} \alpha_i A_i.
\]

¹A row or column vector of zeros is always linearly dependent on the other vectors.


Consider the matrix B whose first column is A_1 − \sum_{i=2}^{n} \alpha_i A_i and whose other columns are the same as those of A. Then the first column of B is zero, and so |B| = 0. By the properties of determinants, |B| = |A|, and so |A| = 0. □

For a square matrix A (n × n), we define the co-factor matrix of A to be the n × n matrix given by
\[
C = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix}
\]
The transpose of C is called the adjoint of A, and denoted by adj A.

Now, by the rules of matrix multiplication,
\[
AC' = \begin{bmatrix} \sum_{j=1}^{n} a_{1j}A_{1j} & \sum_{j=1}^{n} a_{1j}A_{2j} & \cdots & \sum_{j=1}^{n} a_{1j}A_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^{n} a_{nj}A_{1j} & \sum_{j=1}^{n} a_{nj}A_{2j} & \cdots & \sum_{j=1}^{n} a_{nj}A_{nj} \end{bmatrix} = \begin{bmatrix} |A| & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{bmatrix}
\]
This yields the equation

(6.22) AC′ = |A| I

If A is non-singular (that is invertible) then there is A−1 such that

(6.23) AA−1 = A−1 A = I

Pre-multiplying (6.22) by A−1 and using (6.23),

C′ = |A| A−1

Since A is non-singular, we have |A| ̸= 0, and
\[
(6.24) \quad A^{-1} = \frac{C'}{|A|} = \frac{\operatorname{adj} A}{|A|}.
\]


Thus (6.24) gives us a formula for computing the inverse of a non-singular matrix in terms of the determinant and cofactors of A.
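Formula (6.24) translates directly into code. A small recursive sketch (the helper names are mine; cofactor expansion is inefficient for large matrices, but it mirrors the formula exactly):

```python
def minor(M, i, j):
    """Delete row i and column j of M."""
    return [row[:j] + row[j+1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inverse(M):
    """A^{-1} = adj A / |A|, where adj A is the transpose of the cofactor matrix."""
    d = det(M)
    n = len(M)
    # entry (i, j) of the inverse is the (j, i) cofactor divided by |A|
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(n)]
            for i in range(n)]

A = [[5.0, 3.0], [6.0, 1.0]]   # matrix from Example 6.5; det A = -13
Ainv = inverse(A)
```

Multiplying `A` by `Ainv` recovers the identity matrix, confirming (6.24) on this small example.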

6.6. Cramer’s Rule

Recall that we wanted to calculate the (unique) solution of a system of n equations in n unknowns given by

(6.25) Ax = c

where A is an n × n matrix, and c is a vector in Rn .

To obtain a unique solution, we saw that we must have A non-singular, which now translates to the condition |A| ̸= 0. The unique solution to (6.25) is then
\[
(6.26) \quad x = A^{-1} c = \frac{\operatorname{adj} A}{|A|}\, c.
\]

Let us evaluate x_1 using (6.26). This can be done by taking the inner product of x with the first unit vector e_1 = (1, 0, · · · , 0). Thus,
\[
x_1 = e_1 x = \frac{e_1 \operatorname{adj} A}{|A|}\, c
= \frac{\begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \end{bmatrix} c}{|A|}
= \frac{c_1 A_{11} + c_2 A_{21} + \cdots + c_n A_{n1}}{|A|}
= \frac{1}{|A|} \begin{vmatrix} c_1 & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ c_n & a_{n2} & \cdots & a_{nn} \end{vmatrix}
\]

This gives us an easy way to compute x_1. In general, in order to calculate x_i, replace the ith column of A by the vector c and find the determinant of this matrix. Dividing this number by the determinant of A yields the solution x_i. This rule is known as Cramer's Rule.

Example 6.11. General Market Equilibrium with three goods


Consider a market for three goods. Demand and supply for each good are given by:

D1 = 5 − 2P1 + P2 + P3

S1 = −4 + 3P1 + 2P2

D2 = 6 + 2P1 − 3P2 + P3

S2 = 3 + 2P2

D3 = 20 + P1 + 2P2 − 4P3

S3 = 3 + P2 + 3P3

where Pi is the price of good i, i = 1, 2, 3. The equilibrium conditions are Di = Si, i = 1, 2, 3, that is

5P1 + P2 − P3 = 9

−2P1 + 5P2 − P3 = 3

−P1 − P2 + 7P3 = 17

This system of linear equations can be solved in at least two ways.

(a) Using Cramer's rule:
\[
|A_1| = \det \begin{bmatrix} 9 & 1 & -1 \\ 3 & 5 & -1 \\ 17 & -1 & 7 \end{bmatrix} = 356, \qquad |A| = \det \begin{bmatrix} 5 & 1 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 7 \end{bmatrix} = 178,
\]
\[
P_1^{*} = \frac{|A_1|}{|A|} = \frac{356}{178} = 2.
\]
Similarly P_2^{*} = 2 and P_3^{*} = 3. The vector (P_1^{*}, P_2^{*}, P_3^{*}) describes the general market equilibrium.

(b) Using the inverse matrix rule. Let us denote
\[
A = \begin{bmatrix} 5 & 1 & -1 \\ -2 & 5 & -1 \\ -1 & -1 & 7 \end{bmatrix}, \quad P = \begin{bmatrix} P_1 \\ P_2 \\ P_3 \end{bmatrix}, \quad B = \begin{bmatrix} 9 \\ 3 \\ 17 \end{bmatrix}.
\]
The matrix form of the system is AP = B, which implies P = A⁻¹B.
\[
A^{-1} = \frac{1}{\det A} \begin{bmatrix} 34 & -6 & 4 \\ 15 & 34 & 7 \\ 7 & 4 & 27 \end{bmatrix},
\]
\[
P = \frac{1}{178} \begin{bmatrix} 34 & -6 & 4 \\ 15 & 34 & 7 \\ 7 & 4 & 27 \end{bmatrix} \begin{bmatrix} 9 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}.
\]
Again, P_1^{*} = 2, P_2^{*} = 2, and P_3^{*} = 3.
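The prices in Example 6.11 can be checked with a direct implementation of Cramer's Rule (a sketch; `det` is a simple cofactor expansion and the function names are mine):

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def cramer(A, c):
    """Solve Ax = c by replacing each column of A with c (Cramer's Rule)."""
    d = det(A)
    n = len(A)
    sol = []
    for i in range(n):
        Ai = [row[:] for row in A]
        for r in range(n):
            Ai[r][i] = c[r]          # replace the i-th column by c
        sol.append(det(Ai) / d)
    return sol

A = [[5, 1, -1], [-2, 5, -1], [-1, -1, 7]]
c = [9, 3, 17]
P = cramer(A, c)                     # equilibrium prices (2, 2, 3)
```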

6.7. Principal Minors

Let A be an n × n square matrix.

Definition 6.13. A principal minor of order k (1 ≤ k ≤ n) of A is the determinant of the k × k submatrix that remains when (n − k) rows and columns with the same indices are deleted from A.

Example 6.12. Let
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix}
\]
(a) Principal minors of order 1 are 1, 8, 9.

(b) Principal minors of order 2 are
\[
\det \begin{bmatrix} 1 & 2 \\ 0 & 8 \end{bmatrix} = 8; \quad \det \begin{bmatrix} 8 & 1 \\ 5 & 9 \end{bmatrix} = 67; \quad \det \begin{bmatrix} 1 & 3 \\ 2 & 9 \end{bmatrix} = 3.
\]
(c) The principal minor of order 3 is
\[
\det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} = 23.
\]

6.7.1. Leading Principal Minor.

Definition 6.14. A leading principal minor of order k (1 ≤ k ≤ n) of an n × n matrix A is the principal minor of order k obtained by deleting the last (n − k) rows and columns.

In the previous example, the leading principal minor of order 1 is 1. The leading principal minor of order 2 is


\[
\det \begin{bmatrix} 1 & 2 \\ 0 & 8 \end{bmatrix} = 8
\]
and the leading principal minor of order 3 is
\[
\det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 8 & 1 \\ 2 & 5 & 9 \end{bmatrix} = 23.
\]
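Definition 6.13 turns into a short enumeration: a principal minor of order k keeps the rows and columns with a common index set, and the leading one keeps the first k. A sketch (function names are mine), using the matrix of Example 6.12:

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def principal_minors(A, k):
    """All principal minors of order k: same index set for rows and columns."""
    n = len(A)
    return [det([[A[i][j] for j in idx] for i in idx])
            for idx in combinations(range(n), k)]

def leading_principal_minor(A, k):
    """Keep only the first k rows and columns."""
    return det([row[:k] for row in A[:k]])

A = [[1, 2, 3], [0, 8, 1], [2, 5, 9]]   # matrix from Example 6.12
```

For this matrix the order-1 principal minors are 1, 8, 9; the order-2 minors are 8, 3, 67; the order-3 minor is 23; and the leading principal minors are 1, 8, 23, matching the values computed above.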

6.8. Quadratic Form

A quadratic form consists of a square n × n matrix A pre- and post-multiplied by an n-vector. It is a scalar:

(6.27) Q (x, A) = x′Ax

Example 6.13. Let
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.
\]
Then
\[
Q (x, A) = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a x_1^2 + (b + c) x_1 x_2 + d x_2^2.
\]

6.8.1. Matrix Definiteness. Let A be a symmetric n × n matrix.

(a) A is positive definite (PD) if

(6.28) Q (z, A) = z′Az > 0, ∀ z ∈ Rⁿ, z ̸= 0.

(b) A is negative definite (ND) if

(6.29) Q (z, A) = z′Az < 0, ∀ z ∈ Rⁿ, z ̸= 0.

(c) A is positive semidefinite (PSD) if

(6.30) Q (z, A) = z′Az ≥ 0, ∀ z ∈ Rⁿ.

(d) A is negative semidefinite (NSD) if

(6.31) Q (z, A) = z′Az ≤ 0, ∀ z ∈ Rⁿ.


(e) A is indefinite if none of the above conditions hold true.

6.8.2. Test for definiteness of symmetric matrices:

A is PD if and only if all leading principal minors of A are strictly positive.

A is ND if and only if the leading principal minor of order k has sign (−1)ᵏ for every k.

A is PSD if and only if all principal minors of A are non-negative.

A is NSD if and only if every principal minor of order k has sign (−1)ᵏ or is 0.

Example 6.14. Let
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.
\]
Then A is

positive definite: a11 > 0, a11a22 − a12a21 > 0;

negative definite: a11 < 0, a11a22 − a12a21 > 0;

positive semi-definite: a11 ≥ 0, a22 ≥ 0, a11a22 − a12a21 ≥ 0;

negative semi-definite: a11 ≤ 0, a22 ≤ 0, a11a22 − a12a21 ≥ 0.
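For the 2 × 2 case these conditions fit in a few lines of code. A sketch (the function name and the ordering of the checks, definite before semidefinite, are my own choices):

```python
def classify(a11, a12, a21, a22):
    """Classify a symmetric 2x2 matrix using the minor tests above."""
    d = a11 * a22 - a12 * a21        # the order-2 principal minor
    if a11 > 0 and d > 0:
        return "positive definite"
    if a11 < 0 and d > 0:
        return "negative definite"
    if a11 >= 0 and a22 >= 0 and d >= 0:
        return "positive semidefinite"
    if a11 <= 0 and a22 <= 0 and d >= 0:
        return "negative semidefinite"
    return "indefinite"
```

For instance, `classify(2, -1, -1, 1)` reports positive definite (a11 = 2 > 0 and the determinant 2 − 1 = 1 > 0), matching the test above.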

Note that a negative definite matrix necessarily has full rank: indeed, if the zero vector could be obtained as a linear combination of the columns of A with weights α1, · · · , αn (not all zero), then defining t = (α1, · · · , αn) would give t′At = 0, contradicting negative definiteness.

Definition 6.15. Let A be a symmetric n × n matrix. Matrix A is diagonally dominant if for each

row i, we have |ai,i | ≥ ∑ j̸=i |ai, j |, and it is strictly diagonally dominant if the latter inequality holds

strictly for each row.

Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative entries along the diagonal is negative definite.

6.9. Eigenvalue and Eigenvectors

Given an n × n real matrix A, an eigenvalue of A is a number λ which, when subtracted from each of the diagonal entries of A, converts A into a singular matrix. Subtracting a scalar λ from each diagonal entry of A is the same as subtracting λ times the identity matrix I from A. Hence, λ is an eigenvalue of A if and only if A − λI is a singular matrix.


This is also equivalent to asking: for what non-zero vectors x ∈ Rⁿ, and for what complex numbers λ, is it true that

(6.32) Ax = λx?

This is known as the eigenvalue problem.

If x ̸= 0 and λ satisfy equation (6.32), then λ is called an eigenvalue of A, and x is called an eigenvector of A.

Clearly (6.32) holds if and only if

(6.33) (A − λI)x = 0

But (6.33) is a homogeneous system of n equations in n unknowns. It has a non-zero solution for x

if and only if (A − λI) is singular; that is, if and only if

(6.34) |A − λI| = 0

This equation is called the characteristic equation of A. If we look at the expression

(6.35) f (λ) ≡ |A − λI|

we note that f is a polynomial in λ; it is called the characteristic polynomial of A.

Example 6.15. Consider the 3 × 3 matrix A given by
\[
A = \begin{bmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{bmatrix}
\]
Then subtracting 3 from each diagonal entry transforms A into the singular matrix
\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.
\]
Therefore, 3 is an eigenvalue of matrix A.

Example 6.16. Consider the 2 × 2 matrix A given by
\[
A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}
\]
Then subtracting 4 from each diagonal entry transforms A into the singular matrix
\[
\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}.
\]
Therefore, 4 is an eigenvalue of matrix A. Also, subtracting 2 from each diagonal entry transforms A into the singular matrix
\[
\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}.
\]
Therefore, 2 is also an eigenvalue of matrix A.

The above example illustrates a general principle about the eigenvalues of a diagonal matrix.

Theorem 6.5. The diagonal entries of a diagonal matrix A are the eigenvalues of A.

Theorem 6.6. A square matrix A is singular if and only if 0 is an eigenvalue of A.

Example 6.17. Consider the 2 × 2 matrix A given by
\[
A = \begin{bmatrix} 4 & -4 \\ -4 & 4 \end{bmatrix}
\]
Since the first row is the negative of the second row, matrix A is singular. Hence 0 is an eigenvalue of A. Also, subtracting 8 from each diagonal entry transforms A into the singular matrix
\[
\begin{bmatrix} -4 & -4 \\ -4 & -4 \end{bmatrix}.
\]
Therefore, 8 is also an eigenvalue of matrix A.

Example 6.18. Consider the 2 × 2 matrix A given by
\[
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\]
Then equation (6.34) becomes
\[
(6.36) \quad \begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = 0.
\]
So (4 − 4λ + λ²) − 1 = 0, which yields

(1 − λ)(3 − λ) = 0.

Thus, the eigenvalues are λ = 1 and λ = 3. In this case it was also possible to see that λ = 1 is an eigenvalue, as subtracting 1 from the diagonal entries converts matrix A into a singular matrix.

Putting λ = 1 in (6.33), we get
\[
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
which yields

x1 + x2 = 0.

Thus the general form of the eigenvector corresponding to the eigenvalue λ = 1 is given by

(x1, x2) = θ(1, −1) for θ ̸= 0.

Similarly, corresponding to the eigenvalue λ = 3, we have the eigenvector given by

(x1, x2) = θ(1, 1) for θ ̸= 0.
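The eigenpairs found in Example 6.18 are easy to verify numerically; for a 2 × 2 matrix the characteristic equation is a quadratic, so the eigenvalues come from the quadratic formula. A sketch (function names are mine; `eig2` assumes a real discriminant, which holds for symmetric matrices):

```python
import math

def eig2(A):
    """Eigenvalues of a 2x2 matrix from the characteristic polynomial
    λ² − tr(A)·λ + det(A) = 0 (assumes a real discriminant)."""
    (a, b), (c, d) = A
    tr, dt = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * dt)
    return ((tr - disc) / 2, (tr + disc) / 2)

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[2, 1], [1, 2]]
lam = eig2(A)             # (1.0, 3.0)
v1, v3 = [1, -1], [1, 1]  # eigenvectors for λ = 1 and λ = 3
```

Checking `matvec(A, v1)` returns `v1` unchanged (λ = 1) while `matvec(A, v3)` returns `v3` scaled by 3, confirming Ax = λx for both pairs.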

Example 6.19. A square matrix A whose entries are non-negative and whose rows (or columns) each add to 1 is called a Markov matrix. These matrices play a major role in economic dynamics. Consider the 2 × 2 matrix A given by
\[
A = \begin{bmatrix} a & 1-a \\ b & 1-b \end{bmatrix}
\]
where a ≥ 0 and b ≥ 0. Then subtracting 1 from the diagonal entries leads to the matrix
\[
A - I = \begin{bmatrix} a-1 & 1-a \\ b & -b \end{bmatrix}.
\]
Notice that each row of this matrix adds to 0. But if every row of a square matrix adds to zero, the columns are linearly dependent and the matrix is singular. This shows that 1 is an eigenvalue of the Markov matrix above, and the same argument shows that 1 is an eigenvalue of every Markov matrix.

6.10. Eigenvalues of symmetric matrix

For the case of a symmetric matrix A, we can show that all the eigenvalues of A are real.

Theorem 6.7. Let A be a symmetric n × n matrix. Then all the eigenvalues of A are real.

Proof. Suppose λ is a complex eigenvalue, with associated complex eigenvector, x. Then we have

(6.37) Ax = λx

Define x∗ to be the complex conjugate of x, and λ∗ the complex conjugate of λ. Taking complex conjugates in (6.37) and using the fact that A is real,

(6.38) Ax∗ = λ∗x∗

Pre-multiply (6.37) by (x∗ )′ and (6.38) by x′ to get

(6.39) (x∗ )′ Ax = λ(x∗ )′ x

(6.40) x′ Ax∗ = λ∗ x′ x∗

Subtracting (6.40) from (6.39)

(6.41) (x∗ )′ Ax − x′ Ax∗ = (λ − λ∗ )x′ x∗


since (x∗ )′ x = x′ x∗ . Also,

x′ Ax∗ = (x′ Ax∗ )′ = (x∗ )′ A′ x = (x∗ )′ Ax

since A′ = A (by symmetry). Thus (6.41) yields

(6.42) (λ − λ∗ )x′ x∗ = 0

Since x ̸= 0, we know that x′ x∗ is real and positive. Hence (6.42) implies that λ = λ∗ , so λ is

real.

6.11. Eigenvalues, Trace and Determinant of a Matrix

If A is an n × n matrix, the trace of A, denoted by tr(A), is the number defined by
\[
\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.
\]

The following properties of the trace can be verified easily [Here A, B and C are n × n matrices,

and λ ∈ R].

(a) tr(A + B) = tr(A) + tr(B)

(b) tr(λA) = λ tr(A)

(c) tr(AB) = tr(BA)

(d) tr(ABC) = tr(BCA) = tr(CAB)

Let A be an n × n matrix. The characteristic polynomial of A, defined in (6.35) above, can generally be written as

(6.43) |A − λI| = (−λ)ⁿ + b_{n−1}(−λ)ⁿ⁻¹ + · · · + b_1(−λ) + b_0

where b_0, . . . , b_{n−1} are coefficients determined by the entries of the A-matrix.

On the other hand, if λ1, . . . , λn are the eigenvalues of A, then the characteristic equation (6.34) can be written as

(6.44) 0 = (λ1 − λ)(λ2 − λ) · · · (λn − λ).

Using (6.34), (6.43), and (6.44) and “comparing coefficients”, we can conclude that

b_{n−1} = λ1 + λ2 + · · · + λn

and

b0 = λ1 λ2 ...λn

Also, by looking at the terms in the characteristic polynomial of A which involve (−λ)ⁿ⁻¹, we can conclude that

b_{n−1} = a11 + a22 + · · · + ann.

Finally, putting λ = 0 in (6.43), we get

b0 = |A|

Thus we note two interesting relationships between the characteristic values, the trace and the determinant of A:
\[
\operatorname{tr} A = \sum_{i=1}^{n} \lambda_i \qquad \text{and} \qquad |A| = \prod_{i=1}^{n} \lambda_i.
\]
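These two identities are easy to verify for a 2 × 2 matrix, where the eigenvalues come directly from the quadratic formula. A sketch using the matrix of Example 6.18:

```python
import math

A = [[2, 1], [1, 2]]                        # eigenvalues 1 and 3 (Example 6.18)
tr = A[0][0] + A[1][1]                      # trace = 4
dt = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # determinant = 3

# Roots of λ² − tr·λ + dt = 0:
disc = math.sqrt(tr * tr - 4 * dt)
lam1, lam2 = (tr - disc) / 2, (tr + disc) / 2
# tr(A) = λ1 + λ2 and |A| = λ1 · λ2, as claimed
```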

6.11.1. Eigenvalues and Definiteness of Quadratic Forms.

Theorem 6.8. Let A be a symmetric matrix. Then,

(1) A is positive definite if and only if all the eigenvalues of A are positive.

(2) A is negative definite if and only if all the eigenvalues of A are negative.

(3) A is positive semidefinite if and only if all the eigenvalues of A are non-negative.

(4) A is negative semidefinite if and only if all the eigenvalues of A are non-positive.

(5) A is indefinite if and only if A has a positive eigenvalue and a negative eigenvalue.

Chapter 7

Problem Set 3

(1) Let
\[
A = \begin{bmatrix} 1 & -1 & 7 \\ 0 & 8 & 10 \end{bmatrix}, \quad B = \begin{bmatrix} 9 & 6 & 5 & 4 \\ 1 & -2 & -3 & 3 \\ 0 & 1 & -1 & 2 \end{bmatrix}
\]
Compute AB. Is BA defined?

(2) Are the vectors
\[
\begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 \\ 3 \end{pmatrix}
\]
linearly independent?

(3) Let
\[
A = \begin{bmatrix} 1 & 6 & 2 \\ -1 & 5 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 8 & 4 \\ 0 & -2 \\ 7 & -3 \end{bmatrix}.
\]
Compute AB and BA.

(4) What is the determinant of
\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 1 & 2 \\ 1 & 3 & 5 & 7 \\ 2 & 1 & 4 & 1 \end{bmatrix}?
\]

(5) What is the rank of
\[
A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 1 & 7 \\ 5 & 4 & -1 \end{bmatrix}?
\]

(6) Consider the system of three linear equations in three unknowns:

x+y+z = 6

x + 2y + 3z = 10

x + 2y + λz = µ.

For what values of λ and µ does the system of equations have

(a) no solution,

(b) a unique solution,

(c) infinitely many solutions?

(7) What is the definiteness of the following matrices? (Hint: use the principal minors.)
\[
A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}, \quad C = \begin{bmatrix} -3 & 4 \\ 4 & 5 \end{bmatrix}, \quad D = \begin{bmatrix} -3 & 4 \\ 4 & -6 \end{bmatrix}.
\]

(8) Consider the situation of a mass layoff (i.e., a firm goes out of business) where 2000 people become unemployed and now begin a job search. There are two states: employed (E) and unemployed (U), with an initial vector

x0′ = [E U] = [0 2000].

Suppose that in any given period an unemployed person will find a job with probability 0.7

and will therefore remain unemployed with a probability 0.3. Additionally, persons who find

themselves employed in any given period may lose their job with a probability of 0.1 (and will

continue to remain employed with probability 0.9).

(i) Set up the Markov transition matrix for this problem.

(ii) What will be the number of unemployed people after (a) two periods; (b) four periods; (c) six periods; (d) ten periods?

(iii) What is the steady-state level of unemployment?

(9) (a) An n × n matrix A is called nilpotent if for some positive integer k
\[
A^k = \underbrace{A \times A \times \cdots \times A}_{k \text{ times}} = O
\]
where O is the n × n null matrix. Prove that if A is nilpotent, then det A = 0.


(b) An n × n matrix A is called skew-symmetric if A′ = −A. Prove that if A is skew-symmetric and n is odd, then A is not invertible.

(c) An n × n matrix A is called orthogonal if AA′ = I. Prove that if A is orthogonal, then det A = ±1.

(d) Let n × n matrices A and B be such that AB = −BA. Prove that if n is an odd number, then either A or B is not invertible.

(e) Let n × n matrices A and B be such that AB = I. Use determinants to prove that A is invertible (and hence B = A⁻¹).

(10) (a) Prove that the eigenvalues of an upper or lower triangular matrix are precisely its diagonal entries.

(b) Suppose that A is an invertible matrix and λ ̸= 0. Show that (A − λI)x = 0 implies that (A⁻¹ − λ⁻¹I)x = 0. Conclude that for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an eigenvalue of A⁻¹.

(c) Let A be an invertible matrix and let x be an eigenvector of A. Show that it is also an eigenvector of A² and A⁻². What are the corresponding eigenvalues?

Chapter 8

Single and Multivariable Calculus

8.1. Functions

Recall the definition of functions discussed earlier. Now we discuss some features of functions which are useful in optimization exercises.

8.2. Surjective and Injective Functions

Definition 8.1. A function f : D → R is called surjective (or is said to map D onto R) if f (D) = R, i.e., if the image f (D) of the function is equal to the entire range.

Definition 8.2. A function f : D → R is called injective or one to one if

(8.1) f (x) = f (y) ⇔ x = y.

A function f : D → R is called a bijection if it is both surjective and injective.

Example 8.1. Consider the function

f : R → R : f (x) = x².

It is not surjective as there exists no element in the domain which gets mapped into −1.


Let us restrict the range to R₊. So the new function is

g : R → R₊ : g (x) = x².

Now this function is surjective as each non-negative real number has a pre-image (square root) in

R. However, this function is not injective as the pre-image of 4 is both −2 and 2.

Next, let us also restrict the domain of the function to R₊. The function is

h : R₊ → R₊ : h (x) = x².

It is both surjective and injective. Hence it is bijective.

Example 8.2. Let A be a non-empty set and let S be a subset of A. We define a function χ_S : A → {0, 1} by
\[
(8.2) \quad \chi_S(a) = \begin{cases} 1, & \text{if } a \in S; \\ 0, & \text{if } a \notin S. \end{cases}
\]
This function is called the characteristic function or indicator function of S. It is widely used in probability and statistics. If S is a non-empty proper subset of A, then χ_S is surjective. If S = ∅ or S = A, then χ_S is not surjective.

Definition 8.3. Inverse Function: Consider f : D → R. If there exists g : R → D such that ∀x ∈ D,

(8.3) g( f (x)) = x,

then g is called the inverse function of f and is written as f⁻¹ : R → D. Alternatively, we can define the inverse function as follows. Let f : D → R be bijective. The inverse function of f is the function f⁻¹ : R → D such that ∀x ∈ D,

(8.4) f⁻¹( f (x)) = x.

Theorem 8.1. Let f : D → R be bijective. Then f⁻¹ : R → D is bijective.

Example 8.3. f (x) = 2x, f⁻¹(x) = x/2, so f⁻¹( f (x)) = f (x)/2 = 2x/2 = x.

Theorem 8.2. Suppose f : D → R. Let A, A1, A2 be subsets of D and let B be a subset of R. Then

(a) If f is injective, then f⁻¹[ f (A)] = A,

(b) If f is surjective, then f [ f⁻¹(B)] = B,

(c) If f is injective, then f (A1 ∩ A2) = f (A1) ∩ f (A2).


Proof. You should try and prove (a) and (b) on your own. I will provide proof for (c) here. We

need to prove that f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 ) and f (A1 ) ∩ f (A2 ) ⊆ f (A1 ∩ A2 ).

Step 1. Show

f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 )

Let

y ∈ f (A1 ∩ A2 ) .

Then there exists x ∈ A1 ∩ A2 such that f (x) = y. Since x ∈ A1 ∩ A2, x ∈ A1 and x ∈ A2. But then f (x) ∈ f (A1) and f (x) ∈ f (A2). So f (x) ∈ f (A1) ∩ f (A2). Observe that we have not used the fact that f is injective. So this part of the result holds for any function.

Step 2. We need to show

f (A1 ) ∩ f (A2 ) ⊆ f (A1 ∩ A2 ) .

Let y ∈ f (A1 ) ∩ f (A2 ). Then y ∈ f (A1 ) and y ∈ f (A2 ). Hence there exist a point x1 ∈ A1 and a

point x2 ∈ A2 such that f (x1 ) = y and f (x2 ) = y. Or

f (x1 ) = y = f (x2 ) .

Since f is injective, we must have x1 = x2 , or x1 ∈ A1 ∩ A2 . But then y = f (x1 ) ∈ f (A1 ∩ A2 ).

Here are some more definitions related to functions.

Definition 8.4.

(a) A function f is odd if and only if for every x, − f (x) = f (−x).

(b) A function f is even if and only if for every x, f (x) = f (−x).

(c) A function f is periodic if and only if there exists a k > 0 such that for every x, f (x + k) = f (x).

(d) A function f is increasing if and only if for every x and every y, if x ≤ y, then f (x) ≤ f (y).

(e) A function f is decreasing if and only if for every x and every y, if x ≤ y, then f (x) ≥ f (y).

8.3. Composition of Functions

Definition 8.5. Composition of Functions: If f : A → B and g : B → C are two functions, then for any a ∈ A, f (a) ∈ B. But B is the domain of g, so the mapping g can be applied to f (a), which yields g( f (a)), an element in C. This establishes a correspondence between a in A and c = g( f (a)) in C. This correspondence is called the composition function of f and g and is denoted by g ◦ f (read g of f ). Thus we have

(8.5) (g ◦ f )(a) = g( f (a)).

Remark 8.1. Composition of two functions need not be commutative,

(g ◦ f ) (a) ̸= ( f ◦ g) (a)

as the following example shows.

Let f (x) = x2 , g (x) = x + 1. Then

(g ◦ f ) (x) = x2 + 1 but

( f ◦ g) (x) = (x + 1)2 .
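The non-commutativity above takes two lines to check in code (the sample point x = 2 is an arbitrary choice):

```python
f = lambda x: x ** 2        # f(x) = x^2
g = lambda x: x + 1         # g(x) = x + 1

g_of_f = lambda x: g(f(x))  # (g ∘ f)(x) = x^2 + 1
f_of_g = lambda x: f(g(x))  # (f ∘ g)(x) = (x + 1)^2
```

At x = 2, g ∘ f gives 2² + 1 = 5 while f ∘ g gives (2 + 1)² = 9, so the two compositions differ.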

Theorem 8.3. Let f : A → B, and g : B → C.

(a) If f and g are surjective, then g ◦ f is surjective,

(b) If f and g are injective, then g ◦ f is injective,

(c) If f and g are bijective, then g ◦ f is bijective.

Proof. (a) Since g is surjective, the range of g is C. That is, for any element c ∈ C, there exists an element b ∈ B such that g (b) = c. Since f is also surjective, there exists an element a ∈ A such that f (a) = b. But then

(g ◦ f )(a) = g( f (a)) = g (b) = c.

So, (g ◦ f ) is surjective.

(b) Since g is injective, for all b and b′ in B, if g(b) = g(b′) = c ∈ C then b = b′; and since f is injective, for all a and a′ in A, if f (a) = f (a′) = b ∈ B then a = a′. Then

(g ◦ f )(a) = (g ◦ f )(a′)

⇒ g( f (a)) = g( f (a′))

⇒ f (a) = f (a′)

⇒ a = a′.

So, (g ◦ f ) is injective.

(c) Proof of this result follows from (a) and (b). □


8.4. Continuous Functions

Definition 8.6. The real number L is the limit of the function f : D → R at the point c ∈ D if and only if for each ε > 0, there exists a δ > 0 such that | f (x) − L| < ε whenever x ∈ D and 0 < |x − c| < δ.

Definition 8.7. A function f : D → R is continuous at x0 ∈ D if

(8.6) ∀ε > 0, ∃δ > 0 such that d (x, x0) < δ ⇒ d ( f (x), f (x0)) < ε.

A function f : D → R is continuous if it is continuous at all x0 ∈ D.

It is easy to draw examples of functions which are not continuous. An intuitive way of understanding continuity of a function is that we should be able to draw its graph without lifting the pencil from the paper. If a function has a point of discontinuity, say x0, then the function attains different values as we approach x0 from the left-hand side and from the right-hand side.

For a function to be continuous at x0, both the left-hand and right-hand limits must exist and converge to the function value:
\[
(8.7) \quad \lim_{x \to x_0^-} f (x) = \lim_{x \to x_0^+} f (x) = f (x_0).
\]

Theorem 8.4. A function f : D → R is continuous if and only if for every convergent sequence of points {xₙ} ∈ D with limit x ∈ D, the sequence f (xₙ) → f (x).

Example 8.4. If
\[
\lim_{x \to x_0^-} f (x) = \lim_{x \to x_0^+} f (x) \neq f (x_0)
\]
then the function is not continuous. Take
\[
y = \begin{cases} x & \text{for } 0 \le x < \tfrac{1}{2} \\ 0 & \text{for } x = \tfrac{1}{2} \\ 1 - x & \text{for } \tfrac{1}{2} < x \le 1 \end{cases}
\]

Definition 8.8. Given f : D → R, let A ⊆ R be any subset of the range. The inverse image of A under f , f⁻¹(A), is the set of points x in the domain D such that f (x) ∈ A:

(8.8) f⁻¹(A) = {x ∈ D | f (x) ∈ A}.

We give two more theorems on continuity of functions.

Theorem 8.5. A function f : D → R is continuous if and only if the inverse image of every open

set is open.

102 8. Single and Multivariable Calculus

Proof. Suppose f is continuous on D and V is an open set in R. We have to show that f⁻¹(V ) is open in D (i.e., every point of f⁻¹(V ) is an interior point of f⁻¹(V )). Let p ∈ D with f (p) ∈ V. Since V is open, there exists ε > 0 such that y ∈ V whenever d( f (p), y) < ε. Also, since f is continuous at p, there exists δ > 0 such that d( f (p), f (x)) < ε whenever d(p, x) < δ. Thus x ∈ f⁻¹(V ) as soon as d(p, x) < δ, and hence f⁻¹(V ) is open.

Conversely, assume that f⁻¹(V ) is open in D for every open set V in R. Fix p ∈ D and ε > 0, and let V be the set of all y ∈ R such that d( f (p), y) < ε. Then V is open and hence f⁻¹(V ) is open, so there exists δ > 0 such that x ∈ f⁻¹(V ) as soon as d(p, x) < δ. But if x ∈ f⁻¹(V ), then f (x) ∈ V, and so d( f (p), f (x)) < ε. Hence f is continuous at p. □

The next theorem (stated without proof) considers the inverse image of the closed subsets of the range R to characterize continuous functions.

Theorem 8.6. A function f : D → R is continuous if and only if the inverse image of every closed

set is closed.

This follows from Theorem 8.5, since a set is closed if and only if its complement is open, and since f⁻¹(Vᶜ) = [ f⁻¹(V )]ᶜ for every V ⊂ R.

8.4.1. Properties of Continuous Functions.

Claim 8.1. If f and g are continuous functions, then

f ± g,

f · g,

f / g (if g ̸= 0),

max { f , g},

min { f , g}

are continuous.

Claim 8.2. If f is a continuous function of two variables f (x1, x2), then the functions of one variable obtained by holding the other variable constant, f (·, x̄2) and f (x̄1, ·), are also continuous.

Theorem 8.7. Intermediate Value Theorem for continuous functions: Let f be a continuous func-

tion on a domain containing [a, b], with say f (a) < f (b). Then for any y in between, f (a) < y <

f (b), there exists c in (a, b) with f (c) = y.

We can apply the Intermediate Value Theorem to prove the existence of a fixed point for the following function.

Theorem 8.8. Consider a continuous function f : [0, 1] → [0, 1]. Then there exists c ∈ [0, 1] such

that f (c) = c.


Figure 8.1. Intermediate Value Theorem

Proof. Define a function g(x) = f (x) − x. It is continuous since it is the sum of two continuous functions, f (x) and −x. If f (0) = 0, then x = 0 is a fixed point. If not, then f (0) > 0, so g(0) > 0. If f (1) = 1, then x = 1 is a fixed point. If not, then f (1) < 1, so g(1) < 0.

Now we apply the Intermediate Value Theorem to conclude that there exists a point c ∈ [0, 1] such that g(c) = 0. This implies f (c) − c = 0, i.e., f (c) = c, so c is a fixed point. □
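The proof is constructive enough to implement: bisect on g(x) = f (x) − x. A sketch using f (x) = cos x, which maps [0, 1] into itself (the iteration count and test function are my own choices, not from the text):

```python
import math

def fixed_point(f, lo=0.0, hi=1.0):
    """Bisection on g(x) = f(x) - x, assuming g(lo) >= 0 >= g(hi)."""
    g = lambda x: f(x) - x
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid      # the sign change lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

c = fixed_point(math.cos)  # the unique solution of cos(c) = c in [0, 1]
```

Here g(0) = cos 0 − 0 = 1 > 0 and g(1) = cos 1 − 1 < 0, so the Intermediate Value Theorem guarantees the root that the bisection converges to.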

8.5. Extreme Values

Definition 8.9. The function f : D → R attains a local maximum at x0 if there exists a neighborhood of x0 such that f (x) ≤ f (x0) for all x in the neighborhood.


Definition 8.10. The function f : D → R attains a strict local maximum at x0 if there exists a neighborhood of x0 such that f (x) < f (x0) for all x not equal to x0 in the neighborhood.

Local minima are defined by reversing the inequalities.

Definition 8.11. The function f : D → R attains a global maximum at x0 if f (x) ≤ f (x0), ∀x ∈ D.

Definition 8.12. The function f : D → R attains a strict global maximum at x0 if f (x) < f (x0), ∀x ∈ D \ {x0}.

Global minima are defined by reversing the inequalities.

Remark 8.2. A global maximum (minimum) is also a local maximum (minimum).

Theorem 8.9. Weierstrass Theorem: Suppose D is a non-empty closed and bounded subset of Rⁿ. If f : D → R is continuous on D, then there exist x^* and x_* in D such that

(8.9) f (x^*) ≥ f (x) ≥ f (x_*), ∀ x ∈ D.

Proof. We first claim that the function f is bounded on the domain D. If not, then there exists a sequence {xₙ}ₙ₌₁^∞ in D such that f (xₙ) → ∞ as n → ∞. Since D is compact, there exists a subsequence {yₙ}ₙ₌₁^∞ of {xₙ}ₙ₌₁^∞ which converges to ȳ in D. Since {yₙ}ₙ₌₁^∞ is a subsequence of {xₙ}ₙ₌₁^∞ and f (xₙ) → ∞, it must be true that f (yₙ) → ∞. However, since {yₙ}ₙ₌₁^∞ converges to ȳ and f is a continuous function, f (yₙ) must converge to the finite real number f (ȳ). These two observations lead to a contradiction. Thus we have proved the claim.

To prove the theorem, we again argue by contradiction and assume that f does not attain its maximum value in D. Since f is bounded on D, let M be the least upper bound of the values f takes in D. Clearly M is finite. Also, there exists a sequence {zₙ}ₙ₌₁^∞ in D such that f (zₙ) → M. Note that even though f (zₙ) approaches the least upper bound M as n → ∞, the sequence {zₙ}ₙ₌₁^∞ itself need not converge. Since D is compact, there exists a subsequence {uₙ}ₙ₌₁^∞ of {zₙ}ₙ₌₁^∞ which converges to ū in D. Since f is a continuous function, f (uₙ) must converge to the finite real number f (ū). Since a convergent sequence has only one limit, f (ū) = M, and ū is the point of global maximum of f in D. □

This is the theorem we will be using to show the existence of optimal bundles for consumers and producers. So we need to understand it and be comfortable with using it.

The following examples show why the function domain must be closed and bounded in order for the theorem to apply. In each of the following examples, the function fails to attain a maximum on the given interval.


(a) f (x) = x defined over [0, ∞) (domain unbounded) is not bounded from above.

(b) f (x) = x/(1 + x) defined over [0, ∞) (domain unbounded) is bounded but does not attain its least upper bound, i.e., 1.

(c) f (x) = 1/x defined over (0, 1] (domain bounded but not closed) is not bounded from above.

(d) f (x) = 1 − x defined over (0, 1] (domain bounded but not closed) is bounded but never attains its least upper bound, i.e., 1.

(e) Extending the functions in the last two examples to [0, 1] by defining f (0) = 0 shows that a closed and bounded domain is not enough: both functions still fail to attain a maximum, so continuity on [a, b] is also required.
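As a quick numerical illustration of case (b) (an added sketch, not part of the original notes), one can tabulate f (x) = x/(1 + x) on an unbounded domain and watch it approach, but never reach, its least upper bound 1:

```python
# Illustration of example (b): f(x) = x/(1+x) on [0, infinity)
# is bounded above by 1 but never attains that bound.
def f(x):
    return x / (1 + x)

xs = [10.0 ** k for k in range(7)]        # 1, 10, ..., 10^6
values = [f(x) for x in xs]

sup_estimate = max(values)                # creeps toward 1 ...
attained = any(v == 1.0 for v in values)  # ... but 1 is never attained
```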

8.6. An application of Extreme Values Theorem

Result 8.1. Equivalence of norms in a finite dimensional vector space.

If we are given two norms ∥·∥a and ∥·∥b on some finite-dimensional vector space V over R, a very useful fact is that they are always within a constant factor of one another. In other words, there exists a pair of real numbers 0 < C1 ≤ C2 such that, for all x ∈ V , the following inequality holds:

C1 ∥x∥b ≤ ∥x∥a ≤ C2 ∥x∥b .

Note that any finite-dimensional vector space, by definition, is spanned by a basis e1 , e2 , · · · , en , where n is the dimension of the vector space. (The basis is often chosen to be orthonormal if we have an inner product.) That is, any vector x can be written

x = ∑_{i=1}^{n} αi ei ,

where the αi are some real numbers depending on x.

Now, we can prove equivalence of norms in four steps, the last of which requires an application of the Extreme Value Theorem.

Step 1 It is sufficient to consider ∥·∥b = ∥·∥1 (a transitivity property for norms holds).

First, let us define the taxi-cab norm by

∥x∥1 = ∑_{i=1}^{n} |αi | .


We have seen earlier in a problem set that this is indeed a norm. The linear independence of any basis {ei} implies that x ̸= 0 ⇐⇒ |α j | > 0 for some j ⇐⇒ ∥x∥1 > 0. The triangle inequality and the scaling property are obvious, following from the corresponding properties of absolute values of real numbers.

We will show that it is sufficient for us to prove that ∥·∥a is equivalent to ∥·∥1 , because norm equivalence is transitive: if two norms are equivalent to ∥·∥1 , then they are equivalent to each other.

In particular, suppose both ∥·∥a and ∥·∥a′ are equivalent to ∥·∥1 for constants 0 < C1 ≤ C2 and 0 < C1′ ≤ C2′ , respectively:

C1 ∥x∥1 ≤ ∥x∥a ≤ C2 ∥x∥1 ,
C1′ ∥x∥1 ≤ ∥x∥a′ ≤ C2′ ∥x∥1 .

Then it immediately follows that

(C1′ /C2 ) ∥x∥a ≤ ∥x∥a′ ≤ (C2′ /C1 ) ∥x∥a ,

and hence ∥·∥a and ∥·∥a′ are equivalent.

Step 2 It is sufficient to consider only x with ∥x∥1 = 1.

We want to show that

C1 ∥x∥1 ≤ ∥x∥a ≤ C2 ∥x∥1

is true for all x ∈ V for some C1 , C2 . It is trivially true for x = 0, so we need only consider x ̸= 0, in which case we can divide by ∥x∥1 to obtain the condition

C1 ≤ ∥ x/∥x∥1 ∥a ≤ C2 ,

where u ≡ x/∥x∥1 has norm ∥u∥1 = 1.

Step 3 Any norm ∥·∥a is continuous under ∥·∥1 .

We wish to show that any norm ∥·∥a is a continuous function on V under the topology induced by the norm ∥·∥1 . That is, we wish to show that for any ε > 0, there exists a δ > 0 such that

∥x − x′ ∥1 < δ ⇒ |∥x∥a − ∥x′ ∥a | < ε.

We prove this in two steps. First, by the triangle inequality on ∥·∥a , it follows that

∥x∥a − ∥x′ ∥a = ∥x′ + (x − x′ )∥a − ∥x′ ∥a ≤ ∥x − x′ ∥a ,

and

∥x′ ∥a − ∥x∥a = ∥x − (x − x′ )∥a − ∥x∥a ≤ ∥x − x′ ∥a ,

and therefore

|∥x∥a − ∥x′ ∥a | ≤ ∥x − x′ ∥a .


Second, applying the triangle inequality again, and writing x = ∑_{i=1}^{n} αi ei and x′ = ∑_{i=1}^{n} α′i ei , we obtain

∥x − x′ ∥a ≤ ∑_{i=1}^{n} |αi − α′i | ∥ei ∥a ≤ ∥x − x′ ∥1 (max_i ∥ei ∥a ).

Therefore, if we choose

δ = ε / (max_i ∥ei ∥a ),

it immediately follows that

∥x − x′ ∥1 < δ ⇒ |∥x∥a − ∥x′ ∥a | < ε.

Step 4 The maximum and minimum of ∥x∥a on the unit sphere.

Now we have a continuous function (the norm ∥·∥a ) on a compact (closed and bounded) non-empty domain, the unit sphere, and can apply the Weierstrass Theorem. By the extreme value theorem, the function must achieve a maximum and minimum value on the set (it cannot merely approach them). Let

C1 = min_{∥u∥1 =1} ∥u∥a , and C2 = max_{∥u∥1 =1} ∥u∥a .

Since u ̸= 0 for ∥u∥1 = 1, it follows that C2 ≥ C1 > 0, and C1 ≤ ∥u∥a ≤ C2 as required in Step 2. This completes the proof.
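The four steps can be mirrored numerically. The sketch below (an added illustration, not part of the original proof) takes ∥·∥a to be the Euclidean norm and ∥·∥1 the taxi-cab norm on R2, samples the unit sphere {u : ∥u∥1 = 1}, and recovers the constants C1 = 1/√2 and C2 = 1 of Step 4:

```python
import math

# Estimate the constants C1, C2 of Step 4 for norm_a = Euclidean norm and
# norm_1 = taxi-cab norm on R^2, by sampling the unit sphere {u : ||u||_1 = 1}.
def norm1(v):
    return sum(abs(c) for c in v)

def norm2(v):
    return math.sqrt(sum(c * c for c in v))

# parametrize the l1 unit sphere in R^2: |u1| + |u2| = 1
samples = []
for k in range(2000):
    t = k / 2000.0
    for sx in (1, -1):
        for sy in (1, -1):
            samples.append((sx * t, sy * (1 - t)))

ratios = [norm2(u) for u in samples]   # ||u||_a with ||u||_1 = 1
C1, C2 = min(ratios), max(ratios)      # -> 1/sqrt(2) and 1
```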

8.7. Differentiability

Definition 8.13. A function f : R → R is differentiable at x0 ∈ R if

(8.10) lim_{h→0} [ f (x0 + h) − f (x0 )] / h exists.

If this limit exists, we call it the derivative of f at x0 and denote it by f ′ (x0 ) or d f (x)/dx |_{x=x0} .

We follow the steps listed below to determine whether a derivative exists and, if yes, its value.

(a) ∆ f = f (x0 + h) − f (x0 ) is the change in functional value.

(b) The slope of the secant is ∆ f /h = [ f (x0 + h) − f (x0 )] / h.

(c) If the secant slope ∆ f /h has a limit as h → 0, then f is differentiable at x0 , and the derivative is equal to this limit.


We can see that the derivative is equal to the slope of the tangent to the graph at x0 . Note that the tangent can be used to approximate the function in the neighborhood of x0 :

f (x0 + h) ≈ f (x0 ) + h · f ′ (x0 ) .

It is the best linear approximation.

Definition 8.14. A function f : R → R is differentiable on a set S ⊆ R, if it is differentiable at each

point x ∈ S. It is called differentiable if it is differentiable at each point of the domain.

Example 8.5. Let f : R → R be f (x) = x². This function is differentiable at all x ∈ R.

αsec = [ f (x0 + h) − f (x0 )] / h = [(x0 + h)² − x0²] / h = [x0² + 2x0 h + h² − x0²] / h = (2x0 h + h²) / h = 2x0 + h,

lim_{h→0} αsec = 2x0 ⇒ f ′ (x0 ) = 2x0 .
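A numerical sketch (not in the original) of Example 8.5: the secant slope equals 2x0 + h exactly, so it converges to f ′(x0) = 2x0 as h shrinks:

```python
# Secant slopes of f(x) = x^2 at x0 = 3 approach f'(3) = 6;
# the error of the secant slope is exactly h.
def secant_slope(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x * x
x0 = 3.0
slopes = [secant_slope(f, x0, 10.0 ** (-k)) for k in range(1, 7)]
final_error = abs(slopes[-1] - 2 * x0)
```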

Definition 8.15. Second derivative: Let the function f : R → R be differentiable with f ′ (·) denoting its first derivative. If f ′ (·) is differentiable, its derivative is denoted by f ′′ (·) and is called the second derivative of f .

Definition 8.16. A function whose derivative exists and is continuous is called continuously differentiable or of class C 1 . A function whose second derivative exists and is continuous is called twice continuously differentiable or of class C 2 .

Result 8.2. If function f : R → R is differentiable at x0 then it is continuous at x0 .

Proof. Since f : R → R is differentiable at x0 ∈ R, the limit

lim_{h→0} [ f (x0 + h) − f (x0 )] / h

exists and equals f ′ (x0 ). Consider

lim_{x→x0} [ f (x) − f (x0 )] = lim_{x→x0} [x − x0 ] · [ f (x) − f (x0 )] / (x − x0 )
= lim_{x→x0} [x − x0 ] · lim_{x→x0} [ f (x) − f (x0 )] / (x − x0 )
= 0 · f ′ (x0 ) = 0,

so lim_{x→x0} f (x) = f (x0 ). Hence f is continuous at x0 .


Figure 8.2. Continuity does not imply differentiability: f (x) = |x| is not differentiable at x = 0.

Note that this claim does not hold in the other direction: not all continuous functions are differentiable. Consider the example of the absolute value function f : R −→ R defined by

f (x) = |x| .

The absolute value |x| of x is defined by

|x| = { x if x ≥ 0; −x if x < 0. }

It is easy to check that f is continuous on R. However, it is not differentiable at x0 = 0 (please verify).

8.7.1. Rules of Differentiation.

Theorem 8.10. If f and g are differentiable functions, then

(8.11) f ± g is differentiable with ( f ± g)′ (x) = f ′ (x) ± g′ (x),

(8.12) f · g is differentiable with ( f · g)′ (x) = f ′ (x) g (x) + f (x) g′ (x),

(8.13) if g (x) ̸= 0, then f /g is differentiable with ( f /g)′ (x) = [ f ′ (x) g (x) − f (x) g′ (x)] / [g (x)]².
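The product and quotient rules (8.12) and (8.13) can be sanity-checked numerically (an added sketch; the choice f (x) = x², g (x) = eˣ is ours, not the text's):

```python
import math

# Check (f*g)' = f'g + fg' and (f/g)' = (f'g - fg')/g^2 against
# central finite differences, for f(x) = x^2 and g(x) = exp(x) at x = 1.
f, fp = lambda x: x * x, lambda x: 2 * x
g, gp = math.exp, math.exp

def cdiff(F, x, h=1e-6):
    return (F(x + h) - F(x - h)) / (2 * h)

x = 1.0
prod_rule = fp(x) * g(x) + f(x) * gp(x)                 # (f*g)'(1) = 3e
quot_rule = (fp(x) * g(x) - f(x) * gp(x)) / g(x) ** 2   # (f/g)'(1) = 1/e

prod_numeric = cdiff(lambda t: f(t) * g(t), x)
quot_numeric = cdiff(lambda t: f(t) / g(t), x)
```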

Figure 8.3. Graph of x³

Theorem 8.11 (Chain Rule). If f and g are differentiable, then

(8.14) f ◦ g is differentiable with ( f ◦ g)′ (x) = f ′ (g (x)) · g′ (x).

Example 8.6. Let f (y) = ln y and g (x) = x². Then ( f ◦ g) (x) = ln(x²) and

( f ◦ g)′ (x) = (1/x²) · 2x = 2/x.

Theorem 8.12. If f is differentiable and has a local maximum or minimum at x0 , then f ′ (x0 ) = 0.

Note that the converse is not true. Take f (x) = x³ (see Figure 8.3). Its first derivative is zero at x0 = 0, which is a point of inflection, not an extremum.


Figure 8.4. f (x) = sin(1/x)

The following two examples illustrate the difference between differentiability and continuous differentiability of a function.

Example 8.7. Let f be defined by

f (x) = { x sin(1/x) for x ̸= 0; 0 for x = 0. }

We know that the derivative of sin (x) is cos (x). Using it, for x ̸= 0,

f ′ (x) = sin(1/x) + x cos(1/x) · (−1/x²) = sin(1/x) − (1/x) cos(1/x).

At x = 0 this does not work, as 1/x is not defined there. We use the definition: for h ̸= 0, the secant is

[ f (h) − f (0)] / h = [h sin(1/h) − 0] / h = sin(1/h).

As h → 0, sin(1/h) does not tend to any limit, so f ′ (0) does not exist.

Example 8.8. Let f be defined by

f (x) = { x² sin(1/x) for x ̸= 0; 0 for x = 0. }

We know that the derivative of sin (x) is cos (x). Using it, for x ̸= 0,

f ′ (x) = 2x sin(1/x) + x² cos(1/x) · (−1/x²) = 2x sin(1/x) − cos(1/x).

At x = 0, we use the definition as before: for h ̸= 0, the secant is

[ f (h) − f (0)] / h = [h² sin(1/h) − 0] / h = h sin(1/h),

and

| [ f (h) − f (0)] / h | = |h sin(1/h)| ≤ |h| .

As h → 0, we see that f ′ (0) = 0. Thus f (x) is differentiable everywhere, but f ′ (x) is not continuous at 0, as cos(1/x) does not tend to a limit as x → 0.
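A numerical sketch (not part of the original text) of Example 8.8: the secant slopes at 0 are h sin(1/h), which shrink to 0, while f ′(x) = 2x sin(1/x) − cos(1/x) keeps oscillating between values near −1 and +1 arbitrarily close to 0:

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Secant slopes at 0: (f(h) - f(0))/h = h*sin(1/h) -> 0, so f'(0) = 0.
secants = [f(h) / h for h in (1e-2, 1e-4, 1e-6)]

def fprime(x):  # formula for f', valid only for x != 0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# At x = 1/(k*pi), f'(x) is approximately -cos(k*pi) = -(-1)^k:
# the derivative oscillates and has no limit as x -> 0.
osc = [fprime(1.0 / (k * math.pi)) for k in (1000, 1001)]
```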

8.7.2. L’Hospital’s Rule. Sometimes we need to determine the value of a function where both the numerator and the denominator go to zero. We use L’Hospital’s rule in such cases. If f (a) = g (a) = 0 and g′ (a) ̸= 0, then

lim_{x→a} f (x) /g (x) = f ′ (a) /g′ (a) .

Example 8.9. Find lim_{x→4} (x² − 16) / (4√x − 8).

f (x) = x² − 16, g (x) = 4√x − 8,
f (4) = g (4) = 0, f ′ (x) = 2x, g′ (x) = 2/√x.

Then

lim_{x→4} (x² − 16) / (4√x − 8) = f ′ (4) /g′ (4) = 8/1 = 8.
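A numeric check (an added sketch) of Example 8.9: evaluating the ratio at points approaching 4 reproduces the limit 8:

```python
import math

# (x^2 - 16)/(4*sqrt(x) - 8) should tend to 8 as x -> 4.
def ratio(x):
    return (x * x - 16) / (4 * math.sqrt(x) - 8)

approach = [ratio(4 + 10.0 ** (-k)) for k in range(1, 6)]
limit_estimate = approach[-1]
```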

8.8. Monotone Functions

Definition 8.17. Function f is monotone increasing at x0 if there exists a neighborhood of x0 such that

f (x1 ) ≤ f (x0 ) ≤ f (x2 )

for all x1 , x2 in the neighborhood satisfying x1 < x0 < x2 .


Definition 8.18. Function f is strictly increasing at x0 if there exists a neighborhood of x0 such that

f (x1 ) < f (x0 ) < f (x2 )

for all x1 , x2 in the neighborhood satisfying x1 < x0 < x2 .

Definition 8.19. Function f is monotone increasing on an interval if for all points x1 , x2 in the interval satisfying x1 < x2 ,

f (x1 ) ≤ f (x2 ) .

Definition 8.20. Function f is strictly increasing on an interval if for all points x1 , x2 in the interval satisfying x1 < x2 ,

f (x1 ) < f (x2 ) .

We define monotone and strictly decreasing functions in the same way by reversing the inequalities. Some properties of derivatives of monotone functions are:

(8.15) f ′ (x0 ) > 0 (< 0) ⇒ f is strictly increasing (strictly decreasing) at x0 .

(8.16) f ′ (x0 ) ≥ 0 (≤ 0) ⇔ f is monotone increasing (monotone decreasing) at x0 .

Theorem 8.13 (Mean Value Theorem). Let f be a continuous function on the compact interval [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) where

f ′ (c) = [ f (b) − f (a)] / (b − a) .

The following claim is helpful in proving the Mean Value Theorem. The proof of the claim relies on the Weierstrass Theorem, and is thus another example of an application of the Weierstrass Theorem.

Claim 8.3. Let f (·) and g(·) be continuous functions on [a, b] and differentiable on (a, b). Then there exists x ∈ (a, b) such that

[ f (b) − f (a)]g′ (x) = [g(b) − g(a)] f ′ (x).

Proof. Define

h(s) = [ f (b) − f (a)]g(s) − [g(b) − g(a)] f (s).

Then it is easy to check that h(a) = f (b)g(a) − f (a)g(b) = h(b). We need to show that h′ (x) = 0 for some x ∈ (a, b). If h(·) is a constant function, then h′ (x) = 0 for every point in (a, b). If not, consider, without loss of generality, h(x) > h(a) for some x ∈ (a, b). Since h(·) is a continuous function defined on a compact domain [a, b], the Weierstrass Theorem can be applied: h attains a maximum on [a, b], and since h(x) > h(a) = h(b), the maximum cannot be at an endpoint, so it is attained at some interior point s ∈ (a, b). Also, since h(·) is differentiable on (a, b) and attains its


Figure 8.5. Mean Value Theorem: f ′ (c) = [ f (b) − f (a)] / (b − a)

maximum at s ∈ (a, b), we have h′ (s) = 0. The case where h(x) < h(a) for some x ∈ (a, b) can be proved in a similar manner; in this case, the function h(·) will attain a minimum at some interior point.

To prove the Mean Value Theorem, we consider g(x) = x. Then g′ (x) = 1, and the claim gives

[ f (b) − f (a)] (1) = [b − a] f ′ (x) , or f ′ (x) = [ f (b) − f (a)] / (b − a) ,

for some x ∈ (a, b).
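A numerical sketch (added; not in the original notes) locating the Mean Value Theorem point for f (x) = x³ on [0, 2]: the chord slope is 4, and bisection on f ′(c) = 3c² recovers c = 2/√3:

```python
# Locate the MVT point c for f(x) = x^3 on [a, b] = [0, 2], where
# f'(c) must equal (f(b) - f(a))/(b - a) = 4. Solve 3c^2 = 4 by bisection.
def fprime(x):
    return 3 * x * x

target = (2.0 ** 3 - 0.0) / (2.0 - 0.0)   # slope of the chord = 4

lo, hi = 0.0, 2.0
for _ in range(60):                       # bisection on the increasing fprime
    mid = (lo + hi) / 2
    if fprime(mid) < target:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2                          # approximately 2/sqrt(3)
```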

Theorem 8.14 (Darboux’s Theorem; Intermediate Value Theorem for derivatives). If f is differentiable on (a, b), then its derivative has the intermediate value property: if x1 < x2 are any two points in the interval (a, b), and y lies between f ′ (x1 ) and f ′ (x2 ), then there exists a number x in the interval [x1 , x2 ] such that f ′ (x) = y.

Proof (using the Weierstrass Theorem). Assume y lies strictly between f ′ (x1 ) and f ′ (x2 ). Define a function g : (a, b) → R by

g(t) = f (t) − yt.

Then g′ (x1 ) = f ′ (x1 ) − y and g′ (x2 ) = f ′ (x2 ) − y, so either (i) g′ (x1 ) > 0 and g′ (x2 ) < 0, or (ii) g′ (x1 ) < 0 and g′ (x2 ) > 0. Take the first case, i.e., g′ (x1 ) > 0 and g′ (x2 ) < 0. It is clear that neither x1 nor x2 can be a point where g attains even a local maximum. Since g is a continuous function, it must therefore attain its maximum at an interior point x of the closed and bounded interval [x1 , x2 ] by the Weierstrass Theorem. So we conclude that

0 = g′ (x) = f ′ (x) − y, or f ′ (x) = y.

Alternate proof using the Mean Value Theorem [see ?]. We can clearly assume that y lies strictly between f ′ (x1 ) and f ′ (x2 ). Define continuous functions fx1 , fx2 : [a, b] → R by

fx1 (t) = { f ′ (x1 ) for t = x1 ; [ f (x1 ) − f (t)] / (x1 − t) for t ̸= x1 }

and

fx2 (t) = { f ′ (x2 ) for t = x2 ; [ f (t) − f (x2 )] / (t − x2 ) for t ̸= x2 . }

Observe that fx1 (x1 ) = f ′ (x1 ), fx2 (x2 ) = f ′ (x2 ) and fx1 (x2 ) = fx2 (x1 ). Hence, y lies between fx1 (x1 ) and fx1 (x2 ), or y lies between fx2 (x1 ) and fx2 (x2 ). If y lies between fx1 (x1 ) and fx1 (x2 ), then (by continuity of fx1 ) there exists s in (x1 , x2 ] with

y = fx1 (s) = [ f (s) − f (x1 )] / (s − x1 ) .

Then by the Mean Value Theorem there exists x ∈ [x1 , s] such that

y = [ f (s) − f (x1 )] / (s − x1 ) = f ′ (x).

Similarly, if y lies between fx2 (x1 ) and fx2 (x2 ), then (by continuity of fx2 ) there exists s in [x1 , x2 ) and x ∈ [s, x2 ] such that

y = [ f (x2 ) − f (s)] / (x2 − s) = f ′ (x).

8.9. Functions of Several Variables

Let f : D → R, where D ⊆ Rn , be a function of n variables:

(8.17) f (x) = f (x1 , x2 , · · · , xn ) .

Examples of such functions are utility functions for several goods, production functions for many inputs, etc.


Definition 8.21. The function f (x) is differentiable at the point x if there exists an n dimensional vector D f (x), called the differential or total derivative of f at x, such that

∀ε > 0, ∃δ > 0 such that ∥x − y∥ < δ ⇒ | f (x) − f (y) − D f (x) · (x − y)| < ε · ∥x − y∥ .

8.9.1. Partial Derivative. To us the more important concept is that of the partial derivative, which we define now.

Definition 8.22. Let f : D → R, where D ⊆ Rn , be a function of n variables. If the limit

lim_{h→0} [ f (x1 , · · · , xi + h, · · · , xn ) − f (x1 , · · · , xi , · · · , xn )] / h

exists, it is called the ith (first order) partial derivative of f at x and is denoted by ∂ f (x) /∂xi or fi (x).

The function f (x) is then said to be partially differentiable with respect to xi . The function f (x) is said to be partially differentiable if it is partially differentiable with respect to every xi .

Note ∂ f (x) /∂xi is the derivative of f (x1 , · · · , xn ) with respect to xi holding all other variables constant. When all the partial derivatives exist, the vector of partial derivatives

∇ f (x) = [ ∂ f (x) /∂x1 , · · · , ∂ f (x) /∂xn ]

is called the Jacobian vector or the gradient vector. For functions of one variable, ∇ f (x) = f ′ (x).

Result 8.3. If a function is differentiable at x0 then it is partially differentiable at x0 .

However, the existence of all the partial derivatives does not guarantee even the continuity of the function, as the following example shows.

Example 8.10. Let f (x, y) be defined as

f (x, y) = { xy / (x² + y²) if (x, y) ̸= (0, 0); 0 otherwise. }

We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f is not continuous at (0, 0).

If f is a real valued function defined on an open set D in Rn , and the partial derivatives are bounded in D, then f is continuous on D.
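Example 8.10 can be probed numerically (an added sketch): both partials at the origin are 0, yet along the diagonal f (t, t) = 1/2, so f cannot be continuous at (0, 0):

```python
# Partials of f(x,y) = xy/(x^2+y^2) at (0,0) exist (both are 0),
# yet f(t,t) = 1/2 for every t != 0, so f is not continuous at the origin.
def f(x, y):
    return x * y / (x * x + y * y) if (x, y) != (0, 0) else 0.0

h = 1e-6
D1_at_origin = (f(h, 0) - f(0, 0)) / h     # f(h,0) = 0, so this is 0
D2_at_origin = (f(0, h) - f(0, 0)) / h     # also 0

diagonal_values = [f(t, t) for t in (1e-1, 1e-3, 1e-6)]   # all equal 1/2
```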

Example 8.11. Let f : R2 → R be

f (x1 , x2 ) = x1³ + 2x1 x2 + 3x2³ .

Then

∂ f (x) /∂x1 = 3x1² + 2x2 , ∂ f (x) /∂x2 = 2x1 + 9x2² ,

∇ f (x) = [ 3x1² + 2x2 , 2x1 + 9x2² ] , ∀x ∈ R2 .

For functions of one variable we saw earlier that we could approximate the function around a point by the tangent to the function at that point. We can do something similar in the case of functions of several variables. Instead of approximating by a line (the tangent), we now approximate by the tangent hyperplane.

Definition 8.23. Given f : D → R with gradient ∇ f (x0 ) at x0 , the tangent hyperplane to f at x0 is given by

f (x) = f (x0 ) + ∇ f (x0 ) · (x − x0 ) .

Note that in an n dimensional world, the tangent hyperplane is an (n − 1) dimensional object.

8.9.2. Second Order Partial Derivatives. Let us look at the example above again. For

f (x1 , x2 ) = x1³ + 2x1 x2 + 3x2³ ,

∂ f (x) /∂x1 = 3x1² + 2x2 and ∂ f (x) /∂x2 = 2x1 + 9x2² are differentiable functions of x1 and x2 themselves. When we take partial derivatives of these functions we get the second partial derivatives:

∂² f (x) /∂x1² = 6x1 , ∂² f (x) /∂x2² = 18x2 , ∂² f (x) /∂x1 ∂x2 = ∂² f (x) /∂x2 ∂x1 = 2.

This example can be generalized.

Definition 8.24. Let f : Rn → R be twice differentiable. For each of the n partial derivatives, we get n partial derivatives of second order,

∂/∂x j (∂ f (x) /∂xi ) = ∂² f (x) /∂x j ∂xi = fi j (x) .

We organize the second order derivatives in a matrix, called the Hessian Matrix:

(8.18) H f (x) = D² f (x) =
[ ∂² f (x)/∂x1²      ∂² f (x)/∂x2 ∂x1   · · ·  ∂² f (x)/∂xn ∂x1 ]
[ ∂² f (x)/∂x1 ∂x2   ∂² f (x)/∂x2²      · · ·  ∂² f (x)/∂xn ∂x2 ]
[ · · ·              · · ·              · · ·  · · ·            ]
[ ∂² f (x)/∂x1 ∂xn   ∂² f (x)/∂x2 ∂xn   · · ·  ∂² f (x)/∂xn²    ]


If all the partial derivatives of the first order exist and are continuous, then f is called C 1 or continuously differentiable. If all the partial derivatives of second order exist and are continuous, then f is called C 2 or twice continuously differentiable, and so forth.

Theorem 8.15 (Young’s Theorem). If f is twice continuously differentiable, then

∂² f (x) /∂x j ∂xi = ∂² f (x) /∂xi ∂x j ,

i.e., the Hessian of f is a symmetric matrix.

Example 8.12. For the example above,

H f (x) =
[ 6x1   2    ]
[ 2     18x2 ]

The off-diagonal elements of the Hessian are also called cross-partials. For functions of one variable, H f (x) = f ′′ (x).

Example 8.13. Let f : R3 → R be

f (x) = 5x1² + x1 x2³ − x2² x3² + x3³ .

Then

∇ f (x) = [ 10x1 + x2³ ,  3x1 x2² − 2x2 x3² ,  −2x2² x3 + 3x3² ]

and

H f (x) =
[ 10     3x2²             0            ]
[ 3x2²   6x1 x2 − 2x3²    −4x2 x3      ]
[ 0      −4x2 x3          −2x2² + 6x3  ]
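The gradient in Example 8.13 can be checked numerically (an added sketch): at the point (1, 2, 3) the stated formula gives ∇f = (18, −24, 3), matching central finite differences:

```python
# Check the gradient of f(x) = 5*x1^2 + x1*x2^3 - x2^2*x3^2 + x3^3
# at the point (1, 2, 3) against central finite differences.
def f(x1, x2, x3):
    return 5 * x1**2 + x1 * x2**3 - x2**2 * x3**2 + x3**3

def grad_formula(x1, x2, x3):
    return (10 * x1 + x2**3,
            3 * x1 * x2**2 - 2 * x2 * x3**2,
            -2 * x2**2 * x3 + 3 * x3**2)

def grad_numeric(x1, x2, x3, h=1e-5):
    return ((f(x1 + h, x2, x3) - f(x1 - h, x2, x3)) / (2 * h),
            (f(x1, x2 + h, x3) - f(x1, x2 - h, x3)) / (2 * h),
            (f(x1, x2, x3 + h) - f(x1, x2, x3 - h)) / (2 * h))

point = (1.0, 2.0, 3.0)
exact = grad_formula(*point)       # (18, -24, 3)
approx = grad_numeric(*point)
```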

We now provide three very useful theorems on continuous and differentiable functions on convex sets in Rn for n ≥ 1. They are the Intermediate Value Theorem, the Mean Value Theorem and Taylor’s Theorem.

Theorem 8.16 (Intermediate Value Theorem). Suppose A is a convex subset of Rn , and f : A → R is a continuous function on A. Suppose x1 and x2 are in A, and f (x1 ) > f (x2 ). Then given any c ∈ R such that f (x1 ) > c > f (x2 ), there is 0 < θ < 1 such that f [θx1 + (1 − θ)x2 ] = c.

Example 8.14. Suppose X ≡ [a, b] is a closed interval in R (with a < b). Suppose f is a continuous function on X. By the Weierstrass Theorem, there will exist x1 and x2 in X such that f (x1 ) ≥ f (x) ≥ f (x2 ) for all x ∈ X. If f (x1 ) = f (x2 ) [this is the trivial case], then f (x) = f (x1 ) for all x ∈ X, and so f (X) is the single point f (x1 ). If f (x1 ) > f (x2 ), then using the fact that X is a convex set, we can conclude from the Intermediate Value Theorem that every value between f (x1 ) and f (x2 ) is attained by the function f at some point in X. This shows that f (X) is itself a closed interval.


Theorem 8.17 (Mean Value Theorem). Suppose A is an open convex subset of Rn , and f : A → R is continuously differentiable on A. Suppose x1 and x2 are in A. Then there is 0 ≤ θ ≤ 1 such that

f (x2 ) − f (x1 ) = (x2 − x1 )∇ f (θx1 + (1 − θ)x2 ) .

Example 8.15. Let f : R → R be a continuously differentiable function with the property that

f ′ (x) > 0 for all x ∈ R. Then given any x1 , x2 in R, with x2 > x1 we have by the Mean-Value

Theorem (since R is open and convex), the existence of 0 ≤ θ ≤ 1, such that

f (x2 ) − f (x1 ) = (x2 − x1 ) f ′ (θx1 + (1 − θ)x2 )

Now f ′ (θx1 + (1 − θ)x2 ) > 0 by assumption, and x2 > x1 by hypothesis. So f (x2 ) > f (x1 ). This

shows that f is an increasing function on R.

Observe that a function f : R → R can be increasing without satisfying f ′ (x) > 0 at all x ∈ R. For example, f (x) = x³ is increasing on R, but f ′ (0) = 0.

Theorem 8.18 (Taylor’s Expansion up to Second Order). Suppose A is an open, convex subset of Rn , and f : A → R is twice continuously differentiable on A. Suppose x1 and x2 are in A. Then there exists 0 ≤ θ ≤ 1, such that

f (x2 ) − f (x1 ) = (x2 − x1 )′ ∇ f (x1 ) + (1/2) (x2 − x1 )′ H f (θx1 + (1 − θ)x2 ) (x2 − x1 ) .
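A small added check of Theorem 8.18, under the assumption that f is quadratic (so the Hessian is constant and the expansion is exact for any θ). The example f (x, y) = x² + 3xy + 2y², with ∇f = (2x + 3y, 3x + 4y) and constant Hessian [[2, 3], [3, 4]], is ours, not the text's:

```python
# For a quadratic f, f(x2) - f(x1) equals the second-order Taylor expansion exactly.
def f(x, y):
    return x**2 + 3 * x * y + 2 * y**2

x1, x2 = (1.0, 1.0), (2.0, -1.0)
d = (x2[0] - x1[0], x2[1] - x1[1])                      # x2 - x1
grad_x1 = (2 * x1[0] + 3 * x1[1], 3 * x1[0] + 4 * x1[1])
H = ((2.0, 3.0), (3.0, 4.0))

linear = d[0] * grad_x1[0] + d[1] * grad_x1[1]
quad = 0.5 * (d[0] * (H[0][0] * d[0] + H[0][1] * d[1])
              + d[1] * (H[1][0] * d[0] + H[1][1] * d[1]))
lhs = f(*x2) - f(*x1)
rhs = linear + quad
```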

8.10. Composite Functions and the Chain Rule

Let h : A → Rm be a function with component functions hi : A → R (i = 1, · · · , m) which are defined on an open set A ⊂ Rn . Let f : B → R be a function defined on an open set B ⊂ Rm which contains the set h(A). Then, we can define F : A → R by F(x) ≡ f [h(x)] ≡ f [h1 (x), · · · , hm (x)] for each x ∈ A. This function is known as a composite function [of f and h].

The “Chain Rule” of differentiation provides us with a formula for finding the partial derivatives of the composite function F in terms of the partial derivatives of the individual functions f and h.

Theorem 8.19 (Chain Rule of differentiation). Let h : A → Rm be a function with component functions hi : A → R (i = 1, · · · , m) which are continuously differentiable on an open set A ⊂ Rn . Let f : B → R be a continuously differentiable function on an open set B ⊂ Rm which contains the set h(A). If F : A → R is defined by F(x) = f [h(x)] on A, and a ∈ A, then F is differentiable at a and we have, for i = 1, · · · , n,

Di F(a) = ∑_{j=1}^{m} D j f (h1 (a), · · · , hm (a)) Di h j (a) .


Example 8.16. Let m = 2, n = 1. Let h1 (x) = x³ on R, and h2 (x) = 10 + x on R; and let f (y1 , y2 ) = y1 + y2⁴ on R2 . Then

F(x) = f [h(x)] = f [h1 (x), h2 (x)] = h1 (x) + [h2 (x)]⁴ = x³ + (10 + x)⁴

is a composite function on R. If a ∈ R,

F ′ (a) = D1 F(a) = D1 f (h1 (a), h2 (a)) · D1 h1 (a) + D2 f (h1 (a), h2 (a)) · D1 h2 (a)
= 1 · (3a²) + 4(h2 (a))³ · 1 = 3a² + 4(10 + a)³ .

Example 8.17. Take m = 1, n = 2. Let h1 (x) = h1 (x1 , x2 ) = x1² + x2 on R2 ; f (y) = 2y on R. Then F(x) = F(x1 , x2 ) = f [h1 (x1 , x2 )] = 2[x1² + x2 ]. Then if a ∈ R2 ,

D1 F(a) = D1 f [h1 (a1 , a2 )] D1 h1 (a1 , a2 ) ,
D2 F(a) = D1 f [h1 (a1 , a2 )] D2 h1 (a1 , a2 ) .

Thus, D1 F(a) = 2(2a1 ) = 4a1 ; and D2 F(a) = 2(1) = 2.
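Example 8.16 can be verified numerically (an added sketch): the chain-rule formula F ′(a) = 3a² + 4(10 + a)³ agrees with a central-difference estimate at a = 2:

```python
# Chain rule check for F(x) = x^3 + (10 + x)^4, whose derivative by
# Example 8.16 is F'(a) = 3a^2 + 4(10 + a)^3.
def F(x):
    return x**3 + (10 + x) ** 4

a = 2.0
formula = 3 * a**2 + 4 * (10 + a) ** 3      # 12 + 6912 = 6924
h = 1e-5
numeric = (F(a + h) - F(a - h)) / (2 * h)   # central difference
```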

Chapter 9

Problem Set 4

(1) Find the derivative of the following functions from R → R:

(9.1) f (x) = [(2x + 1) / (x − 1)]^(1/2)

(9.2) f (x) = ln(3x² − 5x)

(2) Find the equation for the tangent to f (x) = 5x² + 3x − 2 at x = 2.

(3) Let f : R → R be

f (x) = { x² − 1, x ≤ 0; −x², x > 0 }

and g : R → R be

g (x) = { 3x − 2, x ≤ 2; −x + 6, x > 2. }

(a) Is f continuous at x = 0?

(b) Is g continuous at x = 2?

(4) Find

(9.3) lim_{x→0} f (x) /g (x) = lim_{x→0} [exp(x²) + exp(−x) − 2] / (2x) .


(5) Evaluate the Hessian of the function f : R2 → R,

f (x, y) = x² y + y² x − 2xy + 3x,

at the point (1, 2).

(6) Let f (x, y) be defined as

f (x, y) = { xy / (x² + y²) if (x, y) ̸= (0, 0); 0 otherwise. }

Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f is not continuous at (0, 0).

(7) This exercise gives an example of a function with D12 f (x, y) ̸= D21 f (x, y). Let f (x, y) be defined as

f (x, y) = { xy(x² − y²) / (x² + y²) if (x, y) ̸= (0, 0); 0 otherwise. }

Show the following.

(a) The partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point (x, y) ∈ R2 , and f is continuous on R2 .

(b) The partial derivatives D1 f (x, y) and D2 f (x, y) are continuous at every point in R2 .

(c) The second order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in R2 and are continuous everywhere in R2 except at (0, 0).

(d) D21 f (0, 0) = +1 and D12 f (0, 0) = −1.

Chapter 10

Convex Analysis

10.1. Concave, Convex Functions

Definition 10.1. Function f : D → R is concave if ∀x, y ∈ D, ∀λ ∈ [0, 1],

(10.1) λ f (x) + (1 − λ) f (y) ≤ f (λx + (1 − λ) y) .

Function f is strictly concave if the inequality is strict for all x ̸= y and all λ ∈ (0, 1).

The following theorem gives a characterization of concave functions.

Theorem 10.1. Suppose A is a convex subset of Rn and f is a real-valued function on A. Then f is a concave function if and only if the set

C ≡ {(x, α) ∈ A × R : f (x) ≥ α}

is a convex set in Rn+1 .

Proof. Let the function f be concave. Let (x1 , α1 ) ∈ C and (x2 , α2 ) ∈ C. Then f (x1 ) ≥ α1 and f (x2 ) ≥ α2 . Since f is concave, and x1 , x2 ∈ A, for every λ ∈ [0, 1],

f (λx1 + (1 − λ) x2 ) ≥ λ f (x1 ) + (1 − λ) f (x2 ) ≥ λα1 + (1 − λ) α2 ,

which implies (λx1 + (1 − λ) x2 , λα1 + (1 − λ) α2 ) ∈ C. Hence C is convex.


Figure 10.1. A concave function of one variable: f ′ (d) < [ f (d) − f (c)] / (d − c) < f ′ (c)

Next we assume C to be convex. Note that for x1 , x2 ∈ A, we have (x1 , f (x1 )) ∈ C and (x2 , f (x2 )) ∈ C. Since C is convex, for every λ ∈ [0, 1],

λ · (x1 , f (x1 )) + (1 − λ) · (x2 , f (x2 )) ∈ C.

This implies

f (λx1 + (1 − λ)x2 ) ≥ λ · f (x1 ) + (1 − λ) · f (x2 ),

or f is concave.

In general, a concave function on a convex set in Rn need not be continuous, as the following example shows.


Figure 10.2. A concave function need not be continuous

Example 10.1. Let

f (x) = { 1 + x for x > 0; 0 for x = 0. }

This function is concave but it is not continuous at x = 0.

However, if the set A is open and convex, then the concave function f is continuous on A.

If the function is continuously differentiable on an open convex set, then the following theorem characterizes concave functions.

Theorem 10.2. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable on A. Then f is concave on A if and only if

(10.2) f (x2 ) − f (x1 ) ≤ ∇ f (x1 ) (x2 − x1 )

whenever x1 and x2 are in A.

Proof. We assume f to be a concave function, and take x1 , x2 ∈ A. For λ ∈ (0, 1],

f (λx2 + (1 − λ)x1 ) ≥ λ · f (x2 ) + (1 − λ) · f (x1 ) = λ · [ f (x2 ) − f (x1 )] + f (x1 ).

Then

f (x1 + λ(x2 − x1 )) − f (x1 ) ≥ λ · [ f (x2 ) − f (x1 )] .

Dividing both sides by λ > 0, we get

[ f (x1 + λ(x2 − x1 )) − f (x1 )] / λ ≥ f (x2 ) − f (x1 ).

Taking λ → 0, we get

∇ f (x1 ) · (x2 − x1 ) ≥ f (x2 ) − f (x1 ),

which proves the inequality.

Next we assume (10.2) holds true for all x2 , x1 ∈ A. Then for any λ ∈ [0, 1], let x = λx2 + (1 − λ)x1 . Since A is convex, x ∈ A. Note

x2 − x = x2 − λx2 − (1 − λ)x1 = (1 − λ)(x2 − x1 ).

Also

x1 − x = x1 − λx2 − (1 − λ)x1 = −λ(x2 − x1 ).

Applying (10.2), we get

f (x2 ) − f (x) ≤ ∇ f (x) · (x2 − x) = ∇ f (x) · (1 − λ)(x2 − x1 ),

and

f (x1 ) − f (x) ≤ ∇ f (x) · (x1 − x) = ∇ f (x) · (−λ)(x2 − x1 ).

We multiply the first inequality by λ and the second inequality by 1 − λ and add to obtain

λ · f (x2 ) + (1 − λ) · f (x1 ) − f (x) ≤ 0,

which implies

λ · f (x2 ) + (1 − λ) · f (x1 ) ≤ f (x) = f (λx2 + (1 − λ)x1 ).

So f is concave.

Also, the function will be strictly concave if we change the weak inequality to a strict inequality.

Theorem 10.3. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable on A. Then f is strictly concave on A if and only if

f (x2 ) − f (x1 ) < ∇ f (x1 ) (x2 − x1 )

whenever x1 and x2 are distinct points in A.

Now we consider twice continuously differentiable functions. The following two theorems characterize concave and strictly concave functions.


Figure 10.3. Graph of −x⁴

Theorem 10.4. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously differentiable on A. Then f is concave on A if and only if H f (x) is negative semi-definite for all x ∈ A.

If H f (x) is negative definite whenever x ∈ A, then the function is strictly concave, but the converse is not true.

Theorem 10.5. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously differentiable on A. If H f (x) is negative definite for all x ∈ A, then f is strictly concave on A.

The following example shows that the converse implication does not hold.

Example 10.2. Let f : R → R be defined by f (x) = −x⁴ for all x ∈ R (see Figure 10.3). This is a twice continuously differentiable function on the open, convex set R. We can verify that f is strictly concave on R, but since f ′′ (x) = −12x², f ′′ (0) = 0. This shows that the converse implication is not valid.
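The strict concavity of f (x) = −x⁴ claimed in Example 10.2 can be spot-checked numerically (an added sketch, testing the defining inequality rather than the second derivative):

```python
# Spot-check strict concavity of f(x) = -x^4:
# lam*f(x) + (1-lam)*f(y) < f(lam*x + (1-lam)*y) for x != y, lam in (0,1).
def f(x):
    return -(x ** 4)

violations = 0
pairs = [(-1.0, 1.0), (0.0, 2.0), (-0.5, 0.25)]
for (x, y) in pairs:
    for lam in (0.1, 0.5, 0.9):
        lhs = lam * f(x) + (1 - lam) * f(y)
        rhs = f(lam * x + (1 - lam) * y)
        if not lhs < rhs:
            violations += 1
```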

Claim 10.1. If f : D → R is a function of one variable and is twice continuously differentiable, then ∀x ∈ D, f ′′ (x) ≤ 0 ⇔ f is concave.

Definition 10.2. Function f : D → R is convex if ∀x, y ∈ D, ∀λ ∈ [0, 1],

(10.3) λ f (x) + (1 − λ) f (y) ≥ f (λx + (1 − λ) y) .


Function f is strictly convex if the inequality is strict for all x ̸= y and all λ ∈ (0, 1).

Claim 10.2. If f : D → R is a function of one variable and is twice continuously differentiable, then ∀x ∈ D, f ′′ (x) ≥ 0 ⇔ f is convex.

Note that a local maximum (minimum) of a concave (convex) function is also a global maximum (minimum).

10.1.1. Hessian, Concavity and Convexity.

Theorem 10.6. Let f : D → R (where D ⊆ Rn is open and convex) be twice continuously differentiable. Then,

(10.4) f is concave if and only if H f (x) is NSD ∀x ∈ D.

(10.5) f is convex if and only if H f (x) is PSD ∀x ∈ D.

(10.6) H f (x) is ND ∀x ∈ D ⇒ f is strictly concave.

(10.7) H f (x) is PD ∀x ∈ D ⇒ f is strictly convex.

Corollary 1. For a function of one variable, this means,

(10.8) f is concave if and only if f ′′ (x) ≤ 0 ∀x ∈ D.

(10.9) f is convex if and only if f ′′ (x) ≥ 0 ∀x ∈ D.

(10.10) f ′′ (x) < 0 ∀x ∈ D ⇒ f is strictly concave.

(10.11) f ′′ (x) > 0 ∀x ∈ D ⇒ f is strictly convex.

Example 10.3. The implication

f is strictly convex ⇒ f ′′ (x) > 0, ∀x ∈ D

does not hold. Take f (x) = x⁴, so f ′′ (x) = 12x². It is strictly convex everywhere, but f ′′ (0) = 0. We would need f ′′ (x) > 0, ∀x ∈ D for the Hessian to be PD.


Figure 10.4. Graph of x⁴

10.1.2. Some Useful Results.

Proposition 3.

(a) If f and g are concave (convex) and a ≥ 0, b ≥ 0, then a f + bg is concave (convex).

(b) If f (x) is concave (convex) and F (u) is concave (convex) and increasing, then U (x) = F ( f (x)) is concave (convex).

(c) Function f is concave if and only if − f is convex.

10.2. Quasi-concave Functions

Definition 10.3. Function f : D → R is quasi-concave if ∀x, y ∈ D, ∀λ ∈ [0, 1],

f (λx + (1 − λ) y) ≥ min { f (x) , f (y) } .

Theorem 10.7. Function f : D → R is quasi-concave if and only if ∀a ∈ R, the upper contour set fa+ = { x ∈ D | f (x) ≥ a } is a convex set.

Definition 10.4. Function f : D → R is quasi-convex if the function − f is quasi-concave.

Theorem 10.8. Function f : D → R is quasi-convex if and only if ∀a ∈ R, the lower contour set fa− = { x ∈ D | f (x) ≤ a } is a convex set.


Theorem 10.9.

f : D → R concave ⇒ f is quasi-concave,

f : D → R convex ⇒ f is quasi-convex.

Note that for functions of one variable, any monotone function is quasi-concave. This, however, does NOT hold for functions of more than one variable. Also, quasi-concave functions need not be concave. Take f (x) = x2 on R+ : it is monotone increasing, hence quasi-concave, but it is not concave; rather, it is convex. For functions of one variable, the following theorem characterizes the quasi-concave functions.

Theorem 10.10. A function f of a single variable is quasiconcave if and only if either (a) it is non-decreasing, (b) it is non-increasing, or (c) there exists x∗ such that f is non-decreasing for x < x∗ and non-increasing for x > x∗ .
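Definition 10.3 can be illustrated with a brute-force check: sample pairs of points and values of λ and test f (λx + (1 − λ) y) ≥ min { f (x), f (y) }. The helper below is an illustrative sketch; the names and sampling scheme are my own, not from the text:

```python
def is_quasiconcave(f, points, lams=(0.0, 0.25, 0.5, 0.75, 1.0), tol=1e-9):
    """Brute-force test of the quasi-concavity inequality on sampled points."""
    for x in points:
        for y in points:
            for lam in lams:
                z = lam * x + (1.0 - lam) * y
                if f(z) < min(f(x), f(y)) - tol:
                    return False
    return True

pts = [i / 4.0 for i in range(-12, 13)]
print(is_quasiconcave(lambda x: x * x, pts))    # x^2 on all of R -> False
print(is_quasiconcave(lambda x: x ** 3, pts))   # monotone increasing -> True
print(is_quasiconcave(lambda x: -abs(x), pts))  # concave -> True
```

As expected, x2 fails on all of R (take x = −1, y = 1, λ = 1/2), while the monotone x3 and the concave −|x| pass, matching Theorem 10.10.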

10.2.1. Bordered Hessian. To check quasi-concavity of a C 2 function, we use the bordered Hessian matrix.

Definition 10.5 (Bordered Hessian). Let f be a C 2 function. The bordered Hessian is

         | 0         ∂f/∂x1       ∂f/∂x2       · · ·  ∂f/∂xn       |
         | ∂f/∂x1    ∂2f/∂x12     ∂2f/∂x2∂x1   · · ·  ∂2f/∂xn∂x1   |
B (x) =  | ∂f/∂x2    ∂2f/∂x1∂x2   ∂2f/∂x22     · · ·  ∂2f/∂xn∂x2   |
         | ...       ...          ...          . . .  ...          |
         | ∂f/∂xn    ∂2f/∂x1∂xn   ∂2f/∂x2∂xn   · · ·  ∂2f/∂xn2     |

Let Br (x) denote the sub-matrix consisting of the first (r + 1) rows and columns of B (x); i.e., Br (x) is an (r + 1) × (r + 1) matrix.

Condition 1. A necessary condition for f to be quasiconcave is that (−1)r det (Br (x)) ≥ 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.

Condition 2. A sufficient condition for f to be quasiconcave is that (−1)r det (Br (x)) > 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.


When we check for quasi-concavity, we have to check the sufficient conditions. We need

det | 0   f1  |  < 0,
    | f1  f11 |

det | 0   f1   f2  |
    | f1  f11  f12 |  > 0, etc.
    | f2  f21  f22 |

Remark 10.1. When we have to check whether a function is quasi-concave, start out by checking whether it is concave, because concavity is easier to check and concavity implies quasi-concavity.

Remark 10.2. Quasi-concavity is preserved under monotone transformation whereas concavity

need not be preserved.

Example 10.4. Let f (x, y) = √(xy) for (x, y) ∈ R2++ . Then

H f (x, y) = | −(1/4)√(y/x3)   1/(4√(xy))     |
            | 1/(4√(xy))      −(1/4)√(x/y3)  |

The principal minors of order one are negative and the principal minor of order two is zero. Hence f (x, y) is concave and so quasi-concave.

Let us take a monotone transformation g (x, y) = ( f (x, y))4 = x2 y2 , for (x, y) ∈ R2++ . Then

            | 0      2xy2   2x2 y |
B (x, y) =  | 2xy2   2y2    4xy   |
            | 2x2 y  4xy    2x2   |

det (B1 (x, y)) = det | 0     2xy2 |  = −4x2 y4 < 0
                      | 2xy2  2y2  |

⇒ (−1)1 det (B1 (x, y)) > 0, ∀ (x, y) ∈ R2++ .

det (B2 (x, y)) = −2xy2 (4x3 y2 − 8x3 y2 ) + 2x2 y (8x2 y3 − 4x2 y3 )
               = 8x4 y4 + 8x4 y4 = 16x4 y4 > 0, ∀ (x, y) ∈ R2++

⇒ g (x, y) is quasi-concave.

Note, however, that g (x, y) is not concave:

             | 2y2   4xy |
Hg (x, y) =  | 4xy   2x2 |


The principal minors of order one are strictly positive, and the principal minor of order two is det Hg = 4x2 y2 − 16x2 y2 = −12x2 y2 , which is strictly negative. Thus g (x, y) is not concave.
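The bordered-Hessian computation for g (x, y) = x2 y2 can be verified numerically at a sample point of R2++ . The determinant helpers below are hand-rolled for this sketch and are not from the text:

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * det2([[m[1][1], m[1][2]], [m[2][1], m[2][2]]])
          - m[0][1] * det2([[m[1][0], m[1][2]], [m[2][0], m[2][2]]])
          + m[0][2] * det2([[m[1][0], m[1][1]], [m[2][0], m[2][1]]]))

def bordered_hessian(x, y):
    # g_x = 2xy^2, g_y = 2x^2 y, g_xx = 2y^2, g_xy = 4xy, g_yy = 2x^2
    return [[0.0,          2 * x * y**2, 2 * x**2 * y],
            [2 * x * y**2, 2 * y**2,     4 * x * y],
            [2 * x**2 * y, 4 * x * y,    2 * x**2]]

x, y = 1.5, 2.0                      # arbitrary point in R++^2
B = bordered_hessian(x, y)
d1 = det2([[B[0][0], B[0][1]], [B[1][0], B[1][1]]])
d2 = det3(B)
print(d1, -4 * x**2 * y**4)          # both -144.0, so (-1)^1 det B1 > 0
print(d2, 16 * x**4 * y**4)          # both 1296.0, so (-1)^2 det B2 > 0
```

Both determinants match the closed forms −4x2 y4 and 16x4 y4 derived above, confirming the quasi-concavity conditions at this point.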

Chapter 11

Problem Set 5

(1) Prove or give a counterexample: The sum of two concave functions is concave.

(2) Which of the following is true? Prove or give a counterexample.

(a) If A and B are convex sets, then A ∪ B is convex.

(b) If A and B are convex sets, then A ∩ B is convex.

(3) Determine whether each of the following functions is quasiconcave:

(a) f (x) = 3x + 4;

(b) g(x, y) = yex , y > 0;

(c) h(x, y) = −x2 y3 .

(4) Show using an example that the sum of two quasiconcave functions need not be quasiconcave (in general).

(5) Consider the functions:

(i) f (x, y, z) = 8x3 + 2xy2 − z3

(ii) g(x, y) = x + y − ex − ex+y

Write out the gradient vectors ∇ f (x, y, z) and ∇g(x, y) and the Hessian matrices H f (x, y, z) and Hg (x, y). Is f concave, quasiconcave, or quasiconvex? What about the function g?


Chapter 12

Inverse and Implicit Function Theorems

12.1. Inverse Function Theorem

Recall the earlier discussions on inverse functions. Consider a real-valued function f : R → R defined by f (x) = 4x. It is one-to-one on R, and we can define a function g : R → R by g(y) = y/4. The function g satisfies the property g[ f (x)] = x and is called the inverse function of f on R. Furthermore, g′ [ f (x)] = 1/ f ′ (x) for all x ∈ R.

This idea can be extended to functions whose domain A is a subset of Rn , with the function f defined from A to R. Then f is one-to-one on A if for all x1 , x2 ∈ A with x1 ≠ x2 , we have f (x1 ) ≠ f (x2 ). In this case, if there is a function g from f (A) to A such that g[ f (x)] = x for each x ∈ A, then g is called the inverse function of f on f (A).

More generally, let A be an open set in R, and f : A → R be continuously differentiable on A. Let a ∈ A, and suppose that f ′ (a) ≠ 0. If f ′ (a) > 0, then there is an open interval B(a, r) such that f ′ (x) > 0 for all x in B(a, r), and f is increasing on B(a, r). Thus, for every y ∈ f [B(a, r)], there is a unique x in B(a, r) such that f (x) = y. In other words, there is a unique function h : f [B(a, r)] → B(a, r) such that h[ f (x)] = x for all x ∈ B(a, r). Thus, h is an inverse function of f on f [B(a, r)]; that is, h is the inverse of f "locally" around the point f (a). We have not guaranteed that the inverse function is defined on the entire set f (A). Similarly, if f ′ (a) < 0, an inverse function can be defined "locally" around f (a). The important restriction needed to carry out the kind of analysis noted above is that f ′ (a) ≠ 0.

To illustrate this, consider f : R → R+ given by f (x) = x2 . Consider the point a = 0. Clearly f is continuously differentiable on R, but f ′ (0) = 0. Now, we cannot define a unique inverse function of f even "locally" around f (0). If we choose any open ball B(0, r) and consider any point y ≠ 0 in the set f [B(0, r)], then there will be two values x, x′ in B(0, r), x ≠ x′ , such that f (x) = y = f (x′ ).

We note here that f ′ (a) ≠ 0 is not a necessary condition for a unique inverse function of f to exist. For example, if f : R → R is defined by f (x) = x3 , then f is continuously differentiable on R, with f ′ (0) = 0. However, f is an increasing function, and clearly has a unique inverse function g(y) = y1/3 on R, and hence locally around f (0).

The following theorem deals with the existence and properties of inverse functions.

Theorem 12.1 (Inverse Function Theorem). Let A be an open set of Rn , and f : A → Rn be continuously differentiable on A. Suppose a ∈ A and the Jacobian of f at a is non-zero. Then there is an open set X ⊂ A containing a, an open set Z ⊂ Rn containing f (a), and a unique function h : Z → X, such that:

(i) f (X) = Z;

(ii) f is one-to-one on X;

(iii) h(Z) = X, and h[ f (x)] = x for all x ∈ X.

Further, h is continuously differentiable on Z.

The following example shows that continuity of f ′ is needed in the inverse function theorem, even in the case n = 1.

Example 12.1. Let

f (t) = t + 2t2 sin (1/t) for t ≠ 0, and f (0) = 0.

Then f ′ (0) = 1 and f ′ is bounded on (−1, 1), but f is not one-to-one in any neighborhood of 0.
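A quick numerical illustration of Example 12.1: away from 0, f ′ (t) = 1 + 4t sin (1/t) − 2 cos (1/t), and this takes both signs arbitrarily close to 0, so f is not monotone on any neighborhood of 0 even though f ′ (0) = 1. The sample points below, chosen where cos (1/t) is +1 or −1, are an illustrative choice of mine:

```python
import math

def fprime(t):
    """f'(t) for t != 0, with f(t) = t + 2 t^2 sin(1/t)."""
    return 1.0 + 4.0 * t * math.sin(1.0 / t) - 2.0 * math.cos(1.0 / t)

for k in range(1, 4):
    t_minus = 1.0 / (2 * math.pi * k)        # cos(1/t) = +1 -> f'(t) near 1 - 2 < 0
    t_plus = 1.0 / ((2 * k + 1) * math.pi)   # cos(1/t) = -1 -> f'(t) near 1 + 2 > 0
    print(k, fprime(t_minus) < 0, fprime(t_plus) > 0)
```

Since f ′ changes sign on every interval around 0, f cannot be one-to-one there.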

12.2. The Linear Implicit Function Theorem

For the system of simultaneous linear equations Ax = b, we have seen earlier that there exists a unique solution for every choice of the right-hand-side column vector b if and only if the rank of A is equal to the number of rows of A, which is equal to the number of columns of the matrix A. In economic models, the vector b represents some externally determined (exogenous) parameters, while the linear equations constitute equilibrium conditions which determine the vector x, the set of internal (endogenous) variables.

In this sense it is possible to divide the set of variables into two disjoint subsets of endogenous and exogenous variables. Thus a general linear economic model will have m equations in n unknowns:

a11 x1 + a12 x2 + · · · + a1n xn = b1
· · ·
am1 x1 + am2 x2 + · · · + amn xn = bm

In general it will be possible to divide the set of variables into endogenous variables and exogenous variables. Such a division will be useful only if, after substituting the values of the exogenous variables into the m equations, it is possible to obtain a solution of the system for the remaining endogenous variables. For this, two conditions must hold: the number of endogenous variables must be equal to the number of equations m, and the square matrix corresponding to the endogenous variables must have maximal rank m.

A formal statement of the above observation is known as the linear version of the Implicit Function Theorem.

Theorem 12.2. Let x1 , · · · , x j ; x j+1 , · · · , xn be a partition of the n variables in the system of equations above into endogenous and exogenous variables, respectively. Then there exists, for every choice of the exogenous variables x̄ j+1 , · · · , x̄n , a unique set of values x̄1 , · · · , x̄ j , if and only if

(a) j = m, i.e., the number of endogenous variables = the number of equations;

(b) the rank of the j × j square matrix

(12.1)   [A] = | a11   a12   . . .  a1 j |
               | a21   a22   . . .  a2 j |
               | ...   ...   . . .  ...  |
               | a j1  a j2  . . .  a j j |

corresponding to the endogenous variables is j.

Here is an example for this theorem.

Exercise 12.1.


Let the system of equations be

x + 2y + z − w = 1
3x − y − 4z + 2w = 3
0x + y + z + w = 0

Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution. Find an explicit formula for the endogenous variables in terms of the exogenous variables.
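The rank condition of Theorem 12.2 can be checked mechanically for Exercise 12.1: a set of three variables can be endogenous exactly when the corresponding 3 × 3 column submatrix of the coefficient matrix is nonsingular. This sketch (my own helper code, not part of the text) tests every choice:

```python
from itertools import combinations

# Coefficient matrix of the system in Exercise 12.1, columns = (x, y, z, w).
A = [[1, 2, 1, -1],
     [3, -1, -4, 2],
     [0, 1, 1, 1]]
names = ["x", "y", "z", "w"]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

for cols in combinations(range(4), 3):
    sub = [[row[c] for c in cols] for row in A]
    d = det3(sub)
    verdict = "can be endogenous" if d != 0 else "singular"
    print([names[c] for c in cols], "det =", d, "->", verdict)
```

It turns out that (x, y, z) cannot all be endogenous (that submatrix is singular), while each of the other three partitions works.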

Exercise 12.2.

Let the system of equations be

−x + 3y − z + w = 0
4x − y + 2z + w = 3
7x + y + z + 3w = 6

Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

12.3. Implicit Function Theorem for R2

Consider the following example of a non-linear implicit function:

y2 − 6xy + 5x2 = 0.

Given any value of x, we can solve this equation for y. For example, if x = 0, then y = 0; if x = 1 the equation takes the form y2 − 6y + 5 = 0 and yields y = 1 or y = 5 as solutions. Observe that it is possible to solve for y explicitly in terms of x (it turns out to be a correspondence) by applying the quadratic formula:

y = (6x ± √(36x2 − 20x2)) / 2,

i.e., y = 5x or y = x.

It is possible to apply the quadratic formula to the implicit function xy2 − 3y − 2 exp x = 0 to obtain an explicit function for y as

y = (3 ± √(9 + 8x exp x)) / (2x).

However, it could turn out that the explicit functions are more difficult to work with than the original implicit function.


If we come across an implicit function such as

y5 − 5xy + 4x2 = 0,

then it is not possible to solve it in explicit form, as there is no general formula for solving a quintic equation. Note, however, that the equation still defines y as an implicit function of x. For x = 0 we get y = 0, for x = 1 we get y = 1, and so on.
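Even though y5 − 5xy + 4x2 = 0 has no closed-form solution for y, the implicit function can still be evaluated numerically. The sketch below uses plain bisection (an illustrative choice of root-finder, with a bracket picked by inspection; none of this is from the text):

```python
def F(x, y):
    return y ** 5 - 5 * x * y + 4 * x ** 2

def solve_y(x, lo, hi, tol=1e-12):
    """Bisection for F(x, y) = 0 on a bracket [lo, hi] where F changes sign."""
    flo = F(x, lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if flo * F(x, mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, F(x, mid)
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# At x = 0.5, F(0.5, 0) > 0 and F(0.5, 0.45) < 0, so a root lies between.
y = solve_y(0.5, 0.0, 0.45)
print(y, abs(F(0.5, y)) < 1e-9)
```

Repeating this for nearby values of x traces out the implicit function y(x) point by point.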

Example 12.2. A profit-maximizing firm uses a single input x (with cost w per unit) to produce an output y using the production function y = f (x). Let the price of the output be p per unit. Then the profit function for this firm, given p and w, is

Π(x) = p · f (x) − w · x.

To obtain the optimal input x which maximizes the profit, we take the first order condition, which is

p · f ′ (x) − w = 0.

We can treat p and w as exogenous variables; then this equation defines x as a function of p and w. The equation need not yield x as an explicit function of p and w. However, it does define x as an implicit function of p and w, and we can use it to estimate the change in x in response to changes in p and w.

Consider functions of the form

y = G(x1 , · · · , xn ).

Here the endogenous variable y is an explicit function of the exogenous variables (x1 , · · · , xn ). Such an ideal situation need not occur in every case. More frequently we come across functions of the form

(12.2) F(x1 , · · · , xn ; y) = 0.

If Eq. (12.2) determines a value of y for each set of values (x1 , · · · , xn ), then we say that Eq. (12.2) defines the endogenous variable y as an implicit function of the exogenous variables (x1 , · · · , xn ).

We consider implicit functions in R2 of the form F(x, y) = c and analyze the following questions. For a given implicit function F(x, y) = c and a specified solution (x0 , y0 ):

(a) Does F(x, y) = c determine y as a continuous function of x for points (x, y) such that x is near x0 and y is near y0 ?

(b) If so, how do changes in x affect the corresponding values of y?

More formally, the two questions can be rephrased as follows:


(a) Given the implicit function F(x, y) = c and a point (x0 , y0 ) such that F(x0 , y0 ) = c, does there exist a continuous function y = f (x) defined on an interval I around x0 so that:

(1) F(x, f (x)) = c for all x ∈ I, and

(2) y0 = f (x0 )?

(b) If y = f (x) exists and is differentiable, what is f ′ (x0 )?

Theorem 12.3. Let F(x, y) be continuously differentiable on an open ball around (x0 , y0 ) in R2 , and suppose F(x0 , y0 ) = c. If

(∂F/∂y)(x0 , y0 ) ≠ 0,

then there exists a continuously differentiable function y = f (x) defined on an open interval I around x0 such that:

(a) F(x, f (x)) = c for all x ∈ I,

(b) y0 = f (x0 ), and

(c) f ′ (x0 ) = − (∂F/∂x)(x0 , y0 ) / (∂F/∂y)(x0 , y0 ).

Example 12.3. Consider the function F : R2 → R given by F(x, y) = x2 + y2 and the equation F(x, y) = 1 (the solution set is the circle of radius r = 1). If we choose (a, b) with F(a, b) = 1, and a ≠ 1, a ≠ −1, then there are open intervals I ⊂ R containing a, and Y ⊂ R containing b, such that if x ∈ I, there is a unique y ∈ Y with F(x, y) = 1. Thus, we can define a unique function g : I → Y such that F(x, g(x)) = 1 for all x ∈ I. If a > 0 and b > 0, then g(x) = √(1 − x2) on I. We say such a function is defined implicitly by the equation F(x, y) = 1. Note that if a = 1 and b = 0, so that F(a, b) = 1, we cannot find such a unique function g.
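The formula in part (c) of Theorem 12.3 can be sanity-checked on Example 12.3: on the upper branch g(x) = √(1 − x2), the slope −Fx /Fy = −x/y should match a finite-difference derivative of g. The test point and step size below are arbitrary choices of mine:

```python
import math

def g(x):
    """Upper branch of the circle x^2 + y^2 = 1."""
    return math.sqrt(1.0 - x * x)

x0 = 0.6
y0 = g(x0)                           # (x0, y0) = (0.6, 0.8)

slope_ift = -(2 * x0) / (2 * y0)     # -F_x/F_y with F_x = 2x, F_y = 2y
h = 1e-6
slope_fd = (g(x0 + h) - g(x0 - h)) / (2 * h)

print(round(slope_ift, 6))           # -0.75
print(abs(slope_ift - slope_fd) < 1e-6)
```

The two slopes agree, as the theorem predicts; near a = ±1 the finite difference blows up, mirroring the failure of the condition Fy ≠ 0 there.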

Chapter 13

Homogeneous and Homothetic Functions

13.1. Homogeneous Functions

Most of us have come across homogeneous functions in elementary algebra courses. For example, f (x) = ax is homogeneous of degree 1, f (x) = axm is homogeneous of degree m, and f (x) = ax + 1 is not a homogeneous function. First we define homogeneous functions formally.

Definition 13.1. For any scalar k, a real valued function f (x1 , · · · , xn ) is homogeneous of degree k on Rn+ if for all x ∈ Rn+ and all t > 0,

f (tx1 , · · · , txn ) = tk f (x1 , · · · , xn ).
Some examples of homogeneous functions are:

(a) Consider f : R2+ → R given by f (x1 , x2 ) = x12 x23 . If t > 0, we have f (tx1 , tx2 ) = (tx1 )2 (tx2 )3 = t2+3 x12 x23 = t5 f (x1 , x2 ). So f is homogeneous of degree 5.

(b) The function f (x1 , x2 ) = x1a x2b is homogeneous of degree a + b.

(c) The function f : R2+ → R given by f (x1 , x2 ) = x12 x2 + 3x1 x22 + x23 is homogeneous of degree 3, since each term is homogeneous of degree 3.

(d) A linear function, f (x1 , · · · , xn ) = a1 x1 + · · · + an xn , is homogeneous of degree 1.

(e) A quadratic form, Q(x, A) = x′ Ax = ∑ ai j xi x j , is homogeneous of degree 2.

(f) The function f : R2+ → R given by f (x1 , x2 ) = 3x12 x23 − 6x15 x22 is not homogeneous, since the first term is homogeneous of degree 5 but the second term is homogeneous of degree 7.
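Definition 13.1 suggests a quick numeric test: for a homogeneous f , the degree k can be recovered from f (tx) = tk f (x) by taking logarithms. The helper below is my own sketch; the sample points and scale are arbitrary choices:

```python
import math

def f(x1, x2):
    return x1 ** 2 * x2 ** 3    # example (a), homogeneous of degree 5

def homogeneity_degree(g, x1, x2, t=2.0):
    """Recover k from g(t*x) = t^k g(x) by taking logarithms."""
    return math.log(g(t * x1, t * x2) / g(x1, x2)) / math.log(t)

print(homogeneity_degree(f, 1.3, 0.7))                        # ~5.0
print(homogeneity_degree(lambda u, v: u * v ** 2, 1.3, 0.7))  # ~3.0
```

For a non-homogeneous function such as example (f), the recovered "degree" would vary with the base point, which is itself a usable diagnostic.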

Let us look at the function f (x1 , x2 ) = x1a x2b again. We can calculate the partial derivatives of f on R2++ :

∂ f (x1 , x2 )/∂x1 = ax1a−1 x2b ;  ∂ f (x1 , x2 )/∂x2 = bx1a x2b−1 .

Now, if t > 0, then

∂ f (tx1 , tx2 )/∂x1 = a(tx1 )a−1 (tx2 )b = ta+b−1 ax1a−1 x2b = ta+b−1 ∂ f (x1 , x2 )/∂x1 .

So ∂ f (x1 , x2 )/∂x1 is homogeneous of degree (a + b − 1). Similarly, one can check that ∂ f (x1 , x2 )/∂x2 is homogeneous of degree (a + b − 1). More generally, whenever a function f is homogeneous of degree k, its partial derivatives are homogeneous of degree (k − 1).

Theorem 13.1. Suppose f is homogeneous of degree k on Rn+ , and continuously differentiable on Rn++ . Then for each i = 1, · · · , n, ∂ f (x1 , · · · , xn )/∂xi is homogeneous of degree (k − 1) on Rn++ .

Proof. To prove this, let t > 0 be given. Then,

(13.1) f (tx1 , · · · , txn ) = tk f (x1 , · · · , xn ).

We can consider f (tx) to be a function of n + 1 variables, t, x1 , · · · , xn . We will show the result for the partial derivative with respect to x1 ; in this case the remaining variables t, x2 , · · · , xn are held constant. Applying the Chain Rule, the partial derivative of the expression on the left hand side of (13.1) with respect to x1 is

(13.2) ∂ f (tx1 , · · · , txn )/∂(tx1 ) · ∂(tx1 )/∂x1 = D1 f (tx1 , · · · , txn ) · t.

The partial derivative of the function on the right hand side of (13.1) is tk ∂ f (x1 , · · · , xn )/∂x1 . Equality of the two expressions leads to

(13.3) D1 f (tx1 , · · · , txn ) · t = tk ∂ f (x1 , · · · , xn )/∂x1 .

Dividing by t, we get

(13.4) D1 f (tx1 , · · · , txn ) = tk−1 ∂ f (x1 , · · · , xn )/∂x1 .

Thus the partial derivatives are homogeneous functions of degree k − 1.


We can also verify that

x1 D1 f (x1 , x2 ) + x2 D2 f (x1 , x2 ) = ax1a x2b + bx1a x2b = (a + b)x1a x2b = (a + b) f (x1 , x2 ).

More generally, when a function f is homogeneous of degree k, then x · ∇ f (x) = k f (x), a result known as Euler's theorem.

Theorem 13.2 (Euler's Theorem). Suppose f : Rn+ → R is homogeneous of degree k on Rn+ and continuously differentiable on Rn++ . Then,

x1 · ∂ f (x1 , · · · , xn )/∂x1 + · · · + xn · ∂ f (x1 , · · · , xn )/∂xn = k f (x),

i.e., x · ∇ f (x) = k f (x) for all x ∈ Rn++ .

Proof. Since f is homogeneous of degree k,

f (tx) = tk f (x1 , · · · , xn ).

Applying the Chain Rule to the left hand side,

(13.5) d f (tx)/dt = ∂ f (tx)/∂(tx1 ) · x1 + · · · + ∂ f (tx)/∂(txn ) · xn ,

while differentiating the right hand side gives

(13.6) d f (tx)/dt = ktk−1 f (x1 , · · · , xn ).

Take t = 1 to complete the proof.
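Euler's theorem is easy to verify numerically for f (x, y) = xa yb , which is homogeneous of degree a + b: x fx + y fy should equal (a + b) f . The central-difference helper, test point, and step size below are my own arbitrary choices:

```python
def partial(g, x, y, i, h=1e-6):
    """Central-difference partial derivative of g at (x, y); i = 0 or 1."""
    if i == 0:
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

a, b = 2.0, 3.0
f = lambda x, y: x ** a * y ** b    # homogeneous of degree a + b = 5

x, y = 1.2, 0.8
lhs = x * partial(f, x, y, 0) + y * partial(f, x, y, 1)   # x*f_x + y*f_y
rhs = (a + b) * f(x, y)                                   # k*f(x)
print(abs(lhs - rhs) < 1e-4)   # True
```

The small tolerance absorbs finite-difference error; analytically the two sides agree exactly.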

The following is a converse of Euler's theorem.

Theorem 13.3 (Converse of Euler's Theorem). Suppose f : Rn+ → R is a continuous function on Rn+ and continuously differentiable on Rn++ . Suppose also that

x1 · ∂ f (x1 , · · · , xn )/∂x1 + · · · + xn · ∂ f (x1 , · · · , xn )/∂xn = k f (x)

for all x ∈ Rn++ . Then f is homogeneous of degree k.

A useful geometric property of homogeneous functions is as follows. Let f (x) be a homogeneous function of degree one and consider the level set f (x) = 1. In producer theory, the function f could be a constant-returns-to-scale production function, and the level sets would then be the isoquants. Let x be a point on the isoquant f (x) = 1. If we scale the point x by a factor r along the ray joining x and the origin, we obtain a point on the isoquant f (z) = r.


Similarly, if the function f is homogeneous of degree k, then scaling points on the isoquant q = 1 by a factor r along the ray joining each point and the origin generates the isoquant q = rk , since f (rx) = rk f (x) = rk when f (x) = 1. Thus the level sets of a homogeneous function are radial expansions and contractions of each other. This observation leads to the following consequence.

Theorem 13.4. Suppose f : Rn+ → R is a homogeneous function which is continuously differentiable on Rn++ . Then the tangent planes to the level sets of f have constant slope along each ray from the origin.

13.2. Homothetic Functions

Definition 13.2. A function f : Rn+ → R is a homothetic function if it is a monotone transformation of a homogeneous function.

Thus, if there is a monotone transformation g : R → R and a homogeneous function h : Rn+ → R such that f (x) = g(h(x)) holds for all x in the domain, then f is a homothetic function.

For example, the function f (x, y) = (xy)3 + xy is homothetic, as h(x, y) = z = xy is a homogeneous function of degree 2 and g(z) = z3 + z is a monotone transformation of z.

The following theorem characterizes homothetic functions.

Theorem 13.5. Suppose f : Rn+ → R is a strictly monotonic function. Then f is homothetic if and only if for all x and y in Rn+ ,

f (x) ≥ f (y) ⇔ f (θx) ≥ f (θy) for all θ > 0.

The following theorem provides a necessary condition for a function to be homothetic in terms of its partial derivatives.

Theorem 13.6. Suppose f : Rn+ → R is continuously differentiable on Rn++ . If f is homothetic, then the tangent planes to the level sets of f are constant along rays from the origin; in other words, for every i and j and for every x in Rn++ ,

(13.7)  [∂ f (tx)/∂xi ] / [∂ f (tx)/∂x j ] = [∂ f (x)/∂xi ] / [∂ f (x)/∂x j ]  for all t > 0.

The converse of this theorem is also true and is stated here for the sake of completeness.

Theorem 13.7. Suppose f : Rn+ → R is continuously differentiable on Rn++ . If (13.7) holds for all x in Rn++ , for every i and j, and for all t > 0, then f is homothetic.
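Condition (13.7) can be checked for the homothetic example f (x, y) = (xy)3 + xy: the ratio fx / fy (which works out to y/x for this f ) should be unchanged when (x, y) is scaled along a ray. The partials are written out by hand; the test point and scale factors are my own arbitrary choices:

```python
def fx(x, y):
    return 3 * x ** 2 * y ** 3 + y   # d/dx of x^3 y^3 + x y

def fy(x, y):
    return 3 * x ** 3 * y ** 2 + x   # d/dy of x^3 y^3 + x y

x, y = 1.5, 0.5
base = fx(x, y) / fy(x, y)           # equals y/x = 1/3 here
for t in (0.5, 2.0, 7.0):
    scaled = fx(t * x, t * y) / fy(t * x, t * y)
    print(t, abs(scaled - base) < 1e-12)
```

The invariance of the ratio along rays is exactly what makes the level sets of a homothetic function radial blow-ups of one another.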

Chapter 14

Problem Set 6

(1) Let the system of equations be

x + 3y + z − 2w = 1
2x + 6y − 2z − 4w = 3

(a) Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution.

(b) Find an explicit formula for the endogenous variables in terms of the exogenous variables.

(2) Let the system of equations be

−x + 3y − z + w = 0
4x − y + z + w = 3
7x + y + z + 3w = 6

Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

(3) Show that the equation x2 − xy3 + y5 = 19 defines y as an implicit function of x in the neighborhood of (x, y) = (5, 2). Then estimate the value of y which corresponds to x = 4.9.

(4) Consider the function f (x, y, z) = x2 − y2 + z3 .

(a) If x = 6 and y = 3, find a value of z which satisfies the equation f (x, y, z) = 0.

(b) Verify whether this equation defines z as an implicit function of x and y near x = 6 and y = 3.

(c) If it does, compute (∂z/∂x)(6,3) and (∂z/∂y)(6,3) .

(d) If x increases to 6.1 and y decreases to 2.8, estimate the corresponding change in z.



(5) Consider the profit-maximizing firm described in Example 12.2. If p increases by ∆p and w increases by ∆w, what will be the change in the optimal input amount x?

(6) Consider 3x2 yz + xyz2 = 96 as defining x as an implicit function of y and z around the point x = 2, y = 3, z = 2.

(a) If y increases to 3.1 and z remains the same at 2, use the Implicit Function Theorem to estimate the corresponding x.

(b) Use the quadratic formula to solve 3x2 yz + xyz2 = 96 for x as an explicit function of y and z.

(c) Use the approximation by differentials on the explicit formula to estimate x when y = 3.1 and z = 2.

(d) Which of the two methods is easier?

(7) Let f : R+ → R be homogeneous of degree one. Prove that for all x, y ∈ R+ ,

f (x + y) = f (x) + f (y).

(8) Let f : Rn+ → R be a non-decreasing, quasi-concave function that is homogeneous of degree one. Show that f must be concave on Rn+ .

(9) Let f be a continuous function from Rn+ to R which is twice continuously differentiable on Rn++ . Suppose f is homogeneous of degree m, where m is a positive integer ≥ 2. Show that

x′ H f (x) x = m(m − 1) f (x)

for all x ∈ Rn++ , where H f (x) is the Hessian of f evaluated at x.

Chapter 15

Unconstrained Optimization

15.1. Optimization Problem

We call

(15.1) max f (x) , x ∈ D ⊆ Rn ,

or

(15.2) min f (x) , x ∈ D ⊆ Rn ,

where domain D is an open set, unconstrained optimization problems. There are no restrictions on

x within the domain. Furthermore, there are no boundary solutions, because the domain does not

include its boundary (recall the definition of open set). Note max f (x) , x ∈ Rn or min f (x) , x ∈ Rn

are unconstrained optimization problems since Rn is an open set. While solving an unconstrained optimization problem, we want to use the tools we developed earlier, i.e., find points where ∇ f (x) = 0 and investigate the curvature / shape of the function.

Remark 15.1. An unconstrained optimization problem may not have a solution.

Example 15.1. Let f (x) = x2 . Then,

(15.3) max f (x) , x ∈ R

does not have a solution. See the graph of f (x) = x2 .



Figure 15.1. Graph of x2

Remark 15.2. A minimization problem can always be turned into a maximization problem and vice versa:

(15.4) min x∈D f (x) ⇔ max x∈D [− f (x)].

We will see several examples of unconstrained optimization in these notes. There are also additional exercises in the problem set.

15.2. Maxima / Minima for C 2 functions of n variables

Theorem 15.1. First order necessary condition for local maxima / minima: Let A be an open set in Rn , and let f : A → R be a continuously differentiable function on A. If the function f has a local maximum / minimum at x∗ , then

∇ f (x∗ ) = 0,

where 0 is an n × 1 null vector.

Remark 15.3. The converse is not true.

Theorem 15.2. Second order necessary condition for local maxima / minima: Let A be an open set in Rn , and let f : A → R be a twice continuously differentiable function on A.

(a) If the function f has a local maximum at x∗ , then H f (x∗ ) is negative semi-definite.

(b) If the function f has a local minimum at x∗ , then H f (x∗ ) is positive semi-definite.

The first order and second order necessary conditions are useful tools to help us in ruling out

the points where a local maximum or local minimum cannot occur. This narrows down our search

for points where a local maximum or local minimum does occur. Examples below explain this

further.

Example 15.2. Let f : R → R be given by f (x) = 4 − x2 for all x ∈ R. Then A = R is an open set, and f is a continuously differentiable function on A with f ′ (x) = −2x. Consider the point x∗ = 1. Then f ′ (x∗ ) = f ′ (1) = −2(1) = −2 ≠ 0. We apply Theorem 15.1 to conclude that x∗ = 1 is not a point of local maximum of f .

Example 15.3. Let f : R → R be given by f (x) = 4 − 4x + x2 for all x ∈ R. Then A = R is an open set, and f is a twice continuously differentiable function on A. Consider the point x∗ = 2. We can calculate f ′ (x∗ ) = f ′ (2) = −4 + 2(2) = 0, so the necessary condition of Theorem 15.1 is satisfied. However, this theorem by itself fails to provide any additional information at this stage: we cannot conclude from Theorem 15.1 that x∗ = 2 is a point of local maximum, nor can we conclude that it is not. Theorem 15.2 is useful at this point. We can calculate

f ′′ (x∗ ) = f ′′ (2) = 2 > 0,

so the necessary condition of Theorem 15.2 (a) is violated. Consequently, by Theorem 15.2, we can conclude that x∗ = 2 is not a point of local maximum of f .

It is easy to see that the first and second order necessary conditions are not sufficient.

Example 15.4. Let X = R be the domain and f (x) = x3 − x4 . Then d f (x)/dx = 3x2 − 4x3 and d2 f (x)/dx2 = 6x − 12x2 are both 0 at x = 0. But x = 0 is not a local maximizer of f (x), since f (x) = x3 (1 − x) > 0 = f (0) for all small x > 0.

Theorem 15.3. Sufficient conditions for local maxima / minima: Let A be an open set in Rn , and let f : A → R be a twice continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x∗ ) is negative definite, then f has a local maximum at x∗ .

(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x∗ ) is positive definite, then f has a local minimum at x∗ .

It should be noted that the sufficient condition in Theorem 15.3 cannot be weakened to the necessary condition in the statement of Theorem 15.2. The following example explains this point.


Example 15.5. Let f : R → R be given by f (x) = x3 for all x ∈ R. Then A = R is an open set, and f is a twice continuously differentiable function on A. At x∗ = 0,

f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0,

so the first order necessary condition and the second order necessary condition are satisfied. But x∗ is clearly not a point of local maximum of f , since f is an increasing function on A.

It may also be observed that the second order necessary condition in Theorem 15.2 cannot be strengthened to the sufficient condition in the statement of Theorem 15.3. The following example illustrates this point.

Example 15.6. Let f : R → R be given by f (x) = −x4 for all x ∈ R. Then A = R is an open set, and f is a twice continuously differentiable function on R. Clearly, x∗ = 0 is a point of local maximum of f , since f (0) = 0, while f (x) < 0 for all x ≠ 0. We can calculate that

f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0.

Thus the first order necessary condition (in Theorem 15.1) and the second order necessary condition (in Theorem 15.2) are satisfied, but the second order sufficient condition (in Theorem 15.3) is violated.

The above discussion shows that the second-order necessary conditions for a local maximum are different from (weaker than) the second-order sufficient conditions for a local maximum. This demonstrates the fact that, in general, the first and second derivatives of a function at a point do not capture all aspects relevant to the occurrence of a local maximum of the function at that point.

Theorem 15.4. Concavity (convexity) and global maxima (minima): Let A be an open and convex set in Rn , and let f : A → R be a continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is concave on A, then f has a global maximum at x∗ .

(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is convex on A, then f has a global minimum at x∗ .

This is very easy to show. Note that concavity along with continuous differentiability of f implies that for all x ∈ A,

f (x) − f (x∗ ) ≤ ∇ f (x∗ ) · (x − x∗ ) = 0.

So f (x) − f (x∗ ) ≤ 0, i.e., x∗ is a point of global maximum of f on A.

Theorem 15.5. Let A be an open and convex set in Rn , and let f : A → R be a twice continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is negative semi-definite for all x ∈ A, then f has a global maximum at x∗ .

(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is positive semi-definite for all x ∈ A, then f has a global minimum at x∗ .

**It is worth noting that Theorem 15.4 or Theorem 15.5 might be applicable in cases Theorem
**

15.3 is not applicable as the following example shows.

Example 15.7. Let f : R → R be given by f(x) = −x⁴. Here, we note that f′(0) = 0 and f′′(x) = −12x² ≤ 0 for all x ∈ R. Thus we can apply Theorem 15.4 or Theorem 15.5 and conclude that x = 0 is a point of global maximum, and hence also a point of local maximum. But the conclusion that x = 0 is a point of local maximum cannot be derived from Theorem 15.3, since f′′(0) = 0.
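A quick numerical illustration of this example, sketched in Python (the grid is an arbitrary choice, not part of the text):

```python
# f(x) = -x**4 has f'(0) = 0 and f''(0) = 0, yet x = 0 is a global maximum.
def f(x):
    return -x**4

def fpp(x):
    return -12 * x**2  # second derivative

# The second-order sufficient condition fails at 0 ...
assert fpp(0) == 0
# ... but f(0) still dominates f on a grid of sample points.
grid = [i / 100 for i in range(-300, 301)]
assert all(f(x) <= f(0) for x in grid)
```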

Now we explain the steps in applying these theorems via several examples.

Example 15.8. Consider X = R²₊ and f(x) = x1x2 − 2x1⁴ − x2². The optimization exercise is to maximize the objective function f(x) by choosing x ∈ X. The two first order conditions are

x2 − 8x1³ = 0, and x1 − 2x2 = 0.

Solving the second equation for x1, we have x1 = 2x2. Substituting this into the first equation, we have x2 − 64x2³ = 0, which has three solutions:

x2 = 0, 1/8, and −1/8.

Then the first order conditions have three solutions,

(x1, x2) = (0, 0), (1/4, 1/8), and (−1/4, −1/8),

but the last of these is not in the domain of f, and the first is on the boundary of the domain, giving f(0, 0) = 0. Thus, we have a unique solution in the interior of the domain:

(x1∗, x2∗) = (1/4, 1/8).
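The first order conditions of this example can be checked numerically; a minimal Python sketch (not claiming to replace the algebra above):

```python
# First-order conditions for f(x1, x2) = x1*x2 - 2*x1**4 - x2**2 on R^2_+.
def grad(x1, x2):
    return (x2 - 8 * x1**3, x1 - 2 * x2)

# The interior candidate found by substitution:
g1, g2 = grad(1/4, 1/8)
assert abs(g1) < 1e-12 and abs(g2) < 1e-12

# The boundary point (0, 0) also satisfies the FOCs but gives f = 0,
# while the interior candidate gives a positive value:
f = lambda x1, x2: x1 * x2 - 2 * x1**4 - x2**2
assert f(1/4, 1/8) > f(0, 0) == 0
```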

Example 15.9. Let us find maxima / minima (if any) for f : R³ → R,

f(x, y, z) = x² + 2y² + 3z² + 2xy + 2xz.

Step 1. Find ∇f(x, y, z) and set it equal to the zero vector.

∇f(x, y, z) = [2x + 2y + 2z   4y + 2x   6z + 2x] = [0   0   0].

The only solution is (x, y, z) = (0, 0, 0). So we have one candidate for a local maximum or minimum.

Step 2. Compute H_f.

H_f(x, y, z) =
[2 2 2]
[2 4 0]
[2 0 6]

Note that in this example, H_f is independent of (x, y, z). So whichever definiteness property H_f has will be global.

Step 3. Determine the curvature. Begin with computing the leading principal minors.

D1 = 2 > 0, D2 = 2 · 4 − 2 · 2 = 4 > 0, and
D3 = 2(24 − 0) − 2(12 − 0) + 2(0 − 8) = 48 − 24 − 16 = 8 > 0.

All leading principal minors are strictly positive, so H_f is positive definite for all (x, y, z), including (0, 0, 0), which implies that f is strictly convex.

Step 4. Conclude, using Theorem 15.4, that we have a global minimum at (0, 0, 0).
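The leading principal minors in this example can be recomputed mechanically; a small Python sketch (the Laplace-expansion determinant is an illustrative helper, not from the text):

```python
# Hessian of f(x,y,z) = x^2 + 2y^2 + 3z^2 + 2xy + 2xz (constant in (x,y,z)).
H = [[2, 2, 2],
     [2, 4, 0],
     [2, 0, 6]]

def det(m):
    # Laplace expansion along the first row (fine for small matrices).
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1)**j * m[0][j] *
               det([row[:j] + row[j+1:] for row in m[1:]])
               for j in range(n))

# Leading principal minors D1, D2, D3:
minors = [det([row[:k] for row in H[:k]]) for k in range(1, 4)]
assert minors == [2, 4, 8]   # all positive -> H is positive definite
```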

Example 15.10. Let us find maxima / minima (if any) for f : R² → R,

f(x, y) = −x³ + xy − y³.

Step 1. Find ∇f(x, y) and set it equal to the zero vector.

∇f(x, y) = [−3x² + y   −3y² + x] = [0   0].

There are two solutions: (x, y) = (0, 0) and (x, y) = (1/3, 1/3).

Step 2. Compute H_f.

H_f(x, y) = [−6x 1; 1 −6y] ⇒ H_f(1/3, 1/3) = [−2 1; 1 −2] and H_f(0, 0) = [0 1; 1 0].

Step 3. Determine the curvature. For (1/3, 1/3), the leading principal minors are

D1 = −2 < 0, D2 = 3 > 0 ⇔ H_f(1/3, 1/3) is negative definite.

For (0, 0), the principal minors of order one are 0 and 0, and D2 = −1 < 0 ⇒ H_f(0, 0) is neither negative semi-definite nor positive semi-definite.

Step 4. Then Theorem 15.3 on second order sufficient conditions applies, and we have a strict local maximum at (1/3, 1/3). The contrapositive of the second order necessary conditions (Theorem 15.2) shows that (0, 0) is neither a point of local maximum nor of local minimum. It is a saddle point.
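The Hessian classification at the two critical points can be verified exactly; a Python sketch using exact rational arithmetic (an illustrative choice, not part of the notes):

```python
from fractions import Fraction

# Critical points of f(x,y) = -x^3 + xy - y^3 and their Hessians.
def hess(x, y):
    return [[-6*x, 1], [1, -6*y]]

def det2(m):
    return m[0][0]*m[1][1] - m[0][1]*m[1][0]

t = Fraction(1, 3)
H1 = hess(t, t)                    # at the interior critical point (1/3, 1/3)
assert H1[0][0] == -2 and det2(H1) == 3   # D1 < 0, D2 > 0: negative definite

H0 = hess(0, 0)                    # at the origin
assert det2(H0) == -1              # D2 < 0: indefinite, so (0,0) is a saddle
```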


Example 15.11. Let us find maxima / minima (if any) for f : R² → R,

f(x, y) = 2x³ + xy² + 5x² + y².

Step 1. Find ∇f(x, y) and set it equal to the zero vector.

∇f(x, y) = [6x² + y² + 10x   2xy + 2y] = [0   0].

From the second equation, 2xy + 2y = 0 ⇒ y = 0 ∨ x = −1;
for x = −1, 6x² + y² + 10x = y² − 4 = 0 ⇒ y = 2 ∨ y = −2;
for y = 0, 6x² + y² + 10x = 6x² + 10x = 0 ⇒ x = 0 ∨ x = −5/3.

There are four solutions: (x, y) = (0, 0); (−1, 2); (−1, −2); and (−5/3, 0).

Step 2. Compute H_f.

H_f = [12x + 10  2y; 2y  2x + 2].

Step 3. Determine the curvature at each candidate.

H_f(0, 0) = [10 0; 0 2], D1 = 10 > 0, D2 = 20 > 0
⇒ H_f(0, 0) is positive definite.

H_f(−1, 2) = [−2 4; 4 0]; the principal minors of order one are −2 and 0, and D2 = −16 < 0
⇒ H_f(−1, 2) is neither positive semi-definite nor negative semi-definite.

H_f(−1, −2) = [−2 −4; −4 0]; the principal minors of order one are −2 and 0, and D2 = −16 < 0
⇒ H_f(−1, −2) is neither positive semi-definite nor negative semi-definite.

H_f(−5/3, 0) = [−10 0; 0 −4/3], D1 = −10 < 0, D2 = 40/3 > 0
⇒ H_f(−5/3, 0) is negative definite.

Step 4. Then Theorem 15.3 on sufficient conditions applies for (0, 0) and (−5/3, 0). We have a strict local minimum at (0, 0) and a strict local maximum at (−5/3, 0). The contrapositive of the second order necessary conditions (Theorem 15.2) implies that there is neither a local maximum nor a local minimum at (−1, 2) and (−1, −2). They are saddle points.
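The local conclusions at the two definite critical points can be sanity-checked on a small grid of neighbouring points; a Python sketch (grid size and step are arbitrary choices):

```python
# Local check of the two definite critical points of
# f(x,y) = 2x^3 + x*y^2 + 5x^2 + y^2.
def f(x, y):
    return 2*x**3 + x*y**2 + 5*x**2 + y**2

def neighbours(x0, y0, h=1e-3, steps=8):
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            if (i, j) != (0, 0):
                yield x0 + i*h, y0 + j*h

# (0, 0): strict local minimum
assert all(f(x, y) > f(0, 0) for x, y in neighbours(0, 0))

# (-5/3, 0): strict local maximum
x0 = -5/3
assert all(f(x, y) < f(x0, 0) for x, y in neighbours(x0, 0))
```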

15.3. Application: Ordinary Least Squares Analysis

We describe a nice application of the unconstrained optimization technique in the determination of regression coefficients in the method of ordinary least squares.

Suppose there are n points (xi, yi), i = 1, · · · , n, in R². Let f : R → R be given by f(x) = ax + b for all x ∈ R. Our objective is to find such a function f (that is, we want to choose a ∈ R and b ∈ R) such that the quantity

(15.5) ∑_{i=1}^n [f(xi) − yi]²

is minimized. Thus the coefficients are such that the sum of the squares of the residuals (error terms, i.e., the differences between the estimates and the actual observations) is minimized.

We can set up the problem as an unconstrained maximization problem as follows. Define f : R² → R by

f(a, b) = − ∑_{i=1}^n [axi + b − yi]².

The maximization problem then is

max_{(a,b)} f(a, b).


The function f is twice continuously differentiable on R² (being a polynomial function), and we can calculate

f1 = −2 ∑_{i=1}^n [axi + b − yi]xi = −2 ∑_{i=1}^n [axi² + bxi − xiyi],
f2 = −2 ∑_{i=1}^n [axi + b − yi],
f11 = −2 ∑_{i=1}^n xi²,
f12 = −2 ∑_{i=1}^n xi,
f21 = −2 ∑_{i=1}^n xi,
f22 = −2n.

The Hessian matrix, H_f, is

H_f(a, b) = [−2 ∑_{i=1}^n xi²   −2 ∑_{i=1}^n xi; −2 ∑_{i=1}^n xi   −2n].

The principal minors of order one for the Hessian are f11 = −2 ∑_{i=1}^n xi² < 0 and f22 = −2n < 0. We need the principal minor of order two (the determinant of the Hessian) to be non-negative. The determinant of the Hessian of f is

det(H_f(a, b)) = 4n ∑_{i=1}^n xi² − 4 [∑_{i=1}^n xi]².

Recall the Cauchy-Schwarz inequality,

|x · y| ≤ ∥x∥ · ∥y∥.

Apply it to the vector x = (x1, · · · , xn) and the vector of ones u = (1, · · · , 1):

|x · u| ≤ ∥x∥ · ∥u∥
|x · u|² ≤ ∥x∥² · ∥u∥²
[∑_{i=1}^n xi]² ≤ [∑_{i=1}^n xi²] · n.

Therefore, det(H_f(a, b)) ≥ 0. Since f11(a, b) ≤ 0, f22(a, b) ≤ 0, and det(H_f(a, b)) ≥ 0, H_f(a, b) is negative semi-definite. Consequently, if (a∗, b∗) satisfies the first-order conditions, then (a∗, b∗) is


a point of global maximum of f. The first-order conditions are

a ∑_{i=1}^n xi² + b ∑_{i=1}^n xi = ∑_{i=1}^n xiyi,
a ∑_{i=1}^n xi + bn = ∑_{i=1}^n yi.

Denoting (∑_{i=1}^n xi)/n by x̄ and (∑_{i=1}^n yi)/n by ȳ (the mean of x and the mean of y respectively), we get from the second equation

(15.6) ax̄ + b = ȳ.

Using this in the first equation leads to

(15.7) a ∑_{i=1}^n xi² + (ȳ − ax̄)nx̄ = ∑_{i=1}^n xiyi.

Thus,

a = [(∑_{i=1}^n xiyi)/n − x̄ȳ] / [(∑_{i=1}^n xi²)/n − x̄²],
b = ȳ − ax̄

solves the problem. Note the solution is meaningful provided not all the xi are the same.
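The closed-form coefficients can be cross-checked against the normal equations; a Python sketch (the sample points are hypothetical, chosen only for illustration):

```python
# Closed-form OLS coefficients for sample points (x_i, y_i), as derived above.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

sxy = sum(x*y for x, y in zip(xs, ys))
sxx = sum(x*x for x in xs)
a = (sxy/n - xbar*ybar) / (sxx/n - xbar**2)
b = ybar - a * xbar

# The first-order (normal) equations hold at (a, b):
assert abs(a*sxx + b*sum(xs) - sxy) < 1e-9
assert abs(a*sum(xs) + b*n - sum(ys)) < 1e-9
```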

In the next exercise, we provide an alternative proof that the determinant of the Hessian is non-negative.

Exercise 15.1.

(a) Show that

2αβ ≤ α² + β².

(b) Use this to show that

(x1 + x2 + · · · + xn)² ≤ n(x1² + x2² + · · · + xn²).

(c) Show that the point (a, b) is a global minimizer of the objective function (15.5).

Solution 15.1.

(a) Observe,

(α − β)² = α² + β² − 2αβ ≥ 0,

which shows that the desired inequality holds.

(b) We use induction to show this. For n = 2, from part (a) we know

2x1x2 ≤ x1² + x2²
x1² + x2² + 2x1x2 ≤ 2(x1² + x2²)
(x1 + x2)² ≤ 2(x1² + x2²).

Next, we assume that the claim holds for some k ∈ N and show that it holds for n = k + 1. Let

(x1 + · · · + xk)² ≤ k(x1² + · · · + xk²).

Then

(x1 + · · · + xk + xk+1)² = (x1 + · · · + xk)² + 2(x1 + · · · + xk)xk+1 + xk+1²
≤ k(x1² + · · · + xk²) + 2(x1xk+1 + · · · + xkxk+1) + xk+1²
≤ k(x1² + · · · + xk²) + (x1² + xk+1²) + · · · + (xk² + xk+1²) + xk+1²
= (k + 1)(x1² + · · · + xk+1²).

Hence, the claim holds for all n ∈ N.

(c) Part (b) shows that the determinant of the Hessian matrix is non-negative. (This is an alternative proof, without using the Cauchy-Schwarz inequality.) Since the Hessian matrix is negative semi-definite for all points in R², the point (a, b) is a global minimizer of the objective function (15.5).

There is yet another proof of this inequality, which is quite short and which one of you kindly showed me in class. Observe that

4n ∑_{i=1}^n xi² − 4 [∑_{i=1}^n xi]² = 4n [∑_{i=1}^n xi² − ((∑_{i=1}^n xi)/n) · ∑_{i=1}^n xi]
= 4n [∑_{i=1}^n xi² − (∑_{i=1}^n xi) · x̄].

Hence, it is sufficient to show that

∑_{i=1}^n xi² − (∑_{i=1}^n xi) · x̄ ≥ 0.

Note

∑_{i=1}^n xi² − (∑_{i=1}^n xi) · x̄ = ∑_{i=1}^n xi(xi − x̄) = ∑_{i=1}^n (xi − x̄ + x̄)(xi − x̄)
= ∑_{i=1}^n (xi − x̄)² + x̄ · ∑_{i=1}^n (xi − x̄) ≥ 0,

since the first term, being a sum of squares, is non-negative, and the second term is zero because ∑_{i=1}^n (xi − x̄) = 0.
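Both the inequality of part (b) and the centering identity used above can be checked on random data; a small Python sketch (seed and ranges are arbitrary):

```python
import random

# Check (x1+...+xn)^2 <= n*(x1^2+...+xn^2) and the identity
# sum(xi^2) - sum(xi)*xbar = sum((xi - xbar)^2) on random samples.
random.seed(0)
for _ in range(100):
    n = random.randint(1, 10)
    xs = [random.uniform(-5, 5) for _ in range(n)]
    assert sum(xs)**2 <= n * sum(x*x for x in xs) + 1e-9
    xbar = sum(xs) / n
    lhs = sum(x*x for x in xs) - sum(xs) * xbar
    rhs = sum((x - xbar)**2 for x in xs)
    assert abs(lhs - rhs) < 1e-9 and lhs >= -1e-9
```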

Chapter 16

Problem Set 7

(1) Consider the function g defined for all x ≥ 0, y ≥ 0 by

g(x, y) = x³ + y³ − 3x − 2y.

Write out ∇g(x, y) and H_g(x, y). Show that g is convex on its domain and find its (global) minimum.

(2) Find all the local maxima and minima of

(16.1) f(x) = x⁴ − 4x³ + 4x² + 4.

Which, if any, of them are global maxima or minima?

(3) A monopolist producing a single output has two types of buyers. If it produces Q1 units for buyers of type 1, then those buyers are willing to pay a price of 100 − 5Q1 dollars per unit. If it produces Q2 units for buyers of type 2, then those buyers are willing to pay a price of 50 − 10Q2 dollars per unit. The monopolist's cost of producing Q units of output is 50 + 10Q. How many units should the monopolist produce to maximize profit?

(4) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of w and r for its labor (L) and capital (K) inputs, and operates with the production function Q = Lᵃ Kᵇ.

(a) Write profits as a function of L and K. Derive the first order conditions. Provide an economic interpretation of the first order conditions.

(b) Solve for the optimal levels of L and K.

(c) Check the second order conditions. What restrictions on the values of a and b are necessary for a profit maximum? Provide an economic interpretation of these restrictions.

(d) Find the signs of the partial derivatives of L with respect to P, w, and r.

(e) Derive the firm's long run supply curve, i.e., Q as a function of the exogenous parameters. Find the elasticities of supply with respect to w, r, and P. Do these elasticities sum to zero? Provide an economic explanation for this fact.

(5) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of w, v, and r for its labor (L), natural resource (R), and capital (K) inputs, and operates with the production function Q = A Lᵃ Kᵇ + ln R.

(a) Write profits as a function of L, R, and K. Derive the first order conditions. Provide an economic interpretation of the first order conditions.

Now take A = 3, a = b = 1/3 for the remainder of the problem.

(b) Check the second order conditions.

(c) [Optional] Solve for L∗. Find the change in L∗ for a change in r when all other parameters are constant by taking the partial derivative of L∗ with respect to r.

(d) [Optional] Find the change in L∗ for a change in v when all other parameters are constant by taking the partial derivative of L∗ with respect to v.

(e) [Optional] It is also possible to determine the changes in L∗ when r or v change without explicitly solving for L∗, by using the Implicit Function Theorem. You might like to use a more general version of the Implicit Function Theorem (than what we stated in class) to complete this exercise.

(i) Find the change in L for a change in r when all other parameters are constant.
(ii) Find the change in L for a change in v when all other parameters are constant.

Chapter 17

Optimization Theory: Equality Constraints

17.1. Constrained Optimization

The optimization problems we encounter in economics are, in general, constrained problems, where there are some restrictions on the set from which we can choose x. Some examples of constrained optimization problems are:

Example 17.1. Consumer Theory

(17.1) max_x u(x) subject to x ∈ B(p, I),

where B(p, I) is the budget set.

Producer Theory

(17.2) max_{y,x} py − w · x subject to (y, x) ∈ Y,

where

Y = {(y, x) ∈ R × Rⁿ | y ≤ f(x)}

is the production possibility set, with f(x) being the production function (one output, many inputs).


We will work with the maximization problem, as it is easy to turn a minimization problem into a maximization problem. A constrained maximization problem has the following form:

max_x f(x) subject to x ∈ G(x),

where f(x) is called the objective function, x is called the choice variable, and G(x) is called the constraint set.

We assume the objective function to be C² so that we can use differential calculus techniques.

Example 17.2. Consider the following optimization problem:

(17.3) max_x f(x) subject to x ∈ [a, b].

A solution to this problem is

(17.4) x∗ ∈ X∗ ⊂ [a, b] ∧ f(x∗) ≥ f(x) ∀x ∈ [a, b].

The first question to answer is: does a solution exist? Note f is continuous (because it is C²) and [a, b] is a non-empty compact set. We can use the Weierstrass Theorem to show the existence of a maximum and a minimum. Having shown existence, there are two possibilities:

(a) The solution is interior, x∗ ∈ (a, b);
(b) We have a corner (boundary) solution, i.e. x∗ = a, or x∗ = b, or both.

Case (i) If the solution is interior, then x∗ must also be a local maximum, i.e.,

(17.5) f′(x∗) = 0 ∧ f′′(x∗) ≤ 0.

Hence we are able to apply the earlier theorems to interior solutions.

Case (ii) Boundary solution:
If x∗ = a, then f′(a) ≤ 0.
If x∗ = b, then f′(b) ≥ 0.


In general, constrained optimization problems fall into two categories: (a) with equality constraints and (b) with inequality constraints. We discuss them next.

17.2. Equality Constraint

In this case the constraint set G(x) is described by k equality constraints,

g1(x) = 0, · · · , gk(x) = 0, where x ∈ Rⁿ, or,

(17.6) G(x) = {x ∈ Rⁿ | g(x) = 0}.

Note that g(x) = (g1(x), · · · , gk(x)) is a k-dimensional row vector. The interesting case will be k < n, as the following example shows.

Example 17.3. Consider

max_{x∈R²} f(x) subject to x1 + x2 − 2 = 0 (g1(x) = 0) and (1/3)x1 + x2 − 1 = 0 (g2(x) = 0).

The only point in the constraint set is (x1, x2) = (3/2, 1/2). Maximizing over this set is trivial: the solitary point in the constraint set is also the solution.
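The unique feasible point of this example is the solution of a 2×2 linear system; a Python sketch using exact rationals (illustrative, not part of the notes):

```python
from fractions import Fraction as F

# Constraints of Example 17.3: x1 + x2 = 2 and (1/3)*x1 + x2 = 1.
# Subtracting the second equation from the first gives (2/3)*x1 = 1.
x1 = F(1) / F(2, 3)
x2 = 2 - x1
assert (x1, x2) == (F(3, 2), F(1, 2))

# Both constraints hold at the unique point:
assert x1 + x2 == 2 and F(1, 3)*x1 + x2 == 1
```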

Definition 17.1. A point x∗ ∈ G(x) is a point of local maximum of f subject to the constraint g(x) = 0 if there is δ > 0 such that x ∈ G(x) ∩ B(x∗, δ) implies f(x) ≤ f(x∗).

Definition 17.2. A point x∗ ∈ G(x) is a point of global maximum of f subject to the constraint g(x) = 0 if x∗ solves the problem

max f(x) subject to g(x) = 0.

Theorem 17.1. Necessary condition for a constrained local maximum (Lagrange Theorem): Let A ⊆ Rⁿ be open and f : A → R, g : A → Rᵏ be C¹ functions. Suppose x∗ is a point of local maximum of f subject to the constraint g(x) = 0. Suppose further that ∇g(x∗) ≠ 0. Then there is λ∗ ∈ Rᵏ such that

(17.7) ∇f(x∗) = λ∗ ∇g(x∗).

Remark 17.1. The condition ∇g(x∗) ≠ 0 is called the constraint qualification.


It is important to check the constraint qualification condition ∇g(x∗) ≠ 0 before applying the conclusion of Lagrange's theorem. Without this condition, the conclusion of Lagrange's theorem would not be valid, as the following example shows.

Example 17.4. Let f : R² → R be given by

f(x1, x2) = 4x1 + 3x2 for all (x1, x2) ∈ R²,

and let g : R² → R be given by

g(x1, x2) = x1² + x2².

Consider the constraint set C = {(x1, x2) ∈ R² : g(x1, x2) = 0}. The only element of this set is (0, 0), so (x1∗, x2∗) = (0, 0) is a point of local maximum of f subject to the constraint g(x) = 0. Observe that the conclusion of Lagrange's theorem does not hold here. For, if it did, there would exist λ∗ ∈ R such that

∇f(0, 0) = λ∗ ∇g(0, 0).

But this means that

(4, 3) = λ∗ (0, 0),

which is a contradiction. The problem here is that

∇g(x1∗, x2∗) = ∇g(0, 0) = (0, 0),

so the constraint qualification condition is violated.

In the next Theorem, we use the notation C to denote the constraint set, i.e.,

C = {x ∈ Rⁿ : g(x) = 0}.

Theorem 17.2. Sufficient Conditions for a Global Maximum: Let A ⊆ Rⁿ be an open convex set and f : A → R, g : A → Rᵏ be C¹ functions. Suppose (x∗, λ∗) ∈ C × Rᵏ satisfies

(17.8) ∇f(x∗) = λ∗ ∇g(x∗).

If L(x, λ∗) = f(x) − λ∗ · g(x) is concave in x on A, then x∗ is a point of global maximum of f subject to the constraint g(x) = 0.

Proof. Let x ∈ C. Then,

L(x, λ∗) − L(x∗, λ∗) ≤ [∇f(x∗) − λ∗ ∇g(x∗)] · (x − x∗)

by concavity of L in x on A. Using the first-order condition (17.8), the term on the right hand side, [∇f(x∗) − λ∗ ∇g(x∗)] · (x − x∗), is zero, and we get

f(x) − λ∗ · g(x) = L(x, λ∗) ≤ L(x∗, λ∗) = f(x∗) − λ∗ · g(x∗).

Since x ∈ C and x∗ ∈ C, we have g(x) = g(x∗) = 0. Thus, f(x) ≤ f(x∗), and so x∗ is a point of global maximum of f subject to the constraint g(x) = 0.


We use the following steps to solve the optimization problem with equality constraints. Let f and gi, i = 1, · · · , k, be C¹ functions.

Necessity Route:

Step 1. Existence of a solution can be shown by using the Weierstrass Theorem. For this we need to show that the constraint set is non-empty, closed, and bounded.

Step 2. Define the Lagrangian function as

L(x, λ) = f(x) − λ · g(x) = f(x) − λ1 g1(x) − · · · − λk gk(x),

where λi, i = 1, · · · , k, are the Lagrange multipliers.

Step 3. Take the partial derivatives with respect to each variable x1, · · · , xn and each Lagrange multiplier λ1, · · · , λk.

Step 4. Solve the following equations:

∂L(x, λ)/∂xi = 0, i = 1, · · · , n;
∂L(x, λ)/∂λi = 0, i = 1, · · · , k.

These are n + k first order conditions (FOCs) in n + k unknowns.

Step 5. Let

M = {(x, λ) ∈ Rⁿ⁺ᵏ | x satisfies gi(x) = 0, i = 1, · · · , k, and the FOCs hold}.

Verify that ∇g(x∗) ≠ 0 holds at each point in the set M. Then evaluate f at each (x, λ) ∈ M and find the maximum.

Sufficiency Route: We know that if f and λ1 g1(x), · · · , λk gk(x) are such that L(x, λ) is concave, then the FOCs are sufficient for a maximum. Hence if we can show concavity, then any point satisfying the FOCs will be a solution. We illustrate the use of the two routes through the following examples.

Remark 17.2. Note that if L is not concave, we have to compare the values of f at the points in M.

Example 17.5.

max_{x∈R²₊} f(x1, x2) = −x1² − x2² subject to 5x1 + 10x2 = 10.

Figure 17.1. Constraint Set x2 = 1 − 0.5 · x1

The constraint set consists of the points on the line x2 = 1 − 0.5x1 with non-negative values of x1 and x2. To get the constraint in g(x) = 0 form, we rearrange it:

5x1 + 10x2 − 10 = 0.

Necessity Route. The constraint set is non-empty, as (2, 0) is contained in it.

The constraint set is closed. Take any convergent sequence xⁿ ∈ G(x) with xⁿ → x̄. Since 5x1ⁿ + 10x2ⁿ − 10 = 0, x1ⁿ ≥ 0, x2ⁿ ≥ 0, ∀n ∈ N, and weak inequalities are preserved in the limit,

5x̄1 + 10x̄2 − 10 = 0, x̄1 ≥ 0, x̄2 ≥ 0.

So x̄ ∈ G(x).

The constraint set is bounded. Note that for all x ∈ G(x),

x1 ≤ 2 and x2 ≤ 1 ⇒ ∥x∥ ≤ ∥(2, 1)∥ = √(2² + 1²) = √5.

So √5 will serve as a bound. The constraint set is thus compact and non-empty and the objective function f is continuous, hence the Weierstrass theorem is applicable and a solution exists.


The Lagrangian and the FOCs are

L(x, λ) = −x1² − x2² − λ(5x1 + 10x2 − 10),
∂L(x, λ)/∂x1 = −2x1 − 5λ = 0,
∂L(x, λ)/∂x2 = −2x2 − 10λ = 0,
∂L(x, λ)/∂λ = −(5x1 + 10x2 − 10) = 0.

Now from the first two FOCs,

4x1 = 2x2 ⇔ 2x1 = x2,

and from the third FOC,

5x1 + 20x1 − 10 = 0 ⇒ x1 = 10/25 = 2/5, x2 = 4/5, λ = −4/25.

We get a candidate for the solution:

m1 = (2/5, 4/5, −4/25).

Since we know a solution exists, it must necessarily be either m1 or one of the corners (2, 0) or (0, 1). The constraint qualification

∇g(x∗) = [5 10] ≠ 0

is verified trivially. We also see that

f(2, 0) = −4, f(0, 1) = −1, f(2/5, 4/5) = −4/5.

The solution then is x∗ = (2/5, 4/5).

Sufficiency Route.

∇f(x) = [−2x1 −2x2],
H_f(x) = [−2 0; 0 −2],
D1 = −2 < 0, D2 = 4 > 0.

So H_f(x) is negative definite for all x, and hence f is concave. The constraint function g(x) is concave, as it is linear. Also −λ > 0. Then f(x) − λg(x) is concave as a sum of concave functions. Then we know that the FOCs are sufficient for a maximum. So the point x∗ = (2/5, 4/5) is our solution.
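The constrained maximum can be checked by substituting the constraint and scanning the feasible segment; a Python sketch (grid resolution is an arbitrary choice):

```python
# Scan the segment 5*x1 + 10*x2 = 10, x1, x2 >= 0, for the maximum of
# f(x1, x2) = -x1**2 - x2**2.
def f(x1, x2):
    return -x1**2 - x2**2

best = max(((x1, (10 - 5*x1) / 10) for x1 in [i/1000 for i in range(2001)]),
           key=lambda p: f(*p))

# The FOC candidate (2/5, 4/5) should (approximately) win the scan:
assert abs(best[0] - 2/5) < 1e-2 and abs(best[1] - 4/5) < 1e-2
assert abs(f(2/5, 4/5) + 4/5) < 1e-12
```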

Example 17.6. (Non-concave objective function)

max f(x1, x2) = x1²x2 subject to 2x1² + x2² = 3.

The constraint set is an ellipse and can be rewritten as 3 − 2x1² − x2² = 0. Here the sufficiency route will not work, as the objective function is not concave:

H_f(x) = [2x2 2x1; 2x1 0],
D1 = 2x2, D2 = −4x1², with D2 < 0 for all x1 ≠ 0,

which means that H_f(x) is indefinite whenever x1 ≠ 0. So f is not concave. Hence we have to use the necessity route.

The constraint set is non-empty, as (1, 1) is contained in it.

The constraint set is closed. Take any convergent sequence xⁿ ∈ G(x) with xⁿ → x̄. Since 2(x1ⁿ)² + (x2ⁿ)² = 3 for all n ∈ N, and equalities are preserved in the limit,

2(x̄1)² + (x̄2)² = 3.

So x̄ ∈ G(x).

The constraint set is bounded. Note that for all x ∈ G(x),

|x1| ≤ √(3/2) < √3 and |x2| ≤ √3,

so ∥x∥ ≤ ∥(√3, √3)∥ = √(3 + 3) = √6. The constraint set is thus compact and non-empty and the objective function f is continuous, hence the Weierstrass theorem is applicable and a solution exists.

The Lagrangian and the FOCs are

L(x, λ) = x1²x2 − λ(3 − 2x1² − x2²),
∂L(x, λ)/∂x1 = 2x1x2 + 4λx1 = 0,
∂L(x, λ)/∂x2 = x1² + 2λx2 = 0,
∂L(x, λ)/∂λ = −(3 − 2x1² − x2²) = 0.

Now, from the first FOC,

2x1(x2 + 2λ) = 0 ⇔ x1 = 0 ∨ λ = −x2/2.

Case (i): x1 = 0, x2 = ±√3, λ = 0. We get two candidates for the solution:

m1 = (0, √3, 0), m2 = (0, −√3, 0).

Case (ii): λ = −x2/2. Substituting into the second FOC gives x1² − x2² = 0, so x1 = x2 ∨ x1 = −x2; then 3 − 2x1² − x2² = 0 gives x1 = 1 ∨ x1 = −1. If x1 = 1, then x2 = 1 ∨ x2 = −1, with λ = −1/2 ∨ λ = 1/2. Similarly for x1 = −1. We get four more candidates for the solution:

m3 = (1, 1, −1/2), m4 = (1, −1, 1/2),
m5 = (−1, −1, 1/2), m6 = (−1, 1, −1/2).

Thus

M = {m1, m2, · · · , m6}.

The constraint qualification

∇g(x∗) = [−4x1∗ −2x2∗] ≠ 0

holds at each mi ∈ M. Verify that

f(0, √3) = 0 = f(0, −√3),
f(1, 1) = f(−1, 1) = 1,
f(1, −1) = f(−1, −1) = −1.

The solutions then are x = (1, 1) and x = (−1, 1).
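The maximum over the ellipse can be checked by parameterizing the constraint and scanning; a Python sketch (the parameterization and grid size are illustrative choices):

```python
import math

# Parameterize the ellipse 2*x1**2 + x2**2 = 3 and scan f(x1, x2) = x1**2 * x2.
def point(t):
    return math.sqrt(3/2) * math.cos(t), math.sqrt(3) * math.sin(t)

vals = [(x1**2 * x2, x1, x2) for x1, x2 in
        (point(2*math.pi*k/100000) for k in range(100000))]
fmax, x1, x2 = max(vals)

assert abs(fmax - 1) < 1e-4                              # maximum value is 1
assert abs(abs(x1) - 1) < 1e-2 and abs(x2 - 1) < 1e-2    # attained near (±1, 1)
```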

Example 17.7.

max_{x∈R²₊} f(x1, x2) = x1x2 subject to x1 + 4x2 = 16, or 16 − x1 − 4x2 = 0.

The Hessian is

H_f(x) = [0 1; 1 0],

which is indefinite for all values of x ∈ R²₊. Hence the objective function is not concave.

Observe that x is restricted to R²₊ and the equality constraint holds. This constraint set is non-empty, as (0, 4) is contained in it, and compact. A solution to this problem exists, as f is continuous and the constraint set is non-empty and compact, hence the Weierstrass theorem is applicable.

The Lagrangian and the FOCs are

L(x, λ) = x1x2 − λ(16 − x1 − 4x2),
∂L(x, λ)/∂x1 = x2 + λ = 0,
∂L(x, λ)/∂x2 = x1 + 4λ = 0,
∂L(x, λ)/∂λ = −(16 − x1 − 4x2) = 0.

The FOCs will give us interior candidates; we will still need to compare them with the corners. From the first two FOCs, x1 = 4x2, so the third FOC gives 8x2 = 16, i.e., x2 = 2, x1 = 8, and λ = −x2 = −2.

We get one candidate for the solution:

m1 = (8, 2, −2).

The constraint qualification

∇g(x∗) = [−1 −4] ≠ 0

is satisfied trivially for m1. Comparing with the corners (0, 4), (16, 0), verify that

f(0, 4) = 0 = f(16, 0), f(8, 2) = 16.

The solution then is x = (8, 2).
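Substituting the constraint reduces this example to a one-variable parabola, which is easy to check; a Python sketch (the grid is an arbitrary choice):

```python
# On the budget line x1 + 4*x2 = 16 with x1, x2 >= 0, substitute
# x2 = (16 - x1)/4, so f = x1*x2 = x1*(16 - x1)/4, a concave parabola in x1.
def f_on_line(x1):
    return x1 * (16 - x1) / 4

# Vertex at x1 = 8 with value 16, beating both corners:
assert f_on_line(8) == 16
assert f_on_line(0) == 0 and f_on_line(16) == 0
assert all(f_on_line(x) <= 16 for x in [i/10 for i in range(161)])
```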

Example 17.8.

max_{x∈R²₊} f(x1, x2) = ln x1 + ln x2 subject to x1 + 4x2 = 16, or 16 − x1 − 4x2 = 0.

Here the necessity route does not work, as the objective function is not defined at the corners of the constraint set, x = (16, 0) or x = (0, 4), since ln y is not defined for y = 0. The Weierstrass Theorem cannot be applied. Let us use the sufficiency route. Since ln is not defined at the corners, the problem can be modified as follows:

max_{x∈R²₊₊} f(x1, x2) = ln x1 + ln x2 subject to 16 − x1 − 4x2 = 0.

The Lagrangian and the FOCs are

L(x, λ) = ln x1 + ln x2 − λ(16 − x1 − 4x2),
∂L(x, λ)/∂x1 = 1/x1 + λ = 0 ⇒ λx1 = −1,
∂L(x, λ)/∂x2 = 1/x2 + 4λ = 0 ⇒ 4λx2 = −1,
∂L(x, λ)/∂λ = −(16 − x1 − 4x2) = 0.

So x1 = 4x2 from the first two FOCs. Substituting this into the third FOC, we get x1 = 8, x2 = 2, λ = −1/8. The Hessian is

H_f(x) = [−1/x1² 0; 0 −1/x2²],
D1 = −1/x1² < 0, D2 = 1/(x1²x2²) > 0, ∀x ∈ R²₊₊.

Hence H_f(x) is negative definite for all x ∈ R²₊₊, so f is concave. Also g(x) = 16 − x1 − 4x2 is linear, hence concave. Lastly, −λ > 0. So L(x, λ) is concave and the FOCs are sufficient for a maximum. Hence x∗ = (8, 2) is the solution.

Example 17.9. Application: the Arithmetic Mean–Geometric Mean inequality. Consider

(17.9) max_{(a,b)∈R²₊} f(a, b) = ab subject to a + b = 2.

Note the constraint set C = {a ≥ 0, b ≥ 0, a + b = 2} is non-empty ((2, 0) is contained in it), closed (since weak inequalities are preserved in the limit), and bounded, as ∥(a, b)∥ ≤ ∥(2, 2)∥ = √(2² + 2²) = 2√2. The objective function is continuous. Hence by the Weierstrass Theorem a solution exists.

To find interior candidates (a > 0, b > 0), we can rewrite the problem as follows:

max_{(a,b)∈R²₊₊} f(a, b) = ab subject to g(a, b) = 2 − a − b = 0.

The Lagrangian and the FOCs are

L(a, b, λ) = ab − λ(2 − a − b),
∂L(a, b, λ)/∂a = b + λ = 0,
∂L(a, b, λ)/∂b = a + λ = 0,
∂L(a, b, λ)/∂λ = −(2 − a − b) = 0.

Now a = b, and hence a = b = 1 = −λ. We get one candidate for the solution:

m1 = (1, 1, −1).

The constraint qualification

∇g(x∗) = [−1 −1] ≠ 0

is satisfied trivially for m1. Comparing with the corners (0, 2), (2, 0), verify that

f(0, 2) = 0 = f(2, 0), f(1, 1) = 1.

The solution then is (1, 1). In other words, we have shown that on the constraint set,

(17.10) ab ≤ 1.

Now let x1 > 0, x2 > 0 be arbitrary, with x1 + x2 = x > 0. Then

2x1 + 2x2 = 2x, so (2x1)/x + (2x2)/x = 2.

Note that a = 2x1/x > 0, b = 2x2/x > 0, and a + b = 2, so we can apply the result shown above:

ab = (2x1/x)(2x2/x) ≤ 1
x1x2 ≤ x²/4 = ((x1 + x2)/2)²
√(x1x2) ≤ (x1 + x2)/2,

which is the Arithmetic Mean–Geometric Mean inequality.
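The inequality just derived can be exercised on random positive pairs; a small Python sketch (seed and ranges are arbitrary):

```python
import math, random

# AM-GM: sqrt(x1*x2) <= (x1 + x2)/2 for positive x1, x2,
# with equality when x1 == x2.
random.seed(1)
for _ in range(1000):
    x1 = random.uniform(0.01, 100)
    x2 = random.uniform(0.01, 100)
    assert math.sqrt(x1 * x2) <= (x1 + x2) / 2 + 1e-12

assert math.sqrt(4 * 4) == (4 + 4) / 2   # equality at x1 == x2
```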

Chapter 18

Optimization Theory: Inequality Constraints

18.1. Inequality Constraint

The more general constrained optimization problem deals with inequality constraints. Note that an equality constraint g(x) = 0 can be expressed as g(x) ≥ 0 and g(x) ≤ 0.

The constrained maximization problem with which we are concerned is the following:

max f(x) subject to gj(x) ≥ 0 for j = 1, · · · , m, and x ∈ Rⁿ₊.

We take the set X to be a non-empty open subset of Rⁿ containing Rⁿ₊, and f, gj (j = 1, · · · , m) are continuously differentiable functions from X to R.

We define the following constraint functions on the domain X:

Gj(x) = gj(x) for j = 1, · · · , m, and
Gm+j(x) = xj for j = 1, · · · , n.

Using these constraint functions, the maximization problem can be rewritten as

max f(x) subject to Gj(x) ≥ 0 for j = 1, · · · , m + n, and x ∈ X.

We define the constraint set C as follows:

C = {x ∈ X : G(x) ≥ 0},

where G(x) = [G1(x), · · · , Gm+n(x)].

Definition 18.1. Kuhn-Tucker Conditions: Let X be an open set in Rⁿ, and f, Gj (j = 1, · · · , m + n) be continuously differentiable on X. A pair (x∗, λ∗) in X × Rᵐ⁺ⁿ₊ satisfies the Kuhn-Tucker conditions if

(i) Di f(x∗) + ∑_{j=1}^{m+n} λj∗ · Di Gj(x∗) = 0, i = 1, · · · , n;
(ii) G(x∗) ≥ 0 and λ∗ · G(x∗) = 0.

Theorem 18.1. Let X be an open set in Rⁿ, and f, Gj (j = 1, · · · , m + n) be continuously differentiable on X. Suppose a pair (x∗, λ∗) ∈ X × Rᵐ⁺ⁿ₊ satisfies the Kuhn-Tucker conditions. If X is convex and f, Gj (j = 1, · · · , m + n) are concave on X, then x∗ is a point of constrained global maximum.

We illustrate the application of this Theorem through examples. First we take a linear objective function.

Example 18.1. Solve

max_{(x,y)∈R²₊} f(x, y) = ax + by subject to p1x + p2y ≤ M,

where a, b, p1, p2, and M are positive parameters. Find a solution to the problem for the following parameter configurations:

(i) a/b > p1/p2; (ii) a/b < p1/p2,

using the Kuhn-Tucker sufficiency theorem.

We need to check that all conditions of the Theorem are satisfied.

(i) Let

X = {(x, y) ∈ R² | x > −1, y > −1}.

Then X is open, as its complement

X^C = {(x, y) ∈ R² | x ≤ −1 or y ≤ −1}

is closed.

(ii) The function f(x, y) is continuous, as ax and by are continuous and f(·, ·) is obtained by taking the sum of two continuous functions. The constraint functions g1(x, y) = M − p1x − p2y, g2(x, y) = x, g3(x, y) = y are linear and hence continuous. Further, fx(x, y) = a and fy(x, y) = b are continuous functions. Hence f, gj (j = 1, · · · , 3) are continuously differentiable on X.

(iii) The set X is convex: if (x1, y1), (x2, y2) ∈ X, then

x1 > −1, x2 > −1 ⇒ λx1 + (1 − λ)x2 > −1 ∀λ ∈ (0, 1),
y1 > −1, y2 > −1 ⇒ λy1 + (1 − λ)y2 > −1 ∀λ ∈ (0, 1),
⇒ (λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) ∈ X.

The function f(x, y) is concave as the sum of two concave functions, and gj (j = 1, · · · , 3) are concave, being linear functions. Hence for the following problem,

max_{(x,y)∈X} f(x, y) = ax + by subject to p1x + p2y ≤ M, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗, y∗), λ∗) ∈ X × R³₊ that satisfies the Kuhn-Tucker conditions:

(i) Di f(x∗) + ∑_{j=1}^{3} λj∗ · Di gj(x∗) = 0, i = 1, 2;
(ii) g(x∗) ≥ 0 and λ∗ · g(x∗) = 0.

They are

a − λ1p1 + λ2 = 0,
b − λ1p2 + λ3 = 0,
M − p1x − p2y ≥ 0, λ1(M − p1x − p2y) = 0,
x ≥ 0, λ2x = 0; y ≥ 0, λ3y = 0.

If λ1 = 0, then a − λ1p1 + λ2 = 0 ⇒ λ2 = −a < 0, which contradicts λ2 ≥ 0. Hence λ1 > 0 ⇒ M − p1x − p2y = 0.

Figure 18.1. Case (i): a/b > p1/p2; Optimal Consumption Bundle (M/p1, 0)

So x = y = 0 is ruled out. Take Case (i): a/b > p1/p2. Consider x > 0, y = 0. Note λ2 = 0, x = M/p1,

λ1 = a/p1, b − (a/p1)p2 + λ3 = 0,
λ3 = (a/p1)p2 − b = b((a/b)(p2/p1) − 1) > 0,

since a/b > p1/p2, i.e., (a/b)(p2/p1) > 1. Hence

x = M/p1, y = 0, λ1 = a/p1, λ2 = 0, λ3 = b((a/b)(p2/p1) − 1) > 0

is a solution. Case (ii) ab < p2 . Consider x = 0, y > 0. Note λ3 = 0, y = M

p2 ,

b b

= λ1 , a − p1 + λ2 = 0

p2 p2

( )

b b p1

λ2 = p1 − a = a −1 > 0

p2 a p2

a p1 b p1

since b < p2 or 1 < a p2 . Hence

( )

M b b p1

x = 0, y = , λ1 = , λ2 = a − 1 > 0, λ3 = 0

p2 p2 a p2

is a solution.
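The corner solution of Example 18.1 can be checked numerically. The sketch below (not part of the original text) runs a brute-force grid search for the illustrative, assumed parameter values a = 3, b = 1, p1 = p2 = 1, M = 10, for which a/b > p1/p2 and the analysis predicts the bundle (M/p1, 0) = (10, 0):

```python
# Brute-force check of Example 18.1's corner solution for max ax + by subject to
# p1*x + p2*y <= M, x >= 0, y >= 0. The parameter values below are illustrative
# assumptions with a/b > p1/p2, so the theory predicts (M/p1, 0) = (10, 0).
a, b, p1, p2, M = 3.0, 1.0, 1.0, 1.0, 10.0

best, best_xy = float("-inf"), None
n = 400
for i in range(n + 1):
    for j in range(n + 1):
        x = M / p1 * i / n
        y = M / p2 * j / n
        if p1 * x + p2 * y <= M:              # feasibility check
            v = a * x + b * y
            if v > best:
                best, best_xy = v, (x, y)

print(best_xy, best)                          # (10.0, 0.0) with value 30.0
```

The grid search agrees with the Kuhn-Tucker analysis: all income is spent on good x.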


Figure 18.2. Case (ii): a/b < p1/p2; Optimal Consumption Bundle = (0, M/p2).

Example 18.2. Solve

max_{(x,y)∈R2+} f(x, y) = x/(1 + x) + y
subject to x + 4y ≤ 16,

using the Kuhn-Tucker sufficiency theorem.

We need to check that all conditions of the theorem are satisfied.

(i) Let

X = {(x, y) ∈ R2 | x > −1, y > −1}.

Then X is open as its complement

X^C = {(x, y) ∈ R2 | x ≤ −1 or y ≤ −1}

is closed.

(ii) The function f(x, y) is continuous: x, y, and 1 + x are continuous, 1 + x > 0 on X, and f(·, ·) is obtained by taking the quotient of the two continuous functions x and 1 + x, with non-vanishing denominator, and then adding a continuous function. The functions

g1(x, y) = 16 − x − 4y; g2(x, y) = x; g3(x, y) = y

are linear and hence continuous. Further, fx(x, y) = 1/(1 + x)² and fy(x, y) = 1 are continuous functions. Hence f, gj (j = 1, ..., 3) are continuously differentiable on X.


(iii) The set X is convex: if (x1, y1), (x2, y2) ∈ X, then

x1 > −1, x2 > −1 → λx1 + (1 − λ)x2 > −1 for all λ ∈ (0, 1),
y1 > −1, y2 > −1 → λy1 + (1 − λ)y2 > −1 for all λ ∈ (0, 1),

→ (λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) ∈ X.

The function f(x, y) is concave as the sum of two concave functions (exercise), and gj (j = 1, ..., 3) are concave, being linear functions. Hence, for the problem

max_{(x,y)∈X} f(x, y) = x/(1 + x) + y
subject to x + 4y ≤ 16, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗, y∗), λ∗) ∈ X × R3+ that satisfies the Kuhn-Tucker conditions. They are

1/(1 + x)² − λ1 + λ2 = 0
1 − 4λ1 + λ3 = 0
16 − x − 4y ≥ 0, λ1 (16 − x − 4y) = 0
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.

If λ1 = 0, then 1 − 4λ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence

λ1 > 0 → 16 − x − 4y = 0,

and x = y = 0 is ruled out. There are three remaining cases.

Case (i): x > 0, y = 0. Note λ2 = 0, x = 16,

1/(1 + 16)² = λ1; 1 − 4/289 + λ3 = 0,

λ3 = −285/289 < 0.

This contradicts λ3 ≥ 0.

Case (ii): x = 0, y > 0. Note λ3 = 0, y = 4,

1/4 = λ1; 1 − λ1 + λ2 = 0,

1 − 1/4 + λ2 = 0; λ2 = −3/4 < 0.

This contradicts λ2 ≥ 0.


Case (iii): x > 0, y > 0. Note λ2 = 0, λ3 = 0,

1/(1 + x)² = λ1; 1 − 4λ1 = 0,

(1 + x)² = 4 → x = 1 > 0,

16 − x − 4y = 0 → y = 15/4 > 0.

Note that all conditions are satisfied. The theorem asserts that (1, 15/4) is a global maximum and therefore solves both problems.
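As a sanity check on Example 18.2, the following sketch (grid resolution is an arbitrary choice) searches along the binding budget line x + 4y = 16, which the analysis showed must hold at the optimum:

```python
# Numerical check of Example 18.2: maximize x/(1+x) + y over x >= 0, y >= 0,
# x + 4y <= 16. The Kuhn-Tucker analysis predicts (x, y) = (1, 15/4), value 4.25.
# Since lambda1 > 0 forces the budget to bind, we search along x + 4y = 16.
def f(x, y):
    return x / (1 + x) + y

best, best_xy = float("-inf"), None
n = 800
for i in range(n + 1):
    x = 16 * i / n                  # grid over [0, 16]
    y = (16 - x) / 4
    v = f(x, y)
    if v > best:
        best, best_xy = v, (x, y)

print(best_xy, best)                # (1.0, 3.75) with value 4.25
```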

Example 18.3. In the above example, let the price of good y be p > 0 and income be I > 0. We can redo the exercise by going over the Kuhn-Tucker conditions again. They are

1/(1 + x)² − λ1 + λ2 = 0
1 − pλ1 + λ3 = 0
I − x − py ≥ 0, λ1 (I − x − py) = 0
x ≥ 0, λ2 x = 0; y ≥ 0, λ3 y = 0.

If λ1 = 0, then 1 − pλ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence

λ1 > 0 → I − x − py = 0,

and x = y = 0 is ruled out because I > 0. There are three remaining cases.

Case (i): x > 0, y = 0. Note λ2 = 0, x = I,

1/(1 + I)² = λ1,

1 − p/(1 + I)² + λ3 = 0 → λ3 = p/(1 + I)² − 1.

If p/(1 + I)² − 1 > 0, i.e., p > (I + 1)², then λ3 > 0. So the solution is (I, 0, 1/(1 + I)², 0, p/(1 + I)² − 1) if p > (I + 1)².


Case (ii): x = 0, y > 0. Note λ3 = 0, y = I/p,

1/p = λ1,

1 − λ1 + λ2 = 0 → 1 − 1/p + λ2 = 0,

λ2 = 1/p − 1.

If 1/p − 1 > 0, i.e., p < 1, then λ2 > 0. So the solution is (0, I/p, 1/p, 1/p − 1, 0) if p ≤ 1.

Case (iii): x > 0, y > 0. Note λ2 = 0, λ3 = 0,

1/(1 + x)² = λ1, 1 − pλ1 = 0,

(1 + x)² = p → x = √p − 1 > 0,

I − x − py = 0 → y = (I + 1 − √p)/p > 0.

Hence for p > 1 and I + 1 > √p, the solution is (√p − 1, (I + 1 − √p)/p, 1/p, 0, 0). Combining them, the solution (x∗, y∗, λ1∗, λ2∗, λ3∗) is

(I, 0, 1/(1 + I)², 0, p/(1 + I)² − 1) if p > (I + 1)²,
(0, I/p, 1/p, 1/p − 1, 0) if p ≤ 1, and
(√p − 1, (I + 1 − √p)/p, 1/p, 0, 0) if 1 < p < (I + 1)².

The Kuhn-Tucker sufficiency theorem asserts that this solution is a global maximum and therefore solves both problems.
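The piecewise demand derived in Example 18.3 can be wrapped in a small function and compared against a direct search; the parameter values p = 4, I = 3 below are illustrative assumptions that fall in the interior case 1 < p < (I + 1)²:

```python
# Closed-form demands from Example 18.3 (utility x/(1+x) + y, budget x + p*y <= I),
# compared with a direct search along the budget line. The values p = 4, I = 3 are
# illustrative assumptions falling in the interior case 1 < p < (I+1)^2.
def demand(p, I):
    if p >= (I + 1) ** 2:            # corner: spend everything on good x
        return I, 0.0
    if p <= 1:                       # corner: spend everything on good y
        return 0.0, I / p
    x = p ** 0.5 - 1                 # interior case
    return x, (I - x) / p

p, I = 4.0, 3.0
x_star, y_star = demand(p, I)
print(x_star, y_star)                # 1.0 0.5

# brute-force comparison along the binding budget line x + p*y = I
vals = [x / (1 + x) + (I - x) / p for x in [I * i / 1000 for i in range(1001)]]
print(max(vals))                     # close to x*/(1+x*) + y* = 1.0
```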

18.2. Global maximum and constrained local maximum

We know from the definitions that if x̂ is a point of global maximum, then x̂ is also a point of local maximum. The situations under which the converse is true are given by the following theorems.

Theorem 18.2. Suppose A is an open convex set in Rn , and f is a function from A to R.

(a) Suppose x̄ in A is a point of local maximum of f, and f is concave on A. Then x̄ is a point of global maximum of f on A.


(b) Suppose x̄ in A is a point of local maximum of f, and f is strictly quasi-concave on A. Then x̄ is the unique point of global maximum of f on A.

(c) Suppose x̄ is in A and there is δ > 0 such that

(i) B(x̄, δ) ⊂ A, and

(ii) x̄ is the unique point of maximum of f on B(x̄, δ).

If f is quasi-concave on A, then x̄ is the unique point of global maximum of f on A.

[Note that we do not assume the function f to be differentiable on A.]

Proof. We prove these claims by contradiction.

(a) Assume that x̄ is not a global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f(x̂) > f(x̄).

Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all

x ∈ A ∩ B(x̄, δ).

Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,

x = λx̂ + (1 − λ)x̄,

for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. By concavity of f , we have for all

λ ∈ [0, 1]

f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄).

Since f (x̂) > f (x̄), we also have for all λ ∈ (0, 1] that

f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄) > λ f (x̄) + (1 − λ) f (x̄) = f (x̄).

We wish to take λ sufficiently close to zero (but not equal to zero) so that

x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).

For this, let us denote d(x̂, x̄) = d and note

d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ|d(x̂, x̄) = λ · d.

If we set λ = δ/(2d), then we know

d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,

so x′ ∈ B(x̄, δ).

Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such

that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄

must be a global maximum of f on A.


(b) Assume that x̄ is not a point of global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f(x̂) > f(x̄).

Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all

x ∈ A ∩ B(x̄, δ).

Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,

x = λx̂ + (1 − λ)x̄,

for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is strictly quasi-concave, we

have for all λ ∈ (0, 1)

f(λx̂ + (1 − λ)x̄) > min{f(x̂), f(x̄)} = f(x̄).

We wish to take λ > 0 sufficiently small so that

x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).

For this, let us denote d(x̂, x̄) = d and note

d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ|d(x̂, x̄) = λ · d.

If we set λ = δ/(2d), then we know

d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,

so x′ ∈ B(x̄, δ).

Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such

that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄

must be a global maximum of f on A.

To show uniqueness, suppose not; then there exists x′′ ∈ A with x′′ ≠ x̄ such that

f(x̄) = f(x′′).

But then, since f is strictly quasi-concave and A is convex,

f(0.5x̄ + 0.5x′′) > min{f(x̄), f(x′′)} = f(x′′) = f(x̄).

This contradicts the fact that x̄ is a point of global maximum.

(c) Assume that x̄ is not the unique point of global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f(x̂) ≥ f(x̄).

Since x̄ is the unique point of maximum of f in the open ball B(x̄, δ), we have f(x̄) > f(x) for all x ∈ A ∩ B(x̄, δ) with x ≠ x̄.

Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,

x = λx̂ + (1 − λ)x̄,

for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is quasi-concave, we have for

all λ ∈ (0, 1)

f(λx̂ + (1 − λ)x̄) ≥ min{f(x̂), f(x̄)} = f(x̄).


We wish to take λ > 0 sufficiently small so that

x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).

For this, let us denote d(x̂, x̄) = d and note

d(x′ , x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.

If we set λ = δ/(2d), then we know

d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,

so x′ ∈ B(x̄, δ).

Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such

that f (x′ ) ≥ f (x̄), which contradicts that x̄ was the unique point of local maximum. It follows

that x̄ must be the unique point of global maximum of f on A.

This theorem shows that there is an important difference between concavity and quasi-concavity in going from the local maximum property to the global maximum property. With quasi-concavity, we need something more (some "strictness") to make the arguments work. In (b), this additional condition takes the form of strict quasi-concavity. In (c), it takes the form of assuming that the point of local maximum is unique. This underlying theme (that one needs something in addition to quasi-concavity to make the arguments and results work) recurs in Arrow-Enthoven's theory of quasi-concave programming, where the attempt is made to replace the concavity conditions of Kuhn-Tucker with quasi-concavity.

The following example shows that in Theorem 18.2(a), we cannot replace concavity of f by quasi-concavity of f and still preserve the conclusion.

Example 18.4. Let A be the interval (0, 6) in R. Clearly, A is an open, convex set. Let f : A → R

be defined as follows:

f(x) = x for x ∈ (0, 2); f(x) = 2 for x ∈ [2, 4]; f(x) = x − 2 for x ∈ (4, 6).

Then, f is a non-decreasing function on A, and therefore quasi-concave. The point x̄ = 3 is clearly

a point of local maximum, since f (x̄) = 2 ≥ f (x) for all x ∈ A ∩ B(x̄, 1). However, x̄ is not a point

of global maximum of f on A, since (for example), f (5) = 3 > 2 = f (x̄).
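Example 18.4 is easy to check by machine; the sketch below encodes the piecewise function and verifies that x = 3 is a local but not a global maximum:

```python
# The piecewise function from Example 18.4: non-decreasing (hence quasi-concave) on
# A = (0, 6), with a local maximum at x = 3 that is not a global maximum.
def f(x):
    if 0 < x < 2:
        return float(x)
    if 2 <= x <= 4:
        return 2.0
    if 4 < x < 6:
        return float(x - 2)
    raise ValueError("x outside the domain (0, 6)")

# every point within distance 1 of x = 3 has value at most f(3) = 2 ...
assert all(f(3 + d / 100) <= f(3) for d in range(-99, 100))
# ... yet f(5) = 3 > f(3), so x = 3 is not a global maximum
print(f(3), f(5))                    # 2.0 3.0
```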

The following theorem describes conditions under which, if x̂ is a point of constrained local maximum, then x̂ is also a point of constrained global maximum.


Theorem 18.3. Let X be a convex set in Rn. Let f, gj (j = 1, ..., m) be concave functions on X. Suppose x̂ is a point of constrained local maximum. Then x̂ is a point of constrained global maximum.

Proof. We prove this by contradiction. We denote the constraint set by

C = {x ∈ X : gj(x) ≥ 0 for j = 1, ..., m}.

Since x̂ is a point of constrained local maximum, there is δ > 0 such that for all x ∈ B(x̂, δ) ∩ C, we have f(x) ≤ f(x̂).

Now, if x̂ is not a point of constrained global maximum, then there is some x̄ ∈ C such that f(x̄) > f(x̂). One can choose 0 < θ < 1 with θ sufficiently close to zero such that

x̃ ≡ [θx̄ + (1 − θ)x̂] ∈ B(x̂, δ).

For this, we need

∥θx̄ + (1 − θ)x̂ − x̂∥ < δ.

This holds if

θ < δ / ∥x̄ − x̂∥,

so take

θ = δ / (2∥x̄ − x̂∥)

(replaced by any smaller positive value if this exceeds 1), so that x̃ ∈ B(x̂, δ). Since X is convex and gj (j = 1, ..., m) are concave, we claim that C is a convex set, and x̃ ≡ [θx̄ + (1 − θ)x̂] ∈ C.

Let y ∈ C and y′ ∈ C be two arbitrary points. By definition of the constraint set C, y and y′ are in X and therefore ŷ ≡ [λy + (1 − λ)y′] ∈ X for all λ ∈ [0, 1]. Also, by concavity of the constraint
functions,

g j (ŷ) = g j (λ y + (1 − λ)y′ ) ≥ λ g j (y) + (1 − λ)g j (y′ ) ≥ λ · 0 + (1 − λ) · 0 = 0,

for all j = 1, · · · , m.

Thus, ŷ ∈ C and therefore C is a convex set.

Therefore, x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ C.

Thus

x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ B(x̂, δ) ∩C.

Also, since f is concave,

f (x̃) = f (θ x̄ + (1 − θ)x̂) ≥ θ f (x̄) + (1 − θ) f (x̂) > θ f (x̂) + (1 − θ) f (x̂) = f (x̂).


But this contradicts the fact that x̂ is a point of constrained local maximum.

Observe that we did not need to assume that the objective function is differentiable on the domain X in this proof.

Chapter 19

Problem Set 8

(1) [Cauchy-Schwarz inequality]

Let C = (c1, c2, c3) be a non-zero vector in R3. Consider the following constrained maximization problem:

(19.1) max ∑_{i=1}^{3} ci xi subject to ∑_{i=1}^{3} xi² = 1 and (x1, x2, x3) ∈ R3.

(a) Show, by using the Weierstrass theorem, that there exists x̄ ∈ R3 which solves (19.1).

(b) Use Lagrange's theorem to show that

(19.2) ∑_{i=1}^{3} ci x̄i = ∥C∥.

(c) Let p, q be arbitrary non-zero vectors in Rn. Using the result in (b), show that |p · q| ≤ ∥p∥ · ∥q∥.

Solve the following constrained optimization problems.

(2) Let f : R2 → R.

(19.3) max_{(x,y)∈R2+} f(x, y) = x² − 3xy subject to x + 2y = 10.

(3) Let f : R2+ → R.

(19.4) max_{(x,y)∈R2+} f(x, y) = x^(1/3) y^(2/3) subject to 2x + y = 4.



(4) Let f : R2+ → R.

(19.5) max f(x, y) = √(xy) subject to x + y ≤ 6, x ≥ 0, y ≥ 0.

(5) Let f : R2+ → R

max f (x, y) = x + ln(1 + y)

(19.6)

subject to x ≥ 0, y ≥ 0 and x + py ≤ m.

(6) Let X be a non-empty, convex set in R2. Let g be a continuous function from X to R, and let f be a strictly quasi-concave function from X to R. Consider the following constrained optimization problem:

(19.7) max f(x) subject to g(x) ≥ 0 and x ∈ X,

and the corresponding optimization problem

(19.8) max f(x) subject to x ∈ X,

in which the constraint g(x) ≥ 0 has been omitted.

(a) Suppose that x̄ is a solution to (19.8), and g(x̄) > 0. Is x̄ also a solution to problem (19.7)? Explain.

(b) Suppose that x̄ is a solution to (19.8), but x̄ is not a solution to (19.7). Show that if x̂ is any solution to (19.7), then we must have g(x̂) = 0.

(7) Suppose that a consumer has the utility function U(x, y) = x^a y^b and faces the budget constraint px x + py y ≤ I.

(A) Utility Maximization

(a) What are the first order conditions for utility maximization?

(b) Solve for the consumer’s demands for goods x and y.

(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an

increasing, decreasing or constant function of income?

(d) Show that the second order conditions hold.

(e) Show that the value of dx∗/dI obtained from the implicit function theorem is identical to the value obtained by taking the partial derivative of x∗ with respect to I.


(f) A consumer's indirect utility function is defined to be utility as a function of prices

and income. Use x∗ and y∗ to solve for the indirect utility function. Is it true that the

partial of the indirect utility function with respect to income equals λ?

(B) Expenditure Minimization:

Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditures, Px x + Py y, subject to reaching a given level of utility, U0 (the constraint is therefore U0 − x^a y^b = 0).

(a) What are the first order conditions for expenditure minimization?

(b) Use the first order conditions to solve for x∗ and y∗ (these are called the Hicksian or

compensated demand functions).

(c) Check the second order conditions.

(d) Write the level of income, I, necessary to reach U0 as a function of U0 , prices, and

parameters. How does this expenditure function relate to the indirect utility function?

(e) To avoid confusion, let us call the solution for good x in utility maximization x∗ and the solution for good x in expenditure minimization h∗. Prove that

∂x∗/∂Px = ∂h∗/∂Px − x∗ · ∂x∗/∂I.

Interpret this answer.

(8) Suppose a consumer has the utility function U = a ln(x − x0) + b ln(y − y0), where a, b, x0 and y0 are positive parameters. Assume that the usual budget constraint applies.

(a) Solve for the consumer’s demand for good x.

(b) Find the elasticities of demand for good x with respect to income and prices.

(c) Show that the utility function U = 45(x − x0)^(3.5a) (y − y0)^(3.5b) would have yielded the same

demand for good x.

(9) Optimization with inequality constraints: Rationing.

Suppose a consumer has the utility function,

U(x, y, z) = a ln(x) + b ln(y) + c ln(z)

where a > 0, b > 0 and c > 0 are such that a + b + c = 1. The budget constraint is

px + qy + rz ≤ I.

In other words, the prices of goods x, y and z are p, q and r respectively, and the consumer has an income I. The prices and income are positive.

In addition, the consumer faces a rationing constraint. He is not allowed to buy more than

k > 0 units of good x.

(a) Solve the optimization problem.

(b) Under what condition on the various parameters, is the rationing constraint binding?


(c) Show that when the rationing constraint binds, the income that the consumer would have

liked to spend on good x but cannot do so now is split between good y and z in proportions

b : c.

(d) Would you expect rationing of bread purchases to affect demand for butter and rice in this

way? If not, how would you expect the bread-butter-rice case to differ from the result in

(c)?

Chapter 20

Envelope Theorem

20.1. Envelope Theorem for Unconstrained Problems

Let f(x, α) be a continuously differentiable function of x ∈ Rn and a parameter α. For each choice of α, consider the unconstrained maximization problem

max_x f(x, α),

where the choice variable is x. It is of interest to us how the maximized value f(x∗(α), α) changes as the parameter value α changes.

Theorem 20.1. Let x∗(α) be a solution of this problem and assume that x∗(α) is a continuously differentiable function of α. Then

d f(x∗(α), α)/dα = ∂ f(x∗(α), α)/∂α.

Proof. We use the Chain Rule to get

d f(x∗(α), α)/dα = ∑_i ∂f(x∗(α), α)/∂xi · dxi∗(α)/dα + ∂f(x∗(α), α)/∂α,

or

d f(x∗(α), α)/dα = ∂f(x∗(α), α)/∂α,

since ∂f(x∗(α), α)/∂xi = 0 for i = 1, ..., n by the First Order conditions for the solution.



Example 20.1. Consider the problem of maximizing the function f(x, a) = −2x² + 2ax + 4a² with respect to x for any given value of a. What is the effect of a unit increase in the value of a on the maximum value of f(x, a)?

This can be done directly by computing the x∗ which maximizes f. The first order condition yields

f′(x) = −4x + 2a = 0.

So x∗ = 0.5a. We can plug this into f(x, a), which leads to

f(x∗(a), a) = f(0.5a, a) = −0.5a² + a² + 4a² = 4.5a².

Observe that f(x∗(a), a) increases at the rate of 9a as a increases. Alternatively, we could apply the Envelope Theorem to get

d f∗/da = ∂f(x∗(a), a)/∂a = 2x∗ + 8a = 9a,

since x∗(a) = 0.5a.
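A quick finite-difference check of Example 20.1 (the step size h is an arbitrary choice):

```python
# Finite-difference check of Example 20.1: the value function is
# V(a) = f(x*(a), a) with x*(a) = a/2, and the Envelope Theorem says V'(a) = 9a.
def V(a):
    x = a / 2                                 # maximizer from -4x + 2a = 0
    return -2 * x**2 + 2 * a * x + 4 * a**2

a, h = 2.0, 1e-6
dV = (V(a + h) - V(a - h)) / (2 * h)          # numerical V'(a)
df_da = 2 * (a / 2) + 8 * a                   # partial of f w.r.t. a at x*(a)
print(dV, df_da)                              # both close to 9a = 18
```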

Example 20.2. Consider the firm's profit maximization problem:

(20.1) max_{x∈R+} π = p f(x) − wx.

Let us denote the input level at which the maximum profit is attained by x∗. We observe that x∗ is a function of the parameters p and w. The maximum profit is the value function of this exercise, and we call it the profit function:

π∗(p, w) = p f(x∗(p, w)) − w x∗(p, w).

By the Envelope Theorem,

∂π∗(p, w)/∂p = f(x∗(p, w)) > 0.

Thus the profit function is increasing in the price of the output. Also,

∂π∗(p, w)/∂w = −x∗(p, w) < 0.

So the profit function is decreasing in the price of the input. Further, this shows that

x∗(p, w) = −∂π∗(p, w)/∂w.

The profit-maximizing input level can be obtained by taking the partial derivative of the profit function with respect to w (a result known as Hotelling's Lemma).
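Hotelling's Lemma can be illustrated with an assumed concrete technology f(x) = √x, for which x∗(p, w) = (p/2w)² and π∗(p, w) = p²/(4w); the sketch compares −∂π∗/∂w, computed by finite differences, with x∗:

```python
# Illustration of Hotelling's Lemma with an assumed technology f(x) = sqrt(x):
# x*(p, w) = (p/(2w))^2 and pi*(p, w) = p^2/(4w), so -d(pi*)/dw should equal x*.
def x_star(p, w):
    return (p / (2 * w)) ** 2

def profit(p, w):
    return p * x_star(p, w) ** 0.5 - w * x_star(p, w)

p, w, h = 3.0, 1.5, 1e-6                      # illustrative price values
dpi_dw = (profit(p, w + h) - profit(p, w - h)) / (2 * h)
print(-dpi_dw, x_star(p, w))                  # both close to 1.0
```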


20.2. Meaning of the Lagrange multiplier

In this section we will see that the multipliers measure the sensitivity of the optimal value of the objective function to changes in the right-hand sides (parameters) of the constraints. In this sense, they provide a natural measure of the value of scarce resources in economic maximization problems.

Consider a simple maximization problem with two variables and one equality constraint. Let f : R2 → R be denoted f(x, y):

(20.2) max_{(x,y)∈R2+} f(x, y) subject to h(x, y) = a.

Let (x∗(a), y∗(a)) be a solution to the above problem for any given parameter value a. Thus f(x∗(a), y∗(a)) is the corresponding optimal value of the objective function. Let the Lagrange multiplier be denoted by λ∗(a). The following theorem shows that λ∗(a) measures the rate of change of the optimal value of the objective function f with respect to a.

Theorem 20.2. Let f and h be continuously differentiable functions of two variables. For any fixed

value of the parameter a, let (x∗ (a), y∗ (a)) be the solution of the optimization problem (20.2) with

the corresponding Lagrange multiplier λ∗ (a). Assume that x∗ (a), y∗ (a) and λ∗ (a) are continuously

differentiable functions of a and the constraint qualification holds at (x∗(a), y∗(a)). Then

λ∗(a) = d f(x∗(a), y∗(a))/da.

Proof. The Lagrangian for the problem (20.2) is

L ≡ f(x, y) − λ(h(x, y) − a),

where a is a parameter. The solution of this problem, (x∗(a), y∗(a), λ∗(a)), satisfies the First Order conditions

∂L/∂x = 0: ∂f(x∗(a), y∗(a))/∂x − λ∗(a) ∂h(x∗(a), y∗(a))/∂x = 0,
∂L/∂y = 0: ∂f(x∗(a), y∗(a))/∂y − λ∗(a) ∂h(x∗(a), y∗(a))/∂y = 0,
∂L/∂λ = 0: h(x∗(a), y∗(a)) − a = 0,

for all values of a. Also, since h(x∗(a), y∗(a)) = a for all a, differentiating with respect to a gives

∂h(x∗(a), y∗(a))/∂x · dx∗(a)/da + ∂h(x∗(a), y∗(a))/∂y · dy∗(a)/da = 1


for all a. Now we can use the Chain Rule and the two First Order conditions:

d f(x∗(a), y∗(a))/da = ∂f(x∗(a), y∗(a))/∂x · dx∗(a)/da + ∂f(x∗(a), y∗(a))/∂y · dy∗(a)/da
= λ∗(a) ∂h(x∗(a), y∗(a))/∂x · dx∗(a)/da + λ∗(a) ∂h(x∗(a), y∗(a))/∂y · dy∗(a)/da
= λ∗(a) [∂h(x∗(a), y∗(a))/∂x · dx∗(a)/da + ∂h(x∗(a), y∗(a))/∂y · dy∗(a)/da]
= λ∗(a) · 1 = λ∗(a).
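Theorem 20.2 can be illustrated with the assumed example max xy subject to x + y = a, whose solution is x = y = a/2 with multiplier λ∗(a) = a/2:

```python
# Illustration of Theorem 20.2 with the assumed problem: max xy s.t. x + y = a.
# The solution is x = y = a/2, the multiplier is lambda*(a) = a/2, and the optimal
# value is a^2/4, so its derivative in a should equal the multiplier.
def value(a):
    return (a / 2) * (a / 2)                  # f at the solution x = y = a/2

a, h = 6.0, 1e-6
dv_da = (value(a + h) - value(a - h)) / (2 * h)
lam = a / 2                                   # from the FOC y = lambda at the optimum
print(dv_da, lam)                             # both close to a/2 = 3.0
```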

20.3. Envelope Theorem for Constrained Optimization

The general envelope theorem arises in the case of constrained optimization, where both the objective function and the constraint functions depend on some parameters. Consider, for example, the following optimization exercise:

(20.3) max f(x, a) subject to gj(x, a) = 0 for j = 1, ..., m, and x ∈ Rn+.

In this case, the objective function f as well as the constraints g1, ..., gm depend on the parameter a. The following theorem shows that the rate of change of f(x∗(a), a) with respect to a equals the partial derivative with respect to a not of f but of the corresponding Lagrangian function L.

Theorem 20.3. Let f , g1 , · · · , gm be continuously differentiable functions and let

x∗ (a) = (x1∗ (a), x2∗ (a), · · · , xn∗ (a))

denote the solution of the optimization problem (20.3) for any fixed value of the parameter a.

Assume that x∗(a) and the Lagrange multipliers λ1∗(a), ..., λm∗(a) are continuously differentiable functions of a and the constraint qualification condition holds. Then

(20.4) d f(x∗(a), a)/da = ∂L(x∗(a), λ∗(a), a)/∂a.

Chapter 21

Elementary Concepts in Probability

Probability theory deals with random events, events whose occurrence cannot be predicted with certainty. There are at least three sources of randomness. First, by nature many features of our world are stochastic; the evolution of such a diverse variety of life is witness to unpredictability in the universe and environment. Second, many events are the result of a very large number of actions and decisions. Third, some variables may appear random because they are measured with error.

Even though we are not sure about the outcomes of a random event, we can attach to each outcome a number called its probability.

21.1. Discrete Probability Model

We first describe the set of outcomes of a random event, i.e., a set whose elements are all possible

outcomes of a random event. It is known as the sample space and denoted by Ω.

Example 21.1. The set of possible outcomes of flipping a fair coin is

Ω = {H, T }.

The set of outcomes of rolling a die is

Ω = {1, 2, 3, 4, 5, 6}



where the outcome i means that i appeared on the die, i = 1, 2, 3, 4, 5, 6.

The set of outcomes for flipping two coins is

Ω = {HT, TH, TT, HH}.

It is easy to list the set of outcomes for flipping n coins, but very soon the list becomes too long.

The set of outcomes of rolling two dice is

Ω = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
      (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
      (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
      (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
      (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
      (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) },

where the outcome (i, j) is said to occur if i appeared on the first die and j appeared on the second die.

The set of outcomes for measuring the lifetime of a car consists of the non-negative real numbers:

Ω = [0, ∞).

Next we form the collection F that contains the events, i.e., subsets of Ω, together with their unions and complements. Thus if A and B are in F, so are A ∪ B, A^c, and B^c. The collection F, which is closed under the operations of union and complementation, is known as an algebra.

Example 21.2. The algebra for the outcomes of flipping a fair coin is

F = {∅, Ω, {H}, {T}}.

The algebra for the outcomes of flipping two coins is

F = {∅, Ω, {TT}, {HH}, {HT, TH}, {HH, TT}, {HH, HT, TH}, {TT, HT, TH}}.

We can now define a probability measure by assigning to each element of the sample space Ω a probability P.

Definition 21.1. The set function P is called a probability measure if

(i) P(∅) = 0;

(ii) P(Ω) = 1;

(iii) P(A ∪ B) = P(A) + P(B) for all A, B ∈ F with A ∩ B = ∅.

The three conditions listed above are the axioms of probability theory.

Example 21.3. For the outcomes of flipping two fair coins,

P(HH) = P(HT ) = P(T T ) = P(T H) = 0.25.

The triple of the set of outcomes, the algebra, and the probability measure (Ω, F, P) is referred

to as a probability model.

In the next step, we assign probabilities to the random events. Three sources of attaching probabilities to the outcomes of random events are (a) equally likely events, (b) long-run frequencies, and (c) degree of confidence (the subjective or Bayesian approach). Observe that even though we may assign probabilities to different events in different ways, the mathematical theory for dealing with the random events and their probabilities remains the same.

We define random variables next. The rule that assigns a real number to each outcome is called a random variable. More formally,

Definition 21.2. A random variable is a function that maps the set of outcomes of a random event to the set of real numbers.

Such a function is not unique, and depending on the purpose at hand, we may define one or many random variables on the same random event.

Example 21.4. For the outcomes of flipping two fair coins, let us define a random variable X as

the number of heads. Then, we have

X(HH) = 2; X(HT ) = X(T H) = 1, X(T T ) = 0.

We could have defined the random variable X as the number of tails. Then, we have

X(HH) = 0; X(HT ) = X(T H) = 1, X(T T ) = 2.

In collecting labor statistics, we are interested in the characteristics of the respondents. For example, we may ask whether a person is in the labor force or not, employed or unemployed. We could also be interested in the demographic characteristics of the respondents, like gender, race, age, etc. For each of these answers we can define one or more binary variables. For example, let X = 1 if a respondent who is in the labor force is unemployed and X = 0 if employed. We can define Y = 1 if the respondent is a woman and employed, Y = 0 otherwise.

A random variable together with its probabilities is called a probability distribution.


Let us consider flipping a fair coin three times.

Example 21.5. For the outcomes of flipping a fair coin three times, let us define a random variable

X as the number of heads. The set of outcomes for flipping a fair coin three times is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Then, the probability distribution is

P(X = 0) = 0.125; P(X = 1) = 0.375; P(X = 2) = 0.375; P(X = 3) = 0.125.
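The distribution in Example 21.5 can be reproduced by enumerating the sample space (a quick sketch, not part of the original text):

```python
# Enumerating the sample space for three flips of a fair coin and recovering the
# distribution of X = number of heads, as in Example 21.5.
from itertools import product

omega = list(product("HT", repeat=3))            # 8 equally likely outcomes
dist = {}
for outcome in omega:
    x = outcome.count("H")
    dist[x] = dist.get(x, 0.0) + 1 / len(omega)

print(dist)   # P(X=0) = P(X=3) = 0.125, P(X=1) = P(X=2) = 0.375
```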

Probability distributions become unwieldy as the number of outcomes becomes large or infinite. One way to summarize the information about a probability distribution is through its moments, such as the mean, which measures the central tendency, and the variance, which measures the dispersion or variability of the distribution. Another moment reflects the skewness of the distribution to the left or to the right, and the kurtosis is an indicator of the bunching of the outcomes near the mean: the more values are concentrated near the mean, the taller is the peak of the distribution.

The first moment of the distribution, which is the expected value or the mean of the distribution, is defined as

E(X) = µ = ∑_{i=1}^{n} xi P(xi).

Example 21.6. For the distribution of the number of heads in three flips of a coin, we have,

µ = 0 · P(X = 0) + 1 · P(X = 1) + 2 · P(X = 2) + 3 · P(X = 3).

which yields the mean as

µ = 0 + 0.375 + 0.750 + 0.375 = 1.50

In a similar manner, we may define the rth moment of a distribution as

E(X^r) = mr = ∑_{i=1}^{n} xi^r P(xi).

Example 21.7. For the distribution of the number of heads in three flips of a coin, the second

moment is

E(X 2 ) = 02 · P(X = 0) + 12 · P(X = 1) + 22 · P(X = 2) + 32 · P(X = 3).

which yields the second moment as

m2 = 0 + 0.375 + 1.50 + 1.125 = 3.

Another measure (which is of great importance) is the variance, or the second moment around the mean:

E(X − µ)² = σ² = ∑_{i=1}^{n} (xi − µ)² P(xi).


The formula for the variance can be rewritten, using the binomial expansion, as

E(X − µ)² = ∑_{i=1}^{n} (xi − µ)² P(xi)
= ∑_{i=1}^{n} xi² P(xi) − 2µ ∑_{i=1}^{n} xi P(xi) + µ²
= ∑_{i=1}^{n} xi² P(xi) − µ².

Example 21.8. For the distribution of the number of heads in three flips of a coin, the variance is

σ2 = E(X 2 ) − µ2 = 3 − 1.52 = 0.75.
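The computations in Examples 21.6-21.8 can be reproduced in a few lines:

```python
# Reproducing Examples 21.6-21.8: mean, second moment, and variance of the number
# of heads X in three flips of a fair coin.
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

mean = sum(x * p for x, p in pmf.items())         # E(X)
m2 = sum(x**2 * p for x, p in pmf.items())        # E(X^2)
var = m2 - mean**2                                # sigma^2 = E(X^2) - mu^2

print(mean, m2, var)                              # 1.5 3.0 0.75
```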

The mean is a measure of the central tendency of a distribution, showing its center of gravity, whereas the variance and its square root, called the standard deviation, measure the dispersion or volatility of the distribution. The advantage of using the standard deviation is that it measures the dispersion in the same units as the original variable. In finance, the variance of an asset's returns is used as a measure of risk.

21.2. Marginal and Conditional Distribution

As we have observed before, a random event may give rise to a number of random variables, each defined by a different function on the same domain. In the table below we present such a situation, where random variables X and Y and their probabilities are reported. Think of Y as the annual income, in units of a thousand dollars, of a profession, and X as gender, with X = 0 denoting men and X = 1 denoting women. The information contained in the table is the probability of joint events, i.e., the probability of X and Y each taking a particular value. For instance, the probability of X = 1 and Y = 120 is 0.11, which is denoted as

P(X = 1, Y = 120) = 0.11.

Such a probability is referred to as a joint probability; here it is the probability of a woman earning $120,000 a year.


X Y P

0 60 0.02

0 70 0.04

0 80 0.07

0 90 0.09

0 100 0.10

0 110 0.06

0 120 0.03

0 130 0.02

0 140 0.01

0 150 0.01

1 70 0.01

1 80 0.02

1 90 0.04

1 100 0.08

1 110 0.11

1 120 0.11

1 130 0.09

1 140 0.05

1 150 0.03

1 160 0.01

If we are interested only in X, then we can sum over all relevant values of Y and get the marginal probability of X. For example,

P(X = 1) = P(X = 1, Y = 70) + · · · + P(X = 1, Y = 160) = 0.01 + 0.02 + · · · + 0.03 + 0.01 = 0.55.

In general we can write

P(X = x_k) = ∑_{j=1}^n P(X = x_k, Y = y_j).

In a similar manner, we can calculate the probability of X = 0, which is 0.45. Thus the marginal distribution of X is

X P(X)
0 0.45
1 0.55

A similar procedure yields the marginal probability of Y . For example,

P(Y = 90) = P(Y = 90, X = 0) + P(Y = 90, X = 1) = 0.09 + 0.04 = 0.13.
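These marginal computations can be checked directly from the table. The sketch below (ours; the dictionary `joint` simply transcribes the table) sums the joint probabilities over the other variable:

```python
# Joint distribution transcribed from the table: (x, y) -> P(X = x, Y = y).
joint = {
    (0, 60): 0.02, (0, 70): 0.04, (0, 80): 0.07, (0, 90): 0.09, (0, 100): 0.10,
    (0, 110): 0.06, (0, 120): 0.03, (0, 130): 0.02, (0, 140): 0.01, (0, 150): 0.01,
    (1, 70): 0.01, (1, 80): 0.02, (1, 90): 0.04, (1, 100): 0.08, (1, 110): 0.11,
    (1, 120): 0.11, (1, 130): 0.09, (1, 140): 0.05, (1, 150): 0.03, (1, 160): 0.01,
}

def marginal_x(x):
    """P(X = x): sum the joint probabilities over all values of Y."""
    return sum(p for (xv, _), p in joint.items() if xv == x)

def marginal_y(y):
    """P(Y = y): sum the joint probabilities over both values of X."""
    return sum(p for (_, yv), p in joint.items() if yv == y)

print(round(marginal_x(1), 2))   # 0.55
print(round(marginal_y(90), 2))  # 0.13
```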


Observe that in this example, the marginal distribution of X shows the distribution of men and women in that profession (45% men and 55% women), whereas the marginal distribution of Y shows the distribution of income for men and women together, i.e., for the profession as a whole.

Sometimes we may want to know the probability of Y = 110 when we already know that X = 1. That is, we want the conditional probability of Y = 110 given that X = 1:

P(Y = 110 | X = 1) = P(Y = 110, X = 1) / P(X = 1) = 0.11 / 0.55 = 0.20.

In general,

P(Y = y_j | X = x_k) = P(Y = y_j, X = x_k) / P(X = x_k).

We have computed the conditional distributions of Y | X = 0 and Y | X = 1.

Y P(Y | X = 0) Y P(Y | X = 1)

60 0.044 70 0.018

70 0.089 80 0.036

80 0.156 90 0.073

90 0.2 100 0.145

100 0.222 110 0.200

110 0.133 120 0.200

120 0.067 130 0.164

130 0.044 140 0.091

140 0.022 150 0.055

150 0.022 160 0.018

A conditional distribution has a mean, a variance, and other moments. The mean is

E(Y | X = x_k) = ∑_{j=1}^n y_j P(y_j | X = x_k).

The variance and other higher moments of the conditional distribution can be computed similarly.

The conditional mean of the conditional distribution given above is

E(Y | X = 0) = ∑_{j=1}^n y_j P(y_j | X = 0)
= 60 × 0.044 + 70 × 0.089 + 80 × 0.156 + 90 × 0.2 + 100 × 0.222
+ 110 × 0.133 + 120 × 0.067 + 130 × 0.044 + 140 × 0.022 + 150 × 0.022
≈ 96.4.

Similarly, the conditional mean for X = 1 works out to

E(Y | X = 1) ≈ 116.4.

21.3. The Law of Iterated Expectation

The law of iterated expectation relates the conditional mean and the unconditional mean. In general,

E(Y) = E_X[E(Y | X)] = ∑_{j=1}^n E(Y | X = x_j) P(X = x_j).

For the example above,

E(Y) = E(Y | X = 0) P(X = 0) + E(Y | X = 1) P(X = 1) ≈ 96.4 × 0.45 + 116.4 × 0.55 ≈ 107.4.

It is easy to infer that if E(Y | X = x_j) = 0 for all values of x_j, i.e., the conditional expectation of Y equals zero, then the unconditional expectation E(Y) = E_X[E(Y | X)] = 0. However, the reverse is not true: E(Y) = 0 does not imply that E(Y | X = x_j) = 0 for all values of x_j.
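As a check, both sides of the law of iterated expectation can be recomputed directly from the joint table of Section 21.2; the helper names below are ours:

```python
# Joint distribution from the table in Section 21.2: (x, y) -> P(X = x, Y = y).
joint = {
    (0, 60): 0.02, (0, 70): 0.04, (0, 80): 0.07, (0, 90): 0.09, (0, 100): 0.10,
    (0, 110): 0.06, (0, 120): 0.03, (0, 130): 0.02, (0, 140): 0.01, (0, 150): 0.01,
    (1, 70): 0.01, (1, 80): 0.02, (1, 90): 0.04, (1, 100): 0.08, (1, 110): 0.11,
    (1, 120): 0.11, (1, 130): 0.09, (1, 140): 0.05, (1, 150): 0.03, (1, 160): 0.01,
}

def p_x(x):
    """Marginal probability P(X = x)."""
    return sum(p for (xv, _), p in joint.items() if xv == x)

def cond_mean_y(x):
    """E(Y | X = x) = sum_y y * P(X = x, Y = y) / P(X = x)."""
    return sum(y * p for (xv, y), p in joint.items() if xv == x) / p_x(x)

# Law of iterated expectation: E(Y) = sum_x E(Y | X = x) P(X = x).
ey_iterated = sum(cond_mean_y(x) * p_x(x) for x in (0, 1))
ey_direct = sum(y * p for (_, y), p in joint.items())

print(round(cond_mean_y(0), 2), round(cond_mean_y(1), 2))  # 96.44 116.36
print(round(ey_iterated, 1), round(ey_direct, 1))          # 107.4 107.4
```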

21.4. Continuous Random Variables

Many variables we come across in economics are continuous rather than discrete. In assigning probabilities to continuous variables, we face the problem that no matter how small the interval of values of the continuous variable, there are infinitely many points in it. If we assigned a positive probability to each point, the sum of such probabilities would diverge, violating the axiom of probability theory that the probabilities must add up to one.

This problem is circumvented by assigning probabilities to segments of the interval within which the random variable is defined, e.g.,

P(X ≤ 5), or P(−4 < X ≤ 2).

Example 21.9. A simple example of a continuous random variable is the uniform distribution. The variable X can take any value between a and b, and the probability of X falling within the segment [a, c] is proportional to the length of that segment relative to the interval [a, b]:

P(a < X ≤ c) = (c − a) / (b − a).
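A short sketch (ours) of the uniform probability rule, clipping the queried segment to [a, b] before taking the relative length:

```python
def uniform_prob(a, b, lo, hi):
    """P(lo < X <= hi) for X uniform on [a, b]: relative length of the overlap."""
    lo, hi = max(lo, a), min(hi, b)   # clip the segment to [a, b]
    return max(hi - lo, 0) / (b - a)

print(uniform_prob(0, 10, 0, 4))    # 0.4
print(uniform_prob(0, 10, -5, 2))   # 0.2
```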

The probability distribution function of X is defined by

F(x) = P(X ≤ x)

and has to conform to the following conditions:

(a) F(x) is continuous.

(b) F(x) is non-decreasing, i.e., F(x₁) ≤ F(x₂) if x₁ < x₂.

(c) F(−∞) = lim_{x→−∞} F(x) = 0, and F(∞) = lim_{x→∞} F(x) = 1.

These conditions are the counterpart of the discrete case and entail that probabilities are non-negative and sum to one.

Now we define the probability model for continuous random variables. Consider the extended real line R̄ = R ∪ {−∞, ∞}, which plays the same role for continuous variables as Ω (the set of all possible outcomes) plays for discrete variables. Consider the half-closed intervals on R̄,

(a, b] = {x ∈ R̄ : a < x ≤ b},

and form finite unions of such intervals, provided the intervals are disjoint:

A = ⋃_{j=1}^n (a_j, b_j], n < ∞.

The collection of all such unions, together with the empty set ∅, is an algebra, but it is not a σ-algebra. The smallest σ-algebra that contains this collection is called the Borel σ-algebra and is denoted by B(R̄). Finally, we define the probability measure through

F(x) = P((−∞, x]).

The triple (R̄, B(R̄), P) is our probability model for continuous random variables.

Chapter 22

Solution to PS 1

(1) (a) ∼(A ∧ B) ⇔ ∼A ∨ ∼B and ∼(A ∨ B) ⇔ ∼A ∧ ∼B. We prove this claim using a truth table.

A B A ∧ B A ∨ B ∼ (A ∧ B) ∼ A ∼ B ∼ A∨ ∼ B ∼ (A ∨ B) ∼ A∧ ∼ B

1 2 3 4 5 6 7 8 9 10

T T T T F F F F F F

T F F T T F T T F F

F T F T T T F T F F

F F F F T T T T T T

Claim (a) is proved by comparing columns 5 and 8.

(b) Claim (b) is proved by comparing columns 9 and 10.

(c) ∼ (A ⇒ B) ⇔ A ∧ ∼ B

A B A ⇒ B ∼ (A ⇒ B) ∼ B A∧ ∼ B

1 2 3 4 5 6

T T T F F F

T F F T T T

F T T F F F

F F T F T F


(d) ((A ∨ B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C))

A B C A ⇒ C B ⇒ C A ∨ B (A ⇒ C) ∧ (B ⇒ C) (A ∨ B) ⇒ C

1 2 3 4 5 6 7 8

T T T T T T T T

T T F F F T F F

T F T T T T T T

T F F F T T F F

F T T T T T T T

F T F T F T F F

F F T T T F T T

F F F T T F T T
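Each of the four equivalences can also be verified mechanically by enumerating every row of the truth table, which is what the following sketch (ours) does:

```python
from itertools import product

# Enumerate every row of the truth table and check each claimed equivalence.
for A, B in product([True, False], repeat=2):
    assert (not (A and B)) == ((not A) or (not B))     # (a) De Morgan
    assert (not (A or B)) == ((not A) and (not B))     # (b) De Morgan
    implies = (not A) or B                             # truth value of A => B
    assert (not implies) == (A and not B)              # (c)

for A, B, C in product([True, False], repeat=3):
    lhs = (not (A or B)) or C                          # (A or B) => C
    rhs = ((not A) or C) and ((not B) or C)            # (A => C) and (B => C)
    assert lhs == rhs                                  # (d)

print("all four equivalences verified")
```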

(e) This claim is true. If n is even, then n + 1 is odd. If n is odd, then n + 1 is even. Hence both cannot be even.

(f) Let n = 1; then n² = 1 = n.

(g) Let x > 1. Then

x ∈ N_O ⇔ ∃ n ∈ N, x = 2n + 1,

and

x² = (2n + 1)² = 4n² + 4n + 1 = 2(2n² + 2n) + 1,

(22.1) so x² ∈ N_O.

For x = 1, x² = 1, which is odd.

(2) Recall that ∼(A ⇒ B) is equivalent to A ∧ ∼B.

(a) Set S is closed and bounded, and S is not compact.

(b) Set S is compact, and S is either not closed or unbounded.

(c) Function f is continuous and not differentiable.

(3) (a) If xy is NOT a rational number, then x² = 3 ∨ y² < 5.

(b) If there does not exist a y such that xy = 1, then x = 0.

(4) (a) The mistake is in assuming the same value of k for m and n. The correct proof should be

Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2p + 1 for some integers k and p. Therefore, 2m + 3n = 2(2k) + 3(2p + 1) = 4k + 6p + 3 = 2(2k + 3p + 1) + 1 = 2l + 1, where l = 2k + 3p + 1. Since k, p ∈ Z, l ∈ Z. Hence 2m + 3n = 2l + 1 for some integer l, whence 2m + 3n is an odd integer.

(b) The mistake is in showing the claim for one particular value of n; the claim holds for all positive integers. The correct proof should be

Proof. Observe that n² + 2n + 1 = (n + 1)² = (n + 1) · (n + 1), which is a composite number for every positive integer n, since each factor n + 1 ≥ 2.

(5) (a) (i) Contrapositive: If a number is divisible by 4, then it is divisible by 2. Let y = 4m where m ∈ N; then y = 2(2m). Hence y is divisible by 2.

(ii) Contradiction: Suppose there exists a number y which is not divisible by 2 but is divisible by 4. Since y = 4m where m ∈ N, we know that y = 2(2m), and so y is divisible by 2. This contradicts our initial assumption.

(b) There is no greatest negative real number.

Proof. Assume, to the contrary, that there is a greatest negative real number x. Then x ≥ y for every negative real number y. Consider the number x/2. Since x is a negative real number, so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is negative, gives x/2 > x. Hence x/2 is a negative real number that is greater than x, which is a contradiction. Hence our assumption that there is a greatest negative real number is false. Thus there is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

Proof. Assume, to the contrary, that there exist a nonzero rational number p and an irrational number q whose product is a rational number. Then, by the definition of rational numbers, p = a/b and p · q = r = c/d for some integers a, b, c, and d with a ≠ 0, b ≠ 0, and d ≠ 0. Hence,

q = r/p = (c/d)/(a/b) = bc/ad.

Now bc ∈ Z and ad ∈ Z since a, b, c, d ∈ Z. Since a ≠ 0 and d ≠ 0, ad ≠ 0. Hence q ∈ Q, which contradicts the assumption that q is irrational. Thus the product of a nonzero rational number and an irrational number is irrational.

(6) (a) (i) Base of induction: When n = 1, the statement P(1) : 1 = 1² holds trivially.

(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1 + 3 + · · · + (2n − 1) = n². For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1 + 3 + · · · + (2k − 1) = k². For the inductive step, we need to show that P(k + 1) is true. That is, we show that

1 + 3 + · · · + (2k − 1) + (2k + 1) = (k + 1)².


Evaluating the left-hand side of this equation, we have

1 + 3 + · · · + (2k − 1) + (2k + 1) = (1 + 3 + · · · + (2k − 1)) + (2k + 1)
= k² + (2k + 1) (by the inductive hypothesis)
= (k + 1)²,

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;

that is,

1 + 3 + · · · + (2n − 1) = n2

is true for every positive integer n.

(b) (i) Base of induction: When n = 1, the statement P(1) : 1 = 1(1 + 1)/2 is certainly true since 1(1 + 1)/2 = 2/2 = 1. This establishes the base case when n = 1.

(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1 + 2 + · · · + n = n(n + 1)/2. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1 + · · · + k = k(k + 1)/2. For the inductive step, we need to show that P(k + 1) is true. That is, we show that

1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2)/2.

Evaluating the left-hand side of this equation, we have

1 + 2 + · · · + k + (k + 1) = (1 + 2 + · · · + k) + (k + 1)
= k(k + 1)/2 + (k + 1) (by the inductive hypothesis)
= k(k + 1)/2 + 2(k + 1)/2
= (k + 1)(k + 2)/2,

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,

1 + 2 + · · · + n = n(n + 1)/2

is true for every positive integer n.

(c) (i) Base of induction: When n = 1, the statement P(1) : 1³ = [1(1 + 1)/2]² is certainly true since [1(1 + 1)/2]² = 1² = 1. This establishes the base case when n = 1.

(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1³ + 2³ + · · · + n³ = [n(n + 1)/2]². For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1³ + · · · + k³ = [k(k + 1)/2]². For the inductive step, we need to show that P(k + 1) is true. That is, we show that

1³ + 2³ + · · · + k³ + (k + 1)³ = [(k + 1)(k + 2)/2]².

Evaluating the left-hand side of this equation, we have

1³ + · · · + k³ + (k + 1)³ = (1³ + · · · + k³) + (k + 1)³
= [k(k + 1)/2]² + (k + 1)³ (by the inductive hypothesis)
= (k + 1)² [k²/4 + 4(k + 1)/4]
= [(k + 1)/2]² [k² + 4k + 4] = [(k + 1)/2]² (k + 2)²
= [(k + 1)(k + 2)/2]²,

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,

1³ + · · · + n³ = [n(n + 1)/2]²

is true for every positive integer n.
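The three closed forms proved above are easy to spot-check numerically for many values of n; the following sketch (ours) does so:

```python
# Spot-check the three closed forms for n = 1..50.
for n in range(1, 51):
    assert sum(2 * i - 1 for i in range(1, n + 1)) == n ** 2                  # (a)
    assert sum(range(1, n + 1)) == n * (n + 1) // 2                           # (b)
    assert sum(i ** 3 for i in range(1, n + 1)) == (n * (n + 1) // 2) ** 2    # (c)
print("all three formulas hold for n = 1..50")
```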

(d) It is an example of an arithmetico-geometric series. Let us denote the sum by S, i.e.,

S = a + (a + r)q + (a + 2r)q² + · · · + (a + (n − 1)r)q^(n−1).

Multiplying both sides by q, we get

qS = aq + (a + r)q² + (a + 2r)q³ + · · · + (a + (n − 1)r)q^n.

Subtracting the latter from S, we get

S − qS = a + rq + rq² + · · · + rq^(n−1) − (a + (n − 1)r)q^n.

All terms on the right-hand side except the first and the last constitute a geometric series with first term rq, common ratio q, and n − 1 terms. Hence

S = [a − (a + (n − 1)r)q^n]/(1 − q) + [rq + rq² + · · · + rq^(n−1)]/(1 − q).

The sum of the geometric series described above is

rq(1 − q^(n−1))/(1 − q).

We substitute this for the sum and get S as

S = [a − (a + (n − 1)r)q^n]/(1 − q) + rq(1 − q^(n−1))/(1 − q)².
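The closed form for S can be checked against direct term-by-term summation; exact rational arithmetic avoids rounding issues. The parameter values below are our illustration:

```python
from fractions import Fraction

def ag_sum_direct(a, r, q, n):
    """S = a + (a+r)q + (a+2r)q^2 + ... + (a+(n-1)r)q^(n-1), term by term."""
    return sum((a + k * r) * q ** k for k in range(n))

def ag_sum_formula(a, r, q, n):
    """The closed form derived above (valid for q != 1)."""
    return (a - (a + (n - 1) * r) * q ** n) / (1 - q) \
        + r * q * (1 - q ** (n - 1)) / (1 - q) ** 2

a, r, q = Fraction(2), Fraction(3), Fraction(1, 2)   # illustrative values
for n in range(1, 10):
    assert ag_sum_formula(a, r, q, n) == ag_sum_direct(a, r, q, n)
print("closed form matches direct summation for n = 1..9")
```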

(7) To show that the formula holds for n = 0, we must show that

∑_{i=0}^{0} r^i = (r^(0+1) − 1)/(r − 1).

The left-hand side of this equation is ∑_{i=0}^{0} r^i = r^0 = 1, while the right-hand side is (r^(0+1) − 1)/(r − 1) = 1, since r ≠ 1. Hence the formula holds for n = 0. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 0 and assume that ∑_{i=0}^{k} r^i = (r^(k+1) − 1)/(r − 1). For the inductive step, we need to show that ∑_{i=0}^{k+1} r^i = (r^(k+2) − 1)/(r − 1). Evaluating the left-hand side of this equation, we have

∑_{i=0}^{k+1} r^i = ∑_{i=0}^{k} r^i + r^(k+1) (writing the (k + 1)st term separately)
= (r^(k+1) − 1)/(r − 1) + r^(k+1) (by the inductive hypothesis)
= (r^(k+1) − 1)/(r − 1) + (r − 1)r^(k+1)/(r − 1)
= (r^(k+1) − 1 + r^(k+2) − r^(k+1))/(r − 1)
= (r^(k+2) − 1)/(r − 1),

thus verifying the claim. Hence, by the principle of mathematical induction, the formula is true for all integers n ≥ 0.

In the limiting case n → ∞, the sum is well defined for |r| < 1 and equals 1/(1 − r). For |r| ≥ 1 the sum is not well defined as n → ∞, though it is defined for every finite n ∈ N.

(8) (a) We proceed by mathematical induction. When n = 2, the result is true since in this case n³ − n = 2³ − 2 = 8 − 2 = 6, and 6 is divisible by 6. Hence the base case n = 2 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2


and assume that the property holds for n = k, i.e., suppose that k³ − k is divisible by 6. For the inductive step, we must show that the property holds for n = k + 1. That is, we must show that (k + 1)³ − (k + 1) is divisible by 6. Since k³ − k is divisible by 6, there exists, by the definition of divisibility, an integer r such that k³ − k = 6r. Now, by the laws of algebra and the inductive hypothesis, it follows that

(k + 1)³ − (k + 1) = (k³ + 3k² + 3k + 1) − (k + 1)
= (k³ − k) + 3(k² + k)
= 6r + 3k(k + 1).

Now, k(k + 1) is a product of two consecutive integers, and is therefore even. Hence k(k + 1) = 2s for some integer s. Thus 6r + 3k(k + 1) = 6r + 3(2s) = 6(r + s), and so, by substitution, (k + 1)³ − (k + 1) = 6(r + s), which is divisible by 6, as desired. Hence, by the principle of mathematical induction, the property holds for all integers n ≥ 2.

(b) We proceed, as before, by mathematical induction. When n = 3, the inequality holds since in this case 2^n = 2³ = 8 and 2n + 1 = 2 · 3 + 1 = 7, and 8 > 7. Hence the base case n = 3 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 3 and assume that the inequality holds for n = k, i.e., suppose that 2^k > 2k + 1. For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show that 2^(k+1) > 2(k + 1) + 1. Now,

2^(k+1) = 2 · 2^k
> 2 · (2k + 1) (by the inductive hypothesis)
= 2(k + 1) + 2k
> 2(k + 1) + 1 (since k ≥ 3),

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers n ≥ 3.

(9) By the Quotient-Remainder Theorem with d = 6, every natural number m can be written as m = 6n + r, where n is an integer and r ∈ {0, 1, 2, 3, 4, 5}. Since m is prime (and greater than 3), it cannot be of the form 6n (divisible by 6), 6n + 2 or 6n + 4 (divisible by 2), or 6n + 3 (divisible by 3). Thus the only remaining possibilities are 6n + 1 and 6n + 5.

(10) We solve |9 − 5x| ≤ 11, which is equivalent to the pair of inequalities

9 − 5x ≤ 11 and −(9 − 5x) ≤ 11.

From the first: 9 − 5x − 9 ≤ 11 − 9, so −5x ≤ 2, and multiplying by −1/5 (reversing the inequality) gives x ≥ −2/5.

From the second: −9 + 5x ≤ 11, so 5x ≤ 20, and multiplying by 1/5 gives x ≤ 4.

Hence the solution set is −2/5 ≤ x ≤ 4.

Chapter 23

Solution to PS 2

(1) We need to verify that it satisfies the three conditions of a distance function.

(a) (i) Non-negativity is obvious, as the absolute value is non-negative. If x = y, then d(x, y) = 0. Also, if

d(x, y) = ∑_{i=1}^n |x_i − y_i| = 0,

then x_i − y_i = 0 for all i = 1, · · · , n. This implies that x = y.

(ii) Symmetry is obvious too, since the absolute value function is symmetric: |a − b| = |b − a|.

(iii) Triangle Inequality: Note that

|x_i − z_i| ≤ |x_i − y_i| + |y_i − z_i|

holds for all i = 1, 2, · · · , n. Hence

∑_{i=1}^n |x_i − z_i| ≤ ∑_{i=1}^n |x_i − y_i| + ∑_{i=1}^n |y_i − z_i|,

or d(x, z) ≤ d(x, y) + d(y, z). Hence it is a distance function.

(b) (i) Non-negativity is obvious, as the maximum of two absolute values is non-negative. If x = y, then d(x, y) = 0. Also,

d(x, y) = max{|x₁ − y₁|, |x₂ − y₂|} = 0 ⇒ |x₁ − y₁| = 0 = |x₂ − y₂| ⇒ x = y.

(ii) Symmetry is obvious too, since the absolute value function is symmetric: |a − b| = |b − a|.

(iii) Triangle Inequality I: Note that max{a, b} ≥ a and max{a, b} ≥ b. Using this we have

d(x, y) ≥ |x₁ − y₁| and d(x, y) ≥ |x₂ − y₂|,
d(y, z) ≥ |y₁ − z₁| and d(y, z) ≥ |y₂ − z₂|,
d(x, y) + d(y, z) ≥ |x₁ − y₁| + |y₁ − z₁| ≥ |x₁ − z₁|,
d(x, y) + d(y, z) ≥ |x₂ − y₂| + |y₂ − z₂| ≥ |x₂ − z₂|.

It follows that

d(x, y) + d(y, z) ≥ max{|x₁ − z₁|, |x₂ − z₂|} = d(x, z).

Hence it is a distance function.

(iv) Triangle Inequality II: Consider the case where d(x, z) = |x₁ − z₁|, i.e., |x₁ − z₁| ≥ |x₂ − z₂|. Then, using the triangle inequality for the absolute value function,

d(x, z) = |x₁ − z₁| ≤ |x₁ − y₁| + |y₁ − z₁| ≤ d(x, y) + d(y, z).

The last inequality follows from the fact that d(x, y) ≥ |x₁ − y₁|, and similarly d(y, z) ≥ |y₁ − z₁|. The second case, d(x, z) = |x₂ − z₂|, is similar. Hence it is a distance function.

(c) (i) Non-negativity: d(x, y) ≥ 0 for all x, y in Rⁿ, and thus 1 + d(x, y) ≥ 1 for all x, y in Rⁿ. As a result, d₁(x, y) ≥ 0 for all x, y in Rⁿ. By the definition of d₁(x, y), d₁(x, y) = 0 if and only if d(x, y) = 0. But d(x, y) = 0 if and only if x = y.

(ii) Since d(x, y) = d(y, x), it is straightforward to see that d₁(x, y) = d₁(y, x).

(iii) Triangle Inequality I:

d₁(x, z) ≤ d₁(x, y) + d₁(y, z)
⇔ d(x, z)/(1 + d(x, z)) ≤ d(x, y)/(1 + d(x, y)) + d(y, z)/(1 + d(y, z))
⇔ d(x, z)[1 + d(x, y)][1 + d(y, z)] ≤ d(x, y)[1 + d(x, z)][1 + d(y, z)] + d(y, z)[1 + d(x, y)][1 + d(x, z)]
⇔ d(x, z) ≤ d(x, y) + d(y, z) + 2d(x, y)d(y, z) + d(x, y)d(y, z)d(x, z).

Since d(x, y) + d(y, z) ≥ d(x, z) and d(a, b) ≥ 0 for any (a, b) ∈ Rⁿ × Rⁿ, the last inequality is always true. Thus d₁(x, z) ≤ d₁(x, y) + d₁(y, z) for all x, y, z in Rⁿ.


(iv) Triangle Inequality II: We use the notation a ≡ d(x, z) and b ≡ d(x, y) + d(y, z). Then

a ≤ b ⇒ a + ab ≤ b + ab ⇒ a(1 + b) ≤ b(1 + a) ⇒ a/(1 + a) ≤ b/(1 + b),

so that

d(x, z)/(1 + d(x, z)) ≤ [d(x, y) + d(y, z)]/(1 + d(x, y) + d(y, z))
≤ d(x, y)/(1 + d(x, y)) + d(y, z)/(1 + d(y, z)),

i.e., d₁(x, z) ≤ d₁(x, y) + d₁(y, z).

(2) It is bounded: take B = 2; then ∥x∥ ≤ 2 for all x ∈ ⋃_{n=1}^∞ [1/n, 2/n]. But it is NOT closed, as

⋃_{n=1}^∞ [1/n, 2/n] = (0, 2].

So it is not compact.

(3)

(A ∪ B)^c ⊆ A^c ∪ B^c is TRUE.

Let x ∈ (A ∪ B)^c
⇒ x ∉ (A ∪ B)
⇒ x ∉ A ∧ x ∉ B
⇒ x ∈ A^c ∧ x ∈ B^c
⇒ x ∈ A^c ∪ B^c.

(A ∪ B)^c ⊇ A^c ∪ B^c is FALSE.

Let x ∈ A^c ∪ B^c with x ∈ A^c ∧ x ∉ B^c. Then
x ∉ A ∧ x ∈ B
⇒ x ∈ A ∪ B
⇒ x ∉ (A ∪ B)^c.

(4) It is enough to show that one of the properties of a vector space fails for this set. Consider scalar multiplication by 2: let (x₁, x₂) ∈ C, so that x₁² + x₂² = 1, and let α = 2 be a scalar. Then

(2x₁)² + (2x₂)² = 4(x₁² + x₂²) = 4 ≠ 1.

Hence (2x₁, 2x₂) ∉ C, and so C is not a vector space.


(5) In this case the commutative property of vector addition does not hold. Consider a = (2, 3) and b = (4, 5). Then a + b = (2 + 4, 3 − 5) = (6, −2) and b + a = (4 + 2, 5 − 3) = (6, 2). Hence

(2, 3) + (4, 5) ≠ (4, 5) + (2, 3).

So V is not a vector space.

(6) In this case also, the commutative property of vector addition does not hold. Consider, as before, a = (2, 3) and b = (4, 5). Then a + b = (2 + 2 × 4, 3 + 3 × 5) = (10, 18) and b + a = (4 + 2 × 2, 5 + 3 × 3) = (8, 14). Hence

(2, 3) + (4, 5) ≠ (4, 5) + (2, 3).

So V is not a vector space.

(7) (a) (J ∩ K)^c = J^c ∪ K^c.

We split the proof in two parts.

(i)

(23.1) (J ∩ K)^c ⊆ J^c ∪ K^c.

Let x ∈ (J ∩ K)^c
⇒ x ∉ (J ∩ K)
⇒ x ∉ J ∨ x ∉ K
⇒ x ∈ J^c ∨ x ∈ K^c
⇒ x ∈ J^c ∪ K^c.

(ii) Next,

J^c ∪ K^c ⊆ (J ∩ K)^c.

Let x ∈ J^c ∪ K^c
⇒ x ∈ J^c ∨ x ∈ K^c
⇒ x ∉ J ∨ x ∉ K
⇒ x ∉ J ∩ K
⇒ x ∈ (J ∩ K)^c.

(b) (J ∪ K)^c = J^c ∩ K^c.

(23.2) (J ∪ K)^c ⊆ J^c ∩ K^c.

Let x ∈ (J ∪ K)^c
⇒ x ∉ (J ∪ K)
⇒ x ∉ J ∧ x ∉ K
⇒ x ∈ J^c ∧ x ∈ K^c
⇒ x ∈ J^c ∩ K^c.

Next,

J^c ∩ K^c ⊆ (J ∪ K)^c.

Let x ∈ J^c ∩ K^c
⇒ x ∈ J^c ∧ x ∈ K^c
⇒ x ∉ J ∧ x ∉ K
⇒ x ∉ J ∪ K
⇒ x ∈ (J ∪ K)^c.

(8) We need to show that for every ε > 0 there exists N such that

∀ n > N, |(x_n + y_n) − (x + y)| < ε.

Note that

|x_n + y_n − x − y| = |x_n − x + y_n − y| ≤ |x_n − x| + |y_n − y|,

by the triangle inequality. Since

x_n → x, ∃ N₁ s.t. ∀ n > N₁, |x_n − x| < ε/2,
y_n → y, ∃ N₂ s.t. ∀ n > N₂, |y_n − y| < ε/2.

Let N = max{N₁, N₂}. Hence

∀ n > N, |x_n − x| + |y_n − y| < ε/2 + ε/2 = ε,

⇒ |x_n + y_n − x − y| < ε. So {x_n + y_n} → x + y.

(9) We know that if a sequence is convergent then it is bounded. The contrapositive statement is: "If a sequence is not bounded, then it is not convergent." The sequence x_n = n, n ∈ N, is NOT bounded: no matter which bound B we choose, there is a natural number greater than it. We now use the contrapositive to conclude that {x_n}_{n=1}^∞ is not convergent.


(10) Since {x_n} is a Cauchy sequence, for every ε > 0 there exists N ∈ N such that m, n > N implies |x_n − x_m| < ε. Choose ε = 1 and m = N + 1; then

|x_n − x_{N+1}| < 1 ⇒ |x_n| < 1 + |x_{N+1}|, ∀ n > N.

Let

B = max{|x₁|, |x₂|, · · · , |x_N|, 1 + |x_{N+1}|};

then |x_n| ≤ B for all n ∈ N.

(11) It is easy to check that the sequence converges to 2, being the sum of the constant sequence {x_n} = {2, 2, · · · } and the sequence {y_n} = {−1/n}. We have already seen in the class notes that the second sequence {y_n} converges to zero. Hence the sequence, being the sum of two convergent sequences, converges to the sum of the limits, which equals 2 + 0 = 2. Since the limit of a convergent sequence is unique, 1 cannot be a limit.

(12) We consider a monotone increasing sequence, x_n ≤ x_{n+1}; the proof is analogous in the monotone decreasing case. First, let {x_n} be a convergent sequence with lim_{n→∞} x_n = x. From the definition of convergence, with ε = 1, we get N ∈ N such that n > N implies |x_n − x| < 1. Then

|x_n| < 1 + |x|, ∀ n > N.

Let

B = max{|x₁|, |x₂|, · · · , |x_N|, 1 + |x|};

then |x_n| ≤ B for all n ∈ N, so the sequence is bounded. Conversely, let the sequence be bounded, and let x be its least upper bound. Then x_n ≤ x for all n ∈ N. For every ε > 0 there exists an N ∈ N such that x − ε < x_N ≤ x; otherwise x − ε would be an upper bound for the sequence. Since x_n is increasing, n > N implies

x − ε < x_n ≤ x,

which shows that x_n converges to x.

(13) (i) S = (0, 1). Open: for any x ∈ (0, 1), the open ball with radius min{x, 1 − x} is contained in S.

(ii) S = [0, 1]. Closed: use the theorem that a set S ⊆ Rⁿ is closed if and only if every convergent sequence of points {x_n} ⊆ S has its limit x ∈ S. Let {x_n} be a convergent sequence contained in S with limit x; then for all n, x_n ≥ 0 and x_n ≤ 1. Since weak inequalities are preserved in the limit, x ≥ 0 and x ≤ 1. So x ∈ S, and S is closed.

(iii) S = [0, 1). Neither open nor closed: it is not closed, since the limit of the convergent sequence {1 − 1/n}, namely 1, is not contained in S; and it is not open, since x = 0 is contained in S but no open ball centred at x = 0 is contained in S.

(iv) S = R. Both open and closed: use the result in the notes that the empty set is both open and closed, and R is the complement of the empty set.

Chapter 24

Solution to PS 3

(1) With

A = [1 −1 7; 0 8 10], B = [9 6 5 4; 1 −2 −3 3; 0 1 −1 2],

we have

AB = [1·9 + (−1)·1 + 7·0   1·6 + (−1)·(−2) + 7·1   1·5 + (−1)·(−3) + 7·(−1)   1·4 + (−1)·3 + 7·2 ;
0·9 + 8·1 + 10·0   0·6 + 8·(−2) + 10·1   0·5 + 8·(−3) + 10·(−1)   0·4 + 8·3 + 10·2]

(24.1) = [8 15 1 15; 8 −6 −34 44].

Note BA is not defined in this case.
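The product can be verified with a small helper (ours) that multiplies matrices stored as lists of rows:

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    assert len(A[0]) == len(B)   # columns of A must match rows of B
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, -1, 7], [0, 8, 10]]
B = [[9, 6, 5, 4], [1, -2, -3, 3], [0, 1, -1, 2]]
print(matmul(A, B))  # [[8, 15, 1, 15], [8, -6, -34, 44]]
# BA is not defined: B has 4 columns while A has only 2 rows.
```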

(2) Set up the vector equation as

λ₁ (1, 2)′ + λ₂ (1, 3)′ = (0, 0)′

⇔ λ₁ + λ₂ = 0 and 2λ₁ + 3λ₂ = 0

(24.2) ⇔ λ₁ = −λ₂ and λ₁ = −(3/2)λ₂.

The only solution is

(24.3) λ₁ = 0, λ₂ = 0.

So the two vectors are linearly independent.

(3) With

A = [1 6 2; −1 5 3], B = [8 4; 0 −2; 7 −3],

we have

AB = [1·8 + 6·0 + 2·7   1·4 + 6·(−2) + 2·(−3) ; −1·8 + 5·0 + 3·7   −1·4 + 5·(−2) + 3·(−3)]

(24.4) = [22 −14; 13 −23]

and

BA = [8·1 + 4·(−1)   8·6 + 4·5   8·2 + 4·3 ; 0·1 + (−2)·(−1)   0·6 + (−2)·5   0·2 + (−2)·3 ; 7·1 + (−3)·(−1)   7·6 + (−3)·5   7·2 + (−3)·3]

(24.5) = [4 68 28; 2 −10 −6; 10 27 5].

(4)

A = [1 2 3 4; 1 2 1 2; 1 3 5 7; 2 1 4 1]

Let us expand the determinant along the first column:

|A| = 1 · (−1)^(1+1) |2 1 2; 3 5 7; 1 4 1| + 1 · (−1)^(2+1) |2 3 4; 3 5 7; 1 4 1|
+ 1 · (−1)^(3+1) |2 3 4; 2 1 2; 1 4 1| + 2 · (−1)^(4+1) |2 3 4; 2 1 2; 3 5 7|.

The four 3 × 3 determinants are

|2 1 2; 3 5 7; 1 4 1| = 2|5 7; 4 1| − 1|3 7; 1 1| + 2|3 5; 1 4| = −28,
|2 3 4; 3 5 7; 1 4 1| = 2|5 7; 4 1| − 3|3 7; 1 1| + 4|3 5; 1 4| = −6,
|2 3 4; 2 1 2; 1 4 1| = 2|1 2; 4 1| − 3|2 2; 1 1| + 4|2 1; 1 4| = 14,
|2 3 4; 2 1 2; 3 5 7| = 2|1 2; 5 7| − 3|2 2; 3 7| + 4|2 1; 3 5| = −2,

and therefore

|A| = −28 − 1 · (−6) + 1 · 14 + (−2) · (−2)
= −4.
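The cofactor expansion along the first column generalizes to a short recursive routine (ours), which reproduces both the 3 × 3 minors and |A| = −4:

```python
def det(M):
    """Determinant by cofactor expansion along the first column."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for i in range(len(M)):
        # Minor: delete row i and the first column.
        minor = [row[1:] for j, row in enumerate(M) if j != i]
        total += (-1) ** i * M[i][0] * det(minor)
    return total

A = [[1, 2, 3, 4], [1, 2, 1, 2], [1, 3, 5, 7], [2, 1, 4, 1]]
print(det(A))  # -4
```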

(5) Recall that the rank of a matrix A is the number of linearly independent column vectors of A. It is also equal to the number of linearly independent row vectors of A.

A = [3 2 1; 0 1 7; 5 4 −1]

Take columns 1 and 2:

λ₁ (3, 0, 5)′ + λ₂ (2, 1, 4)′ = (0, 0, 0)′

⇔ 3λ₁ + 2λ₂ = 0, λ₂ = 0, 5λ₁ + 4λ₂ = 0

(24.6) ⇔ λ₁ = 0, λ₂ = 0

is the only solution, so the first two columns are linearly independent. Now take all three columns:

λ₁ (3, 0, 5)′ + λ₂ (2, 1, 4)′ + λ₃ (1, 7, −1)′ = (0, 0, 0)′

⇔ 3λ₁ + 2λ₂ + λ₃ = 0 (i)
λ₂ + 7λ₃ = 0 (ii)
5λ₁ + 4λ₂ − λ₃ = 0 (iii)

(i) − 2(ii): 3λ₁ − 13λ₃ = 0
(24.7) (iii) − 4(ii): 5λ₁ − 29λ₃ = 0

So λ₁ = 0 and λ₃ = 0, and then λ₂ = 0, is the only solution. So all three columns are linearly independent. This implies that the rank of matrix A is 3.
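The rank can be confirmed by Gaussian elimination; the routine below (ours) counts pivot rows using exact rational arithmetic:

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue                       # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]    # move the pivot row up
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[3, 2, 1], [0, 1, 7], [5, 4, -1]]
print(rank(A))  # 3
```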

(6) Recall that the system of equations

(24.8) A x = b, with A of order 3 × 3 and x, b of order 3 × 1,

has a solution if and only if

(24.9) rank(A) = rank(A_b),

and the solution, if it exists, is unique if and only if

(24.10) rank(A) = rank(A_b) = 3 (the number of unknowns).

In this question

A = [1 1 1; 1 2 3; 1 2 λ], A_b = [1 1 1 6; 1 2 3 10; 1 2 λ µ].

We can verify that the rank of A is at least 2, since the first two rows of A are linearly independent. Similarly, the rank of A_b is at least 2, since the first two rows of A_b are also linearly independent.

(a) For no solution to exist, the ranks of A and A_b need to differ, which is possible only when the rank of A is 2 and the rank of A_b is 3 (if the rank of A is 3, then so is the rank of A_b). For the rank of A to be 2, λ = 3; and for the rank of A_b to be 3, µ ≠ 10.

(b) For a unique solution, the ranks of A and A_b must both equal 3. The rank of A is 3 if and only if λ ≠ 3, and in that case the rank of A_b is 3 for every value of µ ∈ R. Thus for λ ≠ 3 and any µ ∈ R we get a unique solution.

(c) For infinitely many solutions, the ranks of A and A_b must both equal 2. This is possible if and only if λ = 3 and µ = 10.

You might consider writing down the solutions in the last two cases in terms of the λ and µ values.

(7)

A11 = 2 > 0, A11 A22 − A12 A21 = 2 · 1 − 1 = 1 > 0: PD

B11 > 0, B22 > 0, B11 B22 − B12 B21 = 2 · 8 − 16 = 0: PSD

C11 < 0, C11C22 −C12C21 = −3 · 5 − 16 < 0 : Indefinite

D11 < 0, D11 D22 − D12 D21 = −3 · (−6) − 16 > 0: ND

(8) Let A_t and B_t denote the number of employees in locations A and B in some period t. The transition probabilities are defined as follows:

p_AA ≡ probability that a current A remains an A,
p_AB ≡ probability that a current A moves to B,
p_BB ≡ probability that a current B remains a B,
p_BA ≡ probability that a current B moves to A.

The distribution of employees at time t is denoted by the vector x_t′ = [A_t B_t], and the transition probabilities in matrix form as

(24.11) M = [p_AA p_AB; p_BA p_BB] = [0.9 0.1; 0.7 0.3].

Then the distribution of employees across the two locations next period (t + 1) is x_t′ · M = x_{t+1}′, which is

[A_t B_t] · [0.9 0.1; 0.7 0.3] = [(0.9A_t + 0.7B_t) (0.1A_t + 0.3B_t)] = [A_{t+1} B_{t+1}].

In a similar manner we can determine the distribution of employees after two periods:

x_{t+1}′ · M = x_{t+2}′
[A_{t+1} B_{t+1}] · [0.9 0.1; 0.7 0.3] = [A_{t+2} B_{t+2}]
[A_t B_t] · [0.9 0.1; 0.7 0.3] · [0.9 0.1; 0.7 0.3] = [A_{t+2} B_{t+2}]
[A_t B_t] · [0.9 0.1; 0.7 0.3]² = [A_{t+2} B_{t+2}]

In general, for n periods,

(24.12) [A_t B_t] · [0.9 0.1; 0.7 0.3]ⁿ = [A_{t+n} B_{t+n}].

The initial distribution of employees across the two locations at time t = 0 is

x₀′ = [A₀ B₀] = [0 2000].

Then the distribution of employees in the next period t = 1 is

[0 2000] · [0.9 0.1; 0.7 0.3] = [1400 600] = [A₁ B₁].

The distribution after two periods is

[0 2000] · [0.9 0.1; 0.7 0.3]² = [0 2000] · [0.88 0.12; 0.84 0.16] = [1680 320] = [A₂ B₂].

The distribution after four periods is

[0 2000] · [0.9 0.1; 0.7 0.3]⁴ = [0 2000] · [0.8752 0.1248; 0.8736 0.1264] = [1747 253] = [A₄ B₄].

The distribution after six periods is

[0 2000] · [0.9 0.1; 0.7 0.3]⁶ = [0 2000] · [0.875008 0.124992; 0.874944 0.125056] = [1749 251] = [A₆ B₆].

The distribution after eight periods is

[0 2000] · [0.9 0.1; 0.7 0.3]⁸ = [0 2000] · [0.8750 0.1250; 0.8750 0.1250] = [1750 250] = [A₈ B₈].

The distribution after ten periods is

[0 2000] · [0.9 0.1; 0.7 0.3]¹⁰ = [0 2000] · [0.8750 0.1250; 0.8750 0.1250] = [1750 250] = [A₁₀ B₁₀].

Observe that when the transition matrix is raised to higher powers, it converges to a matrix whose rows are identical. This is referred to as the steady state. In this example, the steady state is

M̄ = [7/8 1/8; 7/8 1/8].

To see this, observe that

[A B] · [0.9 0.1; 0.7 0.3] = [A B]

gives

0.9A + 0.7B = A,

and

A + B = 2000.

Then we get

A = 7B, or A = (7/8) · 2000, B = (1/8) · 2000.
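The convergence to the steady state can be reproduced by iterating the transition matrix; the sketch below (ours) starts from x₀ = [0, 2000]:

```python
def step(x, M):
    """One period: row vector x times transition matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

M = [[0.9, 0.1],
     [0.7, 0.3]]
x = [0, 2000]                 # everyone starts in location B
for t in range(10):
    x = step(x, M)            # deviations from the steady state shrink each period
print([round(v) for v in x])  # [1750, 250], i.e. the steady state (7/8, 1/8) of 2000
```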

(9) (a) We know that

det AB = det A × det B.

Since the matrix A is nilpotent, we know

det Aᵏ = [det A]ᵏ = det O = 0.

Hence det A = 0.

(b) Note that

det A′ = det A.

Also, the matrix −A is obtained by multiplying each row (or each column) of matrix A by −1. Hence

det(−A) = (−1)ⁿ det A = −det A,

since n is an odd number. Thus

det A′ = det A = −det A.

This leads to det A = 0, and therefore A is not invertible.

(c) Note that det A′ = det A, and since

det[AA′] = det A × det A′ = det A × det A = [det A]² = det I = 1,

we get det A = ±1.

(d) As we have seen in part (b), for n an odd integer,

det AB = det A × det B = (−1)ⁿ det BA = (−1)ⁿ det B × det A = −det A × det B,

which implies

det A × det B = 0.

This means either det A = 0 (i.e., A is not invertible) or det B = 0 (i.e., B is not invertible).


(e) Since

det AB = det A × det B = det I = 1,

det A ≠ 0 and therefore A is invertible. Pre-multiplying both sides of AB = I by A⁻¹, we get

A⁻¹AB = IB = B = A⁻¹I = A⁻¹,

showing that B = A⁻¹.

(10) (a) We use Result 6.1 to prove this. The determinant of an upper triangular matrix is equal to the product of its diagonal entries. By the definition of an eigenvalue, if we take λᵢ = aᵢᵢ, then the determinant of the matrix [A − λᵢI] is zero, since the diagonal entry in row i is zero and [A − λᵢI] is again upper triangular.
Similar arguments can be used to prove the result for a lower triangular matrix.

(b) Since A is an invertible matrix, A⁻¹ exists (and λ ≠ 0), and we can pre-multiply the equation (A − λI)x = 0 by A⁻¹. This yields (I − λA⁻¹)x = 0, or (λ⁻¹I − A⁻¹)x = 0, or (A⁻¹ − λ⁻¹I)x = 0, as desired. Thus, for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an eigenvalue of A⁻¹.

(c) Assume λ is the eigenvalue corresponding to the eigenvector x, so that
$$Ax = \lambda x.$$
Pre-multiplying both sides by A, we get
$$A(Ax) = A(\lambda x) = \lambda Ax = \lambda(\lambda x) = \lambda^2 x.$$
In other words,
$$A^2 x = \lambda^2 x.$$
Hence x is an eigenvector of A² and the corresponding eigenvalue is λ². Using part (b) and a similar argument, one can show that x is an eigenvector of A⁻² and the corresponding eigenvalue is λ⁻².
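Both claims can be checked numerically. The sketch below uses a small hypothetical upper triangular matrix A = [[2, 1], [0, 3]] chosen for illustration; its eigenvalues are the diagonal entries, and squaring the matrix squares the eigenvalue.

```python
# Check parts (a) and (c) on the upper triangular matrix A = [[2, 1], [0, 3]].

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[2, 1],
     [0, 3]]

# (a) For a triangular matrix the eigenvalues are the diagonal entries:
# x = (1, 1) is an eigenvector for lambda = 3.
x, lam = [1, 1], 3
assert matvec(A, x) == [lam * xi for xi in x]

# (c) The same x is an eigenvector of A^2 with eigenvalue lambda^2 = 9.
A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert matvec(A2, x) == [lam ** 2 * xi for xi in x]
print("checks passed")
```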

Chapter 25

Solution to PS 4

(1) (a)

$$f(x) = \left[\frac{2x+1}{x-1}\right]^{\frac12}$$
$$f'(x) = \frac12\left[\frac{2x+1}{x-1}\right]^{-\frac12}\cdot\frac{(x-1)\cdot 2 - (2x+1)\cdot 1}{(x-1)^2} = -\frac32\left[\frac{x-1}{2x+1}\right]^{\frac12}\frac{1}{(x-1)^2}$$
$$= -\frac{3}{2\,(2x+1)^{\frac12}(x-1)^{\frac32}}. \tag{25.1}$$

(b)

$$f(x) = \ln(3x^2 - 5x)$$
$$f'(x) = \frac{1}{3x^2-5x}\,(6x-5)$$
$$= \frac{6x-5}{3x^2-5x}. \tag{25.2}$$
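A quick numerical sanity check of (25.2) can be done with a central finite difference; the evaluation point x = 2 (where 3x² − 5x > 0) is chosen only for illustration.

```python
# Compare the formula (25.2) against a central finite difference.
import math

def f(x):
    return math.log(3 * x**2 - 5 * x)

def fprime(x):
    return (6 * x - 5) / (3 * x**2 - 5 * x)   # formula (25.2)

x, h = 2.0, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)     # central difference
print(numeric, fprime(x))                     # both approximately 3.5
```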

(2) Recall the equation for the tangent to f (x) at x0 is

y = f (x0 ) + f ′ (x0 ) (x − x0 ) .



Using f (2) = 24, f ′ (x) = 10x + 3, f ′ (2) = 23,

y = 24 + 23 (x − 2) ,

y = −22 + 23x.

(3) (a) Recall that continuity at x₀ requires
$$\lim_{x\to x_0^-} f(x) = \lim_{x\to x_0^+} f(x) = f(x_0).$$
Here
$$\lim_{x\to 0^-} f(x) = -1, \qquad \lim_{x\to 0^+} f(x) = 0 = f(0),$$
so that
$$\lim_{x\to 0^-} f(x) \ne \lim_{x\to 0^+} f(x).$$
Hence f(x) is not continuous at x = 0.
(b)
$$\lim_{x\to 2^-} g(x) = 3\cdot 2 - 2 = 4; \qquad \lim_{x\to 2^+} g(x) = -2 + 6 = 4 = g(2).$$
Hence g(x) is continuous at x = 2.

(4) Since both

$$f(0) = g(0) = 0,$$
we can use L'Hôpital's rule to find the limit:
$$f'(x) = 2x\exp(x^2) - \exp(-x) \;\Rightarrow\; f'(0) = -1; \qquad g'(x) = 2 \;\Rightarrow\; g'(0) = 2. \tag{25.3}$$
Hence
$$\lim_{x\to 0}\frac{\exp(x^2)+\exp(-x)-2}{2x} = \frac{-1}{2}. \tag{25.4}$$
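The limit in (25.4) can be confirmed numerically by evaluating the ratio at a small x:

```python
# The ratio in (25.4) approaches -1/2 as x -> 0.
import math

def ratio(x):
    return (math.exp(x**2) + math.exp(-x) - 2) / (2 * x)

print(ratio(1e-4))   # close to -0.5
```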

(5)

$$\nabla f(x,y) = \begin{bmatrix} 2xy + y^2 - 2y + 3 & x^2 + 2xy - 2x \end{bmatrix}$$
$$H f(x,y) = \begin{bmatrix} 2y & 2x+2y-2 \\ 2x+2y-2 & 2x \end{bmatrix}$$
$$H f(1,2) = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix} \tag{25.5}$$
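The Hessian in (25.5) can be checked by finite differences. A function consistent with the gradient above (recovered by integration, up to a constant) is f(x,y) = x²y + xy² − 2xy + 3x; the sketch below assumes that form.

```python
# Finite-difference check of the Hessian in (25.5), assuming the
# underlying function f(x,y) = x^2*y + x*y^2 - 2*x*y + 3*x.

def f(x, y):
    return x**2 * y + x * y**2 - 2 * x * y + 3 * x

def hessian(x, y, h=1e-4):
    """Approximate the 2x2 Hessian by central differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

print(hessian(1, 2))   # approximately [[4, 4], [4, 2]]
```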


(6) Let f(x, y) be defined as
$$f(x,y) = \begin{cases} \dfrac{xy}{x^2+y^2} & \text{if } (x,y) \ne (0,0), \\[4pt] 0 & \text{otherwise.} \end{cases}$$
Show that the partial derivatives D₁f(x,y) and D₂f(x,y) exist at every point in R², although f is not continuous at (0,0).

(a) Observe that for all (x,y) ≠ (0,0), we get
$$D_1 f(x,y) = \frac{(x^2+y^2)y - xy(2x)}{(x^2+y^2)^2} = \frac{y(y^2-x^2)}{(x^2+y^2)^2}$$
and
$$D_2 f(x,y) = \frac{(x^2+y^2)x - xy(2y)}{(x^2+y^2)^2} = \frac{x(x^2-y^2)}{(x^2+y^2)^2}.$$
Further,
$$D_1 f(0,0) = \lim_{h\to 0}\frac{f(h,0)-f(0,0)}{h} = \lim_{h\to 0}\frac{0}{h} = 0$$
and
$$D_2 f(0,0) = \lim_{h\to 0}\frac{f(0,h)-f(0,0)}{h} = \lim_{h\to 0}\frac{0}{h} = 0.$$
Thus the partial derivatives D₁f(x,y) and D₂f(x,y) exist at every point (x,y) ∈ R².

(b) Consider y = x. The function satisfies f(x,x) = 1/2 for all x ≠ 0, and therefore f(0,0) = 0 ≠ lim_{h→0} f(h,h) = 1/2. Hence f is not continuous at (0,0).
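The discontinuity is easy to see numerically: along the diagonal y = x the function is identically 1/2, however close one gets to the origin.

```python
# Along y = x the function xy/(x^2+y^2) is identically 1/2, while f(0,0) = 0.

def f(x, y):
    return x * y / (x**2 + y**2) if (x, y) != (0, 0) else 0.0

for h in (0.1, 0.01, 0.001):
    print(f(h, h))     # always 0.5, so f(h,h) does not approach f(0,0) = 0
```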

(7) This exercise gives an example of a function with D₁₂f(x,y) ≠ D₂₁f(x,y). Let f(x,y) be defined as
$$f(x,y) = \begin{cases} \dfrac{xy(x^2-y^2)}{x^2+y^2} & \text{if } (x,y)\ne(0,0), \\[4pt] 0 & \text{otherwise.}\end{cases}$$


(a) Observe that for all (x,y) ≠ (0,0), we get
$$D_1 f(x,y) = \frac{(x^2+y^2)(3x^2y-y^3) - (x^3y-xy^3)2x}{(x^2+y^2)^2} = \frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2}$$
and
$$D_2 f(x,y) = \frac{(x^2+y^2)(x^3-3xy^2) - (x^3y-xy^3)2y}{(x^2+y^2)^2} = \frac{x(x^4-4x^2y^2-y^4)}{(x^2+y^2)^2}.$$
Further,
$$D_1 f(0,0) = \lim_{h\to 0}\frac{f(h,0)-f(0,0)}{h} = \lim_{h\to 0}\frac{0}{h} = 0$$
and
$$D_2 f(0,0) = \lim_{h\to 0}\frac{f(0,h)-f(0,0)}{h} = \lim_{h\to 0}\frac{0}{h} = 0.$$
Further,
$$D_1 f(x,y) = \frac{y(x^4+4x^2y^2-y^4)}{(x^2+y^2)^2} = y\left(1 + \frac{2x^2y^2-2y^4}{x^4+2x^2y^2+y^4}\right),$$
and since the fraction in parentheses lies between −2 and 1 (as 2y⁴ ≤ 2(x⁴+2x²y²+y⁴) and 2x²y² ≤ 2x²y² + (x⁴+y⁴)), we obtain
$$|D_1 f(x,y)| \le 2|y|.$$
It is easy to verify along similar lines that |D₂f(x,y)| ≤ 2|x|. This shows that D₁f(x,y) → 0 = D₁f(0,0) as (x,y) → (0,0), since lim_{(x,y)→(0,0)} 2|y| = 0. Similarly, D₂f(x,y) → 0 = D₂f(0,0), since lim_{(x,y)→(0,0)} 2|x| = 0. For all (x,y) ∈ R² \ {(0,0)} the partial derivatives D₁f(x,y) and D₂f(x,y) are continuous functions, being ratios of two polynomials with non-vanishing denominator.
Thus the partial derivatives D₁f(x,y) and D₂f(x,y) exist and are continuous at every point (x,y) ∈ R².

(b) Since the real-valued function f has continuous partial derivatives at every point (x, y) ∈

R2 , it is continuous at every point (x, y) ∈ R2 .


(c) Observe that
$$D_1 f(0,y) = \lim_{h\to 0}\frac{\frac{hy(h^2-y^2)}{h^2+y^2} - 0}{h} = \lim_{h\to 0}\frac{y(h^2-y^2)}{h^2+y^2} = -y$$
and
$$D_2 f(x,0) = \lim_{h\to 0}\frac{\frac{xh(x^2-h^2)}{x^2+h^2} - 0}{h} = \lim_{h\to 0}\frac{x(x^2-h^2)}{x^2+h^2} = x.$$
(d) Since f(x,y) is a rational function with non-zero denominator for (x,y) ≠ (0,0), the second-order cross partial derivatives D₁₂f(x,y) and D₂₁f(x,y) exist at every point in R² and are continuous everywhere in R² except at (0,0).
(e) From D₂f(x,0) = x we get D₂₁f(0,0) = +1, and from D₁f(0,y) = −y we get D₁₂f(0,0) = −1.
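The unequal mixed partials at the origin can also be seen numerically, by differencing the first partials along the axes:

```python
# Approximate D21 f(0,0) as the h-derivative of D2 f(h, 0), and
# D12 f(0,0) as the h-derivative of D1 f(0, h).

def f(x, y):
    return x * y * (x**2 - y**2) / (x**2 + y**2) if (x, y) != (0, 0) else 0.0

def d1(x, y, h=1e-6):
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d2(x, y, h=1e-6):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

h = 1e-3
print((d2(h, 0) - d2(0, 0)) / h)   # approximately +1  (D21 f(0,0))
print((d1(0, h) - d1(0, 0)) / h)   # approximately -1  (D12 f(0,0))
```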

Chapter 26

Solution to PS 5

(1) Let f(x) and g(x) be two concave functions and let h(x) = f(x) + g(x). Concavity of f and g implies, ∀x, y ∈ D, ∀λ ∈ [0,1],
$$\lambda f(x) + (1-\lambda)f(y) \le f\left(\lambda x + (1-\lambda)y\right)$$
$$\lambda g(x) + (1-\lambda)g(y) \le g\left(\lambda x + (1-\lambda)y\right).$$
Adding these two inequalities, we get
$$\left[\lambda f(x) + (1-\lambda)f(y)\right] + \left[\lambda g(x) + (1-\lambda)g(y)\right] \le f\left(\lambda x + (1-\lambda)y\right) + g\left(\lambda x + (1-\lambda)y\right)$$
$$\lambda\left(f(x)+g(x)\right) + (1-\lambda)\left(f(y)+g(y)\right) \le f\left(\lambda x + (1-\lambda)y\right) + g\left(\lambda x + (1-\lambda)y\right)$$
$$\lambda h(x) + (1-\lambda)h(y) \le h\left(\lambda x + (1-\lambda)y\right).$$
This proves that h(x) = f(x) + g(x) is concave.

(2) (a) False. Consider A, B ⊆ R with A = [0,2] and B = [4,6], so A ∪ B = [0,2] ∪ [4,6]. Then 1 ∈ A ∪ B and 5 ∈ A ∪ B, but ½·1 + ½·5 = 3 ∉ A ∪ B.

(b) True. If A and B are convex sets, then A ∩ B is convex. Let x, y ∈ A ∩ B and λ ∈ [0,1]. Then
$$\lambda x + (1-\lambda)y \in A \quad\text{as } x, y \in A, \tag{26.1}$$
$$\lambda x + (1-\lambda)y \in B \quad\text{as } x, y \in B, \tag{26.2}$$
$$\Rightarrow\; \lambda x + (1-\lambda)y \in A \cap B. \tag{26.3}$$
Hence A ∩ B is convex.

(3) (a) Recall that a monotone function of one variable is quasi-concave. Since f(x) = 3x + 4 is monotone increasing, it is quasi-concave.



(b) The bordered Hessian is
$$B(x,y) = \begin{bmatrix} 0 & y\exp(x) & \exp(x) \\ y\exp(x) & y\exp(x) & \exp(x) \\ \exp(x) & \exp(x) & 0 \end{bmatrix},$$
$$\det B_1(x,y) = \det\begin{bmatrix} 0 & y\exp(x) \\ y\exp(x) & y\exp(x) \end{bmatrix} = -y^2\exp(2x) < 0;$$
$$\det B_2(x,y) = \det\begin{bmatrix} 0 & y\exp(x) & \exp(x) \\ y\exp(x) & y\exp(x) & \exp(x) \\ \exp(x) & \exp(x) & 0 \end{bmatrix} = y\exp(3x) > 0.$$
Recall that the sufficient condition for f to be quasiconcave is that
$$(-1)^r \det B_r(x) > 0, \qquad \forall r = 1, 2, \dots, n; \; \forall x \in D.$$
This holds true for the function. Hence it is quasiconcave.

(c) Here
$$B(x,y) = \begin{bmatrix} 0 & -2xy^3 & -3x^2y^2 \\ -2xy^3 & -2y^3 & -6xy^2 \\ -3x^2y^2 & -6xy^2 & -6x^2y \end{bmatrix},$$
$$\det B_1(x,y) = \det\begin{bmatrix} 0 & -2xy^3 \\ -2xy^3 & -2y^3 \end{bmatrix} = -4x^2y^6 \le 0, \tag{26.4}$$
$$\det B_2(x,y) = \det\begin{bmatrix} 0 & -2xy^3 & -3x^2y^2 \\ -2xy^3 & -2y^3 & -6xy^2 \\ -3x^2y^2 & -6xy^2 & -6x^2y \end{bmatrix} = -30x^4y^7. \tag{26.5}$$
Note that the sign of det B₂(x,y) is not positive. Hence the function is not quasi-concave.


Figure 26.1. Function f(x), Problem 4
Figure 26.2. Function g(x), Problem 4

(4) Let
$$f(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 \le x \le \frac12 \\ 1-x & \text{for } \frac12 \le x \le 1 \\ 0 & \text{for } x \ge 1 \end{cases} \tag{26.6}$$
$$g(x) = \begin{cases} 0 & \text{for } x \le 1 \\ x-1 & \text{for } 1 \le x \le \frac32 \\ 2-x & \text{for } \frac32 \le x \le 2 \\ 0 & \text{for } x \ge 2 \end{cases} \tag{26.7}$$
and
$$h(x) = f(x) + g(x). \tag{26.8}$$

The functions in Figures 26.1 and 26.2 are quasiconcave (each of them is first non-decreasing, then non-increasing), whereas the function in Figure 26.3, which is their sum, is not quasiconcave (it is not non-decreasing, not non-increasing, and not non-decreasing then non-increasing).


Figure 26.3. Function f(x) + g(x), Problem 4
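The failure of quasiconcavity for the sum can be seen by evaluating h at a few points: an upper level set of h is disconnected.

```python
# f and g are quasiconcave "tents", but their sum h is not quasiconcave:
# an upper level set of h is disconnected.

def f(x):
    return max(0.0, min(x, 1 - x))          # tent on [0, 1], cf. (26.6)

def g(x):
    return max(0.0, min(x - 1, 2 - x))      # tent on [1, 2], cf. (26.7)

def h(x):
    return f(x) + g(x)

# h is 1/2 at the two peaks but 0 in between:
print(h(0.5), h(1.0), h(1.5))    # 0.5 0.0 0.5
# so the level set {x : h(x) >= 1/4} is not an interval (not convex).
```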

(5) (i)
$$\nabla f(x,y,z) = \begin{bmatrix} 24x^2+2y^2 & 4xy & -3z^2 \end{bmatrix}$$
$$H f(x,y,z) = \begin{bmatrix} 48x & 4y & 0 \\ 4y & 4x & 0 \\ 0 & 0 & -6z \end{bmatrix} \tag{26.9}$$
Then f(x,y,z) is not concave, as the principal minor D₁ = 48x > 0. The bordered Hessian is
$$B(x,y,z) = \begin{bmatrix} 0 & 24x^2+2y^2 & 4xy & -3z^2 \\ 24x^2+2y^2 & 48x & 4y & 0 \\ 4xy & 4y & 4x & 0 \\ -3z^2 & 0 & 0 & -6z \end{bmatrix}$$
$$\det B_1(x,y,z) = \det\begin{bmatrix} 0 & 24x^2+2y^2 \\ 24x^2+2y^2 & 48x \end{bmatrix} = -576x^4 - 96x^2y^2 - 4y^4 \le 0 \tag{26.10}$$
$$\det B_2(x,y,z) = \det\begin{bmatrix} 0 & 24x^2+2y^2 & 4xy \\ 24x^2+2y^2 & 48x & 4y \\ 4xy & 4y & 4x \end{bmatrix} = -2304x^5 - 384x^3y^2 + 48xy^4, \tag{26.11}$$
which can take both positive and negative values. Hence f(x,y,z) is neither quasiconcave nor quasiconvex.

(ii)
$$\nabla g(x,y) = \begin{bmatrix} 1 - \exp(x) - \exp(x+y) & 1 - \exp(x+y) \end{bmatrix}$$
$$Hg(x,y) = \begin{bmatrix} -\exp(x)-\exp(x+y) & -\exp(x+y) \\ -\exp(x+y) & -\exp(x+y) \end{bmatrix}. \tag{26.12}$$
Then the leading principal minors
$$D_1 = -\exp(x) - \exp(x+y) < 0, \qquad D_2 = e^x e^{x+y} > 0 \tag{26.13}$$
imply that g(x,y) is concave. Hence it is also quasi-concave.

Chapter 27

Solution to PS 6

(1) We can write the set of linear equations in the form
$$\begin{bmatrix} 1 & 3 & 1 & -2 \\ 2 & 6 & -2 & -4 \end{bmatrix}\cdot\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \tag{27.1}$$

(a) The rank of the matrix A can be at most 2. This means that there can be at most two endogenous variables. The second column of A is a multiple (three times) of the first column, and the fourth column is a multiple of column one (−2 times). This leaves columns one and three. The sub-matrix consisting of columns one and three has full rank, as its determinant is −4. So we can choose x and z as endogenous variables and the remaining two, y and w, as exogenous variables.

(b) The system of linear equations can be rewritten as follows (with the choice of exogenous and endogenous variables made above):
$$\begin{bmatrix} 1 & 1 \\ 2 & -2 \end{bmatrix}\cdot\begin{bmatrix} x \\ z \end{bmatrix} = \begin{bmatrix} 1 - 3y + 2w \\ 3 - 6y + 4w \end{bmatrix} \tag{27.2}$$
Multiply the first equation by two and add it to the second to get
$$4x = 5 - 12y + 8w \tag{27.3}$$
$$x = \frac{5 - 12y + 8w}{4} = \frac{5}{4} - 3y + 2w. \tag{27.4}$$
Substitute the value of x into the first equation to get
$$z = 1 - 3y + 2w - \left(\frac{5}{4} - 3y + 2w\right) = -\frac{1}{4}.$$



(2) The system of linear equations is
$$\begin{bmatrix} -1 & 3 & -1 & 1 \\ 4 & -1 & 1 & 1 \\ 7 & 1 & 1 & 3 \end{bmatrix}\cdot\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 6 \end{bmatrix} \tag{27.5}$$

The rank of the matrix A can be at most 3. However, we observe that the third row is equal to the sum of twice the second row and the first row. This means that the rank of A cannot be three. The sub-matrix obtained by eliminating the third row of A (call it matrix B) is
$$\begin{bmatrix} -1 & 3 & -1 & 1 \\ 4 & -1 & 1 & 1 \end{bmatrix} \tag{27.6}$$
The determinant of the sub-matrix of B obtained by eliminating the third and fourth columns is −11, which is non-zero; this sub-matrix has full rank. So we can choose x and y as endogenous variables and the remaining two, z and w, as exogenous variables.
We can solve the set of equations to obtain
$$\begin{bmatrix} -1 & 3 \\ 4 & -1 \end{bmatrix}\cdot\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} z - w \\ 3 - z - w \end{bmatrix} \tag{27.7}$$
Solving the two equations we get
$$x = \frac{9 - 2z - 4w}{11} \qquad\text{and}\qquad y = \frac{3 + 3z - 5w}{11}.$$

(3) Observe that we can write the equation as
$$F(x,y) = x^2 - xy^3 + y^5 - 17 = 0,$$
which is a continuous function, being a polynomial. Also,
$$D_2 F(x,y) = -3xy^2 + 5y^4 = -3(5)(4) + 5(2)^4 = 20 \ne 0.$$
Hence, by the Implicit Function Theorem, there exists a function y = f(x), continuously differentiable, in a neighborhood of (x,y) = (5,2). Further,
$$f'(x)\big|_{x=5} = -\left(\frac{D_1 F(x,y)}{D_2 F(x,y)}\right)_{(x,y)=(5,2)} = -\left(\frac{2x - y^3}{-3xy^2 + 5y^4}\right)_{(x,y)=(5,2)} = -\frac{2}{20} = -\frac{1}{10}.$$
Then
$$y = f(4.9) \approx f(5) + (4.9 - 5)\cdot f'(x)\big|_{x=5} = 2 + (-0.1)\cdot\left(-\frac{1}{10}\right) = \frac{201}{100}.$$
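The linear approximation can be checked by plugging it back into F: the residual at (4.9, 2.01) is small, vanishing as the displacement shrinks.

```python
# At x = 4.9 the linear approximation y = 201/100 nearly solves F(x, y) = 0.

def F(x, y):
    return x**2 - x * y**3 + y**5 - 17

print(F(5, 2))        # exactly 0 at the base point
print(F(4.9, 2.01))   # small residual of the linear approximation
```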


(4) Consider the function f(x,y,z) = x² − y² + z³.
(a) Check that z = −3 satisfies the equation f(x,y,z) = 0 for x = 6 and y = 3.
(b) Observe that
$$D_3 f(x,y,z) = 3z^2 = 3(-3)^2 = 27 \ne 0.$$
In addition, the function f is continuous, being a polynomial. Hence, by the Implicit Function Theorem (IFT), there exists a function z = h(x,y), continuously differentiable, in a neighborhood of (x,y) = (6,3).
(c) By the IFT, we have
$$\left(\frac{dz}{dx}\right)_{(6,3,-3)} = -\left(\frac{D_1 f(x,y,z)}{D_3 f(x,y,z)}\right)_{(6,3,-3)} = -\left(\frac{2x}{3z^2}\right)_{(6,3,-3)} = -\frac{2(6)}{3(9)} = -\frac{4}{9},$$
and
$$\left(\frac{dz}{dy}\right)_{(6,3,-3)} = -\left(\frac{D_2 f(x,y,z)}{D_3 f(x,y,z)}\right)_{(6,3,-3)} = -\left(\frac{-2y}{3z^2}\right)_{(6,3,-3)} = \frac{2(3)}{3(9)} = \frac{2}{9}.$$
(d) If x increases to 6.1 and y decreases to 2.8, then
$$z \approx h(6,3) + \left(\frac{dz}{dx}\right)_{(6,3)}(6.1-6) + \left(\frac{dz}{dy}\right)_{(6,3)}(2.8-3) = -3 + \left(-\frac{4}{9}\right)(0.1) + \left(\frac{2}{9}\right)(-0.2) = -\frac{139}{45}.$$

(5) Consider the profit-maximizing firm described in Example 12.2. If p increases by Δp and w increases by Δw, what will be the change in the optimal input amount x?
Note the first-order condition for profit maximization is
$$p f'(x) - w = 0.$$
It can be written as a function of p, w and x, as F(p,w,x) = p f'(x) − w = 0. Then D₃F(p,w,x) = p f''(x) < 0, since f(x) is strictly concave. Also, F(p,w,x) is a continuously differentiable function. Hence we can apply the IFT to claim that there exists a continuously differentiable function x = x(p,w) in a neighborhood of (p,w,x*), where x* is the profit-maximizing input quantity. Then
$$x \approx x^* + \left(\frac{dx}{dp}\right)\Delta p + \left(\frac{dx}{dw}\right)\Delta w = x^* - \left(\frac{D_1 F(p,w,x)}{D_3 F(p,w,x)}\right)\Delta p - \left(\frac{D_2 F(p,w,x)}{D_3 F(p,w,x)}\right)\Delta w$$
$$= x^* - \left(\frac{f'(x)}{p f''(x)}\right)\Delta p - \left(\frac{-1}{p f''(x)}\right)\Delta w = x^* - \left(\frac{f'(x)}{p f''(x)}\right)\Delta p + \left(\frac{1}{p f''(x)}\right)\Delta w.$$

(6) Consider 3x²yz + xyz² = 96 as defining x as an implicit function of y and z around the point x = 2, y = 3, z = 2.
(a) Let F(x,y,z) = 3x²yz + xyz² − 96 = 0. Then
$$D_1 F(x,y,z) = 6xyz + yz^2 = 6(2)(3)(2) + 3(4) = 84 \ne 0,$$
and F(x,y,z) is a continuously differentiable function (being a polynomial). Hence we can apply the IFT to claim that there exists a function x = f(y) in terms of y, continuously differentiable, in a neighborhood of (x,y,z) = (2,3,2). Also
$$\left(\frac{dx}{dy}\right)_{(2,3,2)} = -\left(\frac{D_2 F(x,y,z)}{D_1 F(x,y,z)}\right) = -\left(\frac{3x^2z + xz^2}{6xyz + yz^2}\right) = -\frac{3x^2 + xz}{6xy + yz} = -\frac{3(4)+2(2)}{6(2)(3)+3(2)} = -\frac{16}{42} = -\frac{8}{21}.$$
Then
$$x \approx 2 + \left(\frac{dx}{dy}\right)_{(2,3,2)}(3.1 - 3) = 2 + \left(-\frac{8}{21}\right)(0.1) = \frac{412}{210}.$$

(b) Treating the equation as a quadratic in x, 3yz·x² + yz²·x − 96 = 0, we get
$$x = \frac{-yz^2 \pm \sqrt{y^2z^4 + 1152\,yz}}{6yz} = -\frac{z}{6} \pm \sqrt{\frac{z^2}{36} + \frac{32}{yz}} = -\frac{1}{3} \pm \sqrt{\frac{1}{9} + \frac{16}{y}} \quad (\text{at } z = 2),$$
which implies that
$$x = -\frac{1}{3} + \sqrt{\frac{1}{9} + \frac{16}{y}}$$
in the neighborhood of (2,3,2).

(c)
$$\left(\frac{dx}{dy}\right)_{(2,3,2)} = \frac{1}{2\sqrt{\frac19 + \frac{16}{y}}}\cdot\left(-\frac{16}{y^2}\right) = \frac{1}{2\cdot\frac73}\cdot\left(-\frac{16}{9}\right) = -\frac{8}{21}.$$
Then
$$x \approx 2 + \left(-\frac{8}{21}\right)(3.1 - 3) = 2 - \frac{8}{210} = \frac{412}{210}.$$

(d) The second method involves more computations.
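Both routes can be confirmed numerically: the explicit solution from part (b) recovers x = 2 at y = 3, and its numerical derivative matches −8/21.

```python
# Check problem (6): the explicit solution satisfies the original equation
# and its slope at y = 3 is -8/21.
import math

def G(x, y, z):
    return 3 * x**2 * y * z + x * y * z**2 - 96

def x_explicit(y):
    return -1 / 3 + math.sqrt(1 / 9 + 16 / y)   # part (b), with z = 2

assert abs(G(x_explicit(3), 3, 2)) < 1e-9        # recovers x = 2 at y = 3

h = 1e-6
slope = (x_explicit(3 + h) - x_explicit(3 - h)) / (2 * h)
print(slope, -8 / 21)    # both approximately -0.380952
```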

(7) Let both x and y be positive. Then
$$f(x+y) = f\left(x\cdot\frac{x+y}{x}\right) = \frac{x+y}{x}\,f(x), \quad\text{so}\quad f(x) = \frac{x}{x+y}\,f(x+y).$$
Similarly,
$$f(y) = \frac{y}{x+y}\,f(x+y).$$
So
$$f(x) + f(y) = \frac{x}{x+y}\,f(x+y) + \frac{y}{x+y}\,f(x+y) = \frac{x+y}{x+y}\,f(x+y) = f(x+y).$$
If x is zero, then
$$f(2x) = 2f(x) = 2f(0) \quad\text{and}\quad f(2x) = f(0) \;\Rightarrow\; 2f(0) = f(0) \;\Rightarrow\; f(0) = 0.$$
Then
$$f(x+y) = f(0+y) = f(y) = 0 + f(y) = f(0) + f(y) = f(x) + f(y).$$
The same argument holds if both x and y are zero.
Another method of proof is as follows. Let x > 0 and y > 0. Then x = ty for some t > 0, and
$$f(x+y) = f(ty+y) = f[(1+t)y] = (1+t)f(y) = f(y) + tf(y) = f(y) + f(ty) = f(y) + f(x).$$
The remaining cases of x = 0 or y = 0 are handled as in the earlier proof.

(8) First observe that if x is the null vector, then
$$f(2x) = 2f(x) = 2f(0) \quad\text{and}\quad f(2x) = f(0) \;\Rightarrow\; 2f(0) = f(0) \;\Rightarrow\; f(0) = 0.$$


Take x and x′ such that f(x) = y > 0 and f(x′) = y′ > 0. Then
$$f(x) = y \;\Rightarrow\; \frac{1}{y}f(x) = 1 \;\Rightarrow\; f\!\left(\frac{x}{y}\right) = 1.$$
Similarly,
$$f\!\left(\frac{x'}{y'}\right) = 1.$$
Take λ ∈ (0,1) and define
$$\theta = \frac{\lambda y}{\lambda y + (1-\lambda)y'}.$$
Then
$$1 - \theta = \frac{(1-\lambda)y'}{\lambda y + (1-\lambda)y'}$$
and θ ∈ (0,1). The function f is quasi-concave, so
$$f\!\left(\theta\frac{x}{y} + (1-\theta)\frac{x'}{y'}\right) \ge \min\left\{f\!\left(\frac{x}{y}\right), f\!\left(\frac{x'}{y'}\right)\right\} = \min\{1,1\} = 1,$$
that is,
$$f\!\left(\frac{\lambda x + (1-\lambda)x'}{\lambda y + (1-\lambda)y'}\right) \ge 1 \;\Rightarrow\; \frac{1}{\lambda y + (1-\lambda)y'}\,f\!\left(\lambda x + (1-\lambda)x'\right) \ge 1$$
$$\Rightarrow\; f\!\left(\lambda x + (1-\lambda)x'\right) \ge \lambda y + (1-\lambda)y' = \lambda f(x) + (1-\lambda)f(x'),$$
so f is concave. If f(x′) is zero, since f is non-decreasing,
$$f\!\left(\lambda x + (1-\lambda)x'\right) \ge f(\lambda x) = f(\lambda x) + 0 = \lambda f(x) + (1-\lambda)f(x').$$


If both f(x) and f(x′) are zero, then
$$f\!\left(\lambda x + (1-\lambda)x'\right) \ge \min\left\{f(x), f(x')\right\} = 0 = \lambda f(x) + (1-\lambda)f(x').$$

(9) Since the function f is homogeneous of degree m and is twice continuously differentiable, each of its partial derivatives is homogeneous of degree m − 1. The partial derivatives are themselves continuously differentiable, and their partial derivatives are homogeneous of degree m − 2.
Applying Euler's theorem to the partial derivative D₁f(x), we get
$$x_1 D_{11}f(x) + x_2 D_{12}f(x) + \cdots + x_n D_{1n}f(x) = (m-1)D_1 f(x).$$
In general, applying Euler's theorem to the partial derivative Dᵢf(x), we get
$$x_1 D_{i1}f(x) + x_2 D_{i2}f(x) + \cdots + x_n D_{in}f(x) = (m-1)D_i f(x), \qquad i = 1,\dots,n.$$
We can write these n equalities in matrix notation as
$$\begin{bmatrix} D_{11}f(x) & D_{12}f(x) & \cdots & D_{1n}f(x) \\ \vdots & \vdots & \ddots & \vdots \\ D_{n1}f(x) & D_{n2}f(x) & \cdots & D_{nn}f(x) \end{bmatrix}\cdot\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = (m-1)\begin{bmatrix} D_1 f(x) \\ \vdots \\ D_n f(x) \end{bmatrix}.$$
The n × n square matrix on the left-hand side is the Hessian matrix H f(x) of the function f, so the LHS is H f(x)·x. Pre-multiplying both sides by the row vector x′, we get
$$x'\,H f(x)\cdot x = (m-1)\left[x_1 D_1 f(x) + \cdots + x_n D_n f(x)\right].$$
Applying Euler's theorem to the sum on the RHS, we get
$$x'\,H f(x)\cdot x = (m-1)[m f(x)] = m(m-1)f(x).$$
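The identity can be spot-checked on a concrete homogeneous function. The example below uses f(x,y) = x²y, which is homogeneous of degree m = 3 and has Hessian [[2y, 2x], [2x, 0]].

```python
# Check x' H f(x) x = m(m-1) f(x) on the degree-3 function f(x, y) = x^2 * y.

def f(x, y):
    return x**2 * y

def quad_form(x, y):
    H = [[2 * y, 2 * x],
         [2 * x, 0]]
    v = [x, y]
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

m = 3
for x, y in [(1.0, 2.0), (0.5, -3.0), (4.0, 7.0)]:
    assert abs(quad_form(x, y) - m * (m - 1) * f(x, y)) < 1e-9
print("Euler identity verified at all sample points")
```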


Chapter 28

Solution to PS 7

(1)
$$\nabla g(x,y) = \begin{bmatrix} 3x^2 - 3 & 3y^2 - 2 \end{bmatrix}$$
$$Hg(x,y) = \begin{bmatrix} 6x & 0 \\ 0 & 6y \end{bmatrix} \tag{28.1}$$
Since g is defined for all x > 0, y > 0,
$$D_1 = 6x > 0, \qquad D_2 = 36xy > 0 \tag{28.2}$$
implies that g(x,y) is convex. Using
$$\nabla g(x,y) = \begin{bmatrix} 3x^2-3 & 3y^2-2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}, \tag{28.3}$$
$$x^* = 1, \qquad y^* = \sqrt{\frac{2}{3}} \tag{28.4}$$
is the unique solution. Using the theorem on convexity and global minima, g(x,y) attains its global minimum at (1, √(2/3)), with
$$g\left(1, \sqrt{\tfrac23}\right) = -2 - \frac43\sqrt{\frac23} \approx -3.09.$$
3 3 3



Figure 28.1. Graph of f(x) = x⁴ − 4x³ + 4x² + 4

(2) We know that f′(x) = 0 is a necessary condition for f to have a local maximum or minimum. Solving
$$f'(x) = 4x^3 - 12x^2 + 8x = 0 \tag{28.5}$$
$$4x\left(x^2 - 3x + 2\right) = 0, \tag{28.6}$$
$$x = 0, \quad x = 1, \quad x = 2. \tag{28.7}$$
From the graph of the function we see that x = 0 and x = 2 are local minima and x = 1 is a local maximum. Also, x = 0 and x = 2 are global minima and there is no global maximum.
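The classification of the critical points can be confirmed by evaluating f at them:

```python
# f'(x) vanishes at 0, 1, 2; the function values classify the points.

def f(x):
    return x**4 - 4 * x**3 + 4 * x**2 + 4

def fp(x):
    return 4 * x**3 - 12 * x**2 + 8 * x

assert fp(0) == fp(1) == fp(2) == 0
print(f(0), f(1), f(2))   # 4 5 4: minima at 0 and 2, local maximum at 1
```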

(3) The profit function for the monopolist is
$$\pi(Q_1, Q_2) = Q_1(100 - 5Q_1) + Q_2(50 - 10Q_2) - (50 + 10Q_1 + 10Q_2).$$
The first-order conditions for profit maximization are
$$D_1\pi(Q_1,Q_2) = 100 - 10Q_1 - 10 = 0, \;\text{or}\; Q_1 = 9,$$
$$D_2\pi(Q_1,Q_2) = 50 - 20Q_2 - 10 = 0, \;\text{or}\; Q_2 = 2.$$
We need to check the second-order conditions. Note
$$D_{11}\pi = -10, \qquad D_{22}\pi = -20, \qquad D_{12}\pi = D_{21}\pi = 0,$$
which gives the first-order leading principal minor −10 and the second-order leading principal minor 200. So the Hessian is negative definite for all outputs in the positive orthant. Therefore, the function π is concave, and Q₁ = 9, Q₂ = 2 is a profit-maximizing supply plan for the firm. The maximum profit is π* = 9 × 55 + 2 × 30 − 50 − 110 = 395.
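The result is easy to verify by brute force over a grid of output plans:

```python
# Profit at (9, 2) is 395 and exceeds profit at all other integer plans.

def profit(q1, q2):
    return q1 * (100 - 5 * q1) + q2 * (50 - 10 * q2) - (50 + 10 * q1 + 10 * q2)

assert profit(9, 2) == 395
best = max((profit(q1, q2), q1, q2) for q1 in range(21) for q2 in range(11))
print(best)   # (395, 9, 2)
```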

(4) (a) The profit for the firm, when it uses K and L units of capital and labor to produce output Q = LᵃKᵇ, given the output and input prices (P, w, r), is
$$\Pi(K, L) = P\cdot Q - wL - rK.$$
The firm maximizes its profit by choosing K and L such that both the FOC and SOC are satisfied. The FOCs are as under:
$$\frac{d\Pi}{dL} = P\cdot aL^{a-1}K^b - w = 0 \;\Rightarrow\; P\cdot aL^{a-1}K^b = w;$$
$$\frac{d\Pi}{dK} = P\cdot L^a bK^{b-1} - r = 0 \;\Rightarrow\; P\cdot L^a bK^{b-1} = r.$$
The FOC with respect to L leads to the condition that the value of the marginal product of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition that the value of the marginal product of capital is equal to the rental rate r.

(b) To solve for the optimal levels of L and K, we divide the first FOC by the second and get
$$\frac{P\cdot MP_L}{P\cdot MP_K} = \frac{MP_L}{MP_K} = \frac{P\cdot aL^{a-1}K^b}{P\cdot L^a bK^{b-1}} = \frac{aK}{bL} = \frac{w}{r} \;\Rightarrow\; K = \frac{wb}{ra}\,L.$$
Observe that the ratio of MP_L and MP_K is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of K can be substituted into either FOC to get the expression for L:
$$P\cdot aL^{a-1}K^b = w \;\Rightarrow\; P\cdot aL^{a-1}\left(\frac{wb}{ra}L\right)^b = w \;\Rightarrow\; P\cdot L^{a+b-1}\left(\frac{wb}{ra}\right)^b = \frac{w}{a}$$
$$\Rightarrow\; P\cdot\left(\frac{a}{w}\right)^{1-b}\left(\frac{b}{r}\right)^{b} = L^{1-a-b} \;\Rightarrow\; L^* = \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}}.$$

We compute the optimal value of K* from the last equation as under:
$$K^* = \frac{wb}{ra}\,L^* = \frac{wb}{ra}\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}} = \left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}-1}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}+1}P^{\frac{1}{1-a-b}} = \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{1-a}{1-a-b}}P^{\frac{1}{1-a-b}}.$$
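The closed forms for L* and K* can be verified numerically against the FOCs; the parameter values below are arbitrary choices satisfying a + b < 1.

```python
# L* and K* satisfy both FOCs: P*a*L^(a-1)*K^b = w and P*b*L^a*K^(b-1) = r.

P, w, r, a, b = 5.0, 2.0, 3.0, 0.3, 0.4
s = 1 - a - b                     # requires a + b < 1 (the SOC below)

L = (a / w) ** ((1 - b) / s) * (b / r) ** (b / s) * P ** (1 / s)
K = (a / w) ** (a / s) * (b / r) ** ((1 - a) / s) * P ** (1 / s)

assert abs(P * a * L ** (a - 1) * K ** b - w) < 1e-9
assert abs(P * b * L ** a * K ** (b - 1) - r) < 1e-9
print(L, K)
```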

(c) For the SOC, we first write down the Hessian (the matrix of second-order partial derivatives):
$$H = \begin{bmatrix} PF_{LL} & PF_{LK} \\ PF_{KL} & PF_{KK} \end{bmatrix} = \begin{bmatrix} Pa(a-1)L^{a-2}K^b & PabL^{a-1}K^{b-1} \\ PabL^{a-1}K^{b-1} & Pb(b-1)L^aK^{b-2} \end{bmatrix}.$$
For the SOC to be satisfied, the leading principal minor of order one needs to be negative and the leading principal minor of order two needs to be positive. Thus Pa(a−1)L^{a−2}K^b < 0, which implies a − 1 < 0, i.e., a < 1. The LPM of order two is the determinant of the Hessian:
$$\det H = P^2ab(a-1)(b-1)L^{2a-2}K^{2b-2} - \left(PabL^{a-1}K^{b-1}\right)^2 = P^2ab\left[(a-1)(b-1) - ab\right]L^{2a-2}K^{2b-2} = P^2ab\left[1-a-b\right]L^{2a-2}K^{2b-2} > 0,$$


which holds true if and only if 1 − a − b > 0. Note that this condition also implies b < 1.
Thus the production function displays diminishing marginal product in each of the two inputs (a < 1 and b < 1) and also diminishing returns to scale, as it is homogeneous of degree a + b < 1.

(d) We use the expression for L* derived earlier to find the partial derivatives:
$$\frac{\partial L^*}{\partial P} = \left(\frac{1}{1-a-b}\right)\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}-1} > 0,$$
$$\frac{\partial L^*}{\partial w} = -\left(\frac{1-b}{1-a-b}\right)a^{\frac{1-b}{1-a-b}}\,w^{-\frac{1-b}{1-a-b}-1}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{1}{1-a-b}} < 0,$$
$$\frac{\partial L^*}{\partial r} = -\left(\frac{b}{1-a-b}\right)\left(\frac{a}{w}\right)^{\frac{1-b}{1-a-b}}b^{\frac{b}{1-a-b}}\,r^{-\frac{b}{1-a-b}-1}P^{\frac{1}{1-a-b}} < 0.$$

(e) The output is obtained by noting that the profit-maximizing inputs are K* and L*:
$$Q^* = (L^*)^a(K^*)^b = \left(\frac{a}{w}\right)^{\frac{a(1-b)+ab}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{ab+b(1-a)}{1-a-b}}P^{\frac{a+b}{1-a-b}} = \left(\frac{a}{w}\right)^{\frac{a}{1-a-b}}\left(\frac{b}{r}\right)^{\frac{b}{1-a-b}}P^{\frac{a+b}{1-a-b}} = \left[\left(\frac{a}{w}\right)^a\left(\frac{b}{r}\right)^b P^{a+b}\right]^{\frac{1}{1-a-b}}.$$

For computing the price elasticity of supply with respect to the output price, note that
$$Q^* = \left[\left(\frac{a}{w}\right)^a\left(\frac{b}{r}\right)^b\right]^{\frac{1}{1-a-b}} P^{\frac{a+b}{1-a-b}} = A\,P^{\frac{a+b}{1-a-b}},$$
where A = [(a/w)ᵃ(b/r)ᵇ]^{1/(1−a−b)} is a constant independent of P. It is easy to see that the elasticity is ε_P = (a+b)/(1−a−b). [Note that for Q = APᵇ, ε_P = (dQ/dP)·(P/Q) = AbP^{b−1}·(P/Q) = b.]


Similarly, ε_w = −a/(1−a−b) and ε_r = −b/(1−a−b). Thus
$$\varepsilon_P + \varepsilon_w + \varepsilon_r = \frac{a+b}{1-a-b} + \frac{-a}{1-a-b} + \frac{-b}{1-a-b} = \frac{a+b-a-b}{1-a-b} = 0.$$
The economic interpretation is that if we change all the prices by the same factor, then the profit-maximizing quantity does not change. In other words, the profit-maximizing output is homogeneous of degree zero in the prices (P, w, r).

(f) You may like to write down the expression for the profit function explicitly in terms of P,

w and r, on your own.

(5) (a) The profit for the firm, when it uses K, L and R units of capital, labor and natural resources to produce output Q = ALᵃKᵇ + ln R, given the output and input prices (P, w, v, r), is
$$\Pi(K, L, R) = P\cdot Q - wL - rK - vR = P\cdot AL^aK^b + P\ln R - wL - rK - vR.$$
The firm maximizes its profit by choosing K, L and R such that both the FOC and SOC are satisfied. The FOCs are as under:
$$\frac{d\Pi}{dL} = P\cdot AaL^{a-1}K^b - w = PF_L - w = 0 \;\Rightarrow\; P\cdot AaL^{a-1}K^b = w;$$
$$\frac{d\Pi}{dK} = P\cdot AL^a bK^{b-1} - r = PF_K - r = 0 \;\Rightarrow\; P\cdot AL^a bK^{b-1} = r;$$
$$\frac{d\Pi}{dR} = \frac{P}{R} - v = PF_R - v = 0 \;\Rightarrow\; \frac{P}{R} = v.$$
The FOC with respect to L leads to the condition that the value of the marginal product of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition that the value of the marginal product of capital is equal to the rental rate r. Lastly, the FOC with respect to R leads to the condition that the value of the marginal product of the natural resource is equal to the price of the natural resource v.
Now take A = 3 and a = b = 1/3 for the remainder of the problem.

(b) With the given parameter values (note Aa = Ab = 1), the FOCs are
$$P\cdot L^{-\frac23}K^{\frac13} = w; \qquad P\cdot L^{\frac13}K^{-\frac23} = r; \qquad \frac{P}{R} = v.$$


For the SOC, we first write down the Hessian (the matrix of second-order partial derivatives):
$$H = \begin{bmatrix} PF_{LL} & PF_{LK} & PF_{LR} \\ PF_{KL} & PF_{KK} & PF_{KR} \\ PF_{RL} & PF_{RK} & PF_{RR} \end{bmatrix} = \begin{bmatrix} -\frac23 PL^{-\frac53}K^{\frac13} & \frac13 PL^{-\frac23}K^{-\frac23} & 0 \\ \frac13 PL^{-\frac23}K^{-\frac23} & -\frac23 PL^{\frac13}K^{-\frac53} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}.$$
For the SOC to be satisfied, the leading principal minor of order one needs to be negative, the leading principal minor of order two needs to be positive, and the leading principal minor of order three needs to be negative.
The LPM of order 1 is negative, as −(2/3)P·L^{−5/3}K^{1/3} < 0 (given that P > 0 and K > 0, L > 0).
The LPM of order two is the determinant of the matrix obtained by removing the third row and the third column:
$$\det H_2 = \det\begin{bmatrix} -\frac23 PL^{-\frac53}K^{\frac13} & \frac13 PL^{-\frac23}K^{-\frac23} \\ \frac13 PL^{-\frac23}K^{-\frac23} & -\frac23 PL^{\frac13}K^{-\frac53} \end{bmatrix} = \frac49 P^2L^{-\frac43}K^{-\frac43} - \frac19 P^2L^{-\frac43}K^{-\frac43} = \frac13 P^2L^{-\frac43}K^{-\frac43} > 0.$$
The LPM of order three is the determinant of the Hessian matrix. We compute the determinant using the third row to get
$$\det H = -\frac{P}{R^2}\left[\frac49 P^2L^{-\frac43}K^{-\frac43} - \frac19 P^2L^{-\frac43}K^{-\frac43}\right] = -\frac13\,\frac{P^3}{R^2}\,L^{-\frac43}K^{-\frac43} < 0.$$
Hence the SOC is satisfied.

(c) To solve for the optimal levels of L and K, we divide the first FOC by the second (note a = b = 1/3) to get
$$\frac{MP_L}{MP_K} = \frac{P\cdot L^{-\frac23}K^{\frac13}}{P\cdot L^{\frac13}K^{-\frac23}} = \frac{K}{L} = \frac{w}{r} \;\Rightarrow\; K = \frac{w}{r}\,L.$$
Observe that the ratio of MP_L and MP_K is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of K can be substituted into the first FOC to get the expression for L:
$$P\cdot L^{-\frac23}K^{\frac13} = w \;\Rightarrow\; P\cdot L^{-\frac23}\left(\frac{w}{r}L\right)^{\frac13} = w \;\Rightarrow\; P\left(\frac{w}{r}\right)^{\frac13} = w\,L^{\frac13} \;\Rightarrow\; L^{\frac13} = P\left(\frac{1}{rw^2}\right)^{\frac13} \;\Rightarrow\; L^* = \frac{P^3}{rw^2}.$$
Taking the derivative of L* with respect to r, we obtain
$$\frac{dL^*}{dr} = -\frac{P^3}{r^2w^2}.$$
dr r w

(d) (i) We first totally differentiate the FOCs to get
$$dP\cdot L^{-\frac23}K^{\frac13} - \frac23 PL^{-\frac53}K^{\frac13}\,dL + \frac13 PL^{-\frac23}K^{-\frac23}\,dK = dw,$$
$$dP\cdot L^{\frac13}K^{-\frac23} + \frac13 PL^{-\frac23}K^{-\frac23}\,dL - \frac23 PL^{\frac13}K^{-\frac53}\,dK = dr,$$
$$\frac{dP}{R} - \frac{P}{R^2}\,dR = dv.$$
We can write this in matrix form as Aq = b, where
$$A = \begin{bmatrix} -\frac23 PL^{-\frac53}K^{\frac13} & \frac13 PL^{-\frac23}K^{-\frac23} & 0 \\ \frac13 PL^{-\frac23}K^{-\frac23} & -\frac23 PL^{\frac13}K^{-\frac53} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}, \qquad q = \begin{bmatrix} dL \\ dK \\ dR \end{bmatrix}, \qquad b = \begin{bmatrix} dw - dP\cdot L^{-\frac23}K^{\frac13} \\ dr - dP\cdot L^{\frac13}K^{-\frac23} \\ dv - \frac{dP}{R} \end{bmatrix}.$$
Note that the matrix A is the same as the Hessian. Solving for dL, when dP = dw = dv = 0 and dr ≠ 0, using Cramer's Rule, we get
$$dL = \frac{\det\begin{bmatrix} 0 & \frac13 PL^{-\frac23}K^{-\frac23} & 0 \\ dr & -\frac23 PL^{\frac13}K^{-\frac53} & 0 \\ 0 & 0 & -\frac{P}{R^2} \end{bmatrix}}{\det H} = \frac{\left(-\frac{P}{R^2}\right)(-dr)\,\frac13 PL^{-\frac23}K^{-\frac23}}{-\frac13\,\frac{P^3}{R^2}\,L^{-\frac43}K^{-\frac43}} = -\frac{L^{\frac23}K^{\frac23}}{P}\,dr < 0,$$
so that
$$\frac{dL^*}{dr} = -\frac{L^{\frac23}K^{\frac23}}{P} < 0.$$
Thus, L* decreases as r increases. To see that this is identical to the expression obtained in the previous part, observe
$$K^* = \frac{P^3}{r^2w}; \qquad L^*\cdot K^* = \frac{P^6}{r^3w^3}; \qquad (L^*\cdot K^*)^{\frac23} = \frac{P^4}{r^2w^2};$$
$$\frac{dL^*}{dr} = -\frac{(L^*K^*)^{\frac23}}{P} = -\frac{P^3}{r^2w^2}.$$

(ii) Solving for dL, when dP = dw = dr = 0 and dv ≠ 0, using Cramer's Rule, we get
$$dL = \frac{\det\begin{bmatrix} 0 & \frac13 PL^{-\frac23}K^{-\frac23} & 0 \\ 0 & -\frac23 PL^{\frac13}K^{-\frac53} & 0 \\ dv & 0 & -\frac{P}{R^2} \end{bmatrix}}{\det H} = \frac{0}{-\frac13\,\frac{P^3}{R^2}\,L^{-\frac43}K^{-\frac43}} = 0,$$
so that
$$\frac{dL^*}{dv} = 0.$$
Since L* does not depend on v, this conclusion is obvious.

Chapter 29

Solution to PS 8

1. (a) The constraint set C can be rewritten as
$$C = \left\{(x_1, x_2, x_3) \in \mathbb{R}^3 : d^2\left((0,0,0), (x_1,x_2,x_3)\right) = 1\right\},$$
therefore C is
(i) bounded, since C ⊂ B((0,0,0), 2): indeed, x ∈ C ⇒ d(x,0) = 1 < 2 ⇒ x ∈ B(0,2);
(ii) closed in R³, since it is defined as a level set in R³ of the polynomial, and therefore continuous, function ∑³ᵢ₌₁ xᵢ² (use the characterization of closed sets in terms of convergent sequences);
(iii) non-empty, since (1,0,0) ∈ C.
Since the objective function ∑³ᵢ₌₁ cᵢxᵢ is linear, and therefore continuous on R³, the Weierstrass theorem is applicable and yields x̄ ∈ C such that ∑³ᵢ₌₁ cᵢxᵢ ≤ ∑³ᵢ₌₁ cᵢx̄ᵢ for any (x₁,x₂,x₃) ∈ C.

(b) The optimization problem can be rewritten as
$$\max f(x) \quad\text{subject to}\quad g(x) = 0, \; x \in \mathbb{R}^3, \tag{29.1}$$
where
$$f(x) = \sum_{i=1}^3 c_i x_i \qquad\text{and}\qquad g(x) = \sum_{i=1}^3 x_i^2 - 1.$$
Both functions f and g are polynomial and therefore continuously differentiable on the open set R³. Since x̄ is a point of global maximum of f subject to the constraint g(x) = 0, it is also a local maximum of f subject to the constraint g(x) = 0. Since g(0) = −1 ≠ 0 we have


x̄ ≠ 0. Now
$$\nabla g(x) = 2(x_1, x_2, x_3)' \ne 0 \quad\text{for } x \ne 0,$$
and x̄ ≠ 0; hence the constraint qualification ∇g(x̄) ≠ 0 holds. Therefore, by Lagrange's theorem there exists λ ∈ R such that ∇f(x̄) = λ∇g(x̄), or
$$(c_1, c_2, c_3)' = 2\lambda\,(\bar{x}_1, \bar{x}_2, \bar{x}_3)'. \tag{29.2}$$
If we premultiply (29.2) by the row vector (x̄₁, x̄₂, x̄₃), we will get
$$\sum_{i=1}^3 c_i\bar{x}_i = 2\lambda\sum_{i=1}^3 \bar{x}_i^2 = 2\lambda\left(g(\bar{x}) + 1\right) = 2\lambda(0+1) = 2\lambda. \tag{29.3}$$
If we premultiply (29.2) by the row vector (c₁, c₂, c₃), equation (29.3) yields
$$\|c\|^2 = \sum_{i=1}^3 c_i^2 = 2\lambda\sum_{i=1}^3 c_i\bar{x}_i = \left(\sum_{i=1}^3 c_i\bar{x}_i\right)^2. \tag{29.4}$$

To conclude that the result holds, we only need to show that ∑³ᵢ₌₁ cᵢx̄ᵢ ≥ 0. Indeed, since (c₁,c₂,c₃) ≠ (0,0,0), we have cᵢ ≠ 0 for some i. Since g(eᵢ·|cᵢ|/cᵢ) = 0 and x̄ solves (29.1), by definition of the solution to the constrained maximization problem
$$\sum_{i=1}^3 c_i\bar{x}_i = f(\bar{x}) \ge f\left(e_i\frac{|c_i|}{c_i}\right) = |c_i| > 0.$$
Now taking square roots in (29.4) yields the result.

(c) Let us define c = p, and consider x̂ = q/‖q‖. Then ‖x̂‖ = 1, hence g(x̂) = 0 and the definition of the solution of the constrained maximization problem yields
$$\|p\| = \|c\| = \sum_{i=1}^3 c_i\bar{x}_i = f(\bar{x}) \ge f(\hat{x}) = \sum_{i=1}^3 c_i\hat{x}_i = \frac{1}{\|q\|}\sum_{i=1}^3 c_iq_i = \frac{1}{\|q\|}\sum_{i=1}^3 p_iq_i = \frac{pq}{\|q\|}.$$
Analogously, for x̌ = −q/‖q‖ we have ‖x̌‖ = 1, hence g(x̌) = 0 and the definition of the solution of the constrained maximization problem yields
$$\|p\| = \|c\| = \sum_{i=1}^3 c_i\bar{x}_i = f(\bar{x}) \ge f(\check{x}) = -\frac{1}{\|q\|}\sum_{i=1}^3 c_iq_i = -\frac{1}{\|q\|}\sum_{i=1}^3 p_iq_i = -\frac{pq}{\|q\|}.$$

Therefore, since $\|q\| > 0$, we have
$$-\|p\|\,\|q\| \leq p \cdot q \leq \|p\|\,\|q\| \iff |p \cdot q| \leq \|p\|\,\|q\|.$$
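The bound just derived can be sanity-checked numerically. The sketch below is illustrative and not part of the original solution; the coefficient vector `c` is an arbitrary example. It confirms that random points on the unit sphere never push $\sum_i c_i x_i$ above $\|c\|$:

```python
# Numerical sanity check: maximizing f(x) = sum(c_i x_i) over the unit
# sphere attains ||c||, so random feasible points stay below that bound.
import math
import random

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(0)
c = (3.0, -4.0, 12.0)      # example coefficient vector, ||c|| = 13
best = -math.inf
for _ in range(100_000):
    # draw a random direction and normalize it onto the unit sphere
    x = [random.gauss(0, 1) for _ in range(3)]
    n = norm(x)
    x = [xi / n for xi in x]
    best = max(best, dot(c, x))

assert best <= norm(c) + 1e-9   # never exceeds the analytic maximum ||c||
print(round(norm(c), 3), round(best, 3))
```

The random search approaches $\|c\| = 13$ from below, consistent with the constrained maximum found via the Lagrangian.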

2. Necessity Route: The function $f(x,y) = x^2 - 3xy$ is continuous, and the constraint set
$$G = \left\{(x,y) \in \mathbb{R}^2_+ \mid x + 2y = 10\right\}$$
is non-empty (it contains $(10,0)$), closed (as the set is defined by weak inequalities and an equality, which are preserved in the limit) and bounded, since $\|(x,y)\| \leq \sqrt{10^2 + 5^2} = \sqrt{125}$ for every $(x,y) \in G$. So the constraint set is compact and non-empty and the objective function $f$ is continuous; hence the Weierstrass theorem is applicable and a solution exists. The Lagrangian and the FOCs are

$$\mathcal{L}(x, y, \lambda) = x^2 - 3xy + \lambda(2y + x - 10) \tag{29.5}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial x} = 2x - 3y + \lambda = 0 \tag{29.6}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial y} = -3x + 2\lambda = 0 \;\rightarrow\; \lambda = \frac{3}{2}x \tag{29.7}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial \lambda} = 2y + x - 10 = 0. \tag{29.8}$$

Now
$$2x - 3y + \lambda = 2x - 3y + \frac{3}{2}x = 0 \;\rightarrow\; \frac{7}{2}x = 3y \;\rightarrow\; y = \frac{7}{6}x$$
$$2y + x - 10 = 0 \;\rightarrow\; \frac{7}{3}x + x - 10 = 0 \;\rightarrow\; \frac{10}{3}x = 10 \;\rightarrow\; x = 3$$
$$y = \frac{7}{6}\cdot 3 = \frac{7}{2}, \qquad \lambda = \frac{9}{2}.$$
We get an interior candidate for a solution,
$$m_1 = \left(3, \frac{7}{2}, \frac{9}{2}\right).$$

The constraint qualification
$$\nabla g(x^*, y^*) = \begin{bmatrix} 1 & 2 \end{bmatrix} \neq 0$$
holds for all $(x,y) \in \mathbb{R}^2_+$. Verify that
$$f(10,0) = 100, \qquad f(0,5) = 0, \qquad f\left(3, \frac{7}{2}\right) = -\frac{45}{2}.$$
The solution then is $(x^*, y^*) = (10, 0)$. Note that we cannot use the sufficiency route since $f$ is not concave.
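A quick numerical check of this answer (an illustrative sketch, not part of the original solution): along the constraint $x + 2y = 10$ with $x, y \geq 0$, the corner $(10,0)$ should dominate the interior stationary point $(3, 7/2)$.

```python
# Compare f(x, y) = x^2 - 3xy at the corner (10, 0), at the interior
# stationary point (3, 7/2), and over a grid along x + 2y = 10.
def f(x, y):
    return x * x - 3 * x * y

corner_value = f(10, 0)        # = 100
interior_value = f(3, 3.5)     # = 9 - 31.5 = -22.5

grid_best = max(
    f(x, (10 - x) / 2)         # y chosen so that x + 2y = 10
    for x in [i / 1000 for i in range(0, 10_001)]
)

assert grid_best <= corner_value + 1e-9
print(corner_value, interior_value, grid_best)
```

The grid maximum coincides with the corner value 100, confirming that the interior Lagrangian candidate is not the maximizer here.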

3. Necessity Route: A solution exists by arguments similar to the earlier problem. The Lagrangian and the FOCs are
$$\mathcal{L}(x, y, \lambda) = x^{\frac{1}{3}} y^{\frac{2}{3}} + \lambda(4 - 2x - y) \tag{29.9}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial x} = \frac{1}{3}x^{-\frac{2}{3}}y^{\frac{2}{3}} - 2\lambda = 0 \tag{29.10}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial y} = \frac{2}{3}x^{\frac{1}{3}}y^{-\frac{1}{3}} - \lambda = 0 \tag{29.11}$$
$$\frac{\partial \mathcal{L}(x,y,\lambda)}{\partial \lambda} = 4 - 2x - y = 0. \tag{29.12}$$


Now
$$\frac{\frac{1}{3}x^{-\frac{2}{3}}y^{\frac{2}{3}}}{\frac{2}{3}x^{\frac{1}{3}}y^{-\frac{1}{3}}} = \frac{2\lambda}{\lambda} \;\rightarrow\; \frac{y}{2x} = 2 \;\rightarrow\; y = 4x$$
$$4 - 2x - y = 4 - 2x - 4x = 0 \;\rightarrow\; x = \frac{2}{3},\; y = \frac{8}{3},\; \lambda = \frac{2}{3}\left(\frac{1}{4}\right)^{\frac{1}{3}}.$$
We get an interior candidate for a solution,
$$m_1 = \left(\frac{2}{3}, \frac{8}{3}, \frac{2}{3}\left(\frac{1}{4}\right)^{\frac{1}{3}}\right).$$

The constraint qualification
$$\nabla g(x^*, y^*) = \begin{bmatrix} -2 & -1 \end{bmatrix} \neq 0$$
holds for all $(x,y) \in \mathbb{R}^2_+$. Verify that
$$f(2,0) = f(0,4) = 0, \qquad f\left(\frac{2}{3}, \frac{8}{3}\right) = \left(\frac{2}{3}\right)^{\frac{1}{3}}\left(\frac{8}{3}\right)^{\frac{2}{3}} > 0.$$
The solution then is $(x^*, y^*) = \left(\frac{2}{3}, \frac{8}{3}\right)$.

Sufficiency route:
$$\nabla f(x,y) = \begin{bmatrix} \frac{1}{3}x^{-\frac{2}{3}}y^{\frac{2}{3}} & \frac{2}{3}x^{\frac{1}{3}}y^{-\frac{1}{3}} \end{bmatrix}$$
$$H_f(x,y) = \begin{bmatrix} -\frac{2}{9}x^{-\frac{5}{3}}y^{\frac{2}{3}} & \frac{2}{9}x^{-\frac{2}{3}}y^{-\frac{1}{3}} \\[2pt] \frac{2}{9}x^{-\frac{2}{3}}y^{-\frac{1}{3}} & -\frac{2}{9}x^{\frac{1}{3}}y^{-\frac{4}{3}} \end{bmatrix}.$$
The principal minors of order one,
$$-\frac{2}{9}x^{-\frac{5}{3}}y^{\frac{2}{3}} \leq 0, \qquad -\frac{2}{9}x^{\frac{1}{3}}y^{-\frac{4}{3}} \leq 0,$$
and the principal minor of order two,
$$\det H_f(x,y) = \frac{4}{81}x^{-\frac{4}{3}}y^{-\frac{2}{3}} - \frac{4}{81}x^{-\frac{4}{3}}y^{-\frac{2}{3}} = 0 \geq 0,$$
for all $(x,y) \in \mathbb{R}^2_{++}$. Hence $f$ is concave. The constraint is linear and so concave, and $\lambda^* > 0$, so $\mathcal{L}(x,y,\lambda)$ is concave and the FOCs are sufficient for a maximum. Therefore $(x^*, y^*) = \left(\frac{2}{3}, \frac{8}{3}\right)$, which satisfies the FOCs, is the solution.
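As with the previous problem, the candidate can be verified numerically. The following sketch (illustrative, not part of the original solution) searches along the constraint $2x + y = 4$ and confirms nothing beats $(2/3, 8/3)$:

```python
# Maximize f(x, y) = x^(1/3) y^(2/3) subject to 2x + y = 4, x, y >= 0.
def f(x, y):
    return x ** (1 / 3) * y ** (2 / 3)

x_star, y_star = 2 / 3, 8 / 3
candidate = f(x_star, y_star)

grid_best = max(
    f(x, 4 - 2 * x)                              # y chosen so that 2x + y = 4
    for x in [i / 1000 for i in range(0, 2001)]  # x in [0, 2]
)

assert abs(2 * x_star + y_star - 4) < 1e-12   # candidate is feasible
assert grid_best <= candidate + 1e-9          # no grid point beats it
print(round(candidate, 6), round(grid_best, 6))
```

The grid maximum matches the candidate value $(2/3)^{1/3}(8/3)^{2/3} = (128/27)^{1/3}$ to within grid resolution.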

4. Let $f : \mathbb{R}^2 \to \mathbb{R}$,
$$\max f(x,y) = \sqrt{xy} \quad \text{subject to } x + y \leq 6,\; x \geq 0,\; y \geq 0. \tag{29.13}$$
This problem has inequality constraints and so we will use the Kuhn-Tucker Sufficiency theorem. We need to check that all conditions of the theorem are satisfied.
(i) Let
$$X = \mathbb{R}^2_{++} = \left\{(x,y) \in \mathbb{R}^2 \mid x > 0,\; y > 0\right\}.$$
Then $X$ is open as its complement
$$X^C = \left\{(x,y) \in \mathbb{R}^2 \mid x \leq 0 \text{ or } y \leq 0\right\}$$
is closed.

(ii) The function $f(x,y) = \sqrt{xy}$ is continuous, as $x$ and $y$ are continuous and $f(\cdot)$ is obtained from their product. The functions $g^1(x,y) = 6 - x - y$, $g^2(x,y) = x$, $g^3(x,y) = y$ are linear and hence continuous. Further, $f_x(x,y) = \frac{1}{2}\sqrt{\frac{y}{x}}$ and $f_y(x,y) = \frac{1}{2}\sqrt{\frac{x}{y}}$ are continuous on $X$. Hence $f$ and $g^j$ ($j = 1, \ldots, 3$) are continuously differentiable on $X$.
(iii) The set $X$ is convex: if $(x_1, y_1), (x_2, y_2) \in X$, then
$$x_1 > 0,\; x_2 > 0 \;\rightarrow\; \lambda x_1 + (1-\lambda)x_2 > 0 \quad \forall \lambda \in (0,1)$$
$$y_1 > 0,\; y_2 > 0 \;\rightarrow\; \lambda y_1 + (1-\lambda)y_2 > 0 \quad \forall \lambda \in (0,1)$$
$$\rightarrow\; \left(\lambda x_1 + (1-\lambda)x_2,\; \lambda y_1 + (1-\lambda)y_2\right) \in X.$$

(iv) The function $f(x,y)$ is concave, as
$$\nabla f(x,y) = \begin{bmatrix} \frac{1}{2}\sqrt{\frac{y}{x}} & \frac{1}{2}\sqrt{\frac{x}{y}} \end{bmatrix}, \qquad H_f(x,y) = \begin{bmatrix} -\frac{1}{4}\sqrt{\frac{y}{x^3}} & \frac{1}{4\sqrt{xy}} \\[2pt] \frac{1}{4\sqrt{xy}} & -\frac{1}{4}\sqrt{\frac{x}{y^3}} \end{bmatrix}.$$
The principal minors of order one,
$$-\frac{1}{4}\sqrt{\frac{y}{x^3}} \leq 0, \qquad -\frac{1}{4}\sqrt{\frac{x}{y^3}} \leq 0,$$
and the principal minor of order two,
$$\det H_f(x,y) = \frac{1}{16xy} - \frac{1}{16xy} = 0 \geq 0,$$
for all $(x,y) \in X$. Hence $f$ is concave. Further, the $g^j$ ($j = 1, \ldots, 3$) are concave, being linear functions.

Hence for the following problem,
$$\max_{(x,y) \in X} f(x,y) = \sqrt{xy} \quad \text{subject to } x + y \leq 6,\; x \geq 0,\; y \geq 0,$$
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*), \lambda^*) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions:
$$\text{(i)}\quad D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* D_i g^j(x^*) = 0, \quad i = 1, \ldots, n,$$
$$\text{(ii)}\quad g(x^*) \geq 0 \text{ and } \lambda^* \cdot g(x^*) = 0.$$

They are
$$\frac{1}{2}\sqrt{\frac{y}{x}} - \lambda_1 + \lambda_2 = 0 \tag{29.14}$$
$$\frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \tag{29.15}$$
$$6 - x - y \geq 0, \quad \lambda_1(6 - x - y) = 0 \tag{29.16}$$
$$x \geq 0,\; \lambda_2 x = 0; \qquad y \geq 0,\; \lambda_3 y = 0. \tag{29.17}$$

If $\lambda_1 = 0$, then $\frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \rightarrow \lambda_3 = -\frac{1}{2}\sqrt{\frac{x}{y}} < 0$, which contradicts $\lambda_3 \geq 0$. Hence
$$\lambda_1 > 0 \;\rightarrow\; 6 - x - y = 0.$$
Since $x > 0$ and $y > 0$, we have $\lambda_2 = 0$, $\lambda_3 = 0$, and
$$\frac{1}{2}\sqrt{\frac{y}{x}} - \lambda_1 + \lambda_2 = \frac{1}{2}\sqrt{\frac{x}{y}} - \lambda_1 + \lambda_3 = 0 \;\rightarrow\; \frac{1}{2}\sqrt{\frac{x}{y}} = \lambda_1 = \frac{1}{2}\sqrt{\frac{y}{x}}$$
$$\rightarrow\; x = y \;\rightarrow\; 6 - x - y = 0 \;\rightarrow\; x = y = 3 > 0.$$
Note that all conditions are satisfied. Hence it is a global maximum on $X$. Observe that it is also a global maximum on $\mathbb{R}^2_+$, as
$$f(x,y) = 0 \text{ for } (x,y) \in \mathbb{R}^2_+ \setminus X$$
and $f(3,3) > 0$. Hence $(3,3)$ solves the optimization problem.
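The Kuhn-Tucker solution above can also be checked by brute force. This sketch (illustrative, not part of the original solution) scans the feasible region on a grid:

```python
# Maximize sqrt(x*y) subject to x + y <= 6, x, y >= 0; the analysis
# above gives (x*, y*) = (3, 3) with value 3.
import math

def f(x, y):
    return math.sqrt(x * y)

candidate = f(3, 3)   # = 3

grid_best = max(
    f(x, y)
    for x in [i / 100 for i in range(0, 601)]
    for y in [j / 100 for j in range(0, 601)]
    if x + y <= 6
)

assert grid_best <= candidate + 1e-9
print(candidate, grid_best)
```

No feasible grid point exceeds $f(3,3) = 3$, in line with the AM-GM intuition that $\sqrt{xy}$ is maximized at the symmetric point of the budget line.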

5. Let $f : \mathbb{R}^2 \to \mathbb{R}$,
$$\max f(x,y) = x + \ln(1+y) \quad \text{subject to } x \geq 0,\; y \geq 0 \text{ and } x + py \leq m. \tag{29.18}$$
Again we will use the Kuhn-Tucker Sufficiency theorem. We need to check that all conditions of the theorem are satisfied.
(i) Let
$$X = \left\{(x,y) \in \mathbb{R}^2 \mid x > -1,\; y > -1\right\}.$$
Then $X$ is open as its complement
$$X^C = \left\{(x,y) \in \mathbb{R}^2 \mid x \leq -1 \text{ or } y \leq -1\right\}$$
is closed.


(ii) The function $f(x,y)$ is continuous, as $x$ and $\ln(1+y)$ (for $y > -1$) are continuous and $f(\cdot)$ is the sum of two continuous functions. The functions $g^1(x,y) = m - x - py$, $g^2(x,y) = x$, $g^3(x,y) = y$ are linear and hence continuous. Further, $f_x(x,y) = 1$ and $f_y(x,y) = \frac{1}{1+y}$ are continuous. Hence $f$ and $g^j$ ($j = 1, \ldots, 3$) are continuously differentiable on $X$.
(iii) The set $X$ is convex: if $(x_1, y_1), (x_2, y_2) \in X$, then
$$x_1 > -1,\; x_2 > -1 \;\rightarrow\; \lambda x_1 + (1-\lambda)x_2 > -1 \quad \forall \lambda \in (0,1)$$
$$y_1 > -1,\; y_2 > -1 \;\rightarrow\; \lambda y_1 + (1-\lambda)y_2 > -1 \quad \forall \lambda \in (0,1)$$
$$\rightarrow\; \left(\lambda x_1 + (1-\lambda)x_2,\; \lambda y_1 + (1-\lambda)y_2\right) \in X.$$

(iv) The function $f(x,y)$ is concave, as
$$\nabla f(x,y) = \begin{bmatrix} 1 & \frac{1}{1+y} \end{bmatrix}, \qquad H_f(x,y) = \begin{bmatrix} 0 & 0 \\ 0 & -\frac{1}{(1+y)^2} \end{bmatrix}.$$
The principal minors of order one,
$$0 \leq 0, \qquad -\frac{1}{(1+y)^2} \leq 0,$$
and the principal minor of order two,
$$\det H_f(x,y) = 0 \geq 0,$$
for all $(x,y) \in X$. Hence $f$ is concave. The $g^j$ ($j = 1, \ldots, 3$) are concave, being linear functions.

Hence for the following problem,
$$\max_{(x,y) \in X} f(x,y) = x + \ln(1+y) \quad \text{subject to } x + py \leq m,\; x \geq 0,\; y \geq 0,$$
all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*), \lambda^*) \in X \times \mathbb{R}^3_+$ that satisfies the Kuhn-Tucker conditions:
$$\text{(i)}\quad D_i f(x^*) + \sum_{j=1}^{m} \lambda_j^* D_i g^j(x^*) = 0, \quad i = 1, \ldots, n, \text{ and}$$
$$\text{(ii)}\quad g(x^*) \geq 0 \text{ and } \lambda^* \cdot g(x^*) = 0.$$

They are
$$1 - \lambda_1 + \lambda_2 = 0 \tag{29.19}$$
$$\frac{1}{1+y} - p\lambda_1 + \lambda_3 = 0 \tag{29.20}$$
$$m - x - py \geq 0, \quad \lambda_1(m - x - py) = 0 \tag{29.21}$$
$$x \geq 0,\; \lambda_2 x = 0; \qquad y \geq 0,\; \lambda_3 y = 0. \tag{29.22}$$


If $\lambda_1 = 0$, then $1 - \lambda_1 + \lambda_2 = 0 \rightarrow \lambda_2 = -1 < 0$, which contradicts $\lambda_2 \geq 0$. Hence
$$\lambda_1 > 0 \;\rightarrow\; m - x - py = 0,$$
and $x = y = 0$ is ruled out because $m > 0$. There are three remaining cases.

(i) $x > 0$, $y = 0$. Note $\lambda_2 = 0$, $x = m$, and
$$1 = \lambda_1, \qquad 1 - p + \lambda_3 = 0 \;\rightarrow\; \lambda_3 = p - 1.$$
If $p - 1 \geq 0$, then $\lambda_3 \geq 0$. So the solution is $(m, 0, 1, 0, p-1)$ if $p \geq 1$.
(ii) $x = 0$, $y > 0$. Note $\lambda_3 = 0$, $y = \frac{m}{p}$, and
$$\frac{1}{p\left(1 + \frac{m}{p}\right)} = \frac{1}{p+m} = \lambda_1$$
$$1 - \lambda_1 + \lambda_2 = 0 \;\rightarrow\; 1 - \frac{1}{p+m} + \lambda_2 = 0 \;\rightarrow\; \lambda_2 = \frac{1}{p+m} - 1.$$
If $\frac{1}{p+m} - 1 \geq 0$, i.e. $p + m \leq 1$, then $\lambda_2 \geq 0$. So the solution is $\left(0, \frac{m}{p}, \frac{1}{p+m}, \frac{1}{p+m} - 1, 0\right)$ if $p \leq 1 - m$.

(iii) $x > 0$, $y > 0$. Note $\lambda_2 = 0$, $\lambda_3 = 0$, and
$$1 = \lambda_1, \qquad \frac{1}{1+y} = p \;\rightarrow\; y = \frac{1}{p} - 1 > 0 \tag{29.23}$$
$$m - x - py = 0 \;\rightarrow\; x = m - 1 + p > 0. \tag{29.24}$$
Hence for $1 > p > 1 - m$, the solution is $\left(m - 1 + p, \frac{1}{p} - 1, 1, 0, 0\right)$. Combining them, the solution $\left(x^*, y^*, \lambda_1^*, \lambda_2^*, \lambda_3^*\right)$ is
$$(m, 0, 1, 0, p-1) \quad \text{if } p \geq 1,$$
$$\left(0, \frac{m}{p}, \frac{1}{p+m}, \frac{1}{p+m} - 1, 0\right) \quad \text{if } p \leq 1 - m, \text{ and}$$
$$\left(m - 1 + p, \frac{1}{p} - 1, 1, 0, 0\right) \quad \text{if } 1 - m < p < 1.$$

The Kuhn-Tucker Sufficiency Theorem asserts that this solution is a global maximum and therefore solves the original problem.
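The case analysis above can be checked numerically. The sketch below (illustrative, not part of the original solution; the specific $(p, m)$ pairs are chosen to hit one case each) compares the closed-form answer with a grid search along the budget line:

```python
# Check max x + ln(1+y) s.t. x, y >= 0, x + p*y <= m against the
# closed-form case analysis derived above.
import math

def solve(p, m):
    # closed-form solution from the three Kuhn-Tucker cases
    if p >= 1:
        return m, 0.0
    if p <= 1 - m:
        return 0.0, m / p
    return m - 1 + p, 1 / p - 1

def f(x, y):
    return x + math.log(1 + y)

checks = []
for p, m in [(1.5, 2.0), (0.2, 0.5), (0.8, 2.0)]:   # one (p, m) pair per case
    x_star, y_star = solve(p, m)
    # f is increasing in x and y, so search along the budget line x + p*y = m
    best = max(f(x, (m - x) / p) for x in [i * m / 2000 for i in range(2001)])
    assert x_star >= 0 and y_star >= 0 and x_star + p * y_star <= m + 1e-9
    assert f(x_star, y_star) >= best - 1e-9   # closed form beats the grid
    assert best >= f(x_star, y_star) - 1e-3   # and the grid gets close to it
    checks.append(True)
print(checks)
```

Each branch of the piecewise solution is feasible and dominates the corresponding grid search, as the sufficiency theorem guarantees.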


6. Recall the two optimization problems are
$$\max f(x) \quad \text{subject to } g(x) \geq 0 \text{ and } x \in X \tag{29.25}$$
and the corresponding optimization problem
$$\max f(x) \quad \text{subject to } x \in X, \tag{29.26}$$
in which the constraint $g(x) \geq 0$ has been omitted.

(a) We claim that $\bar{x}$ is also a solution to problem (29.25). For, if this is not the case, then since $\bar{x}$ is in the constraint set $\{x \in X : g(x) \geq 0\}$ of problem (29.25), there is some $x' \in X$, with $g(x') \geq 0$, such that $f(x') > f(\bar{x})$. But, since $x' \in X$ and is therefore in the constraint set of problem (29.26), this means that $\bar{x}$ is not a solution to problem (29.26), a contradiction. This establishes our claim. [Note that we are not given the information that problem (29.25) has a solution, and so we do not make use of this information in the answer.]

(b) Let $\hat{x}$ be any solution to problem (29.25). Note that since both $\hat{x}$ and $\bar{x}$ are in $X$, the constraint set of problem (29.26), and $\bar{x}$ solves problem (29.26), we have
$$f(\bar{x}) \geq f(\hat{x}). \tag{29.27}$$
We claim that $g(\hat{x}) = 0$. For if $g(\hat{x}) \neq 0$, we must have $g(\hat{x}) > 0$, since $\hat{x}$ is a solution to problem (29.25) and must therefore be in the constraint set $\{x \in X : g(x) \geq 0\}$ of problem (29.25).
Since $\bar{x}$ is not a solution to problem (29.25), and $\bar{x} \in X$, it must be the case that $g(\bar{x}) < 0$. For if $g(\bar{x}) \geq 0$, then, given (29.27), $\bar{x}$ would also solve problem (29.25).
Since $g(\bar{x}) < 0$ and $g(\hat{x}) > 0$, continuity of $g$ on the convex set $X$ [using the intermediate value theorem] implies that we can find $\lambda \in (0,1)$ such that
$$g(\lambda\hat{x} + (1-\lambda)\bar{x}) = 0. \tag{29.28}$$
Denote $\lambda\hat{x} + (1-\lambda)\bar{x}$ by $z$. Then $z \in X$ and $g(z) = 0$ by (29.28), so $z$ satisfies the constraints of problem (29.25).
Since $f$ is strictly quasi-concave on $X$, we can use $\hat{x} \neq \bar{x}$ [recall that $g(\bar{x}) < 0$ while $g(\hat{x}) > 0$] and $\lambda \in (0,1)$ to obtain
$$f(z) = f(\lambda\hat{x} + (1-\lambda)\bar{x}) > \min\{f(\hat{x}), f(\bar{x})\} = f(\hat{x}),$$
using (29.27). But this contradicts the fact that $\hat{x}$ solves (29.25), and establishes our claim.

7. Suppose that a consumer has the utility function $U(x,y) = x^a y^b$ and faces the budget constraint $p_x x + p_y y \leq I$.
(A) Utility Maximization


(a) What are the first order conditions for utility maximization?
Observe that the utility function makes sense only if $a > 0$ and $b > 0$. The Lagrangean for the optimization problem is
$$\mathcal{L}(x, y, \lambda) = U(x,y) + \lambda(I - p_x x - p_y y) = x^a y^b + \lambda(I - p_x x - p_y y).$$
The first order conditions are
$$\frac{\partial \mathcal{L}}{\partial x} = a x^{a-1} y^b - \lambda p_x = 0$$
$$\frac{\partial \mathcal{L}}{\partial y} = b x^a y^{b-1} - \lambda p_y = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I - p_x x - p_y y = 0.$$

(b) Solve for the consumer's demands for goods x and y.
From the first two FOCs, we get
$$a x^{a-1} y^b = \lambda p_x, \qquad b x^a y^{b-1} = \lambda p_y.$$
Dividing the first equation by the second, we get
$$\frac{a x^{a-1} y^b}{b x^a y^{b-1}} = \frac{\lambda p_x}{\lambda p_y} \;\rightarrow\; \frac{ay}{bx} = \frac{p_x}{p_y} \;\rightarrow\; p_y y = \frac{b}{a} p_x x.$$
We use this in the third FOC to get
$$p_x x + p_y y = I \;\rightarrow\; p_x x + \frac{b}{a} p_x x = I \;\rightarrow\; \frac{a+b}{a} p_x x = I$$
$$p_x x^* = \frac{a}{a+b} I \;\rightarrow\; x^* = \frac{a}{a+b}\frac{I}{p_x}.$$
This gives
$$p_y y^* = \frac{b}{a+b} I \;\rightarrow\; y^* = \frac{b}{a+b}\frac{I}{p_y}.$$
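The Cobb-Douglas demands just derived can be verified numerically. The sketch below is illustrative and not part of the original solution; the parameter values are arbitrary examples:

```python
# With U(x, y) = x^a y^b and budget px*x + py*y = I, the closed-form
# demands x* = aI/((a+b)px), y* = bI/((a+b)py) should beat a grid search
# along the budget line (utility is increasing, so all income is spent).
a, b = 0.3, 0.7
px, py, I = 2.0, 5.0, 100.0

def U(x, y):
    return x ** a * y ** b

x_star = a * I / ((a + b) * px)   # = 15
y_star = b * I / ((a + b) * py)   # = 14

grid_best = max(
    U(x, (I - px * x) / py)
    for x in [i * (I / px) / 2000 for i in range(1, 2000)]  # interior grid
)

assert abs(px * x_star + py * y_star - I) < 1e-9   # budget exhausted
assert U(x_star, y_star) >= grid_best - 1e-9
print(round(x_star, 3), round(y_star, 3))
```

Note the familiar Cobb-Douglas property visible here: the consumer spends the fixed fractions $a/(a+b)$ and $b/(a+b)$ of income on the two goods.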


(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an increasing, decreasing or constant function of income?
We use the first FOC (with respect to x) to get
$$a x^{a-1} y^b = \lambda p_x \;\rightarrow\; \lambda^* = \frac{a (x^*)^{a-1} (y^*)^b}{p_x} = \frac{a\left(\frac{a}{a+b}\frac{I}{p_x}\right)^{a-1}\left(\frac{b}{a+b}\frac{I}{p_y}\right)^b}{p_x} = \left(\frac{a}{p_x}\right)^a\left(\frac{b}{p_y}\right)^b\left(\frac{I}{a+b}\right)^{a+b-1} > 0.$$
The Lagrange multiplier $\lambda^*$ is the marginal utility of income, as we can see below:
$$\mathcal{L}^*(x^*, y^*, \lambda^*) = U(x^*, y^*) + \lambda^*(I - p_x x^* - p_y y^*) = (x^*)^a (y^*)^b + \lambda^* \cdot 0.$$
Suppose the income increases by a dollar. Then utility goes up by $\lambda^*$. Lastly, $\lambda^*$ is increasing in income if and only if $a + b > 1$.

(d) Show that the second order conditions hold.
Observe that the second order partial derivatives are
$$\frac{\partial^2 \mathcal{L}}{\partial x^2} = a(a-1)x^{a-2}y^b, \quad \frac{\partial^2 \mathcal{L}}{\partial x \partial y} = ab\, x^{a-1}y^{b-1}, \quad \frac{\partial^2 \mathcal{L}}{\partial y^2} = b(b-1)x^a y^{b-2},$$
$$\frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} = -p_x, \qquad \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} = -p_y.$$
Using these, we get the bordered Hessian matrix as under:
$$\bar{H} = \begin{bmatrix} \frac{\partial^2 \mathcal{L}}{\partial x^2} & \frac{\partial^2 \mathcal{L}}{\partial x \partial y} & \frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} \\[2pt] \frac{\partial^2 \mathcal{L}}{\partial x \partial y} & \frac{\partial^2 \mathcal{L}}{\partial y^2} & \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} \\[2pt] \frac{\partial^2 \mathcal{L}}{\partial x \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial y \partial \lambda} & \frac{\partial^2 \mathcal{L}}{\partial \lambda^2} \end{bmatrix} = \begin{bmatrix} a(a-1)x^{a-2}y^b & ab\, x^{a-1}y^{b-1} & -p_x \\ ab\, x^{a-1}y^{b-1} & b(b-1)x^a y^{b-2} & -p_y \\ -p_x & -p_y & 0 \end{bmatrix}.$$

The border preserving leading principal minor of order 2 is the bordered Hessian matrix itself. For the second order condition to be satisfied, the determinant of the bordered Hessian needs to be positive.

$$\det \bar{H} = (-p_x)\left[(-p_y)ab\, x^{a-1}y^{b-1} - (-p_x)b(b-1)x^a y^{b-2}\right] - (-p_y)\left[(-p_y)a(a-1)x^{a-2}y^b - (-p_x)ab\, x^{a-1}y^{b-1}\right]$$
$$= p_x\left[p_y ab\, x^{a-1}y^{b-1} - p_x b(b-1)x^a y^{b-2}\right] - p_y\left[p_y a(a-1)x^{a-2}y^b - p_x ab\, x^{a-1}y^{b-1}\right]$$
$$= 2 p_x p_y ab\, x^{a-1}y^{b-1} - p_x^2 b(b-1)x^a y^{b-2} - p_y^2 a(a-1)x^{a-2}y^b.$$
Evaluating at $(x^*, y^*)$ and factoring out $(x^*)^a (y^*)^b$,
$$= (x^*)^a (y^*)^b\left[\frac{2ab\, p_x p_y}{x^* y^*} - \frac{b(b-1)p_x^2}{(y^*)^2} - \frac{a(a-1)p_y^2}{(x^*)^2}\right]$$
$$= (x^*)^a (y^*)^b\left(2\left[\frac{(a+b)p_x p_y}{I}\right]^2 - \frac{b-1}{b}\left[\frac{(a+b)p_x p_y}{I}\right]^2 - \frac{a-1}{a}\left[\frac{(a+b)p_x p_y}{I}\right]^2\right)$$
$$= (x^*)^a (y^*)^b\left[\frac{(a+b)p_x p_y}{I}\right]^2\left(2 - \frac{b-1}{b} - \frac{a-1}{a}\right) = (x^*)^a (y^*)^b\left[\frac{(a+b)p_x p_y}{I}\right]^2\left(\frac{1}{a} + \frac{1}{b}\right) > 0.$$

(e) Show that the implicit function theorem value of $\frac{dx}{dI}$ is identical to the value obtained by taking the partial derivative of $x^*$ with respect to $I$.
Using $x^*$, we get
$$\frac{\partial x^*}{\partial I} = \frac{a}{a+b}\frac{1}{p_x}.$$


Using the implicit function theorem,
$$\frac{dx^*}{dI} = \frac{\det\begin{bmatrix} 0 & ab\, x^{a-1}y^{b-1} & -p_x \\ 0 & b(b-1)x^a y^{b-2} & -p_y \\ -1 & -p_y & 0 \end{bmatrix}}{\det \bar{H}} = \frac{-1\left[(-p_y)ab\, x^{a-1}y^{b-1} - (-p_x)b(b-1)x^a y^{b-2}\right]}{\det \bar{H}}$$
$$= \frac{p_y ab\, x^{a-1}y^{b-1} - p_x b(b-1)x^a y^{b-2}}{\det \bar{H}} = \frac{b x^{a-1} y^{b-2}\left[a p_y y - (b-1)p_x x\right]}{\det \bar{H}} = \frac{b x^{a-1} y^{b-2} \cdot x p_x}{\det \bar{H}} = \frac{b x^a y^{b-2} p_x}{\det \bar{H}},$$
where the bracket is evaluated at the demands: $a p_y y^* = \frac{abI}{a+b}$ and $(b-1)p_x x^* = \frac{(b-1)aI}{a+b}$, so $a p_y y^* - (b-1)p_x x^* = \frac{aI}{a+b} = p_x x^*$. Therefore
$$\frac{dx^*}{dI} = \frac{b p_x}{(y^*)^2\left[\frac{(a+b)p_x p_y}{I}\right]^2\frac{a+b}{ab}} = \frac{ab^2 p_x I^2}{(y^*)^2 (a+b)^3 p_x^2 p_y^2} = \frac{a}{(a+b)p_x} = \frac{1}{a+b}\frac{a}{p_x}.$$

Thus the two expressions are identical.

(f) A consumer's indirect utility function is defined to be utility as a function of prices and income. Use x* and y* to solve for the indirect utility function. Is it true that the partial of the indirect utility function with respect to income equals λ?
The indirect utility function is
$$u^* = u(x^*, y^*) = (x^*)^a (y^*)^b = \left(\frac{aI}{(a+b)p_x}\right)^a\left(\frac{bI}{(a+b)p_y}\right)^b = \left(\frac{a}{(a+b)p_x}\right)^a\left(\frac{b}{(a+b)p_y}\right)^b I^{a+b}.$$
Then,
$$\frac{\partial u^*}{\partial I} = (a+b)\left(\frac{a}{(a+b)p_x}\right)^a\left(\frac{b}{(a+b)p_y}\right)^b I^{a+b-1} = \left(\frac{a}{p_x}\right)^a\left(\frac{b}{p_y}\right)^b\left(\frac{I}{a+b}\right)^{a+b-1} = \lambda^*.$$
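This envelope property can be confirmed by a finite-difference check. The sketch below is illustrative, with arbitrary example parameter values:

```python
# Check that d v(I)/dI equals the closed-form multiplier lambda*, where
# v(I) = (a/((a+b)px))^a (b/((a+b)py))^b I^(a+b) is the indirect utility.
a, b = 0.5, 1.5
px, py, I = 2.0, 3.0, 60.0

def v(income):
    return ((a / ((a + b) * px)) ** a) * ((b / ((a + b) * py)) ** b) * income ** (a + b)

lam = (a / px) ** a * (b / py) ** b * (I / (a + b)) ** (a + b - 1)

h = 1e-5
numeric = (v(I + h) - v(I - h)) / (2 * h)   # central difference

assert abs(numeric - lam) < 1e-6 * max(1.0, abs(lam))
print(round(lam, 8), round(numeric, 8))
```

With $a + b = 2 > 1$ in this example, $\lambda^*$ grows with income, matching the statement in part (c).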

(B) Expenditure Minimization:
Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditures, $p_x x + p_y y$, subject to reaching a given level of utility, $u_0$ (the constraint is therefore $u_0 - x^a y^b = 0$).


(a) What are the first order conditions for expenditure minimization?
First, we write down the minimization problem as
$$\min p_x x + p_y y \quad \text{subject to } x^a y^b \geq u_0,$$
which can be converted into a maximization exercise as under:
$$\max -p_x x - p_y y \quad \text{subject to } x^a y^b \geq u_0.$$
The Lagrangean for the maximization problem is
$$\mathcal{L}(x, y, \lambda) = -p_x x - p_y y + \lambda(x^a y^b - u_0).$$
The first order conditions are
$$\frac{\partial \mathcal{L}}{\partial x} = -p_x + \lambda a x^{a-1} y^b = 0$$
$$\frac{\partial \mathcal{L}}{\partial y} = -p_y + \lambda b x^a y^{b-1} = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = x^a y^b - u_0 = 0.$$

(b) Use the first order conditions to solve for x* and y* (these are called the Hicksian or compensated demand functions).
From the first two FOCs, we get
$$\lambda a x^{a-1} y^b = p_x, \qquad \lambda b x^a y^{b-1} = p_y.$$
Dividing the first equation by the second, we get
$$\frac{\lambda a x^{a-1} y^b}{\lambda b x^a y^{b-1}} = \frac{p_x}{p_y} \;\rightarrow\; \frac{ay}{bx} = \frac{p_x}{p_y} \;\rightarrow\; y = \frac{b}{a}\frac{p_x}{p_y}x.$$
We use this in the third FOC to get
$$x^a y^b = u_0 \;\rightarrow\; x^a\left(\frac{b\, p_x}{a\, p_y}\right)^b x^b = u_0 \;\rightarrow\; x^{a+b} = \frac{u_0}{\left(\frac{b\, p_x}{a\, p_y}\right)^b} \;\rightarrow\; x^* = \left(\frac{a\, p_y}{b\, p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}}$$
$$y^* = \frac{b\, p_x}{a\, p_y}\, x^* = \frac{b\, p_x}{a\, p_y}\left(\frac{a\, p_y}{b\, p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}} = \left(\frac{b\, p_x}{a\, p_y}\right)^{\frac{a}{a+b}} u_0^{\frac{1}{a+b}}.$$


(c) Check the second order conditions.
It is easy to see that the bordered Hessian is the same as in the utility maximization exercise. Hence we conclude that the SOC holds in this case as well.
(d) Write the level of income, I, necessary to reach $u_0$ as a function of $u_0$, prices, and parameters. How does this expenditure function relate to the indirect utility function?

$$e(p_x, p_y, u_0) = p_x x^* + p_y y^* = p_x\left(\frac{a\, p_y}{b\, p_x}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}} + p_y\left(\frac{b\, p_x}{a\, p_y}\right)^{\frac{a}{a+b}} u_0^{\frac{1}{a+b}}$$
$$= \left(p_x^a\, p_y^b\, u_0\right)^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right].$$
Substituting for $u_0$ the indirect utility of income $I$,
$$= \left(p_x^a\, p_y^b\right)^{\frac{1}{a+b}}\left[\left(\frac{a}{(a+b)p_x}\right)^a\left(\frac{b}{(a+b)p_y}\right)^b I^{a+b}\right]^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right]$$
$$= \left[\left(\frac{a}{a+b}\right)^a\left(\frac{b}{a+b}\right)^b\right]^{\frac{1}{a+b}}\left[\left(\frac{a}{b}\right)^{\frac{b}{a+b}} + \left(\frac{b}{a}\right)^{\frac{a}{a+b}}\right] I = I.$$
This shows that the minimum expenditure required to attain utility equal to the indirect utility of income $I$ is exactly $I$. Thus the two approaches are equivalent.
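The duality identity $e(p_x, p_y, v(p_x, p_y, I)) = I$ can be checked numerically. The sketch below is illustrative, with arbitrary example parameters:

```python
# Plug the indirect utility v(I) into the expenditure function; the
# result should recover the original income I.
a, b = 0.4, 0.8
px, py, I = 3.0, 2.0, 120.0

v = (a / ((a + b) * px)) ** a * (b / ((a + b) * py)) ** b * I ** (a + b)

x_h = (a * py / (b * px)) ** (b / (a + b)) * v ** (1 / (a + b))   # Hicksian x
y_h = (b * px / (a * py)) ** (a / (a + b)) * v ** (1 / (a + b))   # Hicksian y
e = px * x_h + py * y_h

assert abs(e - I) < 1e-8
print(round(e, 6))
```

The expenditure function evaluated at the maximized utility returns the income exactly, as the algebra above shows.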

(e) To avoid confusion, let us call the solution for good x in utility maximization $x^*$ and the solution for good x in expenditure minimization $h^*$. Prove that
$$\frac{\partial x^*}{\partial p_x} = \frac{\partial h^*}{\partial p_x} - x^*\frac{\partial x^*}{\partial I}.$$
Interpret this answer.
Observe that we can rewrite $h^*$ as $h^* = \theta\,(p_x)^{-\frac{b}{a+b}}$, where $\theta \equiv \left(\frac{a\, p_y}{b}\right)^{\frac{b}{a+b}} u_0^{\frac{1}{a+b}}$. This gives us
$$\frac{\partial h^*}{\partial p_x} = \theta\left(-\frac{b}{a+b}\right)(p_x)^{-\frac{b}{a+b}-1} = -\frac{b}{a+b}\frac{h^*}{p_x}.$$
Also, from the utility maximization we get
$$\frac{\partial x^*}{\partial p_x} = -\frac{aI}{a+b}(p_x)^{-2} = \frac{-x^*}{p_x}$$
and
$$x^*\frac{\partial x^*}{\partial I} = x^*\left(\frac{a}{a+b}\right)(p_x)^{-1}.$$
Therefore,
$$\frac{\partial x^*}{\partial p_x} + x^*\frac{\partial x^*}{\partial I} = \frac{-x^*}{p_x} + \left(\frac{a}{a+b}\right)\frac{x^*}{p_x} = -\frac{b}{a+b}\frac{x^*}{p_x} = \frac{\partial h^*}{\partial p_x},$$
where the last equality uses $h^* = x^*$ at the optimum. The change in $x^*$ due to a change in its own price $p_x$ (the total effect) is the sum of the substitution effect $\left(\frac{\partial h^*}{\partial p_x}\right)$ and the income effect $\left(-x^*\frac{\partial x^*}{\partial I}\right)$.
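The Slutsky decomposition just proved can be verified by finite differences. This sketch is illustrative, with arbitrary example parameters, and evaluates the Hicksian demand at $u_0 = v(p_x, p_y, I)$ so that $h^* = x^*$:

```python
# Check: d x*/d px = d h*/d px - x* (d x*/dI) for Cobb-Douglas demands.
a, b = 0.4, 0.6
px, py, I = 2.0, 4.0, 100.0

def x_marshall(px_, I_):
    return a * I_ / ((a + b) * px_)

def v(px_, I_):
    # indirect utility
    x = x_marshall(px_, I_)
    y = b * I_ / ((a + b) * py)
    return x ** a * y ** b

def h(px_, u0):
    # Hicksian demand for x
    return (a * py / (b * px_)) ** (b / (a + b)) * u0 ** (1 / (a + b))

u0 = v(px, I)
eps = 1e-5
dx_dpx = (x_marshall(px + eps, I) - x_marshall(px - eps, I)) / (2 * eps)
dh_dpx = (h(px + eps, u0) - h(px - eps, u0)) / (2 * eps)
dx_dI = (x_marshall(px, I + eps) - x_marshall(px, I - eps)) / (2 * eps)

lhs = dx_dpx
rhs = dh_dpx - x_marshall(px, I) * dx_dI
assert abs(lhs - rhs) < 1e-5
print(round(lhs, 6), round(rhs, 6))
```

Both sides come out to $-x^*/p_x = -10$ in this example: a total effect of $-10$, split into a substitution effect of $-6$ and an income effect of $-4$.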

8. Suppose a consumer has the utility function $U = a\ln(x - x_0) + b\ln(y - y_0)$, where $a$, $b$, $x_0$ and $y_0$ are positive parameters. Assume that the usual budget constraint applies.
(a) Solve for the consumer's demand for good x.
Observe that the utility maximization exercise makes sense only if the consumption bundle $(x_0, y_0)$ is affordable. Let us denote $x - x_0$ by $x'$ and $y - y_0$ by $y'$. Then the utility function can be written as $U(x', y') = a\ln(x') + b\ln(y')$. The budget constraint $p_x x + p_y y = I$ can be written as $p_x x' + p_y y' = I - p_x x_0 - p_y y_0 = I'$. The utility maximization exercise can therefore be formulated as
$$\max a\ln(x') + b\ln(y') \quad \text{subject to } p_x x' + p_y y' = I'.$$

The Lagrangean for the optimization problem is
$$\mathcal{L}(x', y', \lambda) = a\ln(x') + b\ln(y') + \lambda(I' - p_x x' - p_y y').$$
The first order conditions are
$$\frac{\partial \mathcal{L}}{\partial x'} = \frac{a}{x'} - \lambda p_x = 0$$
$$\frac{\partial \mathcal{L}}{\partial y'} = \frac{b}{y'} - \lambda p_y = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = I' - p_x x' - p_y y' = 0.$$

From the first two FOCs, we get
$$\frac{a}{x'} = \lambda p_x; \qquad \frac{b}{y'} = \lambda p_y.$$
Dividing the first equation by the second, we get
$$\frac{a y'}{b x'} = \frac{p_x}{p_y}; \qquad p_y y' = \frac{b}{a} p_x x'.$$


We use this in the third FOC to get
$$p_x x' + p_y y' = I' \;\rightarrow\; p_x x' + \frac{b}{a} p_x x' = I' \;\rightarrow\; \frac{a+b}{a} p_x x' = I'$$
$$p_x x' = \frac{a}{a+b} I' \;\rightarrow\; x' = \frac{a}{a+b}\frac{I'}{p_x}.$$
This gives
$$p_y y' = \frac{b}{a+b} I' \;\rightarrow\; y' = \frac{b}{a+b}\frac{I'}{p_y}.$$

We need to show that the second order conditions hold for the solution to yield a maximum. Observe that the second order partial derivatives are
$$\frac{\partial^2 \mathcal{L}}{\partial x'^2} = -\frac{a}{(x')^2}; \quad \frac{\partial^2 \mathcal{L}}{\partial x' \partial y'} = 0; \quad \frac{\partial^2 \mathcal{L}}{\partial y'^2} = -\frac{b}{(y')^2}; \quad \frac{\partial^2 \mathcal{L}}{\partial x' \partial \lambda} = -p_x; \quad \frac{\partial^2 \mathcal{L}}{\partial y' \partial \lambda} = -p_y.$$
Using these, we get the bordered Hessian matrix as under:
$$\bar{H} = \begin{bmatrix} -\frac{a}{(x')^2} & 0 & -p_x \\ 0 & -\frac{b}{(y')^2} & -p_y \\ -p_x & -p_y & 0 \end{bmatrix}.$$
The border preserving leading principal minor of order 2 is the bordered Hessian matrix itself. For the second order condition to be satisfied, the determinant needs to be positive:
$$\det \bar{H} = (-p_x)\left[-(-p_x)\left(-\frac{b}{(y')^2}\right)\right] - (-p_y)\left[(-p_y)\left(-\frac{a}{(x')^2}\right)\right] = \frac{b\, p_x^2}{(y')^2} + \frac{a\, p_y^2}{(x')^2} > 0.$$
Thus the SOC holds and we have a maximum. The optimum consumption bundle is
$$x^* = x' + x_0 = \frac{a}{a+b}\frac{I - p_x x_0 - p_y y_0}{p_x} + x_0 = \frac{a}{a+b}\frac{I - p_y y_0}{p_x} + \frac{b}{a+b}x_0$$
$$y^* = \frac{b}{a+b}\frac{I - p_x x_0}{p_y} + \frac{a}{a+b}y_0.$$
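This Stone-Geary-style demand can be checked numerically. The sketch below is illustrative, with arbitrary example parameters, and searches the budget line for anything beating the closed form:

```python
# Check the demand for U = a ln(x - x0) + b ln(y - y0) under px*x + py*y = I.
import math

a, b = 1.0, 2.0
x0, y0 = 2.0, 1.0
px, py, I = 1.0, 2.0, 20.0

def U(x, y):
    return a * math.log(x - x0) + b * math.log(y - y0)

I_prime = I - px * x0 - py * y0              # supernumerary income = 16
x_star = a / (a + b) * I_prime / px + x0     # = 16/3 + 2
y_star = b / (a + b) * I_prime / py + y0     # = 16/3 + 1

grid_best = max(
    U(x, (I - px * x) / py)                  # spend all income on the line
    for x in [x0 + i * 0.001 for i in range(1, 16000)]
    if (I - px * x) / py > y0
)

assert abs(px * x_star + py * y_star - I) < 1e-9
assert U(x_star, y_star) >= grid_best - 1e-9
print(round(x_star, 4), round(y_star, 4))
```

The structure is visible in the numbers: the consumer first buys the subsistence bundle $(x_0, y_0)$, then splits the remaining income $I'$ in the Cobb-Douglas proportions $a/(a+b)$ and $b/(a+b)$.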

(b) Find the elasticities of demand for good x with respect to income and prices.
It is easy to compute the price and income elasticities using the definitions. Please let me know if you have any questions on this.


(c) Show that the utility function $V = 45(x - x_0)^{3.5a}(y - y_0)^{3.5b}$ would have yielded the same demand for good x.
Taking the natural log of $V$ is a positive monotone transformation, and it yields an increasing affine transformation of the utility function in (a):
$$\ln V = \ln 45 + 3.5a\ln(x - x_0) + 3.5b\ln(y - y_0) = \ln 45 + 3.5\,U.$$
This implies that the consumption bundle $(x^*, y^*)$ maximizes the utility function $V$ as well.
This implies that the consumption bundle (x∗ , y∗ ) will maximize the utility function V also.

9. The utility function is
$$U(x, y, z) = a\ln(x) + b\ln(y) + c\ln(z),$$
where $a > 0$, $b > 0$ and $c > 0$ are such that $a + b + c = 1$. The budget constraint can be written as
$$g^1(x, y, z) = I - px - qy - rz \geq 0.$$
The rationing constraint is
$$g^2(x, y, z) = k - x \geq 0.$$
(a) This problem has two inequality constraints (plus non-negativity) and so we will use the Kuhn-Tucker Sufficiency theorem.
(i) Let
$$X = \mathbb{R}^3_{++} = \left\{(x, y, z) \in \mathbb{R}^3 \mid x > 0,\; y > 0,\; z > 0\right\}.$$
Then $X$ is open as its complement
$$X^C = \left\{(x, y, z) \in \mathbb{R}^3 \mid x \leq 0 \text{ or } y \leq 0 \text{ or } z \leq 0\right\}$$
is closed.

(ii) The function $U(x, y, z)$ is continuous in $x$, $y$, and $z$ (being a sum of log functions). The functions $g^1(x,y,z) = I - px - qy - rz$, $g^2(x,y,z) = k - x$, $g^3(x,y,z) = x$, $g^4(x,y,z) = y$, $g^5(x,y,z) = z$ are linear and hence continuous. It is possible to infer that $U$ and $g^j$ ($j = 1, \ldots, 5$) are twice continuously differentiable on $X$, and that the set $X$ is convex.

(iii) The function $U(x, y, z)$ is concave, as
$$\nabla U(x, y, z) = \begin{bmatrix} \frac{a}{x} & \frac{b}{y} & \frac{c}{z} \end{bmatrix}, \qquad H_U(x, y, z) = \begin{bmatrix} -\frac{a}{x^2} & 0 & 0 \\ 0 & -\frac{b}{y^2} & 0 \\ 0 & 0 & -\frac{c}{z^2} \end{bmatrix}.$$
The leading principal minor of order one is $-\frac{a}{x^2} < 0$; the leading principal minor of order two is $\frac{ab}{x^2 y^2} > 0$; and the leading principal minor of order three is $-\frac{abc}{x^2 y^2 z^2} < 0$, for all $(x, y, z) \in X$. Hence $U$ is concave. Further, the $g^j$ ($j = 1, \ldots, 5$) are concave, being linear functions.

Hence all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair $((x^*, y^*, z^*), \lambda^*) \in X \times \mathbb{R}^5_+$ that satisfies the Kuhn-Tucker conditions:
$$\text{(i)}\quad D_i U(x^*, y^*, z^*) + \sum_{j=1}^{5} \lambda_j^* D_i g^j(x^*, y^*, z^*) = 0, \quad i = 1, \ldots, 3,$$
$$\text{(ii)}\quad g^j(x^*, y^*, z^*) \geq 0 \text{ and } \lambda_j^* \cdot g^j(x^*, y^*, z^*) = 0.$$

They are
$$\frac{a}{x} - \lambda_1 p - \lambda_2 + \lambda_3 = 0 \tag{29.29}$$
$$\frac{b}{y} - \lambda_1 q + \lambda_4 = 0 \tag{29.30}$$
$$\frac{c}{z} - \lambda_1 r + \lambda_5 = 0 \tag{29.31}$$
$$I - px - qy - rz \geq 0, \quad \lambda_1(I - px - qy - rz) = 0 \tag{29.32}$$
$$k - x \geq 0, \quad \lambda_2(k - x) = 0 \tag{29.33}$$
$$x \geq 0,\; \lambda_3 x = 0; \quad y \geq 0,\; \lambda_4 y = 0; \quad z \geq 0,\; \lambda_5 z = 0. \tag{29.34}$$

If $\lambda_1 = 0$, then $\frac{b}{y} - \lambda_1 q + \lambda_4 = 0 \rightarrow \lambda_4 = -\frac{b}{y} < 0$, which contradicts $\lambda_4 \geq 0$. Hence
$$\lambda_1 > 0 \;\rightarrow\; I - px - qy - rz = 0.$$
Also, $x > 0$, $y > 0$, and $z > 0$ for the three FOCs to hold with equality. Thus $\lambda_3 = \lambda_4 = \lambda_5 = 0$.

(i) If $\lambda_2 > 0$ then $x = k$, and
$$I - pk = qy + rz = \frac{b}{\lambda_1} + \frac{c}{\lambda_1}.$$
Thus $\lambda_1 = \frac{b+c}{I - pk}$, which leads to
$$y = \frac{b(I - pk)}{q(b+c)} \quad \text{and} \quad z = \frac{c(I - pk)}{r(b+c)}.$$
We need to verify $\lambda_2 > 0$, which will hold if $\lambda_2 = \frac{a}{k} - \frac{(b+c)p}{I - pk} > 0$, or
$$\frac{a}{b+c} > \frac{pk}{I - pk}.$$

(ii) If $\lambda_2 = 0$, then
$$x = \frac{aI}{p(a+b+c)}; \quad y = \frac{bI}{q(a+b+c)}; \quad z = \frac{cI}{r(a+b+c)}; \quad \lambda_1 = \frac{a+b+c}{I}$$
satisfies the KT conditions (please verify).

(b) The rationing constraint binds if and only if
$$\frac{a}{b+c} > \frac{pk}{I - pk}.$$
(c)
$$\frac{qy}{rz} = \frac{\frac{b(I-pk)}{b+c}}{\frac{c(I-pk)}{b+c}} = \frac{b}{c}.$$
(d) No, it is more likely that one buys more rice and less butter.
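The rationed-consumer solution in case (i) can be checked numerically. The sketch below is illustrative, with arbitrary example numbers chosen so that the ration binds:

```python
# U = a ln x + b ln y + c ln z, budget p*x + q*y + r*z <= I, ration x <= k.
# When the ration binds, y and z split the remaining budget in proportion b:c.
import math

a, b, c = 0.5, 0.3, 0.2
p, q, r = 1.0, 1.0, 1.0
I, k = 10.0, 2.0

# the ration binds here: a/(b+c) = 1.0 > pk/(I - pk) = 0.25
assert a / (b + c) > p * k / (I - p * k)

x = k
y = b * (I - p * k) / (q * (b + c))   # = 4.8
z = c * (I - p * k) / (r * (b + c))   # = 3.2

def U(x_, y_, z_):
    return a * math.log(x_) + b * math.log(y_) + c * math.log(z_)

budget_left = I - p * k
grid_best = max(
    U(k, s, (budget_left - q * s) / r)   # split the rest between y and z
    for s in [i * budget_left / q / 2000 for i in range(1, 2000)]
)

assert abs(p * x + q * y + r * z - I) < 1e-9
assert U(x, y, z) >= grid_best - 1e-9
print(round(y, 3), round(z, 3))
```

With the ration fixed at $x = k$, the grid search over the remaining budget confirms the $b{:}c$ split and, consistent with part (c), $qy/rz = b/c$.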
