This file contains a compilation of the instruction materials (lecture notes, problem sets and solutions) which I used to teach a mathematics review course (in Summer 2014) to the incoming graduate students in the Department of Economics at Cornell University.


© Attribution Non-Commercial (BY-NC)


UNIVERSITY, MONTCLAIR, NEW JERSEY, 07043

E-mail address: dubeyr@mail.montclair.edu

Contents

Preface v

§1.1. Overview vii

§1.2. Course Schedule viii

§1.3. Topics covered viii

§1.4. Textbook ix

§1.5. Mathematics Proficiency Test ix

§2.1. Introduction 1

§2.2. Statements 2

§2.3. Logical Connective 3

§2.4. Quantifiers 8

§2.5. Rules of Negation of statements with quantifiers 12

§2.6. Logical Equivalences 14

§2.7. Some Math symbols and Definitions 15

§3.1. Methods of Proof 17

§3.2. Trivial Proofs 18

§3.3. Vacuous Proofs 18

§3.4. Proof by Construction 19


§3.6. Proof by Contradiction 22

§3.7. Proof by Induction 24

§3.8. Additional Notes on Proofs 28

§3.9. Decomposition or proof by cases 30

Chapter 4. Problem Set 1 35

Chapter 5. Set Theory, Sequence 37

§5.1. Set Theory 37

§5.2. Set Identities 42

§5.3. Functions 44

§5.4. Vector Space 45

§5.5. Sequences 50

§5.6. Sets in Rn 56

Chapter 6. Problem Set 2 63

Chapter 7. Linear Algebra 67

§7.1. Vectors 67

§7.2. Matrices 69

§7.3. Determinant of a matrix 74

§7.4. An application of matrix algebra 77

§7.5. System of Linear Equations 80

§7.6. Cramer’s Rule 84

§7.7. Principal Minors 86

§7.8. Quadratic Form 87

§7.9. Eigenvalue and Eigenvectors 88

§7.10. Eigenvalues of symmetric matrix 91

§7.11. Eigenvalues, Trace and Determinant of a Matrix 92

Chapter 8. Problem Set 3 95

Chapter 9. Single and Multivariable Calculus 99

§9.1. Functions 99

§9.2. Surjective and Injective Functions 99

§9.3. Composition of Functions 102

§9.4. Continuous Functions 103


§9.6. An application of Extreme Values Theorem 107

§9.7. Differentiability 110

§9.8. Mean Value Theorem 115

§9.9. Monotone Functions 116

§9.10. Functions of Several Variables 118

§9.11. Composite Functions and the Chain Rule 122

Chapter 10. Problem Set 4 125

Chapter 11. Convex Analysis 127

§11.1. Concave, Convex Functions 127

§11.2. Quasi-concave Functions 135

Chapter 12. Problem Set 5 139

Chapter 13. Inverse and Implicit Function Theorems 141

§13.1. Inverse Function Theorem 141

§13.2. The Linear Implicit Function Theorem 142

§13.3. Implicit Function Theorem for R2 144

Chapter 14. Homogeneous and Homothetic Functions 147

§14.1. Homogeneous Functions 147

§14.2. Homothetic Functions 151

Chapter 15. Separating Hyperplane Theorem 153

§15.1. Separation by hyperplanes 153

§15.2. Separating Hyperplane Theorem 154

Chapter 16. Problem Set 6 157

Chapter 17. Unconstrained Optimization 159

§17.1. Optimization Problem 159

§17.2. Maxima / Minima for C2 functions of n variables 160

§17.3. Application: Ordinary Least Square Analysis 166

Chapter 18. Problem Set 7 171

Chapter 19. Optimization Theory: Equality Constraints 173

§19.1. Constrained Optimization 173

§19.2. Equality Constraint 175


§20.1. Inequality Constraint 187

§20.2. Global maximum and constrained local maximum 194

Chapter 21. Problem Set 8 201

Chapter 22. Envelope Theorem 205

§22.1. Envelope Theorem for Unconstrained Problems 205

§22.2. Meaning of the Lagrange multiplier 207

§22.3. Envelope Theorem for Constrained Optimization 208

Chapter 23. Elementary Concepts in Probability 209

§23.1. Discrete Probability Model 209

§23.2. Marginal and Conditional Distribution 213

§23.3. The Law of Iterated Expectation 216

§23.4. Continuous Random Variables 216

Chapter 24. Solution to PS 1 219

Chapter 25. Solution to PS 2 227

Chapter 26. Solution to PS 3 235

Chapter 27. Solution to PS 4 243

Chapter 28. Solution to PS 5 249

Chapter 29. Solution to PS 6 255

Chapter 30. Solution to PS 7 263

Chapter 31. Solution to PS 8 273

Preface

These notes have been prepared for the Math Review Class for graduate students joining the Ph.D. program in the Field of Economics at Cornell University. While preparing these notes we have referred to the material used in previous years' classes.

The objective of the Math Review class is to present elementary concepts from set theory, multivariable calculus, linear algebra, elementary probability, real analysis and optimization theory. I have used examples and problem sets to explain the concepts, definitions and techniques which are useful in the fall-semester graduate economics classes.

These notes could serve to refresh the memory of those incoming students who are familiar with the material. To others, these notes could be a ready reckoner of the math techniques they will need to know in the first few weeks of the graduate classes in Economics (Econ 6090, Econ 6130, Econ 6190) before they are discussed in Econ 6170 in a more rigorous way.

The topics have been arranged so that the entire material can be covered in thirteen classes of three hours' duration each. Additional problem sets with solutions are provided on each day's material. Three additional sessions of three hours each are sufficient to go over the questions in the problem sets. It is hoped that they will help the reader to better understand the material in the lecture notes.

Earlier versions have been used for the Math Review Classes during 2009-17. My sincere thanks go to the participants for their comments and for pointing out typos and errors.

Ram Sewak Dubey


Chapter 1

Syllabus

Field of Economics

Cornell University

Office Hour: 12:15-1:15 pm. E-mail: rsd28@cornell.edu

1.1. Overview

The Field of Economics offers the August Math Review Course for incoming first-year Ph.D. students. The aim of this review is to refresh students' mathematical skills and introduce concepts that are critical to success in the first-year economics core courses, i.e., Econ 6090, Econ 6130, Econ 6170, and Econ 6190. The emphasis is on rigorous treatment of proof techniques, underlying concepts and illustrative examples.

There is usually a great deal of variation in the mathematical background of incoming first-year students. However, almost all students have something to gain from the review course. For those who do not have an adequate mathematics background (by a US Ph.D. standard), the course offers an opportunity to catch up on critical concepts and get a head start on the fall classes. For those who took their core undergraduate courses in analysis and algebra some years ago, the course is a good refresher. For those who do not have significant experience with technical courses taught in English, the review offers an opportunity to pick up the math vocabulary that will be in use from the first day of regular instruction.

The Math Review Course is funded by the Department of Economics. There is no charge for students matriculating into the Economics Ph.D. program. Students matriculating into other Ph.D. programs should contact the Director of Graduate Studies (DGS) in their Field. There will be a charge for these students, and the DGS in the student's Field must make arrangements to pay that charge before the student may attend the Math Review Course.

The Math Review Course is not linked to Econ 6170, Intermediate Mathematical Economics I. There is no course grade, and no record will be kept of your performance. However, the Economics Ph.D. program strongly encourages you to attend. Most students who have taken this course in past years have found it useful, regardless of their prior mathematics training. Perhaps most importantly, the review period is an excellent time to get acquainted with other incoming students, meet the faculty and settle into Ithaca.

1.2. Course Schedule

The course duration will be July 30-August 17. There will be a lecture session each working day. The room for all sessions is URIS 202.

(A) Session Time: July 30-August 3, August 6-10, August 13-17; 9 am-Noon.

(B) There will be a handout of some basic definitions distributed at each session, and practice problems will be assigned on each topic. You are strongly encouraged to at least attempt every problem, as this is the best way to understand the material. The problem sets will be due the following day in class (for example, the problem set given in class on Monday will be due on Tuesday) and I intend to grade some of the questions in each problem set. We will go over the solutions to the problem sets in class.

1.3. Topics covered

A. Elements of Logic: Statements, Truth Tables, Implications, Tautologies, Contradictions, Logical Equivalence, Quantifiers, Negation of Quantified Statements
B. Proof Techniques: Trivial Proofs, Vacuous Proofs, Direct Proofs, Proof by Contrapositive, Proof by Cases, Proof by Contradiction, Existence Proofs, Proof by Mathematical Induction
C. Set Theory: Definitions, Set Equality, Set Operations, Venn Diagrams, Set Identities, Cartesian Products, Properties of the Set of Real Numbers
D. Sequences: Convergent Sequences, Subsequences, Cauchy Sequences, Upper and Lower Limits, Algebraic Properties of Limits, Monotone Sequences
E. Functions of One Variable: Limits of Functions, Continuous Functions, Monotone Functions, Properties of Exponential and Logarithmic Functions
F. Linear Algebra: Systems of Linear Equations, Solution by Substitution or Elimination of Variables, Systems with Many or No Solutions
G. Vectors I: Addition, Subtraction, Scalar Multiplication, Length, Distance, Inner Product
H. Matrix Algebra I: Addition, Subtraction, Scalar and Matrix Multiplication, Transpose, Laws of Matrix Algebra
I. … Rule
J. Vectors II: Linear Independence, Rⁿ as an Example of a Vector Space, Basis and Dimension in Rⁿ
K. Matrix Algebra II: Algebra of Square Matrices, Eigenvalues, Eigenvectors, Properties of Eigenvalues
L. Differential Calculus: Derivative of a Real Function, Mean Value Theorem, Continuity of Derivatives, L'Hospital's Rule, Higher Order Derivatives, Taylor's Theorem
M. Functions of Several Variables: Graphs of Functions of Two Variables, Level Curves, Continuous Functions, Total Derivative, Chain Rule, Partial Derivatives
N. Unconstrained Optimization: First Order Conditions, Global Maxima and Minima, Examples
O. Constrained Optimization with Equality Constraints: First Order Conditions, Constrained Minimization Problems, Examples
P. Constrained Optimization with Inequality Constraints: Kuhn-Tucker Conditions, Interpreting the Multipliers, Envelope Theorem

1.4. Textbook

There is no textbook for the math review course; however, the following books may be helpful. The textbook ? is used in the Microeconomics course sequence. ? and ? are useful textbooks for Mathematical Economics. It will be useful to refer to ? for understanding the material. Copies of this textbook are available in the libraries. ? will be our reference book for analysis. ? contains many useful examples. ? is the set of lecture notes used in Econ 6170.

1.5. Mathematics Proficiency Test

A Mathematics Proficiency Test will be given on Friday, August 17, 2017, from 12:30 pm-3:30 pm in URIS 202. The test will be based on the course material of Economics 6170. If you pass this test, you have satisfied the mathematics proficiency requirement of the Field of Economics and need not take the Economics 6170 course. If you fail this test, or if you do not take this test, you can complete the mathematics proficiency requirement of the Field of Economics by taking the Economics 6170 course for credit and getting a course grade of B- or better.

If you would like any more information, you can contact me at rsd28@cornell.edu. Enjoy your summer, and I look forward to meeting you in August.

Bibliography

Bridges, D., Ray, M., 1984. What is constructive mathematics? The Mathematical Intelligencer 6 (4), 32-38.

Dixit, A. K., 1990. Optimization in Economic Theory, 2nd Edition. Oxford University Press, USA.

Mas-Colell, A., Whinston, M., Green, J., 1995. Microeconomic Theory. Oxford University Press, USA.

Mitra, T., 2013. Lectures on Mathematical Analysis for Economists. Campus Book Store.

Olsen, L., October 2004. A new proof of Darboux's Theorem. The American Mathematical Monthly 111, 173-175.

Royden, H. L., 1988. Real Analysis. Prentice Hall.

Simon, C. P., Blume, L., 1994. Mathematics for Economists. W. W. Norton & Co., New York.

Strichartz, R., 2000. The Way of Analysis. Jones and Bartlett.

Wainwright, E. K., Chiang, A., 2005. Fundamental Methods of Mathematical Economics, 4th Edition. McGraw Hill, New York.


Chapter 2

Introduction to Logic

2.1. Introduction

The theory that you'll learn during the first year is built on a foundation borrowed from engineering and pure mathematics. You will be required to both understand and reproduce certain key proofs, particularly in microeconomics. On some problem sets and exams you'll be asked to produce your own proofs.

If you haven't taken any pure math courses, you might be thinking "I don't even know what a proof is". That is completely fine. There are plenty of very accomplished Ph.D. students at Cornell who had no idea how to write a proof when they arrived. It's important not to get discouraged, because it takes time to learn how to write good proofs. There is a standard bag of tricks that will get you through almost any proof in the first-year sequence, but it takes exposure and then practice for you to learn and be comfortable with these tricks. Math majors are at an advantage here, more than in most areas, but by the end of the year they'll have forgotten the fancier proof techniques and you'll have learned the necessary ones, so the field will be surprisingly leveled.

A proof is a series of statements that demonstrates the truth of a proposition. In writing a proof you make use of (i) the rules of logic and (ii) definitions, theorems, and other propositions that have already been proved, or that you are told you can take as given.

The rules of logic are obviously fixed and unchanging. The components of the second point, however, will vary depending on the task at hand. The most important question to ask yourself when attempting to prove a proposition is "What do I already know?" It will often be the case that if you write down all of the relevant mathematical definitions, the theorems or results that you were given or that you know you can take as given, and any result that you just proved in a previous problem, a straightforward rearrangement of everything on the page will give you the proof that you want.


In this chapter we will discuss the principles of logic that are essential for problem solving in mathematics. The ability to reason using the principles of logic is key to seeking the truth, which is our goal in mathematics. Before we explore and study logic, let us start by spending some time motivating this topic. Mathematicians reduce problems to the manipulation of symbols using a set of rules. As an illustration, let us consider the following problem.

Example 2.1. Joe is 7 years older than John. Six years from now Joe will be twice John's age. How old are Joe and John?

Solution 2.1. To answer the above question, we reduce the problem to a symbolic formulation. Let John's age be x. Then Joe's age is x + 7. We are given that six years from now Joe will be twice John's age. In symbols, (x + 7) + 6 = 2(x + 6). Solving for x yields x = 1. Therefore, John is 1 year old and Joe is 8.
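The symbolic reduction above can also be checked mechanically. A minimal sketch in Python (the variable names are ours), searching over plausible ages for the one satisfying the equation:

```python
# Find John's age x satisfying (x + 7) + 6 == 2 * (x + 6).
john = next(x for x in range(0, 150) if (x + 7) + 6 == 2 * (x + 6))
joe = john + 7
print(john, joe)  # 1 8
```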

Our objective is to reduce the process of mathematical reasoning, i.e., logic, to the manipulation of symbols using a set of rules. The central concept of deductive logic is the concept of argument form. An argument is a sequence of statements aimed at demonstrating the truth of an assertion (a "claim"). Consider the following two arguments.

Argument 1. If x is a real number such that x < −3 or x > 3, then x² > 9. Therefore, if x² ≤ 9, then x ≥ −3 and x ≤ 3.

Argument 2. If it is raining or I am sick, then I stay at home. Therefore, if I do not stay at home, then it is not raining and I am not sick.

Although the content of the above two arguments is very different, their logical form is the same. To illustrate the logical form of these arguments, we use letters of the alphabet (such as p, q and r) to represent the component sentences and the expression "not p" to refer to the sentence "It is not the case that p." Then the common logical form of both arguments is as follows: If p or q, then r. Therefore, if not r, then not p and not q.

We start by identifying and giving names to the building blocks which make up an argument. In Arguments 1 and 2, we identify the building blocks as follows:

Argument 1. If x is a real number such that x < −3 (p) or x > 3 (q), then x² > 9 (r). Therefore, if x² ≤ 9 (not r), then x ≥ −3 (not p) and x ≤ 3 (not q).

Argument 2. If it is raining (p) or I am sick (q), then I stay at home (r). Therefore, if I do not stay at home (not r), then it is not raining (not p) and I am not sick (not q).

2.2. Statements

The study of logic is concerned with the truth or falsity of statements.

Definition 2.1 (Statement). A statement is a sentence which can be classified as true or false without ambiguity. The truth or falsity of the statement is known as its truth value.


For a sentence to be a statement, it is not necessary for us to know whether it is true or false.

However, it must be clear that it is one or the other.

Example 2.2. Consider the following examples.

(a) One plus two equals three. It is a statement which is true.

(b) One plus one equals three. It is also a statement, one which is not true.

(c) He is a university student. This sentence is neither true nor false. The truth or falsity depends on the reference for the pronoun he. For some values of he the sentence is true; for others it is false, and so it is not a statement.

(d) "Every continuous function is differentiable." is a statement whose truth value is false.

(e) "x < 1" is true for some values of x and false for others. It is a statement if we have some particular context in mind. Otherwise, it is not a statement.

(f) Goldbach's Conjecture, "Every even number greater than 2 is the sum of two prime numbers", is a statement whose truth value is not known yet.

(g) "There are infinitely many prime numbers of the form 2ⁿ + 1, where n is a natural number." is another statement whose truth value is not known to date.
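A statement like Goldbach's Conjecture can be spot-checked by computation, though no finite check settles its truth value. A minimal sketch in Python (the helper names are ours) that verifies the conjecture for small even numbers:

```python
def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def goldbach_holds(n):
    """True if the even number n > 2 is a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

# Every even number in 4..200 passes; this is evidence, not a proof.
assert all(goldbach_holds(n) for n in range(4, 201, 2))
```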

Every statement has a truth value, namely true (denoted by T) or false (denoted by F). We often use p, q and r to denote statements, or perhaps p₁, p₂, …, pₙ if there are several statements involved.

Exercise 2.1. Which of the following sentences are statements?

(a) If x is a real number, then x² ≥ 0.

(b) 11 is a prime number.

(c) This sentence is false.

The possible truth values of a statement are often given in a table, called a truth table. The truth values for two statements p and q are given below. Since there are two possible truth values for each of p and q, there are four possible combinations of truth values for p and q. It is customary to consider the four combinations of truth values in the order TT, TF, FT, FF from top to bottom.

        p   q
        T   T
(2.1)   T   F
        F   T
        F   F

2.3. Logical Connective

A logical connective (also called a logical operator) is a symbol or word used to connect two or more statements such that the compound statement produced has a truth value dependent on the respective truth values of the original statements.

We discuss some of the elementary logical operators (connectives) first.


(1) Logical Negation

Logical negation is an operation on one logical value, typically the value of a proposition, that produces a value of true if its operand is false, and a value of false if its operand is true. The truth table for ¬A (also written as NOT A or ∼A) is as follows:

        A   ¬A
(2.2)   T   F
        F   T

For example, consider the statement,

p : The integer 2 is even.

Then the negation of p is the statement

∼ p : It is not the case that the integer 2 is even.

It would be better to write,

∼ p : The integer 2 is not even.

Or better yet to write,

∼ p : The integer 2 is odd.

(2) Logical Conjunction

Logical conjunction is an operation on the values of two propositions that produces a value of true if and only if both of its operands are true. The truth table for A ∧ B (also written as A AND B) is as follows:

        A   B   A∧B
        T   T    T
(2.3)   T   F    F
        F   T    F
        F   F    F

In words, if both A and B are true, then the conjunction A ∧ B is true. For all other assignments of logical values to A and B, the conjunction A ∧ B is false.

For example, consider the statements

p : The integer 2 is even.

q : 4 is less than 3.

The conjunction of p and q, namely,

p ∧ q : The integer 2 is even and 4 is less than 3,

is a false statement since q is false (even though p is true).


(3) Logical Disjunction

Logical disjunction is an operation on the values of two propositions that produces a value of false if and only if both of its operands are false. The truth table for A ∨ B (also written as A OR B) is as follows:

        A   B   A∨B
        T   T    T
(2.4)   T   F    T
        F   T    T
        F   F    F

Thus, for the statements p and q described earlier, the disjunction of p and q, namely

p ∨ q : The integer 2 is even or 4 is less than 3,

is a true statement since at least one of p and q is true (in this case, p is true).
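The truth tables (2.3) and (2.4) can be generated mechanically, since Python's `and`, `or` and `not` evaluate exactly these connectives. A minimal sketch (the row layout is ours):

```python
from itertools import product

# Enumerate the four rows in the customary order TT, TF, FT, FF
# and evaluate conjunction and disjunction on each.
for a, b in product([True, False], repeat=2):
    print(f"{a!s:5} {b!s:5} {(a and b)!s:5} {(a or b)!s:5}")

# The worked example: p (2 is even) is true, q (4 < 3) is false,
# so p AND q is false while p OR q is true.
p, q = (2 % 2 == 0), (4 < 3)
assert not (p and q) and (p or q)
```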

(4) Logical Implication

Logical implication is associated with an operation on the values of two propositions that produces a value of false only in the case that the first operand is true and the second operand is false. The truth table associated with A ⇒ B is as follows:

        A   B   A⇒B
        T   T    T
(2.5)   T   F    F
        F   T    T
        F   F    T

The last row of the table may appear to be counterintuitive. Note, however, that the use of "if … then" as a connective is quite different from its use in day-to-day language.

Consider the following example.

Example 2.3. Suppose your supervisor makes you the following promise:

"If you meet the month-end deadline, then you will get a bonus."

Under what circumstances are you justified in saying that your supervisor spoke falsely?

The answer is: you do meet the month-end deadline and you do not get a bonus. Your supervisor's promise only says that you will get a bonus if a certain condition (you meet the month-end deadline) is met; it says nothing about what will happen if the condition is not met. So if the condition is not met, your supervisor did not lie (your supervisor promised nothing if you did not meet the month-end deadline); your supervisor told the truth in this case. Are you convinced? Good! If not, let us check the truth and falsity of the implication based on the various combinations of the truth values of the statements

p : You meet the month-end deadline;
q : You get a bonus.

The given statement can be written as p ⇒ q.


Suppose first that p is true and q is true. That is, you meet the month-end deadline and you do get a bonus. Did your supervisor tell the truth? Yes, indeed. So if p and q are both true, then so too is p ⇒ q, which agrees with the first row of the truth table (2.5).

Second, suppose that p is true and q is false. That is, you meet the month-end deadline and you do not get a bonus. Then your supervisor did not do as he/she promised. What your supervisor said was false, which agrees with the second row of the truth table (2.5).

Third, suppose that p is false and q is true. That is, you did not meet the month-end deadline and you did get a bonus. Your supervisor (who was most generous) did not lie (your supervisor promised nothing if you did not meet the month-end deadline); so he/she told the truth. This agrees with the third row of the truth table (2.5).

Finally, suppose that p and q are both false. That is, you did not meet the month-end deadline and you did not get a bonus. Your supervisor did not lie here either. Your supervisor only promised you a bonus if you met the month-end deadline. So your supervisor told the truth. This agrees with the fourth row of the truth table (2.5).

In summary, the implication p ⇒ q is false only when p is true and q is false.

A conditional (or implication) statement that is true by virtue of the fact that its hypothesis is false is said to be vacuously true, or true by default. Thus the statement "If you meet the month-end deadline, then you will get a bonus" is vacuously true if you do not meet the month-end deadline!

Consider the statement "If 4 + 1 = 9, then 8 − 1 = 3." Since the hypothesis 4 + 1 = 9 is false, the statement is vacuously true. It may not be clear why this statement is assigned a truth value of T. But that it is indeed true can be seen as follows: if 4 + 1 = 9, then 4 + 1 − 4 = 9 − 4 = 5, so 1 = 5, and therefore 8 − 1 = 8 − 5 = 3.

(5) Logical Equality

Logical equality is an operation on the values of two propositions that produces a value of true if and only if both operands are false or both operands are true. The truth table for A ≡ B is as follows:

        A   B   A≡B
        T   T    T
(2.6)   T   F    F
        F   T    F
        F   F    T

So A ≡ B is true if A and B have the same truth value (both true or both false), and false if they have different truth values.

Definition 2.2 (Tautology, Contradiction). A compound statement is a tautology if it is always true regardless of the truth values of the simple statements from which it is constructed. It is a contradiction if it is always false. Thus a tautology and a contradiction are negations of each other.


For example, A ∨ (¬A) is a tautology and A ∧ (¬A) is a contradiction:

        A   ¬A   A∨(¬A)   A∧(¬A)
(2.7)   T    F      T        F
        F    T      T        F

Similarly, the truth table below shows that [A ∧ (A ⇒ B)] ⇒ B is a tautology:

        A   B   A⇒B   A∧(A⇒B)   [A∧(A⇒B)]⇒B
        T   T    T        T           T
(2.8)   T   F    F        F           T
        F   T    T        F           T
        F   F    T        F           T
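Since a compound statement built from finitely many simple statements has only finitely many truth-value assignments, tautologies and contradictions can be verified exhaustively. A minimal sketch in Python (the helper name is ours), writing A ⇒ B as (not A) or B:

```python
from itertools import product

def implies(a, b):
    # Material implication: false only when a is true and b is false.
    return (not a) or b

# [A and (A => B)] => B is true on all four rows: a tautology.
assert all(implies(a and implies(a, b), b)
           for a, b in product([True, False], repeat=2))

# A and (not A) is false on every row: a contradiction.
assert not any(a and (not a) for a in [True, False])
```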

Definition 2.3.

(a) The converse of A ⇒ B is B ⇒ A.

(b) The inverse of A ⇒ B is ∼ A ⇒∼ B.

(c) The contrapositive of A ⇒ B is ∼ B ⇒∼ A.

Example 2.7. Write the converse, inverse and contrapositive of the statement in Example 2.3. Recall that the given statement can be written as p ⇒ q, where p and q are the statements:

p : You meet the month-end deadline;
q : You get a bonus.

(a) The converse of this implication is q ⇒ p: If you get a bonus, then you have met the month-end deadline.

(b) The inverse of this implication is ∼p ⇒ ∼q: If you do not meet the month-end deadline, then you will not get a bonus.

(c) The contrapositive of this implication is ∼q ⇒ ∼p: If you do not get a bonus, then you will not have met the month-end deadline.

The following theorem is extremely useful.

Theorem 2.1. (A ⇒ B) ⇔ (∼B ⇒ ∼A).

Proof. Compare the truth tables of the two implications:

        A   B   A⇒B   ∼B   ∼A   ∼B⇒∼A
        T   T    T     F    F      T
(2.9)   T   F    F     T    F      F
        F   T    T     F    T      T
        F   F    T     T    T      T

The entries in the third and sixth columns are identical.
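Theorem 2.1, and the warning of Remark 2.1 below about the converse, can both be confirmed by checking all four truth-value assignments. A minimal sketch in Python (the helper name is ours):

```python
from itertools import product

def implies(a, b):
    # Material implication A => B, i.e. (not A) or B.
    return (not a) or b

# (A => B) and its contrapositive (~B => ~A) agree on every row.
assert all(implies(a, b) == implies(not b, not a)
           for a, b in product([True, False], repeat=2))

# The converse (B => A) disagrees on the row A = F, B = T.
assert implies(False, True) != implies(True, False)
```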


Remark 2.1. It is an exercise to see that A ⇒ B is not logically equivalent to its converse, B ⇒ A.

One should avoid the very common mistake of claiming the opposite.

Example 2.8. Consider the following two statements,

(A) Cornell is in Ithaca.
(B) Cornell is in NY state.

and the compound statements:

(a) Implication: A ⇒ B : If Cornell is in Ithaca, then Cornell is in NY state.

(b) Contrapositive: ∼B ⇒ ∼A : If Cornell is NOT in NY state, then Cornell is NOT in Ithaca.

(c) Converse: B ⇒ A : If Cornell is in NY state, then Cornell is in Ithaca.

Note that the converse statement is FALSE. This leads us to another important interpretation of the implication A ⇒ B. It means that every time A is true, B must be true. Hence A is a sufficient condition for B: if we know that A is true, then we can always conclude that B is also true. The contrapositive ∼B ⇒ ∼A shows that when B is not true, A cannot be true either. Hence B is a necessary condition for A: if A is true we must necessarily have that B is true, because if B isn't true then A cannot be true either. Thus we have the following ways of reading A ⇒ B:

(2.10)  A ⇒ B :  A implies B;  If A then B;  A is sufficient for B;  B is necessary for A.

Remark 2.2. Note that for the equivalence relation (the "if and only if") A ⇔ B, the implication goes in both directions. In this case A and B are necessary and sufficient conditions for each other. A ⇔ B means that both the statement A ⇒ B and its converse B ⇒ A are true.

2.4. Quantifiers

In the previous sections, we learnt some definitions and basic properties of compound statements. We were interested in whether a particular statement was true or false. This logic is called propositional logic or statement logic. However, there are many arguments whose validity cannot be verified using propositional logic. Consider, for example, the sentence

p : x is an even integer.

This sentence is neither true nor false. The truth or falsity depends on the value of the variable x. For some values of x the sentence is true; for others it is false. Thus this sentence is not a statement. However, let us denote this sentence by P(x), i.e.,

P(x) : x is an even integer.

Then P(5) is false, while P(6) is true. To study the properties of such sentences, we need to extend the framework of propositional logic to what is called first-order logic.


Definition 2.4. A predicate or propositional function is a sentence that contains a finite number of variables and becomes a statement when specific values are substituted for the variables. The domain of a predicate variable is the set of all values that may be substituted in place of the variables.

For example, the sentence "P(x) : x is an even integer" is a propositional function with domain D, the set of integers, since for each x ∈ D, P(x) is a statement, i.e., for each x ∈ D, P(x) is true or false, but not both. Further examples of propositional functions:

(a) The sentence "P(x) : x + 3 is an even integer" with domain D the set of positive integers.

(b) The sentence "P(x) : x + 3 is an even integer" with domain D the set of integers.

(c) The sentence "P(x, y, z) : x² + y² = z²" with domain D the set of positive integers.

Before proceeding further, we introduce the following notation. A more comprehensive list of notation will be described later.

 : "such that",
∧ : AND, in the sense that A ∧ B means both A and B,
∨ : OR, in the sense that A ∨ B means either A or B or both,
∀ : Universal quantifier, "for all",
∃ : Existential quantifier, "there exists" (one or more).

(a) The Universal Quantifier:

Let P(x) be a predicate with domain D. Then the sentence

Q(x) : ∀x ∈ D, P(x)

is a statement. To see this, notice that either P(x) is true at each value x ∈ D (the notation x ∈ D indicates that x is in the set D, while x ∉ D means that x is not in D) or P(x) is false for at least one value of x ∈ D. If P(x) is true at each value x ∈ D, then Q(x) is true. However, if P(x) is false for at least one value of x ∈ D, then Q(x) is false. Hence, Q(x) is a statement because it is either true or false (but not both).

Definition 2.5. Each of the phrases "every", "for every", "for each", and "for all" is referred to as the universal quantifier and is expressed by the symbol ∀. Let P(x) be a predicate with domain D. A universal statement is a statement of the form ∀x ∈ D, P(x). It is false if P(x) is false for at least one x ∈ D; otherwise, it is true.


Example 2.10. Let D be a set. The statement

∀x ∈ D, x > 0

means “For all x that are elements of D, x is positive.”

Example 2.11. Let P(x) be the predicate “P(x) : x² ≥ x.”

Determine whether the following universal statements are true or false.

(i) ∀x ∈ R, P(x);

(ii) ∀x ∈ Z, P(x).

(i) Let x = 1/2 ∈ R. Then, (1/2)² = 1/4 < 1/2, and so P(1/2) is false. Therefore, “∀x ∈ R, P(x)” is false.

(ii) For all integers x, x² ≥ x is true, and so P(x) is true for all x ∈ Z. Hence, “∀x ∈ Z, P(x)” is true.

(b) The Existential Quantifier:

Each of the phrases “there exists”, “there is”, “for some”, and “for at least one” is referred to as the existential quantifier and is denoted by the symbol ∃. Let P(x) be a predicate with domain D. An existential statement is a statement of the form ∃x ∈ D such that P(x). It is true if P(x) is true for at least one x ∈ D; otherwise, it is false.

Example 2.12. As before let D be a set.

The statement

∃x ∈ D, x > 0

tells us that “There exists an element x of D such that x is positive.”

Example 2.13. Let P(x) be the predicate “P(x) : x² < x.”

Determine whether the following existential statements are true or false.

(i) ∃x ∈ R, P(x);

(ii) ∃x ∈ Z, P(x).

(i) Let x = 1/2 ∈ R. Then, (1/2)² = 1/4 < 1/2, and so P(1/2) is true. Therefore, “∃x ∈ R, P(x)” is true.

(ii) For all integers x, x² ≥ x is true, and so there is no x ∈ Z such that P(x) is true. Hence, “∃x ∈ Z, P(x)” is false.
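Over a finite domain, universal and existential statements can be checked mechanically: Python’s built-in `all` and `any` correspond to ∀ and ∃. A small sketch using the predicate P(x) : x² ≥ x from the examples above (the finite test range is our own choice; over an infinite domain such a check can witness ∃ but can only sample evidence for ∀):

```python
from fractions import Fraction

def P(x):
    # The predicate P(x) : x**2 >= x
    return x * x >= x

# Over the integers (a sampled range), "for all x, P(x)" holds:
assert all(P(x) for x in range(-100, 101))   # ∀x (sampled), x² ≥ x

# Over the rationals, x = 1/2 is a counterexample, so ∀ fails and
# the negation ∃x such that ∼P(x) is witnessed:
assert not P(Fraction(1, 2))                 # P(1/2) is false
D_rat = [Fraction(n, 2) for n in range(-10, 11)]
assert not all(P(x) for x in D_rat)          # the universal statement fails
assert any(not P(x) for x in D_rat)          # ∃x such that ∼P(x)
```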

(c) Universal Conditional Statements

Recall that a conditional statement has a contrapositive, a converse, and an inverse. These definitions can be extended to universal conditional statements. Consider a universal conditional statement of the form ∀x ∈ D, P(x) ⇒ Q(x).

(i) Its contrapositive is the statement,

∀x ∈ D, ∼Q(x) ⇒ ∼P(x).

(ii) Its converse is the statement,

∀x ∈ D, Q(x) ⇒ P(x).


(iii) Its inverse is the statement,

∀x ∈ D, ∼P(x) ⇒ ∼Q(x).

Example 2.14. Write the contrapositive, converse, and inverse of the statement: If a real num-

ber is greater than 3, then its square is greater than 9.

Solution 2.2. Symbolically, the statement can be written as:

∀x ∈ R, if x > 3 then x² > 9.

Here P(x) is the statement x > 3 and Q(x) the statement x² > 9.

(i) The contrapositive is:

∀x ∈ R, if x² ̸> 9 then x ̸> 3,

or, equivalently,

∀x ∈ R, if x² ≤ 9 then x ≤ 3.

(ii) The converse is:

∀x ∈ R, if x² > 9 then x > 3.

Note that the converse is false; take, for example, x = −4. Then, (−4)² > 9 is true but −4 > 3 is false. Hence the statement “if (−4)² > 9 then −4 > 3” is false, and hence the universal statement ∀x ∈ R, if x² > 9 then x > 3 is false.

(iii) The inverse is:

∀x ∈ R, if x ̸> 3 then x² ̸> 9,

or, equivalently,

∀x ∈ R, if x ≤ 3 then x² ≤ 9.

(d) Order of quantifiers:

If the quantifiers are of the same type, the order in which they appear does not matter. For example,

∀x, ∀y : x + y = y + x,

∃x, ∃y : x + y = 2 ∧ x + 2y = 3.

But if the quantifiers are of different types we have to be careful. For the set of real numbers,

the statement

(2.11) ∀x ∃y y > x

is TRUE, that is given any real number x, there is always a real number y that is greater than x.

But the statement

(2.12) ∃y ∀ x, y>x

is FALSE, since there is no fixed real number y that is greater than every real number.

Example 2.15. The statement [∃y ∈ U ∀x ∈ V, statement A] means that one y will make A

true regardless of what x is. The statement [∀x ∈ V, ∃y ∈ U statement A] means that A can be

made true by choosing y depending on x.
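The asymmetry between ∀∃ and ∃∀ can be made concrete on a small finite domain, where nested `all`/`any` calls mirror the nested quantifiers. The domain and predicate below are illustrative choices, not taken from the text:

```python
# Illustrative finite domain and statement A(x, y): "x + y is even".
D = [0, 1, 2, 3]

def A(x, y):
    return (x + y) % 2 == 0

# ∀x ∃y, A(x, y): for each x we may pick y depending on x (e.g. y = x).
forall_exists = all(any(A(x, y) for y in D) for x in D)

# ∃y ∀x, A(x, y): one fixed y must work for every x simultaneously.
exists_forall = any(all(A(x, y) for x in D) for y in D)

assert forall_exists is True    # choosing y = x always gives an even sum
assert exists_forall is False   # no single y has the same parity as every x
```

Reversing the quantifier order flips the truth value here, exactly as in statements (2.11) and (2.12).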

2.5. Rules of Negation of statements with quantifiers

Fact 1. The negation of a universal statement of the form ∀x ∈ D, P(x) is logically equivalent to an existential statement of the form ∃x ∈ D such that ∼P(x). Symbolically,

∼[∀x ∈ D, P(x)] ≡ ∃x ∈ D such that ∼P(x).

Consider the universal statement ∀x ∈ D; P(x). It is false if P(x) is false for at least one x ∈ D;

otherwise, it is true. Hence it is false if and only if P(x) is false for at least one x ∈ D, or, if and

only if ∼ P(x) is true for at least one x ∈ D. Thus the negation of this statement is the statement

∃x ∈ D such that ∼ P(x).

Example 2.16. What is the negation of the statement “All mathematicians wear glasses ”?

Solution 2.3. Let us write this statement symbolically. Let D be the set of all mathematicians and

let P(x) be the predicate “x wears glasses” with domain D. The given statement can be written as

∀x ∈ D; P(x). The negation is ∃x ∈ D such that ∼ P(x). In words, the negation is “There exists a

mathematician who does not wear glasses” or “Some mathematicians do not wear glasses”.

Fact 2. The negation of an existential statement of the form ∃x ∈ D such that P(x) is logically equivalent to a universal statement of the form ∀x ∈ D, ∼P(x). Symbolically,

∼(∃x ∈ D such that P(x)) ≡ ∀x ∈ D, ∼P(x).

Consider the existential statement, ∃x ∈ D such that P(x). It is true if P(x) is true for at least

one x ∈ D; otherwise, it is false. Hence it is false if and only if P(x) is false for all x ∈ D, in other

words, if and only if ∼ P(x) is true for all x ∈ D. Thus the negation of this statement is the statement

∀x ∈ D; ∼ P(x).

Example 2.17. What is the negation of the statement “Some politicians are honest”?

Solution 2.4. Let us write this statement symbolically. Let D be the set of all politicians and let

P(x) be the predicate “x is honest” with domain D. The given statement can be written as ∃x ∈ D

such that P(x). The negation is ∀x ∈ D; ∼ P(x). In words, the negation is “All politicians are not

honest” or “No politician is honest”.

Consider next the negation of a universal conditional statement. By Fact 2, we have that ∼(∀x ∈ D, (P(x) ⇒ Q(x))) ≡ ∃x ∈ D such that ∼(P(x) ⇒ Q(x)). But the negation of an “if p then q” statement is logically equivalent to a “p and not q” statement. Hence, ∼(P(x) ⇒ Q(x)) ≡ P(x) ∧ ∼Q(x). Therefore we have the following fact:

Fact 3. The negation of a universal conditional statement of the form ∀x ∈ D, (P(x) ⇒ Q(x)) is logically equivalent to the existential statement of the form ∃x ∈ D such that (P(x) ∧ ∼Q(x)). Symbolically,

∼(∀x ∈ D, (P(x) ⇒ Q(x))) ≡ ∃x ∈ D such that (P(x) ∧ ∼Q(x)).

Written less symbolically, this becomes

∼(∀x ∈ D, if P(x) then Q(x)) ≡ ∃x ∈ D such that P(x) and ∼Q(x).


2.5.1. More Examples. We can use truth tables to prove the following examples of negations.

∼ (A ∧ B) ⇔∼ A∨ ∼ B

∼ (A ∨ B) ⇔∼ A∧ ∼ B

∼ (x > y) ⇔ x ≤ y

∼ (A ⇒ B) ⇔ A∧ ∼ B

∼ (∼ A) ⇔ A.

Try proving them (Good Exercise).

2.5.2. Negation of statement with one quantifier. The universal statement in the Example 2.10

contains a universal quantifier term and the statement x > 0. To negate a universal statement we

need to find only one counterexample. In this example, if we can find just one x in D that is non

positive, we know that it is not true that all x are positive. Thus the negation of the universal

statement

∀x ∈ D, x > 0

is an existential statement,

∃x ∈ D, x ≤ 0.

To negate an existential statement we must show that every possible instance is false. The existen-

tial statement

∃x ∈ D, x > 0

is false if there are no positive elements of D. Thus the negation of the existential statement is a

universal statement

∀x ∈ D, x ≤ 0.

Insight from these examples can be generalized to rules of negation. Note that “such that” always follows ∃ (the existential quantifier).

Rule 2.1. For negating the statement, [quantifier term, statement], first change the quantifier: ∀

becomes ∃, ∃ becomes ∀ and then negate the statement.

Rule 2.2. To negate a statement with a string of quantifiers, change the type of each quantifier,

preserve their order and negate the statement that follows the quantifiers.

For example, consider the statement

(2.13) ∀ε > 0 ∃N ∀n, if n > N, then ∀x ∈ D, |fn(x) − f(x)| < ε.


Negation: ∃ε > 0 ∼[∃N ∀n, if n > N, then ∀x ∈ D, |fn(x) − f(x)| < ε],

or ∃ε > 0 ∀N, ∼[∀n, if n > N, then ∀x ∈ D, |fn(x) − f(x)| < ε],

(2.14) or ∃ε > 0 ∀N, ∃n, ∼[if n > N, then ∀x ∈ D, |fn(x) − f(x)| < ε],

or ∃ε > 0 ∀N, ∃n, n > N and ∼[∀x ∈ D, |fn(x) − f(x)| < ε],

or ∃ε > 0 ∀N, ∃n > N and ∃x ∈ D, |fn(x) − f(x)| ≥ ε.

2.6. Logical Equivalences

There are many fundamental logical equivalences that we often encounter. Several of these are listed in the theorem below. We may find them to be useful for future reference.

Theorem 2.2. Let p, q and r be statements. Then the following logical equivalences hold.

(1) Commutative Laws

(i) p ∧ q ≡ q ∧ p;

(ii) p ∨ q ≡ q ∨ p.

(2) Associative Laws

(i) (p ∧ q) ∧ r ≡ p ∧ (q ∧ r);

(ii) (p ∨ q) ∨ r ≡ p ∨ (q ∨ r).

(3) Distributive Laws

(i) p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r);

(ii) p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r).

(4) De Morgan's Laws

(i) ∼ (p ∨ q) ≡ (∼ p) ∧ (∼ q);

(ii) ∼ (p ∧ q) ≡ (∼ p) ∨ (∼ q).

(5) Idempotent Laws

(i) p ∧ p ≡ p;

(ii) p ∨ p ≡ p.

(6) Negation Laws

(i) p ∨ (∼ p) ≡ T ;

(ii) p ∧ (∼ p) ≡ F;

where T: True; F: False.

(7) Universal Bound Laws

(i) p ∨ T ≡ T ;

(ii) p ∧ F ≡ F.

(8) Identity Laws

(i) p ∨ F ≡ p;

(ii) p ∧ T ≡ p.

(9) Double Negation Law ∼ (∼ (p)) ≡ p.
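Each equivalence in Theorem 2.2 can be verified by brute force over every assignment of truth values, which is exactly what a truth table does. A sketch in Python, checking a representative subset of the laws:

```python
from itertools import product

# Enumerate all truth assignments for p, q, r — one row per truth-table line.
for p, q, r in product([True, False], repeat=3):
    # De Morgan's Laws
    assert (not (p or q)) == ((not p) and (not q))
    assert (not (p and q)) == ((not p) or (not q))
    # Distributive Laws
    assert (p or (q and r)) == ((p or q) and (p or r))
    assert (p and (q or r)) == ((p and q) or (p and r))
    # Negation, Universal Bound, Identity, and Double Negation Laws
    assert (p or (not p)) is True
    assert (p and (not p)) is False
    assert (p or True) is True and (p and False) is False
    assert (p or False) == p and (p and True) == p
    assert (not (not p)) == p
```

Since a law with three statement variables has only 2³ = 8 truth-table rows, exhaustive checking is a complete proof of these propositional equivalences.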

De Morgan's Laws can be expressed in words as follows: “The negation of an and statement is logically equivalent to the or statement in which each component is negated, while the negation of an or statement is logically equivalent to the and statement in which each component is negated.”

2.7. Some Math symbols and Definitions

The following is a very brief list of some of the mathematical shorthand that will be used in this course and in the first-year courses. Some of these symbols will be explained in more detail as we go.

Operator Meaning

∀ For all, for every, for each

∃ There exists, there is

∈ In, a member of

∋ Owns, contains

∨ Or

∧ And

∴ Therefore

∼ or ¬ Not

∅ Empty set

⊂ Subset, is a subset of

⊃ Contains the set

∪ Union (of sets)

∩ Intersection (of sets)

⇒ Implies

⇐⇒ or iff If and only if, each implies the other

s.t., |, or : Such that

Q.E.D. Quod erat demonstrandum (Proof complete)

(a) Theorem A statement which can be demonstrated to be true by accepted mathematical

operations and arguments.

In general, a theorem is an embodiment of some general principle that makes it part of a

larger theory. The process of showing a theorem to be correct is called a proof.

(b) Proposition A statement that is to be proved, often a result of lesser importance than a theorem.

(c) Axiom A proposition regarded as self-evidently true without proof. The word “axiom” is a synonym for “postulate”.

(d) Corollary An immediate consequence of a result already proved. Corollaries usually state

more complicated theorems in a language simpler to use and apply.

(e) Lemma A short theorem used in proving a larger theorem.


(f) Hypothesis A hypothesis is a proposition that is consistent with known data, but has been

neither verified nor shown to be false.

(g) Definition A precise statement of the meaning of a word, phrase, or symbol.

Chapter 3

Proof Techniques

A proof is a method of establishing the truthfulness of an implication. An example would be to

prove a proposition of the form “If H1 , · · · , Hn , then T.” The statements H1 , · · · , Hn are referred to

as hypotheses of the proof and proposition T is referred to as the conclusion. A formal proof would

consist of a sequence of valid propositions ending with the conclusion T. By valid proposition,

we mean the proposition in the sequence must either be one of the hypotheses H1 , · · · , Hn , or an

axiom, a definition, a tautology or a proposition proved earlier, or it must be derived from previous

propositions using either logical implication or substitution.

Before we present proof techniques, we describe some elementary definitions in number theory.

Definition 3.1. An integer n is even if and only if n = 2k for some integer k. An integer n is odd if

and only if n = 2k + 1 for some integer k.

Using the quotient-remainder theorem, we can show that every integer is either even or odd.

Definition 3.2. An integer n is prime if and only if n > 1 and for all positive integers r and s, if

n = r · s then r = 1 or s = 1. An integer n is composite if and only if n = r · s for some positive

integers r and s, with r ̸= 1 and s ̸= 1.

The first three prime numbers are 2, 3, and 5. The first six composite numbers are 4, 6, 8, 9, 10, and 12.

Every integer greater than 1 is either prime or composite since the two definitions are negations of

each other.

Definition 3.3. Two integers m and n are said to be of the same parity if m and n are both even or

are both odd, while m and n are said to be of the opposite parity if one of m and n is even and the

other is odd. Two integers are consecutive if one is one more than the other.



The integers 2 and 8 are of the same parity, while 5 and 10 are of opposite parity.

Definition 3.4. Let n and d be integers with d ̸= 0. Then n is said to be divisible by d if n = d · k for

some integer k. In such case we say that n is a multiple of d, or d is a factor of n, or d is a divisor

of n, or d divides n.

We discuss the following techniques of writing proofs. Our emphasis will be on showing how each of them is used, through several examples.

Let P(x) and Q(x) be statements with domain D. If Q(x) is true for every x ∈ D, then the universal

statement

∀x ∈ D, P(x) → Q(x)

is true regardless of the truth value of P(x). Such a proof is called a trivial proof.

Claim 3.1. For x ∈ R, if x > −3, then x² + 1 > 0.

Proof. Consider the two statements P(x) : x > −3 and Q(x) : x² + 1 > 0. Since x² ≥ 0 for every x ∈ R, it follows that x² + 1 ≥ 0 + 1 > 0 for every x ∈ R. Thus P(x) → Q(x) is true for every x ∈ R, and hence in particular for x > −3.

Claim 3.2. If n is an odd integer, then 6n³ + 4n + 3 is an odd integer.

Proof. Since 6n³ + 4n + 3 = 2(3n³ + 2n + 1) + 1 = 2k + 1 (where k = 3n³ + 2n + 1 ∈ Z), the integer 6n³ + 4n + 3 is odd for every integer n.

Observe that the fact that 6n³ + 4n + 3 is odd does not depend on n being odd. It would have been better to state the claim as “if n is an integer, then 6n³ + 4n + 3 is odd.”

Let P(x) and Q(x) be statements with domain D. If P(x) is false for every x ∈ D, then the universal statement

∀x ∈ D, P(x) → Q(x)

is true regardless of the truth value of Q(x). Such a proof is called a vacuous proof.

Claim 3.3. For x ∈ R, if x² − 2x + 1 < 0, then x > 1.

Proof. Let P(x) : x² − 2x + 1 < 0 and Q(x) : x > 1. Since x² − 2x + 1 = (x − 1)² ≥ 0 for every x ∈ R, the inequality (x − 1)² < 0 is false for every x ∈ R. Hence, P(x) is false for every x ∈ R. Thus, P(x) → Q(x) is true for every x ∈ R.

3.4. Proof by Construction

In a proof by construction we work straight from the set of assumptions.

Example 3.1. Consider the function

(3.1) f(n) = n² + n + 17,

where n ∈ N. If we evaluate this function at a few values, it seems that we always get a prime number. For instance,

f(1) = 19
f(2) = 23
f(3) = 29
...
f(15) = 257.

We can verify that all these numbers are prime. Then we might conjecture that

Conjecture 1. The function f(n) = n² + n + 17 generates prime numbers for all n ∈ N.

However, by verifying a finite number of cases we have NOT proved the conjecture made above in the example. In fact, this conjecture is false. Take n = 17: f(17) = 17² + 17 + 17 = 17 · 19, which is not a prime number.
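Checking finitely many cases can suggest a conjecture but never proves it, and a short search makes the failure explicit. A sketch (the simple trial-division primality test is our own helper, chosen for brevity):

```python
def is_prime(k):
    # Trial-division primality test, adequate for small k.
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def f(n):
    return n * n + n + 17

# The first fifteen values are indeed prime...
assert all(is_prime(f(n)) for n in range(1, 16))

# ...but the conjecture fails: a search finds composite values,
# including f(17) = 17 * 19 = 323.
composites = [n for n in range(1, 21) if not is_prime(f(n))]
assert 17 in composites
assert f(17) == 17 * 19
```

The search also turns up counterexamples other than n = 17, showing how quickly the apparent pattern breaks down.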

Example 3.2. Let NE be the set of even natural numbers and NO be the set of odd numbers.

We want to show that (i) the sum of two even numbers is even,

∀x, y ∈ NE , x + y ∈ NE

and (ii) the sum of an odd number and an even number is odd

∀x ∈ NE , ∀y ∈ NO , x + y ∈ NO .

(i) Let x, y ∈ NE . Then ∃m, n ∈ N with x = 2m and y = 2n, and

x + y = 2m + 2n = 2(m + n) ∈ NE since m + n ∈ N.

(ii) Let x ∈ NE and y ∈ NO . Then ∃m, n ∈ N with x = 2m and y = 2n + 1, and

x + y = 2m + 2n + 1 = 2(m + n) + 1, where m + n ∈ N, so x + y ∈ NO .
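The two parity facts of Example 3.2 can be spot-checked over a finite sample; this is a check of particular cases, not a substitute for the general argument above:

```python
evens = [2 * m for m in range(1, 50)]      # a finite sample of N_E
odds = [2 * n + 1 for n in range(1, 50)]   # a finite sample of N_O

# (i) The sum of two even numbers is even.
assert all((x + y) % 2 == 0 for x in evens for y in evens)

# (ii) The sum of an even number and an odd number is odd.
assert all((x + y) % 2 == 1 for x in evens for y in odds)
```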

Example 3.3. Consider function g (n, m)


g(n, m) = n² + n + m, where m, n ∈ N. Then

g(1, 2) = 1² + 1 + 2 = 2²
g(2, 3) = 2² + 2 + 3 = 3²
g(12, 13) = 12² + 12 + 13 = 13²

On the basis of the above, we can form a conjecture,

Conjecture 2.

(3.2) ∀n ∈ N, g(n, n + 1) = (n + 1)².

Proof. By construction.

g(n, n + 1) = n² + n + (n + 1)
= n² + 2n + 1
= (n + 1)².

Having proved the general statement, we know that g(15, 16) = 16².

This is an example of deductive reasoning.

Example 3.4. The square of an odd number is odd:

x ∈ NO ⇔ ∃n ∈ N, x = 2n + 1,

x² = (2n + 1)² = 4n² + 4n + 1 = 2(2n² + 2n) + 1 ⇒ x² ∈ NO .

For x = 1, x² = 1, which is odd.

Example 3.5. If the sum of two integers is even, then so is their difference.

Proof. Assume that the integers m and n are such that m + n is even. Then m + n = 2k for some

integer k. So, m = 2k − n and m − n = 2k − n − n = 2(k − n) = 2l, where l = k − n is an integer.

Thus m − n is even.

3.5. Proof by Contraposition

Note that A ⇒ B is not logically equivalent to its converse statement B ⇒ A. It is possible for an implication to be false while its converse is true. For example,

m² > 0 ⇒ m > 0

is false, but its converse

m > 0 ⇒ m² > 0

is true. Hence we cannot prove A ⇒ B by showing B ⇒ A.

To show that A ⇒ B, we can instead show that ∼ B ⇒∼ A. We have already shown before that

implication and its contrapositive are logically equivalent.

Consider the statement “If 7m is an odd number, then m is an odd number.” Its contrapositive is “If m is not an odd number, then 7m is not an odd number”, or, equivalently, “If m is an even number, then 7m is an even number.”

We are talking about integers here. Using the contrapositive, we can construct a proof as follows:

Proof.

m ∈ NE ⇔ ∃k ∈ N m = 2k,

7m = 7 (2k) = 2 (7k) , 7k ∈ N ⇒7m ∈ NE .

This is much easier than trying to show directly that 7m being odd implies that m is odd.

Similarly, to prove

(3.3) x² ∈ NE ⇒ x ∈ NE ,

it suffices to prove its contrapositive,

(3.4) x ∈ NO ⇒ x² ∈ NO ,

which was proved above.

3.6. Proof by Contradiction

To prove that a statement C is true, try supposing that ∼C is true and then show that this leads to a contradiction. To show that A ⇒ B we can use

(3.5) ∼(A ⇒ B) ⇔ A ∧ ∼B.

So assume that A and ∼B are both true and derive a contradiction. This shows that A ∧ ∼B is false, and hence that A ⇒ B is true.

For example, to prove

x² ∈ NE ⇒ x ∈ NE ,

assume that x² ∈ NE , i.e., ∃m ∈ N with x² = 2m, and suppose, to the contrary, that x ∈ NO , i.e., ∃n ∈ N with x = 2n + 1. Then

x² = (2n + 1)² = 4n² + 4n + 1, which is odd.

This contradicts the initial assumption that x² is even.

We next show, by contradiction, that there is no greatest integer.

Proof. Assume, to the contrary, that there is a greatest integer, say N. Then, N ≥ n for every integer

n. Let m = N + 1. Now m is an integer since it is the sum of two integers. Also, m > N. Thus, m is

an integer that is greater than the greatest integer, which is a contradiction. Hence our assumption

that there is a greatest integer is false. Thus there is no greatest integer.

Definition 3.5. A real number r is a rational number if r = m/n for some integers m and n with n ̸= 0. A real number that is not a rational number is called an irrational number.

We next show that there is no least positive rational number.

Proof. Assume, to the contrary, that there is a least positive rational number x. Then, x ≤ y for every positive rational number y. Consider the number x/2. Since x is a positive rational number, so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is positive, gives x/2 < x. Hence, x/2 is a positive rational number that is less than x, which is a contradiction. Hence our assumption that there is a least positive rational number is false. Thus there is no least positive rational number.

Example 3.12. The sum of a rational number and an irrational number is irrational.


Proof. Assume, to the contrary, that there exists a rational number p and an irrational number q whose sum is a rational number. Thus, by the definition of rational numbers, p = a/b and p + q = r = c/d for some integers a, b, c and d with b ̸= 0 and d ̸= 0. Hence,

q = r − p = c/d − a/b = (bc − ad)/(bd).

Now, bc − ad ∈ Z and bd ∈ Z since a, b, c, d ∈ Z. Since b ̸= 0 and d ̸= 0, bd ̸= 0. Hence, q ∈ Q, which is a contradiction. Hence our assumption that there exists a rational number and an irrational number whose sum is a rational number is false. Thus, the sum of a rational number and an irrational number is irrational.

We end this section with a proof of the classical result that √2 is irrational.

Example 3.13. The real number √2 is irrational.

Proof. Assume, to the contrary, that √2 is rational. Then,

√2 = m/n,

where m, n ∈ Z and n ̸= 0. By dividing m and n by any common factors, if necessary, we may further assume that m and n have no common factors, i.e., m/n has been expressed in (or reduced to) lowest terms. Then, 2 = m²/n², and so m² = 2n². Thus, m² is even. Hence, m is even, and so m = 2k, where k ∈ Z. Substituting this into our earlier equation m² = 2n², we have (2k)² = 2n², and so 4k² = 2n². Therefore, n² = 2k². Thus, n² is even, and so n is even. Therefore each of m and n has 2 as a factor, which contradicts our assumption that m/n has been reduced to lowest terms and therefore that m and n have no common factors. We deduce, therefore, that our assumption that √2 is rational is incorrect. Hence, √2 is irrational.
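A finite search offers a concrete companion to the proof: no pair of integers below a bound satisfies m² = 2n², consistent with (though of course not a proof of) the irrationality of √2. The bound of 500 is arbitrary:

```python
# Search for integer solutions of m**2 == 2 * n**2 with 1 <= m, n <= 500.
solutions = [(m, n)
             for n in range(1, 501)
             for m in range(1, 501)
             if m * m == 2 * n * n]

# No solution is found, so no fraction m/n in this range has (m/n)**2 == 2.
assert solutions == []
```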

Exercise 3.1. The square root of any prime number is irrational.

Remark 3.1. One should be very careful when writing proof by contradiction. Here is a very

strong word of caution which can be found in ?, page 3.

“All students are enjoined in the strongest possible terms to eschew proofs by contradiction!

There are two reasons for the prohibition: First such proofs are very often fallacious, the contra-

diction on the final page arising from an erroneous deduction on an earlier page, rather than from

the incompatibility of p with ¬q. Second, even when correct, such a proof gives little insight into

the connection between p and q whereas both the direct proof and the proof by contraposition con-

struct a chain of argument connecting p and q. One reason why mistakes are so much more likely

in proofs by contradiction than in direct proofs is that in a direct proof (assuming the hypotheses is

not always false) all deduction from the hypothesis are true in those cases where hypothesis holds.

One is dealing with true statements, and one’s intuition and knowledge about what is true help to

keep one from making erroneous statements. In proofs by contradiction, however, you are (assum-

ing the theorem is true) in the unreal world where any statement can be derived, and so the falsity

of a statement is no indication of an erroneous deduction.”.

3.7. Proof by Induction

A proof by induction involves three steps.

(a) Base of induction: Check that the statement is true for n = 1.

(b) Inductive transition: Assume that the statement is true for some n and show that it is also true for n + 1.

(c) Inductive conclusion: Conclude that the statement is true for all n ≥ 1.

Example 3.14. For every n ∈ N, if f(x) = x^n then f′(x) = n · x^(n−1).

Proof. By induction.

(a) Base of induction: For n = 1, f(x) = x and f′(x) = 1 = 1 · x^0, so the statement is true.

(b) Inductive transition: Assume that for f(x) = x^n we have f′(x) = n · x^(n−1). Then for

f(x) = x^(n+1) = x^n · x,

the product rule gives

(3.8) f′(x) = n x^(n−1) · x + x^n · 1 = n x^n + x^n = (n + 1) x^n.

(c) Inductive conclusion: ∀n ∈ N, if f(x) = x^n then f′(x) = n · x^(n−1).

Example 3.15. For every n ∈ N, 7^n − 4^n is a multiple of 3.

Proof. By induction.

(a) Base of induction: For n = 1,

(3.9) 7^n − 4^n = 7 − 4 = 3,

which is a multiple of 3, so the statement is true.

(b) Inductive transition: Assume that 7^n − 4^n = 3m for some m ∈ N. Then

7^(n+1) − 4^(n+1) = 7 · 7^n − 4 · 4^n
= 7 · 7^n − 7 · 4^n + 7 · 4^n − 4 · 4^n
= 7 · (7^n − 4^n) + (7 − 4) · 4^n
= 7 · (3m) + 3 · 4^n
= 3 · (7m + 4^n).

Since m and n are natural numbers, so is 7m + 4^n. So 7^(n+1) − 4^(n+1) is a multiple of 3.

(c) Inductive conclusion: 7^n − 4^n is a multiple of 3, for all n ∈ N.

Example 3.16. Prove the Binomial Theorem by induction:

(a + b)^n = Σ_{k=0}^{n} C(n, k) a^(n−k) b^k,

where C(n, k) denotes the binomial coefficient “n choose k”.

Proof. By induction.

(a) Base of induction: For n = 1, the claim is trivially true.

(b) Inductive transition: Assume that the Binomial Theorem holds for n. Then

(a + b)^(n+1) = (a + b)(a + b)^n = (a + b) Σ_{k=0}^{n} C(n, k) a^(n−k) b^k

= Σ_{k=0}^{n} C(n, k) a^(n−k+1) b^k + Σ_{k=0}^{n} C(n, k) a^(n−k) b^(k+1)

= Σ_{k=0}^{n} C(n, k) a^(n−k+1) b^k + Σ_{l=1}^{n+1} C(n, l−1) a^(n−l+1) b^l   (by the change of variable l = k + 1)

= C(n, 0) a^(n+1) + Σ_{l=1}^{n} [C(n, l) + C(n, l−1)] a^(n−l+1) b^l + C(n, n) b^(n+1)

= C(n+1, 0) a^(n+1) + Σ_{l=1}^{n} C(n+1, l) a^(n−l+1) b^l + C(n+1, n+1) b^(n+1)

= Σ_{k=0}^{n+1} C(n+1, k) a^((n+1)−k) b^k.

In the second-to-last line we have used the fact that

C(n, l) + C(n, l−1) = C(n+1, l).

It is a good exercise to verify this.

(c) Inductive conclusion: The Binomial Theorem holds for all n ∈ N.
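The identity, and the binomial-coefficient fact used in its proof, can be sanity-checked numerically with Python's `math.comb`, which computes C(n, k):

```python
import math

def binomial_expansion(a, b, n):
    # Right-hand side of the Binomial Theorem.
    return sum(math.comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

# Check (a + b)**n against the expansion for small integer inputs.
for a in range(-3, 4):
    for b in range(-3, 4):
        for n in range(8):
            assert (a + b) ** n == binomial_expansion(a, b, n)

# Pascal's rule, used in the inductive step: C(n, l) + C(n, l-1) == C(n+1, l).
for n in range(1, 10):
    for l in range(1, n + 1):
        assert math.comb(n, l) + math.comb(n, l - 1) == math.comb(n + 1, l)
```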


Observe that in the inductive hypothesis of our proof above, we assume that P(k) is true for an

arbitrary, but fixed, positive integer k. We certainly do not assume that P(k) is true for all positive

integers k, for this is precisely what we wish to prove! It is important to understand that our aim is

to establish the truth of the implication “If P(k) is true, then P(k + 1) is true.” which together with

the truth of the statement P(1) allows us to conclude that an infinite number of statements (namely,

P(1), P(2),P(3), · · · ) are true.

Example 3.17. For every positive integer n,

1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.

Proof. For every integer n ≥ 1, let P(n) be the statement P(n) : 1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.

(a) Base of induction: When n = 1, the statement P(1) : 1² = 1(1 + 1)(2 · 1 + 1)/6 is certainly true since 1(1 + 1)(2 · 1 + 1)/6 = 6/6 = 1. This establishes the base case when n = 1.

(b) Inductive transition: For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1² + · · · + k² = k(k + 1)(2k + 1)/6. For the inductive step, we need to show that P(k + 1) is true. That is, we show that

1² + 2² + · · · + k² + (k + 1)² = (k + 1)(k + 2)(2k + 3)/6.

Evaluating the left-hand side of this equation, we have

1² + 2² + · · · + k² + (k + 1)² = (1² + 2² + · · · + k²) + (k + 1)²
= k(k + 1)(2k + 1)/6 + (k + 1)²   (by the inductive hypothesis)
= k(k + 1)(2k + 1)/6 + 6(k + 1)²/6
= (k + 1)(2k² + k + 6k + 6)/6
= (k + 1)(2k² + 7k + 6)/6 = (k + 1)(2k² + 4k + 3k + 6)/6
= (k + 1)(k + 2)(2k + 3)/6,

thus verifying that P(k + 1) is true.

(c) Inductive conclusion: Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1; that is,

1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6.


Recall that in a geometric sequence, each term is obtained from the preceding one by multiplying by a constant factor. If the first term is 1 and the constant factor is r, then the sequence is 1, r, r², r³, · · · , r^n, · · · . The sum of the first n terms of this sequence is given by a simple formula which we shall verify using mathematical induction. This is left as an exercise.

Induction can also be used to solve problems involving divisibility, as the next example illustrates.

Example 3.18. For all integers n ≥ 1, 2^(2n) − 1 is divisible by 3.

Proof. We proceed by mathematical induction. When n = 1, the result is true since in this case 2^(2n) − 1 = 2² − 1 = 3, and 3 is divisible by 3. Hence, the base case when n = 1 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that the property holds for n = k, i.e., suppose that 2^(2k) − 1 is divisible by 3. For the inductive step, we must show that the property holds for n = k + 1. That is, we must show that 2^(2(k+1)) − 1 is divisible by 3. Since 2^(2k) − 1 is divisible by 3, there exists, by the definition of divisibility, an integer m such that 2^(2k) − 1 = 3m, and so 2^(2k) = 3m + 1. Now,

2^(2(k+1)) − 1 = 2^(2k) · 2² − 1
= 4 · 2^(2k) − 1
= 4(3m + 1) − 1
= 12m + 3
= 3(4m + 1).

Since m ∈ Z, we know that 4m + 1 ∈ Z. Hence, 2^(2(k+1)) − 1 is an integer multiple of 3; that is, 2^(2(k+1)) − 1 is divisible by 3, as desired. Hence, by the principle of mathematical induction, the property holds for all integers n ≥ 1.
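A quick numeric confirmation of the divisibility result for many values of n:

```python
# 2^(2n) - 1 should be a multiple of 3 for every n >= 1.
for n in range(1, 40):
    assert (2 ** (2 * n) - 1) % 3 == 0
```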

Induction can also be used to verify certain inequalities, as the next example illustrates.

Example 3.19. For all integers n ≥ 2,

√n < 1/√1 + 1/√2 + · · · + 1/√n.

Proof. We proceed by mathematical induction. To show the inequality holds for n = 2, we must show that

√2 < 1/√1 + 1/√2.

Multiplying both sides by √2, this inequality is true if and only if 2 < √2 + 1, which is true if and only if 1 < √2. Since 1 < √2 is true, so too is √2 < 1/√1 + 1/√2. Hence the inequality holds for n = 2. This establishes the base case. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2 and assume that the inequality holds for n = k, i.e., suppose that

√k < 1/√1 + 1/√2 + · · · + 1/√k.

For the inductive step, we must show that the inequality holds for n = k + 1. That is, we must show that

√(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1).

Since k < k + 1, we have √k < √(k + 1), and so multiplying both sides by √k,

k < √k · √(k + 1).

Adding 1 to both sides, k + 1 < √k · √(k + 1) + 1, and so dividing both sides by √(k + 1) we have

√(k + 1) < √k + 1/√(k + 1).

Hence, by the inductive hypothesis,

√(k + 1) < √k + 1/√(k + 1) < 1/√1 + 1/√2 + · · · + 1/√k + 1/√(k + 1),

as desired. Hence, by the principle of mathematical induction, the inequality holds for all integers n ≥ 2.
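The inequality can also be confirmed numerically; floating-point checks like this are evidence, not proof, but here the gap between the two sides grows with n, so rounding error is not a concern:

```python
import math

# Check sqrt(n) < 1/sqrt(1) + ... + 1/sqrt(n) for n = 2, ..., 1000,
# accumulating the partial sum incrementally.
partial_sum = 0.0
for n in range(1, 1001):
    partial_sum += 1 / math.sqrt(n)
    if n >= 2:
        assert math.sqrt(n) < partial_sum
```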

3.8. Additional Notes on Proofs

To prove a universal statement

(3.10) ∀x ∈ D, p(x),

we let x represent an arbitrary element of the set D and then show that the statement p(x) is true. The only properties we can use about x are those that apply to all elements of D. For example, if the set D consists of the natural numbers, then we cannot assume x to be odd, as not all natural numbers are odd. To prove an existential statement,

(3.11) ∃x ∈ D, p(x),

all we need to do is show that there exists at least one member of D for which p(x) is true. We illustrate these techniques through the following examples.

Example 3.20. For every ε > 0, there exists a δ > 0 such that

(3.12) if 1 − δ < x < 1 + δ, then 5 − ε < 2x + 3 < 5 + ε.

In this example we are asked to prove that the statement is true for each positive number ε. We begin with an arbitrary ε and use it to find a δ which is positive and has the property that the implication holds true. We give a particular value of δ, which could possibly depend on ε, and show that the statement is true.

Proof. Given ε > 0, choose δ = ε/2. Then, for any x with

1 − δ < x < 1 + δ,

we have

1 − ε/2 < x < 1 + ε/2,
2 − ε < 2x < 2 + ε,
5 − ε < 2x + 3 < 5 + ε.
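The choice δ = ε/2 can be exercised numerically: for sample values of ε, every sampled x within δ of 1 satisfies the conclusion. This samples finitely many points and so does not replace the proof:

```python
for eps in [1.0, 0.1, 0.01, 0.001]:
    delta = eps / 2
    # Sample points x strictly inside the interval (1 - delta, 1 + delta).
    samples = [1 - 0.9 * delta, 1 - 0.5 * delta, 1.0,
               1 + 0.5 * delta, 1 + 0.9 * delta]
    for x in samples:
        assert 5 - eps < 2 * x + 3 < 5 + eps
```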

In some cases, it is possible to prove an existential statement in an indirect way, without actually producing any specific element of the set. One indirect method is to use the contrapositive, and another is to use a proof by contradiction. Consider the following example.

Example 3.21. Let f be a continuous function. If

(3.13) ∫₀¹ f(x) dx ̸= 0,

then there exists a point x ∈ [0, 1] such that f(x) ̸= 0.

Instead of proving this directly, we can prove its contrapositive:

(3.14) If f(x) = 0 for all x ∈ [0, 1], then ∫₀¹ f(x) dx = 0.

This is a lot easier to prove. Instead of having to conclude the existence of an x having a particular

property, we are given that all x have a different property. The proof follows directly from the

definition of the integral, since each of the terms in any Riemann sum will be zero.

Example 3.22. Let x be a real number. If x > 0, then 1/x > 0.

Proof. Assume, to the contrary, that x > 0 and

(3.15) 1/x ≤ 0.

Since x > 0, we can multiply both sides by x:

(3.16) x · (1/x) ≤ x · 0, or 1 ≤ 0.

This is a contradiction.


Claim 3.4. There exist irrational numbers a and b such that a^b is rational.

Proof. Consider the real number √2^√2. This number is either rational or irrational. We consider each case in turn.

(1) √2^√2 is rational. Let a = √2 and b = √2. Then a and b are irrational, and by assumption, a^b is rational.

(2) √2^√2 is irrational. Let a = √2^√2 and b = √2. Then a and b are irrational. Moreover, a^b = (√2^√2)^√2 = (√2)^(√2·√2) = (√2)² = 2 is rational.

In both cases, we proved the existence of irrational numbers a and b such that a^b is rational, and so we have the desired result.

We remark that, as it stands, this proof does not enable us to pinpoint which of the two choices of the pair (a, b) has the required property. In order to determine the correct choice of (a, b), we would need to decide whether √2^√2 is rational or irrational. It is not a constructive proof. The following would be a constructive proof of this claim. Let a = √2 and b = log₂ 9. Then b is an irrational number, for if it were rational, then log₂ 9 = m/n where m and n are integers with no common factor. This implies 2^m = 9^n, which is a contradiction, as 2^m is an even number and 9^n is an odd number. This gives a^b = 3, which is rational.¹
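The constructive choice can be checked in floating point: (√2)^(log₂ 9) = 2^(log₂ 3) = 3, up to rounding error:

```python
import math

a = math.sqrt(2)      # irrational base
b = math.log2(9)      # irrational exponent

# a**b = 2^(log2(9)/2) = 2^(log2(3)) = 3, up to floating-point rounding.
assert abs(a ** b - 3) < 1e-9
```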

Let P(x) be a statement. If x possesses certain properties, and if we can verify that P(x) is true

regardless of which of these properties x has, then P(x) is true. Such a proof is called a proof by

cases.

Some proofs naturally divide themselves into the consideration of two or more cases. For example, positive integers are either even or odd, and real numbers are positive, negative, or zero. It may be that different arguments are required for each case.

More rigorously, suppose we want to prove that p ⇒ q, and that p can be decomposed into two

disjoint propositions p1 , p2 such that p1 ∧ p2 is a contradiction. Then p ≡ (p1 ∨ p2 ) ∧ ¬(p1 ∧ p2 ) ≡

(p1 ∨ p2 ).

With this choice of p1 and p2 , we have,

(p ⇒ q) ⇔ (¬p ∨ q) ⇔ [¬(p1 ∨ p2 ) ∨ q]

⇔ [(¬p1 ∧ ¬p2 ) ∨ q] ⇔ [(¬p1 ∨ q) ∧ (¬p2 ∨ q)]

⇔ [(p1 ⇒ q) ∧ (p2 ⇒ q)].

This means that we only need to show that p1 ⇒ q and p2 ⇒ q. Note that this method also works if we can decompose p into more than two propositions, as long as these propositions are mutually exclusive (i.e., every pair of them is a contradiction). The following example illustrates this technique.

1There is an extensive literature on constructive mathematics. You may like to do a Google search for easy-to-read articles on the subject. A classic reference is ?.

Before going over some examples, we state the following theorem.

Theorem 3.1. (Quotient-Remainder Theorem) For every given integer n and positive integer d,

there exist unique integers q and r such that

n = d ·q+r and 0 ≤ r < d.

Definition 3.6. Let n be a nonnegative integer and let d be a positive integer. By the Quotient-

Remainder Theorem, there exist unique integers q and r such that n = d · q + r; where 0 ≤ r < d.

We define,

n div d = q (read as “n divided by d ”), and

n mod d = r (read as “n modulo d ”).

Thus n div d and n mod d are the integer quotient and integer remainder, respectively, obtained

when n is divided by d.

Observe that given a nonnegative integer n and a positive integer d, we have that n mod d ∈

{0, · · · , d − 1} (since 0 ≤ r ≤ d − 1) and that n mod d = 0 if and only if n is divisible by d.
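The definitions of div and mod can be made concrete computationally. The following is an illustrative sketch in Python (not part of the original notes), using the built-in divmod, which for a positive divisor returns exactly the q and r of the Quotient-Remainder Theorem.

```python
# Illustrative sketch (not part of the original notes): n div d and
# n mod d via Python's built-in divmod, checking the Quotient-Remainder
# identity n = d*q + r with 0 <= r < d.
def quotient_remainder(n, d):
    q, r = divmod(n, d)          # for d > 0, Python guarantees 0 <= r < d
    assert n == d * q + r and 0 <= r < d
    return q, r

print(quotient_remainder(17, 5))          # (3, 2): 17 div 5 = 3, 17 mod 5 = 2
print(quotient_remainder(17, 5)[1] == 0)  # False: 17 is not divisible by 5
```

Note that divmod(n, d) with d > 0 returns a non-negative remainder even for negative n, matching the theorem's requirement 0 ≤ r < d.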

Result 3.1. Every integer is either even or odd.

Proof. By the Quotient-Remainder Theorem with d = 2, there exist unique integers q and r such

that n = 2 · q + r and 0 ≤ r < 2. Hence, r = 0 or r = 1. Therefore, n = 2q or n = 2q + 1 for some

integer q depending on whether r = 0 or r = 1, respectively. In the case that n = 2q, the integer n

is even. In the other case that n = 2q + 1, the integer n is odd. Hence, n is either even or odd.

Example 3.23. If n ∈ Z, then n² + 5n + 3 is an odd integer.

Proof. We use a proof by cases, depending on whether n is even or odd.

(1) n is even.
Then, n = 2k for some integer k. Thus, n² + 5n + 3 = (2k)² + 5(2k) + 3 = 4k² + 10k + 3 = 2(2k² + 5k + 1) + 1 = 2m + 1, where m = 2k² + 5k + 1. Since k ∈ Z, we must have m ∈ Z. Hence, n² + 5n + 3 = 2m + 1 for some integer m, and so the integer n² + 5n + 3 is odd.

(2) n is odd.
Then, n = 2k + 1 for some integer k. Thus, n² + 5n + 3 = (2k + 1)² + 5(2k + 1) + 3 = 4k² + 14k + 9 = 2(2k² + 7k + 4) + 1 = 2m + 1, where m = 2k² + 7k + 4. Since k ∈ Z, we must have m ∈ Z. Hence, n² + 5n + 3 = 2m + 1 for some integer m, and so the integer n² + 5n + 3 is odd.


Example 3.24. Let m, n ∈ Z. If m and n are of the same parity (either both even or both odd), then

m + n is even.

Proof. We use a proof by cases, depending on whether m and n are both even or both odd.

(1) m and n are both even.

Then, m = 2k and n = 2l for some integers k and l. Thus, m + n = 2k + 2l = 2(k + l). Since

k + l ∈ Z, the integer m + n is even.

(2) m and n are both odd.

Then, m = 2k + 1 and n = 2l + 1 for some integers k and l. Thus, m + n = (2k + 1) + (2l +

1) = 2(k + l + 1). Since k + l + 1 ∈ Z, the integer m + n is even.

Example 3.25. Let n ∈ Z. If n² is a multiple of 3, then n is a multiple of 3.

Proof. We shall combine two proof techniques and use both a proof by contrapositive and a proof

by cases. Suppose that n is not a multiple of 3. We wish to show then that n2 is not a multiple of

3. By the Quotient-Remainder Theorem with d = 3, there exist unique integers q and r such that

n = 3 · q + r and 0 ≤ r < 3. Hence, r ∈ {0; 1; 2}. Therefore, n = 3q or n = 3q + 1 or n = 3q + 2

for some integer q depending on whether r = 0; 1 or 2, respectively. Since n is not a multiple of 3,

either n = 3q + 1 or n = 3q + 2 for some integer q. We consider each case in turn.

(1) n = 3q + 1 for some integer q.

Then, n2 = (3q + 1)2 = 9q2 + 6q + 1 = 3(3q2 + 2q) + 1, and so n2 is not a multiple of 3.

(2) n = 3q + 2 for some integer q.

Then, n2 = (3q + 2)2 = 9q2 + 12q + 4 = 3(3q2 + 4q + 1) + 1, and so n2 is not a multiple of

3.

Example 3.26. If n is an odd integer, then n² = 8m + 1 for some integer m.

Proof. We shall use both a direct proof and a proof by cases. Assume that n is an odd integer.

By the Quotient-Remainder Theorem with d = 4, there exist unique integers q and r such that

n = 4 · q + r and 0 ≤ r < 4. Hence, r ∈ {0; 1; 2; 3}. Therefore, n = 4q or n = 4q + 1 or n = 4q + 2

or n = 4q + 3 for some integer q depending on whether r = 0; 1; 2 or 3, respectively. Since n is

odd, and since 4q and 4q + 2 are both even, either n = 4q + 1 or n = 4q + 3 for some integer q. We

consider each case in turn.

(1) n = 4q + 1 for some integer q.

Then, n2 = (4q + 1)2 = 16q2 + 8q + 1 = 8(2q2 + q) + 1 = 8m + 1, where m = 2q2 + q.

Since q ∈ Z, we must have m ∈ Z. Hence, n2 = 8m + 1 for some integer m.

(2) n = 4q + 3 for some integer q.
Then, n² = (4q + 3)² = 16q² + 24q + 9 = 8(2q² + 3q + 1) + 1 = 8m + 1, where m = 2q² + 3q + 1. Since q ∈ Z, we must have m ∈ Z. Hence, n² = 8m + 1 for some integer m.

We remark that the last conclusion can be restated as follows: for every odd integer n, we have n² mod 8 = 1. Here are some additional illustrative examples.
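The restated conclusion lends itself to a quick numerical spot-check; the snippet below is an illustrative Python sketch, not part of the original notes.

```python
# Spot-check (illustrative): n^2 mod 8 equals 1 for every odd integer n,
# as established above by the cases n = 4q + 1 and n = 4q + 3.
odd_numbers = [n for n in range(-101, 102) if n % 2 != 0]
assert all(n ** 2 % 8 == 1 for n in odd_numbers)
print("n^2 mod 8 == 1 held for all tested odd n")
```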

Example 3.27. If x is a real number, then x ≤ |x|.

Recall the definition of the absolute value:

(3.17)  |x| = x if x ≥ 0, and |x| = −x if x < 0.

Since this definition is divided into two parts, it makes sense to divide the proof also in two parts.

Proof. Let x be an arbitrary real number. Then either x ≥ 0 or x < 0. If x ≥ 0, then by definition |x| = x. If x < 0, then −x > 0, so that
x < 0 < −x = |x|.
In either case,
x ≤ |x|.

Chapter 4

Problem Set 1

(1) Prove or give a counterexample for the following claims. Capital letters refer to propositions

or sets, depending on the context.

(a)

∼ (A ∧ B) ⇔ ∼ A ∨ ∼ B

(b)

∼ (A ∨ B) ⇔∼ A ∧ ∼ B.

(c)

∼ (A ⇒ B) ⇔ A ∧ ∼ B.

(d)

((A ∨ B) ⇒ C) ⇔ ((A ⇒ C) ∧ (B ⇒ C)).

(e) If n and n + 1 are consecutive integers, then both cannot be even.

(f) Give a counterexample to the proposed statement: If n ∈ N then n² > n.

(g) If x is odd then x2 is odd.

(2) Write the negation of the following statements

(a) If S is closed and bounded, then S is compact.

(b) If S is compact, then S is closed and bounded.

(c) If a function is continuous then it is differentiable.

(3) Find the contrapositive of

(a) If x2 ̸= 3 ∧ y2 > 5 then xy is a rational number.

(b) If x ̸= 0 then ∃y xy = 1.

(4) Find the mistake in the “proof”of the following results, and provide correct proofs.

(a) If m is an even integer and n is an odd integer, then 2m + 3n is an odd integer.

Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2k + 1 for some

integer k. Therefore, 2m + 3n = 2(2k) + 3(2k + 1) = 10k + 3 = 2(5k + 1) + 1 = 2l + 1; hence, 2m + 3n is an odd integer.

(b) For all integers n ≥ 1, n2 + 2n + 1 is composite.

Proof. Let n = 4. Then, n2 + 2n + 1 = 42 + 2(4) + 1 = 25 and 25 is composite.

(5) Prove the following claims:

(a) An integer that is not divisible by 2, cannot be divisible by 4. (Try proving this twice, once

with contraposition and once with contradiction).

(b) There is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

(6) Prove that for n ∈ N,
(a) 1 + 3 + 5 + ··· + (2n − 1) = n².
(b) 1 + 2 + ··· + n = n(n + 1)/2.
(c) 1³ + 2³ + ··· + n³ = [n(n + 1)/2]².
(d) For q ≠ 1 and n > 1,
∑_{k=0}^{n−1} (a + kr)q^k = (a − [a + (n − 1)r]q^n)/(1 − q) + rq(1 − q^{n−1})/(1 − q)².

(7) (Sum of a Geometric Sequence): For all integers n ≥ 0 and all real numbers r with r ≠ 1,
∑_{i=0}^{n} r^i = (r^{n+1} − 1)/(r − 1).
What can we say when n → ∞ for arbitrary values of r? For what values of r is the sum well defined? What is the sum for such values of r?

(8) (a) For all integers n ≥ 2, n³ − n is divisible by 6.
(b) For all integers n ≥ 3, 2ⁿ > 2n + 1.

(9) All prime numbers greater than 6 are either of the form 6n + 1 or 6n + 5, where n is some

natural number.

(10) If |9 − 5x| ≤ 11, then show that x ≥ −2/5 and x ≤ 4.

Chapter 5

Set Theory, Sequence

5.1. Set Theory

5.1.1. Basic Definitions.

We define a set as a “well-specified collection”in order to emphasize that there must be a clear

rule or group of rules that determine membership in the set. Essentially all mathematical objects

can be gathered into sets: numbers, variables, functions, other sets, etc. Examples of sets can be

found everywhere around us. For example, we can speak of the set of all living human beings,

the set of all cities in Europe, the set of all propositions, the set of all prime numbers, and so on.

Each living human being is an element of the set of all living human beings. Similarly each prime

number is an element of the set of all prime numbers. If A is a set and a is an element of A, then

we write a ∈ A. If it so happens that a is not an element of A, then we write a ∉ A. If S is the set whose elements are s, t, and u, then we write S = {s; t; u}. The left brace and right brace visually indicate the “bounds” of the set, while what is written within the bounds indicates the elements of the set. For example, if S = {1; 2; 3; 5}, then 2 ∈ S, but 4 ∉ S. Sets are determined by their

elements. The order in which the elements of a given set are listed does not matter. For example,

{1; 2; 3} and {3; 1; 2} are the same set. It also does not matter whether some elements of a given

set are listed more than once. For instance, {1; 2; 2; 2; 3; 3} is still the set {1; 2; 3}. Many sets are

given a shorthand notation in mathematics as they are used so frequently. A set may be defined by

a property. For instance, the set of all true propositions, the set of all even integers, the set of all

odd integers, and so on. Formally, if P(x) is a property, we write A = {x ∈ S : P(x)} to indicate that

the set A consists of all elements x of S having the property P(x). The colon : is commonly read as

“such that” and is also written as “| ”. So {x ∈ S|P(x)} is an alternative notation for {x ∈ S : P(x)}.

For a concrete example, consider A = {x ∈ R : x² = 2}. Here the property P(x) is x² = 2. Thus, A is the set of all real numbers whose square is two.


38 5. Set Theory, Sequence

[Figure: a Venn diagram of a set B contained in a set A.]

We write B ⊆ A or A ⊇ B.

Definition 5.3. If A is a set, then B is a strict subset of A if every element of B is also an element of A, and there exists at least one element of A which is not an element of B. That is, B is a subset of A if
b ∈ B ⇒ b ∈ A,
and B is a strict subset of A if
b ∈ B ⇒ b ∈ A ∧ ∃a ∈ A s.t. a ∉ B.

Technically we should differentiate between subsets and strict subsets, but economists are usually

sloppy about this. In most courses you will see the operator ⊂ used for both, and you will not be

required to differentiate between the two concepts. Now let X be a universal set, such that we are

interested in subsets of this set.

Definition 5.4. The complement of the set A is the set Ac containing all elements not in A.

We write Ac = {x : x ∉ A}.

For the complement of a set to be clearly understood, we need to know what the relevant

universe is. For example, we can define the set J as all real numbers between 2 and 4, inclusive:

J = {x ∈ R | 2 ≤ x ≤ 4}1.

In this context, the set J c is the set of all real numbers strictly less than 2 or strictly greater than 4:

J c = {x ∈ R | x < 2 ∨ x > 4}.

The “universe” in this case is the set of real numbers. The complement of J doesn’t include all

mathematical objects not in J, nor does it include all numbers not in J (because complex numbers

are excluded). In most cases the universe is clear from the context.

1This can also be written as J = [2, 4], where the square brackets indicate the closed interval between the first entry and the second.

5.1. Set Theory 39

[Figure: a set A and its complement Ac within the universal set.]

Some examples of sets:
D = {2, 4, 10},
B = {x ∈ R s.t. x ≥ 10},
S = the set of all real-valued functions on R.

R Set of real numbers

R+ Set of non-negative real numbers ≥ 0

R++ Set of positive real numbers > 0

Z Set of integers (−10, 0, 2, 451, etc.)

Z+ Set of non-negative integers ≥ 0 (also called N)

Z++ Set of integers > 0 (sometimes also called N)

Q Set of rational numbers (numbers that can be expressed as fractions)

C Set of complex numbers

∅ Empty set or null set

Ω Universal set

R2 Set of pairs of real numbers

The last set R² is shorthand notation for the Cartesian product R × R. This notation is acceptable for any number n ∈ Z₊₊ of sets. You will often encounter proofs and theorems defined on the set Rⁿ, which is the general way of describing the space of n-vectors, each element of which is a real number.


Definition 5.5. Union : The union of n sets is the set containing all elements from all n sets. We

write

A ∪ B = {x : x ∈ A ∨ x ∈ B}.

∪ᵢ₌₁ⁿ Aᵢ = A₁ ∪ A₂ ∪ ··· ∪ Aₙ = {x : for some i = 1, ··· , n, x ∈ Aᵢ}

Definition 5.6. Intersection : The intersection of n sets is the set containing the elements common

to all n sets. We write

A ∩ B = {x : x ∈ A ∧ x ∈ B}.

∩ᵢ₌₁ⁿ Aᵢ = A₁ ∩ A₂ ∩ ··· ∩ Aₙ = {x : for all i = 1, ··· , n, x ∈ Aᵢ}

By De Morgan's laws, the complement of a union is the intersection of the complements, and vice versa:

( ∪ⱼ₌₁ⁿ Aⱼ )ᶜ = ∩ⱼ₌₁ⁿ Aⱼᶜ ;  ( ∩ⱼ₌₁ⁿ Aⱼ )ᶜ = ∪ⱼ₌₁ⁿ Aⱼᶜ.

Definition 5.7. Exclusion : The exclusion of the set B from the set A is the set of all elements in

A that are, in addition, not elements of B. We write

A \ B = {x ∈ A | x ∉ B}.


[Figures: Venn diagrams of A \ B (also written A − B) and B \ A (also written B − A).]

Proposition 1. (A \ B) ∩ (B \ A) = ∅.

Proof.
A \ B = A ∩ BC ⊆ BC,
B \ A = B ∩ AC ⊆ B.
Hence, (A \ B) ∩ (B \ A) ⊆ BC ∩ B = ∅.


Exercise 5.2. Let B, and A₁, ··· , Aₙ be subsets of X. Then,
B − ∪ⱼ₌₁ⁿ Aⱼ = ∩ⱼ₌₁ⁿ (B − Aⱼ) ;  B − ∩ⱼ₌₁ⁿ Aⱼ = ∪ⱼ₌₁ⁿ (B − Aⱼ).

Next we consider the sets whose elements are sets themselves. For example, let A, B, and C be

subsets of X, then the collection A = {A, B,C} is a set, whose elements are A, B and C. We call a

set whose elements are subsets of X, a family of subsets of X, or a collection of subsets of X. The

notation we follow would be, the lower case letters refer to the elements of X, upper case letters

refer to subsets of X and script letters refer to families of subsets of X.

Any subset of the empty set is empty. Observe that the empty set ∅ is a subset of every set X. It is possible to form a non-empty set whose only element is the empty set, i.e., {∅}. In this case {∅} is a singleton. Also ∅ ⊂ {∅} and ∅ ∈ {∅}.

Definition 5.8. Let A be any subset in X. The power class of A or the power set of A is the family

of all subsets of A. We denote the power set of A by P (A).

Specifically,

P(A) = {B : B ⊆ A}

The power set of the empty set is P(∅) = {∅}, i.e., the singleton of ∅. The power set of a singleton is P({a}) = {∅, {a}}. Note that the power set of A always contains A and ∅. In general, if A is a finite set with n elements, then P(A) contains 2ⁿ elements.

Exercise 5.3. Prove that if A is a finite set with n elements, then P (A) contains 2n elements.
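While Exercise 5.3 calls for a proof (e.g., by induction), the count 2ⁿ can at least be verified computationally for small sets. The following Python sketch (illustrative, not part of the notes) enumerates P(A) with itertools.

```python
from itertools import chain, combinations

# Illustrative sketch (not part of the notes): enumerate the power set
# P(A) of a finite set A and check that it contains 2^n elements.
def power_set(A):
    A = list(A)
    # subsets of every size k = 0, 1, ..., n, represented as tuples
    return list(chain.from_iterable(combinations(A, k) for k in range(len(A) + 1)))

A = {1, 2, 3}
P = power_set(A)
print(len(P) == 2 ** len(A))   # True: 8 subsets, including () and the whole set
```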

There are a number of set identities that the set operations of union, intersection, and set difference

satisfy. They are very useful in calculations with sets. Below we give a table of such set identities,

where U is a universal set and A, B, and C are subsets of U.

• Commutative Laws: A ∪ B = B ∪ A ; A ∩ B = B ∩ A
• Idempotent Laws: A ∪ A = A ; A ∩ A = A
• Absorption Laws: A ∩ (A ∪ B) = A ; A ∪ (A ∩ B) = A
• Identity Laws: A ∪ ∅ = A ; A ∩ U = A
• Complement Laws: A ∪ Ac = U ; A ∩ Ac = ∅
• Complements of U and ∅: Uᶜ = ∅ ; ∅ᶜ = U

(b) (A ∪ B) \ (C \ A) = A ∪ (B \ C).
(c) A ∩ (((B ∪ Cᶜ) ∪ (D ∩ Eᶜ)) ∩ ((B ∪ Bᶜ) ∩ Aᶜ)) = ∅.


We will discuss additional concepts in set theory after we have gone over some elementary

exposition of functions and sequences.

5.3. Functions

Definition 5.9. A correspondence consists of:
(a) a set D, called the domain;
(b) a set R, called the range;
(c) a mapping f (x) which assigns at least one element from R to each element x ∈ D.

Definition 5.10. A function consists of:
(a) a set D, called the domain;
(b) a set R, called the range;
(c) a mapping f (x) which assigns exactly one element from R to each element x ∈ D.

f (x) = x3 , D = R, R=R

f (x) = 0, D = R, R = R.

The range need not be exhausted but the domain must be.

The set of all functions is a strict subset of the set of all correspondences. This is the same as

saying that all functions are correspondences, but not the other way around. From here onwards it’s

critical that you specify the domain and the range when defining or using a function. For example

these two functions, f and g, which have the same mapping rule but different ranges, are not the same function, even though in practice they produce identical results.2

Definition 5.11. The argument of a function is the element from the domain that is mapped into

the range and the value of a function is the element from the range that is the destination of the

mapping.

Definition 5.12. A real-valued function is a function whose range is the set R or any subset of R.

From the above definition 5.12, the definitions of integer-valued functions, complex-valued

functions, etc., should be clear.

Definition 5.13. Let f : D → R and let A ⊆ D. We let f (A) represent the subset { f (x) : x ∈ A } of R. The set f (A) is called the image of A in R. If B ⊆ R, we let f ⁻¹(B) represent the subset { x ∈ D : f (x) ∈ B } of D. The set f ⁻¹(B) is called the pre-image of B in D.

Note that the image of a function may be equivalent to the range, or it may be a strict subset of

the range. In the above example, the image of the function f is a strict subset of its range, but the

image of g is equal to its range.

5.4. Vector Space

A vector space is defined over a field, which is a set on which two operations + and · (called addition and multiplication respectively) are defined. The formal definition of a field is as follows:

Definition 5.14. A field F is a set on which two operations, called addition (+) and multiplication

(·), are defined so that for each pair of elements x, y in F there are unique elements x + y and x · y

in F, such that the following conditions hold for all a, b, c in F.

(i) Commutativity of addition and multiplication:
a + b = b + a, and a · b = b · a.

(ii) Associativity of addition and multiplication:
(a + b) + c = a + (b + c), and (a · b) · c = a · (b · c).

(iii) Existence of identity elements for addition and multiplication: There exist elements 0 and 1 in F such that
0 + a = a, and 1 · a = a.

2The difference between the two is that the range of f is all real numbers, and the range of g is the set of non-negative real numbers.

This is inconsequential, since the mapping in both cases takes all elements from the domain and assigns them to a non-negative real

number. But the two functions are still not the same.


(iv) Existence of inverses for addition and multiplication: For each element a in F and for each non-zero element b in F, there exist elements c and d in F such that
a + c = 0, and b · d = 1.

(v) Distributivity of multiplication over addition:
a · (b + c) = a · b + a · c.

Examples of fields include the set of real numbers R with the usual definitions of addition and multiplication, and the set of rational numbers Q with the usual definitions of addition and multiplication.

Definition 5.15. A vector space V over a field F consists of a set on which two operations, called

addition (+) and scalar multiplication (·), are defined so that for each pair of elements x, y in V

there is a unique element x + y in V , and for each element a in the field F and for each element x in

V , there is a unique element ax in V, such that the following conditions hold.

∀x, y ∈ V, x + y = y + x;
∀x, y, z ∈ V, (x + y) + z = x + (y + z);
∃ an element 0 ∈ V such that x + 0 = x ∀x ∈ V;
∀x ∈ V, ∃ some element y ∈ V such that x + y = 0;
1 · x = x ∀x ∈ V;
∀α, β ∈ F, ∀x ∈ V, (αβ) · x = α · (β · x);
∀α ∈ F, ∀x, y ∈ V, α · (x + y) = (α · x) + (α · y);
∀α, β ∈ F, ∀x ∈ V, (α + β) · x = α · x + β · x.

In order to show that any space is a vector space, we simply need to show that the properties in

the above definition are satisfied.

Definition 5.16. The Cartesian Product of sets A and B is the set of pairs (a, b) satisfying a ∈

A ∧ b ∈ B. We write

A × B = {(a, b) | a ∈ A ∧ b ∈ B}.

The Cartesian product is the two set case of the general “cross product” of sets, which is the

same concept defined for any number of sets. For example using sets A, B, C and D we could define

E = A × B × C × D, and a typical element of E would be (a, b, c, d) for some a ∈ A, b ∈ B, c ∈ C

and d ∈ D.

Example 5.2.
R³ = R × R × R = { (x, y, z) | x ∈ R ∧ y ∈ R ∧ z ∈ R }
R²₊ = R₊ × R₊ ; R²₊₊ = R₊₊ × R₊₊.

The order of the sets in the cross-product does matter as the following example shows.

Example 5.3. Let
A = {1, 2, 3} , B = {2, 4} .
Then
A × B = { (1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4) }
B × A = { (2, 1), (2, 2), (2, 3), (4, 1), (4, 2), (4, 3) }.
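Example 5.3 can be reproduced with itertools.product; the following is an illustrative Python sketch, not part of the notes.

```python
from itertools import product

# Cartesian products A x B and B x A from Example 5.3: order matters.
A = [1, 2, 3]
B = [2, 4]
AxB = list(product(A, B))
BxA = list(product(B, A))
print(AxB)          # [(1, 2), (1, 4), (2, 2), (2, 4), (3, 2), (3, 4)]
print(AxB == BxA)   # False: A x B and B x A contain different pairs
```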

(a) The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av.
(b) The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if u · v = 0.
(c) The angle between vectors u and v is arccos( (u · v) / (∥u∥ · ∥v∥) ).

5.4.1. Metric.

Definition 5.17. A distance function is a real-valued function d : V ×V → R which satisfies

(i) Non-negativity:
∀x, y ∈ V, d(x, y) ≥ 0, with equality if and only if x = y


(ii) Symmetry:

∀x, y ∈ V, d(x, y) = d(y, x)

(iii) Triangle inequality:
∀x, y, z ∈ V, d(x, z) ≤ d(x, y) + d(y, z).

Any function satisfying these three properties is a distance function. A distance function is also

called a metric. The space V with elements x, y, which would be called points, is a metric space if

we can associate a distance function to it.

Example 5.4.

(a) The set of real numbers R with the distance function d(x, y) ≡ |x − y|.

(b) The set of complex numbers C with the distance function d(w, z) ≡ |w − z|.

(c) The Euclidean distance on V = Rⁿ:
d(x, y) = √( (x₁ − y₁)² + ··· + (xₙ − yₙ)² ).

(d) The discrete metric, where V is any vector space:
d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.

(e) In V = R²,
d(x, y) = max{ |x₁ − y₁| , |x₂ − y₂| }.

(f) If d(x, y) is a metric, then
d₁(x, y) = d(x, y) / (1 + d(x, y))
is also a metric. This allows us to construct any number of metrics from any given metric.
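The bounded metric of example (f) is easy to experiment with; the following Python sketch (illustrative, not part of the notes) builds d₁ from the usual distance on R.

```python
# Illustrative sketch: from the usual distance d on R, construct the
# bounded metric d1(x, y) = d(x, y) / (1 + d(x, y)), which is always < 1.
def d(x, y):
    return abs(x - y)

def d1(x, y):
    return d(x, y) / (1 + d(x, y))

print(d(0, 100))        # 100
print(d1(0, 100) < 1)   # True: d1 compresses every distance below 1
print(d1(3, 3) == 0)    # True: d1 still vanishes exactly when x == y
```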

(g) Let X be a set of people of the same generation with a common ancestor, for example all grandchildren of a grandmother. The distance d(x, y) between any two individuals x and y is the number

of generations one has to go back along the female lines to find the first common ancestor. For

example, distance between two sisters is one.

5.4. Vector Space 49

(h) Let X be the set of n-letter words in an m-character alphabet A = {a₁, a₂, ··· , aₘ}, meaning

X = {(x1 , x2 , · · · , xn )|xi ∈ A}. We define the distance d(x, y) between two words x = (x1 , · · · , xn )

and y = (y1 , · · · , yn ) to be the number of places in which the words have different letters. That

is,

d(x, y) = #{i|xi ̸= yi }.
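The word-distance of example (h) is commonly known as the Hamming distance; an illustrative Python sketch (not part of the notes):

```python
# Hamming distance: the number of positions at which two equal-length
# words differ (illustrative sketch, not part of the notes).
def hamming(x, y):
    assert len(x) == len(y), "the two words must have the same length n"
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

print(hamming("karolin", "kathrin"))   # 3: the words differ in 3 places
print(hamming("abc", "abc"))           # 0: identical words are at distance 0
```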

Exercise 5.5. Try to show the last two examples are indeed metric functions.

5.4.2. Norm.

Definition 5.18. A norm is a real-valued function written ∥ · ∥: V → R, defined on vector space V ,

which satisfies

(i) Non-negativity:
∀x ∈ V, ∥x∥ ≥ 0, with equality if and only if x = 0,

(ii) Homogeneity:

∀x ∈ V, α ∈ R, ∥ α · x ∥ = | α | · ∥ x ∥,

(iii) Triangle inequality:
∀x, y ∈ V, ∥x + y∥ ≤ ∥x∥ + ∥y∥.

Example 5.5.
(a) The Euclidean norm:
∀x ∈ Rⁿ, ∥x∥ = √( x₁² + ··· + xₙ² ).
(b) The sum of absolute values:
∀x ∈ Rⁿ, ∥x∥ = ∑ᵢ₌₁ⁿ |xᵢ|.

5.4.3. Inner Product.

Definition 5.19. An inner product is a real-valued function ⟨·, ·⟩ : V × V → R, defined on vector

space V , which satisfies

(i) Symmetry:
∀x, y ∈ V, ⟨x, y⟩ = ⟨y, x⟩,

(ii) Positive definiteness:
∀x ∈ V, ⟨x, x⟩ ≥ 0, with equality if and only if x = 0,


(iii) Bilinearity:
∀x, y, z ∈ V, ∀α, β ∈ R, ⟨αx + βy, z⟩ = α ⟨x, z⟩ + β ⟨y, z⟩.

Example 5.6. V = Rn . Dot Product

∀x, y ∈ V, x · y = x1 y1 + · · · + xn yn .

Definition 5.20. A metric space (V, d) is a space V equipped with a distance function d.

A normed metric space (V, ∥·∥) is a metric space V equipped with a norm ∥·∥. An inner product

space (V, ⟨·, ·⟩) is a space V and an inner product ⟨·, ·⟩.

5.4.4. Cauchy-Schwarz Inequality. The Cauchy-Schwarz inequality states that for all vectors x and y of an inner product space,

|⟨x, y⟩|2 6 ⟨x, x⟩ · ⟨y, y⟩,

where ⟨·, ·⟩ is the inner product. Equivalently, by taking the square root of both sides, and referring

to the norms of the vectors, the inequality is written as

|⟨x, y⟩| 6 ∥x∥ · ∥y∥.

Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical

sense, they are parallel or one of the vectors is equal to zero).

For the dot product on Rⁿ, the inequality can be written in a more explicit way as follows:

|x₁y₁ + ··· + xₙyₙ|² ≤ (x₁² + ··· + xₙ²) · (y₁² + ··· + yₙ²).
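The explicit form of the inequality can be spot-checked numerically; the following is an illustrative Python sketch (not part of the notes) over random vectors in R⁵.

```python
import random

# Numeric spot-check of Cauchy-Schwarz: |x . y|^2 <= (x . x)(y . y).
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

random.seed(0)   # arbitrary seed, for reproducibility
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    # small tolerance guards against floating-point rounding
    assert dot(x, y) ** 2 <= dot(x, x) * dot(y, y) + 1e-9
print("Cauchy-Schwarz held in all 1000 trials")
```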

5.5. Sequences

Definition 5.21. A sequence is a function
{xₙ} : N → Rᵐ
that gives us an ordered infinite list of points in Rᵐ.

Another notation for sequence is ⟨xn ⟩ where ⟨xn ⟩ ≡ (x1 , x2 , · · · ). As we saw above, sets are

unordered collections of elements. Even if there is an intuitive ordering to the elements of a set,

with respect to the definition of the set itself there is no “first element” or “last element”. Sequences,

however, are sets for which the elements are assigned a particular order.


Example 5.7.
S₁ = { 1/n , n ∈ N } is a sequence in R
S₂ = { (n, 1/n) , n ∈ N } is a sequence in R²
The interpretation of S₁ is that the nth element of the sequence is given by 1/n. So we could also have written S₁ = {1, 1/2, 1/3, 1/4, ···}. Similarly, S₂ = ( (1, 1/1), (2, 1/2), ··· ). Note the implication

of this definition is that the elements of the sequence are numbered from 1 onwards, not from 0.

It’s usually assumed in the first year courses that the first element of a sequence is numbered “1”

not “0”, but this need not always be the case. Note that order of appearance of elements matters

{1, 2, 3, 4, · · · } ̸= {2, 1, 3, 4, · · · }

and elements can be repeated,

S = {1, 1, 1, · · · } is a sequence.

Definition 5.22. We say that x is a limit point of {xₙ}, n ∈ N, if
∀ε > 0 there exist an infinite number of terms xₙ such that d(x, xₙ) < ε.

Example 5.8. (a) Let xₙ = (−1)ⁿ. This sequence has two limit points: a = −1 and a = 1.
(b) This sequence has three limit points: a = −1, 0, 1.
(c) The sequence { 1, −1, 1/2, −1, 1/3, −1, ··· } has two limit points: 0 and −1.
(d) Let xₙ = n^((−1)ⁿ). This sequence has a limit point a = 0.

Definition 5.23. The sequence {xₙ} converges to x (has a limit x) if
∀ε > 0, ∃N ∈ N such that d(xₙ, x) < ε ∀n > N.
We write
x = lim_{n→∞} xₙ.


Definition 5.23 is a source of a lot of difficulty. However it’s one of the most important defini-

tions in macroeconomic theory and in parts of micro, and it’s worth forcing yourself to fully absorb

it before the end of the Review. The intuition behind limits is not as difficult as the formal defini-

tion. A sequence converges to x if after choosing any very, very tiny number (ε), you can identify a

point in the sequence (N) after which all of the remaining members of the sequence are no farther

than ε from some particular value x. This concept is only well-defined for infinite sequences. In

most economic theory, the elements of a convergent sequence never actually reach their limiting

value. They simply get closer and closer to it as the sequence progresses.

Example 5.9. The sequence xₙ = 1/n is a convergent sequence, with limit 0. We must show that for every ε > 0 there exists N ∈ N such that
∀n > N, d(xₙ, 0) = |xₙ| < ε.
Now,
|xₙ| < ε ⇔ 1/n < ε ⇔ n > 1/ε.
So by choosing N to be any natural number greater than 1/ε, we have
∀n > N, d(xₙ, 0) = |xₙ| = 1/n < 1/N < ε.
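The choice of N in Example 5.9 can be made mechanical; the Python sketch below (illustrative, not part of the notes) picks N > 1/ε and verifies that a long stretch of the tail stays within ε of the limit 0.

```python
import math

# For x_n = 1/n: given epsilon > 0, any N > 1/epsilon works in the
# definition of convergence (illustrative sketch, not part of the notes).
def N_for(epsilon):
    return math.floor(1 / epsilon) + 1   # a natural number greater than 1/epsilon

for eps in (0.5, 0.1, 0.003):
    N = N_for(eps)
    # check a long stretch of the tail: d(x_n, 0) = 1/n < eps for n > N
    assert all(abs(1 / n) < eps for n in range(N + 1, N + 1000))
print("definition of convergence verified for the sampled epsilon values")
```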

Definition 5.24. A sequence {xn } is bounded if

∃ B ∈ R such that d (xn , 0) 6 B, ∀ n ∈ N.

Definition 5.25. A sequence {xn } is unbounded if

∀ B ∈ R ∃ n ∈ N such that d (xn , 0) > B.

Example 5.10. The sequence {1, 0, 1, 0, · · · } is bounded. The sequence {xn } , xn = n, n ∈ N is

unbounded.

Definition 5.26. The tail of a sequence {xn } is the continuation of {xn } after some m ∈ N, that is

{xm+1 , xm+2 , · · · }.

Theorem 5.1. A sequence {xn } is bounded if and only if the tail of {xn } is bounded.

Proof. If {xₙ} is bounded, then in particular its tail is bounded. Next let us assume that the tail of the sequence {xₙ} is bounded and show that {xₙ} is bounded. That is,
∃B such that |xₙ| ≤ B, ∀n > m.


Let
B′ = max { |x₁|, |x₂|, ··· , |xₘ|, B } .
Then B′ is a bound for {xₙ}:
∀n ∈ N, |xₙ| ≤ B′.

Definition 5.27. If {xₙ}ₙ₌₁^∞ is a sequence, a subsequence {x_{n_k}}ₖ₌₁^∞ is obtained from {xₙ} by crossing out some (possibly infinitely many) elements, while preserving the order.

Example 5.11. Sequence: {xₙ} = { 1, −1, 1/2, −1, 1/3, −1, ··· }.
Subsequence: {x_{n_k}} = {−1, −1, −1, ···} or { 1, 1/2, 1/3, ··· }.

Definition 5.28. A sequence {xₙ} is monotone increasing if
∀n ∈ N, xₙ₊₁ ≥ xₙ
and is monotone decreasing if
∀n ∈ N, xₙ₊₁ ≤ xₙ.
Monotone increasing and monotone decreasing sequences are together called monotonic sequences.

Claim 5.1. Let {xₙ} be monotonic. Then it is convergent if and only if it is bounded.

The proof of this claim relies on a property of the real numbers, which we state next.

Proposition 2. Nested Interval Property Suppose that I1 = [a1 , b1 ], I2 = [a2 , b2 ], · · · , where I1 ⊇

I2 ⊇ · · · , and limn→∞ (bn − an ) = 0. Then there exists exactly one real number common to all

intervals In .

Proof. Note that we have a₁ ≤ a₂ ≤ a₃ ≤ ··· ≤ aₙ ≤ ··· ≤ bₙ ≤ ··· ≤ b₂ ≤ b₁. Then each bᵢ is an upper bound for the set A = {a₁; a₂; ···}. In other words, the sequence {aₙ} is monotone increasing and bounded. Therefore, limₙ→∞ aₙ = a exists and a = sup{aₙ} ≤ bₖ for each natural number k. Hence aₖ ≤ a ≤ bₖ for every k ∈ N, i.e., a is contained in each Iₖ. Now let b be contained in Iₙ for all n ∈ N. Then aₙ ≤ b ≤ bₙ for every n ∈ N, so 0 ≤ (b − aₙ) ≤ (bₙ − aₙ) for each n. Then limₙ→∞ (b − aₙ) = 0. It follows that b = limₙ→∞ aₙ = a, and so a is the only real number common to all intervals.

Theorem 5.2. Bolzano-Weierstrass Theorem Every bounded sequence {xn } has a convergent sub-

sequence.


Proof. Let {xₙ}ₙ₌₁^∞ be bounded. There is B ∈ R such that |xₙ| ≤ B for all n ∈ N. We prove the theorem in the following steps.

Step 1 We construct a nested family of intervals Iₙ such that
(i) Iₙ is a closed interval [aₙ, bₙ] where bₙ − aₙ = 2B/2ⁿ; and
(ii) {i : xᵢ ∈ Iₙ} is infinite.

We let I₀ = [−B, B]. This closed interval has length 2B and xᵢ ∈ I₀ for all i ∈ N. Suppose we have Iₙ = [aₙ, bₙ] satisfying (i) and (ii). Let cₙ be the midpoint (aₙ + bₙ)/2. Each of the intervals [aₙ, cₙ] and [cₙ, bₙ] is half the length of Iₙ; thus they both have length (1/2) · (2B/2ⁿ) = 2B/2ⁿ⁺¹. If xᵢ ∈ Iₙ, then xᵢ ∈ [aₙ, cₙ] or xᵢ ∈ [cₙ, bₙ], possibly both. Thus at least one of the sets {i : xᵢ ∈ [aₙ, cₙ]} or {i : xᵢ ∈ [cₙ, bₙ]} is infinite. If the first set is infinite, we let aₙ₊₁ = aₙ and bₙ₊₁ = cₙ. If the second is infinite, we let aₙ₊₁ = cₙ and bₙ₊₁ = bₙ. Let Iₙ₊₁ = [aₙ₊₁, bₙ₊₁]. Then (i) and (ii) are satisfied. By the Nested Interval Property, there exists a ∈ ∩ₙ₌₁^∞ Iₙ.

Step 2 We next find a subsequence converging to a. Choose i₁ ∈ N such that x_{i₁} ∈ I₁. Suppose we have iₙ. We know that {i : xᵢ ∈ Iₙ₊₁} is infinite. Thus we can choose iₙ₊₁ > iₙ such that x_{iₙ₊₁} ∈ Iₙ₊₁. This allows us to construct a sequence of natural numbers i₁ < i₂ < i₃ < ··· where x_{iₙ} ∈ Iₙ for all n ∈ N.

Step 3 We show that the subsequence {x_{iₙ}}ₙ₌₁^∞ converges to a. Let ε > 0. Choose N such that ε > 2B/2^N. Suppose n > N. Then x_{iₙ} ∈ Iₙ and a ∈ Iₙ. Thus |x_{iₙ} − a| ≤ 2B/2ⁿ ≤ 2B/2^N < ε for all n > N.
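The halving construction of Step 1 can be imitated numerically. The Python sketch below (illustrative only, not part of the notes, with "infinitely many" necessarily approximated by counts over a finite sample) homes in on a limit point of the bounded sequence xₙ = 1/n.

```python
# Bisection in the spirit of the Bolzano-Weierstrass proof: repeatedly
# halve the interval, keeping a half that still contains many terms.
def bisect_limit_point(x, B, steps=40):
    a, b = -B, B
    for _ in range(steps):
        c = (a + b) / 2
        left = sum(1 for v in x if a <= v <= c)
        right = sum(1 for v in x if c <= v <= b)
        # the proof keeps a half with infinitely many terms; on a finite
        # sample we keep the half containing at least as many of them
        if left >= right:
            b = c
        else:
            a = c
    return (a + b) / 2

x = [1 / n for n in range((1), 5001)]   # bounded sequence with limit point 0
print(abs(bisect_limit_point(x, B=1.0)) < 0.01)   # True: the result is close to 0
```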

Remark 5.1. Every bounded sequence {xn } has at least one limit point x̄.

Definition 5.29. A sequence {xₙ} is a Cauchy sequence if
∀ε > 0, ∃N such that ∀n, m > N, d(xₙ, xₘ) < ε.

After N, each element is close to every other element or in other words, the elements lie within

a distance of ε from each other.

(i) Every convergent sequence {xn } (with limit x, say) is a Cauchy sequence, since, given any

real number ε > 0, beyond some fixed point, every term of the sequence is within distance 2ε

of x, so any two terms of the sequence are within distance ε of each other.

(ii) Every Cauchy sequence of real numbers is bounded (since for some N, all terms of the se-

quence from the N-th position onwards are within distance 1 of each other, and if M is the

5.5. Sequences 55

largest absolute value of the terms up to and including the N-th, then no term of the sequence

has absolute value greater than M + 1).

(iii) In any metric space, a Cauchy sequence which has a convergent subsequence with limit x is

itself convergent (with the same limit), since, given any real number ε > 0, beyond some fixed

point in the original sequence, every term of the subsequence is within distance 2ε of x, and

any two terms of the original sequence are within distance 2ε of each other, so every term of

the original sequence is within distance ε of x.
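The definition can be explored numerically. The sketch below (a hypothetical illustration) checks the Cauchy condition for xn = 1/n over a finite range; a finite check can only suggest, not prove, the property. The choice N ≥ 1/ε works here because |1/n − 1/m| < 1/N whenever n, m > N.

```python
def eps_close_beyond(x, N, eps, horizon=5000):
    """Finite check of the Cauchy condition: |x(n) - x(m)| < eps for all
    N < n, m < horizon, via the range max - min of the tail values."""
    vals = [x(n) for n in range(N + 1, horizon)]
    return max(vals) - min(vals) < eps

eps = 1e-3
N = int(1 / eps) + 1               # for x_n = 1/n, any N >= 1/eps suffices
print(eps_close_beyond(lambda n: 1 / n, N, eps))        # True: 1/n is Cauchy
print(eps_close_beyond(lambda n: (-1) ** n, N, eps))    # False: terms stay 2 apart
```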

Theorem 5.3. Every sequence has at most one limit.

Proof. By contradiction. We use the intuition that the points of the sequence cannot end up close to both r1 and r2 at the same time. Let the sequence {xn} converge to two distinct limits r1 and r2. It is enough to exhibit one ε for which the definition of convergence fails. Choose
ε = d(r1, r2)/4 = |r1 − r2|/4.
Since r1 is a limit,
∃N1, ∀n > N1, |xn − r1| < ε
and since r2 is a limit,
∃N2, ∀n > N2, |xn − r2| < ε.
Let N = max {N1, N2}. Then
∀n > N, |xn − r1| + |xn − r2| < 2ε.
By the triangle inequality,
4ε = |r1 − r2| = |(xn − r2) − (xn − r1)| ≤ |xn − r1| + |xn − r2| < 2ε,
which is a contradiction since ε > 0.

Remark 5.2. A sequence can have more than one limit point.

(a) Every convergent sequence is bounded, BUT a bounded sequence need not be convergent. For example, {1, −1, 1, −1, · · · }.
(b) If xn → x and yn → y, then
xn + yn → x + y,
xn · yn → x · y,
and if yn ≠ 0, ∀n ∧ y ≠ 0,
xn/yn → x/y.

(c) If {xn} → x and xn ≥ b (respectively xn ≤ b) ∀n ∈ N, then x ≥ b (respectively x ≤ b). Note that even when the inequality xn > b is strict for every n, the limit satisfies only x ≥ b.

(d) x is a limit point of {xn} if and only if ∃ a subsequence {xn(k)}∞_{k=1} of the sequence {xn} such that xn(k) → x.

(e) A sequence of vectors {xn} ⊂ R^N, xn = (x1n, x2n, · · · , xNn)′, converges to a limit x = (x1, x2, · · · , xN)′ if and only if {xin} → xi, ∀ i = 1, 2, · · · , N.

Definition 5.30. A metric space (X, d) is complete if every Cauchy sequence in X converges to a

limit in X.

5.6. Sets in Rn

Now we are ready for additional useful concepts in set theory. We begin with some definitions.

Definition 5.31. A set A on the real line is bounded if ∃B ∈ R such that ∀x ∈ A, |x| ≤ B.

Theorem 5.4. For every non-empty bounded set A ⊂ R, ∃ a real number sup A such that
∀ x ∈ A, x ≤ sup A,
and if y satisfies x ≤ y for all x ∈ A, then y ≥ sup A;
that is, sup A is the least upper bound for A.


Example 5.12. Let A = [0, 1], B = (0, 1), C = [0, 1), D = (0, 1]. For each of these sets,
sup = 1, inf = 0.
This example shows that the sup and inf of a set need not belong to the set. If sup A belongs to the set A, it is called max {A}, and if inf {A} belongs to the set A, it is called min {A}.

Definition 5.32. Point x is a limit point of a set A if every neighborhood of x contains a point of A

different from x : x is a limit point of A if

∀ε > 0, ∃y ∈ A, y ̸= x ∧ d (x, y) < ε.

Theorem 5.5. Bolzano-Weierstrass Theorem for sets Every bounded infinite set has at least one

limit point.

Example 5.13. For the set A = (0, 1), x = 0 is a limit point of the set A.

This shows that limit point of a set need not belong to the set.

Theorem 5.6. Point x is a limit point of set A ⊆ Rn if and only if ∃ a sequence
{xn} with ∀n ∈ N, xn ≠ x ∧ xn ∈ A ∧ xn → x.

Definition 5.33. An open ball in Rn centered at x with radius r > 0 is

Br(x) = {y ∈ Rn | d(x, y) < r}.
Note that the open ball does not include its boundary points.
Example 5.14. The open unit ball in R2 is {y ∈ R2 | y1² + y2² < 1}.

Definition 5.34. A set A ⊆ Rn is open if ∀x ∈ A, ∃r > 0 such that Br(x) ⊆ A.

Around any point in an open set, one can draw an open ball which is completely contained in

the set.

Example 5.15. The following sets are open: B = (−∞, 0); R; ∅.

Definition 5.35. The set A is closed if A contains all its limit points (it contains its boundary).

Theorem 5.7. Set A ⊆ Rn is closed if and only if AC is open.

Example 5.16. The following sets are closed: A = [2, 5], since A^C = (−∞, 2) ∪ (5, ∞) is open; R; ∅.
There are two sets which are both open and closed: the empty set and the universal set. The empty set ∅ is open since
int ∅ = ∅
and ∅ is closed since
bd ∅ = ∅ ⊆ ∅.

The universal set is the complement of the empty set and so is both open and closed. There can be sets which are neither open nor closed: A = (0, 1]. The following theorem characterizes closed sets using convergent sequences.

Theorem 5.8. A set A ⊆ Rn is closed if and only if every convergent sequence of points {xn } ∈ A

has its limit x ∈ A.

Example 5.17. The budget set

{ }

B (p, I) = y ∈ Rn+ | p · y 6 I ,

where p ∈ Rn++ and I ∈ R++ , is closed.

[Figure 5.5. Budget set B(p, I): axes Good 1 and Good 2, intercepts I/p1 and I/p2, |slope| = p1/p2.]

Proof. Take any sequence {xn} with xn ∈ B(p, I) ∀n and xn → x. Then
xn ≥ 0, ∀n ⇒ x ≥ 0,
p · xn ≤ I, ∀n ⇒ p · x ≤ I
⇒ x ∈ B(p, I) ⇒ B(p, I) is closed.

Theorem 5.9. (a) The union of any collection of open sets is open; (b) the intersection of a finite number of open sets is open; (c) the intersection of any collection of closed sets is closed; (d) the union of a finite number of closed sets is closed.

Remark 5.3. The finiteness requirements in (b) and (d) are necessary, as the following examples show.
For (b): An = (−1/n, 1/n), n ∈ N, gives ∩∞_{n=1} An = {0}, which is closed.
For (d): Bn = [1/n, 2], n ∈ N, gives ∪∞_{n=1} Bn = (0, 2], which is not closed.

Definition 5.36. A set A ⊆ Rn is compact if it is closed and bounded.

Example 5.18.
A = [1, 2] is compact.
R is closed but not bounded: NOT compact.
B = (1, 2] is bounded but not closed: NOT compact.

Definition 5.37. Equivalently, a set A ⊆ Rn is compact if every sequence of points {xn} ⊆ A has a limit point x ∈ A.

Definition 5.38. A set A ⊆ Rn is convex if ∀x, y ∈ A, ∀λ ∈ (0, 1),

λx + (1 − λ) y ∈ A.

It will be useful to draw some sets to differentiate between convex and non-convex sets.
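Definition 5.38 can also be spot-checked numerically. The sketch below (a hypothetical illustration; the membership predicates are made up) samples values of λ in (0, 1): a failed check disproves convexity, while passing checks merely suggest it.

```python
import random

random.seed(0)   # reproducible sampling

def convex_combination_in(A, x, y, trials=200):
    """Spot-check: is lam*x + (1-lam)*y in A for sampled lam in (0, 1)?"""
    for _ in range(trials):
        lam = random.random()
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        if not A(z):
            return False
    return True

disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1              # convex set in R^2
annulus = lambda p: 0.5 <= p[0] ** 2 + p[1] ** 2 <= 1    # not convex

print(convex_combination_in(disk, (1.0, 0.0), (-1.0, 0.0)))     # True
print(convex_combination_in(annulus, (1.0, 0.0), (-1.0, 0.0)))  # False
```

The annulus fails because the segment joining (1, 0) and (−1, 0) passes through the hole around the origin.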


Chapter 6

Problem Set 2

(1) Verify that each of the following defines a metric.
(a) The Manhattan distance: for x, y ∈ Rn,
(6.1) d(x, y) = ∑_{i=1}^{n} |xi − yi|, ∀ x, y ∈ Rn.

(b) For x, y ∈ R2 ,

(6.2) d(x, y) = max{| x1 − y1 |, | x2 − y2 |}

(c) Let d(·, ·) be a metric; then
(6.3) d1(x, y) = d(x, y) / (1 + d(x, y)).

(2) Determine whether the set
(6.4) ∪∞_{n=1} [1/n, 2/n]
is compact.

(3) Prove or disprove:
(a) (A ∪ B)c ⊆ Ac ∪ Bc;
(b) (A ∪ B)c ⊇ Ac ∪ Bc.

(4) Suppose A, B, and C are sets which satisfy both of the following two conditions

(a) A ∪C = B ∪C,

(b) A ∩C = B ∩C.


Prove that A = B.

(5) Let P(A) denote the power set of A (the set of all subsets of A).
(a) Prove that P(A) ∩ P(B) = P(A ∩ B).
(b) Prove that P(A) ∪ P(B) ⊂ P(A ∪ B).
(c) Give an example of sets A and B such that P(A) ∪ P(B) ≠ P(A ∪ B).

(6) Let
(6.5) C = {(x1, x2) ∈ R2 : x1² + x2² = 1}
with the operations
(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2), and c(a1, a2) = (ca1, ca2).
Is C a vector space? Justify your answer.

(7) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V

and c ∈ R define

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 − b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is V a vector space over R with these operations? Justify your answer.

(8) Let V denote the set of ordered pairs of real numbers. If (a1 , a2 ) and (b1 , b2 ) are elements of V

and c ∈ R define

(a1 , a2 ) + (b1 , b2 ) = (a1 + 2b1 , a2 + 3b2 ), and c(a1 , a2 ) = (ca1 , ca2 ).

Is V a vector space over R with these operations? Justify your answer.

(9) Prove

(J ∩ K)c = J c ∪ K c

(J ∪ K)c = J c ∩ K c

Consider sequences {xn}∞_{n=1} and {yn}∞_{n=1} such that {xn}∞_{n=1} → x and {yn}∞_{n=1} → y. Show that
{xn + yn}∞_{n=1} → x + y.

n=1 is not convergent.


(13) Prove that the sequence {xn} = {2 − 1/n : n ∈ N} does not converge to 1.

(15) Determine whether the following sets are open, closed, neither or both:

(i) S = (0, 1);

(ii) S = [0, 1];

(iii) S = R;

(iv) S = [0, 1).

(16) Let
An = [0, 1/n], Bn = (0, 1/n], Cn = (−1/n, n),
where n is a positive integer. Obtain
∪∞_{n=1} An, ∩∞_{n=1} An, ∪∞_{n=1} Bn, ∩∞_{n=1} Bn, ∪∞_{n=1} Cn, and ∩∞_{n=1} Cn.

Chapter 7

Linear Algebra

Linear algebra is the branch of mathematics dealing with (among many other things) matrices and

vectors. It’s intuitively easy to see why linear algebra is important for econometrics and statistics.

Economic data is arranged in matrix format (rows corresponding to observations, columns corre-

sponding to variables), so the body of theory governing matrices should help us analyze data. It

is harder to see the connection between matrix theory and the optimization that we do in micro

theory, but there are some important links. We’ll cover the basics and some of the necessary detail

here, but more detailed coverage will be offered in the core courses.

7.1. Vectors

You may be familiar with vectors from physics courses, in which a vector is a pair giving the mag-

nitude and direction of a moving body. The vectors we use in economics are more general, in that

they can have any finite number of elements (rather than just 2), and the meaning of each element

can vary with the context (rather than always signifying magnitude and direction). Formally speak-

ing a vector can be defined as a member of a vector space, but we don’t need to deal with such a

definition here. For our purposes:

Definition 7.1. A vector is an ordered array of elements with either one row or one column.

The elements are usually numbers. A vector is an n × k matrix for which either n = 1, k = 1 or

both (see the definition of a matrix below). A general vector, for which the number of elements is

not specified but left as n, will sometimes be called an “n-vector”. We also refer to these as “vectors

in Rn ”. A vector can be written in either row or column form:


Row Vector: x ∈ Rn, x = (x1 x2 . . . xn); Column Vector: x ∈ Rn, x = (x1, x2, . . . , xn)′.

Although you will sometimes be able to switch between thinking of a vector as a row or a

column without restriction, there are certain operations that require a vector to be oriented in a

certain way, so it is good to distinguish between row and column vectors whenever possible. Most

people use x to refer to the vector in column form and x′ to refer to it in row form, but this is not

universal. Also, we usually use lowercase letters for vectors and uppercase letters for matrices.

Null vector: 0_{n×1} = (0, . . . , 0)′.
Sum vector: u_{n×1} = (1, . . . , 1)′.

The ith unit vector, called ei , has all elements 0 except for the i th, which is equal to 1. The

definition of a unit vector is specific to the vector space in which it sits. For example:

(7.1) e2 ∈ R3: e2 = (0, 1, 0)′
and
(7.2) e2 ∈ R4: e2 = (0, 1, 0, 0)′.

Definition 7.2. For vectors in Rn we define the following relations and operations.
(a) Equality:
Vectors x ∈ Rn, y ∈ Rm are equal if n = m and xi = yi ∀ i.
(b) Order:
x ≥ y if xi ≥ yi ∀ i = 1, · · · , n;
x > y if xi ≥ yi ∀ i = 1, · · · , n and xi > yi for at least one i;
x ≫ y if xi > yi ∀ i = 1, · · · , n.

(c) Addition :

∀x, y ∈ Rn , x + y = z ∈ Rn where zi = xi + yi , ∀i.

(d) Scalar Multiplication: ∀ α ∈ R, x ∈ Rn,
(7.3) αx = (αx1, αx2, · · · , αxn)′.

(e) Vector Multiplication : This is essentially an inner product rule applied to Rn . See the rules

for matrix multiplication below, as they also apply for vectors.
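The componentwise operations above map directly onto NumPy arrays; a small illustration with made-up numbers (NumPy is an assumed dependency):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(x + y)                  # componentwise addition: [5. 7. 9.]
print(2.0 * x)                # scalar multiplication: [2. 4. 6.]
print(x @ y)                  # inner product: 1*4 + 2*5 + 3*6 = 32.0
print(bool(np.all(y >= x)))   # the order y >= x holds componentwise: True
```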

7.2. Matrices

Definition 7.3. A matrix is a rectangular array of elements (usually numbers, for our purposes).

Writing aij for the element in the ith row and jth column of an n × k matrix A, we can write:
[A]_{n×k} = [aij]_{n×k} =
[ a11 a12 . . . a1k
  a21 a22 . . . a2k
  ...
  an1 an2 . . . ank ]


It’s worthwhile to check your understanding of each of the above definitions by writing out a

matrix that satisfies each. Then note this next definition carefully:

obvious statement, but you could try proving it formally. It should only take a few lines.

7.2.1.1. Addition. Matrix addition is only defined for matrices of the same size. If A is n × k and

B is n × k then

(7.4) [A]_{n×k} + [B]_{n×k} = [C]_{n×k}
where
(7.5) cij = aij + bij ∀ i = 1, · · · , n, j = 1, · · · , k.

We say that matrix addition occurs “element wise” because we move through each element of the

matrix A, adding the corresponding element from B.

7.2.1.2. Scalar Multiplication. Scalar multiplication is also an element wise operation. That is,

(7.6) ∀ λ ∈ R, λ · [A]_{n×k} = [λaij]_{n×k} =
[ λa11 λa12 · · · λa1k
  λa21 λa22 · · · λa2k
  ...
  λan1 λan2 · · · λank ]

7.2.1.3. Matrix Multiplication. Matrix multiplication is defined for matrices [A]_{m×j} and [B]_{n×k} if j = n or m = k. That is, the number of columns in one of the matrices must be equal to the number of rows in the other. If matrices A and B satisfy this condition, so that A is m × j and B is j × k, their product [C]_{m×k} ≡ [A]_{m×j} · [B]_{j×k} is given by cij = Ai · Bj, where Ai is the ith row of A and Bj is the jth column of B. For example, suppose
[A]_{2×2} = [ 1 2; 3 4 ]  and  [B]_{2×3} = [ 6 5 4; 3 2 1 ].

Multiplication between A and B is only defined if A is on the left and B is on the right. It must

always be the case that the number of columns in the left hand matrix is the same as the number of

rows in the right hand matrix. In this case, if we say AB = C, then element

c11 = [1 2] · [6; 3] = 1 · 6 + 2 · 3 = 12

Likewise

c12 = 1 · 5 + 2 · 2 = 9

c13 = 1 · 4 + 2 · 1 = 6

c21 = 3 · 6 + 4 · 3 = 30

c22 = 3 · 5 + 4 · 2 = 23

c23 = 3 · 4 + 4 · 1 = 16

which gives
[A]_{2×2} · [B]_{2×3} = [C]_{2×3} = [ 12 9 6; 30 23 16 ].
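This worked example can be reproduced with NumPy, using the same numbers as above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[6, 5, 4],
              [3, 2, 1]])

C = A @ B     # (2x2)·(2x3) -> 2x3; c_ij = (row i of A)·(column j of B)
print(C)      # [[12  9  6]
              #  [30 23 16]]
```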

Note that matrix multiplication is not a symmetric operation. In general, AB ̸= BA, and in fact it

is often the case that the operation will only be defined in one direction. In our example BA is not

defined because the number of columns of B = (3) is not equal to the number of rows of A = (2).

For both AB and BA to be defined, the dimensions must satisfy
[A]_{n×k} · [B]_{k×n} = [C]_{n×n},
and
[B]_{k×n} · [A]_{n×k} = [D]_{k×k}.

(i) Even if n = k,

AB ̸= BA.

A = [ 1 2; 3 4 ],  B = [ 0 −1; 6 7 ],  AB = [ 12 13; 24 25 ],  BA = [ −3 −4; 27 40 ].

(ii) AB = 0 does not imply that A = 0 or B = 0:
A = [ 2 4; 1 2 ],  B = [ −2 4; 1 −2 ],  AB = [ 0 0; 0 0 ].
(iii) CD = CE does not imply that D = E:
C = [ 2 3; 6 9 ],  D = [ 1 1; 1 2 ],  E = [ −2 1; 3 2 ],  CD = CE = [ 5 8; 15 24 ].
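These warnings are easy to verify with NumPy, using the matrices from the text:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, -1], [6, 7]])
print(np.array_equal(A @ B, B @ A))   # False: AB != BA even for square matrices

Z1 = np.array([[2, 4], [1, 2]])
Z2 = np.array([[-2, 4], [1, -2]])
print(Z1 @ Z2)                        # the zero matrix, although Z1 != 0 and Z2 != 0
```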

A+B = B+A

A + (B +C) = (A + B) +C

(AB)C = A(BC)

(A + B)C = AC + BC

A(B +C) = AB + AC

Check that you have a clear understanding of the restrictions needed on the number of rows and

columns of A, B and C in order for the above to work. More matrix rules, involving the transpose:

(7.7) (A′ )′ = A

(7.8) (A + B)′ = A′ + B′

(7.9) (AB)′ = B′ A′

Note the reversal of the order of the matrices in the last operation.

Definition 7.4. A set of vectors x1 , · · · , xn in Rm is linearly dependent if there exist λ1 , · · · , λn , not

all zero, such that

(7.10) λ1 x1 + · · · + λn xn = 0.

Definition 7.5. A set of vectors x1 , · · · , xn in Rm is linearly independent if it is not linearly depen-

dent.

Definition 7.6. The rank of a matrix A is the maximum number of linearly independent column

vectors of A. It is also equal to the number of linearly independent row vectors of A.

Example 7.1. Let
A = [ 1 2 3; 0 1 0; 2 4 6 ].

The first and the third columns are linearly dependent: each element of column 3 is three times the corresponding element of column 1. Now take columns 1 and 2:
λ1 (1, 0, 2)′ + λ2 (2, 1, 4)′ = (0, 0, 0)′
gives
λ1 + 2λ2 = 0
λ2 = 0
2λ1 + 4λ2 = 0
⇔ λ1 = 0, λ2 = 0
is the only solution. So the first two columns are linearly independent. We found two linearly independent columns, so the rank of matrix A is 2. We could have done the exercise taking rows instead of columns and obtained the same answer. (Please verify.)
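NumPy can confirm both the rank of this matrix and the fact that row rank equals column rank:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, 1, 0],
              [2, 4, 6]])

print(np.linalg.matrix_rank(A))     # 2: columns 1 and 3 are proportional
print(np.linalg.matrix_rank(A.T))   # 2: row rank equals column rank
```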

Theorem 7.1. (i) Rank of [A]_{n×k} ≤ min {# rows, # columns} = min {n, k};
(ii) Rank of AB ≤ min {Rank(A), Rank(B)}.

Definition 7.7. A square matrix [A]_{n×n} is called non-singular or of full rank if rank(A) = n.

Definition 7.8. A square matrix [A]_{n×n} is invertible if there exists [B]_{n×n} such that [A] · [B] = [B] · [A] = [I]_{n×n}. Then B is called the inverse of A.

Properties of the inverse:
(7.11) (A^{−1})^{−1} = A
(7.13) (A′)^{−1} = (A^{−1})′

Definition 7.9. A square matrix [A]_{n×n} is called orthogonal if A^{−1} = A′, i.e., AA′ = I.

7.3. Determinant of a matrix

The determinant is defined only for square matrices. The determinant is a function that associates a scalar, det(A), to each n × n square matrix A. The determinant of a 1 × 1 matrix A is the single entry of that matrix: det(A) = a11. The determinant of a 2 × 2 matrix
A = [ a b; c d ]
is det(A) = ad − bc.

Definition 7.10. The cofactor Ai j of the element ai j is defined as (−1)i+ j times the determinant of

the sub matrix obtained from A after deleting row i and column j.

Example 7.2. Let
A = [ 1 2; 3 4 ].
Then

A11 = (−1)1+1 · 4 = 4, A12 = (−1)1+2 · 3 = −3

A21 = (−1)2+1 · 2 = −2, A22 = (−1)2+2 · 1 = 1.

Definition 7.11. The determinant of an n × n matrix A is given by
(7.14) det(A) = ∑_{j=1}^{n} a1j A1j = ∑_{i=1}^{n} ai1 Ai1.

Example 7.3. Let
A = [ a b c; d e f; g h i ].

Then
det(A) = a(−1)^{1+1} det[ e f; h i ] + b(−1)^{1+2} det[ d f; g i ] + c(−1)^{1+3} det[ d e; g h ]
       = a(ei − fh) − b(di − fg) + c(dh − eg).
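The cofactor expansion of Definition 7.11 translates directly into a recursive function. This sketch is for illustration only; the recursion is exponential in n, so it is practical only for small matrices.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)   # (-1)^(1+(j+1)) = (-1)^j
    return total

print(det([[1, 2], [3, 4]]))             # 1*4 - 2*3 = -2
print(det([[1, 2, 3], [0, 8, 1], [2, 5, 9]]))   # 23
```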

Properties of determinants:
(a)
(7.15) det(A) = det(A′).


(b) Interchanging any two rows will alter the sign but not the numerical value of the determinant.

(c) Multiplication of any one row by a scalar k will change the determinant k− fold.

(e) The addition of a multiple of one row to another row will leave the determinant unchanged.
(f) det(AB) = det(A) · det(B).
(g) Properties (b) − (e) remain valid if we replace rows by columns everywhere.

For example,
A = [ 1 2; 3 4 ], det(A) = −2;  A′ = [ 1 3; 2 4 ], det(A′) = −2;
B = [ 3 4; 1 2 ] (rows of A interchanged), det(B) = 2.

Result 7.1. Let A be an n × n upper triangular matrix, i.e., aij = 0 whenever i > j:
A = [ a11 a12 · · · a1,n−1 a1n
      0   a22 · · · a2,n−1 a2n
      ⋮           ⋱         ⋮
      0   0   · · · a_{n−1,n−1} a_{n−1,n}
      0   0   · · · 0       ann ].
The determinant of the matrix A is given by
det A = ∏_{i=1}^{n} aii.

Proof. By induction on n.

(1) Base case: Let n = 1. If A is a 1 × 1 matrix, then det A = a11 = ∏_{i=1}^{1} aii by the definition of a determinant.

(2) Inductive case: Let n > 1. Assume that for any (n − 1) × (n − 1) matrix A with aij = 0 for all i > j, we have det A = ∏_{i=1}^{n−1} aii. Now consider any n × n matrix A with aij = 0 for all i > j. Expanding along the last row, whose only (possibly) non-zero entry is ann,
det A = ann (−1)^{n+n} det [ a11 a12 · · · a1,n−1
                             0   a22 · · · a2,n−1
                             ⋮          ⋱      ⋮
                             0   0   · · · a_{n−1,n−1} ]
      = ann ∏_{i=1}^{n−1} aii
      = ∏_{i=1}^{n} aii.
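Result 7.1 can be checked numerically on a hypothetical upper triangular matrix (the entries below are made up): the product of the diagonal should match the determinant, up to floating-point rounding.

```python
import numpy as np

# Upper triangular; by Result 7.1, det should be 2 * 5 * (-3) = -30
A = np.array([[2.0, 7.0, 1.0],
              [0.0, 5.0, 4.0],
              [0.0, 0.0, -3.0]])

print(np.prod(np.diag(A)))   # -30.0: product of the diagonal entries
print(np.linalg.det(A))      # -30.0 (up to floating-point rounding)
```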

Result 7.2. The upper triangular square matrix A is non-singular if and only if aii ̸= 0 for each

i ∈ {1, · · · , n}.

As an ”if and only if” statement, this requires proofs in both directions.

Claim 7.1. If the upper triangular matrix A is non-singular, then aii ̸= 0 for all i = 1, . . . , n.

Proof. Let A be non-singular. Then A has an inverse, A−1 . Since 1 = det I = det [A−1 A] =

(det A−1 )(det A), we know that det A ̸= 0. If aii = 0 for any i ∈ 1, . . . , n, then by the Result

(7.1) we would have det A = 0, a contradiction. So it must be that aii ̸= 0 for all i = 1, . . . , n.

Claim 7.2. If A is upper triangular and aii ̸= 0 for all i = 1, . . . , n, then A is non-singular.


Proof. Let aii ≠ 0 for all i = 1, . . . , n. Seeking a contradiction, suppose A is singular. Then one of its columns is a linear combination of the others; without loss of generality, A1 = ∑_{i=2}^{n} αi Ai, where Ai denotes the ith column of A. Let
B = [ A1 − ∑_{i=2}^{n} αi Ai   A2 · · · An ] = [ 0   A2 · · · An ].
We know, by the properties of determinants, that det B = det A. But, expanding B by the first column, we have det B = 0. This gives det A = 0, which contradicts det A = ∏_{i=1}^{n} aii ≠ 0 from Result 7.1. So A is non-singular.

7.4. An application of matrix algebra

We now provide an application of matrix algebra: the Markov process, or Markov chain. Markov processes are used to model movements across states over time, via a Markov transition matrix. Each value in the transition matrix is the probability of moving from one state to another state. The model also specifies a vector containing the initial distribution across these states. By repeatedly multiplying the initial distribution vector by the transition matrix, we can estimate changes across states over time.

Consider the problem of movement of employees within a firm at different branches. In the

simple case, we take two locations, namely Ithaca and Cortland to demonstrate the basic elements

of a Markov process.

To determine the number of employees in Ithaca tomorrow, we take the probability that the

employees will stay in Ithaca branch multiplied by the total number of employees currently in

Ithaca. We add to this the number of Cortland employees transferring to Ithaca, which is equal

to total number of employees in Cortland multiplied by the probability of Cortland employees

transferring to Ithaca.

We follow the same process to determine the number of employees in Cortland tomorrow, made

up of the employees who choose to remain at Cortland and the Ithaca employees who transfer into

Cortland.

There are four probabilities involved which can be arranged in a Markov transition matrix.


Let At and Bt denote the populations of Ithaca and Cortland locations at some time t. The

transition probabilities are defined as follows.

pAA ≡ probability that a current A remains an A,
pAB ≡ probability that a current A moves to B,
pBA ≡ probability that a current B moves to A,
pBB ≡ probability that a current B remains a B.

The distribution of employees at time t is denoted by the vector xt′ = [At Bt ] and the transition

probabilities in matrix form as

(7.16) M = [ pAA pAB; pBA pBB ].
Then the distribution of employees across the two locations next period (t + 1) is x′t · M = x′_{t+1}, which is
[At Bt] [ pAA pAB; pBA pBB ] = [(At pAA + Bt pBA) (At pAB + Bt pBB)] = [A_{t+1} B_{t+1}].

In a similar manner we can determine the distribution of employees after two periods:
x′_{t+1} · M = x′_{t+2}
[A_{t+1} B_{t+1}] [ pAA pAB; pBA pBB ] = [A_{t+2} B_{t+2}]
[At Bt] [ pAA pAB; pBA pBB ]² = [A_{t+2} B_{t+2}],
and in general,
(7.17) [At Bt] [ pAA pAB; pBA pBB ]^n = [A_{t+n} B_{t+n}].

Example 7.4.

x0′ = [A0 B0 ] = [200 200]


Let
M = [ pAA pAB; pBA pBB ] = [ 0.8 0.2; 0.4 0.6 ].
Then the distribution of employees in the next period t = 1 is
[200 200] [ 0.8 0.2; 0.4 0.6 ] = [240 160] = [A1 B1].
Two and six periods ahead,
[200 200] [ 0.8 0.2; 0.4 0.6 ]² = [200 200] [ 0.72 0.28; 0.56 0.44 ] = [256 144] = [A2 B2],
[200 200] [ 0.8 0.2; 0.4 0.6 ]⁶ = [200 200] [ 0.668 0.332; 0.664 0.336 ] = [266.4 133.6] = [A6 B6].

Observe that when the transition matrix is raised to higher powers, the new transition matrix con-

verges to a matrix whose rows are identical. This is referred to as the steady state. In this example,

the steady state would be
lim_{n→∞} M^n = [ 2/3 1/3; 2/3 1/3 ].
Try computing this value.
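The iteration in Example 7.4 can be carried out with NumPy, using the same transition matrix and initial distribution:

```python
import numpy as np

M = np.array([[0.8, 0.2],
              [0.4, 0.6]])
x = np.array([200.0, 200.0])   # x'_0 = [A0 B0]

for t in range(1, 7):
    x = x @ M                   # x'_t = x'_{t-1} · M
    print(t, x.round(1))        # t=1 gives [240. 160.], as in the text

# Powers of M converge to a matrix whose rows are both [2/3, 1/3]:
print(np.linalg.matrix_power(M, 50).round(4))
```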

7.4.1. Absorbing Markov Chains. We can extend the previous model by adding a third choice:

employees can exit the firm, with
pAE ≡ probability that a current A chooses to exit, E, and
pBE ≡ probability that a current B chooses to exit.
Let us assume that
pEA = 0, pEB = 0, pEE = 1,

where pEA , pEB , and pEE are the probabilities that an employee who is currently in state E will go

to A, B or E respectively. The values assigned to pEA , pEB , and pEE mean that nobody who leaves

the firm ever returns. It is also implied by these restrictions that the firm never replaces employees

that leave. Starting at time t = 0, the Markov chain becomes
[A0 B0 E0] [ pAA pAB pAE; pBA pBB pBE; pEA pEB pEE ]^n = [An Bn En]
or
[A0 B0 E0] [ pAA pAB pAE; pBA pBB pBE; 0 0 1 ]^n = [An Bn En].

This type of Markov process is referred to as an absorbing Markov chain. The transition probabilities assigned in the third row are such that once an employee goes to state E, he or she remains in that state forever. As n goes to infinity, An and Bn approach zero and En approaches the total number of employees at time zero (i.e., A0 + B0 + E0).

7.5. System of Linear Equations

Consider the system of linear equations
(7.18) Ax = b,
where matrix A is of dimension n × k, x is a column vector k × 1 and b is a column vector n × 1. This is a system of n equations in k unknowns.

Example 7.5. The system of two linear equations,

5x + 3y = 1

6x + y = 2

can be written as
[ 5 3; 6 1 ] [ x; y ] = [ 1; 2 ].
If b = 0, the system Ax = 0 is called a homogeneous system.

Definition 7.12. Column vector x∗ is called a solution to the system if Ax∗ = b.


Claim 7.3. A homogeneous system Ax = 0 always has a solution (Trivial x = 0). But there might

be other solutions (solution may not be unique).

Claim 7.4. For a non-homogeneous system Ax = b, a solution may not exist.

Example 7.6. The following system of two linear equations
2x + 4y = 5
x + 2y = 2
does not have a solution. Multiply the second equation by 2: the left-hand sides of the two equations become identical, so we would need 5 = 4, a contradiction.

Example 7.7. The following system of two linear equations
2x + 4y = 2
x + 2y = 1
has many solutions: the first equation is twice the second, so every (x, y) with x + 2y = 1 solves the system.

Given [A]_{n×k} and {b}_{n×1}, the n × (k + 1) matrix [Ab]_{n×(k+1)} = [A1 A2 · · · Ak b] is called the augmented matrix. Note that Ai is the ith column of A.

Example 7.8. Let A = [ 5 3; 6 1 ], b = [ 1; 2 ] ⇒ Ab = [ 5 3 1; 6 1 2 ].

The system
[A]_{n×k} · {x}_{k×1} = {b}_{n×1}
has a solution if and only if
(7.19) rank(A) = rank(Ab).
The solution is unique if and only if
(7.20) rank(A) = rank(Ab) = k = # of columns of A = # of unknowns,
and if A is square with det(A) ≠ 0 then the solution is characterized by
(7.21) {x∗}_{n×1} = [A^{−1}]_{n×n} · {b}_{n×1}.

Example 7.9. The system of linear equations
2x + y = 0
2x + 2y = 0
gives us
A = [ 2 1; 2 2 ], b = [ 0; 0 ], Ab = [ 2 1 0; 2 2 0 ].
It is easy to verify that
rank(A) = 2 = rank(Ab) = k.
Hence a solution exists and is unique (the trivial solution x = y = 0).

Example 7.10. The system of linear equations

2x + y = 0

4x + 2y = 0

leads to
A = [ 2 1; 4 2 ], b = [ 0; 0 ], Ab = [ 2 1 0; 4 2 0 ].

It is again easy to verify that

rank (A) = 1 = rank (Ab ) .

However,
rank(A) = rank(Ab) < k = 2.
Hence a solution exists but is not unique.¹
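The rank conditions (7.19)–(7.20) can be packaged into a small classifier. The sketch below is a hypothetical helper, applied to Examples 7.9, 7.10, and 7.6:

```python
import numpy as np

def solvability(A, b):
    """Classify Ax = b via the rank conditions (7.19)-(7.20). A sketch."""
    Ab = np.column_stack([A, b])             # augmented matrix
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(Ab)
    k = A.shape[1]                           # number of unknowns
    if rA != rAb:
        return "no solution"
    return "unique solution" if rA == k else "infinitely many solutions"

print(solvability(np.array([[2., 1.], [2., 2.]]), np.array([0., 0.])))  # unique solution
print(solvability(np.array([[2., 1.], [4., 2.]]), np.array([0., 0.])))  # infinitely many solutions
print(solvability(np.array([[2., 4.], [1., 2.]]), np.array([5., 2.])))  # no solution
```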

Now, we revert to the problem of computing the inverse of a non-singular matrix. We first note

the following result.

Theorem 7.4. Matrix [A]_{n×n} is invertible ⇔ det(A) ≠ 0. Also, if [A]_{n×n} is invertible, then det(A^{−1}) = 1/det(A).

Proof. Suppose first that A is invertible. Then
A · A^{−1} = I,
so
1 = det I = det(AA^{−1}) = det(A) · det(A^{−1}),
using the properties of determinants noted above. Consequently det(A) ≠ 0, and det(A^{−1}) = [det(A)]^{−1}.

Suppose, next, that A is not invertible. Then, A is singular and so one of its columns (say, A1 )

can be expressed as a linear combination of its other columns A2 , · · · , An . That is,

n

A1 = ∑ αi Ai

i=2

¹A row or column vector of zeros is always linearly dependent on the other vectors.


Consider the matrix B whose first column is A1 − ∑_{i=2}^{n} αi Ai and whose other columns are the same as those of A. Then the first column of B is zero, and so |B| = 0. By the properties of determinants, |B| = |A|, and so |A| = 0.

For a square matrix [A]_{n×n}, we define the cofactor matrix of A to be the n × n matrix given by
C = [ A11 A12 . . . A1n; . . . ; An1 An2 . . . Ann ],
where Aij is the cofactor of aij. The transpose of C is called the adjoint of A, and denoted by adj A.

Since ∑_{j=1}^{n} aij Akj equals |A| when k = i and equals 0 when k ≠ i, we have
AC′ = [ |A| 0 · · · 0; 0 |A| · · · 0; . . . ; 0 0 · · · |A| ].
This yields the equation
(7.22) AC′ = |A| I.
If A is non-singular (that is, invertible) then there is A^{−1} such that
(7.23) A^{−1} A = I.
Pre-multiplying (7.22) by A^{−1} and using (7.23),
C′ = |A| A^{−1}.
Since A is non-singular, we have |A| ≠ 0, and
(7.24) A^{−1} = C′/|A| = adj A/|A|.


Thus (7.24) gives us a formula for computing the inverse of a non-singular matrix in terms of the

determinant and cofactors of A.

7.6. Cramer's Rule

Recall that we wanted to calculate the (unique) solution of a system of n equations in n unknowns given by
(7.25) Ax = c,
where A is an n × n matrix, and c is a vector in Rn.
To obtain a unique solution, we saw that we must have A non-singular, which now translates to the condition |A| ≠ 0. The unique solution to (7.25) is then
(7.26) x = A^{−1} c = (adj A/|A|) c.

Let us evaluate x1 using (7.26). This can be done by taking the inner product of x with the first unit vector, e1 = (1, 0, · · · , 0). Thus,
x1 = e1 · x = (e1 · adj A · c)/|A|
   = (1/|A|) det [ c1 a12 · · · a1n; . . . ; cn an2 · · · ann ],
the determinant of the matrix obtained from A by replacing its first column with c, divided by |A|. This gives us an easy way to compute the solution x1. In general, in order to calculate xi, replace the ith column of A by the vector c and find the determinant of this matrix. Dividing this number by the determinant of A yields the solution xi. This rule is known as Cramer's Rule.
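Cramer's Rule is a few lines of code. The sketch below (a hypothetical helper) applies it to the three-good equilibrium system solved in Example 7.11:

```python
import numpy as np

def cramer(A, c):
    """Solve Ax = c by Cramer's Rule: x_i = det(A with column i set to c) / det(A)."""
    detA = np.linalg.det(A)
    if abs(detA) < 1e-12:
        raise ValueError("A is singular; Cramer's Rule does not apply")
    x = np.empty(len(c))
    for i in range(len(c)):
        Ai = A.copy()
        Ai[:, i] = c                 # replace the ith column by c
        x[i] = np.linalg.det(Ai) / detA
    return x

# The three-good equilibrium system of Example 7.11:
A = np.array([[5., 1., -1.], [-2., 5., -1.], [-1., -1., 7.]])
c = np.array([9., 3., 17.])
print(cramer(A, c).round(6))    # prices P1* = 2, P2* = 2, P3* = 3
```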

Example 7.11. General Market Equilibrium with three goods


Consider a market for three goods. Demand and supply for each good are given by:
D1 = 5 − 2P1 + P2 + P3,   S1 = −4 + 3P1 + 2P2,
D2 = 6 + 2P1 − 3P2 + P3,  S2 = 3 + 2P2,
D3 = 20 + P1 + 2P2 − 4P3, S3 = 3 + P2 + 3P3,
where Pi is the price of good i, i = 1, 2, 3. The equilibrium conditions are Di = Si, i = 1, 2, 3, that is,
5P1 + P2 − P3 = 9
−2P1 + 5P2 − P3 = 3
−P1 − P2 + 7P3 = 17

This system of linear equations can be solved in at least two ways.
Method 1 (Cramer's Rule):
|A1| = det [ 9 1 −1; 3 5 −1; 17 −1 7 ] = 356,
|A| = det [ 5 1 −1; −2 5 −1; −1 −1 7 ] = 178,
P1∗ = |A1|/|A| = 356/178 = 2.
Similarly P2∗ = 2 and P3∗ = 3. The vector (P1∗, P2∗, P3∗) describes the general market equilibrium.

Method 2 (matrix inversion): writing the system as AP = B with
A = [ 5 1 −1; −2 5 −1; −1 −1 7 ], P = [ P1; P2; P3 ], B = [ 9; 3; 17 ],
we have
A^{−1} = (1/det A) [ 34 −6 4; 15 34 7; 7 4 27 ]
and
P = (1/178) [ 34 −6 4; 15 34 7; 7 4 27 ] · [ 9; 3; 17 ] = [ 2; 2; 3 ].
Again, P1∗ = 2, P2∗ = 2, and P3∗ = 3.

Definition 7.13. A principal minor of order k (1 ≤ k ≤ n) of [A]_{n×n} is the determinant of the k × k submatrix that remains when (n − k) rows and columns with the same indices are deleted from A.

Example 7.12. Let
A = [ 1 2 3; 0 8 1; 2 5 9 ].
The principal minors of order 2 are
det[ 1 2; 0 8 ] = 8;  det[ 8 1; 5 9 ] = 67;  det[ 1 3; 2 9 ] = 3,
and the (only) principal minor of order 3 is
det[ 1 2 3; 0 8 1; 2 5 9 ] = 23.

Definition 7.14. A leading principal minor of order k (1 ≤ k ≤ n) of [A]_{n×n} is the principal minor of order k which has the last (n − k) rows and columns deleted.


For the matrix A of Example 7.12, the leading principal minor of order 2 is
det[ 1 2; 0 8 ] = 8,
and the leading principal minor of order 3 is
det[ 1 2 3; 0 8 1; 2 5 9 ] = 23.

7.8. Quadratic Form

A quadratic form consists of a square matrix [A]_{n×n} which is pre- and post-multiplied by an n-vector; it is a scalar:
(7.27) Q(x, A) = x′Ax.

Example 7.13. Let
A = [ a b; c d ], x = [ x1; x2 ].
Then
Q(x, A) = [x1 x2] · [ a b; c d ] · [ x1; x2 ] = a x1² + (b + c) x1 x2 + d x2².

A symmetric matrix [A]_{n×n} is called
positive definite (PD) if
(7.28) Q(z, A) = z′Az > 0, ∀ z ∈ Rn, z ≠ 0;
negative definite (ND) if
(7.29) Q(z, A) = z′Az < 0, ∀ z ∈ Rn, z ≠ 0;
positive semi-definite (PSD) if
(7.30) Q(z, A) = z′Az ≥ 0, ∀ z ∈ Rn;
negative semi-definite (NSD) if
(7.31) Q(z, A) = z′Az ≤ 0, ∀ z ∈ Rn.


[A]_{n×n} is PD if and only if all leading principal minors of A are strictly positive.
[A]_{n×n} is ND if and only if every leading principal minor of A of order k has sign (−1)^k.
[A]_{n×n} is PSD if and only if all principal minors of A are non-negative.
[A]_{n×n} is NSD if and only if every principal minor of A of order k has sign (−1)^k or is 0.

Example 7.14. Let
A = [ a11 a12; a21 a22 ].
Then A is
positive definite: a11 > 0, a11 a22 − a12 a21 > 0;
negative definite: a11 < 0, a11 a22 − a12 a21 > 0;
positive semi-definite: a11 ≥ 0, a22 ≥ 0, a11 a22 − a12 a21 ≥ 0;
negative semi-definite: a11 ≤ 0, a22 ≤ 0, a11 a22 − a12 a21 ≥ 0.

Note that a negative definite matrix necessarily has full rank: indeed, if the zero vector can be

obtained by a linear combination of columns of A with weights α1 , · · · , αn (not all zero), then we

can define t = (α1 , · · · , αn ) to obtain t ′ At = 0.
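The leading-principal-minor tests translate into a short checker. This is a hypothetical sketch covering only the PD/ND tests; the semi-definite cases would require examining all principal minors, not just the leading ones.

```python
import numpy as np

def leading_principal_minors(A):
    """det(A[:k, :k]) for k = 1, ..., n (Definition 7.14)."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def classify(A, tol=1e-12):
    """Definiteness of a symmetric matrix via leading principal minors."""
    m = leading_principal_minors(A)
    if all(d > tol for d in m):
        return "positive definite"
    if all((-1) ** (k + 1) * d > tol for k, d in enumerate(m)):
        return "negative definite"   # minors alternate in sign: -, +, -, ...
    return "neither PD nor ND (semi-definiteness needs all principal minors)"

print(classify(np.array([[2.0, 1.0], [1.0, 2.0]])))    # positive definite
print(classify(np.array([[-2.0, 1.0], [1.0, -2.0]])))  # negative definite
```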

Definition 7.15. Let A be a symmetric n × n matrix. Matrix A is diagonally dominant if for each

row i, we have |ai,i | ≥ ∑ j̸=i |ai, j |, and it is strictly diagonally dominant if the latter inequality holds

strictly for each row.

Every symmetric, diagonally dominant matrix with non-positive entries along the diagonal is

negative semi-definite; and every symmetric, strictly diagonally dominant matrix with negative

entries along the diagonal is negative definite.

7.9. Eigenvalue and Eigenvectors

Given an n × n real matrix A, an eigenvalue of A is a number λ which, when subtracted from each of the diagonal entries of A, converts A into a singular matrix. Subtracting a scalar λ from each diagonal entry of A is the same as subtracting λ times the identity matrix I from A. Hence, λ is an eigenvalue of A if and only if A − λI is a singular matrix.


This is also equivalent to asking: for what non-zero vectors x ∈ Rn, and for what complex numbers λ, is it true that

(7.32) Ax = λx?

This is known as the eigenvalue problem. A non-zero vector x satisfying (7.32) is called an eigenvector of A corresponding to λ. Equation (7.32) can be rewritten as

(7.33) (A − λI)x = 0.

But (7.33) is a homogeneous system of n equations in n unknowns. It has a non-zero solution for x if and only if (A − λI) is singular; that is, if and only if

(7.34) |A − λI| = 0.

Defining

(7.35) f (λ) ≡ |A − λI|,

we note that f is a polynomial in λ; it is called the characteristic polynomial of A.

Example 7.15. Consider the 3 × 3 matrix A given by

A = [ 4 1 1 ]
    [ 1 4 1 ]
    [ 1 1 4 ].

Then subtracting 3 from each diagonal entry transforms A into the singular matrix

[ 1 1 1 ]
[ 1 1 1 ]
[ 1 1 1 ].

Therefore, 3 is an eigenvalue of matrix A.

Example 7.16. Consider the 2 × 2 matrix A given by

A = [ 4 0 ]
    [ 0 2 ].

Then subtracting 4 from each diagonal entry transforms A into the singular matrix

[ 0  0 ]
[ 0 −2 ].

Therefore, 4 is an eigenvalue of matrix A. Also, subtracting 2 from each diagonal entry transforms A into the singular matrix

[ 2 0 ]
[ 0 0 ].

Therefore, 2 is also an eigenvalue of matrix A.

The above example illustrates a general principle about the eigenvalues of a diagonal matrix.

Theorem 7.5. The diagonal entries of a diagonal matrix A are the eigenvalues of A.

Theorem 7.6. A square matrix A is singular if and only if 0 is an eigenvalue of A.

Example 7.17. Consider the 2 × 2 matrix A given by

A = [  4 −4 ]
    [ −4  4 ].

Since the first row is the negative of the second row, matrix A is singular. Hence 0 is an eigenvalue of A. Also, subtracting 8 from each diagonal entry transforms A into the singular matrix

[ −4 −4 ]
[ −4 −4 ].

Therefore, 8 is also an eigenvalue of matrix A.

Example 7.18. Consider the 2 × 2 matrix A given by

A = [ 2 1 ]
    [ 1 2 ].

Then equation (7.34) becomes

(7.36) det [ 2 − λ    1   ]
           [   1    2 − λ ] = (2 − λ)² − 1 = (1 − λ)(3 − λ) = 0.

Thus, the eigenvalues are λ = 1 and λ = 3. In this case it was also possible to see that λ = 1 is an eigenvalue, as subtracting 1 from the diagonal entries converts matrix A into a singular matrix.

To find the eigenvector corresponding to λ = 1, we solve (A − I)x = 0:

[ 1 1 ] [ x1 ]   [ 0 ]
[ 1 1 ] [ x2 ] = [ 0 ],

which yields

x1 + x2 = 0.

Thus the general solution for the eigenvector corresponding to the eigenvalue λ = 1 is given by

(x1, x2) = θ(1, −1) for θ ≠ 0.

Similarly, corresponding to the eigenvalue λ = 3, we have the eigenvector given by

(x1, x2) = θ(1, 1) for θ ≠ 0.
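The eigenvalues and eigenvectors of Example 7.18 can be confirmed with numpy; a small sketch (not part of the notes):

```python
import numpy as np

# The symmetric matrix from Example 7.18.
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# np.linalg.eigh is specialized for symmetric matrices and returns
# eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)  # eigenvalues 1 and 3

# Each column of eigvecs is an eigenvector; check A v = lambda v.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)
```

The returned eigenvectors are scalar multiples of (1, −1) and (1, 1), matching the hand computation above.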

Example 7.19. A square matrix A whose entries are non-negative and whose rows (or columns) each add to 1 is called a Markov matrix. These matrices play a major role in economic dynamics. Consider the 2 × 2 matrix A given by

A = [ a  1 − a ]
    [ b  1 − b ]

where 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1. Then subtracting 1 from the diagonal entries leads to the matrix

A − I = [ a − 1  1 − a ]
        [   b     −b   ].

Notice that each row of this matrix adds to 0. But if the rows of a square matrix each add to zero, the columns are linearly dependent and the matrix is singular. This shows that 1 is an eigenvalue of the Markov matrix above, and the same argument shows that 1 is an eigenvalue of every Markov matrix.
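The "1 is always an eigenvalue" property is easy to verify numerically for a randomly generated Markov matrix; a sketch (not part of the notes), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random 4 x 4 Markov matrix: non-negative rows normalized to sum to 1.
M = rng.random((4, 4))
M = M / M.sum(axis=1, keepdims=True)

eigvals = np.linalg.eigvals(M)
# 1 is always an eigenvalue: the rows of M - I sum to 0, so M - I is singular.
assert np.any(np.isclose(eigvals, 1.0))
print(np.linalg.det(M - np.eye(4)))  # ~ 0
```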

7.10. Eigenvalues of a Symmetric Matrix

For the case of a symmetric matrix A, we can show that all the eigenvalues of A are real.

Theorem 7.7. Let A be a symmetric n × n matrix. Then all the eigenvalues of A are real.

Proof. Suppose λ is a complex eigenvalue, with associated complex eigenvector x. Then we have

(7.37) Ax = λx.

Define x∗ to be the complex conjugate of x, and λ∗ to be the complex conjugate of λ. Since A is real, taking complex conjugates in (7.37) gives

(7.38) Ax∗ = λ∗x∗.

Pre-multiply (7.37) by (x∗ )′ and (7.38) by x′ to get

(7.39) (x∗ )′ Ax = λ(x∗ )′ x

(7.40) x′ Ax∗ = λ∗ x′ x∗

Subtracting (7.40) from (7.39)

(7.41) (x∗)′Ax − x′Ax∗ = (λ − λ∗) x′x∗.

Since x′Ax∗ is a scalar (a 1 × 1 matrix), it equals its own transpose:

x′Ax∗ = (x′Ax∗)′ = (x∗)′A′x = (x∗)′Ax,

since A′ = A (by symmetry). Thus the left-hand side of (7.41) is zero, and (7.41) yields

(7.42) (λ − λ∗) x′x∗ = 0.

Since x ≠ 0, we know that x′x∗ is real and positive. Hence (7.42) implies that λ = λ∗, so λ is real.

7.11. Eigenvalues, Trace and Determinant of a Matrix

The trace of an n × n matrix A is the sum of its diagonal entries:

tr (A) = ∑_{i=1}^{n} aii.

The following properties of the trace can be verified easily [here A, B and C are n × n matrices, and λ ∈ R]: tr (A + B) = tr (A) + tr (B), tr (λA) = λ tr (A), tr (A′) = tr (A), tr (AB) = tr (BA), and tr (ABC) = tr (BCA) = tr (CAB).

The characteristic polynomial of A can generally be written as

(7.43) |A − λI| = (−λ)^n + b_{n−1} (−λ)^{n−1} + · · · + b1 (−λ) + b0,

where b0, . . . , b_{n−1} are the coefficients of the polynomial, determined by the entries of the A-matrix.

On the other hand, if λ1 , ..., λn are the eigenvalues of A, then the characteristic equation (7.34)

can be written as

(7.44) 0 = (λ1 − λ)(λ2 − λ)....(λn − λ)

Using (7.34), (7.43), and (7.44) and “comparing coefficients” we can conclude that

bn−1 = λ1 + λ2 + ... + λn


and

b0 = λ1 λ2 ...λn

Also, by looking at the terms in the characteristic polynomial of A which would involve

(−λ)n−1 , we can conclude that

bn−1 = a11 + a22 + ... + ann

Finally, putting λ = 0 in (7.43), we get

b0 = |A|

Thus we might note two interesting relationships between the characteristic values, the trace

and the determinant of A:

tr (A) = ∑_{i=1}^{n} λi

and

|A| = ∏_{i=1}^{n} λi.
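These two relationships can be checked numerically for an arbitrary matrix; a quick sketch (not part of the notes), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

lam = np.linalg.eigvals(A)   # eigenvalues (possibly complex, in conjugate pairs)

# trace = sum of eigenvalues, determinant = product of eigenvalues
assert np.isclose(np.trace(A), lam.sum().real)
assert np.isclose(np.linalg.det(A), lam.prod().real)
print("checked")
```

Because the complex eigenvalues of a real matrix come in conjugate pairs, both the sum and the product are real (up to rounding).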

Theorem 7.8. Let A be a symmetric matrix. Then,

(1) A is positive definite if and only if all the eigenvalues of A are positive.

(2) A is negative definite if and only if all the eigenvalues of A are negative.

(3) A is positive semidefinite if and only if all the eigenvalues of A are non-negative.

(4) A is negative semidefinite if and only if all the eigenvalues of A are non-positive.

(5) A is indefinite if and only if A has a positive eigenvalue and a negative eigenvalue.

Chapter 8

Problem Set 3

(1) Let

A = [ 1 −1 7  ]
    [ 0  8 10 ],

B = [ 9  6  5 4 ]
    [ 1 −2 −3 3 ]
    [ 0  1 −1 2 ].

(2) Are the vectors (1, 2) and (1, 3) linearly independent?

(3) Let

A = [  1 6 2 ]
    [ −1 5 3 ],

B = [ 8  4 ]
    [ 0 −2 ]
    [ 7 −3 ].

(4)

A = [ 1 2 3 4 ]
    [ 1 2 1 2 ]
    [ 1 3 5 7 ]
    [ 2 1 4 1 ] ?

(5)

A = [ 3 2  1 ]
    [ 0 1  7 ]
    [ 5 4 −1 ] ?

(6) Consider the system of equations

x + y + z = 6
x + 2y + 3z = 10
x + 2y + λz = µ.

For what values of λ and µ does the system of equations have
(a) no solution,
(b) a unique solution,
(c) infinitely many solutions?

(7) What is the definiteness of the following matrices? (Hint: use the principal minors.)

A = [  2 −1 ]   B = [ 2 4 ]   C = [ −3 4 ]   D = [ −3  4 ]
    [ −1  1 ],      [ 4 8 ],      [  4 5 ],      [  4 −6 ].

(8) Consider the situation of a mass layoff (i.e. a firm goes out of business) where 2000 people

become unemployed and now begin a job search. There are two states: employed (E) and

unemployed (U) with an initial vector

x0′ = [E U] = [0 2000].

Suppose that in any given period an unemployed person will find a job with probability 0.7

and will therefore remain unemployed with a probability 0.3. Additionally, persons who find

themselves employed in any given period may lose their job with a probability of 0.1 (and will

continue to remain employed with probability 0.9).

(i) Set up the Markov transition matrix for this problem.

(ii) What will be the number of unemployed people after (a) two periods; (b) four periods; (c) six periods; (d) ten periods?

(iii) What is the steady-state level of unemployment?
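A quick numerical sketch of parts (ii) and (iii) (not part of the original problem set): iterate x_{t+1} = x_t P, assuming numpy.

```python
import numpy as np

# Transition matrix: rows = current state (E, U), columns = next state (E, U).
P = np.array([[0.9, 0.1],    # employed:   stay 0.9, lose job 0.1
              [0.7, 0.3]])   # unemployed: find job 0.7, stay unemployed 0.3

x = np.array([0.0, 2000.0])  # initial [E, U]
for t in range(1, 11):
    x = x @ P
    if t in (2, 4, 6, 10):
        print(t, x[1])       # unemployed after t periods

# Steady state: solve pi P = pi with the entries of pi summing to 2000.
# The balance condition E*0.1 = U*0.7 gives U = 2000 * 0.1 / 0.8 = 250.
for _ in range(200):
    x = x @ P
print(x[1])                  # ~ 250
```

The steady-state unemployment level of 250 matches the balance condition: flows into and out of unemployment are equal.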

(9) (a) Suppose that an n × n matrix A satisfies

A^k = A × A × · · · × A (k times) = O

and n is odd. Prove that A is not invertible.

(c) An n × n matrix A is called orthogonal if AA′ = I. Prove that if A is orthogonal, then det A = ±1.

(d) Let n × n matrices A and B be such that AB = −BA. Prove that if n is an odd number, then either A or B is not invertible.

(e) Let n × n matrices A and B be such that AB = I. Use determinants to prove that A is invertible (and hence B = A−1).

(10) (a) Prove that the eigenvalues of an upper or lower triangular matrix are precisely its diagonal

entries.

(b) Suppose that A is an invertible matrix and λ ≠ 0. Show that (A − λI)x = 0 implies that (A−1 − λ−1 I)x = 0. Conclude that for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an eigenvalue of A−1.

(c) Let A be an invertible matrix and let x be an eigenvector of A. Show it is also an eigenvector

of A2 and A−2 . What are the corresponding eigenvalues?

Chapter 9

Calculus

9.1. Functions

Recall the definition of functions discussed earlier. Now we discuss some features of functions which are useful in optimization exercises.

9.2. Surjective and Injective Functions

Definition 9.1. A function f : D → R is called surjective (or is said to map D onto R) if f (D) = R, i.e., if the image f (D) of the function is equal to the entire range R.

Definition 9.2. A function f : D → R is called injective or one to one if

(9.1) f (x) = f (y) ⇔ x = y.

Example 9.1. Consider the function

f : R → R : f (x) = x².

It is not surjective, as there exists no element in the domain which gets mapped into −1. Next, consider the function

g : R → R+ : g (x) = x².

Now this function is surjective, as each non-negative real number has a pre-image (a square root) in R. However, this function is not injective, as the pre-image of 4 is both −2 and 2. Next, let us also restrict the domain of the function to R+. The function

h : R+ → R+ : h (x) = x²

is both surjective and injective. Hence it is bijective.

Example 9.2. Let A be a non-empty set and let S be a subset of A. We define a function χS : A → {0, 1} by

(9.2) χS (a) = 1 if a ∈ S, and χS (a) = 0 if a ∉ S.

This function, called the indicator (or characteristic) function of S, plays an important role in probability and statistics. If S is a non-empty proper subset of A, then χS is surjective. If S = ∅ or S = A, then χS is not surjective.

Definition 9.3. Inverse Function: Consider f : D → R. If there exists g : R → D such that ∀x ∈ D,

(9.3) g ( f (x)) = x,

then g is called the inverse function of f and is written as f −1 : R → D. Alternatively, we can define the inverse function as follows. Let f : D → R be bijective. The inverse function of f is the function f −1 : R → D such that ∀x ∈ D,

(9.4) f −1 ( f (x)) = x.

Theorem 9.1. Let f : D → R be bijective. Then f −1 : R → D is bijective.

Example 9.3. Let f (x) = 2x. Then f −1 (x) = x/2, and f −1 ( f (x)) = f (x)/2 = 2x/2 = x.

Theorem 9.2. Let f : D → R, let A, A1, A2 be subsets of D, and let B be a subset of R. Then

(a) If f is injective, then f −1 [ f (A)] = A,

(b) If f is surjective, then f [ f −1 (B)] = B,

(c) If f is injective, then f (A1 ∩ A2) = f (A1) ∩ f (A2).


Proof. You should try and prove (a) and (b) on your own. I will provide proof for (c) here. We

need to prove that f (A1 ∩ A2 ) ⊆ f (A1 ) ∩ f (A2 ) and f (A1 ) ∩ f (A2 ) ⊆ f (A1 ∩ A2 ).

Step 1. Show f (A1 ∩ A2) ⊆ f (A1) ∩ f (A2).

Let y ∈ f (A1 ∩ A2). Then there exists x ∈ A1 ∩ A2 such that f (x) = y. Since x ∈ A1 ∩ A2, x ∈ A1 and x ∈ A2. But then f (x) ∈ f (A1) and f (x) ∈ f (A2). So f (x) ∈ f (A1) ∩ f (A2). Observe that we have not used the fact that f is injective, so this part of the result holds for any function.

Step 2. Show f (A1) ∩ f (A2) ⊆ f (A1 ∩ A2).

Let y ∈ f (A1) ∩ f (A2). Then y ∈ f (A1) and y ∈ f (A2). Hence there exist a point x1 ∈ A1 and a point x2 ∈ A2 such that f (x1) = y and f (x2) = y, i.e.,

f (x1) = y = f (x2).

Since f is injective, x1 = x2. Hence x1 ∈ A1 ∩ A2 and y = f (x1) ∈ f (A1 ∩ A2).

Definition 9.4.

(a) A function f is odd if and only if for every x, − f (x) = f (−x). Example: f (x) = x.

(b) A function f is even if and only if for every x, f (x) = f (−x). Example: f (x) = x2 .

(c) A function f is periodic if and only if there exists a k > 0 such that for every x, f (x + k) = f (x).

Example: f (x) = sin x, since sin (x + 2π) = sin x.

(d) A function f is increasing if and only if for every x and every y, if x ≤ y, then f (x) ≤ f (y).

Example: f (x) = x.

(e) A function f is decreasing if and only if for every x and every y, if x ≤ y, then f (x) ≥ f (y).

Example: f (x) = −x.


Definition 9.5. Composition of Functions: If f : A → B and g : B → C are two functions, then for any a ∈ A, f (a) ∈ B. But B is the domain of g, so the mapping g can be applied to f (a), which yields g ( f (a)), an element in C. This establishes a correspondence between a in A and c in C. This correspondence is called the composition function of f and g and is denoted by g ◦ f (read g of f ). Thus we have

(9.5) (g ◦ f ) (a) = g ( f (a)).

Remark 9.1. Composition of two functions need not be commutative,

(g ◦ f ) (a) ≠ ( f ◦ g) (a),

as the following example shows. Let f (x) = x² and g (x) = x + 1. Then

(g ◦ f ) (x) = x² + 1 but ( f ◦ g) (x) = (x + 1)².
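The non-commutativity is easy to see in code; a small sketch (my own helper names):

```python
# Composition is applied right-to-left: (g o f)(x) = g(f(x)).
def compose(g, f):
    return lambda x: g(f(x))

f = lambda x: x ** 2      # f(x) = x^2
g = lambda x: x + 1       # g(x) = x + 1

gof = compose(g, f)       # x^2 + 1
fog = compose(f, g)       # (x + 1)^2

print(gof(3), fog(3))     # 10 16
```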

Theorem 9.3. Let f : A → B, and g : B → C.

(a) If f and g are surjective, then g ◦ f is surjective,

(b) If f and g are injective, then g ◦ f is injective,

(c) If f and g are bijective, then g ◦ f is bijective.

Proof. (a) Since g is surjective, range of g = C. That is for any element c ∈ C, there exists an

element b ∈ B such that g (b) = c. Since f is also surjective, there exists an element a ∈ A such

that f (a) = b. But then

( )

(g ◦ f ) (a) = g f (a) = g (b) = c.

So, (g ◦ f ) is surjective.

(b) Since g is injective, for all b and b′ in B, if g (b) = g (b′) then b = b′; and since f is injective, for all a and a′ in A, if f (a) = f (a′) then a = a′. Then

(g ◦ f ) (a) = (g ◦ f ) (a′)
⇒ g ( f (a)) = g ( f (a′))
⇒ f (a) = f (a′)
⇒ a = a′.

So, (g ◦ f ) is injective.

9.4. Continuous Functions

Definition 9.6. The real number L is the limit of the function f : D → R at the point c if for each ε > 0, there exists a δ > 0 such that | f (x) − L| < ε whenever x ∈ D and 0 < |x − c| < δ.

Definition 9.7. A function f : D → R is continuous at x0 ∈ D, if

( )

(9.6) ∀ε > 0, ∃δ > 0 d (x, x0 ) < δ ⇒ d f (x) , f (x0 ) < ε.

A function f : D → R is continuous if it is continuous at all x0 ∈ D.

It is easy to draw examples of functions which are not continuous. An intuitive way of understanding continuity of a function is that we should be able to draw its graph without lifting the pencil from the paper. If a function has a point of discontinuity, say x0, then as we approach x0 from the left-hand side and from the right-hand side, the function attains different values.

For a function to be continuous at x0, both the left- and right-hand limits must exist and converge to the function value:

(9.7) lim_{x→x0−} f (x) = lim_{x→x0+} f (x) = f (x0).

Theorem 9.4. A function f : D → R is continuous if and only if for every convergent sequence of points {xn} in D with limit x ∈ D, the sequence f (xn) → f (x).

Example 9.4. If

lim_{x→x0−} f (x) = lim_{x→x0+} f (x) ≠ f (x0),

then the function is not continuous. Take

y = x for 0 ≤ x < 1/2;  0 for x = 1/2;  1 − x for 1/2 < x ≤ 1.
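A quick numerical illustration of this discontinuity (not part of the notes): both one-sided limits at 1/2 equal 1/2, but the function value there is 0.

```python
def f(x):
    # Piecewise function from Example 9.4.
    if x == 0.5:
        return 0.0
    return x if x < 0.5 else 1.0 - x

# Approach 1/2 from both sides: the values tend to 0.5, yet f(0.5) = 0.
left  = [f(0.5 - 10 ** -k) for k in range(3, 8)]
right = [f(0.5 + 10 ** -k) for k in range(3, 8)]
print(left[-1], right[-1], f(0.5))
```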

Definition 9.8. Given f : D → R, let A ⊆ R be any subset of the range. The inverse image of A

under f , f −1 (A), is the set of points x in the domain D such that f (x) ∈ A

{ }

(9.8) f −1 (A) = x ∈ D | f (x) ∈ A .


Theorem 9.5. A function f : D → R is continuous if and only if the inverse image of every open

set is open.

Proof. Suppose first that f is continuous on D, and let V be an open set in R. We show that f −1 (V) is open in D (i.e., every point of f −1 (V) is an interior point of f −1 (V)). Let p ∈ D with f (p) ∈ V. Since V is open, there exists ε > 0 such that y ∈ V if d ( f (p), y) < ε. Also, since f is continuous at p, there exists a δ > 0 such that d ( f (p), f (x)) < ε if d (p, x) < δ. Thus x ∈ f −1 (V) as soon as d (p, x) < δ, and hence f −1 (V) is open.

Conversely, assume that f −1 (V) is open in D for every open set V in R. Fix p ∈ D and ε > 0, and let V be the set of all y ∈ R such that d ( f (p), y) < ε. Then V is open and hence f −1 (V) is open, so there exists δ > 0 such that x ∈ f −1 (V) as soon as d (p, x) < δ. But if x ∈ f −1 (V), then f (x) ∈ V, and so d ( f (p), f (x)) < ε. Hence f is continuous at p.

The next theorem (stated without proof) considers the inverse image of the closed subsets of the range R to characterize continuous functions.

Theorem 9.6. A function f : D → R is continuous if and only if the inverse image of every closed

set is closed.

This follows from Theorem 9.5, since a set is closed if and only if its complement is open, and

since f −1 (V c ) = [ f −1 (V )]c for every V ⊂ R.

Claim 9.1. If f and g are continuous functions, then

f ± g, f · g, f / g (if g ≠ 0), max { f , g}, and min { f , g}

are continuous.

Claim 9.2. If f is a continuous function of two variables f (x1 , x2 ), then the functions of one

variable obtained by holding the other variable constant f (·, x̄2 ) and f (x̄1 , ·) are also continuous.

Theorem 9.7. Intermediate Value Theorem for continuous functions: Let f be a continuous func-

tion on a domain containing [a, b], with say f (a) < f (b). Then for any y in between, f (a) < y <

f (b), there exists c in (a, b) with f (c) = y.


[Figure: graph of y = f (x); the horizontal line y = u, with f (a) < u < f (b), crosses the graph at some c between a and b.]

We can apply the Intermediate Value Theorem to prove the existence of a fixed point for the following class of functions.

Theorem 9.8. Consider a continuous function f : [0, 1] → [0, 1]. Then there exists c ∈ [0, 1] such

that f (c) = c.

Proof. Define a function g(x) = f (x) − x. It is continuous, since it is the sum of two continuous functions, f (x) and −x. If f (0) = 0, then x = 0 is a fixed point. If not, then f (0) > 0, or g(0) > 0.

If f (1) = 1, then x = 1 is a fixed point. If not, then f (1) < 1, or g(1) < 0.

Now we apply the Intermediate Value Theorem to claim that there exists a point c ∈ [0, 1] such

that g(c) = 0. This implies g(c) = f (c) − c = 0 or f (c) = c or c is a fixed point.
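The proof is constructive in spirit: bisecting on the sign of g(x) = f (x) − x locates a fixed point. A minimal sketch (not part of the notes), using cos, which maps [0, 1] into [0, 1]:

```python
import math

# Fixed point of f : [0,1] -> [0,1] via bisection on g(x) = f(x) - x,
# mirroring the proof: g(0) >= 0 and g(1) <= 0, so g has a zero.
def fixed_point(f, lo=0.0, hi=1.0):
    g = lambda x: f(x) - x
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = fixed_point(math.cos)
print(c)   # ~ 0.739085, where cos(c) = c
```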


Definition 9.9. The function f : D → R attains a local maximum at x0 if there exists a neighborhood of x0 such that f (x) ≤ f (x0) for all x in the neighborhood.

Definition 9.10. The function f : D → R attains a strict local maximum at x0 if there exists a neighborhood of x0 such that f (x) < f (x0) for all x not equal to x0 in the neighborhood.

Definition 9.11. The function f : D → R attains a global maximum at x0 if f (x) ≤ f (x0), ∀x ∈ D.

Definition 9.12. The function f : D → R attains a strict global maximum at x0 if f (x) < f (x0), ∀x ∈ D \ {x0}.

Remark 9.2. A global maximum (minimum) is also a local maximum (minimum).

Theorem 9.9. Weierstrass Theorem: Suppose D is a non-empty closed and bounded subset of Rn. If f : D → R is continuous on D, then there exist x* and x_* in D such that

(9.9) f (x*) ≥ f (x) ≥ f (x_*), ∀ x ∈ D.

Proof. We first claim that the function f is bounded on the domain D. If not, then there exists a sequence {xn} in D such that f (xn) → ∞ as n → ∞. Since D is compact, there exists a subsequence {yn} of {xn} which converges to some ȳ in D. Since {yn} is a subsequence of {xn} and f (xn) → ∞, it must be true that f (yn) → ∞. However, {yn} converges to ȳ and f is a continuous function, so f (yn) must converge to the finite real number f (ȳ). These two observations lead to a contradiction. Thus we have proved the claim.

To prove the theorem, we again assume that f does not attain its maximum value in D. Since f is bounded on D, let M be the least upper bound of the values f takes in D. Clearly M is finite. Also, there exists a sequence {zn} in D such that f (zn) → M. Note that even though f (zn) approaches the least upper bound M as n → ∞, the sequence {zn} itself need not converge. Since D is compact, there exists a subsequence {un} of {zn} which converges to some ū in D. Since f is a continuous function, f (un) must converge to the finite real number f (ū). Since a convergent sequence has only one limit, f (ū) = M, and ū is the point of global maximum of f in D. The argument for the minimum is analogous.

This is the theorem we will be using to show the existence of optimal bundles for consumers

and producers. So we need to understand it and be comfortable with using it.


The following examples show why the function domain must be closed and bounded in order

for the theorem to apply. In each of the following examples, the function fails to attain a maximum

on the given interval.

(a) f (x) = x defined over [0, ∞) (domain unbounded) is not bounded from above.

(b) f (x) = x/(1 + x) defined over [0, ∞) (domain unbounded) is bounded but does not attain its least upper bound, i.e., 1.

(c) f (x) = 1/x defined over (0, 1] (domain bounded but not closed) is not bounded from above.

(d) f (x) = 1 − x defined over (0, 1] (domain bounded but not closed) is bounded but never attains its least upper bound, i.e., 1.

(e) Extending the last two examples to [0, 1] by defining f (0) = 0 shows that the theorem also requires continuity: the domain is now closed and bounded, but the discontinuous functions still fail to attain a maximum.

9.6. An Application of the Extreme Value Theorem

If we are given two norms ∥·∥a and ∥·∥b on some finite-dimensional vector space V over R, a very useful fact is that they are always within a constant factor of one another. In other words, there exists a pair of real numbers 0 < C1 ≤ C2 such that, for all x ∈ V, the following inequality holds:

C1 ∥x∥b ≤ ∥x∥a ≤ C2 ∥x∥b.

Note that any finite-dimensional vector space, by definition, is spanned by a basis e1, e2, · · · , en, where n is the dimension of the vector space. (The basis is often chosen to be orthonormal if we have an inner product.) That is, any vector x can be written as

x = ∑_{i=1}^{n} αi ei

for some scalars α1, · · · , αn. Now we can prove the equivalence of norms in four steps, the last of which requires an application of the Extreme Value Theorem.

Step 1. It is sufficient to consider ∥·∥b = ∥·∥1 (norm equivalence is transitive).


Given the basis, define

∥x∥1 = ∑_{i=1}^{n} |αi|.

We have seen earlier in a problem set that this is indeed a norm. The linear independence of the basis {ei} implies that x ≠ 0 ⇐⇒ |αj| > 0 for some j ⇐⇒ ∥x∥1 > 0. The triangle inequality and the scaling property are obvious and follow from the usual properties of the ℓ1 norm on Rn.

We will show that it is sufficient to prove that ∥·∥a is equivalent to ∥·∥1, because norm equivalence is transitive: if two norms are equivalent to ∥·∥1, then they are equivalent to each other.

In particular, suppose both ∥·∥a and ∥·∥a′ are equivalent to ∥·∥1 for constants 0 < C1 ≤ C2 and 0 < C1′ ≤ C2′, respectively:

C1 ∥x∥1 ≤ ∥x∥a ≤ C2 ∥x∥1,
C1′ ∥x∥1 ≤ ∥x∥a′ ≤ C2′ ∥x∥1.

The first pair of inequalities gives ∥x∥a / C2 ≤ ∥x∥1 ≤ ∥x∥a / C1. Substituting these bounds for ∥x∥1 into the second pair, we get

(C1′ / C2) ∥x∥a ≤ C1′ ∥x∥1 ≤ ∥x∥a′ ≤ C2′ ∥x∥1 ≤ (C2′ / C1) ∥x∥a.

Then it immediately follows that

(C1′ / C2) ∥x∥a ≤ ∥x∥a′ ≤ (C2′ / C1) ∥x∥a,

and hence ∥·∥a and ∥·∥a′ are equivalent.

Step 2. We want to show that

C1 ∥x∥1 ≤ ∥x∥a ≤ C2 ∥x∥1

is true for all x ∈ V for some C1, C2. It is trivially true for x = 0, so we need only consider x ≠ 0, in which case we can divide by ∥x∥1 to obtain the condition

C1 ≤ ∥u∥a ≤ C2,

where u ≡ x / ∥x∥1 has norm ∥u∥1 = 1. It therefore suffices to bound ∥·∥a on the unit sphere {u : ∥u∥1 = 1}.

Step 3. We wish to show that any norm ∥·∥a is a continuous function on V under the topology induced by the norm ∥·∥1. That is, we wish to show that for any ε > 0, there exists a δ > 0 such that

∥x − x′∥1 < δ ⇒ |∥x∥a − ∥x′∥a| < ε.

We prove this in two steps. First, by the triangle inequality on ∥·∥a, it follows that

∥x∥a − ∥x′∥a = ∥x′ + (x − x′)∥a − ∥x′∥a ≤ ∥x − x′∥a,

and

∥x′∥a − ∥x∥a = ∥x − (x − x′)∥a − ∥x∥a ≤ ∥x − x′∥a,

and therefore

|∥x∥a − ∥x′∥a| ≤ ∥x − x′∥a.

Second, applying the triangle inequality again, and writing x = ∑_{i=1}^{n} αi ei and x′ = ∑_{i=1}^{n} αi′ ei, we obtain

∥x − x′∥a ≤ ∑_{i=1}^{n} |αi − αi′| ∥ei∥a ≤ ∥x − x′∥1 (max_i ∥ei∥a).

Therefore, if we choose

δ = ε / (max_i ∥ei∥a),

it immediately follows that |∥x∥a − ∥x′∥a| ≤ ∥x − x′∥a < ε whenever ∥x − x′∥1 < δ.

Step 4. Now we have a continuous function (the norm ∥·∥a) on a compact (closed and bounded) non-empty domain, the unit sphere {u : ∥u∥1 = 1}, and can apply the Weierstrass Theorem. By the extreme value theorem, the function must achieve a maximum and a minimum value on the set (it cannot merely approach them). Let

C1 = min_{∥u∥1 = 1} ∥u∥a and C2 = max_{∥u∥1 = 1} ∥u∥a.

Since ∥u∥a > 0 for every u with ∥u∥1 = 1, we have 0 < C1 ≤ C2, and C1 ≤ ∥u∥a ≤ C2 on the unit sphere. This completes the proof.


9.7. Differentiability

Definition 9.13. A function f : R → R is differentiable at x0 if

(9.10) lim_{h→0} [ f (x0 + h) − f (x0)] / h exists.

If this limit exists, we call it the derivative of f at x0 and denote it by f ′ (x0) or d f (x)/dx |_{x=x0}.

We follow the steps listed below to determine whether a derivative exists and, if yes, its value.

(a) Compute the increment ∆ f = f (x0 + h) − f (x0);

(b) the slope of the secant is ∆ f / h;

(c) if the secant slope has a limit as h → 0, then f is differentiable at x0, and the derivative is equal to this limit.

We can see that the derivative is equal to the slope of the tangent to the graph at x0. Note that the tangent can be used to approximate the function in the neighborhood of x0: for small h,

f (x0 + h) ≈ f (x0) + h · f ′ (x0).

It is the best linear approximation.

Definition 9.14. A function f : R → R is differentiable on a set S ⊆ R, if it is differentiable at each

point x ∈ S. It is called differentiable if it is differentiable at each point of the domain.

Example 9.5. Let f : R → R be f (x) = x². This function is differentiable at all x ∈ R:

αsec = [ f (x0 + h) − f (x0)] / h = [(x0 + h)² − x0²] / h
     = [x0² + 2x0 h + h² − x0²] / h = [2x0 h + h²] / h
     = 2x0 + h,

and lim_{h→0} αsec = 2x0, so f ′ (x0) = 2x0.
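The secant-slope computation can be watched numerically; a small sketch (my own helper name):

```python
def secant_slope(f, x0, h):
    # Slope of the secant through (x0, f(x0)) and (x0 + h, f(x0 + h)).
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x ** 2
x0 = 3.0
for h in (1e-1, 1e-3, 1e-5):
    print(h, secant_slope(f, x0, h))   # approaches f'(3) = 2*3 = 6
```

For f (x) = x² the secant slope is exactly 2x0 + h, so the printed values are 6 + h up to rounding.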

Definition 9.15. Second derivative: Let function f : R → R be differentiable with f ′ (·) denoting

its first derivative. If f ′ (·) is differentiable, its derivative is denoted by f ′′ (·) and is called the

second derivative of f .

Definition 9.16. A function whose derivative exists and is continuous is called continuously dif-

ferentiable or of class C 1 . A function whose second derivative exists and is continuous is called

twice continuously differentiable or of class C 2 .

Claim: If f is differentiable at x0, then f is continuous at x0.

Proof. Since f is differentiable at x0,

lim_{h→0} [ f (x0 + h) − f (x0)] / h

exists and is f ′ (x0). Consider

lim_{x→x0} [ f (x) − f (x0)] = lim_{x→x0} [x − x0] · [ f (x) − f (x0)] / [x − x0]
                            = lim_{x→x0} [x − x0] · lim_{x→x0} [ f (x) − f (x0)] / [x − x0]
                            = 0 · f ′ (x0) = 0,

so lim_{x→x0} f (x) = f (x0). Hence f is continuous at x0.

Note this claim does not hold in the other direction. Not all continuous functions are differen-

tiable. Consider the example of absolute value function f : R −→ R is defined by

f (x) = |x| .

The absolute value |x| of x is defined by

|x| = x if x ≥ 0; −x if x < 0.

It is easy to check that f is continuous on R. However, it is not differentiable at x0 = 0 (Please

verify).

Theorem 9.10. If f and g are differentiable functions, then

(9.11) f ± g is differentiable with ( f ± g)′ (x) = f ′ (x) ± g′ (x);

(9.12) f · g is differentiable with ( f · g)′ (x) = f ′ (x) g (x) + f (x) g′ (x);

(9.13) if g ≠ 0, then f / g is differentiable with ( f / g)′ (x) = [ f ′ (x) g (x) − f (x) g′ (x)] / [g (x)]².


[Figure: graph of f (x) = |x|.]

Theorem 9.11. If g is differentiable at x and f is differentiable at g (x), then

(9.14) f ◦ g is differentiable with ( f ◦ g)′ (x) = f ′ (g (x)) · g′ (x).

( )

Example 9.6. Let f (y) = ln y and g (x) = x². Then ( f ◦ g) (x) = ln x² and

( f ◦ g)′ (x) = (1/x²) · 2x = 2/x.

Theorem 9.12. If f is differentiable and has a local maxima or minima at x0 , then f ′ (x0 ) = 0.

Note the converse is not true. Take f (x) = x3 (See Figure 9.3). The first derivative is zero at

x0 = 0 which is a point of inflection.

Example. Consider

f (x) = x sin (1/x) for x ≠ 0;  0 for x = 0.


For x ≠ 0,

f ′ (x) = sin (1/x) + x cos (1/x) · (−1/x²)
        = sin (1/x) − (1/x) cos (1/x).

At x = 0 this formula does not work, as 1/x is not defined there. We use the definition: for h ≠ 0, the secant is

[ f (h) − f (0)] / h = [h sin (1/h) − 0] / h = sin (1/h).

As h → 0, sin (1/h) does not tend to any limit, so f ′ (0) does not exist.



Example. Now consider

f (x) = x² sin (1/x) for x ≠ 0;  0 for x = 0.

For x ≠ 0,

f ′ (x) = 2x sin (1/x) + x² cos (1/x) · (−1/x²)
        = 2x sin (1/x) − cos (1/x).

At x = 0 we use the definition as before: for h ≠ 0, the secant is

[ f (h) − f (0)] / h = [h² sin (1/h) − 0] / h = h sin (1/h),

and

|[ f (h) − f (0)] / h| = |h sin (1/h)| ≤ |h|.

As h → 0, we see that f ′ (0) = 0. Thus f (x) is differentiable everywhere, but f ′ (x) is not continuous at 0, as cos (1/x) does not tend to a limit as x → 0.

9.7.2. L’Hospital’s Rule. Sometimes we need to determine the value of a function where both the

numerator and the denominator go to zero. We use L’Hospital rule in such case. If f (a) = g (a) = 0

and g′ (a) ̸= 0, then

f (x) f ′ (a)

lim = ′ .

x→a g (x) g (a)


Example. Evaluate

lim_{x→4} (x² − 16) / (4√x − 8).

Let f (x) = x² − 16 and g (x) = 4√x − 8. Then

f (4) = g (4) = 0,  f ′ (x) = 2x,  g′ (x) = 2/√x.

Then

lim_{x→4} (x² − 16) / (4√x − 8) = f ′ (4) / g′ (4) = 8 / 1 = 8.
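The value 8 can be confirmed by evaluating the ratio near x = 4; a quick sketch (not part of the notes):

```python
import math

f = lambda x: x ** 2 - 16
g = lambda x: 4 * math.sqrt(x) - 8

# Both f(4) and g(4) are 0; watch the ratio as x -> 4.
for h in (1e-2, 1e-4, 1e-6):
    x = 4 + h
    print(x, f(x) / g(x))   # approaches 8
```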

9.8. Mean Value Theorem

Theorem 9.13. Mean Value Theorem: Let f be a continuous function on the compact interval [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) where

f ′ (c) = [ f (b) − f (a)] / (b − a).

The following claim is helpful in proving the Mean Value Theorem. The proof of the claim relies on the Weierstrass Theorem and is thus another example of an application of the Weierstrass Theorem.

Claim 9.3. Let f (·) and g(·) be continuous functions on [a, b] and differentiable on (a, b). Then there exists x ∈ (a, b) such that

[ f (b) − f (a)]g′ (x) = [g(b) − g(a)] f ′ (x).

Proof. Define,

h(s) = [ f (b) − f (a)]g(s) − [g(b) − g(a)] f (s).

Then, it is easy to check h(a) = f (b)g(a) − f (a)g(b) = h(b). We need to show that h′ (x) = 0 for

some x ∈ (a, b). If h(x) is a constant function, then h′ (x) = 0 for every point in (a, b). If not, then

consider without loss of generality, h(x) > h(a) for some x ∈ (a, b). Since h(·) is a continuous

function defined on a compact domain [a, b], Weierstrass Theorem can be applied to claim that it

attains a maximum at some point s ∈ (a, b). Also since h(·) is differentiable on (a, b) and attains its

maximum at s ∈ (a, b), h′ (s) = 0. The case where h(x) < h(a) for some x ∈ (a, b) can be proved in

similar manner as in this case, the function h(·) will attain a minimum at some interior point.

To prove the Mean Value Theorem, we consider g(x) = x. Then, g′ (x) = 1 leads to

f (b) − f (a)

[ f (b) − f (a)](1) = [b − a] f ′ (x) or f ′ (x) = ,

b−a

for some x ∈ (a, b).
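For a concrete function the point c guaranteed by the Mean Value Theorem can be found explicitly; a small sketch (my own example, not from the notes):

```python
# Mean Value Theorem for f(x) = x^3 on [a, b] = [0, 2]:
# find c in (a, b) with f'(c) = (f(b) - f(a)) / (b - a).
a, b = 0.0, 2.0
f  = lambda x: x ** 3
df = lambda x: 3 * x ** 2

secant = (f(b) - f(a)) / (b - a)   # = 4
c = (secant / 3) ** 0.5            # solve 3c^2 = 4  ->  c = 2/sqrt(3)
print(c, df(c), secant)
```

Here c ≈ 1.155 lies in (0, 2) and the tangent slope there equals the secant slope, as the theorem asserts.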

[Figure: the Mean Value Theorem — the tangent at c has slope f ′ (c), parallel to the secant line through (a, f (a)) and (b, f (b)), whose slope is [ f (b) − f (a)]/(b − a).]

9.9. Monotone Functions

A function f is monotone increasing if x ≤ y implies f (x) ≤ f (y), and monotone decreasing if x ≤ y implies f (x) ≥ f (y). In addition:

(f) sometimes we also say f is increasing at c if there exists some δ > 0 such that c − δ < x < c < y < c + δ implies that

f (x) ≤ f (c) ≤ f (y);

(g) f is decreasing at c if there exists some δ > 0 such that c − δ < x < c < y < c + δ implies that

f (x) ≥ f (c) ≥ f (y).

Result 9.3. Suppose f : [a, b] → R is continuous on [a, b] and differentiable on (a, b).

(a) If f ′ (x) ≥ 0 for all x ∈ (a, b), then f is non-decreasing on [a, b].

(b) If f ′ (x) > 0 for all x ∈ (a, b), then f is strictly increasing on [a, b].

(c) Similarly, if f ′ (x) ≤ 0 for all x ∈ (a, b), then f is non-increasing on [a, b].

(d) If f ′ (x) < 0 for all x ∈ (a, b), then f is strictly decreasing on [a, b].

(e) If f ′ (x) = 0 for all x ∈ (a, b), then f is constant on [a, b].

(9.15) f ′ (x0) > 0 (resp. < 0) ⇒ f is strictly increasing (resp. strictly decreasing) at x0.

(9.16) f ′ (x0) ≥ 0 (resp. ≤ 0) ⇔ f is monotone increasing (resp. monotone decreasing) at x0.

Theorem 9.14. [Darboux’s Theorem] Intermediate Value Theorem for derivative: If f is differ-

entiable on (a, b) then its derivative has the intermediate value property. If x1 < x2 are any two

points in the interval (a, b), and y lies between f ′ (x1 ) and f ′ (x2 ), then there exists a number x in

the interval [x1 , x2 ] such that f ′ (x) = y.

Proof. Assume y lies strictly between f ′ (x1) and f ′ (x2). Define a function g : (a, b) → R by

g(t) = f (t) − yt.


Then g′ (x1 ) = f ′ (x1 ) − y and g′ (x2 ) = f ′ (x2 ) − y. Then either (i) g′ (x1 ) > 0 and g′ (x2 ) < 0 or (ii)

g′ (x1 ) < 0 and g′ (x2 ) > 0. Take the first case, i.e. g′ (x1 ) > 0 and g′ (x2 ) < 0. It is clear that neither

x1 nor x2 can be a point where g attains even a local maximum. Since g is a continuous function, it

must therefore attain its maximum at an interior point x of the closed and bounded interval [x1 , x2 ]

by Weierstrass Theorem. So we conclude that

0 = g′ (x) = f ′ (x) − y, or f ′ (x) = y.

Alternative proof. We can clearly assume that y lies strictly between f ′ (x1) and f ′ (x2). Define continuous functions fx1, fx2 : [a, b] → R by

fx1 (t) = f ′ (x1) for t = x1;  [ f (x1) − f (t)] / (x1 − t) for t ≠ x1,

and

fx2 (t) = f ′ (x2) for t = x2;  [ f (t) − f (x2)] / (t − x2) for t ≠ x2.

Observe that fx1 (x1 ) = f ′ (x1 ), fx2 (x2 ) = f ′ (x2 ) and fx1 (x2 ) = fx2 (x1 ). Hence, y lies between fx1 (x1 )

and fx1 (x2 ); or y lies between fx2 (x1 ) and fx2 (x2 ). If y lies between fx1 (x1 ) and fx1 (x2 ), then (by

continuity of fx1 ) there exists s in (x1 , x2 ] with

y = fx1 (s) = ( f (s) − f (x1 )) / (s − x1 ).

Then by Mean Value Theorem there exists x ∈ [x1 , s] such that

y = ( f (s) − f (x1 )) / (s − x1 ) = f ′ (x).

Similarly if y lies between fx2 (x1 ) and fx2 (x2 ), then (by continuity of fx2 ) there exists s in [x1 , x2 )

and x ∈ [s, x2 ] such that

y = ( f (x2 ) − f (s)) / (x2 − s) = f ′ (x).

9.10. Functions of Several Variables

We now consider real-valued functions of n variables,

(9.17) f (x) = f (x1 , x2 , · · · , xn ).

Examples of such functions are utility functions over several goods, production functions with many inputs, etc.

Definition 9.18. The function f (x) is differentiable at the point x if there exists an n-dimensional vector D f (x), called the differential or total derivative of f at x, such that

∀ε > 0, ∃δ > 0 such that ∥x − y∥ < δ ⇒ | f (x) − f (y) − D f (x) · (x − y)| < ε · ∥x − y∥ .

9.10.1. Partial Derivative. To us the more important concept is that of partial derivative which

we define now.

Definition 9.19. Let f : D → R, where D ⊆ Rn , be a function of n variables. If the limit

lim_{h→0} [ f (x1 , · · · , xi + h, · · · , xn ) − f (x1 , · · · , xi , · · · , xn ) ] / h

exists, it is called the ith (first order) partial derivative of f at x and is denoted by ∂ f (x)/∂xi or fi (x).

The function f (x) is then said to be partially differentiable with respect to xi . The function

f (x) is said to be partially differentiable if it is partially differentiable with respect to every xi .

Note that ∂ f (x)/∂xi is the derivative of f (x1 , · · · , xn ) with respect to xi , holding all other variables constant. When all the partial derivatives exist, the vector of partial derivatives

∇ f (x) = [ ∂ f (x)/∂x1 , · · · , ∂ f (x)/∂xn ]

is called the Jacobian vector or the gradient vector. For functions of one variable, ∇ f (x) = f ′ (x).

Result 9.4. If a function is differentiable at x0 then it is partially differentiable at x0 .

However, existence of all the partial derivatives does not guarantee even the continuity of the

function as the following example shows.

Example 9.10. Let f (x, y) be defined as

f (x, y) = xy / (x^2 + y^2 ) if (x, y) ̸= (0, 0), and f (x, y) = 0 otherwise.

We can prove that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although

f is not continuous at (0, 0).
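A quick numerical sketch of this example (the helper names and step sizes are our own choices): the difference quotients along the axes vanish at the origin, while the value of f along the diagonal stays at 1/2.

```python
# Sketch for Example 9.10: both partial derivatives of f exist at (0, 0),
# yet f is not continuous there (it equals 1/2 along the diagonal x = y).

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x**2 + y**2)

h = 1e-6
d1 = (f(h, 0.0) - f(0.0, 0.0)) / h  # f is 0 along the x-axis, so this is 0
d2 = (f(0.0, h) - f(0.0, 0.0)) / h  # f is 0 along the y-axis, so this is 0
diag = f(1e-9, 1e-9)                # along x = y, f is identically 1/2

print(d1, d2, diag)  # 0.0 0.0 0.5
```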

If f is a real valued function defined on an open set D in Rn , and the partial derivatives are

bounded in D, then f is continuous on D.

Example 9.11. Let f : R2 → R be

f (x1 , x2 ) = x1^3 + 2 x1 x2 + 3 x2^3 .

Then

∂ f (x)/∂x1 = 3 x1^2 + 2 x2 ,  ∂ f (x)/∂x2 = 2 x1 + 9 x2^2 ,

so that

∇ f (x) = [ 3 x1^2 + 2 x2 , 2 x1 + 9 x2^2 ] , ∀x ∈ R2 .
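As a sanity check, the gradient formula above can be compared against central finite differences (a sketch; the helper names are ours):

```python
def f(x1, x2):
    return x1**3 + 2 * x1 * x2 + 3 * x2**3

def grad(x1, x2):
    # analytic gradient from Example 9.11
    return (3 * x1**2 + 2 * x2, 2 * x1 + 9 * x2**2)

def num_grad(x1, x2, h=1e-6):
    # central-difference approximation of the two partial derivatives
    return ((f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
            (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h))

a = grad(1.0, 2.0)      # (7.0, 38.0)
n = num_grad(1.0, 2.0)
print(a, n)
```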

For functions of one variable we have seen earlier that we could approximate the function

around a point by the tangent to the function at the point. We can do something similar in case of

functions of several variables. Instead of approximation by a line (the tangent), we now approximate by the tangent hyperplane.

Definition 9.20. Given f : D → R with gradient ∇ f (x0 ) at x0 , the tangent hyperplane to f at x0 is given by

f (x) = f (x0 ) + ∇ f (x0 ) · (x − x0 ) .

9.10.2. Second Order Partial Derivatives. Let us look at the example above again. For

f (x1 , x2 ) = x1^3 + 2 x1 x2 + 3 x2^3 ,

∂ f (x)/∂x1 = 3 x1^2 + 2 x2 and ∂ f (x)/∂x2 = 2 x1 + 9 x2^2 are differentiable functions of x1 and x2 themselves. When we take partial derivatives of these functions we get the second partial derivatives:

∂^2 f (x)/∂x1^2 = 6 x1 ,  ∂^2 f (x)/∂x2^2 = 18 x2 ,  ∂^2 f (x)/∂x1 ∂x2 = ∂^2 f (x)/∂x2 ∂x1 = 2 .

This example can be generalized.

Definition 9.21. Let f : Rn → R be twice differentiable. For each of the n partial derivatives, we get n partial derivatives of second order,

∂/∂x j ( ∂ f (x)/∂xi ) = ∂^2 f (x)/∂x j ∂xi = fi j (x) .

We organize the second order derivatives in a matrix, called the Hessian Matrix:

(9.18) H f (x) = D^2 f (x) =
[ ∂^2 f (x)/∂x1^2      · · ·   ∂^2 f (x)/∂xn ∂x1
  ∂^2 f (x)/∂x1 ∂x2    · · ·   ∂^2 f (x)/∂xn ∂x2
  · · ·                         · · ·
  ∂^2 f (x)/∂x1 ∂xn    · · ·   ∂^2 f (x)/∂xn^2  ]


If all the partial derivatives of the first order exist and are continuous then f is called C 1 or contin-

uously differentiable. If all the partial derivatives of second order exist and are continuous then f

is called C 2 or twice continuously differentiable and so forth.

Theorem 9.15 (Young's Theorem). If f is twice continuously differentiable then

∂^2 f (x)/∂x j ∂xi = ∂^2 f (x)/∂xi ∂x j ,

i.e., the Hessian of f is a symmetric matrix.

Example 9.12. For the example above,

H f (x) = [ 6 x1    2
            2       18 x2 ] .

The off-diagonal elements of the Hessian are also called cross-partials. For functions of one variable, H f (x) = f ′′ (x).
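The symmetry of the cross-partials in this example can be sanity-checked with nested central differences (a sketch; the helper names and step size are ours):

```python
def f(x1, x2):
    return x1**3 + 2 * x1 * x2 + 3 * x2**3

H = 1e-4

def d1(g, x1, x2):
    # central difference in the first argument
    return (g(x1 + H, x2) - g(x1 - H, x2)) / (2 * H)

def d2(g, x1, x2):
    # central difference in the second argument
    return (g(x1, x2 + H) - g(x1, x2 - H)) / (2 * H)

f12 = d2(lambda a, b: d1(f, a, b), 1.0, 2.0)
f21 = d1(lambda a, b: d2(f, a, b), 1.0, 2.0)
print(f12, f21)  # both close to the cross-partial value 2
```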

Example 9.13. Let f : R3 → R be

f (x) = 5 x1^2 + x1 x2^3 − x2^2 x3^2 + x3^3 .

Then

∇ f (x) = [ 10 x1 + x2^3 ,  3 x1 x2^2 − 2 x2 x3^2 ,  −2 x2^2 x3 + 3 x3^2 ]

and

H f (x) = [ 10        3 x2^2               0
            3 x2^2    6 x1 x2 − 2 x3^2     −4 x2 x3
            0         −4 x2 x3             −2 x2^2 + 6 x3 ] .

We now provide three very useful theorems on continuous and differentiable functions on

convex sets in Rn for n ≥ 1. They are the Intermediate Value theorem, the Mean Value theorem

and Taylor’s theorem.

Theorem 9.16 (Intermediate Value Theorem). Suppose A is a convex subset of Rn , and f : A → R

is a continuous function on A. Suppose x1 and x2 are in A, and f (x1 ) > f (x2 ). Then given any

c ∈ R such that f (x1 ) > c > f (x2 ), there is 0 < θ < 1 such that f [θx1 + (1 − θ)x2 ] = c.

Example 9.14. Suppose X ≡ [a, b] is a closed interval in R (with a < b). Suppose f is a continuous

function on X. By Weierstrass theorem, there will exist x1 and x2 in X such that f (x1 ) ≥ f (x) ≥

f (x2 ) for all x ∈ X. If f (x1 ) = f (x2 ) [this is the trivial case], then f (x) = f (x1 ) for all x ∈ X, and

so f (X) is the single point, f (x1 ). If f (x1 ) > f (x2 ), then using the fact that X is a convex set, we

can conclude from the Intermediate Value Theorem that every value between f (x1 ) and f (x2 ) is

attained by the function f at some point in X. This shows that f (X) is itself a closed interval.


Theorem 9.17 (Mean Value Theorem). Suppose A is an open convex subset of Rn , and f : A → R

is continuously differentiable on A. Suppose x1 and x2 are in A. Then there is 0 ≤ θ ≤ 1 such that

f (x2 ) − f (x1 ) = (x2 − x1 )∇ f (θx1 + (1 − θ)x2 )

Example 9.15. Let f : R → R be a continuously differentiable function with the property that

f ′ (x) > 0 for all x ∈ R. Then given any x1 , x2 in R, with x2 > x1 we have by the Mean-Value

Theorem (since R is open and convex), the existence of 0 ≤ θ ≤ 1, such that

f (x2 ) − f (x1 ) = (x2 − x1 ) f ′ (θx1 + (1 − θ)x2 )

Now f ′ (θx1 + (1 − θ)x2 ) > 0 by assumption, and x2 > x1 by hypothesis. So f (x2 ) > f (x1 ). This

shows that f is an increasing function on R.

Observe that a function f : R → R can be increasing without satisfying f ′ (x) > 0 at all x ∈ R.

For example, f (x) = x3 is increasing on R, but f ′ (0) = 0.

Theorem 9.18 (Taylor’s Expansion up to Second-Order). Suppose A is an open, convex subset of

Rn , and f : A → R is twice continuously differentiable on A. Suppose x1 and x2 are in A. Then

there exists 0 ≤ θ ≤ 1, such that

f (x2 ) − f (x1 ) = (x2 − x1 )′ ∇ f (x1 ) + (1/2) (x2 − x1 )′ H f (θx1 + (1 − θ)x2 ) (x2 − x1 ).
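For a quadratic function the Hessian is constant, so the expansion in Taylor's theorem holds exactly for any θ. A one-variable sketch (the function is our own choice):

```python
def f(x):
    return 3 * x**2 + 2 * x

x1, x2 = 1.0, 4.0
# f'(x) = 6x + 2 and f''(x) = 6 (constant), so the second-order
# expansion around x1 reproduces f(x2) - f(x1) exactly.
lhs = f(x2) - f(x1)
rhs = (6 * x1 + 2) * (x2 - x1) + 0.5 * 6 * (x2 - x1) ** 2
print(lhs, rhs)  # 51.0 51.0
```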

Let h : A → Rm be a function with component functions h1 , · · · , hm defined on an open set A ⊂ Rn . Let f : B → R be a function defined on an open set B ⊂ Rm which contains the set h(A). Then, we can define F : A → R by F(x) ≡ f [h(x)] ≡ f [h1 (x), · · · , hm (x)] for each x ∈ A. This function is known as a composite function [of f and h].

The “Chain Rule” of differentiation provides us with a formula for finding the partial derivatives of a composite function, F, in terms of the partial derivatives of the individual functions, f and h.

Theorem 9.19 (Chain Rule of differentiation). Let h : A → Rm be a function with component

functions hi : A → R(i = 1, · · · , m) which are continuously differentiable on an open set A ⊂ Rn .

Let f : B → R be a continuously differentiable function on an open set B ⊂ Rm which contains the

set h(A). If F : A → R is defined by F(x) = f [h(x)] on A, and a ∈ A, then F is differentiable at a

and we have for i = 1, · · · , n,

Di F(a) = ∑_{j=1}^{m} D j f (h1 (a), · · · , hm (a)) · Di h j (a).

9.11. Composite Functions and the Chain Rule 123

Example 9.16. Let m = 2, n = 1. Let h1 (x) = x^3 on R, and h2 (x) = 10 + x on R; and let f (y1 , y2 ) = y1 + y2^4 on R2 . Then

F(x) = f [h(x)] = f [h1 (x), h2 (x)] = h1 (x) + [h2 (x)]^4 = x^3 + (10 + x)^4

is a composite function on R. If a ∈ R,

F ′ (a) = D1 F(a) = D1 f (h1 (a), h2 (a)) · D1 h1 (a) + D2 f (h1 (a), h2 (a)) · D1 h2 (a)
= 1 · (3a^2 ) + 4(h2 (a))^3 · 1 = 3a^2 + 4(10 + a)^3 .
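The chain-rule derivative obtained in Example 9.16 can be checked against a finite difference (a sketch; the names are ours):

```python
def F(x):
    # the composite function from Example 9.16
    return x**3 + (10 + x)**4

def Fprime(a):
    # the derivative computed via the Chain Rule in the example
    return 3 * a**2 + 4 * (10 + a)**3

a, h = 2.0, 1e-6
numeric = (F(a + h) - F(a - h)) / (2 * h)
print(Fprime(a), numeric)  # 6924.0 and approximately 6924
```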

Example 9.17. Take m = 1, n = 2. Let h1 (x) = h1 (x1 , x2 ) = x1^2 + x2 on R2 ; f (y) = 2y on R. Then F(x) = F(x1 , x2 ) = f [h1 (x1 , x2 )] = 2[x1^2 + x2 ]. Then if a ∈ R2 ,

D1 F(a) = D1 f [h1 (a1 , a2 )]D1 h1 (a1 , a2 )

D2 F(a) = D1 f [h1 (a1 , a2 )]D2 h1 (a1 , a2 )

Thus, D1 F(a) = 2(2a1 ) = 4a1 ; and D2 F(a) = 2(1) = 2.

Chapter 10

Problem Set 4

(10.1) f (x) = [ (2x + 1) / (x − 1) ]^{1/2}

(10.2) f (x) = ln(3x^2 − 5x)

(3) Let f : R → R be

f (x) = x^2 − 1 if x ≤ 0, and f (x) = −x^2 if x > 0,

and g : R → R be

g (x) = 3x − 2 if x ≤ 2, and g (x) = −x + 6 if x > 2.

(a) Is f continuous at x = 0?

(b) Is g continuous at x = 2?

(4) Find

(10.3) lim_{x→0} f (x)/g (x) = lim_{x→0} [ exp (x^2 ) + exp (−x) − 2 ] / (2x).


f (x, y) = x^2 y + y^2 x − 2xy + 3x

at the point (1, 2).

(6) Let

f (x, y) = xy / (x^2 + y^2 ) if (x, y) ̸= (0, 0), and f (x, y) = 0 otherwise.

Show that the partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point in R2 , although f is not continuous at (0, 0).

(7) This exercise gives an example of a function with D12 f (x, y) ̸= D21 f (x, y). Let f (x, y) be defined as

f (x, y) = xy(x^2 − y^2 ) / (x^2 + y^2 ) if (x, y) ̸= (0, 0), and f (x, y) = 0 otherwise.

(a) The partial derivatives D1 f (x, y) and D2 f (x, y) exist at every point (x, y) ∈ R2 and f is continuous on R2 .

(b) The partial derivatives D1 f (x, y) and D2 f (x, y) are continuous at every point in R2 .

(c) The second order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in

R2 and are continuous everywhere in R2 except at (0, 0).

(d) D21 f (0, 0) = +1 and D12 f (0, 0) = −1.

Chapter 11

Convex Analysis

Definition 11.1. Function f : A → R (where A is a convex set) is concave if ∀x, y ∈ A, ∀λ ∈ [0, 1],

(11.1) λ f (x) + (1 − λ) f (y) ≤ f (λx + (1 − λ) y) .

Function f is strictly concave if the inequality is strict for all x ̸= y and all λ ∈ (0, 1).

For functions of a real variable, we say, informally, that a function is concave if and only if its

slope is weakly decreasing. If the function is differentiable then the derivative is weakly decreasing.

Theorem 11.1. Let X ⊂ R be an open interval. Then f : X → R is concave if and only if for any

a, b, c ∈ X with a < b < c

( f (b) − f (a)) / (b − a) ≥ ( f (c) − f (b)) / (c − b) , and

( f (b) − f (a)) / (b − a) ≥ ( f (c) − f (a)) / (c − a) .

Proof. First we assume that f is concave and show that the two inequalities hold. Note b − a > 0

and c − b > 0. Hence the first inequality holds if and only if

[ f (b) − f (a)](c − b) ≥ [ f (c) − f (b)](b − a).


Figure 11.1. A concave function of one variable: f ′ (d) < ( f (d) − f (c)) / (d − c) < f ′ (c).

Expanding and rearranging, this is equivalent to

(c − b + b − a) f (b) ≥ (b − a) f (c) + (c − b) f (a),

or

f (b) ≥ ( (b − a)/(c − a) ) f (c) + ( (c − b)/(c − a) ) f (a).

Observe that

b = ( (b − a)/(c − a) ) c + ( (c − b)/(c − a) ) a,

where

(b − a)/(c − a) = λ > 0, and (c − b)/(c − a) = 1 − λ > 0.

Since f is concave,

f (b) = f (λc + (1 − λ)a) ≥ λ f (c) + (1 − λ) f (a).

11.1. Concave, Convex Functions 129

Similarly, the second inequality holds if and only if

f (b) ≥ ( (b − a)/(c − a) ) f (c) + ( 1 − (b − a)/(c − a) ) f (a),

with (b − a)/(c − a) ∈ (0, 1). This holds true since f is concave.

To show that if the two inequalities hold then f is concave, we can take any a < c and any

λ ∈ (0, 1), and let b = λa + (1 − λ)c so that a < b < c holds.

Theorem 11.2. Suppose A is a convex subset of Rn and f is a real-valued function on A. Then f

is a concave function if and only if the set

C ≡ {(x, α) ∈ A × R : f (x) ≥ α}

is a convex set in Rn+1 .

Proof. Let the function f be concave. Let (x1 , α1 ) ∈ C and (x2 , α2 ) ∈ C. Then f (x1 ) ≥ α1 and f (x2 ) ≥ α2 . Since f is concave, and x1 , x2 ∈ A, for every λ ∈ [0, 1],

f [λx1 + (1 − λ) x2 ] ≥ λ f (x1 ) + (1 − λ) f (x2 ) ≥ λα1 + (1 − λ) α2 ,

which implies (λx1 + (1 − λ) x2 , λα1 + (1 − λ) α2 ) ∈ C. Hence C is convex.

Next we assume C to be convex. Note for x1 , x2 ∈ A, we have (x1 , f (x1 )) ∈ C and (x2 , f (x2 )) ∈ C. Since C is convex, for every λ ∈ [0, 1],

λ · (x1 , f (x1 )) + (1 − λ) · (x2 , f (x2 )) ∈ C.

This implies

f (λx1 + (1 − λ)x2 ) ≥ λ · f (x1 ) + (1 − λ) · f (x2 ),

or f is concave.

In general, a concave function on a convex set in Rn need not be continuous as the following

example shows.


Example 11.1. Define f : [0, ∞) → R by

f (x) = 1 + x for x > 0, and f (0) = 0.

This function is concave but it is not continuous at x = 0.

However, if the set A is open and convex, then the concave function f is continuous on A.

The following theorem, for functions of a real variable, can be proved using Theorem 12.1.

Theorem 11.3. Let X ⊂ R be open and convex and let f : X → R be a concave function. Then f

is continuous on X.

Proof. Assume f is concave. Theorem 12.1 implies that for any a, b, c ∈ X with a < b < c the

graph of the function f lies between the graph of the line through points (a, f (a)) and (b, f (b)) and

the line through points (b, f (b)) and (c, f (c)). Thus for any x ∈ [a, b],

f (b) − [( f (b) − f (a))/(b − a)] (b − x) ≤ f (x) ≤ f (b) − [( f (c) − f (b))/(c − b)] (b − x),

and for any x ∈ [b, c],

f (b) + [( f (b) − f (a))/(b − a)] (x − b) ≥ f (x) ≥ f (b) + [( f (c) − f (b))/(c − b)] (x − b).

These two inequalities imply that f is continuous at b.


If the function is continuously differentiable on an open convex set, then the following theorem characterizes the concave functions.

Theorem 11.4. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable on A. Then f is concave on A if and only if

(11.2) f (x2 ) − f (x1 ) ≤ ∇ f (x1 ) · (x2 − x1 )

whenever x1 and x2 are in A.

Proof. First assume f is concave, and let x1 , x2 ∈ A and λ ∈ (0, 1). By concavity,

f (x1 + λ(x2 − x1 )) = f (λx2 + (1 − λ)x1 ) ≥ λ f (x2 ) + (1 − λ) f (x1 ),

so that

f (x1 + λ(x2 − x1 )) − f (x1 ) ≥ λ · ( f (x2 ) − f (x1 )) .

Dividing both sides by λ, we get

[ f (x1 + λ(x2 − x1 )) − f (x1 ) ] / λ ≥ f (x2 ) − f (x1 ).

Taking λ → 0, we get

∇ f (x1 ) · (x2 − x1 ) ≥ f (x2 ) − f (x1 ),

which proves the inequality.

Next we assume (11.2) holds true for all x2 , x1 ∈ A. Then for any λ ∈ [0, 1], let x = λx2 + (1 −

λ)x1 . Since A is convex, x ∈ A. Note

x2 − x = x2 − λx2 − (1 − λ)x1 = (1 − λ)(x2 − x1 ).

Also

x1 − x = x1 − λx2 − (1 − λ)x1 = −λ(x2 − x1 ).

Applying (11.2), we get

f (x2 ) − f (x) ≤ ∇ f (x) · (x2 − x) = ∇ f (x) · (1 − λ)(x2 − x1 ),

and

f (x1 ) − f (x) ≤ ∇ f (x) · (x1 − x) = ∇ f (x) · (−λ)(x2 − x1 ).

We multiply the first inequality by λ and the second inequality by 1 − λ and add to obtain

λ · f (x2 ) + (1 − λ) · f (x1 ) − f (x) ≤ 0,


which implies

λ · f (x2 ) + (1 − λ) · f (x1 ) ≤ f (x) = f (λx2 + (1 − λ)x1 ).

So f is concave.
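The gradient inequality (11.2) can be checked numerically for a simple concave function, say f (x1, x2) = −x1^2 − x2^2 (our choice): for every pair of sampled points, f (x2) − f (x1) ≤ ∇f (x1) · (x2 − x1).

```python
def f(p):
    return -p[0]**2 - p[1]**2      # a concave function on R^2

def grad(p):
    return (-2 * p[0], -2 * p[1])

pts = [(-1.0, 2.0), (0.5, 0.0), (3.0, -2.5), (0.0, 0.0)]
ok = True
for a in pts:
    for b in pts:
        lhs = f(b) - f(a)
        g = grad(a)
        rhs = g[0] * (b[0] - a[0]) + g[1] * (b[1] - a[1])
        # for this f the gap rhs - lhs equals |b - a|^2 >= 0
        ok = ok and (lhs <= rhs + 1e-12)
print(ok)  # True
```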

Also the function will be strictly concave if we change the weak inequality to strict inequality.

Theorem 11.5. Suppose A ⊂ Rn is an open convex set, and f : A → R is continuously differentiable on A. Then f is strictly concave on A if and only if

f (x2 ) − f (x1 ) < ∇ f (x1 ) · (x2 − x1 )

whenever x1 and x2 are in A with x1 ̸= x2 .

Now we consider twice continuously differentiable functions. The following two theorems characterize concave and strictly concave functions.

Theorem 11.6. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously

differentiable on A. Then f is concave on A if and only if H f (x) is negative semi-definite for all

x ∈ A.

If H f (x) is negative definite whenever x ∈ A, then the function is strictly concave, but the

converse is not true.

Theorem 11.7. Suppose A ⊂ Rn is an open convex set, and f : A → R is twice continuously

differentiable on A. If H f (x) is negative definite for all x ∈ A then f is strictly concave on A.

The following example shows that the converse implication does not hold.

Example 11.2. Let f : R → R be defined by f (x) = −x4 for all x ∈ R (See Figure 2). This is a

twice continuously differentiable function on the open, convex set R. We can verify that f is strictly

concave on R, but since f ′′ (x) = −12x2 , f ′′ (0) = 0. This shows that the converse implication is not

valid.

Claim 11.1. If f : A → R is a function of one variable and is twice continuously differentiable then

∀ x ∈ A, f ′′ (x) ≤ 0 ⇔ f is concave.

Definition 11.2. Function f : A → R is convex if ∀x, y ∈ A, ∀λ ∈ [0, 1],

(11.3) λ f (x) + (1 − λ) f (y) ≥ f (λx + (1 − λ) y)

Function f is strictly convex if the inequality is strict for all λ ∈ (0, 1).


[Figure: graph of f (x) = −x^4 .]

Claim 11.2. If f : A → R is a function of one variable and is twice continuously differentiable then

∀ x ∈ A, f ′′ (x) ≥ 0 ⇔ f is convex.

Note that a local maximum (minimum) of a concave (convex) function is a global maximum (minimum) as well.

Theorem 11.8. Let f : A → R (where A ⊆ Rn is open and convex) be twice continuously differen-

tiable. Then,

f is convex if and only if H f (x) is PSD ∀x ∈ A.

H f (x) is ND ∀x ∈ A ⇒ f is strictly concave.

H f (x) is PD ∀x ∈ A ⇒ f is strictly convex.



For functions of one variable, this reads:

f is convex if and only if f ′′ (x) ≥ 0 ∀x ∈ A.
f ′′ (x) < 0 ∀x ∈ A ⇒ f is strictly concave.
f ′′ (x) > 0 ∀x ∈ A ⇒ f is strictly convex.

Example 11.3. The implication

f is strictly convex ⇒ f ′′ (x) > 0, ∀x ∈ A

does not hold.

Take f (x) = x^4 , with f ′′ (x) = 12x^2 . It is strictly convex everywhere but f ′′ (0) = 0. We would need f ′′ (x) > 0, ∀x ∈ A, for the Hessian to be PD.

Proposition 3.

11.2. Quasi-concave Functions 135

(b) If f (x) is concave (convex) and F (u) is concave (convex) and increasing, then U (x) = F ( f (x)) is concave (convex).

Example 11.4.

(3) 1/x is strictly convex on R++ and strictly concave on R−− .

for α even integer and strictly concave for α odd integer.

Definition 11.3. Function f : A → R is quasi-concave if ∀x, y ∈ A, ∀λ ∈ [0, 1],

f (λx + (1 − λ) y) ≥ min { f (x) , f (y) } .

Theorem 11.9. Function f : A → R is quasi-concave if and only if ∀a ∈ R, the set fa+ = {x ∈ A | f (x) ≥ a} is a convex set. The set fa+ is called the upper contour set.

Definition 11.4. Function f : A → R is quasi-convex if function − f is quasi-concave.

Theorem 11.10. Function f : A → R is quasi-convex if and only if ∀a ∈ R, the set fa− = {x ∈ A | f (x) ≤ a} is a convex set. The set fa− is called the lower contour set.

Theorem 11.11.

f : A → R concave ⇒ f is quasi-concave,

f : A → R convex ⇒ f is quasi-convex.

Note that for functions of one variable, any monotone function is quasi-concave. This however does NOT apply to functions of more than one variable. Also, quasi-concave functions need not be concave. Take f (x) = x^2 on R+ : it is monotone increasing, hence quasi-concave. But it is not concave; rather, it is convex. For functions of one variable, the following theorem characterizes the quasi-concave functions.

Theorem 11.12. A function f of a single variable is quasiconcave if and only if either (a) it is

non-decreasing, (b) it is non-increasing, or (c) there exists x∗ such that f is non-decreasing for

x < x∗ and non-increasing for x > x∗ .

Definition 11.5 (Bordered Hessian). Let f be a C 2 function. The bordered Hessian of f is the matrix

B (x) =
[ 0               ∂ f (x)/∂x1        · · ·   ∂ f (x)/∂xn
  ∂ f (x)/∂x1     ∂^2 f (x)/∂x1^2    · · ·   ∂^2 f (x)/∂xn ∂x1
  · · ·            · · ·                       · · ·
  ∂ f (x)/∂xn     ∂^2 f (x)/∂x1 ∂xn  · · ·   ∂^2 f (x)/∂xn^2  ]

Let Br (x) denote the submatrix consisting of the first (r + 1) rows and columns of B (x); i.e., Br (x) is an (r + 1) × (r + 1) matrix.

Condition 1. A necessary condition for f to be quasiconcave is that (−1)^r det (Br (x)) ≥ 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.

Condition 2. A sufficient condition for f to be quasiconcave is that (−1)^r det (Br (x)) > 0, ∀r = 1, 2, · · · , n; ∀x ∈ D.

When we check for quasi-concavity, we have to check the sufficient conditions. We need

det [ 0   f1
      f1  f11 ] < 0,

det [ 0   f1   f2
      f1  f11  f12
      f2  f21  f22 ] > 0, etc.

Remark 11.1. When we have to check whether a function is quasi-concave, start out checking

whether it is concave because it is easier to check for concavity and concavity implies quasi-

concavity.

Remark 11.2. Quasi-concavity is preserved under monotone transformation whereas concavity

need not be preserved.


Example 11.5. Let f (x, y) = √(xy) for (x, y) ∈ R2++ . Then

H f (x, y) = [ −(1/4)√(y/x^3 )    1/(4√(xy))
              1/(4√(xy))         −(1/4)√(x/y^3 ) ] .

The principal minors of order one are negative and the principal minor of order two is zero. Hence f is concave and so quasi-concave.

Let us take the monotone transformation g (x, y) = ( f (x, y))^4 = x^2 y^2 , for (x, y) ∈ R2++ . Then

B (x, y) = [ 0        2xy^2    2x^2 y
             2xy^2    2y^2     4xy
             2x^2 y   4xy      2x^2 ] .

det (B1 (x, y)) = det [ 0      2xy^2
                        2xy^2  2y^2 ] = −4x^2 y^4 < 0

⇒ (−1)^1 det (B1 (x, y)) > 0, ∀ (x, y) ∈ R2++ .

det (B2 (x, y)) = −2xy^2 (4x^3 y^2 − 8x^3 y^2 ) + 2x^2 y (8x^2 y^3 − 4x^2 y^3 )
= 8x^4 y^4 + 8x^4 y^4 = 16x^4 y^4 > 0, ∀ (x, y) ∈ R2++

⇒ g(x, y) is quasi-concave.

Note however, g (x, y) is not concave:

Hg (x, y) = [ 2y^2   4xy
              4xy    2x^2 ] .

The principal minors of order one are strictly positive, while the principal minor of order two is −12x^2 y^2 , which is strictly negative. Thus g (x, y) is not concave.
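The bordered-Hessian determinants computed above can be verified at a sample point of R2++ (a sketch; the helper names are ours):

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    # cofactor expansion along the first row
    return (m[0][0] * det2([[m[1][1], m[1][2]], [m[2][1], m[2][2]]])
            - m[0][1] * det2([[m[1][0], m[1][2]], [m[2][0], m[2][2]]])
            + m[0][2] * det2([[m[1][0], m[1][1]], [m[2][0], m[2][1]]]))

x, y = 2.0, 3.0
g1, g2 = 2 * x * y**2, 2 * x**2 * y            # first partials of x^2 y^2
g11, g12, g22 = 2 * y**2, 4 * x * y, 2 * x**2  # second partials

B1 = [[0.0, g1], [g1, g11]]
B2 = [[0.0, g1, g2], [g1, g11, g12], [g2, g12, g22]]

print(det2(B1), det3(B2))  # -1296.0 and 20736.0, i.e. -4x^2y^4 and 16x^4y^4
```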

Chapter 12

Problem Set 5

(1) Prove or give a counterexample: The sum of two concave functions is concave.

(2) Suppose A and B are convex sets in Rn . Prove or give a counterexample:
(a) The set A ∪ B is a convex set in Rn .
(b) The set A ∩ B is a convex set in Rn .
(c) Define the set C = {x + y : x ∈ A and y ∈ B}. The set C is a convex set in Rn .

(3) Suppose f : [0, 1] → R+ and g : [0, 1] → R+ are increasing, convex functions on [0, 1]. Define

the function h : [0, 1] → R+ by:

(a) f (x) = 3x + 4;
(b) g(x, y) = y e^x , y > 0;
(c) h(x, y) = −x^2 y^3 .

(5) Show using an example that the sum of two quasi-concave functions need not be quasi-concave

(in general).


(6) Consider the functions
(i) f (x, y, z) = 8x^3 + 2xy^2 − z^3
(ii) g(x, y) = x + y − e^x − e^(x+y)
Write out the gradient vectors and the Hessian matrices ∇ f (x, y, z), H f (x, y, z), ∇g(x, y) and Hg (x, y). State whether f is concave, quasi-concave, or quasi-convex. What about the function g?

Theorem 12.1. Let X ⊂ R be an open interval. Then f : X → R is concave if and only if for any a, b, c ∈ X with a < b < c

( f (b) − f (a))/(b − a) ≥ ( f (c) − f (b))/(c − b) , and

( f (b) − f (a))/(b − a) ≥ ( f (c) − f (a))/(c − a) .

Theorem 12.2. Let X ⊂ R be open and convex and let f : X → R be a concave function. Then

f is continuous on X.

Chapter 13

Inverse and Implicit Function Theorems

Consider the function f : R → R defined by f (x) = 4x. It is one-to-one on R; and we can define a function g : R → R by g(y) = y/4. The function g(y) satisfies the property g[ f (x)] = x and is called the inverse function of f on R. Furthermore g′ [ f (x)] = 1/ f ′ (x) for all x ∈ R.

This idea can be extended to the domains of the function, A, being subsets of Rn , with the

function f defined from A to R. Then f is one-to-one on A if for all x1 , x2 ∈ A, x1 ̸= x2 , we have

f (x1 ) ̸= f (x2 ). In this case, if there is a function g, from f (A) to A, such that g[ f (x)] = x for each

x ∈ A, then g is called the inverse function of f on f (A).

Let a ∈ A, and suppose that f ′ (a) ̸= 0. If f ′ (a) > 0, then there is an open interval B(a, r) such that f ′ (x) > 0 for all x in B(a, r), and f is increasing on B(a, r). Thus, for every y ∈ f [B(a, r)], there is a unique x in B(a, r) such that f (x) = y. That is, there is a unique function h : f [B(a, r)] → B(a, r) such that h[ f (x)] = x for all x ∈ B(a, r). Thus, h is an inverse function of f on f [B(a, r)]. In other words, h is the inverse of f “locally” around the point f (a). We have not guaranteed that the inverse

word, h is the inverse of f “locally” around the point f (a). We have not guaranteed that the inverse

function is defined on the entire set f (A). Similarly, if f ′ (a) < 0, an inverse function could be


defined “locally” around f (a). The important restriction to carry out the kind of analysis noted

above is that f ′ (a) ̸= 0.

For example, the function f : R → R defined by f (x) = x^2 is continuously differentiable on R, but f ′ (0) = 0. Now, we cannot define a unique inverse function

of f even “locally” around f (0). If we choose any open ball B(0, r), and consider any point y ̸= 0

in the set f [B(0, r)], then there will be two values x, x′ in B(0, r), x ̸= x′ , such that f (x) = y = f (x′ ).

We note here that f ′ (a) ̸= 0 is not a necessary condition to get a unique inverse function of f .

For example if f : R → R is defined by f (x) = x3 , then we have f to be continuously differentiable

on R, with f ′ (0) = 0. However f is an increasing function, and clearly has a unique inverse function

g(y) = y1/3 on R, and hence locally around f (0).

The following theorem deals with the existence and properties of inverse functions.

Theorem 13.1 (Inverse Function Theorem). Let A be an open set of Rn , and f : A → Rn be continuously differentiable on A. Suppose a ∈ A and the Jacobian determinant of f at a is non-zero. Then there is

an open set X ⊂ A containing a, and an open set Z ⊂ Rn containing f (a), and a unique function

h : Z → X, such that:

(i) f (X) = Z;

(ii) f is one-to-one on X;

(iii) h is continuously differentiable on Z, and h[ f (x)] = x for all x ∈ X.

The following example shows that continuity of f ′ is needed in the inverse function theorem, even in the case n = 1.

Example 13.1. Let

f (t) = t + 2t^2 sin (1/t) for t ̸= 0, and f (0) = 0;

then f ′ (0) = 1, f ′ is bounded in (−1, 1), but f is not one-to-one in any neighborhood of 0.

13.2. The Linear Implicit Function Theorem

For the system of simultaneous linear equations Ax = b, we have seen earlier that there exists a unique solution for every choice of right hand side column vector b, if and only if the rank of A is equal to the number of rows of A, which is equal to the number of columns of the matrix A.

In economic models, the vector b represents some externally determined (exogenous) parameters

while the linear equations constitute some equilibrium conditions which determine the vector x

which is the set of internal (endogenous) variables.

In this sense it is possible to divide the set of variables in two disjoint subsets of endogenous and

exogenous variables. Thus a general linear economic model will have m equations in n unknowns:

a11 x1 + a12 x2 + · · · + a1n xn = b1

··· ··· ··· ··· ···

am1 x1 + am2 x2 + · · · + amn xn = bm

In general it will be possible to divide the set of variables into endogenous variables and exoge-

nous variables. Such a division will be useful only if after substituting the values of the exogenous

variables in the m equations, it is possible to obtain a solution of the system for the remaining en-

dogenous variables. For this two conditions must hold. The number of endogenous variables must

be equal to the number of equations m and the square matrix corresponding to the endogenous

variables must have maximal rank m.

A formal statement of the above observation is known as the linear version of Implicit Function

Theorem.

Theorem 13.2 (Linear Implicit Function Theorem). Partition the variables x1 , · · · , x j and x j+1 , · · · , xn in the system of equations (22.2) into endogenous and exogenous variables respectively. Then there exists, for every choice of the exogenous variables, x̄ j+1 , · · · , x̄n , a unique set of the values, x̄1 , · · · , x̄ j , if and only if the coefficient matrix of the endogenous variables,

(13.1) [A] j× j =
[ a11  a12  . . .  a1 j
  a21  a22  . . .  a2 j
  ...
  a j1  a j2  . . .  a j j ] ,

is non-singular, i.e., det [A] ̸= 0.

Exercise 13.1.

144 13. Inverse and Implicit Function Theorems

x + 2y + z − w = 1
3x − y − 4z + 2w = 3
0x + y + z + w = 0

Determine how many variables can be endogenous at any one time and show a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution. Find an explicit formula for the endogenous variables in terms of the exogenous variables.

Exercise 13.2.

−x + 3y − z + w = 0
4x − y + 2z + w = 3
7x + y + z + 3w = 6

Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

Consider the equation

y^2 − 6xy + 5x^2 = 0.

Given any value of x, we can solve this equation for y. For example if x = 0, then y = 0; if x = 1 the equation takes the form y^2 − 6y + 5 = 0 and yields y = 1 or y = 5 as solutions. Observe that it

is possible to solve y explicitly in terms of x (it turns out to be a correspondence) by applying the

quadratic formula:

y = [ 6x ± √(36x^2 − 20x^2 ) ] / 2 = (6x ± 4x)/2,

so y = 5x or y = x.
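Both explicit branches indeed satisfy the original implicit equation identically, which is easy to spot-check:

```python
def F(x, y):
    return y**2 - 6 * x * y + 5 * x**2

# y = x gives x^2 - 6x^2 + 5x^2 = 0; y = 5x gives 25x^2 - 30x^2 + 5x^2 = 0.
vals = [F(x, x) for x in (-2.0, 0.0, 1.0, 3.5)]
vals += [F(x, 5 * x) for x in (-2.0, 0.0, 1.0, 3.5)]
print(vals)  # all entries are 0.0
```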

It is possible to apply the quadratic formula to the implicit function xy^2 − 3y − 2 exp x = 0 to obtain an explicit function for y as

y = [ 3 ± √(9 + 8x exp x) ] / (2x).

However, it could turn out that the explicit functions are more difficult to work with than the original implicit function.

13.3. Implicit Function Theorem for R2 145

For the equation

y^5 − 5xy + 4x^2 = 0,

however, it is not possible to solve for y in explicit form, as there is no general formula for solving a quintic equation. Note, however, that the equation still defines y as an implicit function of x. For x = 0, we get y = 0; for x = 1 we get y = 1; and so on.

Example 13.2. A profit maximizing firm uses single input x (with unit cost w per unit) to produce

an output y using production function y = f (x). Let the price of the output be p per unit. Then the

profit function for this firm given p and w is

Π(x) = p · f (x) − w · x.

To obtain the optimal input x which maximizes the profit, we take the first order condition, which is

p · f ′ (x) − w = 0.

We can treat p and w as exogenous variables and then this equation defines x as a function of p and

w. The equation need not yield x as an explicit function of p and w. However, it does define x as an

implicit function of p and w and we can use it to estimate the change in x in response to changes in

p and w.
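As a concrete sketch, take the hypothetical production function f (x) = √x (our own choice, not from the text). The first order condition then solves explicitly, which lets us check it:

```python
import math

# With f(x) = sqrt(x), f'(x) = 1/(2 sqrt(x)), so the first order condition
# p f'(x) - w = 0 solves explicitly to x(p, w) = (p / (2 w))**2.
p, w = 4.0, 1.0
x = (p / (2 * w)) ** 2
foc = p * (0.5 / math.sqrt(x)) - w
print(x, foc)  # 4.0 0.0
```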

The ideal case is when the endogenous variable can be written as

y = G(x1 , · · · , xn ).

In this form, the endogenous variable y is an explicit function of the exogenous variables (x1 , · · · , xn ).

Such an ideal situation need not occur in every case. More frequently we come across functions of

the form

(13.2) F(x1 , · · · , xn ; y) = 0.

If Eq. (13.2) determines a value of y for each set of values (x1 , · · · , xn ), then we say that Eq. (13.2) defines the endogenous variable y as an implicit function of the exogenous variables (x1 , · · · , xn ).

We consider implicit functions in R2 of the form F(x, y) = c and analyze following question.

For a given implicit function F(x, y) = c and a specified solution (x0 , y0 ),

(a) Does F(x, y) = c determine y as a continuous function of x for points (x, y) such that x is near

x0 and y is near y0 ?

146 13. Inverse and Implicit Function Theorems

(a) Given the implicit function F(x, y) = c, determine a point (x0 , y0 ) such that F(x0 , y0 ) = c; does there exist a continuous function y = f (x) defined on an interval I around x0 so that:
(1) F(x, f (x)) = c for all x ∈ I, and
(2) y0 = f (x0 )?

Theorem 13.3. Let F(x, y) be a continuously differentiable function on an open ball around (x0 , y0 ) in R2 , and suppose F(x0 , y0 ) = c. If

∂F(x, y)/∂y |(x0 ,y0 ) ̸= 0,

then there exists a continuously differentiable function y = f (x) defined on an open interval I around x0 such that:

(a) F(x, f (x)) = c for all x ∈ I;

(b) f (x0 ) = y0 ; and

(c) f ′ (x0 ) = − [ ∂F(x, y)/∂x |(x0 ,y0 ) ] / [ ∂F(x, y)/∂y |(x0 ,y0 ) ] .

Consider

F(x, y) = x^2 + y^2 − 1 = 0

(the graph of this equation is a circle with radius r = 1). If we choose (a, b) with

F(a, b) = a^2 + b^2 − 1 = 0, and a ̸= ±1,

then there are open intervals I ⊂ R containing a, and Y ⊂ R containing b, such that if x ∈ I, there is a unique y ∈ Y with

F(x, y) = 0.

Thus, we can define a unique function

f : I → Y such that F(x, f (x)) = 0

for all x ∈ I. If a > 0 and b > 0, then f (x) = √(1 − x^2 ) on I. We say such a function is defined implicitly by the equation F(x, y) = 0, with y = f (x). Note that if a = 1 and b = 0, so that D2 F(a, b) = 0,

we cannot find such a unique function, f .
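For the circle example the implicit-derivative formula in Theorem 13.3 can be checked directly: with b > 0 the implicit function is f (x) = √(1 − x^2 ), and f ′ (a) = −Fx /Fy = −a/b at (a, b). A numeric sketch (the sample point is our choice):

```python
import math

a = 0.6
b = math.sqrt(1 - a**2)   # 0.8, the positive solution of F(a, b) = 0

def f(x):
    return math.sqrt(1 - x**2)

h = 1e-7
numeric = (f(a + h) - f(a - h)) / (2 * h)
formula = -a / b          # -F_x/F_y = -(2a)/(2b) at (a, b)
print(formula, numeric)   # both approximately -0.75
```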

Chapter 14

Homogeneous and Homothetic Functions

Most of us have come across homogeneous functions in elementary algebra courses. For example, f(x) = ax is homogeneous of degree 1, f(x) = ax^m is homogeneous of degree m, f(x) = ax + 1 is not a homogeneous function, and so on. First we define homogeneous functions formally.

Definition 14.1. For any scalar k, a real valued function f(x1, · · · , xn) is homogeneous of degree k on Rn+ if for all x ∈ Rn+ and all t > 0,

f(tx1, · · · , txn) = t^k f(x1, · · · , xn).

(a) Consider f : R2+ → R given by f(x1, x2) = x1^2 x2^3. Then if t > 0, we have f(tx1, tx2) = (tx1)^2 (tx2)^3 = t^(2+3) x1^2 x2^3 = t^5 f(x1, x2). So, f is homogeneous of degree 5.

(b) The function f(x1, x2) = x1^a x2^b is homogeneous of degree a + b. With a and b assumed to be non-negative, this function illustrates returns to scale: if a + b = 1, the function displays constant returns to scale; if a + b > 1, increasing returns to scale; and if a + b < 1, decreasing returns to scale.



(c) The Cobb-Douglas function f(x1, x2, · · · , xn) = x1^a1 x2^a2 · · · xn^an is homogeneous of degree a1 + a2 + · · · + an.

(d) The constant elasticity of substitution function f(x1, x2) = A (a1 x1^p + a2 x2^p)^(q/p) is homogeneous of degree q.

(e) The function f(x1, x2) = √(x1^3 + x2^3) is homogeneous of degree 3/2.

(f) The function f : R2+ → R given by f(x1, x2) = x1^2 x2 + 3 x1 x2^2 + x2^3 is homogeneous of degree 3, since each term is homogeneous of degree 3.

(i) In consumer theory, the demand function is a homogeneous function of degree zero.

(j) The only homogeneous of degree k function of one variable is f(x) = a x^k for some constant a.

(k) The only homogeneous of degree zero function of one variable is the constant function f (x) = a

for some constant a.

(l) There exist non-constant homogeneous of degree zero functions of more than one variable. Consider for example f(x, y) = x/y, y ≠ 0.

(m) If functions f and g are homogeneous of degree k, then the sum function f + g is also homo-

geneous of degree k.

(n) The function f : R2+ → R given by f(x1, x2) = 3 x1^2 x2^3 − 6 x1^5 x2^2 is not homogeneous, since the first term is homogeneous of degree 5 but the second term is homogeneous of degree 7.
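These degrees can be verified numerically. A small sketch (the helper degree is my own device, not from the notes): if f is homogeneous of degree k, then log(f(tx)/f(x)) / log(t) equals k for any base point x and any t > 0.

```python
import math

def degree(f, x, t=2.0):
    # recovers k from f(t*x) = t**k * f(x)
    return math.log(f([t * xi for xi in x]) / f(x)) / math.log(t)

f_a = lambda x: x[0]**2 * x[1]**3                          # example (a), degree 5
f_f = lambda x: x[0]**2*x[1] + 3*x[0]*x[1]**2 + x[1]**3    # example (f), degree 3
f_l = lambda x: x[0] / x[1]                                # example (l), degree 0

print(degree(f_a, [1.5, 2.0]), degree(f_f, [1.5, 2.0]), degree(f_l, [1.5, 2.0]))
```

For the non-homogeneous function in (n), the value returned by degree would vary with the base point x, which is another way of seeing that no single degree works.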

Let us look at the function f(x1, x2) = x1^a x2^b again. We can calculate the partial derivatives of f on R2++. Thus,

∂f(x1, x2)/∂x1 = a x1^(a−1) x2^b ;  ∂f(x1, x2)/∂x2 = b x1^a x2^(b−1).

Now, if t > 0, then

∂f(tx1, tx2)/∂x1 = a (tx1)^(a−1) (tx2)^b = t^(a+b−1) a x1^(a−1) x2^b = t^(a+b−1) ∂f(x1, x2)/∂x1.

14.1. Homogeneous Functions

So ∂f(x1, x2)/∂x1 is homogeneous of degree (a + b − 1). Similarly, one can check that ∂f(x1, x2)/∂x2 is homogeneous of degree (a + b − 1). More generally, whenever a function, f, is homogeneous of degree k, its partial derivatives are homogeneous of degree (k − 1).

Theorem 14.1. Suppose f : Rn+ → R is homogeneous of degree k on Rn+ and continuously differentiable on Rn++. Then for each i = 1, · · · , n, ∂f(x1, · · · , xn)/∂xi is homogeneous of degree (k − 1) on Rn++.

Proof. Since f is homogeneous of degree k, we have

(14.1) f(tx1, · · · , txn) = t^k f(x1, · · · , xn).

We can consider f(tx) to be a function of n + 1 variables, t, x1, · · · , xn. We will show this result for the partial derivative with respect to x1; the remaining variables t, x2, · · · , xn are held constant. Applying the Chain Rule, the partial derivative of the left hand side of (14.1) with respect to x1 is

(14.2) ∂f(tx1, · · · , txn)/∂(tx1) · ∂(tx1)/∂x1 = D1 f(tx1, · · · , txn) · t.

The partial derivative of the right hand side of (14.1) is t^k ∂f(x1, · · · , xn)/∂x1. Equality of the two expressions leads to

(14.3) D1 f(tx1, · · · , txn) · t = t^k ∂f(x1, · · · , xn)/∂x1.

Dividing by t, we get

(14.4) D1 f(tx1, · · · , txn) = t^(k−1) ∂f(x1, · · · , xn)/∂x1.

Thus the partial derivatives are homogeneous functions of degree k − 1.
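A quick numerical sketch of this fact (my own illustration, not from the notes), using f(x1, x2) = x1^2 x2^3, which is homogeneous of degree k = 5, so D1 f should be homogeneous of degree k − 1 = 4:

```python
def D1f(x1, x2):
    # the partial derivative of f(x1, x2) = x1**2 * x2**3 with respect to x1
    return 2 * x1 * x2**3

t, x1, x2 = 2.0, 1.5, 0.5
print(D1f(t * x1, t * x2), t**4 * D1f(x1, x2))  # the two values coincide
```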

Returning to the function f(x1, x2) = x1^a x2^b, observe that

x1 D1 f(x1, x2) + x2 D2 f(x1, x2) = a x1^a x2^b + b x1^a x2^b = (a + b) x1^a x2^b = (a + b) f(x1, x2).

More generally, when a function, f, is homogeneous of degree k, then x · ∇f(x) = k f(x), a result known as Euler’s theorem.

Theorem 14.2 (Euler’s Theorem). Suppose f : Rn+ → R is homogeneous of degree k on Rn+ and continuously differentiable on Rn++. Then,

x1 · ∂f(x1, · · · , xn)/∂x1 + · · · + xn · ∂f(x1, · · · , xn)/∂xn = k f(x),

i.e.,

x · ∇f(x) = k f(x) for all x ∈ Rn++.


Proof. Fix x ∈ Rn++ and consider f(tx) as a function of t. Applying the Chain Rule, we have

(14.5) d f(tx)/dt = ∂f(tx)/∂x1 · x1 + · · · + ∂f(tx)/∂xn · xn.

But since f is homogeneous of degree k, we have

f(tx) = t^k f(x1, · · · , xn),

and so

(14.6) d f(tx)/dt = k t^(k−1) f(x1, · · · , xn).

Equating (14.5) and (14.6) and taking t = 1 completes the proof.
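Euler's theorem can also be checked numerically. A hedged sketch (the finite-difference gradient is an illustrative device, and the Cobb-Douglas exponents a = 0.3, b = 0.5 are assumptions of this example): for f(x1, x2) = x1^a x2^b, homogeneous of degree k = a + b, we should find x · ∇f(x) = k f(x).

```python
def f(x1, x2, a=0.3, b=0.5):
    return x1**a * x2**b

def grad(g, x1, x2, h=1e-6):
    # central-difference approximation of the gradient
    return ((g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h),
            (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h))

x1, x2, k = 2.0, 3.0, 0.3 + 0.5
g1, g2 = grad(f, x1, x2)
lhs = x1 * g1 + x2 * g2     # x · grad f(x)
rhs = k * f(x1, x2)         # k f(x)
print(lhs, rhs)             # agree up to finite-difference error
```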

Theorem 14.3 (Euler’s Theorem, converse). Suppose f : Rn+ → R is a continuous function on Rn+ and continuously differentiable on Rn++. Also suppose

x1 · ∂f(x1, · · · , xn)/∂x1 + · · · + xn · ∂f(x1, · · · , xn)/∂xn = k f(x)

for all x ∈ Rn++. Then, f is homogeneous of degree k.

A useful geometric property of homogeneous functions is as follows. Let f(x) be a homogeneous function of degree one and consider the level set f(x) = 1. In producer theory, the function f could be a constant returns to scale production function, and the level sets would then be the isoquants. Let x be a point on the isoquant f(x) = 1. If we scale the point x by a factor r > 0 along the ray joining x and the origin, we obtain a point z = rx on the isoquant f(z) = r.

Similarly, if the function f is homogeneous of degree k, then scaling points on the isoquant q = 1 by a factor r along the ray through the origin generates the isoquant q = r^k, since f(rx) = r^k f(x) = r^k as f(x) = 1. Thus the level sets of a homogeneous function are radial expansions and contractions of each other. This observation leads to the following consequence.

Theorem 14.4. Suppose f : Rn+ → R is homogeneous of degree k on Rn+ and continuously differentiable on Rn++. Then, the tangent planes of the level sets of f have constant slope along each ray from the origin.

14.2. Homothetic Functions

A homothetic function is a monotone transformation of a homogeneous function. That is, if h is a homogeneous function and g is a strictly increasing function such that f(x) = g(h(x)) holds for all x in the domain, then f is a homothetic function. For example, if h is homogeneous of degree 2 and g(z) = z^3 + z is a monotone transformation of z, then f(x) = g(h(x)) is homothetic.

Theorem 14.5. Suppose f : Rn+ → R is a strictly monotonic function. Then, f is homothetic if and only if for all x and y in Rn+,

f(x) ≥ f(y) ⇔ f(θx) ≥ f(θy) for all θ > 0.

For continuously differentiable functions, homotheticity can also be characterized in terms of the partial derivatives.

Theorem 14.6. Suppose f : Rn+ → R is continuously differentiable on Rn++. If f is homothetic, then the tangent planes to the level sets of f are constant along rays from the origin; in other words, for every i and j and for every x in Rn++,

(14.7) [∂f(tx)/∂xi] / [∂f(tx)/∂xj] = [∂f(x)/∂xi] / [∂f(x)/∂xj] for all t > 0.

The converse of this theorem is also true and is stated here for the sake of completeness.

Theorem 14.7. Suppose f : Rn+ → R is continuously differentiable on Rn++ . If (14.7) holds for all

x in Rn++ , for every i and j and for all t > 0, then f is homothetic.
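Condition (14.7) is easy to verify numerically for a concrete homothetic function. A sketch under assumptions of my own choosing: the inner function h(x1, x2) = x1 x2 (homogeneous of degree 2) and the monotone transformation g(z) = z^3 + z echo the example above; the slope ratio of the level sets should be unchanged along any ray through the origin.

```python
def f(x1, x2):
    h = x1 * x2           # homogeneous of degree 2
    return h**3 + h       # g(z) = z**3 + z, strictly increasing

def partial(g, x1, x2, i, h=1e-6):
    # central-difference partial derivative with respect to x_i
    if i == 0:
        return (g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h)
    return (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)

def slope_ratio(x1, x2):
    # (df/dx1) / (df/dx2): the slope of the tangent to the level set
    return partial(f, x1, x2, 0) / partial(f, x1, x2, 1)

x1, x2, t = 1.5, 0.7, 3.0
print(slope_ratio(x1, x2), slope_ratio(t * x1, t * x2))  # equal along the ray
```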

Chapter 15

Separating Hyperplane Theorem

For p ∈ Rn and a ∈ R, let

[p ≥ a]

denote the set

{x ∈ Rn : p · x ≥ a},

where p · x is the Euclidean inner product ∑_{i=1}^n pi xi.

A hyperplane in Rn is a set of the form [p = a], where p ≠ 0. We can visualize the vector p ∈ Rn as a vector normal (orthogonal) to the hyperplane at each point. The hyperplane does not change when we multiply both the vector p and the real number a by the same non-zero scalar α.

Example 15.1. Consider p = (1, 2) ∈ R2 and a = 4. Then the hyperplane [p = a] is the set of points in R2 on the straight line x1 + 2x2 = 4. For p = (1, 2, 3) ∈ R3 and a = 6, the hyperplane [p = a] is the set of points in R3 on the plane x1 + 2x2 + 3x3 = 6. For p = (4) ∈ R1 and a = 8, the hyperplane [p = a] is the singleton set {x1 ∈ R1 : 4x1 = 8}, i.e., the single point x1 = 2.



A weak half space or closed half space is a set of the form [p ≥ α] or [p ≤ α]. A strict half

space or open half space is a set of the form [p > α] or [p < α]. We say that a non-zero p, or the

hyperplane [p = α] separates A and B if either

A ⊂ [p ≥ α], and B ⊂ [p ≤ α],

or

B ⊂ [p ≥ α], and A ⊂ [p ≤ α]

holds. We will write p · A ≥ p · B to mean p · x ≥ p · y for all x ∈ A and y ∈ B.

We say that a non-zero p strongly separates A and B if A and B are in disjoint closed half spaces, i.e., there exists some ε > 0,

such that either

A ⊂ [p ≥ α + ε], and B ⊂ [p ≤ α],

or

B ⊂ [p ≥ α + ε], and A ⊂ [p ≤ α]

holds. An equivalent way to state strong separation is that

inf_{x∈A} p · x > sup_{y∈B} p · y,  or  inf_{y∈B} p · y > sup_{x∈A} p · x.

We state and prove one of the versions of the separating hyperplane theorems.

Theorem 15.1. Let A and B be disjoint non-empty convex subsets of Rn. Let A be compact and B closed. Then there exists a non-zero p ∈ Rn that strongly separates A and B.

Proof. Define the function

f(x) = inf { d(x, y) : y ∈ B },

which is the distance from x ∈ A to the set B. We claim that the function f is continuous. Observe that for any x, x′ ∈ A and y ∈ B, the distance function satisfies the triangle inequality,

d(x, y) ≤ d(x, x′) + d(x′, y),

and

d(x′, y) ≤ d(x, x′) + d(x, y).

Thus

−d(x, x′) ≤ d(x, y) − d(x′, y) ≤ d(x, x′),

where the first inequality is obtained from the triangle inequality for d(x′, y), and the second inequality follows from the triangle inequality for d(x, y). Together they imply

|d(x, y) − d(x′, y)| ≤ d(x, x′).

15.2. Separating Hyperplane Theorem 155

Further,

f(x) ≤ d(x, y) ≤ d(x, x′) + d(x′, y)

for all y ∈ B. Consider a sequence {yn} in B such that d(x′, yn) → f(x′); then

f(x) ≤ d(x, x′) + f(x′).

Similarly,

f(x′) ≤ d(x′, y) ≤ d(x, x′) + d(x, y).

Consider again a sequence {yn} such that d(x, yn) → f(x); then

f(x′) ≤ d(x, x′) + f(x).

Thus,

−d(x, x′) ≤ f(x) − f(x′) ≤ d(x, x′),

or

|f(x) − f(x′)| ≤ d(x, x′).

Thus f is a continuous function on A.
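The Lipschitz bound just derived, |f(x) − f(x′)| ≤ d(x, x′), can be illustrated with a tiny numerical sketch (B here is a finite set of my own choosing, so the infimum reduces to a min):

```python
import math

B = [(3.0, 0.0), (4.0, 1.0), (5.0, 5.0)]   # an illustrative "set B"

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def f(x):
    # distance from the point x to the set B
    return min(dist(x, y) for y in B)

x, xp = (0.0, 0.0), (1.0, 1.0)
print(abs(f(x) - f(xp)), dist(x, xp))   # first value never exceeds the second
```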

Since A is a compact subset of Rn and f is continuous, by the Weierstrass Theorem there exists x̄ ∈ A at which f attains its minimum, i.e.,

f(x̄) ≤ f(x), for all x ∈ A.

We claim next that there exists ȳ ∈ B such that

f(x̄) = d(x̄, ȳ).

Define a family of sets Bn, for each n ∈ N, by

Bn = { y ∈ B : d(x̄, y) ≤ f(x̄) + 1/n }.

Each set Bn is a non-empty, closed and convex subset of B, and Bn+1 ⊂ Bn for each n. Further,

f(x̄) = inf { d(x̄, y) : y ∈ Bn },

i.e., if such a ȳ exists, it must be in Bn for each n. Also we note that the set B1 is a compact set, being closed and bounded. Since each Bn is non-empty, we can choose a sequence yn ∈ Bn, which is a bounded sequence; by compactness of B1 (Heine-Borel theorem), there exists a subsequence convergent to some point ȳ. Since

diam Bn = sup { d(y1, y2) : y1, y2 ∈ Bn } → 0

as n → ∞, we have found the ȳ having the desired property.


Now define p = x̄ − ȳ. Since A and B are disjoint, x̄ ≠ ȳ, so

∥p∥^2 > 0,

which implies

∥p∥^2 = p · p = p · (x̄ − ȳ) = p · x̄ − p · ȳ > 0,

so p · x̄ > p · ȳ. It still remains to show that

p · ȳ ≥ p · y, for all y ∈ B,

and

p · x̄ ≤ p · x, for all x ∈ A.

We will show the first inequality and the other one can be shown using similar arguments. Consider

y ∈ B. Since ȳ minimizes the distance (hence the square of the distance) to x̄ over all y ∈ B, and B is convex, for any point z = ȳ + λ(y − ȳ) (with λ ∈ (0, 1]) on the line segment joining ȳ and y, we have

[d(x̄, z)]^2 = (x̄ − z) · (x̄ − z) ≥ (x̄ − ȳ) · (x̄ − ȳ) = [d(x̄, ȳ)]^2.

Observe that

x̄ − z = x̄ − ȳ − λ(y − ȳ) = p − λ(y − ȳ).

Thus

(x̄ − z) · (x̄ − z) = p · p − 2λp · (y − ȳ) + (λ)2 (y − ȳ) · (y − ȳ) ≥ p · p.

This simplifies (after canceling p · p on both sides and dividing by λ > 0) to

0 ≥ 2p · (y − ȳ) − λ (y − ȳ) · (y − ȳ).

Since the inequality holds for all λ ∈ (0, 1], taking the limit λ → 0, we get

0 ≥ p · (y − ȳ),

or

p · ȳ ≥ p · y,

for all y ∈ B.
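The construction in the proof — take closest points x̄ ∈ A, ȳ ∈ B and set p = x̄ − ȳ — can be mimicked numerically. A hedged sketch with two illustrative disjoint convex sets (grids over the squares [0, 1]^2 and [3, 4]^2; all names are my own):

```python
import itertools

step = 0.25
A = [(i * step, j * step) for i in range(5) for j in range(5)]          # [0,1]^2
B = [(3 + i * step, 3 + j * step) for i in range(5) for j in range(5)]  # [3,4]^2

# closest pair (xbar, ybar) minimizes the squared distance between A and B
xbar, ybar = min(itertools.product(A, B),
                 key=lambda pair: sum((a - b) ** 2 for a, b in zip(*pair)))
p = tuple(a - b for a, b in zip(xbar, ybar))   # normal vector p = xbar - ybar

min_A = min(p[0] * x[0] + p[1] * x[1] for x in A)   # inf of p·x over A
max_B = max(p[0] * y[0] + p[1] * y[1] for y in B)   # sup of p·y over B
print(p, min_A, max_B)   # min_A exceeds max_B: p strongly separates A and B
```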

Chapter 16

Problem Set 6

(1) Consider the system of equations

x + 3y + z − 2w = 1
2x + 6y − 2z − 4w = 3.

(a) Determine how many variables can be endogenous at any one time, and exhibit a partition of the variables into endogenous and exogenous variables such that the system of equations has a solution.

(b) Find an explicit formula for the endogenous variables in terms of the exogenous variables.

(2) Consider the system of equations

−x + 3y − z + w = 0
4x − y + z + w = 3
7x + y + z + 3w = 6.

Is it possible to partition the variables into endogenous and exogenous variables such that the system of equations has a unique solution?

(3) Show that the equation x^2 − x y^3 + y^5 = 19 defines y as an implicit function of x in a neighborhood of (x, y) = (5, 2). Then estimate the value of y which corresponds to x = 4.9.

(a) If x = 6 and y = 3, find a value of z which satisfies the equation f (x, y, z) = 0.

(b) Verify whether this equation defines z as an implicit function of x and y near x = 6 and y = 3.

(c) If it does, compute (∂z/∂x)|(6,3) and (∂z/∂y)|(6,3).

(d) If x increases to 6.1 and y decreases to 2.8, estimate the corresponding change in z.



(5) Consider the profit maximizing firm described in the Example 13.2. If p increases by ∆p and

w increases by ∆w, what will be the change in the optimal input amount x?

(6) Consider 3x2 yz + xyz2 = 96 as defining x as an implicit function of y and z around the point

x = 2, y = 3, z = 2.

(a) If y increases to 3.1 and z remains the same at 2, use the Implicit Function Theorem to estimate the corresponding x.

(b) Use the quadratic formula to solve 3x2 yz + xyz2 = 96 for x as an explicit function of y and

z.

(c) Use the approximation by differentials on the explicit formula to estimate x when y = 3.1

and z = 2.

(d) Which of the two methods is easier?

f (x + y) = f (x) + f (y).

(8) Let f : Rn+ → R be a non-decreasing, quasi-concave and homogeneous of degree one function.

Show that f must be concave on Rn+ .

(9) Let f be a continuous function from Rn+ to R, which is twice continuously differentiable on

Rn++ . Suppose f is homogeneous of degree m, where m is a positive integer ≥ 2. Show that

x′ H f (x)x = m(m − 1) f (x)

for all x ∈ Rn++ where H f (x) is the Hessian of f evaluated at x.

Chapter 17

Unconstrained Optimization

We call

(17.1) max f (x) , x ∈ D ⊆ Rn ,

or

(17.2) min f (x) , x ∈ D ⊆ Rn ,

where domain D is an open set, unconstrained optimization problems. There are no restrictions on

x within the domain. Furthermore, there are no boundary solutions, because the domain does not

include its boundary (recall the definition of open set). Note max f (x) , x ∈ Rn or min f (x) , x ∈ Rn

are unconstrained optimization problems since Rn is an open set. While solving an unconstrained optimization problem, we want to use the tools we developed earlier, i.e., find points where ∇f(x) = 0 and investigate the curvature / shape of the function.

Remark 17.1. An unconstrained optimization problem may not have a solution.

Example 17.1. Let f (x) = x2 . Then,

(17.3) max f (x) , x ∈ R

does not have a solution, since f(x) = x^2 is unbounded above on R.




Remark 17.2. A minimization problem can always be turned into a maximization problem and

vice versa:

(17.4) min_{x∈D} f(x) ⇔ max_{x∈D} −f(x).

We will see several examples of unconstrained optimization in these notes. Also there are

additional exercises in the problem set.

Theorem 17.1. First order necessary condition for local maxima / minima: Let A be an open

set in Rn , and let f : A → R be a continuously differentiable function on A. If function f has local

maximum / minimum at x∗ , then

∇f(x*) = 0,

where 0 is an n × 1 null vector.

Remark 17.3. The converse is not true.

Theorem 17.2. Second order necessary condition for local maxima / minima: Let A be an open set in Rn, and let f : A → R be a twice continuously differentiable function on A. If f has a local maximum (resp. local minimum) at x* ∈ A, then H f(x*) is negative semi-definite (resp. positive semi-definite).

17.2. Maxima / Minima for C2 functions of n variables

The first order and second order necessary conditions are useful tools that help us rule out points where a local maximum or local minimum cannot occur. This narrows down our search for points where a local maximum or local minimum does occur. The examples below explain this further.

Example 17.2. Let f(x) = −x^2 for x ∈ A = R. Then A = R is an open set, and f is a continuously differentiable function on A with f′(x) = −2x. Consider the point x* = 1. Then f′(x*) = f′(1) = −2(1) = −2 ≠ 0. We apply Theorem 17.1 to conclude that x* = 1 is not a point of local maximum of f.

Example 17.3. Let f(x) = x^2 − 4x for x ∈ A = R. Then A = R is an open set, and f is a twice continuously differentiable function on A. Consider the point x* = 2. We can calculate f′(x*) = f′(2) = −4 + 2(2) = 0, so the necessary condition of Theorem 17.1 is satisfied. However, this theorem by itself fails to provide any additional information at this stage. In other words, we cannot conclude from Theorem 17.1 that x* = 2 is a point of local maximum; nor can we conclude that it is not. Theorem 17.2 is useful at this point. We can calculate

f″(x*) = f″(2) = 2 > 0,

so the necessary condition of Theorem 17.2 (for a local maximum) is violated. Consequently, by Theorem 17.2, we can conclude that x* = 2 is not a point of local maximum of f.

It is easy to see that the necessary first and second order conditions are not sufficient.

Example 17.4. Let X = R be the domain and f(x) = x^3 − x^4. Then df(x)/dx = 3x^2 − 4x^3 and d^2 f(x)/dx^2 = 6x − 12x^2 are both 0 at x = 0. But x = 0 is not a local maximizer for f(x).

Theorem 17.3. Sufficient conditions for local maxima / minima: Let A be an open set in Rn , and

let f : A → R be a twice continuously differentiable function on A.

(a) If x∗ ∈ A is such that H f (x∗ ) is negative definite and ∇ f (x∗ ) = 0 then f has local maximum

at x∗ .

(b) If x∗ ∈ A is such that H f (x∗ ) is positive definite and ∇ f (x∗ ) = 0 then f has local minimum at

x∗ .

It should be noted that the sufficient condition in Theorem 17.3 cannot be weakened to the

necessary condition in the statement of Theorem 17.2. The following example explains this point.


Example 17.5. Let f : R → R be given by f (x) = x3 for all x ∈ R. Then A = R is an open set, and

f is a twice continuously differentiable function on A. At x∗ = 0,

f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0,

so first order necessary condition and second order necessary condition are satisfied. But x∗ is

clearly not a point of local maximum of f since f is an increasing function on A.

It may also be observed that the second order necessary condition in Theorem 17.2 cannot be

strengthened to the sufficient condition in the statement of Theorem 17.3. The following example

illustrates this point.

Example 17.6. Let f : R → R be given by f (x) = −x4 for all x ∈ R. Then A = R is an open set, and

f is a twice continuously differentiable function on R. Clearly, x∗ = 0 is a point of local maximum

of f , since f (0) = 0, while f (x) < 0 for all x ̸= 0. We can calculate that

f ′ (x∗ ) = f ′ (0) = 0, and f ′′ (x∗ ) = f ′′ (0) = 0.

Thus the first order necessary condition (in Theorem 17.1) and the second order necessary condition (in Theorem 17.2) are satisfied, but the second order sufficient condition (in Theorem 17.3) is violated.

The above discussion shows that the second-order necessary conditions for a local maximum

are different from (weaker than) the second-order sufficient conditions for a local maximum. This

demonstrates the fact that, in general, the first and second derivatives of a function at a point do not

capture all aspects relevant to the occurrence of a local maximum of the function at that point.

Theorem 17.4. Concavity (convexity) and global maxima (minima): Let A be an open and con-

vex set in Rn , and let f : A → R be a continuously differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is concave on A, then f has global maximum at x∗ .

(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and f is convex on A, then f has global minimum at x∗ .

This is very easy to show. Note that concavity along with continuous differentiability of f implies that for all x ∈ A,

f(x) − f(x*) ≤ ∇f(x*) · (x − x*) = 0.

So f(x) − f(x*) ≤ 0, i.e., x* is a point of global maximum of f on A.

Theorem 17.5. Let A be an open and convex set in Rn , and let f : A → R be a twice continuously

differentiable function on A.

(a) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is negative semi-definite for all x ∈ A, then f

has global maximum at x∗ .


(b) If x∗ ∈ A is such that ∇ f (x∗ ) = 0 and H f (x) is positive semi-definite for all x ∈ A, then f has

global minimum at x∗ .

It is worth noting that Theorem 17.4 or Theorem 17.5 might be applicable in cases where Theorem 17.3 is not applicable, as the following example shows.

Example 17.7. Let f : R → R be given by f (x) = −x4 . Here, we note that f ′ (0) = 0 and f ′′ (x) =

−12x2 ≤ 0 for all x ∈ R. Thus we can apply Theorem 17.4 or Theorem 17.5 and conclude that

x = 0 is a point of global maximum, and hence also a point of local maximum. But the conclusion

that x = 0 is a point of local maximum cannot be derived from Theorem 17.3, since f ′′ (0) = 0.

Now we explain the steps in applying these theorems via several examples.

Example 17.8. Consider X = R2+ and f(x) = x1 x2 − 2 x1^4 − x2^2. The optimization exercise is to maximize the objective function f(x) by choosing x ∈ X. The two first order conditions are

x2 − 8 x1^3 = 0, and x1 − 2 x2 = 0.

Solving the second equation for x1, we have x1 = 2 x2. Substituting this into the first equation, we have x2 − 64 x2^3 = 0, which has three solutions:

x2 = 0, 1/8, and −1/8.

Then the first order conditions have three solutions,

(x1, x2) = (0, 0), (1/4, 1/8), and (−1/4, −1/8),

but the last of these is not in the domain of f, and the first is on the boundary of the domain, giving f(0, 0) = 0. Thus, we have a unique solution in the interior of the domain:

(x1*, x2*) = (1/4, 1/8).
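A quick check (my own sketch) that (1/4, 1/8) satisfies both first order conditions of this example exactly:

```python
def grad(x1, x2):
    # the two first order conditions of f(x) = x1*x2 - 2*x1**4 - x2**2
    return (x2 - 8 * x1**3, x1 - 2 * x2)

print(grad(0.25, 0.125))   # → (0.0, 0.0)
```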

Example 17.9. Let us find maxima / minima for f : R3 → R,

f(x, y, z) = x^2 + 2y^2 + 3z^2 + 2xy + 2xz.

Step 1 Compute ∇f and solve the first order conditions:

∇f(x, y, z) = [2x + 2y + 2z  4y + 2x  6z + 2x] = [0 0 0].

The only solution is (x, y, z) = (0, 0, 0). So we have one candidate for a local maximum or minimum.


Step 2 Compute H f.

H f(x, y, z) =
[2 2 2]
[2 4 0]
[2 0 6].

Note that in this example H f is independent of (x, y, z), so whichever property of H f we establish will hold globally.

Step 3 Determine the curvature. Begin with computing the leading principal minors.

D1 = 2 > 0, D2 = 2 · 4 − 2 · 2 = 4 > 0 and

D3 = 2 (24 − 0) − 2 (12 − 0) + 2 (0 − 8) = 48 − 24 − 16 = 8 > 0

All leading principal minors are strictly positive, so H f is positive definite for all (x, y, z), including (0, 0, 0), which implies that f is strictly convex.

Step 4 Conclude, using Theorem 17.4, that we have a global minimum at (0, 0, 0).
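The leading principal minors used in Step 3 can be recomputed mechanically. A minimal sketch with a hand-rolled determinant (illustrative only; in practice one would use a linear algebra library):

```python
H = [[2, 2, 2],
     [2, 4, 0],
     [2, 0, 6]]

def det(M):
    # cofactor expansion along the first row (fine for tiny matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

minors = [det([row[:k] for row in H[:k]]) for k in (1, 2, 3)]
print(minors)   # → [2, 4, 8]: all positive, so H f is positive definite
```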

Example 17.10. Let us find maxima / minima for f : R2 → R

f(x, y) = −x^3 + xy − y^3.

Step 1 Compute ∇f and solve the first order conditions:

∇f(x, y) = [−3x^2 + y  −3y^2 + x] = [0 0].

There are two solutions: (x, y) = (0, 0) and (x, y) = (1/3, 1/3).

Step 2 Compute H f.

H f(x, y) =
[−6x 1]
[1 −6y]

so that

H f(1/3, 1/3) =
[−2 1]
[1 −2]

and

H f(0, 0) =
[0 1]
[1 0].

Step 3 Determine the curvature. For (1/3, 1/3), the leading principal minors are

D1 = −2 < 0, D2 = 3 > 0 ⇔ H f(1/3, 1/3) is negative definite.

For (0, 0), the first order principal minors are 0 and 0, and D2 = −1 < 0 ⇒ H f(0, 0) is neither negative semi-definite nor positive semi-definite.


Step 4 Then Theorem 17.3 on sufficient conditions applies, and we have a strict local maximum at (1/3, 1/3). The contrapositive of the second order necessary conditions (Theorem 17.2) shows that (0, 0) is neither a point of local maximum nor of local minimum; it is a saddle point.

Example 17.11. Let us find maxima / minima for f : R2 → R

f(x, y) = 2x^3 + x y^2 + 5x^2 + y^2.

Step 1 Compute ∇f and solve the first order conditions:

∇f(x, y) = [6x^2 + y^2 + 10x  2xy + 2y] = [0 0].

From 2xy + 2y = 0, we get y = 0 or x = −1;

for x = −1, 6x^2 + y^2 + 10x = y^2 − 4 = 0, so y = 2 or y = −2;

for y = 0, 6x^2 + y^2 + 10x = 6x^2 + 10x = 0, so x = 0 or x = −5/3.

There are four solutions:

(x, y) = (0, 0); (−1, 2); (−1, −2); and (−5/3, 0).

Step 2 Compute H f.

H f(x, y) =
[12x + 10  2y]
[2y  2x + 2].

Step 3 Determine the curvature at each candidate point.

H f(0, 0) =
[10 0]
[0 2]
with D1 = 10 > 0, D2 = 20 > 0 ⇒ H f(0, 0) is positive definite.

H f(−1, 2) =
[−2 4]
[4 0]
with first order principal minors −2 < 0 and 0, and D2 = −16 < 0 ⇒ H f(−1, 2) is neither positive semi-definite nor negative semi-definite.

H f(−1, −2) =
[−2 −4]
[−4 0]
with first order principal minors −2 < 0 and 0, and D2 = −16 < 0 ⇒ H f(−1, −2) is neither positive semi-definite nor negative semi-definite.

H f(−5/3, 0) =
[−10 0]
[0 −4/3]
with D1 = −10 < 0, D2 = 40/3 > 0 ⇒ H f(−5/3, 0) is negative definite.

Step 4 Then Theorem 17.3 on sufficient conditions applies for (0, 0) and (−5/3, 0): we have a strict local minimum at (0, 0) and a strict local maximum at (−5/3, 0). The contrapositive of the second order necessary conditions (Theorem 17.2) implies that neither a local maximum nor a local minimum exists at (−1, 2) or (−1, −2); they are saddle points.

As an application, we now derive the regression coefficients in the method of ordinary least squares. Suppose we are given n data points (x1, y1), · · · , (xn, yn), and we seek a linear function f(x) = ax + b, for all x ∈ R. Our objective is to find a function f (i.e., we want to choose a ∈ R and b ∈ R) such that the quantity

(17.5) ∑_{i=1}^n [f(xi) − yi]^2

is minimized. Thus the coefficients are such that the sum of the squares of the residuals (error terms, i.e., the differences between the estimates and the actual observations) is minimized.

Since minimizing (17.5) is the same as maximizing its negative, define f : R2 → R by

f(a, b) = − ∑_{i=1}^n [a xi + b − yi]^2,

and maximize f by choice of (a, b).

17.3. Application: Ordinary Least Square Analysis

To locate the maximum, we can calculate

f1 = −2 ∑_{i=1}^n [a xi + b − yi] xi = −2 ∑_{i=1}^n [a xi^2 + b xi − xi yi],

f2 = −2 ∑_{i=1}^n [a xi + b − yi],

f11 = −2 ∑_{i=1}^n xi^2;  f12 = −2 ∑_{i=1}^n xi;  f21 = −2 ∑_{i=1}^n xi;  f22 = −2n,

so that

H f(a, b) =
[ −2 ∑_{i=1}^n xi^2   −2 ∑_{i=1}^n xi ]
[ −2 ∑_{i=1}^n xi     −2n ].

The principal minors of order one for the Hessian are f11 = −2 ∑_{i=1}^n xi^2 < 0 and f22 = −2n < 0. We need to check that the determinant of the principal minor of order two is non-negative. The determinant of the Hessian of f is

det(H f(a, b)) = 4n ∑_{i=1}^n xi^2 − 4 [∑_{i=1}^n xi]^2.

Recall the Cauchy-Schwarz inequality,

|x · y| ≤ ∥x∥ · ∥y∥.

Taking the vector x = (x1, · · · , xn) and the vector of ones u = (1, · · · , 1), and applying the inequality, we get

|x · u| ≤ ∥x∥ · ∥u∥, and hence |x · u|^2 ≤ ∥x∥^2 · ∥u∥^2,

i.e.,

[∑_{i=1}^n xi]^2 ≤ [∑_{i=1}^n xi^2] · n.

Therefore, det(H f(a, b)) ≥ 0. Since f11(a, b) ≤ 0, f22(a, b) ≤ 0, and det(H f(a, b)) ≥ 0, H f(a, b) is negative semi-definite. Consequently, if (a*, b*) satisfies the first-order conditions, then (a*, b*) is a point of global maximum of f (by Theorem 17.5). The first order conditions are:


a ∑_{i=1}^n xi^2 + b ∑_{i=1}^n xi = ∑_{i=1}^n xi yi

a ∑_{i=1}^n xi + b n = ∑_{i=1}^n yi.

Denoting (∑_{i=1}^n xi)/n by x̄ and (∑_{i=1}^n yi)/n by ȳ (the mean of x and the mean of y respectively), the second equation gives

(17.6) a x̄ + b = ȳ.

Using this in the first equation leads to

(17.7) a ∑_{i=1}^n xi^2 + (ȳ − a x̄) n x̄ = ∑_{i=1}^n xi yi.

Thus,

a = [ (∑_{i=1}^n xi yi)/n − x̄ ȳ ] / [ (∑_{i=1}^n xi^2)/n − x̄^2 ],  b = ȳ − a x̄

solves the problem. Note the solution is meaningful provided not all the xi are the same.
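The closed-form coefficients can be exercised directly. A hedged sketch (the data points are my own, chosen to lie exactly on the line y = 2x + 1, so OLS should recover a = 2 and b = 1):

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]     # yi = 2*xi + 1
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
# a = (sum xi*yi / n - xbar*ybar) / (sum xi^2 / n - xbar^2), b = ybar - a*xbar
a = (sum(x * y for x, y in zip(xs, ys)) / n - xbar * ybar) \
    / (sum(x * x for x in xs) / n - xbar ** 2)
b = ybar - a * xbar
print(a, b)   # → 2.0 1.0
```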

In the next exercise, we provide an alternative proof that the determinant of the Hessian is non-negative. First, note that for any real numbers α and β,

2αβ ≤ α^2 + β^2.

Observe,

(α − β)^2 = α^2 + β^2 − 2αβ ≥ 0,

which shows that the desired inequality holds.

Next, we claim that for every n ∈ N,

(x1 + x2 + · · · + xn)^2 ≤ n(x1^2 + x2^2 + · · · + xn^2).

We prove the claim by induction. For n = 2, we know

2x1x2 ≤ x1^2 + x2^2
x1^2 + x2^2 + 2x1x2 ≤ 2(x1^2 + x2^2)
(x1 + x2)^2 ≤ 2(x1^2 + x2^2).


Next, we assume that the claim holds for some k ∈ N and show that it then holds for n = k + 1. Suppose

(x1 + · · · + xk)^2 ≤ k(x1^2 + · · · + xk^2).

Then

(x1 + · · · + xk + xk+1)^2 = (x1 + · · · + xk)^2 + 2(x1 + · · · + xk)xk+1 + xk+1^2
≤ k(x1^2 + · · · + xk^2) + (x1^2 + xk+1^2) + · · · + (xk^2 + xk+1^2) + xk+1^2
= (k + 1)(x1^2 + · · · + xk^2) + (k + 1)xk+1^2
= (k + 1)(x1^2 + · · · + xk+1^2),

where the inequality uses the induction hypothesis together with 2 xi xk+1 ≤ xi^2 + xk+1^2 for each i. Hence, the claim holds true for all n ∈ N.

Returning to the determinant,

4n ∑_{i=1}^n xi^2 − 4 [∑_{i=1}^n xi]^2 = 4n ∑_{i=1}^n xi^2 − 4 [∑_{i=1}^n xi] · [(∑_{i=1}^n xi)/n] · n
= 4n [ ∑_{i=1}^n xi^2 − (∑_{i=1}^n xi) · x̄ ].

It therefore suffices to show that

∑_{i=1}^n xi^2 − (∑_{i=1}^n xi) · x̄ ≥ 0.

Note

∑_{i=1}^n xi^2 − (∑_{i=1}^n xi) · x̄ = ∑_{i=1}^n xi (xi − x̄) = ∑_{i=1}^n (xi − x̄ + x̄)(xi − x̄)
= ∑_{i=1}^n (xi − x̄)^2 + x̄ · ∑_{i=1}^n (xi − x̄) ≥ 0,

since the first term, being a sum of squares, is non-negative, and the second term is zero because ∑_{i=1}^n (xi − x̄) = 0.

Chapter 18

Problem Set 7

(1) Consider the function

g(x, y) = x^3 + y^3 − 3x − 2y.

Write out ∇g(x, y) and Hg(x, y). Show that g is convex in its domain and find its (global) minimum.

(2) What are the local maxima and minima of the function

(18.1) f(x) = x^4 − 4x^3 + 4x^2 + 4?

Which, if any, of them are global maxima or minima?

(3) A monopolist producing a single output has two types of buyers. If it produces Q1 units for

buyers of type 1, then the buyers are willing to pay a price of 100 − 5Q1 dollars per unit. If it

produces Q2 units for buyers of type 2, then the buyers are willing to pay a price of 50 − 10Q2

dollars per unit. The monopolist's cost of producing Q units of output is 50 + 10Q. How many units should the monopolist produce to maximize profit?

(4) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of w and r for its labor (L) and capital (K) inputs, and operates with the production function Q = L^a K^b.

(a) Write profits as a function of L, and K. Derive the first order conditions. Provide an eco-

nomic interpretation of the first order conditions.

(b) Solve for the optimal levels of L, and K.

(c) Check the second order conditions. What restrictions on the values of a and b are necessary for a profit maximum? Provide an economic interpretation of these restrictions.



(d) Find the signs of the partial derivatives of L with respect to P, w, and r.

(e) Derive the firm’s long run supply curve, i.e., Q as a function of the exogenous parameters.

Find the elasticities of supply with respect to w, r, and P. Do these elasticities sum to zero?

Provide an economic explanation for this fact.

(5) Suppose that a perfectly competitive firm receives a price of P for its output, pays prices of

w, v, and r for its labor (L), natural resource (R) and capital (K) inputs, and operates with the production function Q = A L^a K^b + ln R.

(a) Write profits as a function of L, R and K. Derive the first order conditions. Provide an

economic interpretation of the first order conditions.

Now take A = 3, a = b = 1/3 for the remainder of the problem.

(b) Check the second order conditions.

(c) [Optional] Solve for L*. Find the change in L* for a change in r when all other parameters are constant by taking the partial derivative of L* with respect to r.

(d) [Optional] Find the change in L* for a change in v when all other parameters are constant by taking the partial derivative of L* with respect to v.

(e) [Optional] It is also possible to determine the changes in L* when r or v change without explicitly solving for L*, by using the Implicit Function Theorem. You might like to use a more general version of the Implicit Function Theorem (than what we stated in class) to complete this exercise.

(i) Find the change in L for a change in r when all other parameters are constant.

(ii) Find the change in L for a change in v when all other parameters are constant.

Chapter 19

Optimization Theory: Equality Constraints

The optimization problems we encounter in economics are, in general, constrained problems where

there are some restrictions on the set we can choose x from. Some examples of constrained opti-

mization problems we see are,

Consumer Theory

(19.1) max_x u(x)
subject to x ∈ B(p, I)

Producer Theory

max py − w · x

(19.2) y,x

subject to (y, x) ∈ Y

where

Y = { (y, x) ∈ R × Rn | y ≤ f(x) }

is the production possibility set with f (x) being the production function (one output, many inputs).



We will work with the maximization problem, as it is easy to turn a minimization problem into a maximization problem. A constrained maximization problem has the following form.

max f (x)

x

subject to x ∈ G (x)

f (x) is called the objective function,

where x is called the choice variable,

G (x) is called the constraint set.

We assume the objective function to be C 2 so that we can use differential calculus techniques.

max f (x)

(19.3) x

subject to x ∈ [a, b]

( )

(19.4) x∗ ∈ X ∗ ⊂ [a, b] ∧ f x∗ > f (x) ∀x ∈ [a, b] .

Does a solution exist? Note f is continuous (because it is C 2 ) and [a, b] is a non-empty compact

set. We can use Weierstrass Theorem to show existence of a maximum and minimum. Having

shown the existence, there are two possibilities:

(a) The solution is interior, x∗ ∈ (a, b). Then x∗ must also be a local maximum, i.e.,

( ) ( )

(19.5) f ′ x∗ = 0 ∧ f ′′ x∗ 6 0.

If x∗ = a, then f ′ (a) 6 0.

If x∗ = b, then f ′ (b) > 0.

A sufficient condition for x0 to be a point of local minimum of f in the interior of its domain is that f ′(x0) = 0 and that there exists an interval (a, b) around x0 such that f ′(x) < 0 in (a, x0) and f ′(x) > 0 in (x0, b). The following example shows that these conditions are not necessary.


Let

f (x) = 2x² + x² sin(1/x)  for x ≠ 0,
f (x) = 0                  for x = 0.

Here x0 = 0 is a point of local minimum, since f (x) ≥ 2x² − x² = x² > 0 = f (0) for all x ≠ 0. Yet

f ′(x) = 4x + 2x sin(1/x) − cos(1/x)

changes sign arbitrarily close to the origin because of the term cos(1/x).
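The oscillation is easy to see numerically. A small Python check (our own illustration, not part of the notes):

```python
import math

def f(x):
    # f(x) = 2x^2 + x^2*sin(1/x) for x != 0, and f(0) = 0
    return 0.0 if x == 0 else 2 * x**2 + x**2 * math.sin(1 / x)

def fprime(x):
    # f'(x) = 4x + 2x*sin(1/x) - cos(1/x) for x != 0
    return 4 * x + 2 * x * math.sin(1 / x) - math.cos(1 / x)

# f(x) >= 2x^2 - x^2 = x^2 > 0 = f(0) for x != 0, so 0 is a strict minimum,
# yet f' takes both signs arbitrarily close to 0:
neg = fprime(1 / (200 * math.pi))  # cos(1/x) = 1 here, so f' ~ 4x - 1 < 0
pos = fprime(1 / (201 * math.pi))  # cos(1/x) = -1 here, so f' ~ 4x + 1 > 0
```

Near the origin the derivative is dominated by the −cos(1/x) term, which flips sign on successive multiples of π.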

In general, constrained optimization problems fall into two categories: (a) problems with equality constraints and (b) problems with inequality constraints. We discuss them in turn.

19.2. Equality Constraint

The constraint set is defined by k equality constraints,

g1 (x) = 0
· · ·            where x ∈ Rⁿ,
gk (x) = 0

or

(19.6)    G (x) = {x ∈ Rⁿ | g (x) = 0}.

Note that g (x) = (g1 (x), · · · , gk (x)) is a k-dimensional row vector. The interesting case is k < n, as the following example shows.

max_{x∈R²} f (x)  subject to
    x1 + x2 − 2 = 0         : g1 (x) = 0
    (1/3) x1 + x2 − 1 = 0   : g2 (x) = 0.

The only point in the constraint set is (x1, x2) = (3/2, 1/2). Maximizing over this set is trivial: the solitary point in the constraint set is also the solution.
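Since the two constraints are independent linear equations in two unknowns, the single feasible point can be computed directly; a minimal sketch (our own helper `solve_2x2`, using Cramer's rule):

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x1 + a12*x2 = b1 and a21*x1 + a22*x2 = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two constraints are not independent")
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

# The example's constraint system: x1 + x2 = 2 and (1/3)x1 + x2 = 1.
x1, x2 = solve_2x2(1.0, 1.0, 2.0, 1.0 / 3.0, 1.0, 1.0)
```

The computed point reproduces (3/2, 1/2) above.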

Definition 19.1. A point x∗ ∈ G (x) is a point of local maximum of f subject to the constraint g (x) = 0 if there is δ > 0 such that x ∈ G (x) ∩ B (x∗, δ) implies f (x) ≤ f (x∗).


Definition 19.2. A point x∗ ∈ G (x) is a point of global maximum of f subject to the constraint g (x) = 0 if x∗ solves the problem

max f (x)  subject to g (x) = 0.

Theorem 19.1. Necessary condition for a constrained local maximum (Lagrange Theorem): Let A ⊆ Rⁿ be open and f : A → R, g : A → Rᵏ be C¹ functions. Suppose x∗ is a point of local maximum of f subject to the constraint g (x) = 0. Suppose further that ∇g (x∗) ≠ 0. Then there is λ∗ ∈ Rᵏ such that

(19.7)    ∇f (x∗) = λ∗ ∇g (x∗).

Remark 19.1. The condition ∇g (x∗) ≠ 0 is called the constraint qualification.

It is important to check the constraint qualification condition ∇g(x∗) ≠ 0 before applying the conclusion of Lagrange's theorem. Without this condition, the conclusion of Lagrange's theorem need not be valid, as the following example shows.

Example 19.5. Let f : R² → R be given by

f (x1, x2) = 4x1 + 3x2 for all (x1, x2) ∈ R²,

and let g : R² → R be given by

g(x1, x2) = x1² + x2².

Consider the constraint set C = {(x1 , x2 ) ∈ R2 : g(x1 , x2 ) = 0}. The only element of this set is (0,0),

so (x1∗ , x2∗ ) = (0, 0) is a point of local maximum of f subject to the constraint g(x) = 0. Observe that

the conclusion of Lagrange’s theorem does not hold here. For, if it did, there would exist λ∗ ∈ R

such that

∇ f (0, 0) = λ∗ ∇g(0, 0)

But this means that

(4, 3) = λ∗ (0, 0)

which is a contradiction. The problem here is that

∇g(x1∗ , x2∗ ) = ∇g(0, 0) = (0, 0),

so the constraint qualification condition is violated.

In the next Theorem, we use the notation C to denote the constraint set, i.e.,

C = {x ∈ Rⁿ : g(x) = 0}.


Theorem 19.2. Sufficient Conditions for a Global Maximum: Let A ⊆ Rⁿ be an open convex set and f : A → R, g : A → Rᵏ be C¹ functions. Suppose (x∗, λ∗) ∈ C × Rᵏ satisfies

(19.8)    ∇f (x∗) = λ∗ ∇g (x∗).

If L (x, λ∗) = f (x) − λ∗ · g (x) is concave in x on A, then x∗ is a point of global maximum of f subject to the constraint g (x) = 0.

Proof. For any x ∈ C,

L (x, λ∗) − L (x∗, λ∗) ≤ [∇f (x∗) − λ∗ ∇g(x∗)] · (x − x∗)

by concavity of L in x on A. Using the first-order condition (19.8), the term on the right-hand side, [∇f (x∗) − λ∗ ∇g(x∗)] · (x − x∗), is zero and we get

f (x) − λ∗ g(x) = L (x, λ∗) ≤ L (x∗, λ∗) = f (x∗) − λ∗ g(x∗).

Since x ∈ C and x∗ ∈ C, we have g(x) = g(x∗) = 0. Thus f (x) ≤ f (x∗), and so x∗ is a point of global maximum of f subject to the constraint g(x) = 0.

We use the following steps to solve the optimization problem with equality constraints. Let f and gi, i = 1, · · · , k, be C¹ functions.

Necessity Route:

Step 1 Existence of a solution can be shown by using the Weierstrass Theorem. For this we need to show that the constraint set is closed and bounded.

Step 2 Form the Lagrangian

L (x, λ) = f (x) − λ · g(x) = f (x) − λ1 g1 (x) − · · · − λk gk (x),

where λi, i = 1, · · · , k, are the Lagrange multipliers.

Step 3 Take the partial derivative with respect to each variable x1, · · · , xn and each Lagrange multiplier λ1, · · · , λk, and set them equal to zero:

∂L (x, λ)/∂xi = 0,  i = 1, · · · , n;
∂L (x, λ)/∂λi = 0,  i = 1, · · · , k.

These are n + k first order conditions (FOCs) for n + k unknowns.


Step 4 Solve the FOCs for the candidate points (x, λ).

Step 5 Let

M = {(x, λ) ∈ Rⁿ⁺ᵏ | x satisfies gi (x) = 0, i = 1, · · · , k, and the FOCs hold}.

Verify that ∇g (x∗) ≠ 0 holds at each point in the set M. Then evaluate f at each (x, λ) ∈ M and find the maximum.

Sufficiency Route: We know that if f and λ1 g1 (x), · · · , λk gk (x) are such that L (x, λ) is concave, then the FOCs are sufficient for a maximum. Hence if we can show concavity, then any point satisfying the FOCs will be a solution. We illustrate the use of the two routes through the following examples.

Remark 19.2. Note that if f is not concave, we have to compare the points in M.

Example 19.6.

max_{x∈R²₊} f (x1, x2) = −x1² − x2²  subject to 5x1 + 10x2 = 10.

The constraint set consists of the points with x2 = 1 − 0.5x1 and non-negative values of x1 and x2. To get the constraint in g (x) = 0 form, we rearrange it:

5x1 + 10x2 − 10 = 0.


The constraint set is closed: take any convergent sequence {xⁿ} ∈ G (x) with xⁿ → x̄. Since 5x1ⁿ + 10x2ⁿ − 10 = 0, x1ⁿ ≥ 0, x2ⁿ ≥ 0, ∀n ∈ N, and weak inequalities are preserved in the limit,

5x̄1 + 10x̄2 − 10 = 0,  x̄1 ≥ 0,  x̄2 ≥ 0.

So x̄ ∈ G (x). It is also bounded:

x1 ≤ 2 and x2 ≤ 1 ⇒ ∥x∥ ≤ ∥(2, 1)∥ = √(2² + 1²) = √5.

So √5 will serve as a bound. So the constraint set is compact and non-empty and the objective function f is continuous; hence the Weierstrass theorem is applicable and a solution exists.

L (x, λ) = −x1² − x2² − λ (5x1 + 10x2 − 10)

∂L (x, λ)/∂x1 = −2x1 − 5λ = 0
∂L (x, λ)/∂x2 = −2x2 − 10λ = 0
∂L (x, λ)/∂λ = −(5x1 + 10x2 − 10) = 0.

Now from the first two FOCs,

4x1 = 2x2 ⇔ 2x1 = x2,

and from the third FOC,

5x1 + 20x1 − 10 = 0,

so

x1 = 10/25 = 2/5,  x2 = 4/5,  λ = −4/25.

We get a candidate for the solution,

m1 = (2/5, 4/5, −4/25).

Since we know a solution exists, it must necessarily be either m1 or one of the corners (2, 0) or (0, 1). The constraint qualification

∇g (x∗) = [5  10] ≠ 0

is verified trivially. Evaluating,

f (2, 0) = −4,  f (0, 1) = −1,  f (2/5, 4/5) = −4/5.


The solution then is x∗ = (2/5, 4/5).

Sufficiency Route

∇f (x) = [−2x1  −2x2]

H f (x) = [ −2   0
             0  −2 ]

D1 = −2 < 0,  D2 = 4 > 0.

So H f (x) is negative definite ∀x, and hence f is concave. The constraint g(x) is concave as it is linear. Also −λ > 0. Then f (x) − λ g (x) is concave as a sum of concave functions. Then we know that the FOCs are a sufficient condition for a maximum. So the point x∗ = (2/5, 4/5) is our solution.
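Both routes can be cross-checked numerically. The sketch below (our own code) grid-searches along the budget line x2 = 1 − x1/2:

```python
def f(x1, x2):
    # objective of Example 19.6
    return -x1**2 - x2**2

# Parametrize the constraint 5*x1 + 10*x2 = 10 by x1 in [0, 2], with
# x2 = 1 - x1/2, and keep the best point on a fine grid.
grid = [2 * i / 10000 for i in range(10001)]
val, x1, x2 = max((f(t, 1 - t / 2), t, 1 - t / 2) for t in grid)
```

The best grid point reproduces the candidate m1 = (2/5, 4/5) with value −4/5.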

Example 19.7.

max f (x1, x2) = x1² x2  subject to 2x1² + x2² = 3.

The constraint set is an ellipse and can be rewritten as 3 − 2x1² − x2² = 0. Here the sufficiency route will not work, as the objective function is not concave.

H f (x) = [ 2x2  2x1
            2x1   0 ]

D1 = 2x2,  D2 = −4x1² < 0 for all x1 ≠ 0,

which means that H f (x) is indefinite. So f is not concave. Hence we have to use the necessity route.

The constraint set is closed: take any convergent sequence {xⁿ} ∈ G (x) with xⁿ → x̄. Since 2 (x1ⁿ)² + (x2ⁿ)² = 3 ∀n ∈ N, and weak inequalities are preserved in the limit,

2 (x̄1)² + (x̄2)² = 3.

So x̄ ∈ G (x). It is also bounded:

|x1| ≤ √(3/2) < √3 and |x2| ≤ √3.


So ∥x∥ ≤ ∥(√3, √3)∥ = √(3 + 3) = √6. So the constraint set is compact and non-empty and the objective function f is continuous; hence the Weierstrass theorem is applicable and a solution exists.

L (x, λ) = x1² x2 − λ (3 − 2x1² − x2²)

∂L (x, λ)/∂x1 = 2x1 x2 + 4λx1 = 0
∂L (x, λ)/∂x2 = x1² + 2λx2 = 0
∂L (x, λ)/∂λ = −(3 − 2x1² − x2²) = 0.

Now, from the first FOC,

2x1 (x2 + 2λ) = 0 ⇔ x1 = 0 ∨ λ = −x2/2.

Case (i): x1 = 0, x2 = ±√3, λ = 0. We get two candidates for the solution,

m1 = (0, √3, 0),  m2 = (0, −√3, 0).

Case (ii): λ = −x2/2. Substituting into the second FOC gives

x1² − x2² = 0,  so  x1 = x2 ∨ x1 = −x2.

Combined with the constraint 3 − 2x1² − x2² = 0, this gives x1 = 1 ∨ x1 = −1. If

x1 = 1 → x2 = 1 ∨ x2 = −1,  λ = −1/2 ∨ λ = 1/2.

Similarly for x1 = −1. We get four more candidates for the solution,

m3 = (1, 1, −1/2),  m4 = (1, −1, 1/2),
m5 = (−1, −1, 1/2),  m6 = (−1, 1, −1/2).

Thus

M = {m1, m2, · · · , m6}.

The constraint qualification

∇g (x∗) = [−4x1∗  −2x2∗] ≠ 0

holds at each candidate, since x1∗ and x2∗ are never both zero in M. Evaluating f at the candidates,


f (0, √3) = 0 = f (0, −√3),
f (1, 1) = f (−1, 1) = 1,
f (1, −1) = f (−1, −1) = −1.

The solution then is x = (1, 1) and x = (−1, 1).
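The necessity route here reduces to evaluating f on the finite candidate set M; in code (our own check):

```python
import math

f = lambda x1, x2: x1**2 * x2
s3 = math.sqrt(3)
candidates = [(0.0, s3), (0.0, -s3), (1.0, 1.0), (1.0, -1.0),
              (-1.0, -1.0), (-1.0, 1.0)]

# every candidate lies on the constraint 2*x1^2 + x2^2 = 3
feasible = all(abs(2 * a**2 + b**2 - 3) < 1e-12 for a, b in candidates)

best = max(f(a, b) for a, b in candidates)
argmax = sorted((a, b) for a, b in candidates if f(a, b) == best)
```

The two maximizers recovered are exactly (1, 1) and (−1, 1).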

Example 19.8.

max_{x∈R²₊} f (x1, x2) = x1 x2  subject to x1 + 4x2 = 16, or 16 − x1 − 4x2 = 0.

The Hessian is

H f (x) = [ 0  1
            1  0 ]

which is indefinite for all values of x ∈ R²₊. Hence the objective function is not concave.

Observe that x is restricted to R²₊ and the equality constraint holds. This constraint set is non-empty, as (0, 4) is contained in it, and compact. A solution to this problem exists as f is continuous and the constraint set is non-empty and compact; hence the Weierstrass theorem is applicable.

L (x, λ) = x1 x2 − λ (16 − x1 − 4x2)

∂L (x, λ)/∂x1 = x2 + λ = 0
∂L (x, λ)/∂x2 = x1 + 4λ = 0
∂L (x, λ)/∂λ = −(16 − x1 − 4x2) = 0.

The FOCs will give us interior candidates; we will still need to compare with the corners. Now, x1 = 4x2, so

8x2 = 16 → x2 = 2,  x1 = 8,  λ = −2.

We get one candidate for the solution,

m1 = (8, 2, −2).

The constraint qualification

∇g (x∗) = [−1  −4] ≠ 0


is satisfied trivially for m1 . Compare it with the corners (0, 4) , (16, 0) and verify that

f (0, 4) = 0 = f (16, 0) , f (8, 2) = 16.

The solution then is x = (8, 2).
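The FOC system is linear and can be reproduced in a few lines (our own check; note that λ = −x2):

```python
# FOCs of Example 19.8: x2 + lam = 0, x1 + 4*lam = 0, x1 + 4*x2 = 16.
# Eliminating lam gives x1 = 4*x2, so the constraint becomes 8*x2 = 16.
x2 = 16 / 8
x1 = 4 * x2
lam = -x2                           # from the first FOC

f = lambda a, b: a * b
corner_values = [f(0, 4), f(16, 0)]  # corners of the constraint set
```

The interior candidate (8, 2) gives f = 16, which beats both corners (value 0 each).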

Example 19.9.

max_{x∈R²₊} f (x1, x2) = ln x1 + ln x2  subject to x1 + 4x2 = 16, or 16 − x1 − 4x2 = 0.

Here the necessity route does not work, as the objective function is not defined at the corners of the constraint set, x = (16, 0) and x = (0, 4), since ln y is not defined for y = 0. The Weierstrass Theorem cannot be applied. Let us use the sufficiency route. Since ln is not defined at the corners, the problem can be modified as follows:

max_{x∈R²₊₊} f (x1, x2) = ln x1 + ln x2  subject to 16 − x1 − 4x2 = 0.

The Lagrangian and the FOCs are

L (x, λ) = ln x1 + ln x2 − λ (16 − x1 − 4x2)

∂L (x, λ)/∂x1 = 1/x1 + λ = 0 → λx1 = −1
∂L (x, λ)/∂x2 = 1/x2 + 4λ = 0 → 4λx2 = −1
∂L (x, λ)/∂λ = −(16 − x1 − 4x2) = 0.

So x1 = 4x2 from the first two FOCs. Substituting into the third FOC, we get x1 = 8, x2 = 2, λ = −1/8.

The Hessian is

H f (x) = [ −1/x1²     0
              0     −1/x2² ]

D1 = −1/x1² < 0,  D2 = 1/(x1² x2²) > 0,  ∀x ∈ R²₊₊.

Hence H f (x) is negative definite ∀x ∈ R2++ , so f is concave. Also g (x) = 16 − x1 − 4x2 is linear,

hence concave. Lastly −λ > 0. So L (x, λ) is concave and the FOCs are sufficient for maximum.

Hence x∗ = (8, 2) is the solution.

Example 19.10. Application: Arithmetic Mean-Geometric Mean inequality. Consider

(19.9)    max_{(a,b)∈R²₊} f (a, b) = ab  subject to a + b = 2.


Note the constraint set C = {a ≥ 0, b ≥ 0, a + b = 2} is non-empty ((2, 0) is contained in it), closed (since weak inequalities are preserved in the limit), and bounded, as

∥(a, b)∥ ≤ ∥(2, 2)∥ = 2√2.

Note that at the solution, a > 0 and b > 0. Hence we can rewrite the problem as

max_{(a,b)∈R²₊₊} f (a, b) = ab  subject to g (a, b) = 2 − a − b = 0.

The Lagrangian and the FOCs are

L (a, b, λ) = ab − λ (2 − a − b)

∂L (a, b, λ)/∂a = b + λ = 0
∂L (a, b, λ)/∂b = a + λ = 0
∂L (a, b, λ)/∂λ = −(2 − a − b) = 0.

Now a = b, so a = b = 1 = −λ. We get one candidate for the solution,

m1 = (1, 1, −1).

The constraint qualification

∇g (x∗) = [−1  −1] ≠ 0

is satisfied trivially for m1 . Compare it with the corners (0, 2) , (2, 0) and verify that

f (0, 2) = 0 = f (2, 0) , f (1, 1) = 1.

The solution then is (1, 1). In other words, we have shown that if a ≥ 0, b ≥ 0 and a + b = 2, then

(19.10)    ab ≤ 1.

Now let x1 > 0, x2 > 0 be arbitrary with x1 + x2 = x > 0. Then

2x1 + 2x2 = 2x
(2x1)/x + (2x2)/x = 2.

Set a = (2x1)/x > 0 and b = (2x2)/x > 0; then a + b = 2, so we can apply the result shown above:

ab = ((2x1)/x) · ((2x2)/x) ≤ 1
x1 x2 ≤ x²/4 = ((x1 + x2)/2)²
√(x1 x2) ≤ (x1 + x2)/2,

which is the Arithmetic Mean-Geometric Mean inequality.
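The resulting inequality is easy to spot-check (our own sketch):

```python
import math
from itertools import product

def am_gm_gap(x1, x2):
    # (x1 + x2)/2 - sqrt(x1*x2), which should be >= 0 for x1, x2 > 0
    return (x1 + x2) / 2 - math.sqrt(x1 * x2)

samples = [0.1, 0.5, 1.0, 2.0, 7.3, 100.0]
gaps = [am_gm_gap(a, b) for a, b in product(samples, repeat=2)]
```

Equality holds exactly when x1 = x2, matching the solution a = b = 1 above.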

Chapter 20

Optimization Theory:

Inequality Constraints

The more general constrained optimization problem deals with inequality constraints. Note that the equality constraint g (x) = 0 can be expressed as the pair of inequality constraints g (x) ≥ 0 and g (x) ≤ 0.

20.1. Inequality Constraint

The constrained maximization problem with which we are concerned is the following:

max f (x)  subject to gj (x) ≥ 0 for j = 1, · · · , m, and x ∈ Rⁿ₊.

Let X be an open set in Rⁿ containing Rⁿ₊, and let f and gj (j = 1, · · · , m) be continuously differentiable functions from X to R. It is convenient to fold the non-negativity restrictions x ∈ Rⁿ₊ into the constraint list by defining

Gj (x) = gj (x) for j = 1, · · · , m,  and  Gm+j (x) = xj for j = 1, · · · , n.


The problem can then be written as

max f (x)  subject to Gj (x) ≥ 0 for j = 1, · · · , m + n, and x ∈ X.

The constraint set is

C = {x ∈ X : G(x) ≥ 0},

where G(x) = [G1 (x), · · · , Gm+n (x)].

Definition 20.1. Kuhn-Tucker Conditions: Let X be an open set in Rⁿ, and f, Gj (j = 1, · · · , m + n) be continuously differentiable on X. A pair (x∗, λ∗) in X × R₊^{m+n} satisfies the Kuhn-Tucker conditions if

(i) Di f (x∗) + ∑_{j=1}^{m+n} λj∗ · Di Gj (x∗) = 0,  i = 1, · · · , n;
(ii) G(x∗) ≥ 0 and λ∗ · G(x∗) = 0.

Theorem 20.1. Let X be an open set in Rⁿ, and f, Gj (j = 1, · · · , m + n) be continuously differentiable on X. Suppose a pair (x∗, λ∗) ∈ X × R₊^{m+n} satisfies the Kuhn-Tucker conditions. If X is convex and f, Gj (j = 1, · · · , m + n) are concave on X, then x∗ is a point of constrained global maximum.
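It can help to see the Kuhn-Tucker conditions as a mechanical check. The helper below is our own sketch (the name `kt_check` and the finite-difference approach are not from the notes); it verifies conditions (i)-(ii) at a proposed pair:

```python
def kt_check(f, Gs, x, lam, h=1e-6, tol=1e-4):
    """Check the Kuhn-Tucker conditions at (x, lam) for max f s.t. G_j(x) >= 0."""
    n = len(x)

    def d(fun, i):  # central finite difference for D_i fun at x
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        return (fun(xp) - fun(xm)) / (2 * h)

    # (i) stationarity: D_i f + sum_j lam_j * D_i G_j = 0 for each i
    stationary = all(
        abs(d(f, i) + sum(l * d(G, i) for l, G in zip(lam, Gs))) < tol
        for i in range(n))
    # (ii) feasibility of x and lam, plus complementary slackness
    feasible = all(G(x) >= -tol for G in Gs) and all(l >= -tol for l in lam)
    slack = abs(sum(l * G(x) for l, G in zip(lam, Gs))) < tol
    return stationary and feasible and slack
```

For instance, with f(x, y) = 3x + y and constraints 10 − x − 2y ≥ 0, x ≥ 0, y ≥ 0 (so a/b = 3 > p1/p2 = 1/2 as in the next example), the pair (x∗, y∗) = (10, 0) with λ∗ = (3, 0, 5) passes the check.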

We illustrate the application of this Theorem through examples. First we take a linear objective

function.

Example 20.1. Solve

max_{(x,y)∈R²₊} f (x, y) = ax + by  subject to p1 x + p2 y ≤ M,

where a, b, p1, p2 and M are positive parameters. Find a solution to the problem for the following parameter configurations,

(i) a/b > p1/p2,  (ii) a/b < p1/p2,

using the Kuhn-Tucker sufficiency theorem.

(i) Let

X = {(x, y) ∈ R² | x > −1, y > −1}.

Then X is open, as its complement

X^C = {(x, y) ∈ R² | x ≤ −1 or y ≤ −1}

is closed.

(ii) The function f (x, y) is continuous, as ax and by are continuous and f (·, ·) is obtained by taking the sum of two continuous functions. The functions g1 (x, y) = M − p1 x − p2 y, g2 (x, y) = x, g3 (x, y) = y are linear and hence continuous. Further, fx (x, y) = a and fy (x, y) = b are continuous functions. Hence f, gj (j = 1, · · · , 3) are continuously differentiable on X.

(iii) X is convex:

x1 > −1, x2 > −1 → λx1 + (1 − λ) x2 > −1 ∀λ ∈ (0, 1),
y1 > −1, y2 > −1 → λy1 + (1 − λ) y2 > −1 ∀λ ∈ (0, 1),
⇒ (λx1 + (1 − λ) x2, λy1 + (1 − λ) y2) ∈ X.

(iv) The function f (x, y) is concave as the sum of two concave functions, and gj (j = 1, · · · , 3) are concave, being linear functions.

Hence, for the following problem,

max_{(x,y)∈X} f (x, y) = ax + by  subject to p1 x + p2 y ≤ M, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗, y∗), λ∗) ∈ X × R³₊ that satisfies the Kuhn-Tucker conditions:

(i) Di f (x∗) + ∑_{j=1}^{m} λj∗ · Di gj (x∗) = 0,  i = 1, · · · , n,
(ii) g(x∗) ≥ 0 and λ∗ · g(x∗) = 0.

They are

a − λ1 p1 + λ2 = 0
b − λ1 p2 + λ3 = 0
M − p1 x − p2 y ≥ 0,  λ1 (M − p1 x − p2 y) = 0
x ≥ 0, λ2 x = 0;  y ≥ 0, λ3 y = 0.

If λ1 = 0, then a − λ1 p1 + λ2 = 0 → λ2 = −a < 0, which contradicts λ2 ≥ 0. Hence

λ1 > 0 → M − p1 x − p2 y = 0.

So x = y = 0 is ruled out. We now consider the two cases.


Figure 20.1. Case (i): a/b > p1/p2; optimal consumption bundle (M/p1, 0).

Case (i): a/b > p1/p2. Consider x > 0, y = 0. Note λ2 = 0, x = M/p1, and

a/p1 = λ1,  b − (a/p1) p2 + λ3 = 0,
λ3 = (a/p1) p2 − b = b ((a p2)/(b p1) − 1) > 0,

since a/b > p1/p2, i.e., (a p2)/(b p1) > 1. Hence

x = M/p1,  y = 0,  λ1 = a/p1,  λ2 = 0,  λ3 = b ((a p2)/(b p1) − 1) > 0

is a solution.

Case (ii): a/b < p1/p2. Consider x = 0, y > 0. Note λ3 = 0, y = M/p2, and

b/p2 = λ1,  a − (b/p2) p1 + λ2 = 0,
λ2 = (b/p2) p1 − a = a ((b p1)/(a p2) − 1) > 0,

since a/b < p1/p2, i.e., (b p1)/(a p2) > 1. Hence

x = 0,  y = M/p2,  λ1 = b/p2,  λ2 = a ((b p1)/(a p2) − 1) > 0,  λ3 = 0

is a solution.
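The two cases collapse into a simple demand rule: all income goes to the good with the higher marginal utility per dollar, a/p1 versus b/p2. A sketch (our own code; the knife-edge case a/p1 = b/p2, where every point on the budget line is optimal, is resolved arbitrarily):

```python
def demand(a, b, p1, p2, M):
    # Linear utility a*x + b*y: compare "bang per buck" a/p1 vs b/p2.
    if a / p1 > b / p2:      # case (i): a/b > p1/p2
        return (M / p1, 0.0)
    if a / p1 < b / p2:      # case (ii): a/b < p1/p2
        return (0.0, M / p2)
    return (M / p1, 0.0)     # tie: any point on the budget line is optimal
```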


Figure 20.2. Case (ii): a/b < p1/p2; optimal consumption bundle (0, M/p2).

Example 20.2. Solve

max_{(x,y)∈R²₊} f (x, y) = x/(1 + x) + y  subject to x + 4y ≤ 16,

using the Kuhn-Tucker sufficiency theorem.

(i) Let

X = {(x, y) ∈ R² | x > −1, y > −1}.

Then X is open, as its complement

X^C = {(x, y) ∈ R² | x ≤ −1 or y ≤ −1}

is closed.

(ii) The function f (x, y) is continuous: x, y and 1 + x are continuous, 1 + x > 0 on X, and f (·, ·) is obtained by taking the quotient of the two continuous functions x and 1 + x, with non-vanishing denominator, and adding y. The constraint functions g1 (x, y) = 16 − x − 4y, g2 (x, y) = x, g3 (x, y) = y are linear and hence continuous. Further, fx (x, y) = 1/(1 + x)² and fy (x, y) = 1 are continuous functions. Hence f, gj (j = 1, · · · , 3) are continuously differentiable on X.

(iii) X is convex:

x1 > −1, x2 > −1 → λx1 + (1 − λ) x2 > −1 ∀λ ∈ (0, 1),
y1 > −1, y2 > −1 → λy1 + (1 − λ) y2 > −1 ∀λ ∈ (0, 1),
→ (λx1 + (1 − λ) x2, λy1 + (1 − λ) y2) ∈ X.

(iv) The function f (x, y) is concave as the sum of two concave functions (exercise: check that x/(1 + x) is concave for x > −1), and gj (j = 1, · · · , 3) are concave, being linear functions.

Hence, for the following problem,

max_{(x,y)∈X} f (x, y) = x/(1 + x) + y  subject to x + 4y ≤ 16, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x∗, y∗), λ∗) ∈ X × R³₊ that satisfies the Kuhn-Tucker conditions. They are

1/(1 + x)² − λ1 + λ2 = 0
1 − 4λ1 + λ3 = 0
16 − x − 4y ≥ 0,  λ1 (16 − x − 4y) = 0
x ≥ 0, λ2 x = 0;  y ≥ 0, λ3 y = 0.

If λ1 = 0, then 1 − 4λ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence

λ1 > 0 → 16 − x − 4y = 0.

Consider x > 0, y = 0. Then x = 16, λ2 = 0, and

1/(1 + 16)² = λ1;  1 − 4/289 + λ3 = 0
λ3 = −285/289 < 0.

This contradicts λ3 ≥ 0.


Consider x = 0, y > 0. Then λ3 = 0, and

1 − 4λ1 = 0 → λ1 = 1/4;  1 − λ1 + λ2 = 0
1 − 1/4 + λ2 = 0;  λ2 = −3/4 < 0.

This contradicts λ2 ≥ 0.

Consider x > 0, y > 0. Then λ2 = λ3 = 0, and

1/(1 + x)² = λ1;  1 − 4λ1 = 0, λ1 = 1/4 > 0;
(1 + x)² = 4 → x = 1 > 0
16 − x − 4y = 0 → y = 15/4 > 0.

Note that all conditions are satisfied. The Theorem asserts that (1, 15/4) is a global maximum and therefore solves both problems.
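Because λ1 > 0 forces the budget to bind, the problem reduces to one dimension along y = (16 − x)/4, which makes the Kuhn-Tucker answer easy to verify by grid search (our own check):

```python
# Along the binding constraint, the objective becomes
#   g(x) = x/(1 + x) + (16 - x)/4   for x in [0, 16].
g = lambda x: x / (1 + x) + (16 - x) / 4

# Grid search in steps of 0.01; the maximizer should be x = 1 (so y = 15/4).
best_x = max((i / 100 for i in range(1601)), key=g)
```

The grid maximizer is x = 1, with value g(1) = 1/2 + 15/4 = 4.25, matching (1, 15/4).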

Example 20.3. In the above example, let the price of good y be p > 0 and income be I > 0. We can redo the exercise by going over the Kuhn-Tucker conditions again. They are

1/(1 + x)² − λ1 + λ2 = 0
1 − pλ1 + λ3 = 0
I − x − py ≥ 0,  λ1 (I − x − py) = 0
x ≥ 0, λ2 x = 0;  y ≥ 0, λ3 y = 0.

If λ1 = 0, then 1 − pλ1 + λ3 = 0 → λ3 = −1 < 0, which contradicts λ3 ≥ 0. Hence

λ1 > 0 → I − x − py = 0,

and x = y = 0 is ruled out because I > 0. There are three remaining cases.

Consider x > 0, y = 0. Then x = I, λ2 = 0, and

1/(1 + I)² = λ1
1 − p/(1 + I)² + λ3 = 0 → λ3 = p/(1 + I)² − 1.


If p/(1 + I)² − 1 > 0, i.e., p > (I + 1)², then λ3 > 0. So the solution is (I, 0, 1/(1+I)², 0, p/(1+I)² − 1) if p > (I + 1)².

Consider x = 0, y > 0. Then λ3 = 0, y = I/p, and

1/p = λ1
1 − λ1 + λ2 = 0 → 1 − 1/p + λ2 = 0
λ2 = 1/p − 1.

If 1/p − 1 ≥ 0, i.e., p ≤ 1, then λ2 ≥ 0. So the solution is (0, I/p, 1/p, 1/p − 1, 0) if p ≤ 1.

Consider x > 0, y > 0. Then λ2 = λ3 = 0, and

1/(1 + x)² = λ1,  1 − pλ1 = 0,
(1 + x)² = p → x = √p − 1 > 0
I − x − py = 0 → y = (I + 1 − √p)/p > 0.

Hence for p > 1 and I + 1 > √p, the solution is (√p − 1, (I + 1 − √p)/p, 1/p, 0, 0).

Combining them, the solution (x∗, y∗, λ1∗, λ2∗, λ3∗) is

(I, 0, 1/(1+I)², 0, p/(1+I)² − 1)        if p > (I + 1)²,
(0, I/p, 1/p, 1/p − 1, 0)                if p ≤ 1, and
(√p − 1, (I + 1 − √p)/p, 1/p, 0, 0)      if 1 < p < (I + 1)².

The Kuhn-Tucker Sufficiency Theorem asserts that this solution is a global maximum and therefore solves both problems.
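The three branches can be packaged as a demand function (our own sketch; the price of good x is normalized to 1, as in the example):

```python
import math

def demand(p, I):
    """Demand (x, y) for u(x, y) = x/(1+x) + y with budget x + p*y <= I."""
    if p >= (1 + I) ** 2:        # corner: all income spent on x
        return (I, 0.0)
    if p <= 1:                   # corner: all income spent on y
        return (0.0, I / p)
    x = math.sqrt(p) - 1         # interior: 1/(1+x)^2 = 1/p
    return (x, (I + 1 - math.sqrt(p)) / p)
```

In every branch the budget binds: x + p·y = I.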

20.2. Global maximum and constrained local maximum

We know from the definitions that if x̂ is a point of global maximum, then x̂ is also a point of local maximum. The situations under which the converse is true are given by the following theorems.

Theorem 20.2. Suppose A is an open convex set in Rⁿ, and f is a function from A to R.
(a) If f is concave on A and x̄ ∈ A is a point of local maximum of f, then x̄ is a point of global maximum of f on A.
(b) If f is strictly quasi-concave on A and x̄ ∈ A is a point of local maximum of f, then x̄ is the unique point of global maximum of f on A.
(c) Suppose there is δ > 0 such that
    (i) B(x̄, δ) ⊂ A, and
    (ii) x̄ is the unique point of maximum of f on B(x̄, δ).
If f is quasi-concave on A, then x̄ is the unique point of global maximum of f on A.

Proof. (a) Assume that x̄ is not a global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f (x̂) > f (x̄).

Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all

x ∈ A ∩ B(x̄, δ).

Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,

x = λx̂ + (1 − λ)x̄,

for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. By concavity of f , we have for all

λ ∈ [0, 1]

f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄).

Since f (x̂) > f (x̄), we also have for all λ ∈ (0, 1] that

f (λx̂ + (1 − λ)x̄) ≥ λ f (x̂) + (1 − λ) f (x̄) > λ f (x̄) + (1 − λ) f (x̄) = f (x̄).

We wish to take λ sufficiently close to zero (but not equal to zero) so that

x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).

For this, let us denote d(x̂, x̄) = d and note

d(x′, x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.

If we set λ = δ/(2d), then we know

d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,

or x′ ∈ B(x̄, δ).


Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such

that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄

must be a global maximum of f on A.

(b) Assume that x̄ is not a point of global maximum of f on A. Then there exists another point

x̂ ∈ A such that x̂ ̸= x̄ and f (x̂) > f (x̄).

Since x̄ is a point of local maximum, there exists δ > 0 such that f (x̄) ≥ f (x) for all

x ∈ A ∩ B(x̄, δ).

Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,

x = λx̂ + (1 − λ)x̄,

for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is strictly quasi-concave and f (x̂) > f (x̄), we have for all λ ∈ (0, 1)

f (λx̂ + (1 − λ)x̄) > min{ f (x̂), f (x̄)} = f (x̄).

We wish to take λ > 0 sufficiently small so that

x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).

For this, let us denote d(x̂, x̄) = d and note

d(x′, x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.

If we set λ = δ/(2d), then we know

d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,

or x′ ∈ B(x̄, δ).

Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such

that f (x′ ) > f (x̄), which contradicts that x̄ was a point of local maximum. It follows that x̄

must be a global maximum of f on A.

To show uniqueness: if not, then there exists x″ ∈ A with x″ ≠ x̄ such that

f (x̄) = f (x″).

But then, since f is strictly quasi-concave and A is convex,

f (0.5x̄ + 0.5x″) > min{ f (x̄), f (x″)} = f (x″) = f (x̄).

This contradicts the fact that x̄ is a point of global maximum.

(c) Assume that x̄ is not the unique point of global maximum of f on A. Then there exists another point x̂ ∈ A such that x̂ ≠ x̄ and f (x̂) ≥ f (x̄). Since x̄ is the unique point of maximum of f on B(x̄, δ), we have f (x̄) > f (x) for all x ∈ B(x̄, δ) with x ≠ x̄.

Consider a point x ∈ A on the line joining the two points x̂ and x̄, i.e.,

x = λx̂ + (1 − λ)x̄,


for some λ ∈ [0, 1]. Since A is convex, we know x ∈ A. Since f is quasi-concave, we have for all λ ∈ (0, 1)

f (λx̂ + (1 − λ)x̄) ≥ min{ f (x̂), f (x̄)} = f (x̄).

We wish to take λ > 0 sufficiently small so that

x′ ≡ λx̂ + (1 − λ)x̄ ∈ B(x̄, δ).

For this, let us denote d(x̂, x̄) = d and note

d(x′, x̄) = d(λx̂ + (1 − λ)x̄, x̄) = |λ| d(x̂, x̄) = λ · d.

If we set λ = δ/(2d), then we know

d(x′, x̄) = λ · d = (δ/(2d)) · d = δ/2,

or x′ ∈ B(x̄, δ).

Also x′ ∈ A since A is a convex set. Therefore, we have found a point x′ ∈ A ∩ B(x̄, δ) such

that f (x′ ) ≥ f (x̄), which contradicts that x̄ was the unique point of local maximum. It follows

that x̄ must be the unique point of global maximum of f on A.

This theorem shows that there is an important difference between concavity and quasi-concavity

in going from the local maximum property to the global maximum property. With quasi-concavity,

we need something more (some “strictness”) to make the arguments work. In (b), this additional

condition takes the form of strict quasi-concavity. In (c), it takes the form of assuming that the

point of local maximum is unique. This underlying theme (that one needs something in addition

to quasi-concavity to make the arguments and results work) recurs in Arrow-Enthoven's theory

of quasi-concave programming, where the attempt is made to replace the concavity conditions of

Kuhn-Tucker with quasi-concavity.

The following example shows that in Theorem 20.2(a), we cannot replace concavity of f by

quasi-concavity of f , and still preserve the conclusion.

Example 20.4. Let A be the interval (0, 6) in R. Clearly, A is an open, convex set. Let f : A → R be defined as follows:

f (x) = x      for x ∈ (0, 2)
f (x) = 2      for x ∈ [2, 4]
f (x) = x − 2  for x ∈ (4, 6)

Then f is a non-decreasing function on A, and therefore quasi-concave. The point x̄ = 3 is clearly a point of local maximum, since f (x̄) = 2 ≥ f (x) for all x ∈ A ∩ B(x̄, 1). However, x̄ is not a point of global maximum of f on A, since (for example) f (5) = 3 > 2 = f (x̄).
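Example 20.4 can be machine-checked directly (our own code):

```python
def f(x):
    # the quasi-concave, non-concave function of Example 20.4 on A = (0, 6)
    if 0 < x < 2:
        return x
    if 2 <= x <= 4:
        return 2.0
    if 4 < x < 6:
        return x - 2
    raise ValueError("x outside A = (0, 6)")

x_bar = 3.0
# local maximum: f(x_bar) >= f(x) on the ball B(x_bar, 1) = (2, 4) ...
local = all(f(x_bar) >= f(x_bar + d / 100) for d in range(-99, 100))
# ... but not a global maximum on A:
beaten = f(5.0) > f(x_bar)
```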


The following theorem describes the conditions under which a point of constrained local maximum x̂ is also a point of constrained global maximum.

Theorem 20.3. Let X be a convex set in Rn . Let f , g j ( j = 1, · · · , m) be concave functions on

X. Suppose x̂ is a point of constrained local maximum. Then, x̂ is a point of constrained global

maximum.

Proof. Define the constraint set

C = {x ∈ X : gj (x) ≥ 0, j = 1, · · · , m}.

Since x̂ is a point of constrained local maximum, there is δ > 0 such that for all x ∈ B(x̂, δ) ∩ C, we have f (x) ≤ f (x̂).

Now, if x̂ is not a point of constrained global maximum, then there is some x̄ ∈ C, such that

f (x̄) > f (x̂). One can choose 0 < θ < 1 with θ sufficiently close to zero, such that

x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ B(x̂, δ).

For this, we need

∥θ x̄ + (1 − θ)x̂ − x̂∥ = θ ∥x̄ − x̂∥ < δ.

This holds if θ < δ/∥x̄ − x̂∥. Take

θ = min{1/2, δ/(2∥x̄ − x̂∥)},

so that 0 < θ < 1 and x̃ ∈ B(x̂, δ).

so that x̃ ∈ B(x̂, δ). Since X is convex and g j ( j = 1, · · · , m) are concave, we claim that C is a convex

set, and x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ C.

Let y ∈ C and y′ ∈ C be two arbitrary points. By definition of the constraint set C, y and y′ are

in X and therefore, ŷ ≡ [λ y + (1 − λ)y′ ] ∈ X for all λ ∈ [0, 1]. Also by concavity of the constraint

functions,

g j (ŷ) = g j (λ y + (1 − λ)y′ ) ≥ λ g j (y) + (1 − λ)g j (y′ ) ≥ λ · 0 + (1 − λ) · 0 = 0,

for all j = 1, · · · , m.

Therefore, x̃ ≡ [θ x̄ + (1 − θ)x̂] ∈ C.

Thus

x̃ = [θ x̄ + (1 − θ)x̂] ∈ B(x̂, δ) ∩C.


By concavity of f,

f (x̃) = f (θ x̄ + (1 − θ)x̂) ≥ θ f (x̄) + (1 − θ) f (x̂) > θ f (x̂) + (1 − θ) f (x̂) = f (x̂).

But this contradicts the fact that x̂ is a point of constrained local maximum.

Observe that we did not need to assume that the objective function is differentiable on the

domain X in this proof.

Chapter 21

Problem Set 8

(1) Let C = (c1, c2, c3) be a non-zero vector in R³. Consider the following constrained maximization problem:

(21.1)    max ∑_{i=1}^{3} ci xi  subject to ∑_{i=1}^{3} xi² = 1 and (x1, x2, x3) ∈ R³.

(a) Show, by using the Weierstrass theorem, that there exists x̄ ∈ R³ which solves (21.1).

(b) Use Lagrange's theorem to show that

(21.2)    ∑_{i=1}^{3} ci x̄i = ∥C∥.

(c) Let p, q be arbitrary non-zero vectors in Rn . Using result in (b), show that |p·q| ≤ ∥p∥·∥q∥.

Solve the following constrained optimization problems.

(2) Let f : R² → R.

(21.3)    max_{(x,y)∈R²₊} f (x, y) = x² − 3xy  subject to x + 2y = 10.

(3)

(21.4)    max_{(x,y)∈R²₊} f (x, y) = x^{1/3} y^{2/3}  subject to 2x + y = 4.


(4)

(21.5)    max f (x, y) = √(xy)  subject to x + y ≤ 6, x ≥ 0, y ≥ 0.

(5)

(21.6)    max f (x, y) = x + ln(1 + y)  subject to x ≥ 0, y ≥ 0 and x + py ≤ m.

(6) Let X be a non-empty, convex set in R². Let g be a continuous function from X to R, and let f be a strictly quasi-concave function from X to R. Consider the following constrained optimization problem,

(21.7)    max f (x)  subject to g(x) ≥ 0 and x ∈ X,

and the unconstrained problem

(21.8)    max f (x)  subject to x ∈ X.

(a) Suppose that x̄ is a solution to (21.8), and g(x̄) > 0. Is x̄ also a solution to problem (21.7)? Explain.
(b) Suppose that x̄ is a solution to (21.8), but x̄ is not a solution to (21.7). Show that if x̂ is any solution to (21.7), then we must have g(x̂) = 0.

(7) Suppose that a consumer has the utility function U(x, y) = x^a y^b and faces the budget constraint px x + py y ≤ I.

(A) Utility Maximization

(a) What are the first order conditions for utility maximization?

(b) Solve for the consumer’s demands for goods x and y.

(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an

increasing, decreasing or constant function of income?

(d) Show that the second order conditions hold.

(e) Show that the value of dx∗/dI obtained from the implicit function theorem is identical to the value obtained by taking the partial derivative of x∗ with respect to I.


(f) Solve for x∗ and y∗ as functions of prices and income. Use x∗ and y∗ to solve for the indirect utility function. Is it true that the partial derivative of the indirect utility function with respect to income equals λ?

(B) Expenditure Minimization:

Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditures, Px x + Py y, subject to reaching a given level of utility, U0 (the constraint is therefore U0 − x^a y^b = 0).

(a) What are the first order conditions for expenditure minimization?

(b) Use the first order conditions to solve for x∗ and y∗ (these are called the Hicksian or

compensated demand functions).

(c) Check the second order conditions.

(d) Write the level of income, I, necessary to reach U0 as a function of U0 , prices, and

parameters. How does this expenditure function relate to the indirect utility function?

(e) To avoid confusion, let us call the solution for good x in utility maximization x∗ and the solution for good x in expenditure minimization h∗. Prove that

∂x∗/∂Px = ∂h∗/∂Px − x∗ · ∂x∗/∂I.

Interpret this answer.

Interpret this answer.

(8) Suppose a consumer has the utility function U = a ln(x − x0) + b ln(y − y0), where a, b, x0 and y0 are positive parameters. Assume that the usual budget constraint applies.
(a) Solve for the consumer's demand for good x.
(b) Find the elasticities of demand for good x with respect to income and prices.
(c) Show that the utility function U = 45(x − x0)^{3.5a} (y − y0)^{3.5b} would have yielded the same demand for good x.

(9) Suppose a consumer has the utility function,

where a > 0, b > 0 and c > 0 are such that a + b + c = 1. The budget constraint is

px + qy + rz ≤ I.

In other words, the prices of good x, y and z are p, q and r respectively and the consumer has

an income I. The prices and income are positive.

In addition, the consumer faces a rationing constraint. He is not allowed to buy more than

k > 0 units of good x.

(a) Solve the optimization problem.

(b) Under what condition on the various parameters, is the rationing constraint binding?


(c) Show that when the rationing constraint binds, the income that the consumer would have

liked to spend on good x but cannot do so now is split between good y and z in proportions

b : c.

(d) Would you expect rationing of bread purchases to affect demand for butter and rice in this

way? If not, how would you expect the bread-butter-rice case to differ from the result in

(c)?

Chapter 22

Envelope Theorem

Let f (x, α) be a continuously differentiable function of x ∈ Rn and a parameter α. For each choice

of α, consider the unconstrained maximization problem:

max f (x, α)

where the choice variable is x. It is of interest to us how the maximum value f(x∗(α), α) changes as the parameter value α changes.

Theorem 22.1. Let x∗ (α) be a solution of this problem and also assume that x∗ (α) is a continuously

differentiable function of α. Then,

d f(x∗(α), α)/dα = ∂f(x∗(α), α)/∂α.

Proof. By the Chain Rule,

d f(x∗(α), α)/dα = ∑_{i} ∂f(x∗(α), α)/∂x_i · dx_i∗(α)/dα + ∂f(x∗(α), α)/∂α

or

d f(x∗(α), α)/dα = ∂f(x∗(α), α)/∂α

since ∂f(x∗(α), α)/∂x_i = 0 for i = 1, · · · , n by the First Order conditions for the solution.



Example 22.1. Consider the problem of maximizing the function f (x, a) = −2x2 + 2ax + 4a2 with

respect to x for any given value of a. What is the effect of a unit increase in the value of a on the

maximum value of f (x, a)?

This can be done directly by computing the x∗ which maximizes f . The first order condition

yields

f ′ (x) = −4x + 2a = 0.

So x∗ = 0.5a. We can plug this into f(x, a), which leads to

f(x∗(a), a) = −2(0.5a)² + 2a(0.5a) + 4a² = 4.5a².

Observe that f (x∗ (a), a) increases at the rate of 9a as a increases. Alternatively we could apply the

Envelope Theorem to get

d f∗/da = ∂f(x∗(a), a)/∂a = 2x∗ + 8a = 9a

since x∗ (a) = 0.5a.
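A quick numerical check of this example (a hedged Python sketch; the helper names `f` and `value` are introduced here, not from the text):

```python
# Example 22.1: f(x, a) = -2x^2 + 2ax + 4a^2 with maximizer x*(a) = 0.5a.
# The value function is f(x*(a), a) = 4.5 a^2, so its total derivative in a is 9a,
# which the Envelope Theorem says equals the partial df/da = 2x + 8a evaluated at x*.

def f(x, a):
    return -2 * x**2 + 2 * a * x + 4 * a**2

def value(a):
    return f(0.5 * a, a)          # plug the maximizer into f

a, h = 3.0, 1e-6
total = (value(a + h) - value(a - h)) / (2 * h)   # d f(x*(a), a) / da numerically
partial = 2 * (0.5 * a) + 8 * a                   # envelope theorem: df/da at x = x*
print(round(total, 4), partial)                   # both equal 9a = 27.0
```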

Consider a firm that uses an input x to produce output f(x). The output sells at price p and the input price is w, so the firm chooses x to maximize profit:

max_{x∈R+} π(x) = p f(x) − wx.

Let us denote the input level at which the maximum profit is attained by x∗ . We observe that x∗ is a

function of the parameters p and w. The maximum profit is the value function of this exercise and

we call it the profit function.

By Envelope Theorem

∂π∗(p, w)/∂p = f(x∗(p, w)) > 0.

Thus the profit function is increasing in the price of the output. Also

∂π∗(p, w)/∂w = −x∗(p, w) < 0.

So the profit function is decreasing in the price of the input. Further, it also shows that

x∗(p, w) = −∂π∗(p, w)/∂w.

The profit maximizing input level can thus be obtained by taking the negative of the partial derivative of the profit function with respect to w (a result known as Hotelling’s Lemma).
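As an illustration, take an assumed production function f(x) = √x (a hypothetical choice, not from the text) and check Hotelling's Lemma numerically:

```python
# With f(x) = sqrt(x), the FOC p/(2 sqrt(x)) = w gives x*(p, w) = (p/(2w))^2
# and the profit function pi*(p, w) = p^2/(4w). Hotelling's Lemma says
# x*(p, w) = -d(pi*)/dw, checked here with a central finite difference.

def x_star(p, w):
    return (p / (2 * w)) ** 2

def profit_star(p, w):
    return p**2 / (4 * w)

p, w, h = 10.0, 2.0, 1e-6
lhs = x_star(p, w)                                                # 6.25
rhs = -(profit_star(p, w + h) - profit_star(p, w - h)) / (2 * h)  # ~6.25
print(lhs, round(rhs, 4))
```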

22.2. Meaning of the Lagrange multiplier

In this section we will see that the multipliers measure the sensitivity of the optimal value of the

objective function to the changes in the right-hand sides (parameters) of the constraints. In this

sense, they provide a natural measure of the value for scarce resources in economic maximization

problems.

Consider a simple maximization problem with two variables and one equality constraint. Let

f : R² → R be denoted as f(x, y).

(22.2)    max_{(x,y)∈R²₊} f(x, y)   subject to h(x, y) = a.

Let (x∗ (a), y∗ (a)) be a solution to the above problem for any given parameter value a. Thus

f (x∗ (a), y∗ (a)) is the corresponding optimal value of the objective function. Let the Lagrange

multiplier be denoted by λ∗ (a). Following theorem shows that λ∗ (a) measures the rate of change

of the optimal value of the objective function f with respect to a.

Theorem 22.2. Let f and h be continuously differentiable functions of two variables. For any fixed

value of the parameter a, let (x∗ (a), y∗ (a)) be the solution of the optimization problem (22.2) with

the corresponding Lagrange multiplier λ∗ (a). Assume that x∗ (a), y∗ (a) and λ∗ (a) are continuously

differentiable functions of a and the constraint qualification holds at (x∗ (a), y∗ (a)). Then,

λ∗(a) = d f(x∗(a), y∗(a))/da.

Proof. Form the Lagrangian

L ≡ f(x, y) − λ(h(x, y) − a),

where a is a parameter. The solution of this problem, (x∗(a), y∗(a), λ∗(a)), satisfies the First Order conditions:

∂L(x∗(a), y∗(a), λ∗(a))/∂x = ∂f(x∗(a), y∗(a))/∂x − λ∗(a) ∂h(x∗(a), y∗(a))/∂x = 0,

∂L(x∗(a), y∗(a), λ∗(a))/∂y = ∂f(x∗(a), y∗(a))/∂y − λ∗(a) ∂h(x∗(a), y∗(a))/∂y = 0,

∂L(x∗(a), y∗(a), λ∗(a))/∂λ = −(h(x∗(a), y∗(a)) − a) = 0

for all values of a. Also, since h(x∗(a), y∗(a)) = a for all a, differentiating with respect to a we get

∂h(x∗(a), y∗(a))/∂x · dx∗(a)/da + ∂h(x∗(a), y∗(a))/∂y · dy∗(a)/da = 1


for all a. Now we can use the Chain Rule and the two First Order conditions,

d f(x∗(a), y∗(a))/da = ∂f(x∗(a), y∗(a))/∂x · dx∗(a)/da + ∂f(x∗(a), y∗(a))/∂y · dy∗(a)/da
= λ∗(a) ∂h(x∗(a), y∗(a))/∂x · dx∗(a)/da + λ∗(a) ∂h(x∗(a), y∗(a))/∂y · dy∗(a)/da
= λ∗(a) [∂h(x∗(a), y∗(a))/∂x · dx∗(a)/da + ∂h(x∗(a), y∗(a))/∂y · dy∗(a)/da]
= λ∗(a) · 1 = λ∗(a).
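A numerical sketch of this result on a hypothetical problem (max xy subject to x + y = a, chosen here for illustration): the FOCs give x∗ = y∗ = a/2 and λ∗(a) = a/2, and the value function is a²/4.

```python
# Value function of max xy s.t. x + y = a is f(x*(a), y*(a)) = a^2/4;
# its derivative, a/2, should equal the Lagrange multiplier lambda*(a) = a/2.

def value(a):
    return (a / 2) * (a / 2)

a, h = 4.0, 1e-6
dvalue_da = (value(a + h) - value(a - h)) / (2 * h)
lam = a / 2
print(round(dvalue_da, 6), lam)   # both equal a/2 = 2.0
```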

The general envelope theorem arises in the case of constrained optimizations where both the ob-

jective function as well as the constraint functions are functions of some parameters. Consider for

example the optimization exercise as follows:

(22.3)    max f(x, a) subject to g_j(x, a) = 0 for j = 1, · · · , m, and x ∈ R^n_+.

In this case, the objective function f as well as the constraints g1 , · · · , gm depend on the

parameter a. Following theorem shows that the rate of change of f (x∗ (a), a) with respect to a

equals the partial derivative with respect to a, not of f but of the corresponding Lagrangian function

L.

Theorem 22.3. Let f , g1 , · · · , gm be continuously differentiable functions and let

x∗ (a) = (x1∗ (a), x2∗ (a), · · · , xn∗ (a))

denote the solution of the optimization problem (22.3) for any fixed value of the parameter a.

Assume that x∗ (a), and the Lagrange multipliers λ∗1 (a), · · · , λ∗m (a) are continuously differentiable

functions of a and the constraint qualification condition holds. Then,

(22.4)    d f(x∗(a), a)/da = ∂L(x∗(a), λ∗(a), a)/∂a

(22.5)    = ∂f(x∗(a), a)/∂a − λ∗_1 ∂g_1(x∗(a), a)/∂a − · · · − λ∗_m ∂g_m(x∗(a), a)/∂a.

Chapter 23

Elementary Concepts in

Probability

Probability theory deals with random events, events whose occurrence cannot be predicted with certainty. There are at least three sources of randomness. First, many features of our world are by nature stochastic; the evolution of such a diverse variety of life is witness to the unpredictability in the universe and environment. Second, many events are the result of a very large number of actions and decisions. Third, some variables may appear random because they are measured with error.

Even though we are not sure about the outcomes of a random event, we can attach to each

outcome a number called probability.

We first describe the set of outcomes of a random event, i.e., a set whose elements are all possible

outcomes of a random event. It is known as the sample space and denoted by Ω.

Example 23.1. The set of possible outcomes of flipping a fair coin is

Ω = {H, T }.

The set of possible outcomes of rolling a die is Ω = {1, 2, 3, 4, 5, 6}.



The set of outcomes of flipping two coins is Ω = {HT, TH, TT, HH}.

It is easy to list the set of outcomes for flipping n coins, but the list soon becomes very long. The set of outcomes for rolling two dice is

(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)

(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)

(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)

Ω=

(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)

(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)

(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)

where the outcome (i, j) is said to occur if i appeared on the first die, and j appeared on the second

die.
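Sample spaces like these can be enumerated mechanically; a small Python sketch:

```python
from itertools import product

coins = list(product("HT", repeat=2))          # flipping two coins: 4 outcomes
dice = list(product(range(1, 7), repeat=2))    # rolling two dice: 36 pairs (i, j)

print(len(coins), len(dice))   # 4 36
print(coins[0], dice[0])
```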

The set of outcomes for measuring the lifetime of a car, consists of non-negative real numbers.

Ω = [0, ∞).

Next we form a set F of subsets of Ω (the events) that contains their unions and complements: if A and B are in F, then so are A ∪ B, A^c and B^c. The set F, which is closed under the operations of union and complementation, is known as an algebra.

Example 23.2. The algebra for the outcomes of flipping a fair coin is

F = {∅, Ω, {H}, {T}}.

The algebra for the outcomes for flipping two coins is

F = {∅, Ω, {TT}, {HH}, {HT, TH}, {HH, TT}, {HH, HT, TH}, {TT, HT, TH}}.

23.1. Discrete Probability Model

We can now define a probability measure by assigning a probability P(A) to each element A of the algebra F.

Definition 23.1. The set function P is called a probability measure if

(i) P(∅) = 0;

(ii) P(Ω) = 1;

(iii) P(A ∪ B) = P(A) + P(B) for all A, B ∈ F with A ∩ B = ∅.

The three conditions listed above are the axioms of probability theory.

Example 23.3. For the outcomes of flipping two fair coins,

P(HH) = P(HT ) = P(T T ) = P(T H) = 0.25.

The triple of the set of outcomes, the algebra, and the probability measure (Ω, F , P) is referred

to as a probability model.

In the next step, we assign probabilities to the random events. Three sources of attaching probabilities to the outcomes of random events are (a) equally likely events, (b) long run frequencies and

(c) degree of confidence (subjective or Bayesian approach). Observe that even though we assign

probabilities to different events, the mathematical theory for dealing with the random events and

their probabilities remain the same.

We define random variables next. A rule that assigns a real number to each outcome is called a random variable. More formally,

Definition 23.2. A random variable is a function that maps the set of outcomes of a random event to the set of real numbers.

Such a function is not unique and, depending on the purpose at hand, we may define one or many random variables on the same random event.

Example 23.4. For the outcomes of flipping two fair coins, let us define a random variable X as

the number of heads. Then, we have

X(HH) = 2; X(HT ) = X(T H) = 1, X(T T ) = 0.

We could have defined the random variable X as the number of tails. Then, we have

X(HH) = 0; X(HT ) = X(T H) = 1, X(T T ) = 2.

In collecting labor statistics, we are interested in the characteristics of the respondents. For

example, we may ask if a person is in the labor force or not, employed or unemployed. We could

also be interested to learn the demographic characteristics of the respondents like gender, race, age

etc. For each of these answers we can define one or more binary variables. For example let X = 1 if

a respondent who is in the labor force is unemployed and X = 0 if employed. We can define Y = 1

if the respondent is a woman and employed, Y = 0 otherwise.


Example 23.5. For the outcomes of flipping a fair coin three times, let us define a random variable

X as the number of heads. The set of outcomes for flipping a fair coin three times is

Ω = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }.

Then, the probability distribution is

P(X = 0) = 0.125; P(X = 1) = 0.375; P(X = 2) = 0.375; P(X = 3) = 0.125.
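The distribution in Example 23.5 can be checked by enumerating the 8 equally likely outcomes (a Python sketch):

```python
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))            # HHH, HHT, ..., TTT
counts = Counter(o.count("H") for o in outcomes)    # outcomes per head count

dist = {k: counts[k] / len(outcomes) for k in sorted(counts)}
print(dist)   # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```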

Probability distributions become unwieldy as the number of outcomes becomes large or infinite. One way to summarize the information about a probability distribution is through its moments, such as the mean, which measures the central tendency, and the variance, which measures the dispersion or variability of the distribution. Other moments reflect the skewness of the distribution to the left or to the right, and the kurtosis, which is an indicator of the bundling of the outcomes near the mean: the more the values are concentrated near the mean, the taller is the peak of the distribution.

The first moment of the distribution, which is the expected value or the mean of the distribution, is defined as

E(X) = µ = ∑_{i=1}^{n} x_i P(x_i).

Example 23.6. For the distribution of the number of heads in three flips of a coin, we have,

µ = 0 · P(X = 0) + 1 · P(X = 1) + 2 · P(X = 2) + 3 · P(X = 3).

which yields the mean as

µ = 0 + 0.375 + 0.750 + 0.375 = 1.50

More generally, the rth moment of the distribution is defined as

E(X^r) = m_r = ∑_{i=1}^{n} x_i^r P(x_i).

Example 23.7. For the distribution of the number of heads in three flips of a coin, the second

moment is

E(X 2 ) = 02 · P(X = 0) + 12 · P(X = 1) + 22 · P(X = 2) + 32 · P(X = 3).

which yields the second moment as

m_2 = 0 + 0.375 + 1.50 + 1.125 = 3.

Another measure (which is of great importance) is the variance, or the second moment around the mean:

E(X − µ)² = σ² = ∑_{i=1}^{n} (x_i − µ)² P(x_i).


The formula for the variance can be rewritten using the binomial expansion as

E(X − µ)² = ∑_{i=1}^{n} (x_i − µ)² P(x_i)
= ∑_{i=1}^{n} x_i² P(x_i) − 2µ ∑_{i=1}^{n} x_i P(x_i) + µ²
= ∑_{i=1}^{n} x_i² P(x_i) − µ².

Example 23.8. For the distribution of the number of heads in three flips of a coin, the variance is

σ² = E(X²) − µ² = 3 − (1.5)² = 0.75.
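The mean, second moment and variance of this distribution can be verified directly (a Python sketch):

```python
# Number of heads in three fair flips: mu = 1.5, E(X^2) = 3, variance = 0.75.
dist = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

mu = sum(x * p for x, p in dist.items())       # first moment
m2 = sum(x**2 * p for x, p in dist.items())    # second moment
var = m2 - mu**2                               # variance
print(mu, m2, var)   # 1.5 3.0 0.75
```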

Mean is a measure of central tendency of a distribution, showing its center of gravity, whereas the variance and its square root, called the standard deviation, measure the dispersion or the volatility of the distribution. The advantage of using the standard deviation is that it measures the dispersion in the same measurement units as the original variable. In finance, the variance of returns of an asset is used as a measure of risk.

23.2. Marginal and Conditional Distribution

As we have observed before, a random event may give rise to a number of random variables each

defined by a different set function whose domains are the same set. In the Table below we present

such a situation where random variables X and Y and their probabilities are reported. Think of

Y as the annual income in units of thousand dollars of a profession and X as gender, with X = 0

denoting men and X = 1 denoting women. The information contained in the table is probability

of joint events, i.e., the probability of X and Y each taking a particular value. For instance the

probability of X = 1 and Y = 120 is 0.11, which is denoted as

P(X = 1, Y = 120) = 0.11.

Such a probability is referred to as a joint probability because it shows the probability of a woman earning $120,000 a year.


X Y P

0 60 0.02

0 70 0.04

0 80 0.07

0 90 0.09

0 100 0.10

0 110 0.06

0 120 0.03

0 130 0.02

0 140 0.01

0 150 0.01

1 70 0.01

1 80 0.02

1 90 0.04

1 100 0.08

1 110 0.11

1 120 0.11

1 130 0.09

1 140 0.05

1 150 0.03

1 160 0.01

If we are interested only in X, then we can sum over all relevant values of Y and get the marginal probability of X. For example,

P(X = 1) = P(X = 1,Y = 70) + · · · + P(X = 1,Y = 160) = 0.01 + 0.02 + · · · + 0.03 + 0.01 = 0.55.

In general, the marginal probability of X is

P(X = x_k) = ∑_{j=1}^{n} P(X = x_k, Y = y_j).

In similar manner, we can calculate the probability of X = 0 which would be 0.45. Thus the

marginal distribution of X is

X P(X)

0 0.45

1 0.55


Observe that in this example, the marginal distribution of X shows the distribution of men and

women in that profession (45% men and 55% women), whereas the marginal distribution of Y

would show the distribution of income for both men and women, i.e., profession as a whole.

Sometimes we may be interested to know the probability of Y = 110 when we already know

that X = 1. Thus we want to know the conditional probability of Y = 110, given that X = 1.

P(Y = 110 | X = 1) = P(Y = 110, X = 1)/P(X = 1) = 0.11/0.55 = 0.20.

In general,

P(Y = y_j | X = x_k) = P(Y = y_j, X = x_k)/P(X = x_k).

We have computed the conditional distributions of Y | X = 0 and Y | X = 1 below.

Y     P(Y | X = 0)        Y     P(Y | X = 1)
60    0.044               70    0.018
70    0.089               80    0.036
80    0.156               90    0.073
90    0.200               100   0.145
100   0.222               110   0.200
110   0.133               120   0.200
120   0.067               130   0.164
130   0.044               140   0.091
140   0.022               150   0.055
150   0.022               160   0.018

A conditional distribution has a mean, variance and other moments. The mean is

E(Y | X = x_k) = ∑_{j=1}^{n} y_j P(y_j | X = x_k).

Variance and other higher moments of the conditional distribution can be computed similarly.

For example,

E(Y | X = 0) = ∑_{j=1}^{n} y_j P(y_j | X = 0)
= 60 × 0.044 + 70 × 0.089 + 80 × 0.156 + 90 × 0.200 + 100 × 0.222
+ 110 × 0.133 + 120 × 0.067 + 130 × 0.044 + 140 × 0.022 + 150 × 0.022
≈ 96.4.

Similarly, we can compute the conditional mean for X = 1 to be

E(Y | X = 1) ≈ 116.4.

By the law of iterated expectations, the unconditional mean is

E(Y) = E_X E(Y | X) = ∑_{j=1}^{n} E(Y | X = x_j) P(X = x_j).

In our example,

E(Y) = E(Y | X = 0) P(X = 0) + E(Y | X = 1) P(X = 1) ≈ 96.4 × 0.45 + 116.4 × 0.55 = 107.4.

It is easy to infer that if E(Y | X = x_j) = 0 for all values of x_j, i.e., the conditional expectation of Y equals zero, then the unconditional expectation E(Y) = E_X E(Y | X) = 0. However, the reverse is not true: E(Y) = 0 does not imply that E(Y | X = x_j) = 0 for all values of x_j.
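The marginals, the conditional means and the law of iterated expectations can all be recomputed from the joint table above (a Python sketch; the helper names are introduced here):

```python
joint = {  # the joint distribution P(X, Y) from the table
    (0, 60): 0.02, (0, 70): 0.04, (0, 80): 0.07, (0, 90): 0.09, (0, 100): 0.10,
    (0, 110): 0.06, (0, 120): 0.03, (0, 130): 0.02, (0, 140): 0.01, (0, 150): 0.01,
    (1, 70): 0.01, (1, 80): 0.02, (1, 90): 0.04, (1, 100): 0.08, (1, 110): 0.11,
    (1, 120): 0.11, (1, 130): 0.09, (1, 140): 0.05, (1, 150): 0.03, (1, 160): 0.01,
}

def marginal_x(x):
    return sum(p for (xv, _), p in joint.items() if xv == x)

def cond_mean(x):
    return sum(y * p for (xv, y), p in joint.items() if xv == x) / marginal_x(x)

px0, px1 = marginal_x(0), marginal_x(1)        # 0.45 and 0.55
ey = cond_mean(0) * px0 + cond_mean(1) * px1   # law of iterated expectations
print(round(px0, 2), round(px1, 2))            # 0.45 0.55
print(round(cond_mean(0), 1), round(cond_mean(1), 1), round(ey, 1))   # 96.4 116.4 107.4
```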

Many variables we come across in economics are continuous in nature as against discrete. In

assigning probabilities to continuous variables, we face the problem that no matter how small the interval of values of the continuous variable, there are infinitely many points in it. If we assigned a positive probability to each point, the sum of such probabilities would diverge, which violates the axiom of probability theory that the sum of probabilities should add up to one.

This problem is circumvented by assigning probabilities to the segments of the interval within

which the random variable is defined.

For example, we may consider probabilities such as P(X ≤ 5) or P(−4 < X ≤ 2).

Example 23.9. A simple example of a continuous random variable is the uniform distribution.

Variable X can take any value between a and b and the probability of X falling within the segment

[a, c] is proportional to the length of the interval compared to the interval [a, b].

P(a < X ≤ c) = (c − a)/(b − a).
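A Monte Carlo check of the uniform probability formula (a sketch; the values of a, b and c are arbitrary):

```python
import random

# For X uniform on [a, b], P(a < X <= c) should be (c - a)/(b - a).
random.seed(0)
a, b, c = 2.0, 10.0, 5.0
n = 100_000
hits = sum(1 for _ in range(n) if a < random.uniform(a, b) <= c)
print(round(hits / n, 3), (c - a) / (b - a))   # the estimate is close to 0.375
```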

23.4. Continuous Random Variables

The cumulative distribution function is defined as

F(x) = P(X ≤ x)

and has to conform to the following conditions:

(a) F(x_1) ≤ F(x_2) if x_1 < x_2;

(b) F(−∞) = lim_{x→−∞} F(x) = 0, and F(∞) = lim_{x→∞} F(x) = 1.

These conditions are the counterpart of the discrete case and entail that probability is always non-negative and the total probability adds up to one.

Now we define the probability model for continuous random variables. Consider the extended

real line R = R ∪ {−∞, ∞} which shall play the same role for the continuous variables as Ω plays

for the discrete variables, (the set of all possible outcomes). Consider the half closed intervals on

R,

(a, b] = {x ∈ R : a < x ≤ b}

and form finite sums of such intervals provided the intervals are disjoint:

A = ∑_{j=1}^{n} (a_j, b_j], n < ∞.

A set consisting of all such sums plus the empty set ∅ is an algebra, but it is not a σ-algebra. The smallest σ-algebra that contains this set is called the Borel σ-algebra and is denoted by B(R). Finally we

define the probability measure as

F(x) = P((−∞, x]).

The triple (R, B(R), P) is our probability model for continuous random variables.

Chapter 24

Solution to PS 1

(1) (a) Claim (a) is proved by comparing columns 5 and 8 of the following truth table.

A B A ∧ B A ∨ B ∼ (A ∧ B) ∼ A ∼ B ∼ A∨ ∼ B ∼ (A ∨ B) ∼ A∧ ∼ B

1 2 3 4 5 6 7 8 9 10

T T T T F F F F F F

T F F T T F T T F F

F T F T T T F T F F

F F F F T T T T T T

(b) Claim (b) is proved by comparing columns 9 and 10.

(c) ∼ (A ⇒ B) ⇔ A ∧ ∼ B

A B A ⇒ B ∼ (A ⇒ B) ∼ B A∧ ∼ B

1 2 3 4 5 6

T T T F F F

T F F T T T

F T T F F F

F F T F T F


(d) Claim (d) is proved by comparing columns 7 and 8 of the following truth table.

A B C A ⇒ C B ⇒ C A ∨ B (A ⇒ C) ∧ (B ⇒ C) (A ∨ B) ⇒ C

1 2 3 4 5 6 7 8

T T T T T T T T

T T F F F T F F

T F T T T T T T

T F F F T T F F

F T T T T T T T

F T F T F T F F

F F T T T F T T

F F F T T F T T
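All of the equivalences in (a)–(d) can be verified by brute force over truth assignments (a Python sketch, writing A ⇒ B as (not A) or B):

```python
from itertools import product

for A, B in product([True, False], repeat=2):
    assert (not (A and B)) == ((not A) or (not B))   # (a) De Morgan
    assert (not (A or B)) == ((not A) and (not B))   # (b) De Morgan
    assert (not ((not A) or B)) == (A and (not B))   # (c) negation of A => B

for A, B, C in product([True, False], repeat=3):
    assert (((not A) or C) and ((not B) or C)) == ((not (A or B)) or C)   # (d)

print("all four equivalences hold")
```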

(e) This claim is true.

If n is even, then n + 1 is odd. If n is odd, then n + 1 is even. Hence both cannot be even.

(f) Let n = 1, then n2 = 1 = n.

(g) Let x > 1. Then

x ∈ N_O ⇔ ∃ n ∈ N such that x = 2n + 1,

and

x² = (2n + 1)² = 4n² + 4n + 1 = 2(2n² + 2n) + 1

(24.1) ⇒ x² ∈ N_O.

For x = 1, x² = 1 which is odd.

(a) Set S is closed and bounded, and S is not compact.

(b) Set S is compact, and S is either not closed or unbounded.

(c) Function f is continuous and not differentiable.

(b) If there does not exist a y such that xy = 1, then x = 0.

(4) (a) The mistake is in assuming the same value of k for m and n. The correct proof should be

Proof. Since m is an even integer and n is an odd integer, m = 2k and n = 2p + 1 for some

integers k and p. Therefore, 2m + 3n = 2(2k) + 3(2p + 1) = 4k + 6p + 3 = 2(2k + 3p +

1) + 1 = 2l + 1; where l = 2k + 3p + 1. Since k, p ∈ Z, l ∈ Z. Hence, 2m + 3n = 2l + 1 for

some integer l, whence 2m + 3n is an odd integer.

(b) The mistake is in showing the claim for one particular value of n. The claim holds for all

positive integers. The correct proof should be


all positive integers.

(i) Direct proof: Since y = 4m where m ∈ N, then y = 2(2m). Hence y is divisible by 2.

(ii) Contradiction: Assume there exists a number y which is not divisible by 2 but is divisible by 4. Since y = 4m where m ∈ N, we know that y = 2(2m) and so y is divisible by 2. This contradicts our initial assumption.

(b) There is no greatest negative real number.

Proof. Assume, to the contrary, that there is a greatest negative real number x. Then, x ≥ y for every negative real number y. Consider the number x/2. Since x is a negative real number, so too is x/2. Multiplying both sides of the inequality 1/2 < 1 by x, which is negative, gives x/2 > x. Hence, x/2 is a negative real number that is greater than x, which is a contradiction. Hence our assumption that there is a greatest negative real number is false. Thus there is no greatest negative real number.

(c) The product of an irrational number and a nonzero rational number is irrational.

Proof. Assume, to the contrary, that there exists a non-zero rational number p and an irrational number q whose product is a rational number. Thus, by the definition of rational numbers, p = a/b and p · q = r = c/d for some integers a, b, c and d with a ≠ 0, b ≠ 0 and d ≠ 0. Hence,

q = r/p = (c/d)/(a/b) = bc/ad,

so q ∈ Q, which is a contradiction. Hence our assumption that there exists a non-zero rational number and an irrational number whose product is a rational number is false. Thus, the product of a non-zero rational number and an irrational number is irrational.

(6) (a) (i) Base of induction: When n = 1, the statement P(1) : 1 = 1² holds trivially.

(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1 + 3 + · · · + (2n − 1) = n². For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1 + 3 + · · · + (2k − 1) = k². For the inductive step, we need to show that P(k + 1) is true, i.e., that 1 + 3 + · · · + (2k + 1) = (k + 1)². Evaluating the left-hand side,

1 + 3 + · · · + (2k − 1) + (2k + 1) = (1 + 3 + · · · + (2k − 1)) + (2k + 1)
= k² + (2k + 1) (by the inductive hypothesis)
= (k + 1)²;

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;

that is,

1 + 3 + · · · + (2n − 1) = n2

is true for every positive integer n.

(b) (i) Base of induction: When n = 1, the statement P(1) : 1 = 1(1 + 1)/2 is certainly true since 1(1 + 1)/2 = 2/2 = 1. This establishes the base case when n = 1.

(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1 + 2 + · · · + n = n(n + 1)/2. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1 + · · · + k = k(k + 1)/2. For the inductive step, we need to show that P(k + 1) is true. That is, we show that

1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2)/2.

Evaluating the left-hand side of this equation, we have

1 + 2 + · · · + k + (k + 1) = (1 + 2 + · · · + k) + (k + 1)
= k(k + 1)/2 + (k + 1) (by the inductive hypothesis)
= k(k + 1)/2 + 2(k + 1)/2
= (k + 1)(k + 2)/2;

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;

that is,

1 + 2 + · · · + n = n(n + 1)/2

is true for every positive integer n.

(c) (i) Base of induction: When n = 1, the statement P(1) : 1³ = [1(1 + 1)/2]² = 1 is certainly true since [1(1 + 1)/2]² = 1² = 1. This establishes the base case when n = 1.

(ii) For every integer n ≥ 1, let P(n) be the statement P(n) : 1³ + 2³ + · · · + n³ = [n(n + 1)/2]². For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 1 and assume that P(k) is true; that is, assume that 1³ + · · · + k³ = [k(k + 1)/2]². For the

inductive step, we need to show that P(k + 1) is true. That is, we show that

1³ + 2³ + · · · + k³ + (k + 1)³ = [(k + 1)(k + 2)/2]².

Evaluating the left-hand side of this equation, we have

1³ + · · · + k³ + (k + 1)³ = (1³ + · · · + k³) + (k + 1)³
= [k(k + 1)/2]² + (k + 1)³ (by the inductive hypothesis)
= (k + 1)² [k²/4 + 4(k + 1)/4]
= [(k + 1)/2]² [k² + 4k + 4] = [(k + 1)/2]² (k + 2)²
= [(k + 1)(k + 2)/2]²;

thus verifying that P(k + 1) is true.

(iii) Hence, by the principle of mathematical induction, P(n) is true for all integers n ≥ 1;

that is,

1³ + · · · + n³ = [n(n + 1)/2]²

is true for every positive integer n.
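The three identities proved by induction above are easy to spot-check numerically (a Python sketch):

```python
# Check (a) sum of odd numbers, (b) triangular numbers, (c) sum of cubes.
for n in range(1, 51):
    assert sum(2 * k - 1 for k in range(1, n + 1)) == n**2                  # (a)
    assert sum(range(1, n + 1)) == n * (n + 1) // 2                         # (b)
    assert sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2    # (c)
print("identities hold for n = 1..50")
```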

(d) It is an example of an arithmetic-geometric series. Let us denote the sum by S, i.e.,

S = a + (a + r)q + (a + 2r)q² + · · · + (a + (n − 1)r)q^{n−1}.

Multiplying both sides by q, we get

qS = aq + (a + r)q² + (a + 2r)q³ + · · · + (a + (n − 1)r)qⁿ.

Subtracting it from S, we get

(1 − q)S = a + rq + rq² + · · · + rq^{n−1} − (a + (n − 1)r)qⁿ.

All terms except the first and the last term on the right hand side constitute a geometric series with first term rq, common ratio q and number of terms n − 1. Hence

S = [a − (a + (n − 1)r)qⁿ]/(1 − q) + [rq + rq² + · · · + rq^{n−1}]/(1 − q).

The sum of the geometric series is

rq(1 − q^{n−1})/(1 − q).

We substitute this for the sum and get S as

S = [a − (a + (n − 1)r)qⁿ]/(1 − q) + rq(1 − q^{n−1})/(1 − q)².

(7) To show that the formula holds for n = 0, we must show that

∑_{i=0}^{0} r^i = (r^{0+1} − 1)/(r − 1).

The left-hand side of this equation is ∑_{i=0}^{0} r^i = r⁰ = 1, while the right-hand side is (r^{0+1} − 1)/(r − 1) = 1, since r ≠ 1. Hence the formula holds for n = 0. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 0 and assume that ∑_{i=0}^{k} r^i = (r^{k+1} − 1)/(r − 1). For the inductive step, we need to show that ∑_{i=0}^{k+1} r^i = (r^{k+2} − 1)/(r − 1). Evaluating the left-hand side of this equation, we have

∑_{i=0}^{k+1} r^i = ∑_{i=0}^{k} r^i + r^{k+1} (writing the (k + 1)st term separately)
= (r^{k+1} − 1)/(r − 1) + r^{k+1} (by the inductive hypothesis)
= (r^{k+1} − 1)/(r − 1) + (r − 1)r^{k+1}/(r − 1)
= (r^{k+1} − 1 + r^{k+2} − r^{k+1})/(r − 1)
= (r^{k+2} − 1)/(r − 1);

thus verifying the claim. Hence, by the principle of mathematical induction, the formula is true

for all integers n ≥ 0.

In the limiting case of n → ∞, the sum is well-defined for |r| < 1, and equals 1/(1 − r) in this case. In the case of |r| ≥ 1 it is not well defined as n → ∞, though it is defined for all n ∈ N.
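A quick exact check of the geometric sum formula and its limit (a Python sketch):

```python
from fractions import Fraction

r = Fraction(3, 4)   # any rational r with r != 1 works for the identity
for n in range(0, 20):
    assert sum(r**i for i in range(n + 1)) == (r**(n + 1) - 1) / (r - 1)
print(float(1 / (1 - r)))   # 4.0, the limit of the partial sums since |r| < 1
```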

(8) (a) We proceed by mathematical induction. When n = 2, the result is true since in this case

n3 − n = 23 − 2 = 8 − 2 = 6 and 6 is divisible by 6. Hence, the base case when n = 2 is

true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 2


and assume that the property holds for n = k, i.e., suppose that k3 − k is divisible by 6. For

the inductive step, we must show that the property holds for n = k + 1. That is, we must

show that (k + 1)3 − (k + 1) is divisible by 6. Since k3 − k is divisible by 6, there exists,

by definition of divisibility, an integer r such that k3 − k = 6r. Now, by the laws of algebra

and the inductive hypothesis, it follows that

(k + 1)³ − (k + 1) = k³ + 3k² + 3k + 1 − k − 1
= (k³ − k) + 3(k² + k)
= 6r + 3k(k + 1).

Now, k(k + 1) is a product of two consecutive integers, and is therefore even. Hence,

k(k + 1) = 2s for some integer s. Thus, 6r + 3k(k + 1) = 6r + 3(2s) = 6(r + s), and so, by

substitution, (k + 1)3 − (k + 1) = 6(r + s), which is divisible by 6. Therefore, (k + 1)3 −

(k + 1) is divisible by 6, as desired. Hence, by the principle of mathematical induction, the

property holds for all integers n ≥ 2.

(b) We proceed, as before, by mathematical induction. When n = 3, the inequality holds since

in this case 2ⁿ = 2³ = 8 and 2n + 1 = 2 · 3 + 1 = 7, and 8 > 7. Hence, the base case when n = 3 is true. For the inductive hypothesis, let k be an arbitrary (but fixed) integer such that k ≥ 3 and assume that the inequality holds for n = k, i.e., suppose that 2^k > 2k + 1. For

show that 2k+1 > 2(k + 1) + 1. Now,

2k+1 = 2 · 2k

> 2 · (2k + 1) (by the inductive hypothesis)

= 2(k + 1) + 2k

> 2(k + 1) + 1 (since k ≥ 3),

as desired. Hence, by the principle of mathematical induction, the inequality holds for all

integers n ≥ 3.
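Both induction results in (8) are easy to spot-check over a range of integers (a Python sketch):

```python
# (a) n^3 - n is divisible by 6 for n >= 2; (b) 2^n > 2n + 1 for n >= 3.
assert all((n**3 - n) % 6 == 0 for n in range(2, 200))
assert all(2**n > 2 * n + 1 for n in range(3, 200))
print("both properties hold for the tested ranges")
```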

(9) By the Quotient Remainder theorem, with d = 6, every natural number m can be written as m = 6n + r where n is an integer and r ∈ {0, 1, 2, 3, 4, 5}. Since m is prime, it cannot be of the form 6n (divisible

by 6), 6n + 2, or 6n + 4 (divisible by 2) or 6n + 3 (divisible by 3). Thus the only remaining

possibilities are 6n + 1 and 6n + 5.


(10) We solve |9 − 5x| ≤ 11. This is equivalent to the pair of inequalities

9 − 5x ≤ 11 and −(9 − 5x) ≤ 11.

The first gives −5x ≤ 2, i.e., x ≥ −2/5. The second gives −9 + 5x ≤ 11, i.e., 5x ≤ 20 and x ≤ 4. Hence the solution set is −2/5 ≤ x ≤ 4.
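The solution set [−2/5, 4] can be checked on a grid (a Python sketch):

```python
def in_solution(x):
    return abs(9 - 5 * x) <= 11

xs = [i / 100 for i in range(-300, 701)]   # grid on [-3, 7] in steps of 0.01
sol = [x for x in xs if in_solution(x)]
print(min(sol), max(sol))   # -0.4 4.0
```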

Chapter 25

Solution to PS 2

(1) We need to verify that it satisfies three conditions of the distance function.

(a) (i) Non-negativity is obvious as the absolute value is non-negative. If x = y, then d(x, y) =

0. Also if

d(x, y) = ∑_{i=1}^{n} |x_i − y_i| = 0

then xi − yi = 0 for all i = 1, · · · , n. This implies that x = y.

(ii) Symmetry is obvious too since absolute value function is symmetric,

| a − b |=| b − a | .

(iii) Triangle Inequality: By the triangle inequality for the absolute value,

|x_i − z_i| ≤ |x_i − y_i| + |y_i − z_i|

holds for all i = 1, 2, · · · , n. Hence

∑_{i=1}^{n} |x_i − z_i| ≤ ∑_{i=1}^{n} |x_i − y_i| + ∑_{i=1}^{n} |y_i − z_i|

or

d(x, z) ≤ d(x, y) + d(y, z).

Hence it is a distance function.

(b) (i) Non-negativity is obvious as the maximum of two absolute values is non-negative. If

x = y, d(x, y) = 0. Also

d (x, y) = max{|x1 − y1 | , |x2 − y2 |} = 0

⇒ |x1 − y1 | = 0 = |x2 − y2 | ⇒ x = y.



(ii) Symmetry is obvious too since the absolute value function is symmetric: |a − b| = |b − a|.

(iii) Triangle Inequality I: Note that max{a, b} ≥ a and max{a, b} ≥ b. Using this we have

d(x, y) ≥ |x_1 − y_1| and d(x, y) ≥ |x_2 − y_2|,
d(y, z) ≥ |y_1 − z_1| and d(y, z) ≥ |y_2 − z_2|,

so that

d(x, y) + d(y, z) ≥ |x_1 − y_1| + |y_1 − z_1| ≥ |x_1 − z_1|,
d(x, y) + d(y, z) ≥ |x_2 − y_2| + |y_2 − z_2| ≥ |x_2 − z_2|.

It follows that

d(x, y) + d(y, z) ≥ max{|x_1 − z_1|, |x_2 − z_2|} = d(x, z).

Hence it is a distance function.

(iv) Triangle Inequality II: Consider the case when d(x, z) = |x_1 − z_1|, i.e., |x_1 − z_1| ≥ |x_2 − z_2|. Then using the triangle inequality for the absolute value function,

d(x, z) = |x_1 − z_1| ≤ |x_1 − y_1| + |y_1 − z_1| ≤ d(x, y) + d(y, z).

The inequality in the second step follows from the fact that d(x, y) ≥ |x_1 − y_1| and d(y, z) ≥ |y_1 − z_1|. The second case, d(x, z) = |x_2 − z_2|, is similar. Hence it is a distance function.

(c) (i) Non-negativity: d(x, y) ≥ 0 for all x, y in Rn , and thus 1 + d(x, y) ≥ 1 for all x, y in

Rn . As a result, d1 (x, y) ≥ 0 for all x, y in Rn .

By the definition of d1 (x, y), d1 (x, y) = 0 if and only if d(x, y) = 0. But d(x, y) = 0 if

and only if x = y.

(ii) Since d(x, y) = d(y, x), it is straightforward to see that d1 (x, y) = d1 (y, x).

(iii) Triangle Inequality I:

d_1(x, z) ≤ d_1(x, y) + d_1(y, z)
⇔ d(x, z)/(1 + d(x, z)) ≤ d(x, y)/(1 + d(x, y)) + d(y, z)/(1 + d(y, z))
⇔ d(x, z)[1 + d(x, y)][1 + d(y, z)] ≤ d(x, y)[1 + d(x, z)][1 + d(y, z)] + d(y, z)[1 + d(x, y)][1 + d(x, z)]
⇔ d(x, z) ≤ d(x, y) + d(y, z) + 2d(x, y)d(y, z) + d(x, y)d(y, z)d(x, z).

Since d(x, y) + d(y, z) ≥ d(x, z) and d(a, b) ≥ 0 for any (a, b) ∈ Rⁿ × Rⁿ, the last inequality is always true. Thus d_1(x, z) ≤ d_1(x, y) + d_1(y, z) for all x, y, z in Rⁿ.


Alternatively, we use the notation a ≡ d(x, z) and b ≡ d(x, y) + d(y, z). Then

a ≤ b ⇒ a + ab ≤ b + ab ⇒ a(1 + b) ≤ b(1 + a) ⇒ a/(1 + a) ≤ b/(1 + b),

so that

d(x, z)/(1 + d(x, z)) ≤ [d(x, y) + d(y, z)]/(1 + d(x, y) + d(y, z))
                      ≤ d(x, y)/(1 + d(x, y)) + d(y, z)/(1 + d(y, z)),

i.e., d1(x, z) ≤ d1(x, y) + d1(y, z).
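The algebra above can be spot-checked numerically. The following sketch (pure Python, using the Euclidean distance on R² as a representative case of d) samples random triples of points and tests the triangle inequality for d1 = d/(1 + d); it is a sanity check, not a proof.

```python
import random

random.seed(0)

def d(p, q):
    # Euclidean distance on R^2
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def d1(p, q):
    # the bounded metric d/(1+d) from part (c)
    return d(p, q) / (1 + d(p, q))

violations = 0
for _ in range(1000):
    x, y, z = [tuple(random.uniform(-5, 5) for _ in range(2)) for _ in range(3)]
    if d1(x, z) > d1(x, y) + d1(y, z) + 1e-12:
        violations += 1
```

No sampled triple should violate the inequality.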

(2) It is bounded: take B = 2; then ∥x∥ ≤ 2 for all x ∈ ∪_{n=1}^∞ [1/n, 2/n]. But it is NOT closed, as

∪_{n=1}^∞ [1/n, 2/n] = (0, 2].

So it is not compact.

(3)

(A ∪ B)^c ⊆ A^c ∪ B^c is TRUE. Let x ∈ (A ∪ B)^c
⇒ x ∉ (A ∪ B)
⇒ x ∉ A ∧ x ∉ B
⇒ x ∈ A^c ∧ x ∈ B^c
⇒ x ∈ A^c ∪ B^c.

(A ∪ B)^c ⊇ A^c ∪ B^c is FALSE. Let x ∈ A^c ∪ B^c with x ∈ A^c ∧ x ∉ B^c
⇒ x ∉ A ∧ x ∈ B
⇒ x ∈ A ∪ B
⇒ x ∉ (A ∪ B)^c.

(4) Suppose, for contradiction, that x ∈ A but x ∉ B. Since A ∪ C = B ∪ C and we know x ∈ A ∪ C, we get x ∈ B ∪ C. This also means that x ∈ C, since we have assumed x ∉ B. Then x ∈ A ∩ C, and since A ∩ C = B ∩ C, x ∈ B ∩ C, which implies x ∈ B, a contradiction. Hence A = B.


(a) Let X ∈ P(A ∩ B) be an arbitrary element. Then X ⊂ A ∩ B, so X ⊂ A and X ∈ P(A). Also X ⊂ B and X ∈ P(B). Hence X ∈ P(A) ∩ P(B). Next let X ∈ P(A) ∩ P(B) be an arbitrary element. Then X ∈ P(A), which implies X ⊂ A, and X ∈ P(B), which implies X ⊂ B. Both taken together imply that X ⊂ A ∩ B and X ∈ P(A ∩ B).

(b) Let X ∈ P (A) be an arbitrary element. Then X ∈ P (A) ∪ P (B). Since X ⊂ A, X ⊂ A ∪ B

which means X ∈ P (A ∪ B). Thus we have P (A) ⊂ P (A ∪ B). The same argument works

to show that P (B) ⊂ P (A ∪ B). Hence P (A) ∪ P (B) ⊂ P (A ∪ B).

(c) Let A = {1} and B = {2}. Then

P(A) = {{1}, ∅} and P(B) = {{2}, ∅},

so

P(A) ∪ P(B) = {{1}, {2}, ∅}.

However A ∪ B = {1, 2}, so

P(A ∪ B) = {{1}, {2}, {1, 2}, ∅}.
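The power-set identities in (a)-(c) can be checked mechanically for the counterexample above. A small Python sketch (frozensets stand in for subsets so they can sit inside a set):

```python
from itertools import chain, combinations

def powerset(s):
    # all subsets of s, each as a frozenset
    s = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

A, B = {1}, {2}
lhs = powerset(A) | powerset(B)      # {∅, {1}, {2}}
rhs = powerset(A | B)                # {∅, {1}, {2}, {1, 2}}
strict = lhs < rhs                   # proper inclusion, as in part (c)
inter_ok = powerset(A & B) == powerset(A) & powerset(B)   # equality from part (a)
```

Here `lhs < rhs` tests proper set inclusion, confirming that {1, 2} is the missing element.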

(6) It is enough to show that one of the properties of the vector space is not satisfied by this space.

Take scalar multiplication by 2. Let (x1, x2) ∈ C and let α = 2 be a scalar. Then (2x1, 2x2) ∈ R² with

(2x1)² + (2x2)² = 4(x1² + x2²) = 4 ≠ 1.

Hence (2x1, 2x2) ∉ C, and so C is not a vector space.

(7) In this case the commutative property of the sum of vectors does not hold. Consider a = (2, 3)

and b = (4, 5).

Then a + b = (2 + 4, 3 − 5) = (6, −2) and b + a = (4 + 2, 5 − 3) = (6, 2). Hence

(2, 3) + (4, 5) ̸= (4, 5) + (2, 3).

So V is not a vector space.

(8) In this case also, the commutative property of the sum of vectors does not hold. Consider as

before, a = (2, 3) and b = (4, 5).

Then a + b = (2 + 2 × 4, 3 + 3 × 5) = (10, 18) and b + a = (4 + 2 × 2, 5 + 3 × 3) = (8, 14).

Hence

(2, 3) + (4, 5) ̸= (4, 5) + (2, 3).

So V is not a vector space.

(9) (a) (J ∩ K)^c = J^c ∪ K^c. We split the proof into two parts.


(i)

(25.1) (J ∩ K)^c ⊆ J^c ∪ K^c.

Let x ∈ (J ∩ K)^c
⇒ x ∉ (J ∩ K)
⇒ x ∉ J ∨ x ∉ K
⇒ x ∈ J^c ∨ x ∈ K^c
⇒ x ∈ J^c ∪ K^c.

(ii) Next,

J^c ∪ K^c ⊆ (J ∩ K)^c.

Let x ∈ J^c ∪ K^c
⇒ x ∈ J^c ∨ x ∈ K^c
⇒ x ∉ J ∨ x ∉ K
⇒ x ∉ J ∩ K
⇒ x ∈ (J ∩ K)^c.

(b) (J ∪ K)^c = J^c ∩ K^c.

(25.2) (J ∪ K)^c ⊆ J^c ∩ K^c.

Let x ∈ (J ∪ K)^c
⇒ x ∉ (J ∪ K)
⇒ x ∉ J ∧ x ∉ K
⇒ x ∈ J^c ∧ x ∈ K^c
⇒ x ∈ J^c ∩ K^c.

Next,

J^c ∩ K^c ⊆ (J ∪ K)^c.

Let x ∈ J^c ∩ K^c
⇒ x ∈ J^c ∧ x ∈ K^c
⇒ x ∉ J ∧ x ∉ K
⇒ x ∉ J ∪ K
⇒ x ∈ (J ∪ K)^c.


(10) We must show: for every ε > 0 there exists N such that for all n > N, |(xn + yn) − (x + y)| < ε. Note

|xn + yn − x − y| = |xn − x + yn − y| ≤ |xn − x| + |yn − y|,

by the triangle inequality. Since

xn → x, there exists N1 such that for all n > N1, |xn − x| < ε/2;
yn → y, there exists N2 such that for all n > N2, |yn − y| < ε/2.

Let N = max{N1, N2}. Hence, for all n > N,

|xn − x| + |yn − y| < ε/2 + ε/2 = ε,

⇒ |xn + yn − x − y| < ε. So {xn + yn} → x + y.

(11) We know that if a sequence is convergent then it is bounded. The contrapositive statement is: "If a sequence is not bounded, then it is not convergent." The sequence xn = n, n ∈ N, is NOT bounded: no matter which B we choose as a bound, there will be a natural number greater than it. We now use the contrapositive to conclude that {xn}_{n=1}^∞ is not convergent.

(12) Since {xn} is a Cauchy sequence, for every ε > 0 there exists N ∈ N such that m, n > N implies |xn − xm| < ε. Choose ε = 1 and m = N; then |xn| ≤ |xn − xN| + |xN| < 1 + |xN| for all n > N. Let

B = max{|x1|, |x2|, ···, |xN|, 1 + |xN|};

then |xn| ≤ B for all n ∈ N.

(13) The sequence converges to 2, being the sum of the constant sequence {xn} = {2, 2, ···} and the sequence {yn} = {−1/n}. We have already seen in the class notes that the second sequence {yn} converges to zero. Hence the sequence, being the sum of two convergent sequences, converges to the sum of the limits, which is 2 + 0 = 2. Since the limit of a convergent sequence is unique, 1 cannot be a limit.

(14) We consider a monotone increasing sequence, xn ≤ xn+1; the proof is analogous in the monotone decreasing case. Let {xn} be a convergent sequence and let lim_{n→∞} xn = x. From the definition of convergence, with ε = 1, we get N ∈ N such that n > N implies |xn − x| < 1. Then

|xn| < 1 + |x| for all n > N.

Let

B = max{|x1|, |x2|, ···, |xN|, 1 + |x|};

then |xn| ≤ B for all n ∈ N. Now let the sequence be bounded, and let x be the least upper bound of its terms. Then xn ≤ x for all n ∈ N. For every ε > 0 there exists an N ∈ N such that x − ε < xN ≤ x; otherwise x − ε would be an upper bound for the sequence. Since xn is increasing, n > N implies

x − ε < xn ≤ x,

which shows that xn converges to x.

(15) (i) S = (0, 1). Open: for any x ∈ (0, 1), the open ball with radius min{x, 1 − x} is contained in S.

(ii) S = [0, 1]. Closed: use the theorem that a set S ⊆ Rⁿ is closed if and only if every convergent sequence of points {xn} ⊆ S has its limit x ∈ S. Let {xn} be a convergent sequence in S with limit x; then for all n, xn ≥ 0 and xn ≤ 1. Since weak inequalities are preserved in the limit, 0 ≤ x ≤ 1. So x ∈ S and S is closed.

(iii) S = [0, 1). Neither open nor closed: it is not closed since the limit of the convergent sequence {1 − 1/n} is not contained in S, and it is not open since x = 0 is contained in S but no open ball around 0 is contained in S.

(iv) S = R. Both open and closed: use the result in the notes that the empty set is both open and closed, and R is the complement of the empty set.

(v) Let An, Bn and Cn be the intervals in R defined by

An = [0, 1/n],  Bn = (0, 1/n],  Cn = (−1/n, n),

where n is a positive integer. Since

∪_{n=1}^N An = [0, 1] and ∩_{n=1}^N An = [0, 1/N] for all N ∈ N,

we get

∪_{n=1}^∞ An = [0, 1] and ∩_{n=1}^∞ An = {0}.

Similarly,

∪_{n=1}^∞ Bn = (0, 1] and ∩_{n=1}^∞ Bn = ∅,

and

∪_{n=1}^∞ Cn = (−1, ∞) and ∩_{n=1}^∞ Cn = [0, 1).

Chapter 26

Solution to PS 3

(1)

AB = [[1, −1, 7], [0, 8, 10]] · [[9, 6, 5, 4], [1, −2, −3, 3], [0, 1, −1, 2]]

   = [[1·9 − 1·1 + 7·0,  1·6 + 1·2 + 7·1,  1·5 + 1·3 − 7·1,  1·4 − 1·3 + 7·2],
      [0·9 + 8·1 + 10·0, 0·6 − 8·2 + 10·1, 0·5 − 8·3 − 10·1, 0·4 + 8·3 + 10·2]]

(26.1) = [[8, 15, 1, 15], [8, −6, −34, 44]].
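The product can be double-checked with NumPy (assumed available here), which applies exactly the row-by-column rule used above:

```python
import numpy as np

A = np.array([[1, -1,  7],
              [0,  8, 10]])
B = np.array([[9,  6,  5, 4],
              [1, -2, -3, 3],
              [0,  1, -1, 2]])
AB = A @ B   # matrix product, shape (2, 4)
```

`AB` should reproduce the matrix in (26.1).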

(2) Consider

λ1 (1, 2)′ + λ2 (1, 3)′ = (0, 0)′

⇔ { λ1 + λ2 = 0;  2λ1 + 3λ2 = 0 }

(26.2) ⇔ { λ1 = −λ2;  λ1 = −(3/2)λ2 }

(26.3) ⇔ λ1 = 0, λ2 = 0.


(3) Recall the property of determinants: if we multiply any column of a matrix by a scalar k, then the determinant of the new matrix is k times the determinant of the original matrix. Since the matrix −2A is obtained by multiplying each of the five columns of A (a matrix with five rows and five columns) by −2, the determinant of −2A is (−2)⁵ times the determinant of A. Thus

det(−2A) = (−2)⁵ det A = (−32)(−1) = 32.
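The scaling rule det(kA) = kⁿ det A can be verified numerically; since the 5 × 5 matrix A of the problem is not reproduced here, the sketch below uses an arbitrary sample matrix of the same size:

```python
import numpy as np

# sample 5x5 matrix (NOT the A of the problem, which is not given here)
rng = np.random.default_rng(0)
M = rng.integers(-3, 4, size=(5, 5)).astype(float)

lhs = np.linalg.det(-2 * M)            # det(-2M)
rhs = (-2) ** 5 * np.linalg.det(M)     # (-2)^5 det(M)
```

The two values agree up to floating-point error, illustrating why det(−2A) = −32 · det A for any 5 × 5 matrix A.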

(4) Recall the rank of a matrix A is the number of linearly independent column vectors of A; it is also equal to the number of linearly independent row vectors of A. Here

A = [[3, 2, 1], [0, 1, 7], [5, 4, −1]].

First take the first two columns:

λ1 (3, 0, 5)′ + λ2 (2, 1, 4)′ = (0, 0, 0)′

⇔ { 3λ1 + 2λ2 = 0;  λ2 = 0;  5λ1 + 4λ2 = 0 }

(26.4) ⇔ λ1 = 0, λ2 = 0

is the only solution. So the first two columns are linearly independent. Now let's take all three columns:

λ1 (3, 0, 5)′ + λ2 (2, 1, 4)′ + λ3 (1, 7, −1)′ = (0, 0, 0)′

⇔ { 3λ1 + 2λ2 + λ3 = 0 (i);  λ2 + 7λ3 = 0 (ii);  5λ1 + 4λ2 − λ3 = 0 (iii) }

(i) − 2(ii):  3λ1 − 13λ3 = 0

(26.5) (iii) − 4(ii):  5λ1 − 29λ3 = 0

So λ1 = 0 and λ3 = 0, and then λ2 = 0,

is the only solution. So all three columns are linearly independent. This implies that the rank
of matrix A is 3.
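The conclusion can be cross-checked with NumPy's rank routine, which performs the same linear-independence test numerically (via singular values):

```python
import numpy as np

A = np.array([[3, 2,  1],
              [0, 1,  7],
              [5, 4, -1]])
rank = np.linalg.matrix_rank(A)   # number of linearly independent columns
```

A full-rank 3 × 3 matrix also has a non-zero determinant, which gives a second quick check.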


(5) The system

(26.6) A · x = b, where A is 3 × 3 and x, b are 3 × 1,

has a solution if and only if

(26.7) rank(A) = rank(Ab),

and the solution, if it exists, is unique if and only if

(26.8) rank(A) = rank(Ab) = 3 (= the number of unknowns).

In this question,

A = [[1, 1, 1], [1, 2, 3], [1, 2, λ]],  Ab = [[1, 1, 1, 6], [1, 2, 3, 10], [1, 2, λ, µ]].

We can verify that the rank of A is at least 2, since the first two rows of A are linearly independent. Similarly, the rank of Ab is at least 2, since its first two rows are also linearly independent.

(a) For no solution to exist, the ranks of A and Ab need to differ, which is possible only if the rank of A is 2 and the rank of Ab is 3 (if the rank of A is 3, then so is the rank of Ab). For the rank of A to be 2, λ = 3; for the rank of Ab to then be 3, µ ≠ 10.

(b) For a unique solution, the ranks of A and Ab must both equal 3. The rank of A is 3 if and only if λ ≠ 3, in which case the rank of Ab is 3 for every value of µ ∈ R. Thus for λ ≠ 3 and any µ ∈ R we get a unique solution.

(c) For infinitely many solutions, the ranks of A and Ab must both equal 2. This is possible if and only if λ = 3 and µ = 10.

You might consider writing down the solutions in the last two cases in terms of the values of λ and µ.

(6)

A11 = 2 > 0, A11 A22 − A12 A21 = 2 · 1 − 1 = 1 > 0: PD

B11 > 0, B22 > 0, B11 B22 − B12 B21 = 2 · 8 − 16 = 0: PSD

C11 < 0, C11C22 −C12C21 = −3 · 5 − 16 < 0 : Indefinite

D11 < 0, D11 D22 − D12 D21 = −3 · (−6) − 16 > 0: ND
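The four classifications can be confirmed from eigenvalues (a symmetric matrix is PD/PSD/ND exactly when its eigenvalues are all positive/non-negative/negative). The matrices below are reconstructed from the leading minors quoted above; the off-diagonal entries (1 for A, and 4 for B, C, D) are an assumption consistent with A12·A21 = 1 and B12·B21 = C12·C21 = D12·D21 = 16, not taken from the problem statement:

```python
import numpy as np

# reconstructed (assumed) symmetric matrices matching the quoted minors
A = np.array([[2., 1.], [1., 1.]])    # leading minors 2 > 0, 1 > 0
B = np.array([[2., 4.], [4., 8.]])    # minors 2 > 0, 8 > 0, det = 0
C = np.array([[-3., 4.], [4., 5.]])   # det = -31 < 0
D = np.array([[-3., 4.], [4., -6.]])  # -3 < 0, det = 2 > 0

def classify(M, tol=1e-9):
    ev = np.linalg.eigvalsh(M)        # eigenvalues of a symmetric matrix
    if (ev > tol).all():
        return "PD"
    if (ev >= -tol).all():
        return "PSD"
    if (ev < -tol).all():
        return "ND"
    if (ev <= tol).all():
        return "NSD"
    return "indefinite"

labels = [classify(M) for M in (A, B, C, D)]
```

The labels should come out as PD, PSD, indefinite, ND, matching the minor tests above.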

(7) Let At and Bt denote the number of employees at locations A and B in some period t. The transition probabilities are defined as follows:

pAA ≡ probability that a current A remains an A,
pAB ≡ probability that a current A moves to B,
pBA ≡ probability that a current B moves to A,
pBB ≡ probability that a current B remains a B.


The distribution of employees at time t is denoted by the vector xt′ = [At Bt], and the transition probabilities in matrix form are

(26.9) M = [[pAA, pAB], [pBA, pBB]] = [[0.9, 0.1], [0.7, 0.3]].

Then the distribution of employees across the two locations next period (t + 1) is xt′ · M = x′_{t+1}, that is,

[At Bt] · [[0.9, 0.1], [0.7, 0.3]] = [(0.9At + 0.7Bt)  (0.1At + 0.3Bt)] = [At+1  Bt+1].

In a similar manner we can determine the distribution of employees after two periods:

x′_{t+1} · M = x′_{t+2}
[At+1  Bt+1] · M = [At+2  Bt+2]
[At  Bt] · M² = [At+2  Bt+2].

In general, for n periods,

(26.10) [At  Bt] · Mⁿ = [At+n  Bt+n].

The initial distribution of employees across the two locations at time t = 0 is

x0′ = [A0  B0] = [0  2000].

Then the distribution of employees in the next period, t = 1, is

[0  2000] · M = [1400  600] = [A1  B1].

The distribution after two periods is

[0  2000] · M² = [0  2000] · [[0.88, 0.12], [0.84, 0.16]] = [1680  320] = [A2  B2];

after four periods,

[0  2000] · M⁴ = [0  2000] · [[0.8752, 0.1248], [0.8736, 0.1264]] = [1747  253] = [A4  B4];

after six periods,

[0  2000] · M⁶ = [0  2000] · [[0.875005, 0.124992], [0.874944, 0.125056]] = [1749  251] = [A6  B6];

after eight periods,

[0  2000] · M⁸ = [0  2000] · [[0.8750, 0.1250], [0.8750, 0.1250]] = [1750  250] = [A8  B8];

and after ten periods,

[0  2000] · M¹⁰ = [0  2000] · [[0.8750, 0.1250], [0.8750, 0.1250]] = [1750  250] = [A10  B10].

Observe that when the transition matrix is raised to higher powers, the powers converge to a matrix whose rows are identical. This is referred to as the steady state. In this example, the steady-state matrix is

M̄ = [[7/8, 1/8], [7/8, 1/8]].

The steady-state distribution solves

[A  B] · [[0.9, 0.1], [0.7, 0.3]] = [A  B],

which gives

0.9A + 0.7B = A,

together with

A + B = 2000.

Then we get

A = 7B, so A = (7/8) · 2000 = 1750 and B = (1/8) · 2000 = 250.

(8) (a) Recall that det AB = det A × det B. Since the matrix A is nilpotent (Aᵏ = O for some k), we know

det(Aᵏ) = [det A]ᵏ = det O = 0.

Hence det A = 0, and A is not invertible.


(b) Note

det A′ = det A.

Also, since A is antisymmetric, A′ = −A, and the matrix −A is obtained by multiplying each row (or each column) of A by −1. Hence,

det(−A) = (−1)ⁿ det A = −det A,

since n is an odd number. Thus

det A′ = det(−A) gives det A = −det A.

This leads to

det A = 0,

and therefore A is not invertible.

(c) Note

det A′ = det A.

and

det[AA′ ] = det A × det A′ = det A × det A = [det A]2 = det I = 1,

we get

det A = ±1.

(d) As we have seen in part (b), for n an odd integer,

det AB = det A × det B = (−1)n det BA = (−1)n det B det A = − det A × det B,

implies

det A × det B = 0.

This means either det A = 0 (i. e., A is not invertible) or det B = 0 (i.e., B is not invertible).

(e) Since

det AB = det A × det B = det I = 1,

det A ̸= 0 and therefore A is invertible. Pre-multiplying both sides by A−1 , we get

A−1 AB = IB = B = A−1 I = A−1 ,

showing that A−1 = B.

(9) (a) The characteristic polynomial is obtained by taking the determinant of the matrix

A − λI = [[4 − λ, 4, 4], [−2, −3 − λ, −6], [1, 3, 6 − λ]].

This is equal to

(4 − λ)(−18 − 3λ + λ² + 18) − 4(−12 + 2λ + 6) + 4(−6 + 3 + λ) = 0.

On simplification we get

(4 − λ)(−3λ + λ²) − 4(−6 + 2λ) + 4(−3 + λ) = 0,

or

−12λ + 4λ² + 3λ² − λ³ + 24 − 8λ − 12 + 4λ = 0,
12 − 16λ + 7λ² − λ³ = 0.

(b) The characteristic polynomial is of degree three and hence has three roots (possibly repeated). The roots are λ1 = 3, λ2 = 2 and λ3 = 2.

(c) Eigenvector for λ = 3:

[A − λI]x = [[1, 4, 4], [−2, −6, −6], [1, 3, 3]] · (x1, x2, x3)′ = (0, 0, 0)′.

It is easy to check that x1 = 0, x2 = −1, x3 = 1 is a solution. Hence the eigenvector family is given by

(x1, x2, x3)′ = t (0, −1, 1)′,  t ≠ 0.

Eigenvector for λ = 2:

[A − λI]x = [[2, 4, 4], [−2, −5, −6], [1, 3, 4]] · (x1, x2, x3)′ = (0, 0, 0)′.

It is easy to check that x1 = 2, x2 = −2, x3 = 1 is a solution. Hence the eigenvector family is given by

(x1, x2, x3)′ = t (2, −2, 1)′,  t ≠ 0.
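Both the eigenvalues and the hand-computed eigenvectors can be verified numerically (NumPy assumed available):

```python
import numpy as np

A = np.array([[ 4,  4,  4],
              [-2, -3, -6],
              [ 1,  3,  6]], dtype=float)

# roots of the characteristic polynomial 12 - 16λ + 7λ² - λ³
eigenvalues = np.sort(np.linalg.eigvals(A).real)

# eigenvectors found above must satisfy A v = λ v
v3 = np.array([0., -1., 1.])   # λ = 3
v2 = np.array([2., -2., 1.])   # λ = 2
```

The sorted eigenvalues should be (approximately) 2, 2, 3, and the two residuals A v − λ v should vanish.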

(10) (a) We use the Result 7.1 to prove this. The determinant of the upper triangular matrix is equal

to the product of all the diagonal terms. By definition of eigenvalue, it is clear that if we

take λi = aii , then the determinant of the matrix [A − λi I] is zero since the diagonal entry

in row i or column i is zero.

Similar arguments can be used to prove the result for the lower triangular matrix.

(b) Since A is an invertible matrix, A⁻¹ exists (and λ ≠ 0), and we can pre-multiply the equation (A − λI)x = 0 by A⁻¹. This yields (I − λA⁻¹)x = 0, or ((1/λ)I − A⁻¹)x = 0, or (A⁻¹ − (1/λ)I)x = 0, as desired. Thus, for an invertible matrix A, λ is an eigenvalue of A if and only if 1/λ is an eigenvalue of A⁻¹.

(c) Assume λ is the eigenvalue for the eigenvector x, so that Ax = λx. Pre-multiplying both sides by A, we get

A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ²x,

i.e.,

A²x = λ²x.

Hence x is an eigenvector of A² and the corresponding eigenvalue is λ². Using the result in part (b) and a similar argument, we can show that x is an eigenvector of A⁻² and the corresponding eigenvalue is λ⁻².

Chapter 27

Solution to PS 4

(1) (a)

f(x) = [(2x + 1)/(x − 1)]^{1/2}

f′(x) = (1/2) [(2x + 1)/(x − 1)]^{−1/2} · [(x − 1)·2 − (2x + 1)·1]/(x − 1)²

      = −(3/2) [(x − 1)/(2x + 1)]^{1/2} · 1/(x − 1)²

(27.1) = −3 / [2(2x + 1)^{1/2}(x − 1)^{3/2}].

(b)

f′(x) = [1/(3x² − 5x)] · (6x − 5)

(27.2) = (6x − 5)/(3x² − 5x).

(2) The tangent line at x0 is

y = f(x0) + f′(x0)(x − x0).

Here

y = 24 + 23(x − 2), i.e., y = −22 + 23x.

(3) (a)

lim_{x→x0−} f(x) = lim_{x→x0+} f(x) ≠ f(x0);

lim_{x→x0−} f(x) ≠ lim_{x→x0+} f(x).

(b)

lim g (x) = 3 · 2 − 2 = 4; lim+ g (x) = −2 + 6 = 4 = g (x0 ) .

x→2− x→2

Hence g (x) is continuos at x = 2.

(4) Since

f(0) = g(0) = 0,

we can use L'Hôpital's rule to find the limit. With f(x) = exp(x²) + exp(−x) − 2 and g(x) = 2x,

(27.3) f′(x) = 2x · exp(x²) − exp(−x) ⇒ f′(0) = −1, and g′(x) = 2 ⇒ g′(0) = 2.

Hence

(27.4) lim_{x→0} [exp(x²) + exp(−x) − 2]/(2x) = −1/2.
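The limit can be sanity-checked numerically by evaluating the quotient at points shrinking toward 0 (pure Python, using the standard `math` module):

```python
import math

def q(x):
    # the L'Hôpital quotient (e^{x²} + e^{−x} − 2) / (2x)
    return (math.exp(x * x) + math.exp(-x) - 2.0) / (2.0 * x)

vals = [q(10.0 ** -k) for k in range(3, 7)]   # x = 1e-3, ..., 1e-6
```

The values approach −0.5, in agreement with (27.4).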

(5)

(5)

∇f(x, y) = [2xy + y² − 2y + 3,  x² + 2xy − 2x]

H f(x, y) = [[2y, 2x + 2y − 2], [2x + 2y − 2, 2x]]

(27.5) H f(1, 2) = [[4, 4], [4, 2]].


(6) Let

f(x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f(x, y) = 0 otherwise.

Show that the partial derivatives D1 f(x, y) and D2 f(x, y) exist at every point in R², although f is not continuous at (0, 0).

(a) Observe that for all (x, y) ≠ (0, 0), we get

D1 f(x, y) = [(x² + y²)y − xy(2x)]/(x² + y²)² = y(y² − x²)/(x² + y²)²

and

D2 f(x, y) = [(x² + y²)x − xy(2y)]/(x² + y²)² = x(x² − y²)/(x² + y²)².

Further,

D1 f(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = lim_{h→0} 0/h = 0

and

D2 f(0, 0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = lim_{h→0} 0/h = 0.

Thus the partial derivatives D1 f(x, y) and D2 f(x, y) exist at every point (x, y) ∈ R².

(b) Consider the path y = x. The function satisfies f(x, x) = 1/2 at all points x ≠ 0, and therefore f(0, 0) = 0 ≠ lim_{h→0} f(h, h). Hence it is not continuous at (0, 0).
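The two paths used in the argument can be traced numerically (pure Python):

```python
def f(x, y):
    # xy/(x² + y²) away from the origin, 0 at the origin
    return x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

diag = [f(h, h) for h in (0.1, 0.01, 0.001)]   # along y = x: constantly 1/2
axis = [f(h, 0.0) for h in (0.1, 0.01, 0.001)] # along y = 0: constantly 0
```

Since the limits along the two paths disagree (1/2 versus 0), no single limit exists at (0, 0).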

(7) This exercise gives an example of a function with D12 f(x, y) ≠ D21 f(x, y). Let f(x, y) be defined as

f(x, y) = xy(x² − y²)/(x² + y²) if (x, y) ≠ (0, 0), and f(x, y) = 0 otherwise.


(a) Observe that for all (x, y) ≠ (0, 0),

D1 f(x, y) = [(x² + y²)(3x²y − y³) − (x³y − xy³)2x]/(x² + y²)² = y(x⁴ + 4x²y² − y⁴)/(x² + y²)²

and

D2 f(x, y) = [(x² + y²)(x³ − 3xy²) − (x³y − xy³)2y]/(x² + y²)² = x(x⁴ − 4x²y² − y⁴)/(x² + y²)².

Further,

D1 f(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = 0

and

D2 f(0, 0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = 0.

Further,

|D1 f(x, y)| = |y| · |x⁴ + 4x²y² − y⁴|/(x² + y²)² ≤ |y| (1 + 2x²y²/(x⁴ + 2x²y² + y⁴))
             = |y| (1 + 2x²y²/(2x²y² + (x⁴ + y⁴))) ≤ |y| (1 + 1).

So |D1 f(x, y)| ≤ 2|y|, and it is easy to verify that |D2 f(x, y)| ≤ 2|x| along similar lines. This shows that D1 f(x, y) → 0 = D1 f(0, 0) as (x, y) → (0, 0), since lim_{(x,y)→(0,0)} 2|y| = 0; similarly, D2 f(x, y) → 0 = D2 f(0, 0), since lim_{(x,y)→(0,0)} 2|x| = 0. For all (x, y) ∈ R² \ {(0, 0)} the partial derivatives D1 f(x, y) and D2 f(x, y) are continuous functions, being ratios of two polynomials with non-vanishing denominator.

Thus the partial derivatives D1 f(x, y) and D2 f(x, y) exist at every point (x, y) ∈ R².

(b) Observe that

D1 f(0, y) = lim_{h→0} [f(h, y) − f(0, y)]/h = lim_{h→0} y(h² − y²)/(h² + y²) = −y

and

D2 f(x, 0) = lim_{h→0} [f(x, h) − f(x, 0)]/h = lim_{h→0} x(x² − h²)/(x² + h²) = x.

Therefore, together with the bounds in part (a), the partial derivatives D1 f(x, y) and D2 f(x, y) are continuous at every point in R². Since the real-valued function f has continuous partial derivatives at every point (x, y) ∈ R², it is continuous at every point (x, y) ∈ R².

(c) Since f (x, y) is a rational function with non-zero denominator for (x, y) ̸= (0, 0), the second

order cross partial derivatives D12 f (x, y) and D21 f (x, y) exist at every point in R2 and are

continuous everywhere in R2 except at (0, 0).

(d) Given D2 f (x, 0) = x we get D21 f (0, 0) = +1 and from D1 f (0, y) = −y we get D12 f (0, 0) =

−1.
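The asymmetry of the cross partials at the origin can be reproduced with nested finite differences (pure Python; the inner step d must be much smaller than the outer step h so that, e.g., D1 f(0, y) ≈ −y is captured correctly):

```python
def f(x, y):
    # xy(x² − y²)/(x² + y²) away from the origin, 0 at the origin
    return x * y * (x * x - y * y) / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

d, h = 1e-8, 1e-4    # inner step d << outer step h

def D1(x, y):        # ∂f/∂x by central difference
    return (f(x + d, y) - f(x - d, y)) / (2 * d)

def D2(x, y):        # ∂f/∂y by central difference
    return (f(x, y + d) - f(x, y - d)) / (2 * d)

D12_at_0 = (D1(0.0, h) - D1(0.0, -h)) / (2 * h)   # ∂/∂y of D1 at (0,0) → −1
D21_at_0 = (D2(h, 0.0) - D2(-h, 0.0)) / (2 * h)   # ∂/∂x of D2 at (0,0) → +1
```

The two numbers differ, matching D12 f(0, 0) = −1 and D21 f(0, 0) = +1 from part (d).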

Chapter 28

Solution to PS 5

(1) Let f(x) and g(x) be two concave functions and let h(x) = f(x) + g(x). Concavity of f and g implies, for all x, y ∈ D and all λ ∈ [0, 1],

λf(x) + (1 − λ)f(y) ≤ f(λx + (1 − λ)y),
λg(x) + (1 − λ)g(y) ≤ g(λx + (1 − λ)y).

Adding the two inequalities,

[λf(x) + (1 − λ)f(y)] + [λg(x) + (1 − λ)g(y)] ≤ f(λx + (1 − λ)y) + g(λx + (1 − λ)y),
λ[f(x) + g(x)] + (1 − λ)[f(y) + g(y)] ≤ f(λx + (1 − λ)y) + g(λx + (1 − λ)y),
λh(x) + (1 − λ)h(y) ≤ h(λx + (1 − λ)y).

Hence h is concave.

(2) (a) False. Consider A, B ⊆ R with A = [0, 2] and B = [4, 6], so A ∪ B = [0, 2] ∪ [4, 6]. Then 1 ∈ A ∪ B and 5 ∈ A ∪ B, but (1/2)·1 + (1/2)·5 = 3 ∉ A ∪ B.

(b) True. If A and B are convex sets, then A ∩ B is convex. Let x ∈ A ∩ B, y ∈ A ∩ B. Then,

(28.1) λx + (1 − λ) y ∈ A as x, y ∈ A

(28.2) λx + (1 − λ) y ∈ B as x, y ∈ B

(28.3) ⇒ λx + (1 − λ) y ∈ A ∩ B.

Hence A ∩ B is convex.

(c) True. Let z, z′ ∈ C, and let 0 ≤ λ ≤ 1. By definition of C, there exist x, x′ ∈ A and y, y′ ∈ B,

such that z = x + y and z′ = x′ + y′ . We will show that λz + (1 − λ)z′ belongs to C. This

will establish that C is a convex set in Rn .


Since A is a convex set, we have λx + (1 − λ)x′ ∈ A, and since B is a convex set, we have λy + (1 − λ)y′ ∈ B. By definition of C, we have:

(28.4) [λx + (1 − λ)x′ ] + [λy + (1 − λ)y′ ] ∈ C

We can rewrite (28.4) as:

λ(x + y) + (1 − λ)(x′ + y′ ) ∈ C

Since z = x + y and z′ = x′ + y′ , this means [λz + (1 − λ)z′ ] ∈ C.

(3) Let x, y ∈ [0, 1] and let 0 ≤ λ ≤ 1. Clearly [λx + (1 − λ)y] ∈ [0, 1]. In order to prove the claim,

we will show that:

h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)

Using the definition of h,

(28.5) h(λx + (1 − λ)y) = f (λx + (1 − λ)y)g(λx + (1 − λ)y)

Since f and g are convex functions on [0, 1]

f (λx + (1 − λ)y) ≤ λ f (x) + (1 − λ) f (y), and

g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y).

Since f and g are non-negative valued functions,

f (λx + (1 − λ)y)g(λx + (1 − λ)y) ≤ [λ f (x) + (1 − λ) f (y)][λg(x) + (1 − λ)g(y)]

= λ2 f (x)g(x) + λ(1 − λ){ f (x)g(y) + g(x) f (y)}

(28.6) + (1 − λ)2 f (y)g(y)

Since f and g are increasing functions on [0, 1],

{ f (x) − f (y)}{g(x) − g(y)} ≥ 0

and so:

(28.7) f (x)g(x) + f (y)g(y) ≥ f (x)g(y) + f (y)g(x)

Using (28.7) in (28.6),

f (λx + (1 − λ)y)g(λx + (1 − λ)y) ≤ λ2 f (x)g(x) + λ(1 − λ){ f (x)g(x) + f (y)g(y)} + (1 − λ)2 f (y)g(y)

= λ f (x)g(x) + (1 − λ) f (y)g(y)

(28.8) = λh(x) + (1 − λ)h(y)

Combining (28.5) with (28.8), we obtain:

h(λx + (1 − λ)y) ≤ λh(x) + (1 − λ)h(y)

which is the desired result.

(4) (a) Recall a monotone function of one variable is quasi-concave. Since f (x) = 3x + 4 is mono-

tone increasing, it is quasi-concave.


(b) The bordered Hessian is

B(x, y) = [[0, y exp(x), exp(x)], [y exp(x), y exp(x), exp(x)], [exp(x), exp(x), 0]].

det B1(x, y) = det [[0, y exp(x)], [y exp(x), y exp(x)]] = −y² exp(2x) < 0;

det B2(x, y) = det B(x, y) = y exp(3x) > 0.

Hence the quasi-concavity condition

(−1)^r det Br(x) > 0, for all r = 1, 2, ···, n and all x ∈ D,

is satisfied.

(c) The bordered Hessian is

B(x, y) = [[0, −2xy³, −3x²y²], [−2xy³, −2y³, −6xy²], [−3x²y², −6xy², −6x²y]].

(28.9) det B1(x, y) = det [[0, −2xy³], [−2xy³, −2y³]] = −4x²y⁶ ≤ 0,

(28.10) det B2(x, y) = det B(x, y) = −30x⁴y⁷.

Note the sign of det B2(x, y) is not positive. Hence it is not quasi-concave.

[Fig. 1 and Fig. 2: graphs of f(x) and g(x).]

(5) Let

(28.11) f(x) = { 0 for x ≤ 0;  x for 0 ≤ x ≤ 1/2;  1 − x for 1/2 ≤ x ≤ 1;  0 for x ≥ 1 },

(28.12) g(x) = { 0 for x ≤ 1;  x − 1 for 1 ≤ x ≤ 3/2;  2 − x for 3/2 ≤ x ≤ 2;  0 for x ≥ 2 },

and

(28.13) h(x) = f(x) + g(x).

In the figures, the Fig. 1 and Fig. 2 functions are quasiconcave (each of them is first non-decreasing, then non-increasing), whereas the Fig. 3 function, which is the sum of the other two, is not quasiconcave (it is not non-decreasing, not non-increasing, and not non-decreasing-then-non-increasing).

[Fig. 3: graph of f(x) + g(x).]

(6) (i)

∇f(x, y, z) = [24x² + 2y², 4xy, −3z²]

(28.14) H f(x, y, z) = [[48x, 4y, 0], [4y, 4x, 0], [0, 0, −6z]].

Then f is not concave, as the principal minor D1 = 48x > 0. The bordered Hessian is

B(x, y, z) = [[0, 24x² + 2y², 4xy, −3z²], [24x² + 2y², 48x, 4y, 0], [4xy, 4y, 4x, 0], [−3z², 0, 0, −6z]].

(28.15) det B1(x, y, z) = det [[0, 24x² + 2y²], [24x² + 2y², 48x]] = −576x⁴ − 96x²y² − 4y⁴ ≤ 0,

(28.16) det B2(x, y, z) = det [[0, 24x² + 2y², 4xy], [24x² + 2y², 48x, 4y], [4xy, 4y, 4x]] = −2304x⁵ − 384x³y² + 48xy⁴,

which can take both positive and negative values. Hence f(x, y, z) is neither quasiconcave nor quasiconvex.

(ii)

∇g(x, y) = [1 − exp(x) − exp(x + y), 1 − exp(x + y)]

(28.17) Hg(x, y) = [[−exp(x) − exp(x + y), −exp(x + y)], [−exp(x + y), −exp(x + y)]].

Then

(28.18) D1 = −exp(x) − exp(x + y) < 0, D2 = exp(x) exp(x + y) > 0

implies that g(x, y) is concave. Hence it is also quasi-concave.

Chapter 29

Solution to PS 6

(1) The system is

(29.1) [[1, 3, 1, −2], [2, 6, −2, −4]] · (x, y, z, w)′ = (1, 3)′.

(a) The rank of matrix A can be at most 2. This means that there can be at most two endogenous variables. The second column of A is a multiple (three times) of the first column, and the fourth column is a multiple (−2 times) of column one. The sub-matrix consisting of columns one and three has full rank, as its determinant is −4. So we can choose x and z as endogenous variables and the remaining two, y and w, as exogenous variables.

(b) The system of linear equations can be rewritten as follows (with the choice of endogenous and exogenous variables made above):

(29.2) [[1, 1], [2, −2]] · (x, z)′ = (1 − 3y + 2w, 3 − 6y + 4w)′.

Multiply the first equation by two and add it to the second to get

(29.3) 4x = 5 − 12y + 8w,

(29.4) x = (5 − 12y + 8w)/4 = 5/4 − 3y + 2w.

Substitute the value of x into the first equation to get

z = 1 − 3y + 2w − (5/4 − 3y + 2w) = −1/4.


(2) The system is

(29.5) [[−1, 3, −1, 1], [4, −1, 1, 1], [7, 1, 1, 3]] · (x, y, z, w)′ = (0, 3, 6)′.

The rank of matrix A can be at most 3. However, we observe that the third row equals the sum of twice the second row and the first row; this means that the rank of A cannot be three. The sub-matrix obtained by eliminating the third row of A (call it matrix B) is

(29.6) [[−1, 3, −1, 1], [4, −1, 1, 1]].

The determinant of the sub-matrix of B obtained by eliminating the third and fourth columns is −11, which is non-zero; this sub-matrix has full rank. So we can choose x and y as endogenous variables and the remaining two, z and w, as exogenous variables.

We can solve the set of equations:

(29.7) [[−1, 3], [4, −1]] · (x, y)′ = (z − w, 3 − z − w)′.

Solving the two equations we get

x = (9 − 2z − 4w)/11 and y = (3 + 3z − 5w)/11.

(3) Let

F(x, y) = x² − xy³ + y⁵ − 17 = 0,

which is a continuous function, being a polynomial. Also, at (x, y) = (5, 2),

D2 F(x, y) = −3xy² + 5y⁴ = −3(5)(4) + 5(2)⁴ = 20 ≠ 0.

Hence, by the Implicit Function Theorem, there exists a function y = f(x), continuously differentiable, in a neighborhood of (x, y) = (5, 2). Further,

f′(5) = −[D1 F(x, y)/D2 F(x, y)]|(5,2) = −[(2x − y³)/(−3xy² + 5y⁴)]|(5,2) = −2/20 = −1/10.

Then

y = f(4.9) ≈ f(5) + (4.9 − 5) · f′(5) = 2 + (−0.1) · (−1/10) = 201/100.
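The linear approximation can be compared with the actual implicit root, found by bisection on F(4.9, ·) (pure Python; the bracket [1.5, 2.5] is chosen so that F changes sign on it):

```python
def F(x, y):
    return x * x - x * y ** 3 + y ** 5 - 17.0

# bisection for the root of F(4.9, y) = 0 near y = 2
lo, hi = 1.5, 2.5           # F(4.9, lo) < 0 < F(4.9, hi)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if F(4.9, mid) < 0:
        lo = mid
    else:
        hi = mid
y_root = 0.5 * (lo + hi)
```

The exact root is about 2.009, so the first-order IFT estimate of 2.01 is accurate to roughly one part in a thousand.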

(a) Check that z = −3 satisfies the equation f(x, y, z) = 0 for x = 6 and y = 3.

(b) Observe that D3 f(x, y, z) = 3z² = 27 ≠ 0 at (6, 3, −3), and f is continuously differentiable. Hence, by the Implicit Function Theorem (IFT), there exists a function z = h(x, y) in terms of x and y, continuously differentiable, in a neighborhood of (x, y) = (6, 3).

(c) By IFT, we have

(dz/dx)|(6,3,−3) = −[D1 f/D3 f]|(6,3,−3) = −[2x/(3z²)]|(6,3,−3) = −2(6)/(3·9) = −4/9,

and

(dz/dy)|(6,3,−3) = −[D2 f/D3 f]|(6,3,−3) = −[−2y/(3z²)]|(6,3,−3) = 2(3)/(3·9) = 2/9.

Then

z ≈ h(6, 3) + (dz/dx)|(6,3) · (6.1 − 6) + (dz/dy)|(6,3) · (2.8 − 3)
  = −3 + (−4/9)·(0.1) + (2/9)·(−0.2) = −3 − 4/90 − 4/90 = −139/45.

(5) Consider the profit-maximizing firm described in Example 13.2. If p increases by Δp and w increases by Δw, what will be the change in the optimal input amount x?

Note the first order condition for profit maximization is

F(p, w, x) ≡ p f′(x) − w = 0.

Here D3 F(p, w, x) = p f″(x) < 0, since f(x) is strictly concave. Also, F(p, w, x) is a continuously differentiable function. Hence we can apply the IFT to claim that there exists a function x = x(p, w), continuously differentiable in a neighborhood of (p, w, x*), where x* is the profit-maximizing input quantity. Then

x ≈ x* + (dx/dp)·Δp + (dx/dw)·Δw
  = x* − (D1 F/D3 F)·Δp − (D2 F/D3 F)·Δw
  = x* − [f′(x)/(p f″(x))]·Δp − [−1/(p f″(x))]·Δw
  = x* − [f′(x)/(p f″(x))]·Δp + [1/(p f″(x))]·Δw.

(6) Consider 3x²yz + xyz² = 96 as defining x as an implicit function of y and z around the point x = 2, y = 3, z = 2.

(a) Let F(x, y, z) = 3x²yz + xyz² − 96 = 0. Then

D1 F(x, y, z) = 6xyz + yz² = 6(2)(3)(2) + 3(4) = 84 ≠ 0,

and F(x, y, z) is a continuously differentiable function (being a polynomial). Hence we can apply the IFT to claim that there exists a function x = f(y) in terms of y, continuously differentiable, in a neighborhood of (x, y, z) = (2, 3, 2). Also,

(dx/dy)|(2,3,2) = −[D2 F/D1 F]|(2,3,2) = −[(3x²z + xz²)/(6xyz + yz²)]|(2,3,2)
               = −[(3x² + xz)/(6xy + yz)]|(2,3,2) = −(12 + 4)/(36 + 6) = −16/42 = −8/21.

Then

x ≈ 2 + (dx/dy)|(2,3,2) · (3.1 − 3) = 2 − (8/21)(0.1) = 412/210.

(b) Solving the quadratic 3yz·x² + yz²·x − 96 = 0 for x,

x = [−yz² ± √(y²z⁴ + 1152yz)]/(6yz) = −z/6 ± √(z²/36 + 32/(yz)),

which at z = 2 becomes

x = −1/3 ± √(1/9 + 16/y).

Since x = 2 > 0 at y = 3, the relevant branch is

x = −1/3 + √(1/9 + 16/y)

in the neighborhood of (2, 3, 2).

(c)

(dx/dy)|(2,3,2) = [1/(2√(1/9 + 16/y))] · (−16/y²)|y=3 = [1/(2·(7/3))] · (−16/9) = −8/21.

Then

x ≈ 2 + (−8/21)(3.1 − 3) = 2 − 8/210 = 412/210.
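The agreement between the implicit and explicit computations of dx/dy can be confirmed by differencing the explicit branch numerically (pure Python):

```python
import math

def x_of_y(y):
    # explicit branch from part (b), with z fixed at 2
    return -1.0 / 3.0 + math.sqrt(1.0 / 9.0 + 16.0 / y)

x_at_3 = x_of_y(3.0)   # should recover x = 2

h = 1e-6
dxdy = (x_of_y(3.0 + h) - x_of_y(3.0 - h)) / (2 * h)   # ≈ −8/21
```

Both the point value and the slope match the implicit-function computation.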

(d) The second method involves more computations.

(7) Let x > 0 and y > 0. Since f is homogeneous of degree one,

f(x + y) = f(x · (x + y)/x) = [(x + y)/x] f(x),

so that

f(x) = [x/(x + y)] f(x + y).

Similarly,

f(y) = [y/(x + y)] f(x + y).

So

f(x) + f(y) = [x/(x + y)] f(x + y) + [y/(x + y)] f(x + y) = [(x + y)/(x + y)] f(x + y) = f(x + y).

If x is zero, then first note that homogeneity gives f(0) = f(2·0) = 2 f(0), hence f(0) = 0. Then

f(x + y) = f(0 + y) = f(y) = 0 + f(y) = f(0) + f(y) = f(x) + f(y).

The same argument holds if both x and y are zero.

Another method of proof is as follows: let x > 0 and y > 0. Then x = ty for some t > 0, and

f(x + y) = f(ty + y) = f[(1 + t)y] = (1 + t) f(y) = f(y) + t f(y) = f(y) + f(ty) = f(y) + f(x).

The remaining cases of x = 0 or y = 0 are handled as in the earlier proof.

(8) Take x and x′ such that f(x) = y > 0 and f(x′) = y′ > 0. Then, by homogeneity of degree one,

f(x) = y ⇒ (1/y) f(x) = 1 ⇒ f(x/y) = 1.

Similarly,

f(x′/y′) = 1.

Take λ ∈ (0, 1) and define

θ = λy/[λy + (1 − λ)y′].

Then

1 − θ = (1 − λ)y′/[λy + (1 − λ)y′]

and θ ∈ (0, 1). Function f is quasi-concave, so

f(θ(x/y) + (1 − θ)(x′/y′)) ≥ min{f(x/y), f(x′/y′)} = min{1, 1} = 1.

Substituting for θ and 1 − θ,

f(λx/[λy + (1 − λ)y′] + (1 − λ)x′/[λy + (1 − λ)y′]) ≥ 1
f([λx + (1 − λ)x′]/[λy + (1 − λ)y′]) ≥ 1
(1/[λy + (1 − λ)y′]) f(λx + (1 − λ)x′) ≥ 1
f(λx + (1 − λ)x′) ≥ λy + (1 − λ)y′ = λf(x) + (1 − λ)f(x′),

so it is concave. If f(x′) is zero then, since f is non-decreasing,

f(λx + (1 − λ)x′) ≥ f(λx) = λf(x) = λf(x) + (1 − λ)f(x′).

If both f(x) and f(x′) are zero, then

f(λx + (1 − λ)x′) ≥ min{f(x), f(x′)} = 0 = λf(x) + (1 − λ)f(x′).

(9) Since the function f is homogeneous of degree m and is twice continuously differentiable, each of its partial derivatives Di f is homogeneous of degree m − 1. Further, the partial derivatives are themselves continuously differentiable, and the second-order partial derivatives are homogeneous of degree m − 2.

Applying Euler's theorem to the partial derivative Di f(x), which is homogeneous of degree m − 1, we get

x1 Di1 f(x) + x2 Di2 f(x) + ··· + xn Din f(x) = (m − 1) Di f(x),

for i = 1, ···, n. We can write these n equalities in matrix notation as

[[D11 f(x), ···, D1n f(x)], ···, [Dn1 f(x), ···, Dnn f(x)]] · (x1, ···, xn)′ = (m − 1) (D1 f(x), ···, Dn f(x))′.

The n × n square matrix on the left-hand side is the Hessian matrix H f(x) of the function f, so the left-hand side is H f(x)·x. Pre-multiplying both sides by the row vector x′, we get

x′ H f(x) x = (m − 1)[x1 D1 f(x) + ··· + xn Dn f(x)].

Applying Euler's theorem to the sum on the right-hand side, we get

x′ H f(x) x = (m − 1)[m f(x)] = m(m − 1) f(x).


Chapter 30

Solution to PS 7

(1) Consider the function g defined for all x > 0, y > 0. The gradient and Hessian are

∇g(x, y) = [3x² − 3, 3y² − 2],

(30.1) Hg(x, y) = [[6x, 0], [0, 6y]],

which is positive definite for x > 0, y > 0, so g is convex on this domain. Then the first order condition

(30.3) ∇g(x, y) = [3x² − 3, 3y² − 2] = [0, 0]

gives

(30.4) x* = 1, y* = √(2/3).

Using the theorem on convexity and global minima, g(x, y) attains a global minimum at (1, √(2/3)), with

g(1, √(2/3)) = −2 − (4/3)√(2/3) ≈ −3.09.


(2) We know that f′(x) = 0 is a necessary condition for f to have a local maximum or minimum. To find all the local maxima and minima of f, set

(30.5)  f′(x) = 4x³ − 12x² + 8x = 0
(30.6)  4x(x² − 3x + 2) = 0,
(30.7)  x = 0, x = 1, x = 2.

If we plot the graph of this function, we can see that x = 0 and x = 2 are local minima and x = 1 is a local maximum. Also, x = 0 and x = 2 are global minima and there is no global maximum.

(3) The profit function is

π(Q1, Q2) = Q1(100 − 5Q1) + Q2(50 − 10Q2) − (50 + 10Q1 + 10Q2).

The first-order conditions for profit maximization are

D1 π(Q1, Q2) = 100 − 10Q1 − 10 = 0, or Q1 = 9,
D2 π(Q1, Q2) = 50 − 20Q2 − 10 = 0, or Q2 = 2.

We need to check the second-order conditions. Note

D11 π = −10,  D22 π = −20,  and  D12 π = D21 π = 0,

which gives the first-order leading principal minor −10 and the second-order leading principal minor 200. So the Hessian is negative definite for all outputs in the positive orthant. Therefore π is a concave function, and Q1 = 9, Q2 = 2 is a profit-maximizing supply plan for the firm. The maximum profit is π* = 9 × 55 + 2 × 30 − 50 − 110 = 395.
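The FOC solution of problem (3) can be verified numerically; the sketch below recomputes the profit at (Q1, Q2) = (9, 2) and confirms on a coarse grid that no other output plan does better.

```python
def profit(q1, q2):
    # Profit function from problem (3).
    return q1*(100 - 5*q1) + q2*(50 - 10*q2) - (50 + 10*q1 + 10*q2)

# FOC solution derived above.
print(profit(9, 2))  # 395

# Grid search over (Q1, Q2) in steps of 0.1 confirms the maximum.
best = max(profit(a/10, b/10) for a in range(0, 201) for b in range(0, 101))
print(best)  # 395.0
```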

(4) (a) The profit for the firm, when it uses K and L units of capital and labor to produce output Q = L^a K^b, given the output and input prices (P, w, r), is

Π(K, L) = P · Q − wL − rK.

The firm maximizes its profit by choosing K and L such that both the FOC and SOC are satisfied. The FOCs are as under:

dΠ/dL = P · aL^{a−1}K^b − w = 0,  i.e.,  P · aL^{a−1}K^b = w;
dΠ/dK = P · L^a bK^{b−1} − r = 0,  i.e.,  P · L^a bK^{b−1} = r.

The FOC with respect to L leads to the condition that the value of the marginal product of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition that the value of the marginal product of capital is equal to the rental rate r.

(b) To solve for the optimal levels of L and K, we divide the first FOC by the second and get

(P · MPL)/(P · MPK) = MPL/MPK = (P · aL^{a−1}K^b)/(P · L^a bK^{b−1}) = w/r;
aK/(bL) = w/r;
K = (wb/(ra)) L.

Observe that the ratio of MPL and MPK is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of K can be substituted into either of the two FOCs to get the expression for L.

P · aL^{a−1}K^b = w;
P · aL^{a−1} ((wb/(ra)) L)^b = w;
P · L^{a+b−1} (wb/(ra))^b = w/a;
P · (a/w)^{1−b} (b/r)^b = L^{1−a−b};
L* = (a/w)^{(1−b)/(1−a−b)} (b/r)^{b/(1−a−b)} P^{1/(1−a−b)}.

We compute the optimal value of K* from the last equation as under:

K* = (wb/(ra)) L*
   = (wb/(ra)) (a/w)^{(1−b)/(1−a−b)} (b/r)^{b/(1−a−b)} P^{1/(1−a−b)}
   = (a/w)^{(1−b)/(1−a−b) − 1} (b/r)^{b/(1−a−b) + 1} P^{1/(1−a−b)}
   = (a/w)^{a/(1−a−b)} (b/r)^{(1−a)/(1−a−b)} P^{1/(1−a−b)}.

(c) For the SOC, we first write down the Hessian (the matrix of second-order partial derivatives) using the FOCs:

H = [ PF_LL  PF_LK ; PF_KL  PF_KK ]
  = [ Pa(a−1)L^{a−2}K^b      PabL^{a−1}K^{b−1} ;
      PabL^{a−1}K^{b−1}      Pb(b−1)L^a K^{b−2} ].

For the SOC to be satisfied, the leading principal minor of order one needs to be negative and the leading principal minor of order two needs to be positive. Thus Pa(a−1)L^{a−2}K^b < 0, which implies a − 1 < 0, i.e., a < 1. The LPM of order two is the determinant of the Hessian matrix:

det H = P²ab(a−1)(b−1)L^{2a−2}K^{2b−2} − (PabL^{a−1}K^{b−1})²
      = P²ab[(a−1)(b−1) − ab]L^{2a−2}K^{2b−2}
      = P²ab[1 − a − b]L^{2a−2}K^{2b−2} > 0,


which holds true if and only if 1 − a − b > 0. Note that this condition also implies that

b < 1.

Thus the production function is such that it displays diminishing marginal product in each

of the two inputs (a < 1 and b < 1) and also it displays diminishing returns to scale as the

production function is homogeneous of degree a + b < 1.

(d) We use the expression for L* derived earlier to find the partial derivatives:

∂L*/∂P = (1/(1−a−b)) (a/w)^{(1−b)/(1−a−b)} (b/r)^{b/(1−a−b)} P^{1/(1−a−b) − 1} > 0,

∂L*/∂w = −((1−b)/(1−a−b)) a^{(1−b)/(1−a−b)} w^{−(1−b)/(1−a−b) − 1} (b/r)^{b/(1−a−b)} P^{1/(1−a−b)} < 0,

∂L*/∂r = −(b/(1−a−b)) (a/w)^{(1−b)/(1−a−b)} b^{b/(1−a−b)} r^{−b/(1−a−b) − 1} P^{1/(1−a−b)} < 0.

(e) The output is obtained by noting that the profit-maximizing inputs are K* and L*:

Q* = (L*)^a (K*)^b
   = (a/w)^{a(1−b)/(1−a−b)} (b/r)^{ab/(1−a−b)} P^{a/(1−a−b)} · (a/w)^{ab/(1−a−b)} (b/r)^{b(1−a)/(1−a−b)} P^{b/(1−a−b)}
   = (a/w)^{(a(1−b)+ab)/(1−a−b)} (b/r)^{(ab+b(1−a))/(1−a−b)} P^{(a+b)/(1−a−b)}
   = (a/w)^{a/(1−a−b)} (b/r)^{b/(1−a−b)} P^{(a+b)/(1−a−b)}
   = [ (a/w)^a (b/r)^b P^{a+b} ]^{1/(1−a−b)}.

For computing the price elasticity of supply with respect to the output price, note that

Q* = [ (a/w)^a (b/r)^b ]^{1/(1−a−b)} P^{(a+b)/(1−a−b)} = A P^{(a+b)/(1−a−b)},

where A = [ (a/w)^a (b/r)^b ]^{1/(1−a−b)} is a constant independent of P. It is easy to see that the elasticity is ε_P = (a+b)/(1−a−b). [Note that for Q = AP^b, ε_P = (dQ/dP) · (P/Q) = AbP^{b−1} · (P/Q) = b.]


Similarly, ε_w = −a/(1−a−b) and ε_r = −b/(1−a−b). Thus,

ε_P + ε_w + ε_r = (a+b)/(1−a−b) + (−a)/(1−a−b) + (−b)/(1−a−b) = (a + b − a − b)/(1−a−b) = 0.

The economic interpretation is that if we change all the prices by the same factor, then the profit-maximizing quantity does not change. In other words, the profit-maximizing output is homogeneous of degree zero in the prices (P, w, r).
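The homogeneity and elasticity results above can be spot-checked numerically. A small sketch, with parameter values that are illustrative assumptions rather than part of the problem:

```python
# Supply function Q* = [(a/w)^a (b/r)^b P^(a+b)]^(1/(1-a-b)) from part (e).
a, b = 0.3, 0.4          # assumed, with a + b < 1
P, w, r = 2.0, 1.5, 1.2  # assumed prices

def Q(P, w, r):
    return ((a/w)**a * (b/r)**b * P**(a+b))**(1/(1-a-b))

# Homogeneity of degree zero: doubling all prices leaves output unchanged.
print(abs(Q(2*P, 2*w, 2*r) - Q(P, w, r)) < 1e-9)   # True

# Price elasticity via a log-derivative approximation matches (a+b)/(1-a-b).
h = 1e-6
eps_P = (Q(P*(1+h), w, r) - Q(P, w, r)) / (h * Q(P, w, r))
print(round(eps_P, 3), round((a+b)/(1-a-b), 3))    # both ~2.333
```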

(f) You may like to write down the expression for the profit function explicitly in terms of P,

w and r, on your own.

(5) (a) The profit for the firm, when it uses K, L and R units of capital, labor and natural resources to produce output Q = AL^a K^b + ln R, given the output and input prices (P, w, v, r), is

Π(K, L, R) = P · Q − wL − rK − vR = P · AL^a K^b + P ln R − wL − rK − vR.

The firm maximizes its profit by choosing K, L and R such that both the FOC and SOC are satisfied. The FOCs are as under:

dΠ/dL = P · AaL^{a−1}K^b − w = PF_L − w = 0,  i.e.,  P · AaL^{a−1}K^b = w;
dΠ/dK = P · AL^a bK^{b−1} − r = PF_K − r = 0,  i.e.,  P · AL^a bK^{b−1} = r;
dΠ/dR = P/R − v = PF_R − v = 0,  i.e.,  P/R = v.

The FOC with respect to L leads to the condition that the value of the marginal product of labor is equal to the wage rate w. Similarly, the FOC with respect to K leads to the condition that the value of the marginal product of capital is equal to the rental rate r. Lastly, the FOC with respect to R leads to the condition that the value of the marginal product of the natural resource is equal to the price of the natural resource v.

Now take A = 3, a = b = 1/3 for the remainder of the problem.

(b) With the given parameter values, the FOCs are (note Aa = 1 = Ab):

P · L^{−2/3}K^{1/3} = w;
P · L^{1/3}K^{−2/3} = r;
P/R = v.


For the SOC, we first write down the Hessian (the matrix of second-order partial derivatives) using the FOCs:

H = [ PF_LL  PF_LK  PF_LR ; PF_KL  PF_KK  PF_KR ; PF_RL  PF_RK  PF_RR ]
  = [ −(2/3)P·L^{−5/3}K^{1/3}    (1/3)P·L^{−2/3}K^{−2/3}    0 ;
      (1/3)P·L^{−2/3}K^{−2/3}    −(2/3)P·L^{1/3}K^{−5/3}    0 ;
      0                          0                          −P/R² ].

For the SOC to be satisfied, the leading principal minor of order one needs to be negative, the leading principal minor of order two needs to be positive, and the leading principal minor of order three needs to be negative.

The LPM of order one is negative, as −(2/3)P·L^{−5/3}K^{1/3} < 0 (given that P > 0 and K > 0, L > 0).

The LPM of order two is the determinant of the matrix obtained by removing the third row and the third column:

det H2 = det [ −(2/3)P·L^{−5/3}K^{1/3}    (1/3)P·L^{−2/3}K^{−2/3} ;
               (1/3)P·L^{−2/3}K^{−2/3}    −(2/3)P·L^{1/3}K^{−5/3} ]
       = (4/9)P²·L^{−4/3}K^{−4/3} − (1/9)P²·L^{−4/3}K^{−4/3}
       = (1/3)P²·L^{−4/3}K^{−4/3} > 0.

The LPM of order three is the determinant of the Hessian matrix. We compute the determinant using the third row to get

det H = −(P/R²)[ (4/9)P²·L^{−4/3}K^{−4/3} − (1/9)P²·L^{−4/3}K^{−4/3} ]
      = −(1/3)(P³/R²)·L^{−4/3}K^{−4/3} < 0.

Hence the SOC is satisfied.

(c) To solve for the optimal levels of L and K, we divide the first FOC by the second and get (note a = b = 1/3)

(P · MPL)/(P · MPK) = MPL/MPK = (P · aL^{a−1}K^b)/(P · L^a bK^{b−1}) = w/r;
aK/(bL) = w/r;
K = (w/r) L.

Observe that the ratio of MPL and MPK is the MRTS (marginal rate of technical substitution, i.e., the rate at which one can substitute labor for capital along an isoquant). The value of K can be substituted into either of the two FOCs to get the expression for L.

P · L^{−2/3}K^{1/3} = w;
P · L^{−2/3} ((w/r)L)^{1/3} = w;
P · (w/r)^{1/3} = wL^{1/3};
P · (1/(rw²))^{1/3} = L^{1/3};
L* = P³/(rw²).

Taking the derivative of L* with respect to r, we obtain

dL*/dr = −P³/(r²w²).

Totally differentiating the three FOCs, we get

dP · L^{−2/3}K^{1/3} − (2/3)P · L^{−5/3}K^{1/3} dL + (1/3)P · L^{−2/3}K^{−2/3} dK = dw;
dP · L^{1/3}K^{−2/3} + (1/3)P · L^{−2/3}K^{−2/3} dL − (2/3)P · L^{1/3}K^{−5/3} dK = dr;
dP/R − (P/R²) dR = dv.

We can write this in matrix form as under:

A = [ −(2/3)P·L^{−5/3}K^{1/3}    (1/3)P·L^{−2/3}K^{−2/3}    0 ;
      (1/3)P·L^{−2/3}K^{−2/3}    −(2/3)P·L^{1/3}K^{−5/3}    0 ;
      0                          0                          −P/R² ],

q = [ dL ; dK ; dR ],   b = [ dw − dP·L^{−2/3}K^{1/3} ; dr − dP·L^{1/3}K^{−2/3} ; dv − dP/R ].

Then Aq = b. Note that the matrix A is the same as the Hessian. Solving for dL, when dP = dw = dv = 0 and dr ≠ 0, using Cramer's rule, we get

dL = det [ 0    (1/3)P·L^{−2/3}K^{−2/3}    0 ;
           dr   −(2/3)P·L^{1/3}K^{−5/3}    0 ;
           0    0                          −P/R² ]
     / det [ −(2/3)P·L^{−5/3}K^{1/3}    (1/3)P·L^{−2/3}K^{−2/3}    0 ;
             (1/3)P·L^{−2/3}K^{−2/3}    −(2/3)P·L^{1/3}K^{−5/3}    0 ;
             0                          0                          −P/R² ]

   = (−P/R²)(−dr)(1/3)P·L^{−2/3}K^{−2/3} / [ −(1/3)(P³/R²)·L^{−4/3}K^{−4/3} ]
   = −dr · L^{2/3}K^{2/3}/P < 0  (for dr > 0).

Hence

dL*/dr = −L^{2/3}K^{2/3}/P < 0.

Thus, L* decreases as r increases. To see that we obtain an identical expression for dL*/dr as in the previous part, observe

K* = P³/(r²w);   L* · K* = P⁶/(r³w³);   (L* · K*)^{2/3} = P⁴/(r²w²);

so

dL*/dr = −(L*)^{2/3}(K*)^{2/3}/P = −P³/(r²w²).

(ii) Solving for dL, when dP = dw = dr = 0 and dv ≠ 0, using Cramer's rule, we get

dL = det [ 0    (1/3)P·L^{−2/3}K^{−2/3}    0 ;
           0    −(2/3)P·L^{1/3}K^{−5/3}    0 ;
           dv   0                          −P/R² ]
     / det [ −(2/3)P·L^{−5/3}K^{1/3}    (1/3)P·L^{−2/3}K^{−2/3}    0 ;
             (1/3)P·L^{−2/3}K^{−2/3}    −(2/3)P·L^{1/3}K^{−5/3}    0 ;
             0                          0                          −P/R² ]

   = 0 / [ −(1/3)(P³/R²)·L^{−4/3}K^{−4/3} ] = 0.

Hence

dL*/dv = 0.

Since L* does not depend on v, this conclusion is obvious.
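A sketch check of problem (5) with illustrative prices (assumed values): the closed forms L* = P³/(rw²) and K* = P³/(r²w) derived above satisfy the FOCs exactly, and the Cramer's-rule derivative −L^{2/3}K^{2/3}/P agrees with −P³/(r²w²).

```python
P, w, r = 2.0, 1.0, 1.5  # assumed prices for illustration

L = P**3 / (r * w**2)    # L* derived above
K = P**3 / (r**2 * w)    # K* derived above

# FOCs: P L^(-2/3) K^(1/3) = w and P L^(1/3) K^(-2/3) = r hold at (L*, K*).
print(abs(P * L**(-2/3) * K**(1/3) - w) < 1e-9)   # True
print(abs(P * L**(1/3) * K**(-2/3) - r) < 1e-9)   # True

# The two expressions for dL*/dr coincide.
print(abs(-L**(2/3) * K**(2/3) / P + P**3 / (r**2 * w**2)) < 1e-9)  # True
```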

Chapter 31

Solution to PS 8

1. (a) Since

C = { (x1, x2, x3) ∈ R³ : d((0, 0, 0), (x1, x2, x3)) = 1 },

the set C is
(i) bounded, since C ⊂ B((0, 0, 0), 2): indeed, x ∈ C ⇒ d(x, 0) = 1 < 2 ⇒ x ∈ B(0, 2);
(ii) closed in R³, since it is defined as a level set of the polynomial and therefore continuous function ∑³ᵢ₌₁ xᵢ² (use the characterization of closed sets in terms of convergent sequences);
(iii) non-empty, since (1, 0, 0) ∈ C.

Since the objective function ∑³ᵢ₌₁ cᵢxᵢ is linear, and therefore continuous on R³, the Weierstrass theorem is applicable and yields x̄ ∈ C such that ∑³ᵢ₌₁ cᵢxᵢ ≤ ∑³ᵢ₌₁ cᵢx̄ᵢ for any (x1, x2, x3) ∈ C.

(b) The optimization problem can be rewritten as

max f (x)

(31.1) subject to g(x) = 0

and x ∈ R3

where

3 3

f (x) = ∑ ci xi and g(x) = ∑ xi2 − 1.

i=1 i=1

Both functions f and g are polynomials and are therefore continuously differentiable on the open set R³. Since x̄ is a point of global maximum of f subject to the constraint g(x) = 0, it is also a local maximum of f subject to the constraint g(x) = 0. Since g(0) = −1 ≠ 0, we have x̄ ≠ 0. Now

∇g(x) = 2(x1, x2, x3)′ ≠ 0 for x ≠ 0,

and x̄ ≠ 0, hence the constraint qualification ∇g(x̄) ≠ 0 holds. Therefore, by Lagrange's theorem there exists λ ∈ R such that ∇f(x̄) = λ∇g(x̄), or

(31.2)  (c1, c2, c3)′ = 2λ(x̄1, x̄2, x̄3)′.

If we premultiply (31.2) by the row vector (x̄1, x̄2, x̄3), we get

(31.3)  ∑³ᵢ₌₁ cᵢx̄ᵢ = 2λ ∑³ᵢ₌₁ x̄ᵢ² = 2λ(g(x̄) + 1) = 2λ(0 + 1) = 2λ.

If we premultiply (31.2) by the row vector (c1, c2, c3), equation (31.3) yields

(31.4)  ∥c∥² = ∑³ᵢ₌₁ cᵢ² = 2λ ∑³ᵢ₌₁ cᵢx̄ᵢ = ( ∑³ᵢ₌₁ cᵢx̄ᵢ )².

To conclude that the result holds, we only need to show that ∑³ᵢ₌₁ cᵢx̄ᵢ > 0. Since (c1, c2, c3) ≠ (0, 0, 0), we have cᵢ ≠ 0 for some i. Since g((|cᵢ|/cᵢ) eᵢ) = 0 and x̄ solves (31.1), by definition of the solution to the constrained maximization problem

∑³ᵢ₌₁ cᵢx̄ᵢ = f(x̄) ≥ f((|cᵢ|/cᵢ) eᵢ) = |cᵢ| > 0.

Now taking square roots in (31.4) yields the result.

(c) Let us define c = p, and consider x̂ = q/∥q∥. Then ∥x̂∥ = 1, hence g(x̂) = 0, and the definition of the solution of the constrained maximization problem yields

∥p∥ = ∥c∥ = ∑³ᵢ₌₁ cᵢx̄ᵢ = f(x̄) ≥ f(x̂) = ∑³ᵢ₌₁ cᵢx̂ᵢ = (1/∥q∥) ∑³ᵢ₌₁ cᵢqᵢ = (1/∥q∥) ∑³ᵢ₌₁ pᵢqᵢ = pq/∥q∥.

Analogously, for x̌ = −q/∥q∥ we have ∥x̌∥ = 1, hence g(x̌) = 0, and the definition of the solution of the constrained maximization problem yields

∥p∥ = ∥c∥ = ∑³ᵢ₌₁ cᵢx̄ᵢ = f(x̄) ≥ f(x̌) = −(1/∥q∥) ∑³ᵢ₌₁ cᵢqᵢ = −(1/∥q∥) ∑³ᵢ₌₁ pᵢqᵢ = −pq/∥q∥.

Combining the two inequalities,

−∥p∥ ∥q∥ ≤ pq ≤ ∥p∥ ∥q∥ ⇔ |pq| ≤ ∥p∥ ∥q∥.
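The Cauchy-Schwarz inequality just derived can be spot-checked numerically; the sketch below tests random vectors and confirms that the maximizer on the unit sphere, x̄ = c/∥c∥, attains the value ∥c∥.

```python
import math
import random

# |p·q| <= ||p|| ||q|| on random 3-vectors.
random.seed(0)
norm = lambda v: math.sqrt(sum(x*x for x in v))
for _ in range(1000):
    p = [random.uniform(-5, 5) for _ in range(3)]
    q = [random.uniform(-5, 5) for _ in range(3)]
    dot = sum(pi*qi for pi, qi in zip(p, q))
    assert abs(dot) <= norm(p) * norm(q) + 1e-12

# The maximum of c·x over the unit sphere is attained at x = c/||c|| with value ||c||.
c = [1.0, 2.0, 2.0]   # illustrative vector
x_bar = [ci/norm(c) for ci in c]
print(sum(ci*xi for ci, xi in zip(c, x_bar)))  # 3.0 = ||c||
```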

2. Necessity route: The function f(x, y) = x² − 3xy is continuous, and the constraint set {(x, y) ∈ R²₊ | x + 2y = 10}, which we denote by G, is non-empty ((10, 0) is contained in it), closed (as the set is defined by weak inequalities, which are preserved in the limit) and bounded (as ∥(x, y)∥ ≤ √(10² + 5²) = √125). So the constraint set is compact and non-empty and the objective function f is continuous, hence the Weierstrass theorem is applicable and a solution exists. The Lagrangian and the FOCs are

(31.5)  L(x, y, λ) = x² − 3xy + λ(2y + x − 10)
(31.6)  ∂L(x, y, λ)/∂x = 2x − 3y + λ = 0
(31.7)  ∂L(x, y, λ)/∂y = −3x + 2λ = 0 → λ = (3/2)x
(31.8)  ∂L(x, y, λ)/∂λ = 2y + x − 10 = 0.

Now

2x − 3y + λ = 2x − 3y + (3/2)x = 0 → (7/2)x = 3y → y = (7/6)x;
2y + x − 10 = 0 → (7/3)x + x = 10 → (10/3)x = 10 → x = 3;
y = (7/6) · 3 = 7/2,  λ = 9/2.

We get an interior candidate for a solution,

m1 = (3, 7/2, 9/2),

and the constraint qualification holds:

∇g(x*, y*) = [ 1  2 ] ≠ 0.

Comparing the values of f at the candidate and at the boundary points of the constraint set,

f(10, 0) = 100,   f(0, 5) = 0,   f(3, 7/2) = −45/2.

The solution then is (x*, y*) = (10, 0). Note that we cannot use the sufficiency route since f is not concave.
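A grid scan of the constraint line confirms the conclusion of problem 2 (the step size is an arbitrary choice):

```python
f = lambda x, y: x**2 - 3*x*y

# Scan x + 2y = 10 over x in [0, 10]; the maximum is at the corner (10, 0),
# not at the interior stationary point.
best = max((f(x/1000, (10 - x/1000)/2), x/1000) for x in range(0, 10001))
print(best)        # (100.0, 10.0)

# The interior FOC candidate gives a lower value.
print(f(3, 3.5))   # -22.5 = -45/2
```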

3. Necessity route: A solution exists by arguments similar to the earlier problem. The Lagrangian and the FOCs are

(31.9)   L(x, y, λ) = x^{1/3}y^{2/3} + λ(4 − 2x − y)
(31.10)  ∂L(x, y, λ)/∂x = (1/3)x^{−2/3}y^{2/3} − 2λ = 0
(31.11)  ∂L(x, y, λ)/∂y = (2/3)x^{1/3}y^{−1/3} − λ = 0
(31.12)  ∂L(x, y, λ)/∂λ = 4 − 2x − y = 0.


Now

[(1/3)x^{−2/3}y^{2/3}] / [(2/3)x^{1/3}y^{−1/3}] = 2λ/λ → y/(2x) = 2 → y = 4x;
4 − 2x − y = 4 − 2x − 4x = 0 → x = 2/3, y = 8/3, λ = (2/3)(1/4)^{1/3}.

We get an interior candidate for a solution:

m1 = (2/3, 8/3, (2/3)(1/4)^{1/3}).

The constraint qualification holds:

∇g(x*, y*) = [ −2  −1 ] ≠ 0.

Comparing with the boundary points,

f(2, 0) = 0 = f(0, 4),   f(2/3, 8/3) = (2/3)^{1/3}(8/3)^{2/3} > 0.

The solution then is (x*, y*) = (2/3, 8/3).

Sufficiency route:

∇f(x, y) = [ (1/3)x^{−2/3}y^{2/3}   (2/3)x^{1/3}y^{−1/3} ]

H_f(x, y) = [ −(2/9)x^{−5/3}y^{2/3}    (2/9)x^{−2/3}y^{−1/3} ;
              (2/9)x^{−2/3}y^{−1/3}    −(2/9)x^{1/3}y^{−4/3} ].

The determinants of the principal minors of order one satisfy

−(2/9)x^{−5/3}y^{2/3} ≤ 0,   −(2/9)x^{1/3}y^{−4/3} ≤ 0,

and the principal minor of order two is

(4/81)x^{−4/3}y^{−2/3} − (4/81)x^{−4/3}y^{−2/3} = 0 ≥ 0,

for all (x, y) ∈ R²₊. Hence f is concave. The constraint is linear and so concave, and λ > 0. L(x, y, λ) is concave and the FOC are sufficient for a maximum. Therefore (x*, y*) = (2/3, 8/3), which satisfies the FOC, is the solution.
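The candidate of problem 3 can be confirmed by scanning the constraint segment (a sketch; the grid resolution is arbitrary):

```python
f = lambda x, y: x**(1/3) * y**(2/3)

x_star, y_star = 2/3, 8/3
print(abs(2*x_star + y_star - 4) < 1e-12)        # constraint 2x + y = 4 holds

# No point on the constraint segment beats the FOC candidate.
best = max(f(x/1000, 4 - 2*x/1000) for x in range(0, 2001))
print(best <= f(x_star, y_star) + 1e-9)          # True
```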

4. Let f : R²₊ → R and consider

(31.13)  max f(x, y) = √(xy)  subject to x + y ≤ 6, x ≥ 0, y ≥ 0.


This problem has inequality constraints and so we will use the Kuhn-Tucker sufficiency theorem. We need to check that all conditions of the theorem are satisfied.

(i) Let

X = { (x, y) ∈ R²₊₊ }.

Then X is open as its complement

X^C = { (x, y) ∈ R² | x ≤ 0 or y ≤ 0 }

is closed.

(ii) The function f(x, y) is continuous, as x and y are continuous and f(·) is obtained by taking the square root of the product of these two continuous functions. The constraint functions g¹(x, y) = 6 − x − y, g²(x, y) = x, g³(x, y) = y are linear and hence continuous. Further, f_x(x, y) = (1/2)√(y/x) and f_y(x, y) = (1/2)√(x/y) are continuous on X. Hence f, g^j (j = 1, · · · , 3) are continuously differentiable on X.

(iii) The set X is convex: if (x1, y1), (x2, y2) ∈ X, then

x1 > 0, x2 > 0 → λx1 + (1 − λ)x2 > 0 for all λ ∈ (0, 1);
y1 > 0, y2 > 0 → λy1 + (1 − λ)y2 > 0 for all λ ∈ (0, 1);
→ (λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) ∈ X.

(iv) The function f(x, y) is concave, as

∇f(x, y) = [ (1/2)√(y/x)   (1/2)√(x/y) ]

H_f(x, y) = [ −(1/4)√(y/x³)    1/(4√(xy)) ;
              1/(4√(xy))       −(1/4)√(x/y³) ].

The principal minors of order one satisfy

−(1/4)√(y/x³) ≤ 0,   −(1/4)√(x/y³) ≤ 0,

and the principal minor of order two is

(1/16)(1/(xy)) − (1/16)(1/(xy)) = 0 ≥ 0,

for all (x, y) ∈ X. Hence f is concave. Further, g^j (j = 1, · · · , 3) are concave, being linear functions.

Hence, for the problem

max_{(x,y)∈X} f(x, y) = √(xy)  subject to x + y ≤ 6, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x*, y*), λ*) ∈ X × R³₊ that satisfies the Kuhn-Tucker conditions:

(i) Di f(x*) + ∑ᵐⱼ₌₁ λ*ⱼ Di g^j(x*) = 0, i = 1, · · · , n;
(ii) g(x*) ≥ 0 and λ* · g(x*) = 0.

They are

(31.14)  (1/2)√(y/x) − λ1 + λ2 = 0
(31.15)  (1/2)√(x/y) − λ1 + λ3 = 0
(31.16)  6 − x − y ≥ 0,  λ1(6 − x − y) = 0
(31.17)  x ≥ 0, λ2 x = 0;  y ≥ 0, λ3 y = 0.

If λ1 = 0, then (1/2)√(x/y) − λ1 + λ3 = 0 → λ3 = −(1/2)√(x/y) < 0, which contradicts λ3 ≥ 0. Hence

λ1 > 0 → 6 − x − y = 0.

Since x > 0, y > 0, we have λ2 = 0, λ3 = 0, so

(1/2)√(y/x) − λ1 + λ2 = (1/2)√(x/y) − λ1 + λ3 = 0 → (1/2)√(y/x) = λ1 = (1/2)√(x/y)
→ x = y → 6 − x − y = 0 → x = y = 3 > 0.

Note that all conditions are satisfied. Hence (3, 3) is a global maximum on X. Observe that it is also a global maximum on R²₊, as f(x, y) = 0 for (x, y) ∈ R²₊ \ X and f(3, 3) > 0. Hence (3, 3) solves the optimization problem.
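A brute-force check of the Kuhn-Tucker point (3, 3) in problem 4:

```python
import math

f = lambda x, y: math.sqrt(x*y)

print(f(3, 3))  # 3.0

# Grid over the feasible set x + y <= 6, x, y >= 0 (step 0.01).
best = max(f(x/100, y/100) for x in range(0, 601) for y in range(0, 601) if x + y <= 600)
print(best)     # 3.0, attained at (3, 3)
```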

5. Consider the problem

(31.18)  max f(x, y) = x + ln(1 + y)  subject to x ≥ 0, y ≥ 0 and x + py ≤ m.

Again we will use the Kuhn-Tucker sufficiency theorem. We need to check that all conditions of the theorem are satisfied.

(i) Let

X = { (x, y) ∈ R² | x > −1, y > −1 }.

Then X is open as its complement

X^C = { (x, y) ∈ R² | x ≤ −1 or y ≤ −1 }

is closed.


(ii) The function f(x, y) is continuous, as x and ln(1 + y) (for y > −1) are continuous and f(·) is the sum of two continuous functions. The constraint functions g¹(x, y) = m − x − py, g²(x, y) = x, g³(x, y) = y are linear and hence continuous. Further, f_x(x, y) = 1 and f_y(x, y) = 1/(1 + y) are continuous. Hence f, g^j (j = 1, · · · , 3) are continuously differentiable on X.

(iii) The set X is convex: if (x1, y1), (x2, y2) ∈ X, then

x1 > −1, x2 > −1 → λx1 + (1 − λ)x2 > −1 for all λ ∈ (0, 1);
y1 > −1, y2 > −1 → λy1 + (1 − λ)y2 > −1 for all λ ∈ (0, 1);
→ (λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) ∈ X.

(iv) The function f(x, y) is concave, as

∇f(x, y) = [ 1   1/(1 + y) ],
H_f(x, y) = [ 0  0 ;  0  −1/(1 + y)² ].

The principal minors of order one satisfy 0 ≤ 0 and −1/(1 + y)² ≤ 0, and the principal minor of order two is 0 ≥ 0, for all (x, y) ∈ X. Hence f is concave. The g^j (j = 1, · · · , 3) are concave, being linear functions.

Hence, for the problem

max_{(x,y)∈X} f(x, y) = x + ln(1 + y)  subject to x + py ≤ m, x ≥ 0, y ≥ 0,

all conditions of the Kuhn-Tucker sufficiency theorem are satisfied. We need to find a pair ((x*, y*), λ*) ∈ X × R³₊ that satisfies the Kuhn-Tucker conditions:

(i) Di f(x*) + ∑ᵐⱼ₌₁ λ*ⱼ Di g^j(x*) = 0, i = 1, · · · , n; and
(ii) g(x*) ≥ 0 and λ* · g(x*) = 0.

They are

(31.19)  1 − λ1 + λ2 = 0
(31.20)  1/(1 + y) − pλ1 + λ3 = 0
(31.21)  m − x − py ≥ 0,  λ1(m − x − py) = 0
(31.22)  x ≥ 0, λ2 x = 0;  y ≥ 0, λ3 y = 0.


From (31.19) and λ2 ≥ 0 we get λ1 = 1 + λ2 > 0, so the budget binds:

λ1 > 0 → m − x − py = 0,

and x = y = 0 is ruled out because m > 0. There are three remaining cases.

(i) x > 0, y = 0. Note λ2 = 0 and x = m. Then

1 = λ1;
1 − p + λ3 = 0;
λ3 = p − 1,

which requires p ≥ 1.

(ii) x = 0, y > 0. Note λ3 = 0 and y = m/p. Then

λ1 = 1/(p(1 + m/p)) = 1/(p + m);
1 − λ1 + λ2 = 0;
1 − 1/(p + m) + λ2 = 0;
λ2 = 1/(p + m) − 1.

If 1/(p + m) − 1 ≥ 0, i.e., 1 ≥ p + m, then λ2 ≥ 0. So the solution is (0, m/p, 1/(p + m), 1/(p + m) − 1, 0) if p + m ≤ 1.

(iii) x > 0, y > 0. Note λ2 = 0, λ3 = 0, and

(31.23)  1 = λ1,  1/(1 + y) = p → y = 1/p − 1 > 0;
(31.24)  m − x − py = 0 → x = m − 1 + p > 0.

Hence, for 1 > p > 1 − m, the solution is (m − 1 + p, 1/p − 1, 1, 0, 0). Combining the cases, the solution (x*, y*, λ1*, λ2*, λ3*) is

(m, 0, 1, 0, p − 1)                        if p ≥ 1,
(0, m/p, 1/(p + m), 1/(p + m) − 1, 0)      if p ≤ 1 − m, and
(m − 1 + p, 1/p − 1, 1, 0, 0)              if 1 − m < p < 1.

The Kuhn-Tucker sufficiency theorem asserts that this solution is a global maximum and therefore solves the original problem.
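The piecewise solution of problem 5 can be checked against a grid search along the budget line; the income level m = 0.5 used below is an illustrative assumption.

```python
import math

def solve(p, m):
    # Piecewise Kuhn-Tucker solution (x*, y*) derived above.
    if p >= 1:
        return (m, 0.0)
    if p <= 1 - m:
        return (0.0, m/p)
    return (m - 1 + p, 1/p - 1)

def value(x, y):
    return x + math.log(1 + y)

m = 0.5
for p in (1.5, 0.3, 0.8):  # one price from each regime
    x, y = solve(p, m)
    # Scan the budget line x + p y = m (spending all income is optimal
    # since f is increasing in both arguments).
    best = max(value(xg/1000, (m - xg/1000)/p) for xg in range(0, 501))
    assert value(x, y) >= best - 1e-6
print("piecewise KT solution confirmed on a grid")
```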


6. Consider the following two problems:

(31.25)  max f(x)  subject to g(x) ≥ 0 and x ∈ X;

(31.26)  max f(x)  subject to x ∈ X.

(a) We claim that x̄ is also a solution to problem (31.25). For, if this is not the case, then since

x̄ is in the constraint set {x ∈ X : g(x) ≥ 0} of problem (31.25), there is some x′ ∈ X, with

g(x′ ) ≥ 0, such that f (x′ ) > f (x̄). But, since x′ ∈ X and is therefore in the constraint set

of problem (31.26), this means that x̄ is not a solution to problem (31.26), a contradiction.

This establishes our claim. [Note that we are not given the information that problem (31.25)

has a solution, and so we do not make use of this information in the answer].

(b) Let x̂ be any solution to problem (31.25). Note that since both x̂ and x̄ are in X, the constraint

set of problem (31.26), and x̄ solves problem (31.26), we have

(31.27) f (x̄) ≥ f (x̂)

We claim that g(x̂) = 0. For if g(x̂) ̸= 0, we must have g(x̂) > 0, since x̂ is a solution to

problem (31.25), and must therefore be in the constraint set {x ∈ X : g(x) ≥ 0} of problem

(31.25).

Since x̄ is not a solution to problem (31.25), and x̄ ∈ X, it must be the case that g(x̄) < 0.

For if g(x̄) ≥ 0, then, given (31.27), x̄ would also solve problem (31.25).

Since g(x̄) < 0, continuity of g on the convex set X [using the intermediate value theorem]

implies that we can find λ ∈ (0, 1), such that:

(31.28) g(λx̂ + (1 − λ)x̄) = 0

Denote (λx̂+(1−λ)x̄) by z. Then z ∈ X and g(z) = 0 by (31.28), so z satisfies the constraints

of problem (31.25).

Since f is strictly quasi-concave on X, then we can use x̂ ̸= x̄ [recall that g(x̄) < 0 while

g(x̂) > 0], and λ ∈ (0, 1), to obtain:

f (z) = f (λx̂ + (1 − λ)x̄) > min{ f (x̂), f (x̄)} = f (x̂)

using (31.27). But this contradicts the fact x̂ solves (31.25), and establishes our claim.

7. Suppose that a consumer has the utility function U(x, y) = x^a y^b and faces the budget constraint p_x x + p_y y ≤ I.

(A) Utility Maximization


(a) What are the first order conditions for utility maximization?

Observe that the utility function makes sense only if a > 0 and b > 0. The Lagrangean for the optimization problem is

L(x, y, λ) = U(x, y) + λ(I − p_x x − p_y y) = x^a y^b + λ(I − p_x x − p_y y).

The first order conditions are

∂L/∂x = ax^{a−1}y^b − λp_x = 0,
∂L/∂y = bx^a y^{b−1} − λp_y = 0,
∂L/∂λ = I − p_x x − p_y y = 0.

(b) Solve for the consumer's demands for goods x and y.

From the first two FOCs, we get

ax^{a−1}y^b = λp_x,
bx^a y^{b−1} = λp_y.

Dividing the first equation by the second, we get

(ax^{a−1}y^b)/(bx^a y^{b−1}) = (λp_x)/(λp_y);
(ay)/(bx) = p_x/p_y;
p_y y = (b/a) p_x x.

We use this in the third FOC to get

p_x x + p_y y = I;
p_x x + (b/a) p_x x = I;
((a+b)/a) p_x x = I;
p_x x* = (a/(a+b)) I  →  x* = (a/(a+b)) (I/p_x).

This gives

p_y y* = (b/(a+b)) I  →  y* = (b/(a+b)) (I/p_y).
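A numerical check of the Cobb-Douglas demands derived in (b); the parameter values are illustrative assumptions:

```python
a, b = 0.4, 0.6
px, py, I = 2.0, 3.0, 30.0   # assumed prices and income

U = lambda x, y: x**a * y**b
x_star = a*I/((a+b)*px)      # 6.0
y_star = b*I/((a+b)*py)      # 6.0

print(abs(px*x_star + py*y_star - I) < 1e-9)   # budget exhausted

# No point on the budget line beats (x*, y*).
best = max(U(x/100, (I - px*x/100)/py) for x in range(1, 1500))
print(best <= U(x_star, y_star) + 1e-9)        # True
```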


(c) Solve for the value of λ. What is the economic interpretation of λ? When is λ an increasing, decreasing or constant function of income?

We use the first FOC (with respect to x) to get

ax^{a−1}y^b = λp_x  →  λ* = a(x*)^{a−1}(y*)^b / p_x
= a ( (a/(a+b))(I/p_x) )^{a−1} ( (b/(a+b))(I/p_y) )^b / p_x
= (a/p_x)^a (b/p_y)^b (I/(a+b))^{a+b−1} > 0.

At the optimum,

L(x*, y*, λ*) = U(x*, y*) + λ*(I − p_x x* − p_y y*) = (x*)^a (y*)^b + λ* · 0.

Suppose the income increases by a dollar. Then the utility goes up by λ*: λ* is the marginal utility of income. Lastly, λ* is increasing with income if and only if a + b > 1.

(d) Show that the second order conditions hold.

Observe that the second order partial derivatives are

∂²L/∂x² = a(a−1)x^{a−2}y^b,  ∂²L/∂x∂y = abx^{a−1}y^{b−1},  ∂²L/∂y² = b(b−1)x^a y^{b−2},
∂²L/∂x∂λ = −p_x,  ∂²L/∂y∂λ = −p_y.

Using these, the bordered Hessian is

H = [ a(a−1)x^{a−2}y^b     abx^{a−1}y^{b−1}     −p_x ;
      abx^{a−1}y^{b−1}     b(b−1)x^a y^{b−2}    −p_y ;
      −p_x                 −p_y                 0 ].

The border-preserving leading principal minor of order 2 is the bordered Hessian itself. For the second order condition to be satisfied, the determinant of the bordered Hessian needs to be positive. Expanding,

det H = p_x[ p_y abx^{a−1}y^{b−1} − p_x b(b−1)x^a y^{b−2} ] − p_y[ p_y a(a−1)x^{a−2}y^b − p_x abx^{a−1}y^{b−1} ]
= 2p_x p_y abx^{a−1}y^{b−1} − p_x² b(b−1)x^a y^{b−2} − p_y² a(a−1)x^{a−2}y^b
= x^a y^b [ 2abp_x p_y/(xy) − b(b−1)p_x²/y² − a(a−1)p_y²/x² ].

Evaluating at (x*, y*), where p_x x* = aI/(a+b) and p_y y* = bI/(a+b),

det H = (x*)^a(y*)^b [ 2(a+b)²p_x²p_y²/I² − ((b−1)/b)(a+b)²p_x²p_y²/I² − ((a−1)/a)(a+b)²p_x²p_y²/I² ]
= (x*)^a(y*)^b [ (a+b)p_x p_y/I ]² ( 2 − (b−1)/b − (a−1)/a )
= (x*)^a(y*)^b [ (a+b)p_x p_y/I ]² ( 1/a + 1/b ) > 0.

(e) Show that the implicit function theorem value of dx/dI is identical to the value obtained by taking the partial derivative of x* with respect to I.

Using x*, we get

∂x*/∂I = (a/(a+b)) (1/p_x).

By Cramer's rule applied to the total differential of the FOCs,

dx*/dI = det [ 0    abx^{a−1}y^{b−1}    −p_x ;
               0    b(b−1)x^a y^{b−2}   −p_y ;
               −1   −p_y                0 ] / det H
= [ p_y abx^{a−1}y^{b−1} − p_x b(b−1)x^a y^{b−2} ] / det H
= bx^{a−1}y^{b−2}[ a p_y y − (b−1)p_x x ] / det H
= bx^{a−1}y^{b−2} · p_x x / det H   (using a p_y y* = b p_x x*)
= b(x*)^a(y*)^{b−2} p_x / det H
= b p_x / ( (y*)² [ (a+b)p_x p_y/I ]² (a+b)/(ab) )
= a/((a+b)p_x).

bI a p2x a

(f) A consumer's indirect utility function is defined to be utility as a function of prices and income. Use x* and y* to solve for the indirect utility function. Is it true that the partial of the indirect utility function with respect to income equals λ?

The indirect utility function is

u* = u(x*, y*) = (x*)^a(y*)^b = ( aI/((a+b)p_x) )^a ( bI/((a+b)p_y) )^b
= ( a/((a+b)p_x) )^a ( b/((a+b)p_y) )^b I^{a+b}.

Then,

∂u*/∂I = (a+b) ( a/((a+b)p_x) )^a ( b/((a+b)p_y) )^b I^{a+b−1}
= (a/p_x)^a (b/p_y)^b (I/(a+b))^{a+b−1} = λ*.

(B) Expenditure Minimization:

Now consider the "dual" of the utility maximization problem. The dual problem is to minimize expenditure, p_x x + p_y y, subject to reaching a given level of utility, u0 (the constraint is therefore u0 − x^a y^b = 0).


(a) What are the first order conditions for expenditure minimization?

First, we write down the minimization problem as

min p_x x + p_y y  subject to x^a y^b ≥ u0,

which can be converted into a maximization exercise as under:

max −p_x x − p_y y  subject to x^a y^b ≥ u0.

The Lagrangean for the maximization problem is

L(x, y, λ) = −p_x x − p_y y + λ(x^a y^b − u0),

with first order conditions

∂L/∂x = −p_x + λax^{a−1}y^b = 0,
∂L/∂y = −p_y + λbx^a y^{b−1} = 0,
∂L/∂λ = x^a y^b − u0 = 0.

(b) Use the first order conditions to solve for x* and y* (these are called the Hicksian or compensated demand functions).

From the first two FOCs, we get

λax^{a−1}y^b = p_x,
λbx^a y^{b−1} = p_y.

Dividing the first equation by the second, we get

(λax^{a−1}y^b)/(λbx^a y^{b−1}) = p_x/p_y;
(ay)/(bx) = p_x/p_y;
y = (b p_x/(a p_y)) x.

We use this in the third FOC to get

x^a y^b = u0;   x^{a+b} (b p_x/(a p_y))^b = u0;   x* = (a p_y/(b p_x))^{b/(a+b)} u0^{1/(a+b)};

y* = (b p_x/(a p_y)) x* = (b p_x/(a p_y))^{a/(a+b)} u0^{1/(a+b)}.


(c) It is easy to see that the bordered Hessian is the same as in the case of the utility maximization exercise. Hence we conclude that the SOC holds in this case.

(d) Write the level of income, I, necessary to reach u0 as a function of u0, prices, and parameters. How does this expenditure function relate to the indirect utility function?

e(p_x, p_y, u0) = p_x x* + p_y y* = p_x (a p_y/(b p_x))^{b/(a+b)} u0^{1/(a+b)} + p_y (b p_x/(a p_y))^{a/(a+b)} u0^{1/(a+b)}
= (p_x^a p_y^b u0)^{1/(a+b)} [ (a/b)^{b/(a+b)} + (b/a)^{a/(a+b)} ].

Substituting u0 = ( a/((a+b)p_x) )^a ( b/((a+b)p_y) )^b I^{a+b}, the indirect utility at income I,

e = [ (a/(a+b))^a (b/(a+b))^b ]^{1/(a+b)} [ (a/b)^{b/(a+b)} + (b/a)^{a/(a+b)} ] I
= [ a/(a+b) + b/(a+b) ] I = I.

This shows that the minimum expenditure required to attain utility equal to the indirect utility is the same as the income I. Thus the two approaches are equivalent.
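The duality result e(p_x, p_y, u0) = I in (d) can be verified numerically; the parameter values below are illustrative assumptions.

```python
a, b = 0.4, 0.6
px, py, I = 2.0, 3.0, 30.0   # assumed prices and income

# Indirect utility at (px, py, I).
u0 = (a*I/((a+b)*px))**a * (b*I/((a+b)*py))**b

# Hicksian demands derived in (b).
h_x = (a*py/(b*px))**(b/(a+b)) * u0**(1/(a+b))
h_y = (b*px/(a*py))**(a/(a+b)) * u0**(1/(a+b))
e = px*h_x + py*h_y

print(abs(e - I) < 1e-9)                   # True: duality holds
print(abs(h_x**a * h_y**b - u0) < 1e-9)    # the Hicksian bundle attains u0
```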

(e) To avoid confusion, let us call the solution for good x in the utility maximization problem x* and the solution for good x in the expenditure minimization problem h*. Prove that

∂x*/∂p_x = ∂h*/∂p_x − x* ∂x*/∂I.

Interpret this answer.

Observe that we can rewrite h* as h* = θ(p_x)^{−b/(a+b)}, where θ ≡ (a p_y/b)^{b/(a+b)} u0^{1/(a+b)}. This gives us

∂h*/∂p_x = θ ( −b/(a+b) ) (p_x)^{−b/(a+b) − 1} = −(b/(a+b)) h*/p_x.

Also, from the utility maximization, we get

∂x*/∂p_x = −( aI/(a+b) )(p_x)^{−2} = −x*/p_x,

and

x* ∂x*/∂I = x* ( a/(a+b) )(p_x)^{−1}.

Therefore,

∂x*/∂p_x + x* ∂x*/∂I = −x*/p_x + (a/(a+b)) x*/p_x = −(b/(a+b)) x*/p_x = ∂h*/∂p_x.

The change in x* due to a change in its own price p_x (the total effect) is the sum of the substitution effect (∂h*/∂p_x) and the income effect (−x* ∂x*/∂I).
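The Slutsky decomposition in (e) reduces to an exact identity in the closed forms, which the sketch below confirms (illustrative parameter values):

```python
a, b = 0.4, 0.6
px, I = 2.0, 30.0            # assumed price and income

x_star = a*I/((a+b)*px)

marshallian_slope = -x_star/px            # dx*/dpx = -x*/px
substitution = -(b/(a+b)) * x_star/px     # dh*/dpx evaluated where h* = x*
income = x_star * (a/(a+b))/px            # x* dx*/dI

# Total effect = substitution effect - income effect (Slutsky equation).
print(abs(marshallian_slope - (substitution - income)) < 1e-12)  # True
```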

8. Suppose a consumer has the utility function U = a ln(x − x0 ) + b ln(y − y0 ) where a, b, x0 and y0

are positive parameters. Assume that the usual budget constraint applies.

(a) Solve for the consumer’s demand for good x.

Observe that the utility maximization exercise makes sense if the consumption bundle (x0, y0) is feasible. Let us denote x − x0 by x′ and y − y0 by y′. Then the utility function can be written as U(x′, y′) = a ln(x′) + b ln(y′). The budget constraint p_x x + p_y y = I can be written as p_x x′ + p_y y′ = I − p_x x0 − p_y y0 = I′. The utility maximization exercise can therefore be formulated as

max a ln(x′) + b ln(y′)  subject to p_x x′ + p_y y′ = I′.

The FOCs for the Lagrangean L(x′, y′, λ) = a ln(x′) + b ln(y′) + λ(I′ − p_x x′ − p_y y′) are

∂L/∂x′ = a/x′ − λp_x = 0,
∂L/∂y′ = b/y′ − λp_y = 0,
∂L/∂λ = I′ − p_x x′ − p_y y′ = 0.

Thus a/x′ = λp_x and b/y′ = λp_y, so

(ay′)/(bx′) = p_x/p_y;   p_y y′ = (b/a) p_x x′.


p_x x′ + p_y y′ = I′;
p_x x′ + (b/a) p_x x′ = I′;
((a+b)/a) p_x x′ = I′;
p_x x′ = (a/(a+b)) I′  →  x′ = (a/(a+b)) (I′/p_x).

This gives

p_y y′ = (b/(a+b)) I′  →  y′ = (b/(a+b)) (I′/p_y).

We need to show that the second order conditions hold for the solution to yield a maximum. Observe that the second order partial derivatives are

∂²L/∂x² = −a/(x′)²;  ∂²L/∂x∂y = 0;  ∂²L/∂y² = −b/(y′)²;  ∂²L/∂x∂λ = −p_x;  ∂²L/∂y∂λ = −p_y.

Using these, we get the bordered Hessian matrix as under:

H = [ −a/(x′)²   0          −p_x ;
      0          −b/(y′)²   −p_y ;
      −p_x       −p_y       0 ].

The border-preserving leading principal minor of order 2 is the bordered Hessian itself. For the second order condition to be satisfied, the determinant needs to be positive:

det H = (−p_x)[ −(−p_x)(−b/(y′)²) ] − (−p_y)[ (−p_y)(−a/(x′)²) ]... expanding and simplifying,

det H = ( b p_x²/(y′)² ) + ( a p_y²/(x′)² ) > 0.

Thus the SOC holds and we have a maximum. The optimum consumption bundle is

x* = x′ + x0 = (a/(a+b)) (I − p_x x0 − p_y y0)/p_x + x0 = (a/(a+b)) (I − p_y y0)/p_x + (b/(a+b)) x0,

y* = y′ + y0 = (b/(a+b)) (I − p_x x0)/p_y + (a/(a+b)) y0.

(b) Find the elasticities of demand for good x with respect to income and prices.

It is easy to compute the price and income elasticity using the definitions. Please let me

know if you have any questions on this.


(c) Show that the utility function V = 45(x − x_0)^{3.5a} (y − y_0)^{3.5b} would have yielded the same demand for good x.

Taking the natural log of V — a positive monotone transformation — yields an increasing affine transformation of the utility function in (a):

ln V = ln 45 + 3.5a ln(x − x_0) + 3.5b ln(y − y_0) = ln 45 + 3.5 U.

Since ln V is strictly increasing in U, the consumption bundle (x*, y*) that maximizes U also maximizes V.
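A small grid comparison illustrates why the monotone transformation preserves the maximizer (a, b, x_0, y_0 are illustrative):

```python
import math

# Compare the maximizers of U and of V = 45*(x-x0)^{3.5a} * (y-y0)^{3.5b}
# over a small grid of bundles; parameter values are illustrative.
a, b = 0.3, 0.7
x0, y0 = 10.0, 8.0

def U(p):
    return a * math.log(p[0] - x0) + b * math.log(p[1] - y0)

def V(p):
    return 45.0 * (p[0] - x0) ** (3.5 * a) * (p[1] - y0) ** (3.5 * b)

bundles = [(x0 + i, y0 + j) for i in range(1, 6) for j in range(1, 6)]
best_U = max(bundles, key=U)
best_V = max(bundles, key=V)
assert best_U == best_V  # same maximizer under the monotone transformation
```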

U(x, y, z) = a ln(x) + b ln(y) + c ln(z),

where a > 0, b > 0, and c > 0 are such that a + b + c = 1. The budget constraint can be written as

g_1(x, y, z) = I − px − qy − rz ≥ 0,

and the rationing constraint as

g_2(x, y, z) = k − x ≥ 0.

(a) This problem has two inequality constraints, so we will use the Kuhn–Tucker sufficiency theorem.

(i) Let

X = {(x, y, z) ∈ R³ | x > 0, y > 0, z > 0}.

Then X is open, as its complement

X^C = {(x, y, z) ∈ R³ | x ≤ 0 or y ≤ 0 or z ≤ 0}

is closed (a finite union of closed half-spaces).

(ii) The function U(x, y, z) is continuous in x, y, and z (being a sum of log functions). The constraint functions g_1(x, y, z) = I − px − qy − rz, g_2(x, y, z) = k − x, g_3(x, y, z) = x, g_4(x, y, z) = y, and g_5(x, y, z) = z are linear and hence continuous. It is also possible to infer that f = U and g_j (j = 1, …, 5) are twice continuously differentiable on X, and that the set X is convex.

(iii) The function U(x, y, z) is concave, as

∇f(x, y, z) = (a/x, b/y, c/z)ᵀ

and

H_f(x, y, z) = diag(−a/x², −b/y², −c/z²).

The leading principal minor of order one has determinant −a/x² < 0; that of order two, ab/(x²y²) > 0; and that of order three, −abc/(x²y²z²) < 0, for all (x, y, z) ∈ X. Hence f is concave. Further, g_j (j = 1, …, 5) are concave, being linear functions.
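Concavity of U can also be spot-checked via the midpoint inequality U((p + q)/2) ≥ (U(p) + U(q))/2 on random positive bundles (a, b, c illustrative):

```python
import math
import random

# Midpoint concavity check for U(x,y,z) = a ln x + b ln y + c ln z
# on randomly drawn positive bundles; a, b, c are illustrative.
a, b, c = 0.2, 0.3, 0.5

def U(p):
    return a * math.log(p[0]) + b * math.log(p[1]) + c * math.log(p[2])

random.seed(0)
for _ in range(1000):
    p = [random.uniform(0.1, 10.0) for _ in range(3)]
    q = [random.uniform(0.1, 10.0) for _ in range(3)]
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # Concavity: the value at the midpoint dominates the average of values.
    assert U(mid) >= (U(p) + U(q)) / 2 - 1e-12
```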

Hence all conditions of the Kuhn–Tucker sufficiency theorem are satisfied. We need to find a pair ((x*, y*, z*), λ*) ∈ X × R⁵₊ that satisfies the Kuhn–Tucker conditions:

(i) D_i f(x*, y*, z*) + Σ_{j=1}^{5} λ*_j D_i g_j(x*, y*, z*) = 0,  i = 1, 2, 3;
(ii) λ*_j g_j(x*, y*, z*) = 0 and g_j(x*, y*, z*) ≥ 0,  j = 1, …, 5.

They are

(31.29)  a/x − λ₁p − λ₂ + λ₃ = 0
(31.30)  b/y − λ₁q + λ₄ = 0
(31.31)  c/z − λ₁r + λ₅ = 0
(31.32)  I − px − qy − rz ≥ 0,  λ₁(I − px − qy − rz) = 0
(31.33)  k − x ≥ 0,  λ₂(k − x) = 0
(31.34)  x ≥ 0, λ₃x = 0;  y ≥ 0, λ₄y = 0;  z ≥ 0, λ₅z = 0.

If λ₁ = 0, then b/y − λ₁q + λ₄ = 0 gives λ₄ = −b/y < 0, which contradicts λ₄ ≥ 0. Hence λ₁ > 0, and so I − px − qy − rz = 0. Also, x > 0, y > 0, and z > 0 must hold for the three first-order conditions to hold with equality; thus λ₃ = λ₄ = λ₅ = 0.

(i) If λ₂ > 0, then x = k, and

I − pk = qy + rz = b/λ₁ + c/λ₁.

Thus λ₁ = (b + c)/(I − pk), which leads to

y = b(I − pk)/(q(b + c))  and  z = c(I − pk)/(r(b + c)).

We need to verify that λ₂ > 0, which will hold if λ₂ = a/k − (b + c)p/(I − pk) > 0, or

a/(b + c) > pk/(I − pk).
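The rationed-case candidate can be verified against the Kuhn–Tucker conditions numerically; the parameters below are illustrative and chosen so that a/(b + c) > pk/(I − pk) holds:

```python
# Verify the rationed-case candidate against the KT conditions.
# Illustrative numbers chosen so that a/(b+c) > pk/(I - pk) holds.
a, b, c = 0.5, 0.3, 0.2
p, q, r = 2.0, 1.0, 4.0
I, k = 100.0, 10.0

lam1 = (b + c) / (I - p * k)
x = k
y = b * (I - p * k) / (q * (b + c))
z = c * (I - p * k) / (r * (b + c))
lam2 = a / k - lam1 * p

assert a / (b + c) > p * k / (I - p * k)      # binding-ration condition
assert lam2 > 0                               # multiplier on k - x is positive
assert abs(a / x - lam1 * p - lam2) < 1e-9    # FOC for x (with lam3 = 0)
assert abs(b / y - lam1 * q) < 1e-9           # FOC for y
assert abs(c / z - lam1 * r) < 1e-9           # FOC for z
assert abs(I - p * x - q * y - r * z) < 1e-9  # budget constraint binds
```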

(ii) If λ₂ = 0, then

x = aI/(p(a + b + c)),  y = bI/(q(a + b + c)),  z = cI/(r(a + b + c)),  λ₁ = (a + b + c)/I

satisfies the KT conditions (please verify).
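Likewise for the unrationed candidate, where λ₁ = (a + b + c)/I is the marginal utility of income; the numbers below are illustrative, with the ration slack:

```python
# Verify the unrationed candidate (lam2 = 0) against the KT conditions.
# Illustrative numbers for which the ration k does not bind (x < k).
a, b, c = 0.2, 0.3, 0.5
p, q, r = 2.0, 1.0, 4.0
I, k = 100.0, 50.0

s = a + b + c                  # equals 1 in the problem
x = a * I / (p * s)
y = b * I / (q * s)
z = c * I / (r * s)
lam1 = s / I                   # (a + b + c)/I: marginal utility of income

assert x < k                                   # ration slack, so lam2 = 0 is valid
assert abs(a / x - lam1 * p) < 1e-12           # FOC for x
assert abs(b / y - lam1 * q) < 1e-12           # FOC for y
assert abs(c / z - lam1 * r) < 1e-12           # FOC for z
assert abs(I - p * x - q * y - r * z) < 1e-12  # budget constraint binds
```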

(b)

a/(b + c) > pk/(I − pk).

b + c I − pk


(c)

qy/(rz) = [b(I − pk)/(b + c)] / [c(I − pk)/(b + c)] = b/c.
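The expenditure ratio in (c) can be confirmed with the same illustrative numbers as in the rationed-case check above:

```python
# Expenditure-ratio check for the rationed case: q*y / (r*z) = b / c.
# Parameter values are illustrative.
b, c = 0.3, 0.2
p, I, k = 2.0, 100.0, 10.0

qy = b * (I - p * k) / (b + c)  # total spending on y
rz = c * (I - p * k) / (b + c)  # total spending on z
assert abs(qy / rz - b / c) < 1e-12
```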

(d) No, it is more likely that one buys more rice and less butter.

